Data processing


METdb data processing relied on two independent workflows described using the Common Workflow Language (CWL) specification (https://www.commonwl.org), available here. The transcriptome assembly workflow was used to generate the assembled transcriptomes available in METdb. These transcriptomes were then submitted to the EBI annotation pipeline, which performs the annotation step and provides links to the annotation descriptions.
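Because both workflows are described in CWL, they can in principle be executed with any CWL-compliant runner. The sketch below builds a command line for cwltool, the CWL reference runner. The file names assembly.cwl and job.yml and the output directory are hypothetical placeholders, not the actual METdb files.

```python
import shlex


def cwl_run_command(workflow, job, outdir="results"):
    """Build a cwltool command line for a CWL workflow and its job file.

    `workflow` and `job` are placeholder paths (assumptions); `--outdir`
    tells cwltool where to write the final outputs.
    """
    return ["cwltool", "--outdir", outdir, workflow, job]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    command = cwl_run_command("assembly.cwl", "job.yml")
    print(shlex.join(command))
    # To actually run it (requires cwltool to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

The job file is where the workflow inputs, such as the paths to the raw read files, would be listed.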

The transcriptome assembly workflow:

It includes 5 distinct steps:
  1. Quality evaluation of raw data with FastQC;
  2. Raw data processing with Trimmomatic to filter and trim reads according to their sequence quality;
  3. Readset comparison using Simka to detect possible cross-library contamination;
  4. De novo assembly using Trinity;
  5. Quality evaluation of the assembled transcripts using TransRate.
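To illustrate what filtering and trimming reads according to their sequence quality (step 2) means, here is a simplified sliding-window trim in Python. It is a sketch of the general idea only, not Trimmomatic's implementation: the function names, the window size of 4, the quality threshold of 20 and the minimum length of 36 are illustrative assumptions, and the parameters actually used for METdb are not stated here.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(sequence, quality_string, window=4, min_mean_quality=20):
    """Cut the read at the first window whose mean quality is below the threshold.

    Simplified illustration: the read is scanned from its 5' end and
    everything from the start of the first low-quality window is dropped.
    """
    scores = phred33(quality_string)
    if len(scores) < window:
        # Read shorter than one window: nothing to scan, keep it unchanged.
        return sequence, quality_string
    for start in range(len(scores) - window + 1):
        mean_quality = sum(scores[start:start + window]) / window
        if mean_quality < min_mean_quality:
            return sequence[:start], quality_string[:start]
    return sequence, quality_string


def keep_read(sequence, min_length=36):
    """Filter step: discard reads that became too short after trimming."""
    return len(sequence) >= min_length


if __name__ == "__main__":
    # 'I' encodes quality 40, '#' encodes quality 2.
    trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
    print(trimmed_seq, trimmed_qual, keep_read(trimmed_seq, min_length=4))
```

In this toy read, the low-quality tail is removed and the remaining bases are kept only if they pass the length filter.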

GitHub repository:

CWL viewer:

The annotation workflow:

Downstream analyses of the assemblies include:
  1. Evaluation of transcriptome completeness using BUSCO;
  2. Sequence similarity search using Diamond, a sequence aligner for protein and translated DNA searches;
  3. Prediction of coding regions using TransDecoder;
  4. Search for homologous RNAs using cmsearch, which scans the sequences with covariance models (CMs) and reports high-scoring alignments;
  5. Functional annotation of predicted proteins using the InterProScan pipeline from EMBL-EBI.
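As a rough illustration of what predicting coding regions (step 3) involves, the Python sketch below finds the longest open reading frame (ATG to stop codon) in the three forward frames of a transcript. It is a deliberately simplified stand-in, not TransDecoder's algorithm, which is considerably more elaborate; the function name and the example sequence are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return the longest ATG...stop ORF on the forward strand, or "" if none.

    Simplified illustration: only the three forward frames are scanned,
    and an ORF must contain both a start codon and an in-frame stop codon.
    """
    seq = transcript.upper()
    best = ""
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:pos + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    # Made-up toy transcript with one ORF in the third forward frame.
    print(longest_orf("CCATGAAATTTTAGGG"))
```

The predicted coding regions are what the later protein-level steps, such as functional annotation, operate on.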

GitHub repository:

CWL viewer: