The marine environment is extremely diverse and is by far the largest habitat on Earth. In contrast to marine prokaryotes, very little genomic data are available so far for marine eukaryotes. In this context, we are developing METdb, a genomic reference database dedicated to the transcriptomes of marine micro-eukaryotic species. Most datasets come from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP); the others come from the Roscoff marine station and Tara research projects. All datasets were assembled and analyzed using two workflows, one dedicated to de novo assembly and the other to functional annotation, both written in the Common Workflow Language (CWL). The assembly workflow covers evaluation, filtering and trimming of the raw data, followed by de novo assembly and evaluation of the assembled transcripts. The annotation workflow identifies coding regions and performs functional annotation. The CWL workflows are available on GitHub: https://github.com/EBI-Metagenomics/workflow-is-cwl/tree/assembly/workflows. The resource includes transcriptome assemblies and associated data (metrics and annotations) for 463 distinct marine micro-eukaryotic taxa.
Erwan Corre
Mark Hoebeke
Guita Niang
Eric Pelletier
Arnaud Meng
Rob Finn
Maxim Scheremetjew