Eleusine Transcriptome or Gene expression
Three de novo transcriptome assemblers were used: Trinity 2014-04-13p1, Velvet 1.2.08_maxkmer101, and SOAPdenovo2 v2.04. Trinity was run with a k-mer size of 25. Velvet was run with k-mer sizes from 21 to 91 in steps of 10, with a minimum contig length of 200 bp and without scaffolding. SOAPdenovo2 was run with k-mer sizes of 21 and 31. The three assemblers thus yielded 11 assemblies in total for each species (one from Trinity, eight from Velvet, and two from SOAPdenovo2). Before merging, “N”s were removed from the assemblies and contigs shorter than 200 bp were discarded. All assemblies for a given species were then combined into a single merged assembly, which was processed with the EvidentialGene tr2aacds pipeline. This pipeline takes as input the transcript FASTA produced by any transcript assembler and generates coding DNA sequences (CDSs) and amino acid sequences from each input contig. It then uses fastanrdb to quickly remove perfect duplicate sequences, cd-hit and cd-hit-est to cluster protein and nucleotide sequences, and makeblastdb and blastn to find regions of local similarity between sequences. It sorts transcripts into three classes: Okay (the best transcripts with a unique CDS, a set that is close to the biologically real one regardless of how many millions of assembled transcripts are supplied as input), Alternate (possible isoforms), and Drop (transcripts that did not pass the internal filter). The Okay and Alternate sets were used for further evaluation and annotation. The Okay sets were submitted to the NCBI Transcriptome Shotgun Assembly (TSA) database.
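The pre-merge cleanup described above (removing "N"s, discarding contigs shorter than 200 bp, and pooling all assemblies per species) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script actually used: it assumes that "removing Ns" means deleting N characters from each contig (the text does not say whether contigs were instead split at N runs), and the function names and the `label|header` naming scheme for merged contigs are hypothetical. The k-mer table simply restates the settings given in the text.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

MIN_CONTIG_LEN = 200  # minimum contig length stated in the text (bp)


def kmer_sizes() -> Dict[str, List[int]]:
    """K-mer settings per assembler, as listed in the text."""
    return {
        "Trinity": [25],
        "Velvet": list(range(21, 92, 10)),  # 21, 31, ..., 91
        "SOAPdenovo2": [21, 31],
    }


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_contig(seq: str, min_len: int = MIN_CONTIG_LEN) -> Optional[str]:
    """Delete N/n characters (assumed interpretation), then apply the length cutoff."""
    stripped = seq.replace("N", "").replace("n", "")
    return stripped if len(stripped) >= min_len else None


def merge_assemblies(assemblies: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Pool cleaned contigs from every assembly of one species.

    Headers are prefixed with a hypothetical assembly label so that
    identical contig names from different assemblers stay distinct.
    """
    merged: List[Tuple[str, str]] = []
    for label, lines in assemblies.items():
        for header, seq in parse_fasta(lines):
            cleaned = clean_contig(seq)
            if cleaned is not None:
                merged.append((f"{label}|{header}", cleaned))
    return merged
```

In practice each value passed to `merge_assemblies` would be an open file handle for one of the 11 assemblies, and the pooled contigs would be written back out as a single FASTA file to serve as input to tr2aacds.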


Accession number Name Taxon