
2011

International,  COM (talks)

Comparative Genomics of Eukaryotic Microorganisms, 15-20 October 2011, San Feliu de Guixols, Spain

17 Oct 2011   Identification and annotation of transposable elements

Florian Maumus, Hadi Quesneville

Transposable elements (TEs) are mobile DNA sequences that populate virtually all prokaryotic and eukaryotic genomes. Because TEs are able to make copies of themselves, they constitute a significant fraction of most eukaryotic genomes, such as ~45% of the human genome and ~80% of the maize genome. Although they were long considered junk DNA, because most are selfish genes that do not participate in host cell biology, they are today recognized as a major force in genome evolution. Indeed, TEs have been shown to play key roles in the generation of genetic diversity by driving genome rearrangements, inserting into genes and regulatory sequences, and being domesticated into genes with cellular functions. Therefore, the identification of TEs is crucial for assessing genome composition and evolution. In addition, because TEs are usually repressed by epigenetic marks such as DNA methylation and specific histone modifications, their proper identification and mapping is also crucial for deciphering epigenetic landscapes. Two complementary approaches, namely similarity-based and de novo, can be used to identify TEs in whole genomes. Similarity-based approaches rely on a starting TE databank and are thus most efficient at identifying TEs in plant, fungal, and animal genomes, for which many TEs are referenced in the public databases. However, because TEs are fast-evolving sequences, similarity-based methods are of limited efficiency for identifying TEs in other eukaryotic super-groups, from which only a few TEs have been reported to date. In contrast, de novo approaches allow the identification of all repeated sequences in a genome. Combined with similarity searches, they make it possible to classify repeated sequences and to uncover new repeats that lack significant homology to known TEs. Thus, the use of both de novo and similarity-based approaches is especially recommended when screening for TEs in the genomes of organisms belonging to eukaryotic groups that have received little attention until now.
Our laboratory has developed a bioinformatic pipeline called REPET that combines de novo and similarity-based approaches to identify and annotate repeated sequences in genomes, as well as a classifier tool called PASTEC that is used to analyse the REPET output in order to facilitate manual curation. Hence, our tools are well suited to identifying and annotating TEs in a growing number of genomes belonging to as yet little-studied eukaryotic groups. We will describe the basic principles of our TE annotation tools as well as results obtained by annotating the genomes of the diatom Fragilariopsis cylindrus, the brown alga Ectocarpus siliculosus, and the haptophyte Emiliania huxleyi.
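
As an illustration of the de novo principle described above (finding sequences that are repeated within a genome, without any reference TE databank), the following is a minimal sketch in Python. It is not how REPET works internally; it is a deliberately simplified stand-in based on exact k-mer counting. The function names `repeated_kmers` and `repeat_coverage`, the parameters `k` and `min_count`, and the toy sequence are all hypothetical and chosen only for this example.

```python
from collections import Counter


def repeated_kmers(genome: str, k: int = 8, min_count: int = 2) -> dict[str, int]:
    """Return every k-mer occurring at least `min_count` times in `genome`.

    Toy stand-in for de novo repeat detection: only exact matches on the
    forward strand are counted, so diverged or reverse-complemented copies
    are missed.
    """
    genome = genome.upper()
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


def repeat_coverage(genome: str, k: int = 8, min_count: int = 2) -> float:
    """Fraction of positions in `genome` covered by at least one repeated k-mer."""
    genome = genome.upper()
    if not genome:
        return 0.0
    repeated = repeated_kmers(genome, k, min_count)
    covered = [False] * len(genome)
    for i in range(len(genome) - k + 1):
        if genome[i:i + k] in repeated:
            # Mark every base spanned by this repeated k-mer.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(genome)


if __name__ == "__main__":
    # Hypothetical toy "genome": two copies of one element around a unique spacer.
    element = "ACGTTGCAAGCT"
    toy_genome = element + "GATTACAGAT" + element
    print(repeated_kmers(toy_genome, k=8))
    print(f"repeat coverage: {repeat_coverage(toy_genome, k=8):.2f}")
```

Exact k-mer counting is used here only because it is short and easy to follow. Real TE copies diverge over time, so practical de novo tools generally rely on alignment-based comparison and clustering rather than exact matches.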

Update: 07 Aug 2014
Creation date: 10 Oct 2012