
2013

International,  COM (posters)

Plant & Animal Genome XXI conference, January 12-16, 2013, San Diego, California, USA.

12 Jan 2013   An iterative process for TEs annotation in large genomes

V. Jamilloux, S. Arnoux, T. Chaumier, M. Moissette, O. Inizan, H. Quesneville

The recent successes of new sequencing technologies are allowing us to sequence increasingly large genomes at reduced cost. Transposable elements (TEs) are the most structurally dynamic components of these large genomes and make up the largest portion of their nuclear sequence, e.g. 85% of the maize genome (Schnable et al. 2009) and 88% of the wheat genome (Choulet et al. 2010). TE annotation should therefore be considered a major task in genome projects. However, it remains a major challenge, because good TE annotation relies critically on an expertly assembled set of reference sequences, which currently cannot be produced automatically. This crucial step is now a bottleneck for many genome analyses. To address it, we have scaled up two repeat detection and annotation pipelines of the REPET package (Flutre et al. 2011), now at release v2.1: TEdenovo, which builds a TE library, and TEannot, which annotates TE copies in the genome. In addition, we propose TallymerPipe as a pre-processing tool for fast repeat detection, PostAnalyseTElib to report information about the resulting TE library, and the SegDup pipeline to detect segmental duplications.
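TallymerPipe is described here only as a fast pre-processing step. Tallymer itself is a k-mer counting tool from GenomeTools, so the sketch below illustrates the general idea of k-mer frequency repeat screening that such a step builds on; it is not TallymerPipe's actual implementation. The functions `kmer_counts`, `repeat_mask` and `repeat_fraction` and the parameters `k` and `min_count` are hypothetical, and the toy counts k-mers on one strand only, whereas real tools usually also consider the reverse complement.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer of seq (forward strand only, for simplicity)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeat_mask(seq: str, k: int, min_count: int) -> list[bool]:
    """Flag each position covered by at least one k-mer seen >= min_count times."""
    counts = kmer_counts(seq, k)
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= min_count:
            for j in range(i, i + k):
                mask[j] = True
    return mask


def repeat_fraction(seq: str, k: int, min_count: int) -> float:
    """Fraction of positions flagged as repeated (0.0 for an empty sequence)."""
    mask = repeat_mask(seq, k, min_count)
    return sum(mask) / len(seq) if seq else 0.0
```

Raising `k` or `min_count` makes the screen more stringent: fewer, longer and more frequent exact repeats are flagged, which is cheap to compute but blind to diverged copies.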

 We apply a new strategy, designed for very large genomes, to wheat, an allohexaploid with three homoeologous genomes. One of the largest plant genomes (~17 Gbp), it is highly repetitive, with 88% TEs (Choulet et al. 2010). We start with chromosome 3B, the first to be fully sequenced. The strategy is iterative:

  1. Detection of young TEs with stringent parameters, which quickly find only the least degenerate copies, to build a first TE library.
  2. TE annotation and splicing out of the corresponding sequences from the initial contigs, yielding a reduced genome sequence.
  3. Detection of the remaining TEs with sensitive parameters on the reduced genome sequence to build a second TE library.
  4. Annotation of the original contigs with the concatenation of the two TE libraries.
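The four steps above can be sketched as a toy Python program. Every function in it is a hypothetical stand-in: `detect` replaces TEdenovo with exact k-mer repeats, `annotate` replaces TEannot with exact substring matches, and the `(k, min_count)` pairs play the role of stringent and sensitive parameters. REPET's pipelines are alignment-based and tolerate divergent copies, which this sketch does not.

```python
from collections import Counter

Interval = tuple[int, int]  # half-open (start, end) coordinates


def detect(seq: str, k: int, min_count: int) -> list[str]:
    """Toy de novo detection (hypothetical stand-in for TEdenovo).

    Marks every position covered by a k-mer seen at least min_count times
    and returns the distinct maximal marked segments as library entries.
    Larger k and min_count mean more stringent detection.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= min_count:
            covered[i:i + k] = [True] * k
    segments: set[str] = set()
    start = None
    for i, flag in enumerate(covered + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.add(seq[start:i])
            start = None
    return sorted(segments)


def annotate(seq: str, library: list[str]) -> list[Interval]:
    """Toy annotation (stand-in for TEannot): exact matches, merged and sorted."""
    hits = []
    for cons in library:
        pos = seq.find(cons)
        while pos != -1:
            hits.append((pos, pos + len(cons)))
            pos = seq.find(cons, pos + 1)
    merged: list[Interval] = []
    for s, e in sorted(hits):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def splice_out(seq: str, intervals: list[Interval]) -> str:
    """Remove sorted, non-overlapping intervals to obtain the reduced sequence."""
    parts, prev = [], 0
    for s, e in intervals:
        parts.append(seq[prev:s])
        prev = e
    parts.append(seq[prev:])
    return "".join(parts)


def iterative_annotation(
    seq: str,
    stringent: tuple[int, int] = (8, 3),
    sensitive: tuple[int, int] = (4, 2),
) -> tuple[list[str], list[str], list[Interval]]:
    lib1 = detect(seq, *stringent)                  # step 1: young TEs
    reduced = splice_out(seq, annotate(seq, lib1))  # step 2: reduced sequence
    lib2 = detect(reduced, *sensitive)              # step 3: remaining TEs
    return lib1, lib2, annotate(seq, lib1 + lib2)   # step 4: final annotation
```

On a toy sequence carrying three copies of a 10 bp "young" element and two copies of a 5 bp "old" one, the stringent pass finds only the young element; the old one is recovered from the reduced sequence by the sensitive pass, and step 4 annotates all five copies on the original sequence.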

The rationale is that these large genomes are mostly made of a few TE families that invaded recently. These families are detected in the first step, and splicing out their copies reduces the size of the sequence to be searched in the third step by a large factor.
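The benefit of removing the young families first can be put as a rough calculation. Suppose, as an illustrative assumption, that the first library annotates a fraction $f$ of an input of length $L$, and that the cost $C$ of the self-comparison used for de novo detection grows roughly with the square of its input length. The sensitive search of step 3 then runs on

$$
L' = (1 - f)\,L, \qquad \frac{C(L)}{C(L')} \approx \frac{L^{2}}{\left((1 - f)\,L\right)^{2}} = \frac{1}{(1 - f)^{2}}.
$$

With a purely hypothetical $f = 0.5$, a 1 Gbp input shrinks to 500 Mbp and the comparison cost drops about fourfold. Neither the value of $f$ nor the quadratic cost model comes from the results presented here.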

We will also present preliminary results on wheat chromosome 3B (AllLargeContigsV2.1: 294,691 contigs, 986 Mbp).

Choulet, F, T Wicker, C Rustenholz, et al. 2010. Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell 22:1686-1701.
Flutre, T, E Duprat, C Feuillet, H Quesneville. 2011. Considering transposable element diversification in de novo annotation approaches. PLoS One 6:e16526.
Schnable, PS, D Ware, RS Fulton, et al. 2009. The B73 maize genome: complexity, diversity, and dynamics. Science 326:1112-1115.


Keywords: detection annotation element transposable REPET wheat genome

Creation date: 05 Jun 2013