
2012

International,  COM (posters)

3rd International Conference GIETE 2012, February 24-28 2012, Asilomar Pacific Grove, California, USA

24 Feb 2012   REPET V2: TEs detection and classification improvements

S Arnoux, V Jamilloux, T Chaumier, C Hoede, O Inizan, H Quesneville

The REPET package (Flutre et al., 2011) integrates two pipelines under continuous development, TEdenovo and TEannot, for the detection and annotation of transposable elements (TEs). TEdenovo changed significantly in the latest release.

The TEdenovo strategy is to find as many potential TEs as possible, and then to classify these putative TEs in order to filter out false positives. The pipeline starts by detecting repeated sequences through alignment of the genome against itself. These alignments are clustered independently by several tools (RECON, GROUPER, PILER). A multiple alignment is then built for each cluster, and a consensus sequence is derived from it. The consensus sequences are classified according to TE features, and redundancy is removed. Finally, false positives can optionally be removed on the basis of the classification (SSRs, host genes, rDNA and under-represented unclassified consensus sequences).
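To illustrate the step in which a consensus sequence is derived from a multiple alignment, here is a minimal sketch using a per-column majority vote. It is an assumption made for illustration only: the abstract does not say how TEdenovo computes its consensus, and the function name `majority_consensus` and the toy cluster are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Return a per-column majority consensus of equal-length aligned sequences.

    Illustrative only: this is not REPET's actual consensus procedure.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        # Most frequent symbol in this alignment column.
        symbol, _ = Counter(column).most_common(1)[0]
        # Skip columns where the gap is the most frequent symbol.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical cluster of three aligned copies of the same repeat.
cluster = [
    "ACGT-ACGT",
    "ACGTTACGT",
    "ACCT-ACGA",
]
print(majority_consensus(cluster))  # prints ACGTACGT
```

In this toy cluster the fifth column is dropped because two of the three copies carry a gap there, and isolated mismatches are outvoted by the other copies.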

Two steps have been improved in REPET v2:

  1. A structural TE detection approach is now implemented: LTRharvest (Ellinghaus et al., 2008) searches for LTR retrotransposons using the structural features of this TE category. This makes it possible to catch TEs present in only one or two copies in the genome. The potential TEs detected this way are pooled with all the other derived consensus sequences before the classification and redundancy removal step.
  2. Classification is also improved by the integration of PASTEC, a new classifier that we have developed. It tests all TE classifications and weights each result according to the evidence found. In addition to similarity to known TEs in Repbase Update and the search for repeated structures, it uses HMM profiles, which are useful both for classifying TEs and for detecting host genes. PASTEC also reports on the completeness of each TE and indicates whether it is a potential chimera.
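The idea of testing every classification and weighting each result by the evidence found can be sketched as a simple additive score. This is a toy model under stated assumptions, not PASTEC's implementation: the weights, the evidence type names, the `classify` function and the example evidence are all hypothetical.

```python
from collections import defaultdict

# Hypothetical weights per evidence type; these are NOT PASTEC's real values.
EVIDENCE_WEIGHTS = {
    "repbase_similarity": 3.0,  # similarity to a known TE in Repbase Update
    "hmm_profile": 2.0,         # match to an HMM profile
    "structure": 1.0,           # repeated structure found in the consensus
}


def classify(evidence):
    """Score every candidate classification and return the best one.

    evidence: list of (candidate_class, evidence_type) pairs.
    Returns (best_class, scores), where scores maps each candidate
    to the sum of the weights of the evidence supporting it.
    """
    scores = defaultdict(float)
    for candidate, kind in evidence:
        scores[candidate] += EVIDENCE_WEIGHTS[kind]
    if not scores:
        return "unclassified", {}
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical evidence collected for one consensus sequence.
evidence = [
    ("LTR", "structure"),
    ("LTR", "hmm_profile"),
    ("TIR", "repbase_similarity"),
    ("LTR", "repbase_similarity"),
]
best, scores = classify(evidence)
print(best, scores)  # prints LTR {'LTR': 6.0, 'TIR': 3.0}
```

Keeping the score of every candidate, rather than only the winner, leaves room for downstream checks such as flagging a consensus whose evidence is split between several classifications.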

Keywords: detection, annotation, repeats, pipeline, classification

Creation date: 14 Mar 2012