README
The REPET package is distributed under the CeCILL license. Please read the LICENSE file distributed with the package.
It has been deposited with the Agence de Protection des Programmes (APP) under Inter Deposit Digital Number FR 001 480007.
The current public release, REPET rel-v3.0 (November 2019), is freely downloadable here.
To install REPET and its dependencies, read the doc/INSTALL file in the REPET package or this page.
Parallel computation
Documentation
Authors and contributors
Contact
References
Parallel computation
Large datasets are now common. Wherever possible, we therefore parallelized our pipelines to save computing time and reduce memory requirements.
The REPET package works with a job scheduler such as Slurm (slurm.schedmd.com), SGE (Sun Grid Engine) or TORQUE (formerly OpenPBS), three free batch-queuing systems.
To this end, we developed a dedicated Python module that launches the jobs in parallel, tracks errors and re-launches each failed job up to two times. Errors can be caused by power outages, lack of disk space, and so on.
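The relaunch policy described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not REPET's actual module: jobs are modelled as local callables returning True on success rather than as scheduler submissions, and the function names (`run_with_relaunch`, `run_group`, `make_flaky`) and the "finished" status are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

MAX_RELAUNCHES = 2  # a failed job is re-launched at most twice (3 attempts in total)


def run_with_relaunch(job: Callable[[], bool], max_relaunches: int = MAX_RELAUNCHES) -> str:
    """Run one job, re-launching it after each failure up to max_relaunches times.

    Returns the hypothetical status "finished" if any attempt succeeds, else "error".
    """
    for _ in range(1 + max_relaunches):
        try:
            if job():
                return "finished"
        except Exception:
            pass  # an exception (e.g. disk full) counts as a failed attempt
    return "error"


def run_group(jobs: Dict[str, Callable[[], bool]], workers: int = 4) -> Dict[str, str]:
    """Launch all jobs of a group in parallel and return the final status of each."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_with_relaunch, job) for name, job in jobs.items()}
        return {name: fut.result() for name, fut in futures.items()}


def make_flaky(failures: int) -> Callable[[], bool]:
    """Hypothetical test job that fails `failures` times before succeeding."""
    calls = {"n": 0}

    def job() -> bool:
        calls["n"] += 1
        return calls["n"] > failures

    return job


statuses = run_group({"ok": make_flaky(0), "flaky": make_flaky(2), "broken": make_flaky(10)})
print(statuses)  # {'ok': 'finished', 'flaky': 'finished', 'broken': 'error'}
```

A job that fails twice still finishes on its third attempt, while a job that keeps failing ends in "error" after three attempts, which matches "re-launching each failed job up to two times".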
All job details are stored in a MySQL table named "jobs". Consequently, if you use REPET on a computer cluster, the Python package "MySQLdb" must be reachable from the master AND slave nodes.
Besides "squeue" (from Slurm) or "qstat" (from SGE and TORQUE), you can query the "jobs" table directly.
Here are the kinds of SQL commands you may need:
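mysql> DESCRIBE jobs;  -- list the columns of the table
mysql> SELECT DISTINCT groupid FROM jobs;  -- list the job groups
mysql> SELECT status, count(*) FROM jobs WHERE groupid="exRepet_Blaster_Piler_Map" GROUP BY status;  -- count the group's jobs per status
mysql> UPDATE jobs SET status="error" WHERE groupid="exRepet_Blaster_Piler_Map" AND status="waiting";  -- mark the group's waiting jobs as in error
mysql> DELETE FROM jobs WHERE groupid="exRepet_Blaster_Piler_Map";  -- delete all jobs of the group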
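The per-status count above can also be scripted, for example to monitor a running pipeline. The following is a minimal sketch, assuming any Python DB-API connection: it is demonstrated on an in-memory SQLite database with a hypothetical, reduced "jobs" schema (only `groupid` and `status` come from the commands above; `jobname` and the "finished" status are illustrative). With MySQLdb, the query placeholder is `%s` instead of sqlite3's `?`.

```python
import sqlite3
from typing import Dict


def status_counts(conn, groupid: str) -> Dict[str, int]:
    """Return {status: number of jobs} for one job group.

    Equivalent to: SELECT status, count(*) FROM jobs WHERE groupid=... GROUP BY status;
    Uses the sqlite3 "?" placeholder; with MySQLdb, write "%s" instead.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT status, COUNT(*) FROM jobs WHERE groupid = ? GROUP BY status",
        (groupid,),
    )
    return {status: count for status, count in cur.fetchall()}


# Demo data in a hypothetical, reduced "jobs" table; the real REPET table
# has more columns (run DESCRIBE jobs to see them).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (jobname TEXT, groupid TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("job1", "exRepet_Blaster_Piler_Map", "finished"),
        ("job2", "exRepet_Blaster_Piler_Map", "finished"),
        ("job3", "exRepet_Blaster_Piler_Map", "waiting"),
        ("job4", "exRepet_other", "error"),
    ],
)
print(status_counts(conn, "exRepet_Blaster_Piler_Map"))
```

Passing the group name as a query parameter, rather than formatting it into the SQL string, avoids quoting problems with unusual group names.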
Documentation
All documentation is in the "doc/" directory of your REPET instance.
The files "TEdenovo_tuto.txt" and "TEannot_tuto.txt" are short tutorials on the TEdenovo and TEannot pipelines, respectively.
The file "BLASTERsuite_doc.txt" gives more details about the programs BLASTER, MATCHER and GROUPER.
The post-processing tools we recommend are described in the "README_*" files.
Descriptions of other tools can be found in the "HelpFrom*.txt" files.
All REPET pipeline outputs are listed in REPET_OutPutsPipelines.xlsx.
Authors & contributors (in alphabetical order)
We thank all members of the anagen team for their contributions to the REPET project.
The main developers, listed below, have followed eXtreme Programming guidelines for development and testing since release 1.3 (July 2009).
Francoise Alfama | Tina Alaeitabar | Gwendoline Andres | Sandie Arnoux |
Delphine Autard | Benoit Bely | Marc Bras | Baptiste Brault |
Laetitia Brigitte | Timothee Chaumier | Johann Confais | Elodie Duprat |
Gael Faroux | Anna-Sophie Fiston-Lavier | Timothee Flutre | Emeric Henrion |
Claire Hoede | Olivier Inizan | Veronique Jamilloux | Jonathan Kreplak |
Valentin Marcon | Nacer Mohellibi | Mark Moissette | Erwan Ortie |
Eric Penneçot | Hadi Quesneville | Dorothee Valdenaire | Mariene Wan |
Contact
The contact and support address is urgi-repet[[@]]inra.fr. Bug reports and feature requests are very welcome!
If you want to receive updates, send an email to urgi-repet[[@]]inra.fr with the following information:
- First name
- Last name
- Institution
- Address
- City
- Zip
- Country
- Job scheduling system (Slurm, SGE or TORQUE)
References
If you want to cite the REPET package, please use these references:
Flutre T, Duprat E, Feuillet C, Quesneville H (2011)
Considering transposable element diversification in de novo annotation approaches.
PLoS ONE 6(1): e16526. doi:10.1371/journal.pone.0016526
Quesneville H, Bergman C, Andrieu O, Autard D, Nouaud D, Ashburner M, Anxolabehere D (2005)
Combined evidence annotation of transposable elements in genome sequences.
PLoS Comput Biol 1(2): e22. doi:10.1371/journal.pcbi.0010022
Hoede C, Arnoux S, Moissette M, Chaumier T, Inizan O, Jamilloux V, Quesneville H (2014)
PASTEC: an automatic transposable element classification tool.
PLoS ONE 9(5): e91929. doi:10.1371/journal.pone.0091929
Below is a non-exhaustive list of publications related to the REPET package and the programs it integrates:
* BLASTER, GROUPER, MATCHER: Quesneville, H.; Bergman, C. M.; Andrieu, O.; Autard, D.; Nouaud, D.; Ashburner, M. & Anxolabéhère, D. (2005), 'Combined evidence annotation of transposable elements in genome sequences.', PLoS Computational Biology 1(2).
* RECON: Bao, Z. & Eddy, S. R. (2002), 'Automated de novo identification of repeat sequence families in sequenced genomes.', Genome Research 12(8), 1269--1276.
* PILER: Edgar, R. C. & Myers, E. W. (2005), 'PILER: identification and classification of genomic repeats.', Bioinformatics 21 Suppl 1.
* MAP: Huang, X. (1994), 'On global sequence alignment.', Comput Appl Biosci 10(3), 227--235.
* REPBASE: Jurka, J.; Kapitonov, V. V.; Pavlicek, A.; Klonowski, P.; Kohany, O. & Walichiewicz, J. (2005), 'Repbase Update, a database of eukaryotic repetitive elements.', Cytogenet Genome Res 110(1-4), 462--467.
* CENSOR: Kohany, O.; Gentles, A. J.; Hankus, L. & Jurka, J. (2006), 'Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.', BMC Bioinformatics 7, 474.
* MCL: Enright, A. J.; Van Dongen, S. & Ouzounis, C. A. (2002), 'An efficient algorithm for large-scale detection of protein families.', Nucleic Acids Research 30(7), 1575--1584.
* REPEATMASKER: Smit, A. F. A.; Hubley, R. & Green, P. (1996-2004), RepeatMasker Open-3.0., <http://www.repeatmasker.org>.
* TRF: Benson, G. (1999), 'Tandem repeats finder: a program to analyze DNA sequences.', Nucleic Acids Res 27(2), 573--580.
* MREPS: Kolpakov, R.; Bana, G. & Kucherov, G. (2003), 'mreps: efficient and flexible detection of tandem repeats in DNA', Nucl. Acids Res. 31(13), 3672--3678.
* MAFFT: Katoh, K.; Misawa, K.; Kuma, K. & Miyata, T. (2002), 'MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.', Nucleic Acids Res 30(14), 3059--3066.
* RepeatScout: Price, A. L.; Jones, N. C. & Pevzner, P. A. (2005), 'De novo identification of repeat families in large genomes.', Proceedings of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB-05), Detroit, Michigan.
* Wicker, T. et al. (2007), 'A unified classification system for eukaryotic transposable elements.', Nature Reviews Genetics 8, 973--982.