INSTALL
The REPET package is distributed under the CeCILL license. Please read the distributed LICENSE file.
It has been deposited with the Agence de Protection des Programmes (APP) under the Inter Deposit Digital Number FR 001 480007.
The latest public release of the REPET package, rel-v3.0 (November 2019), is freely downloadable here.
For more information about the REPET package, read the doc/README file from the package or this page.
NEW: For a quick install, REPET and its dependencies are containerized in a downloadable Docker image.
Brief installation instructions
Dependencies
Install
References
The full usage of the pipelines from the REPET package requires the user to install external programs.
Each program is described below with:
- the name under which it is known,
- the version with which it has been tested,
- the name under which it should be found in the user PATH,
- the URL to download it.
For each of these programs, read its installation procedure carefully and follow it: bugs may come not from REPET but from a faulty installation of an external program.
Proper usage of REPET requires a 64-bit Unix-like system running on a cluster with the following widely used components:
- Programming language interpreter: Python > 2.6 and < 3.0
- Python modules: MySQLdb (on a computer cluster, this module must be reachable from the master and slave nodes), logging, yaml
- Database management system: MySQL >= 5.0. The table engine must be "MyISAM". Recent MySQL versions default to InnoDB, so set the "default-storage-engine" option to "MyISAM" in "/etc/mysql/my.cnf".
- Batch-queuing system: Slurm >= 17.11.2 on Ubuntu 18.04 or >= 17.02.9 on CentOS 7 (https://slurm.schedmd.com/download.html); SGE >= 6.1u5 on CentOS 5.5 or >= 2011.11 on CentOS 6 (http://gridscheduler.sourceforge.net/); TORQUE >= 3.0.2; PBS
- Pairwise alignment: NCBI-BLAST+ >= 2.2.26, and/or NCBI-BLAST >= 2.2.26, and/or WU-BLAST 2.0
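The MyISAM requirement above can be checked before running any pipeline. The following is a minimal sketch, not part of REPET: the function name `default_storage_engine` is hypothetical, the parser only handles plain `key = value` lines in the `[mysqld]` section, and it assumes that MySQL accepts both the dashed and the underscored spelling of the option.

```python
def default_storage_engine(cnf_text):
    """Return the default-storage-engine value set in the [mysqld] section, or None.

    Hypothetical helper: a simplified reader for my.cnf-style text, not a full parser.
    """
    section = None
    engine = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and !include directives.
        if not line or line[0] in "#;!":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section != "mysqld" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Treat default_storage_engine and default-storage-engine alike (assumption).
        if key.strip().lower().replace("_", "-") == "default-storage-engine":
            value = value.split("#", 1)[0]  # drop a trailing comment
            engine = value.strip().strip("\"'")
    return engine


SAMPLE = """
[client]
port = 3306

[mysqld]
# REPET expects MyISAM tables
default-storage-engine = MyISAM
"""

if __name__ == "__main__":
    print(default_storage_engine(SAMPLE))
```

To check a real installation, read the content of "/etc/mysql/my.cnf" and pass it to the function; a result other than "MyISAM" (including None) means the option still has to be set.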
Optional but highly recommended:
- HSP clustering: RECON, version 1.08, recon.pl; and/or PILER, version 1.0, piler
- Protein domain search: hmmer3 package (hmmpress and hmmscan), http://hmmer.org/
- Consensus clustering: blastclust, version > 2.2.20, from the NCBI-BLAST suite; MCL, version 1.008, 09-308
- Repeat masking: CENSOR, version 4.1, censor; and/or RepeatMasker, version 4.0.6, RepeatMasker
- SSR detection: TRF, version 4.04, trf; and/or MREPS, version 2.6, mreps
- Randomized sequence generation: Shuffle, version 2.2 (in HMMER, squid), shuffle; or esl-shuffle from the hmmer3 package
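Since each program must be found in the user PATH under the name given above, a quick lookup can catch a faulty installation early. This is a minimal sketch, not a REPET tool: `find_in_path` and `missing_tools` are hypothetical helpers, and the search is written by hand so that it also runs under the Python 2 interpreter REPET requires.

```python
import os


def find_in_path(name, path=None):
    """Return the first executable called `name` found in `path`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names, path=None):
    """Return the names from `names` that are not found in `path`."""
    return [name for name in names if find_in_path(name, path) is None]


# Executable names quoted in the list above. Several entries are alternatives
# (for example shuffle or esl-shuffle), so a missing name is not always an error.
TOOLS = ["recon.pl", "piler", "hmmpress", "hmmscan", "blastclust",
         "censor", "RepeatMasker", "trf", "mreps", "shuffle", "esl-shuffle"]

if __name__ == "__main__":
    for name in missing_tools(TOOLS):
        print("not found in PATH: " + name)
```

On a cluster, the same check is worth running on a compute node as well, since its PATH may differ from the one on the master node.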
Optional but highly recommended banks (see the tutorials in the "doc" directory):
- For the full usage of the pipelines, you will need Repbase Update, the well-known databank of known repeats. The REPET edition is available here.
- To search for protein domains with HMM profiles in your TE consensus sequences, you need an appropriate bank of HMM profiles. A bank formatted for REPET is available here.
Optional:
- MAFFT, version 6.240, mafft, https://mafft.cbrc.jp/alignment/software/
- RepeatScout, version 1.0.5, https://bix.ucsd.edu/repeatscout/
- Structural search for LTR retrotransposons: LTRHarvest, from the Genome Tools 1.5.10 package, gt
/!\ Warning: MATCHER (part of the BLASTER suite distributed with REPET) is also the name of an EMBOSS program, so name conflicts are possible.
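One way to spot such a conflict is to list every executable with that name across the PATH directories, in lookup order. The sketch below is only an illustration: `all_in_path` is a hypothetical helper, and the lowercase executable name `matcher` is an assumption about how both programs are installed.

```python
import os


def all_in_path(name, path=None):
    """Return every executable called `name` in `path`, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in hits):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # "matcher" is an assumed executable name for both programs.
    hits = all_in_path("matcher")
    if len(hits) > 1:
        print("several 'matcher' executables found; the first one is used:")
    for hit in hits:
        print("  " + hit)
```

If more than one path is printed, the directory holding the REPET binaries should come first in PATH for the pipelines to pick up the intended program.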
To install the REPET package, extract the files from REPET_linux-x64_X.X.tar.gz: tar -xvf REPET_linux-x64_X.X.tar.gz
Most parts of the REPET package are written in Python, an interpreted object-oriented language that does not require compilation.
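Because the Python parts run directly under the interpreter, it is the interpreter itself that has to meet the requirements: Python > 2.6 and < 3.0, with the MySQLdb, logging and yaml modules. The following sketch checks both. The helper names are hypothetical, and it assumes that "> 2.6" excludes every 2.6.x release, so that in practice only 2.7 passes.

```python
import sys

# Module names taken from the dependency list.
REQUIRED_MODULES = ["MySQLdb", "logging", "yaml"]


def version_supported(version_info):
    """True if (major, minor) lies strictly between 2.6 and 3.0."""
    return (2, 6) < tuple(version_info[:2]) < (3, 0)


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not version_supported(sys.version_info):
        print("unsupported Python version: " + sys.version.split()[0])
    for name in missing_modules(REQUIRED_MODULES):
        print("missing Python module: " + name)
```

On a cluster, run it with the interpreter used on the slave nodes too, since MySQLdb has to be reachable from there as well as from the master node.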
The TE_finder suite included in REPET is written in C++ and the binaries are provided. If you need to install it from the C++ sources, go to the github repository.
The binaries of the REPET package must be used only on 64-bit Linux computers. Please contact us at urgi-repet[[@]]inra.fr if you would like to run REPET on a different architecture.
/!\ Warning: Some C++ tools in the REPET package implement multithreading, so you need a workstation or PC with at least 4 cores (e.g. two dual-core CPUs).
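The core count of a machine can be read from Python before launching the multithreaded tools. This is a rough sketch with a hypothetical helper name; note that `multiprocessing.cpu_count()` reports logical CPUs, which may be more than the number of physical cores when hyper-threading is enabled.

```python
import multiprocessing

MIN_CORES = 4  # minimum quoted in the warning above


def enough_cores(count, minimum=MIN_CORES):
    """True if `count` CPUs meets the minimum."""
    return count >= minimum


if __name__ == "__main__":
    count = multiprocessing.cpu_count()  # logical CPUs, not physical cores
    if not enough_cores(count):
        print("only %d CPU(s) detected, at least %d recommended" % (count, MIN_CORES))
```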
References
Below is a non-exhaustive list of publications related to the REPET package and the programs it integrates:
* RECON: Bao, Z. & Eddy, S. R. (2002), 'Automated de novo identification of repeat sequence families in sequenced genomes.', Genome Research 12(8), 1269--1276.
* PILER: Edgar, R. C. & Myers, E. W. (2005), 'PILER: identification and classification of genomic repeats.', Bioinformatics 21 Suppl 1.
* MAP: Huang, X. (1994), 'On global sequence alignment.', Comput Appl Biosci 10(3), 227--235.
* CENSOR: Kohany, O.; Gentles, A. J.; Hankus, L. & Jurka, J. (2006), 'Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.', BMC Bioinformatics 7:474.
* MCL: Enright A.J., Van Dongen S., Ouzounis C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research 30(7):1575-1584 (2002).
* REPEATMASKER: Smit, A. F. A.; Hubley, R. & Green, P. (1996-2004), RepeatMasker Open-3.0., <http://www.repeatmasker.org>.
* TRF: Benson, G. (1999), 'Tandem repeats finder: a program to analyze DNA sequences.', Nucleic Acids Res 27(2), 573--580.
* MREPS: Kolpakov, R.; Bana, G. & Kucherov, G. (2003), 'mreps: efficient and flexible detection of tandem repeats in DNA', Nucl. Acids Res. 31(13), 3672--3678.
* MAFFT: Katoh, K.; Misawa, K.; Kuma, K. & Miyata, T. (2002), 'MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.', Nucleic Acids Res 30(14), 3059--3066.
* RepeatScout: Price A.L., Jones N.C. and Pevzner P.A. 2005. De novo identification of repeat families in large genomes. To appear in Proceedings of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB-05). Detroit, Michigan.