ChangeLog of REPET

REPET release 3.0
(release date 28/11/2019)

New features:
- pipelines are adapted for the Slurm and PBS schedulers.
- /!\ Warning: some parameters in TEdenovo.cfg have changed /!\
- PASTEClassifier: new format (12 fields) of the classif file, new stats file, and post-treatments that classify LTRs at the super-family level if they are Gypsy or Copia.
- TEdenovo Step 4: At the end of this step, the consensus name is shortened to become the definitive identifier, used in the following steps.
  The fasta consensus file name is *_consensus_shortH.fa.
- TEdenovo Step 6: uses the new classif format; post-treatments are adapted accordingly.
- TEannot Step 1: adapted to always obtain the same shuffled genome.
- TEannot Step 7: after the long join procedure, homogeneous calculation of the identity percentage and calculation of annotation stats (coverage %).
- TEannot Step 8: in the 9th GFF3 field, new alignIdentity (%) and alignLength attributes are added on each match line.

New tools:
- PreProcess.py: to analyze and format the input genome
- new RepBase parser for REPET, with an option to exclude a list of sequences.
- ConvertClassif9ColTo12Col.py: to convert a classif file with 9 columns, produced by REPET versions up to 2.5, to the 12-column classif format used from REPET version 3.0.
- statPASTECNew.py: to calculate new stats on a classif file (12 columns) from REPET version 3.0
- ClassifExtractFromFastaFile.py: extracts a classif subset from a fasta file
- GetSequenceFromAnnotation can use gff3 files
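
Reading the new TEannot Step 8 attributes: the sketch below shows one way to read alignIdentity and alignLength from the 9th GFF3 field. It assumes the standard GFF3 "key=value;key=value" attribute layout. The example line (sequence name, source, IDs, coordinates, values) and the function name parse_gff3_attributes are invented for illustration and are not taken from REPET output.

```python
def parse_gff3_attributes(field9):
    """Split the 9th GFF3 column into a dict of key -> value strings."""
    attributes = {}
    for item in field9.strip().split(";"):
        if not item:
            continue  # tolerate empty items such as a trailing ';'
        key, _, value = item.partition("=")
        attributes[key.strip()] = value.strip()
    return attributes


# Hypothetical match line: every value here is invented for the example.
line = (
    "chr1\tREPET_TEs\tmatch_part\t100\t350\t.\t+\t.\t"
    "ID=mp1;Parent=ms1;alignIdentity=92.5;alignLength=251"
)
columns = line.split("\t")
attrs = parse_gff3_attributes(columns[8])
identity = float(attrs["alignIdentity"])  # identity percentage
length = int(attrs["alignLength"])        # alignment length
```

GFF3 files produced before release 3.0 will not carry these two keys, so a real script should probably use attrs.get(...) and handle the missing case.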

Tool adaptations:
- CreateGFF3sForClassifFeatures.py: manages several classif formats (9 or 12 columns), reverses features for reversed consensus; a resources parameter is added in its configuration file.

Bug fixes:
- tests added to detect the presence/absence of dependencies
- adaptation to the new output format of RM v4.0.*
- more tests about /scratch usage as tmp directory
- ....

**** At release time, remember to update the version number and date ****

REPET release 2.5
(release date 31/03/2016)

New features:
- TEdenovo Step 3: Grouper, one of the three clustering tools, is modified to use multithreading, so you need a workstation or PC with at least 2 dual-core CPUs.
- TEdenovo Step 6: several stats files (in text format) are produced from the consensus library.
- TEdenovo Step 7: stats files (in text format) are produced from the filtered consensus library.
- TEannot Step 1: the seed option of the shuffle tool is set to 1, so the same random genome is always generated
- MaskSeqFromCoord.py, for masking a genome from annotation using REPET output
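
Why a fixed seed gives the same random genome: the sketch below illustrates the idea behind the TEannot Step 1 change with Python's random module. It is not the shuffle tool used by REPET; the function name shuffle_sequence and the toy sequence are assumptions made for the example.

```python
import random


def shuffle_sequence(sequence, seed=1):
    """Return a shuffled copy of the sequence using a seeded generator."""
    rng = random.Random(seed)  # private generator, same seed -> same order
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


genome = "ACGTACGTTTGACCA"  # toy sequence, invented for the example
first = shuffle_sequence(genome)
second = shuffle_sequence(genome)
# Both calls use seed=1, so 'first' and 'second' are identical strings
# with the same base composition as 'genome'.
```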

REPET release 2.4
(release date 22/12/2015)

Bug fixes:
- inserted DmelChr4.fa data set in db directory.

New features:
- All READMEs and tutorials of pipelines and tools are in the doc directory, with a single CHANGELOG file containing the changes of all pipelines
- Tools - CreateGFF3sForClassifFeatures.py: creates gff3 files to load in a genome browser for visualizing features on consensus
- TEdenovo Step 3: Added optional parallelization of Grouper clustering
- TEannot Step 3: Score threshold is now the 95th percentile of random match scores instead of the 99th, which was too stringent
- TEannot Step 7: Run time optimization of the long join procedure
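
Percentile-based score threshold: the TEannot Step 3 filter can be pictured with the sketch below. It assumes a nearest-rank percentile and a strict "greater than" comparison; the exact percentile definition and comparison used by REPET are not stated here, and the function names and scores are invented.

```python
import math


def percentile_threshold(random_scores, percentile=95):
    """Nearest-rank percentile of scores obtained on the randomized genome."""
    if not random_scores:
        raise ValueError("no random scores given")
    ordered = sorted(random_scores)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def filter_matches(matches, threshold):
    """Keep (name, score) matches whose score is strictly above the threshold."""
    return [match for match in matches if match[1] > threshold]


random_scores = list(range(1, 101))  # invented scores 1..100
threshold = percentile_threshold(random_scores, 95)
kept = filter_matches([("m1", 90), ("m2", 96), ("m3", 120)], threshold)
```

With the same random scores, a 99th percentile threshold is higher than a 95th percentile one, so fewer matches pass the filter.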

REPET release 2.3
(release date 17/10/2014)

Bug fixes:
- MySQL connection port provided by the user was not used when exporting some tables to files
- MySQL connection port provided by the user was not used when using a configuration file in certain conditions
- A few tables were not using MyISAM as an engine
- TEdenovo Step 6: corrected bad RepeatScout headers formatting with "reversed" tag
- PASTEClassifier: now sets environment variables from configuration file

New features:
- improved database performance
- overall performance improvement
- overall memory usage decrease
- Tools - SplicerTesFromAnnotation.py: tool to splice annotations from genome. These annotations are Full Length Copies or Full Length Fragments according to consensus (or TEs) library
- Tools - TELibFormatter.py: tool to format headers of TEs banks (in fasta format) to be usable by REPET pipelines
- TEdenovo Step 1: Now removes N-stretches from chunks prior to alignment
- TEdenovo Step 5: Added the possibility to use several nucleotide, amino acid and protein HMM banks at the same time
- TEannot Step 1: Now removes N-stretches from chunks prior to alignment
- TEannot Step 1: Now creates more homogeneous batches 
- TEannot Step 2: Default Blaster sensitivity is set to 2 instead of 4
- TEannot Step 3: Score threshold is now the 99th percentile of random matches scores instead of the 95th
- TEannot Step 8: New way to eliminate overlapping between annotations in GFF file
- PASTEClassifier: Replace -r (reverse complement), -w (add Wicker code) command line options with rev_complement and add_wicker_code attributes in configuration file
- PASTEClassifier: Add add_noCat_bestHitClassif attribute in configuration file to classify noCat consensus using well-classified consensus


REPET release 2.2
(release date 13/09/13)

- improve job and database management: the table engine is set to MyISAM
- MCL: Is now integrated into PostAnalyseTElib as an alternative clustering method
- LaunchMCL: Added options to set Query coverage & handle matcher inputs 
- Added LaunchRepeatScout tool wrapping the different required steps of RepeatScout
 
Bug fixes:
- TEannot - FilterAlign : added subject length filtering
- PASTEClassifier - limit_job_nb option correction

New features:
- TEannot - Step 3: Set all Blaster E-Values to 0 to be consistent with RepeatMasker and Censor
- TEannot - Step 4: Added TRFmaxPeriod option for TRF in configuration file
- SegDup_pipe - Initial release
- Tallymer_pipe - Initial release


REPET release 2.1
(release date 01/02/13)

Bug fixes:
- TEdenovo step 3 struct : orient sequences in each cluster
- TEdenovo step 3 : correction of the n longest sequences per group filter for Grouper
- TEdenovo step 4 : handle empty files after filter clustering
- TEdenovo step 6 : handle PASTEC coverage parameters from configuration file
- TEannot step 7 : handle identity > 100 from Censor
- correction of the cleaning system when job files are copied to a node

New features:
- TEdenovo step 2 struct: add structural parameters for LTRharvest in configuration file
- TEdenovo step 6 : improve PASTEC classification and add thresholds parameters in configuration file
- TEdenovo step 6 : improve consensus redundancy removal
- TEdenovo step 7 : add filtering parameters in configuration file
- TEannot step 2 : allow launching Censor with NCBI-BLAST
- TEannot step 2 : allow launching RepeatMasker with NCBI-BLAST
- TEannot step 3 : improve matches statistical filter based on random hits
- TEannot step 4 : allow launching RepeatMaskerSSR with crossmatch
- TEannot step 8 : remove redundant features in the GFF3 file and add GFF3 formatting option in configuration file

- add TallymerPipe tool as pre-processing tool for fast repetition discovery
- add PostAnalyseTElib tool as post-processing tool (start to merge ClusterConsensus and GiveInfoTEannot tools)
- add SegDup pipeline as a tool to discover segmental duplications in genomes

- improve database management
- improve job management
- overall performances improvement, especially in the TEdenovo step 6


REPET release 2.0
(release date 23/02/12)

Bug fixes:
- TEdenovo step 1 : check headers
- TEdenovo step 3 : correction of the n longest sequences per group filter
- TEdenovo step 4 : in headers, convert coordinates from chunks to initial sequences
- TEdenovo step 4 : filter cluster with only one sequence if N% > 40%
- TEdenovo step 6 : in headers, always 3 characters for classification
- TEdenovo step 6 : classification statistics
- TEdenovo step 8 : headers must begin by classification
- TEannot step 2 : no more "low complexity" filter for WU-BLAST (when called by Censor)
- TEannot step 3 : filter match if N% > 90%
- PASTEC : avoid "MySQL server has gone away" error when there are too many consensus
- change jobs table to record computing resources information (to re-submit with the same resources)
- allow to specify more than one computing resource in configuration file
- information about jobs submission in standard output
- catch help message from WU-BLAST
- variance calculation

New features:
- TEdenovo step 2 struct : allow changing the similarity parameter for LTRHarvest
- TEdenovo : allow using only the structural search until the end of the pipeline
- PASTEC : allow changing the job number limit in configuration file
- TEannot : allow another job manager : Torque (launcher migration)
- TEannot : clean and drop_tables options in configuration file
- configuration : add "repet_job_manager" option and set environment variables from configuration file as much as possible


REPET release 1.4.2
(release date 02/12/11)

Bug fixes:
- force to use only one CPU for each blast
- no more "low complexity" filter for WU-BLAST

New features:
- TEdenovo step 6 : change classification tool : PASTEC integration
- TEdenovo step 6 : add Wicker's code in headers
- TEdenovo step 6 : reverse complement consensus sequences if necessary
- TEdenovo : allow another job manager : Torque (launcher migration)
- TEannot step 8 : add identity in GFF3 files
- update Blaster parameters to use Blast+ (change NCBI-BLAST "gapopen" parameter to match NCBI-BLAST+)
- configuration : add "copy" option


REPET release 1.4.1
(release date 23/09/11)

Bug fixes:
- TEdenovo step 6 : correction when headers are shortened

New features:
- TEdenovo step 1 : add this step to prepare data (all following steps are shifted)
- TEdenovo step 2 : bank copy in "tmpDir" (node directory) for Blaster 
- TEdenovo step 3 : change Grouper parameters (gap penalty=1 and gap distance=10)
- TEdenovo step 2+3 : add structural search : LTRHarvest and Blastclust integration


REPET release 1.4
(release date 15/04/11)

Bug fixes:
- update NCBI-BLAST parameters (since blastall 2.2.21, not all combinations of "penalty" and "gapextend" are allowed)
- in configuration, "cluster" option isn't available anymore : pipelines can't be launched without a job scheduler (SGE)

New features:
- in configuration, change "queue" option to "resources" option (not mandatory) to set if specific resources are needed (memory or time)


REPET release 1.3.13
(not released)

Bug fixes:
- in TEdenovo and TEannot, job submission management
- in TEdenovo and TEannot, when too many files are produced : change files management
- in TEdenovo and TEannot, when too many files are produced : change jobs recording in database
- in TEdenovo step 5, correction of headers formatting, to allow project names containing "_"
- update WU-BLAST parameters to match NCBI-BLAST (when sensitivity=3), and use only one CPU

New features:
- improvement of GiveInfoTEannot.py (script which computes statistics from the annotation results)
- limit to 10000 the number of jobs in "waiting" status.
- in TEdenovo, add step 6 : filter of SSR and noCat consensus built from fewer than 10 fragments
- in TEdenovo, add step 7 : clustering of final consensus


REPET release 1.3.12.1
(release date 20/10/2010)

Bug fix:
- in TEdenovo and TEannot, handle MySQL timeout


REPET release 1.3.12
(release date 08/10/2010)

Bug fixes:
- in TEannot step 3, update all match scores when more than one alignment method is used to avoid fragmented matches in MATCHER
- in TEannot step 7, manage DB connection

New features:
- in TEdenovo step 1, add support for megablast by BLASTER (for large genomes)
- in TEdenovo step 2, add possibility to cluster in GROUPER by assessing overlaps in terms of number of nucleotides
- in TEdenovo step 2, better handle identical members in GROUPER (likely to be palindromic repeats)
- in TEdenovo step 4, add the profile HMM search
- in TEdenovo step 4 and 5, add support for 'TEclass' (Abrusan et al., 2009)
- in both pipelines, improve management of job memory shortage, job timeout and node failure
- in both pipelines, each step is run on cluster node(s)


REPET release 1.3.11
(release date 28/07/2010)

Bug fixes:
- in TEdenovo and TEannot, give summary of execution times with only one job
- in TEannot, filter matches even if minimum score is given as a float
- in TEannot, properly compute weighted identity for long joins when some matches are overlapping

New features:
- in TEannot, allow choosing what to clean via a command-line option
- in TEannot, allow launching RepeatMasker with default sensitivity
- add several tools in the public releases


REPET release 1.3.10
(release date 06/07/2010)

Bug fixes:
- avoid using "/tmp" when handling large data sets from MySQL tables

New features:
- in TEannot step 8, allow deleting MySQL tables once GFF3 or gameXML files have been made
- in TEdenovo and TEannot, provide summary of execution times for jobs launched in parallel
- in TEannot, add options to tune RepeatMasker options


REPET release 1.3.9
(release date 05/05/2010)

Bug fixes:
- in TEdenovo step 5, fix for the case when "cluster: yes" is set in the configuration file
- in TEdenovo step 4, handle WU-BLAST exceptions when using tblastx and blastx

New features:
- in TEdenovo step 4 and 5, use super-family information when available in Repbase formatted for REPET
- in TEdenovo step 5, improve the host's gene filter


REPET release 1.3.8
(release date 06/04/2010)

Bug fixes:
- in TEannot step 1, take into account the kind of BLAST when preparing the banks

New features:
- in TEannot, add option in configuration file to specify BLAST (NCBI or WU)

Development:
- new package for data persistence called pyRepetUnit.commons.sql 


REPET release 1.3.7
(release date 26/02/2010)

Bug fixes:
- in TEdenovo step 5, properly remove redundancy within consensus of the same category

New features:
- in TEannot step 7, remove copies (chains of matches) shorter than a given length (20 bp by default)

Development:
- when launching jobs in parallel, free temporary directory in case of program error
- new package for sequences manipulation called pyRepetUnit.commons.seq.

Warning:
- problem of portability on Solaris for Grouper (TEdenovo)


REPET release 1.3.6
(release date 29/01/2010)

New features:
- in TEdenovo step 1, remove redundant matches due to parallelized all-by-all
- in TEdenovo step 2, for Grouper, set stringent behavior when merging
- in TEdenovo step 5, use blastn results when classifying consensus
- in TEannot, add comparison with an amino-acid data-bank via blastx
- in TEdenovo & TEannot, add checks (binaries, input files, parameters and SGE)

Development:
- new package for coordinates manipulation called pyRepetUnit.commons.coord. 

Warning:
- problem of portability on Solaris for Grouper (TEdenovo)


REPET release 1.3.5
(release date 17/12/2009)

New features:
- none

Development:
- generation of super-agent output
- multifasta parser only with SNP (length = 1)
- in package coord, migration of SetUtils

Warning:
- problem of portability on Solaris for Grouper (TEdenovo)


REPET release 1.3.4
(release date 02/12/2009)

New features:
- in TEannot step 7, when trying to join two fragments, compute the weighted identity
- when launching jobs in parallel, use a unique directory to avoid conflicts between users
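
Weighted identity of joined fragments: one common reading of "weighted identity" is a length-weighted mean, sketched below. This is an assumption about the formula, not a copy of the TEannot step 7 code; the function name and the fragment values are invented, and overlapping fragments are not handled.

```python
def weighted_identity(fragments):
    """Length-weighted mean identity of (length, identity_percent) fragments."""
    total_length = sum(length for length, _ in fragments)
    if total_length == 0:
        raise ValueError("fragments have zero total length")
    weighted_sum = sum(length * identity for length, identity in fragments)
    return weighted_sum / total_length


# Two invented fragments: 300 bp at 90% identity and 100 bp at 70% identity.
fragments = [(300, 90.0), (100, 70.0)]
joined_identity = weighted_identity(fragments)
# (300 * 90 + 100 * 70) / 400 = 85.0
```

The longer fragment dominates the result, which is why the value (85.0) is closer to 90 than a plain average (80.0) would be.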

Development:
- create utility classes for Path and Set objects (add a specific sort)

Warning:
- problem of portability on Solaris for Grouper (TEdenovo)


REPET release 1.3.3.1
(release date 10/11/2009)

Patch:
- fix erratic troubles in TEdenovo step 4 when launching tblastx

New features:
- when launching jobs in parallel, check there is at least 1Gb in temporary directory
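
Checking free space in a temporary directory: a check of this kind can be sketched with the standard library as below. The function name has_enough_space is hypothetical, "1Gb" is interpreted here as 1024**3 bytes (an assumption), and the directory used is the system default temporary directory rather than a REPET setting.

```python
import shutil
import tempfile

ONE_GB = 1024 ** 3  # assumption: "1Gb" read as 1 GiB


def has_enough_space(directory, minimum_bytes=ONE_GB):
    """Return True if the file system holding 'directory' has enough free space."""
    return shutil.disk_usage(directory).free >= minimum_bytes


tmp_dir = tempfile.gettempdir()
enough = has_enough_space(tmp_dir)
if not enough:
    print("less than 1 GB free in", tmp_dir)
```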


REPET release 1.3.3
(release date 06/11/2009)

New features:
- allow passing options to qsub via configuration file
- in TEdenovo step 2, improve scalability on large, complex genomes

Development:
- "coord" API: 80% tested
- "launcher" API: first template and implementations (Map, Mafft, RepeatMasker)

Warning:
- problem of portability on Solaris for Grouper (TEdenovo)
- erratic troubles in TEdenovo step 4 when launching tblastx


REPET release 1.3.2
(release date 05/10/2009)

New features:
- improve redundancy removal in Grouper
- clean temporary directory in pipelines

Development:
- choose Doxygen for documentation
- document tools and APIs (see Makefile)

Warning:
- problem of portability on Solaris for Grouper (TEdenovo)


REPET release 1.3.1
(release date 08/09/2009)

New features:
- specify tmpDir for jobs execution in TEdenovo.cfg & TEannot.cfg

Development:
- continuous integration (unit and functional tests)


REPET release 1.3
(release date 21/07/2009)

New features:
- in TEdenovo, in step 2, options to launch the clustering programs on a cluster node from within the pipeline
- in TEdenovo, in step 2, when launching Grouper, build only the groups and not the clusters, which speeds it up considerably
- in TEannot, in step 3, when launching Matcher, do the cleaning procedure after joining the HSPs
- in TEannot, in step 7, use the weighted identity to approximate the age of a chain of TE fragments

Development:
- new methodology: test-driven development (TDD)
- add some code with Python unit tests in both pipelines


REPET release 1.2.3
(release date 16/03/2009)

New features:
- in TEdenovo, options to filter HSPs after the clustering
- in TEdenovo, option to choose the minimum number of bases to edit a consensus
- in TEdenovo, speed up the step 4 to handle a larger number of consensus
- in TEdenovo, option to launch the step 5 in parallel
- in TEannot, option to compare the TE library against the randomized genomic sequences
- in TEannot, option to force the usage of default values when filtering false positive

Bug fixes:
- in TEdenovo, correctly parse the joins made by GROUPER
- in TEannot, update the connected fragments after the jobs were launched in parallel
 

REPET release 1.2.2
(release date 31/10/2008)

New features:
- in TEdenovo, add option for GROUPER in the configuration file
- in TEannot, use C++ parsers in some time-limiting steps
- in TEannot, add option for BLASTER's sensitivity in the configuration file
- in TEannot, allow not to launch tblastx alignments
- in both, verbose options have been added in all python scripts (not yet for the C++ programs)

Bug fixes:
- in TEannot, rename all sequence headers internally to prevent errors (also for GFF3 export)


REPET release 1.2.1
(release date 09/10/2008)

Bug fixes:
- in TEdenovo, as a precaution, remove the space in sequence headers between "class" and "I" or "II" in the classification step
- in TEannot, when necessary, take the default threshold from the configuration file
- in TEannot, add option "join" in the step 3 when using MATCHER


REPET release 1.2.0
(release date 30/09/2008)

New feature:
- in TEannot, add option -c in step 3 to allow different combinations of alignment programs

Bug fixes:
- in TEannot, rename all headers of the reference TE library to prevent crashes of RepeatMasker when they are longer than 50 characters
- in pyRepet, don't use Unix's sed anymore as some sequence headers can be interpreted as regular expressions


REPET release 1.1.0
(release date 25/09/2008)

New features:
- in TEdenovo, increase BLASTER stringency to speed up the redundancy removal procedure
- in TEdenovo, change the default redundancy removal procedure in the configuration file
- in TEannot, manage IUPAC characters replacement before launching MREPS
- in TEannot, decrease BLASTER sensitivity to speed up the computations

Bug fixes:
- in TEdenovo, fix problems when parsing outputs from RECON and PILER
- in TEdenovo, keep N stretches when building chunks


REPET release 1.0.0
(release date 31/07/2008)