RepetDB is an InterMine database that provides consensus repeat sequences detected and classified by TEdenovo and used by TEannot to annotate their copies in genomes. TEdenovo and TEannot are part of the REPET software package.
The easiest way to search for consensus repeats in RepetDB is to use the consensus search form on the homepage.
This form can search consensus repeats by organism (taxon group selection), by classification (Wicker classification), by classification quality (confused, manual validation, unclassified) or by the similarity features found on them (protein profile features or BLAST hits on Repbase transposons).
The taxon group selection contains a tree of taxonomy groups fetched from the NCBI taxonomy database. You can select a specific species or a larger taxonomy group like ‘Eukaryota’.
The Wicker classification selection is divided into three parts: classes, orders and superfamilies. Selecting any class, order or superfamily restricts the choices available at the other ranks. For example, if you select the ‘Helitron’ order, ‘Class II’ is automatically selected.
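The way a selection at one rank implies the ranks above it can be sketched with a small parent lookup. This is only an illustration, not RepetDB's implementation: the table holds a hand-picked subset of the Wicker hierarchy, and the names `PARENT` and `implied_selection` are made up for this example.

```python
# Illustrative subset of the Wicker hierarchy: each entry maps to its parent.
# Keys are (rank, name) pairs because some names occur at two ranks
# (for example, 'Helitron' is both an order and a superfamily).
PARENT = {
    ("order", "LTR"): ("class", "Class I"),
    ("order", "LINE"): ("class", "Class I"),
    ("order", "TIR"): ("class", "Class II"),
    ("order", "Helitron"): ("class", "Class II"),
    ("superfamily", "Copia"): ("order", "LTR"),
    ("superfamily", "Gypsy"): ("order", "LTR"),
    ("superfamily", "hAT"): ("order", "TIR"),
    ("superfamily", "Helitron"): ("order", "Helitron"),
}


def implied_selection(rank, name):
    """Return the higher ranks implied by selecting (rank, name), nearest first."""
    implied = []
    node = (rank, name)
    while node in PARENT:
        node = PARENT[node]
        implied.append(node)
    return implied


# Selecting the 'Helitron' order implies 'Class II'.
print(implied_selection("order", "Helitron"))
# Selecting the 'Gypsy' superfamily implies the 'LTR' order, then 'Class I'.
print(implied_selection("superfamily", "Gypsy"))
```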
Activating the “only confused” checkbox searches for consensus repeats that have an ambiguous classification.
Activating the “only not confused” checkbox searches for consensus repeats that do not have an ambiguous classification.
“Unclassified or not TE” covers classifications such as potential host gene, virus or unclassified (for consensus repeats with an unknown Wicker classification).
“Only TE” covers consensus repeats with a known Wicker classification.
The similarity features search is a text area in which you can list accessions or entry names from GyDB and PFAM for protein profiles, and accessions or entry names from Repbase for BLAST hit features. The search matches protein profiles or BLAST hits detected on the consensus repeats. Only exact identifiers can be queried (PF13650.5 works, PF13650 does not).
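The exact-match rule can be illustrated with a short sketch. It is not RepetDB's search code: the separators accepted by the text area are an assumption (whitespace, commas or semicolons here), the function names are made up, and `PF00078.26` is a placeholder identifier added only to have a second entry.

```python
import re


def parse_identifiers(text):
    """Split text-area content on whitespace, commas or semicolons (assumed separators)."""
    return [token for token in re.split(r"[\s,;]+", text) if token]


def exact_matches(queried, indexed):
    """Keep only the queried identifiers that equal an indexed identifier exactly."""
    return [identifier for identifier in queried if identifier in indexed]


# Identifiers as they might be stored with their version suffix.
indexed = {"PF13650.5", "PF00078.26"}

queried = parse_identifiers("PF13650.5\nPF13650, PF00078.26")
# 'PF13650' is dropped because it lacks the version suffix.
print(exact_matches(queried, indexed))
```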
To download consensus FASTA sequences, click the export button on the results page (for example, after using the consensus search form).
This button opens a dialog box that contains a download tab.
In this dialog box, change the format from “.tsv” to “Fasta sequences”. You can then download your consensus FASTA file.
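Once downloaded, a FASTA file can be read with a few lines of standard-library Python. The sketch below is a generic FASTA reader rather than anything specific to RepetDB; the record names and sequences in the example are invented.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# In-memory example; with a real download, pass an open file handle instead.
example = """>consensus_A
ACGTAC
GTTA
>consensus_B
TTGACA
""".splitlines()

records = list(read_fasta(example))
for name, sequence in records:
    print(name, len(sequence))
```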
Once you have searched consensus repeats and clicked on one, you get to the consensus page. This page contains the following parts:
The header part of the consensus page contains basic information such as the consensus identifier, the consensus length and the Wicker classification. It can also contain information on the genomic annotation of the consensus: the number of copies and full-length copies, the number of fragments and full-length fragments, and the cumulative coverage of the copies on the genome.
This section describes the dataset in which the consensus was detected, classified and annotated. It gives the organism of origin and the genome assembly. Optionally, it can also list the software used to generate the dataset, a comment on how the dataset was created and the name of the person to contact if you have questions about it. The genome assembly FASTA file, the genome TE annotation GFF3 file and the consensus library FASTA file are available for download.
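A downloaded GFF3 annotation file can be inspected line by line. The sketch below relies only on the standard GFF3 layout (nine tab-separated columns, with `key=value` attributes separated by semicolons); the example line, including its source, type and attribute values, is invented and does not reproduce actual RepetDB output.

```python
from urllib.parse import unquote

GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for blank or comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 attribute keys and values may be percent-encoded.
            attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


# Invented example line, for illustration only.
line = "chr1\texample_source\tmatch\t100\t450\t.\t+\t.\tID=copy1;Target=consensus_A 1 351"
feature = parse_gff3_line(line)
print(feature["seqid"], feature["start"], feature["end"], feature["attributes"]["ID"])
```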
This part contains a table of statistics on the consensus copies annotated on the genome. The table groups statistics on copy length, identity and coverage of the consensus.
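As a rough illustration of what such summary statistics look like, the sketch below computes length and coverage summaries from a list of copy lengths. The definition of coverage used here (copy length as a percentage of consensus length) is an assumption, as are the function name, the chosen statistics and the input values; RepetDB may compute its table differently.

```python
from statistics import mean, median


def copy_statistics(copy_lengths, consensus_length):
    """Summarize copy lengths and their coverage of the consensus (assumed definition)."""
    # Assumption: coverage = copy length / consensus length, as a percentage.
    coverages = [100.0 * length / consensus_length for length in copy_lengths]
    return {
        "copies": len(copy_lengths),
        "min_length": min(copy_lengths),
        "median_length": median(copy_lengths),
        "mean_length": mean(copy_lengths),
        "max_length": max(copy_lengths),
        "median_coverage_pct": median(coverages),
    }


# Invented values: three copies of a 1000 bp consensus.
stats = copy_statistics([500, 1000, 250], consensus_length=1000)
for key, value in stats.items():
    print(key, value)
```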
All the features that have been detected on the consensus can be visualized in an embedded JBrowse browser. The reference sequence in this JBrowse instance is the consensus, on which the similarity features and structural features are located. For details on these features, see the following “Similarity features” and “Structural features” sections.
If the consensus has any similarity features, such as protein profiles or transposon BLAST hits, this section lists them in a detailed table. Each type of feature has its own table displaying the positions on the consensus (“Query start” and “Query stop”), the positions on the hit (“Hit start” and “Hit stop”), the hit e-value, the hit identity and details on the hit (its source database, accession, description or classification).
Transposon BLAST hits match Repbase transposable elements, with a link that leads to the Repbase database.
Protein profile hits match GyDB or PFAM profiles, with a link that leads to the profile card on the database website.
As with the similarity features, the structural features are shown in one table per feature type: SSR regions, ORF regions, terminal repeat (TR) regions and poly-A regions.
Like any InterMine database, RepetDB benefits from standard features such as the query builder, template queries and lists. This section describes these features to help you use them to their full potential.
The query builder is a complex interface that can be used to make any kind of custom query on the RepetDB database.
If you want to learn more about how to use the query builder, see the following tutorial.
You can also use the query builder to modify an existing query.
To modify a query from the consensus search form:
Template queries are queries with parameters, which have the advantage of being easily reusable. RepetDB shares public templates that you can find on the templates page, but anyone can add private (for-your-eyes-only) templates.
To create a template:
RepetDB can operate on custom lists of data. You can save lists from results pages or create them by uploading lists of identifiers. Lists can be used when running template queries and analyzed by a series of widgets on a list analysis page. You can merge, subtract and find common members if you have more than one list.
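The list operations mentioned above correspond to ordinary set operations, which can be sketched in Python. The identifiers are invented, and the source does not say whether “subtract” removes the members of one list from another or keeps only the members unique to each list, so both readings are shown.

```python
# Two invented lists of consensus identifiers.
list_a = {"consensus_1", "consensus_2", "consensus_3"}
list_b = {"consensus_2", "consensus_3", "consensus_4"}

merged = list_a | list_b            # merge: members of either list
common = list_a & list_b            # common members: present in both lists
a_minus_b = list_a - list_b         # subtract, reading 1: in A but not in B
unique_to_each = list_a ^ list_b    # subtract, reading 2: in exactly one list

print(sorted(merged))
print(sorted(common))
print(sorted(a_minus_b))
print(sorted(unique_to_each))
```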
All lists, public ones as well as personal ones (if you are logged in), can be viewed on the Lists page, where you can search them and perform operations on them. To create a new list yourself, click on ‘Lists’ and then on ‘Upload’ in the toolbar on any RepetDB page. RepetDB’s list creation tool helps you upload a list of identifiers; the list can contain a mix of identifier types.
To save a list from a query:
Descriptions and tags can also be edited after a list is saved.
All lists and queries you run are saved temporarily in RepetDB for the current session. To save them permanently, you can create a MyMine account. You only need to provide an email address and a password to create an account; no other information is required. Your saved data is always private.
You can then access all your lists, queries and templates via the MyMine page. In MyMine you can save lists and queries you create in the QueryBuilder. You can even use the QueryBuilder to turn queries into new templates of your own. You can export/import queries and templates as XML to share them with others.
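An exported query can be inspected with a standard XML parser before sharing it. The sketch below assumes the layout commonly seen in InterMine query exports (a `query` element with a space-separated `view` attribute and `constraint` child elements); the class name `Consensus`, its paths and the `model` value are hypothetical and may not match the RepetDB data model.

```python
import xml.etree.ElementTree as ET

# Hypothetical exported query; class names and paths are placeholders.
QUERY_XML = """<query name="example" model="genomic" view="Consensus.primaryIdentifier Consensus.length">
  <constraint path="Consensus.organism.name" op="=" value="Arabidopsis thaliana"/>
</query>"""

root = ET.fromstring(QUERY_XML)

# Output columns are stored as a space-separated list of paths.
views = root.get("view").split()

# Each constraint carries a path, an operator and a value.
constraints = [
    (c.get("path"), c.get("op"), c.get("value"))
    for c in root.findall("constraint")
]

print(views)
print(constraints)
```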