RepetDB User Guide

RepetDB is an InterMine database of repeat consensus sequences that were detected and classified by TEdenovo and then used by TEannot to annotate their copies in genomes. TEdenovo and TEannot are part of the REPET software package.

1. Search consensus repeats

The easiest way to search for consensus repeats in RepetDB is to use the consensus search form on the homepage.

The form lets you search consensus repeats by organism (taxon group selection), by dataset name, by classification (Wicker classification), by classification quality (confused, manual validation, unclassified) or by the similarity features found on them (protein profile features or BLAST hits on Repbase transposons).

The taxon group selection contains a tree of taxonomic groups fetched from the NCBI Taxonomy database. You can select a specific species or a larger taxonomic group such as ‘Eukaryota’.
The dataset selection lists all RepetDB datasets available for your taxon selection. Some species have several datasets.
The Wicker classification selection is divided into three ranks: classes, orders and superfamilies. Selecting a class, order or superfamily restricts the choices available at the other ranks. For example, if you select the ‘Helitron’ order, ‘Class II’ is selected automatically.
Ticking the “only confused” checkbox restricts the search to consensus repeats with an ambiguous classification.
Ticking the “only not confused” checkbox restricts the search to consensus repeats without an ambiguous classification.
The “Unclassified or not TE” option covers classifications such as potential host gene, virus or unclassified (consensus repeats with an unknown Wicker classification).
The “Manual validation” option selects consensus repeats according to their validation status: no (automatic classification from PASTEC), classification (classification checked), classification and structure (classification and completeness checked), or all consensus repeats.
The similarity features search is a text area in which you can list accessions or entry names from GyDB and PFAM for protein profiles, and accessions or entry names from Repbase for BLAST hit features. The search matches the protein profiles or BLAST hits detected on the consensus repeats. Only exact identifiers can be queried: PF13650.5 works, but PF13650 does not.
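Two behaviours of the form, the automatic selection of the parent class when an order is chosen and the exact matching of similarity-feature identifiers, can be mimicked with a small Python sketch. It is not RepetDB's implementation: the order-to-class table is an excerpt of the Wicker et al. (2007) scheme, and the function names and example identifier set are hypothetical.

```python
# Hypothetical sketch of two search-form behaviours; not RepetDB's own code.

# Excerpt of the Wicker classification: each order belongs to exactly one class.
WICKER_ORDER_TO_CLASS = {
    "LTR": "Class I",
    "DIRS": "Class I",
    "PLE": "Class I",
    "LINE": "Class I",
    "SINE": "Class I",
    "TIR": "Class II",
    "Crypton": "Class II",
    "Helitron": "Class II",
    "Maverick": "Class II",
}


def implied_class(order: str) -> str:
    """Return the class that gets selected automatically when an order is chosen."""
    return WICKER_ORDER_TO_CLASS[order]


def feature_matches(query_id: str, feature_ids: set) -> bool:
    """Exact identifier match: no version stripping, no prefix matching."""
    return query_id in feature_ids


# Hypothetical identifiers detected on one consensus.
detected = {"PF13650.5"}
print(implied_class("Helitron"))               # Class II
print(feature_matches("PF13650.5", detected))  # True
print(feature_matches("PF13650", detected))    # False
```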

1.1. Get consensus FASTA sequences

To download consensus sequences in FASTA format, click the export button on a results page (for example, the page displayed after a consensus search form query).

This button opens a dialog box containing a download tab.

In this tab, change the format from “.tsv” to “Fasta sequences”. You can then download your consensus FASTA file.
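Once downloaded, the FASTA file can be read with a few lines of standard-library Python. This is a minimal sketch that assumes a plain (uncompressed) FASTA file with one header line per record; the example headers and sequences are hypothetical.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequence lines that wrap."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record export; real headers come from the RepetDB download.
example = StringIO(">consensus_A\nACGTACGT\nACGT\n>consensus_B\nTTTT\n")
for name, seq in read_fasta(example):
    print(name, len(seq))
# consensus_A 12
# consensus_B 4
```

For the real export, pass an open file instead, e.g. `open("consensus.fasta")` (a hypothetical file name).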

2. Consensus page

Clicking a consensus repeat in the search results opens its consensus page. This page contains the following parts:

2.1. Header information

The header of the consensus page contains basic information such as the consensus identifier, the consensus length and the Wicker classification. It can also summarize the genomic annotation of the consensus: the number of copies and full-length copies, the number of fragments and full-length fragments, and the cumulative coverage of the copies on the genome.

2.2. Material and method

This section describes the dataset in which the consensus was detected, classified and annotated, including the source organism and the genome assembly. It may also list the software used to generate the dataset, a comment on how the dataset was created, and the name of a contact person for questions about the dataset. The genome assembly FASTA file, the genome TE annotation GFF3 file and the consensus library FASTA file are available for download.
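The GFF3 annotation file is tab-separated with nine columns, the third one being the feature type. As a first look at a downloaded file, the sketch below counts features by type. The example lines, the source name "REPET_TEs" and the type names are invented for illustration; check the real file for the types and attributes it actually uses.

```python
from collections import Counter
from io import StringIO


def count_feature_types(handle) -> Counter:
    """Count GFF3 feature types (column 3), skipping comments, directives and blank lines."""
    counts = Counter()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a feature line, e.g. sequence lines after a ##FASTA directive
        counts[columns[2]] += 1
    return counts


# Hypothetical excerpt; real files come from the dataset's download links.
example = StringIO(
    "##gff-version 3\n"
    "chr1\tREPET_TEs\tmatch\t100\t900\t.\t+\t.\tID=ms1\n"
    "chr1\tREPET_TEs\tmatch_part\t100\t400\t.\t+\t.\tParent=ms1\n"
    "chr1\tREPET_TEs\tmatch_part\t500\t900\t.\t+\t.\tParent=ms1\n"
)
print(count_feature_types(example))  # Counter({'match_part': 2, 'match': 1})
```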

2.3. Consensus copy statistics

This section contains a table of statistics on the copies of the consensus annotated in the genome, covering copy length, identity and coverage over the consensus.
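To make "coverage over the consensus" concrete, the sketch below computes it for each copy as the fraction of the consensus length spanned by the copy's alignment, then summarizes the values. Both the coverage definition (1-based, inclusive coordinates on the consensus) and the choice of min/mean/median/max are assumptions for illustration; RepetDB's table may use other statistics.

```python
from statistics import mean, median


def coverage_over_consensus(cons_start: int, cons_end: int, consensus_length: int) -> float:
    """Fraction of the consensus spanned by one copy (1-based, inclusive coordinates assumed)."""
    low, high = sorted((cons_start, cons_end))  # reverse-strand copies may have start > end
    return (high - low + 1) / consensus_length


def summarize(values) -> dict:
    """Minimum, mean, median and maximum of a list of per-copy values."""
    return {"min": min(values), "mean": mean(values), "median": median(values), "max": max(values)}


# Hypothetical copies of a 1,000 bp consensus: (start, end) on the consensus.
copies = [(1, 1000), (1, 250), (300, 899), (900, 101)]
coverages = [coverage_over_consensus(s, e, 1000) for s, e in copies]
print(summarize(coverages))
```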

2.4. Feature browser

All the features detected on the consensus can be visualized in an embedded JBrowse browser. The reference sequence in this JBrowse is the consensus, on which the similarity features and structural features are located. Details on these features are given in the “Similarity features” and “Structural features” sections.

2.5. Similarity features

If the consensus has any similarity features, such as protein profiles or transposon BLAST hits, this section lists them in detailed tables. Each feature type has its own table showing the positions on the consensus (“Query start” and “Query stop”), the positions on the hit (“Hit start” and “Hit stop”), the hit e-value, the hit identity and details on the hit (its source database, accession, description or classification).

Transposon BLAST hits are matches to Repbase transposable elements, with a link to the Repbase database.

Protein profile hits are matches to GyDB or PFAM profiles, with a link to the profile card on the corresponding database website.
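When exporting these tables for further analysis, a common first step is to keep only strong hits. The sketch below models one table row and filters by e-value and identity; the field names, the thresholds and the accessions HIT_A and HIT_B are hypothetical and should be adapted to your own needs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimilarityHit:
    """One row of a similarity-feature table (field names are illustrative)."""
    accession: str
    query_start: int
    query_stop: int
    hit_start: int
    hit_stop: int
    evalue: float
    identity: float  # percent identity

    @property
    def query_span(self) -> int:
        """Length of the matched region on the consensus (inclusive coordinates)."""
        return abs(self.query_stop - self.query_start) + 1


def strong_hits(hits: List[SimilarityHit], max_evalue: float = 1e-10,
                min_identity: float = 70.0) -> List[SimilarityHit]:
    """Keep hits passing both thresholds, sorted by increasing e-value."""
    kept = [h for h in hits if h.evalue <= max_evalue and h.identity >= min_identity]
    return sorted(kept, key=lambda h: h.evalue)


hits = [
    SimilarityHit("HIT_A", 120, 980, 1, 860, 1e-50, 88.0),
    SimilarityHit("HIT_B", 2000, 2100, 15, 115, 1e-3, 40.0),
]
for h in strong_hits(hits):
    print(h.accession, h.query_span)  # HIT_A 861
```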

2.6. Structural features

As with similarity features, structural features are shown in one table per feature type: SSR (simple sequence repeat) regions, ORF regions, terminal repeat (TR) regions and poly-A regions.

3. Other InterMine features

Like any InterMine database, RepetDB offers standard features such as the query builder, template queries and lists. This section describes these features to help you make full use of them.

3.1. Query builder

The query builder is an advanced interface for building any kind of custom query on the RepetDB database.

To learn more about using the query builder, see the following tutorial.

You can also use the query builder to modify an existing query.

To modify a query from the consensus search form:

Go to the “Consensus search result page”
Click on the arrow next to the “Get code” button
Select “XML”
Copy the displayed XML code
Go to the query builder page
Click on “Import query from XML” at the bottom of the page
Paste the XML query and submit
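The same XML can also be written or edited outside the browser before importing it. The sketch below builds a query in the general shape of InterMine's PathQuery XML using Python's standard ElementTree. The model name "genomic" and the paths Consensus.name and Consensus.length are hypothetical placeholders: copy the real class and field names from the XML displayed by “Get code”.

```python
import string
import xml.etree.ElementTree as ET


def build_query_xml(model, view, constraints):
    """Serialize a PathQuery-style <query> element.

    view: list of output paths; constraints: list of (path, op, value) tuples,
    labelled with codes A, B, C, ... in order.
    """
    query = ET.Element("query", {"model": model, "view": " ".join(view)})
    for code, (path, op, value) in zip(string.ascii_uppercase, constraints):
        ET.SubElement(query, "constraint",
                      {"path": path, "op": op, "value": value, "code": code})
    return ET.tostring(query, encoding="unicode")


# Hypothetical paths: name and length of consensus repeats longer than 1000 bp.
xml_text = build_query_xml(
    "genomic",
    ["Consensus.name", "Consensus.length"],
    [("Consensus.length", ">", "1000")],
)
print(xml_text)  # ElementTree escapes ">" as "&gt;" inside the attribute
```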

3.2. Template query

Template queries are parameterized queries that are easy to reuse. RepetDB shares public templates, which you can find on the Templates page, and any user can add private (visible only to you) templates.

To create a template:

Log in
Construct a query using Query Builder
The query must include at least one constraint (e.g. restricting the format of the identifier)
Click “Start building a template query”
Fill in the Name, Title and Description, and optionally a Comment
Make necessary adjustments, then “Save template”
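A template is essentially a saved query whose constraints are marked as editable; running it means supplying new values for those constraints. The sketch below illustrates that idea with a hand-written template in the general shape of InterMine template XML. The template name, model and paths are hypothetical, and in practice the web interface fills in the values for you.

```python
import xml.etree.ElementTree as ET

# Hypothetical template in the general shape of InterMine template XML.
TEMPLATE_XML = """
<template name="Consensus_min_length" title="Consensus by minimum length" comment="">
  <query name="Consensus_min_length" model="genomic" view="Consensus.name Consensus.length">
    <constraint path="Consensus.length" op="&gt;" value="1000" code="A" editable="true"/>
  </query>
</template>
""".strip()


def fill_template(template_xml, values):
    """Return the template's <query> with editable constraints set from {code: value}."""
    query = ET.fromstring(template_xml).find("query")
    for constraint in query.iter("constraint"):
        code = constraint.get("code")
        if constraint.get("editable") == "true" and code in values:
            constraint.set("value", values[code])
    return query


filled = fill_template(TEMPLATE_XML, {"A": "5000"})
print(filled.find("constraint").get("value"))  # 5000
```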

3.3. Lists

RepetDB can operate on custom lists of data. You can save lists from results pages or create them by uploading lists of identifiers. Lists can be used when running template queries and can be analysed by a series of widgets on a list analysis page. If you have more than one list, you can merge them, subtract one from another, or find their common members.
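The list operations behave like set operations on the list members. A minimal Python sketch with hypothetical consensus identifiers, assuming that “subtract” removes the members of the second list from the first:

```python
# Hypothetical consensus identifiers in two saved lists.
list_a = {"cons_1", "cons_2", "cons_3"}
list_b = {"cons_2", "cons_3", "cons_4"}

merged = list_a | list_b        # merge: members of either list
subtracted = list_a - list_b    # subtract: members of A that are not in B
common = list_a & list_b        # common members: present in both lists

print(sorted(merged))      # ['cons_1', 'cons_2', 'cons_3', 'cons_4']
print(sorted(subtracted))  # ['cons_1']
print(sorted(common))      # ['cons_2', 'cons_3']
```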

All lists, public ones as well as your personal ones (if you are logged in), can be viewed on the Lists page, where you can search them and run operations on them. To create a new list yourself, click ‘Lists’ and then ‘Upload’ in the toolbar on any RepetDB page. RepetDB’s list creation tool helps you upload a list of identifiers, and the list can contain a mix of identifier types.

To save a list from a query:

Log in.
Make a query (Consensus search form or Query Builder).
On the Result page, top right corner, select “Create / Add to List” -> “Create New List” -> “All of Columns …”, or “Choose individual items from the table” for further refinement.
Give the list a Name, an informative Description and Tags (add one tag at a time; tags help group lists), then click “Create”.

Descriptions and tags can also be edited after a list is saved.

3.4. Data Sources

This page describes every available dataset with:
Organism name
Genome assembly
Software used to make the annotation
Comments on the annotation process and dataset creation
Contact
Publications, if available

3.5. API

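RepetDB data can be accessed programmatically through the InterMine API, with client libraries available for Perl, Python, Ruby and Java. A query built in the web interface can often be translated into API code (for example with the “Get code” button on a results page) to retrieve the same data.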
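Besides the official client libraries (for Python, the `intermine` package), InterMine web services accept query XML over plain HTTP. The sketch below only builds the request URL for the query-results endpoint and does not contact any server. The service root is a placeholder to replace with the address shown in RepetDB's API menu, and the query XML uses hypothetical paths.

```python
from urllib.parse import urlencode

# Placeholder: replace with the RepetDB web service root shown in its API menu.
SERVICE_ROOT = "https://example.org/repetdb/service"


def results_url(query_xml: str, fmt: str = "tab") -> str:
    """Build a GET URL for the InterMine query-results endpoint (query XML + output format)."""
    return SERVICE_ROOT + "/query/results?" + urlencode({"query": query_xml, "format": fmt})


# Hypothetical query; in practice, paste the XML exported with "Get code".
xml_text = '<query model="genomic" view="Consensus.name Consensus.length"/>'
url = results_url(xml_text)
print(url)
# Fetching this URL (e.g. with urllib.request.urlopen) should return rows in the requested format.
```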

3.6. URGI Blast

The URGI Blast link opens our BLAST web service, which offers many FASTA sequence databases. In the group selection, choose the Repeats group to run BLAST against the consensus sequences.

On the results page, each match links to the corresponding RepetDB consensus card.

3.7. How to use

This link opens the present documentation.

3.8. MyMine section

All the lists you create and queries you run are saved temporarily in RepetDB for the current session. To save them permanently, create a MyMine account: you only need to provide an email address and a password, and no other information is required. Your saved data is always private.

You can then access all your lists, queries and templates via the MyMine page. In MyMine you can save lists and queries you create in the QueryBuilder. You can even use the QueryBuilder to turn queries into new templates of your own. You can export/import queries and templates as XML to share them with others.