Access restricted to GenOak project consortium
Genome assemblies: fasta files
Assemblies | Qrob_V1 (Plomion et al. 2016) | Qrob_V2 | Qrob_H2.3 (Haplome) | Qrob_pseudomolecules v2 | Qrob_unassigned |
Assembly size (bp) | 1,354,311,717 | 1,455,104,916 | 814,282,569 | 716,731,785 | 97,636,684 |
# Scaffolds | 17,910 | 8,827 | 1,409 | 12 Chr (871 scaffolds) | 538 |
N50 | 256,640 | 821,707 | 1,342,530 | 57,352,617 | 621,902 |
L50 | 1,468 | 537 | 192 | 5 | 37 |
N90 | 35,065 | 198,501 | 333,129 | 44,977,106 | 96,968 |
L90 | 6,626 | 1,880 | 649 | 10 | 195 |
Max length (bp) | 1,922,255 | 5,542,037 | 5,871,596 | 115,639,695 | 2,943,817 |
Min length (bp) | 2,003 | 2,000 | 2,095 | 39,860,516 | 2,095 |
#N (% assembly) | 156,586,910 (11.6%) | 67,010,588 (4.6%) | 23,989,938 (2.9%) | 20,278,712 (2.8%) | 3,790,126 (3.9%) |
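N50: length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least 50% of the assembly
L50: number of scaffolds in that set, i.e. the minimum number of scaffolds whose summed length reaches 50% of the assembly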
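The N50/L50 and N90/L90 rows in the table above follow the usual definition, which can be sketched as below. This is a minimal illustration on made-up scaffold lengths, not the script used to produce the table; the function name `nx_lx` is hypothetical.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    the running total reaches `fraction` of the assembly size. Nx is the
    length of the last scaffold added, Lx is how many were needed.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Only reachable through floating-point edge cases.
    return ordered[-1], len(ordered)


# Toy assembly of five scaffolds (made-up lengths, 300 bp in total).
toy = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy, 0.5)   # 100 + 80 = 180 >= 150
n90, l90 = nx_lx(toy, 0.9)   # 100 + 80 + 60 + 40 = 280 >= 270
print(n50, l50, n90, l90)
```

With real data, `lengths` would be the sequence lengths read from one of the assembly fasta files.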
Gene prediction statistics
Assembly / gene release | v1 / v1 | | v2 / v2.2 | | Haplome_2.3 | | |
Gene quality | regular | unreliable | regular & manual | Low confidence | regular & manual | Low confidence | Final list of genes |
#Genes (total) | 110,058 | | 79,052 | | 43,755 | | 25,808 |
#Genes (by quality) | 58,778 | 51,280 | 53,484 | 23,547 | 29,665 | 13,575 | 25,808 |
# Uncomplete genes | 4,500 | | 2,021 | | 515 | | 0 |
Gene space (Mb) | 120.6 | 22.5 | 150 | 20 | 83.5 | 11.7 | 75 |
Gene mean / median size (bp) | 2,051 / 1,530 | 439 / 291 | 2,809 / 2,050 | 851 / 417 | 2,813 / 2,055 | 858 / 422 | 2,907 / 2,137 |
CDS mean / median size (bp) | 1,025 / 810 | 266 / 243 | 1,062 / 831 | 286 / 273 | 1,068 / 831 | 286 / 273 | 1,174 / 942 |
#Polypeptide < 500 bp | 10,881 | 51,280 | 12,847 | 23,547 | 7,000 | 13,575 | 4,367 |
#Polypeptide > 3Kb | 1,695 | NA | 2,019 | NA | 1,162 | NA | 1,162 |
#genes with introns | 44,595 (76%) | 18,999 (37%) | 42,627 (80%) | 13,313 (57%) | 23,723 (80%) | 7,736 (57%) | 20,297 (79%) |
# introns per gene | 2.7 | 0.5 | 3.2 | 0.9 | 3.2 | 0.9 | 3.3 |
Transcript sequences
Pseudomolecule + unassigned scaffolds: JBrowse, OakMine_PM1N
Haplome release v2.3 gene release October 2016/11/25
We used the results of a first round of orthoMCL + CAFE analysis to curate the set of genes.
We removed many genes associated with transposable elements (TE), unreliable genes, and small regular genes that were split or very poorly predicted and that belonged to clusters containing only Qrob genes (none of the other 15 species in the analysis). In the end we kept 25808 predicted proteins.
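One of the criteria above, "clusters containing only Qrob genes", can be sketched as follows. This is only an illustration, not the curation pipeline itself: it assumes the clusters are available in the OrthoMCL groups text layout (`group_id: taxon|gene taxon|gene ...`) and that oak genes carry the `Qr` acronym from the species table on this page. The group names and gene identifiers in the example are made up, and the function names are hypothetical.

```python
def parse_groups(lines):
    """Parse 'group_id: taxon|gene taxon|gene ...' lines into a dict.

    Assumes every member is written as 'taxon|gene' (well-formed input).
    """
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        groups[group_id.strip()] = [
            tuple(member.split("|", 1)) for member in members.split()
        ]
    return groups


def single_species_groups(groups, taxon="Qr"):
    """Return the IDs of groups whose members all belong to `taxon`."""
    return [
        group_id
        for group_id, members in groups.items()
        if members and all(member[0] == taxon for member in members)
    ]


# Made-up example: only grp0002 contains oak genes exclusively.
example = [
    "grp0001: Qr|Qrob_P0000010.2 Md|MDP0000000001 At|AT1G01010.1",
    "grp0002: Qr|Qrob_P0000020.2 Qr|Qrob_P0000030.2",
    "grp0003: Pt|Potri.001G000100.1 Vv|GSVIVT00000000001",
]
oak_only = single_species_groups(parse_groups(example))
print(oak_only)
```

Genes in such oak-only clusters were candidates for removal only when they were also small, split or poorly predicted, so this filter alone does not reproduce the final list.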
Quercus Robur gene information file
List of genes deleted
GFF file
Coding sequences (Nucleotides + Amino acid sequences)
Haplome release v2.3 unfiltered gene release
43755 genes (out of 79052 from V2) were mapped onto the Haplome v2.3 assembly. Gene IDs and structures are unchanged.
Some tags may differ between a gene in V2 and the same gene in the Haplome: 28 genes are newly tagged uncomplete, either because more than 20% of their coding sequence is N or because of an in-frame stop codon.
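The two conditions that lead to the uncomplete tag can be checked on a coding sequence roughly as follows. This is one reading of the criteria stated above, not the code of the annotation pipeline: it assumes the standard genetic code, reading frame 0, and that a stop codon in the final position is the normal terminator. The function names and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def n_fraction(cds):
    """Fraction of N characters in a coding sequence."""
    cds = cds.upper()
    return cds.count("N") / len(cds) if cds else 0.0


def has_inframe_stop(cds):
    """True if a stop codon occurs in frame 0 before the last full codon."""
    cds = cds.upper()
    last_start = len(cds) - len(cds) % 3 - 3  # start of the last full codon
    return any(cds[i:i + 3] in STOP_CODONS for i in range(0, last_start, 3))


def looks_uncomplete(cds, max_n=0.20):
    """Apply both criteria: more than `max_n` N content, or an in-frame stop."""
    return n_fraction(cds) > max_n or has_inframe_stop(cds)


# Made-up sequences for illustration.
clean = "ATGGCCTAA"          # ATG GCC TAA: only the terminal stop
early_stop = "ATGTAAGCCTAA"  # ATG TAA GCC TAA: stop in the second codon
gappy = "ATGNNNGCCTAA"       # 3 of 12 bases are N (25%)
print(looks_uncomplete(clean), looks_uncomplete(early_stop), looks_uncomplete(gappy))
```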
GFF file of Gene prediction H2.3 (Eugene + manual annotation) on assembly release Haplome v2.3
Coding (uncomplete not included) sequences predicted on Haplome v2.3 (43240 proteins)
List of 2088 manually predicted genes in V2.2 (2067 using 1567 from V1 + 21 only in V2). 1181 of these genes were recovered in the Haplome; four of them are tagged uncomplete (stop in frame) and are therefore not translated.
List of 35297 genes in V2 not recovered in Haplome
GFF file of TE annotation recovered in Haplome v2.3
Correspondence between V2 (diploid) and Haplome V2.3
Assembly release v2.2 gene prediction: JBrowse, OakMine
79052 genes were predicted and tagged with quality flags, which are reported in the GFF file under the gene_qual tag.
Genes updated between gene prediction V2 and V2.2: 150 genes were manually curated or restored from the Eugene prediction to recover a good ORF, and 4 genes were deleted. Qrob_v2_Genes_v2.2_20151202_genes_modified.lst (2.34 kB)
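To see how the genes are distributed across quality flags, the gene_qual tag can be tallied from the GFF file. The sketch below assumes a standard nine-column GFF3 layout in which gene_qual is a `key=value` entry of the attributes column on `gene` features; that placement has not been checked against the actual file. The scaffold names and flag values in the example are placeholders.

```python
from collections import Counter


def count_gene_qual(gff_lines):
    """Count gene features per gene_qual value in GFF3-style lines."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed gene features
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        counts[attrs.get("gene_qual", "missing")] += 1
    return counts


# Placeholder records; the real flag values may be spelled differently.
example = [
    "##gff-version 3",
    "scaffold_1\tEuGene\tgene\t100\t900\t.\t+\t.\tID=gene1;gene_qual=regular",
    "scaffold_1\tEuGene\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "scaffold_2\tEuGene\tgene\t50\t400\t.\t-\t.\tID=gene2;gene_qual=low_confidence",
    "scaffold_2\tEuGene\tgene\t500\t950\t.\t+\t.\tID=gene3;gene_qual=regular",
]
counts = count_gene_qual(example)
print(counts)
```

For the real file, the same function can be fed an open file handle, since it only iterates over lines.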
GFF file of Gene prediction V2.2 (Eugene + manual annotation) on assembly release V2.
Coding (uncomplete not included) sequences predicted (V2.2) on assembly release V2.
Other annotation files (Annotation on assembly release V2)
Assembly release v1 JBrowse
Genes (Coding sequences) predicted by Eugene :
These fasta files contain reliable and unreliable Eugene-predicted genes (without UTRs):
- Qrob_Pxxxxxxx.1: gene with length > 500 nt, OR gene with length < 500 nt with oak transcript contig evidence
- Qrob_uPxxxxxxx.1: gene with length < 500 nt without transcript evidence (at the time the prediction pipeline was run)
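Because the reliable/unreliable distinction is encoded in the identifier prefix, sequences can be separated by name alone. The sketch below assumes that the `x` placeholders in the naming scheme stand for digits and that the suffix after the dot is a numeric version; the identifiers in the example are made up, and the function name is hypothetical.

```python
import re

# Qrob_P<digits>.<version> = reliable, Qrob_uP<digits>.<version> = unreliable
ID_PATTERN = re.compile(r"^Qrob_(u?)P(\d+)\.(\d+)$")


def classify_gene_id(gene_id):
    """Return 'reliable', 'unreliable' or 'unknown' for a gene identifier."""
    match = ID_PATTERN.match(gene_id)
    if match is None:
        return "unknown"
    return "unreliable" if match.group(1) == "u" else "reliable"


# Made-up identifiers following the naming scheme above.
for gene_id in ["Qrob_P0000010.1", "Qrob_uP0000020.1", "scaffold_12"]:
    print(gene_id, classify_gene_id(gene_id))
```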
List of plant species used to detect expansion/contraction of gene families in oak
scientific name | species acronym | number of predicted proteins | version | reference (doi) | |
Quercus robur | Qr | 43240 | haplome (v2.3) | this study | |
Malus domestica | Md | 63514 | v1.0 | 10.1038/ng.654 | Phytozome11 |
Prunus persica | Pp | 27864 | v2.1 | 10.1038/ng.2586 | |
Populus trichocarpa | Pt | 41335 | v3.0 | 10.1126/science.1128691 | |
Citrus clementina | Cc | 24533 | v1.0 | 10.1038/nbt.2906 | |
Fragaria vesca | Fv | 32831 | v1.1 | 10.1038/ng.740 | |
Arabidopsis lyrata | Al | 32657 | v1.0 | 10.1038/ng.807 | |
Solanum tuberosum | St | 35119 | v3.4 | 10.1038/nature10158 | |
Arabidopsis thaliana | At | 27416 | TAIR10 | 10.1038/35048692 | |
Ricinus communis | Rc | 31221 | v0.1 | 10.1038/nbt.1674 | |
Glycine max | Gm | 56044 | Wm82.a2.v1 | 10.1038/nature08670 | |
Vitis vinifera | Vv | 26346 | Genoscope.12X | 10.1038/nature06148 | |
Carica papaya | Cp | 27584 | ASGPBv0.4 | 10.1038/nature06856 | |
Theobroma cacao | Tc | 29452 | v1.1 | 10.1186/gb-2013-14-6-r53 | |
Eucalyptus grandis | Eg | 36376 | v2.0 | 10.1038/nature13308 | |
Citrullus lanatus | Wa | 23440 | v1 | 10.1038/ng.2470 | Download |