Polymorphism data management
This page shows the principles chosen to store and display polymorphism data in GnpSNP.
The GnpSNP database manages 3 types of sequence polymorphism:
- SNPs (Single Nucleotide Polymorphism)
- DIPs (Deletion Insertion Polymorphism)
- SSRs (Simple Sequence Repeats), or microsatellites
This document describes the different types of polymorphisms authorized and the way they are stored in the database.
In general, the data required to store one polymorphism in GnpSNP are the type of polymorphism (SNP, DIP or SSR) and the allele values for a given marker in each of the lines. Depending on the type of marker, further information is required:
- For SNPs and DIPs, the length and the position on the reference sequence are also required.
- For SSR markers, the run protocol conditions, the DNA labeling and a hardware description are also required.
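The requirements above can be sketched as a small record type. This is only an illustration, not the actual GnpSNP schema: the class name, the field names and the validation logic are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Polymorphism:
    """Hypothetical record for one polymorphism (not the real GnpSNP schema)."""
    marker: str                         # marker name
    kind: str                           # "SNP", "DIP" or "SSR"
    alleles: Dict[str, str]             # line name -> allele value
    position: Optional[int] = None      # required for SNP and DIP
    length: Optional[int] = None        # required for SNP and DIP
    run_protocol: Optional[str] = None  # required for SSR
    dna_labeling: Optional[str] = None  # required for SSR
    hardware: Optional[str] = None      # required for SSR

    def __post_init__(self):
        if self.kind not in ("SNP", "DIP", "SSR"):
            raise ValueError(f"unknown polymorphism type: {self.kind!r}")
        if self.kind in ("SNP", "DIP"):
            # SNPs and DIPs are located on the reference sequence
            if self.position is None or self.length is None:
                raise ValueError("SNP and DIP need a position and a length")
        elif None in (self.run_protocol, self.dna_labeling, self.hardware):
            # SSR markers carry experimental conditions instead
            raise ValueError("SSR needs run protocol, DNA labeling and hardware")


# Example: a SNP at position 3 with one allele value per line
snp = Polymorphism("marker_1", "SNP", {"Line 1": "A", "Line 3": "T"},
                   position=3, length=1)
```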
SNP (Substitution of one base by another)
By definition, the length of this polymorphism is always 1 base.
Example: 4 SNPs at positions 3, 4, 6 and 8 of a reference sequence 9 bases long.
The allele values are T or A for the SNP at position 3 (2 alleles), G or C for the SNP at position 4 (2 alleles), A, G or T for the SNP at position 6 (3 alleles) and A, C, T, G or - for the SNP at position 8 (5 alleles). The GnpSNP database does not manage MNPs (Multiple Nucleotide Polymorphisms): the adjacent differences at positions 3 and 4, which could be read as a single MNP 2 bases long with the values TG or AC, are stored as two separate SNPs.
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Ref seq | A | T | T | G | C | A | C | A | T |
Line 1 | A | T | A | C | C | G | C | C | T |
Line 2 | A | T | A | C | C | A | C | T | T |
Line 3 | A | T | T | G | C | T | C | G | T |
Line 4 | A | T | T | G | C | G | C | - | T |
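As a minimal sketch of how the SNPs in this table can be read off column by column, the following code collects the distinct values seen at each position. The function name `find_snps` and the string encoding of the table are assumptions made for this example, not part of GnpSNP.

```python
# Rows of the table above, one character per position
REF = "ATTGCACAT"
LINES = {
    "Line 1": "ATACCGCCT",
    "Line 2": "ATACCACTT",
    "Line 3": "ATTGCTCGT",
    "Line 4": "ATTGCGC-T",
}


def find_snps(ref, lines):
    """Return {position: sorted allele values} for every variable position."""
    snps = {}
    for i, ref_base in enumerate(ref):
        # distinct values at this column, reference included
        alleles = {ref_base} | {seq[i] for seq in lines.values()}
        if len(alleles) > 1:
            snps[i + 1] = sorted(alleles)  # positions are 1-based
    return snps


for position, alleles in find_snps(REF, LINES).items():
    print(position, alleles)
```

Each position is treated independently, so the adjacent differences at positions 3 and 4 come out as two SNPs of length 1 rather than as one MNP.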
DIP (Insertion or deletion of a variable number of bases)
The database accepts polymorphisms up to 1000 bases long. Beyond 1000 bases, the submitter has to add a comment (see exchange format).
The position given for an insertion is the position of the base preceding the insertion; the position given for a deletion is the position of the first deleted base. The length of a DIP is the maximal number of nucleotides inserted or deleted among all the lines involved in this polymorphism. A deletion is represented by the character "-".
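The length rule can be sketched as follows. This is a rough illustration under stated assumptions: the function names, the "insertion"/"deletion" labels and the allele values in the usage lines are hypothetical, and alleles are assumed to be padded with "-" to a common width.

```python
MAX_DIP_LENGTH = 1000  # limit stated above


def dip_length(kind, alleles):
    """Maximal number of bases inserted or deleted among the given alleles."""
    if kind == "insertion":
        # inserted bases are the non-gap characters of an allele
        return max(len(a.replace("-", "")) for a in alleles)
    if kind == "deletion":
        # deleted bases are the gap characters of an allele
        return max(a.count("-") for a in alleles)
    raise ValueError(f"unknown DIP kind: {kind!r}")


def needs_comment(length):
    """True when the DIP exceeds the limit and a submitter comment is needed."""
    return length > MAX_DIP_LENGTH


# Hypothetical allele values, for illustration only
print(dip_length("insertion", ["---", "A--", "ACG"]))  # 3
print(dip_length("deletion", ["TC", "--"]))            # 2
```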
Example: 8 polymorphisms (7 DIPs and 1 SNP) on a reference sequence 17 bases long.
- An insertion at position 2 with the allele value G, 1 base long (2 alleles: G and -)
- A SNP at position 4 with the values G, - or A, 1 base long (3 alleles). Note: the SNP takes precedence over the deletion
- An insertion at position 5 with the values CC--, CCAT, --AT or C-A-, 4 bases long (5 alleles, including the allele ----)
- A deletion at position 8 with the values GA or --, 2 bases long (2 alleles)
- A deletion at position 10 with the values TC or --, 2 bases long (2 alleles)
- An insertion at position 12 with the values -, A or C, 1 base long (3 alleles)
- A deletion at position 14 with the values GG or --, 2 bases long (2 alleles)
- An insertion at position 16 with the values -----, ACGTA, ACG--, AC--- or A----, 5 bases long (5 alleles)
Position | 1 | 2 | - | 3 | 4 | 5 | - | - | - | - | 6 | 7 | 8 | 9 | 10 | 11 | 12 | - | 13 | 14 | 15 | 16 | - | - | - | - | - | 17 |
Ref seq | A | G | - | A | G | T | - | - | - | - | A | C | G | A | T | C | C | - | T | G | G | A | - | - | - | - | - | T |
Line 1 | A | G | - | A | - | T | C | C | - | - | A | C | - | - | - | - | C | A | T | G | G | A | A | C | G | T | A | T |
Line 2 | A | G | G | A | - | T | C | C | A | T | A | C | G | A | - | - | C | C | T | - | - | A | A | C | G | - | - | T |
Line 3 | A | G | - | A | G | T | - | - | A | T | A | C | - | - | T | C | C | C | T | G | G | A | A | C | - | - | - | T |
Line 4 | A | G | G | A | A | T | C | - | A | - | A | C | G | A | T | C | C | - | T | - | - | A | A | - | - | - | - | T |
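The position and length conventions for insertions can be checked against this table with a short sketch. It handles insertions only, recognised as runs of "-" in the reference row; deletions and the SNP at position 4 are deliberately left out. The function name `find_insertions` and the string encoding of the rows are assumptions made for this example.

```python
# Rows of the table above, one character per column
REF = "AG-AGT----ACGATCC-TGGA-----T"
LINES = {
    "Line 1": "AG-A-TCC--AC----CATGGAACGTAT",
    "Line 2": "AGGA-TCCATACGA--CCT--AACG--T",
    "Line 3": "AG-AGT--ATAC--TCCCTGGAAC---T",
    "Line 4": "AGGAATC-A-ACGATCC-T--AA----T",
}


def find_insertions(ref, lines):
    """List insertions as dicts with position, length and allele values."""
    insertions = []
    ref_pos = 0  # 1-based position of the last reference base seen
    col = 0
    while col < len(ref):
        if ref[col] != "-":
            ref_pos += 1
            col += 1
            continue
        start = col
        while col < len(ref) and ref[col] == "-":
            col += 1  # extend over the whole run of gaps in the reference
        alleles = {seq[start:col] for seq in lines.values()}
        alleles.add(ref[start:col])
        insertions.append({
            "position": ref_pos,  # base preceding the insertion
            # maximal number of inserted bases among all alleles
            "length": max(len(a.replace("-", "")) for a in alleles),
            "alleles": sorted(alleles),
        })
    return insertions


for ins in find_insertions(REF, LINES):
    print(ins["position"], ins["length"], ins["alleles"])
```

The four insertions found this way sit at positions 2, 5, 12 and 16, with lengths 1, 4, 1 and 5, in agreement with the list above.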
SSR (or microsatellites: adjacent repetitions of a pattern of 2 or more nucleotides)
Simple example:
Marker | A | A | B | B | B | C | C |
Allele | A1 | A2 | B1 | B2 | B3 | C1 | C2 |
Bandsize | 122 | 143 | 201 | 214 | 229 | 157 | 182 |
Line 1 | x | x | x | x | x | ||
Line 2 | x | x | x | ||||
Line 3 | x | x | x |
This table shows the genotypes of 3 lines for 3 SSR markers.