We present here the basic keywords used in this information system.
Existence of two or more alternate forms (alleles) of a chromosomal locus that differ in nucleotide sequence or have variable numbers of repeated nucleotide units. When studied in the context of a population, these variations in DNA sequences are called polymorphisms. They may occur in coding regions (exons) or noncoding regions of genes. In scientific studies, the different types of polymorphism, such as SNP (Single Nucleotide Polymorphism), STR (Short Tandem Repeat) and insertion-deletion, are useful for many research applications, for example tracking genes of medical or agronomical importance. They are essential in genetic mapping studies and are also powerful tools for population genetics.
A SNP is a variation of a single base in the DNA: the nucleotide observed differs from the one usually found at this position. Each individual carries many single nucleotide polymorphisms that together create a unique DNA arrangement for that individual. SNPs are usually thought to be point mutations that have been evolutionarily successful enough to be present in a significant proportion of the population. The change may or may not lie in a coding region, may or may not alter the amino acid sequence, and may or may not influence the structure of the protein. In all cases, SNPs are essential tools in genetics and genomics research.
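To make the definition concrete, here is a minimal sketch that lists candidate SNP positions between a reference and one aligned individual. It assumes the two sequences are already aligned to the same length and that '-' marks a gap; the function name `find_snps` is hypothetical and is not part of GnpSNP. A difference seen in a single individual is only a candidate: it becomes a polymorphism when observed across a population.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each single-base difference.

    Positions are 1-based alignment columns. Columns with a gap ('-') on
    either side are skipped, since they are insertions or deletions, not SNPs.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have the same length")
    snps = []
    for index, (ref_base, base) in enumerate(zip(reference.upper(), sample.upper())):
        if ref_base != base and ref_base != "-" and base != "-":
            snps.append((index + 1, ref_base, base))
    return snps
```

For example, `find_snps("ACGTAC", "ACCTAC")` reports one difference at position 3, where G is replaced by C.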
A deletion polymorphism is the absence of one or several bases in a sequence compared to a reference sequence. The database accepts polymorphisms of at most 1000 bases. Beyond 1000 bases, the submitter has to add a comment...
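As an illustration, the sketch below finds deletions in a gapped alignment and flags those that exceed the 1000-base limit mentioned above. It assumes '-' in the individual's sequence marks a missing base; the names `find_deletions` and `needs_comment` are hypothetical, and this is not the check actually performed by the database.

```python
MAX_POLYMORPHISM_LENGTH = 1000  # limit stated in the text above


def find_deletions(reference: str, sample: str) -> list[tuple[int, int]]:
    """Return (start, length) for each run of deleted bases in the sample.

    Start positions are 1-based alignment columns (an assumption of this
    sketch; they equal reference coordinates only if the reference has no gaps).
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have the same length")
    deletions = []
    run_start = None
    for index, (ref_base, base) in enumerate(zip(reference, sample)):
        if base == "-" and ref_base != "-":
            if run_start is None:
                run_start = index  # a new deletion run begins here
        elif run_start is not None:
            deletions.append((run_start + 1, index - run_start))
            run_start = None
    if run_start is not None:  # deletion running to the end of the alignment
        deletions.append((run_start + 1, len(sample) - run_start))
    return deletions


def needs_comment(length: int) -> bool:
    """True when a polymorphism is longer than the accepted maximum."""
    return length > MAX_POLYMORPHISM_LENGTH
```

With `find_deletions("ACGTACGT", "AC--ACG-")`, two deletions are reported: one of 2 bases starting at column 3 and one of 1 base at column 8.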
A simple sequence repeat (or microsatellite) is a type of polymorphism observed when a pattern of two or more bases is repeated in tandem, that is, when the repeated units are directly adjacent to each other.
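The tandem structure described above can be searched for with a regular expression. The sketch below is only an illustration under stated assumptions: motifs of 2 to 6 bases and at least 3 adjacent copies are thresholds chosen here, not values taken from GnpSNP, and `find_ssrs` is a hypothetical name.

```python
import re

# A motif of 2 to 6 bases (shortest first) followed by at least 2 more
# adjacent copies of itself, i.e. 3 copies or more in total.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{2,}")


def find_ssrs(sequence: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for each tandem repeat found.

    Start is 1-based. Runs of a single base (e.g. AAAAAA) are skipped,
    because the repeated pattern must contain two or more bases.
    """
    results = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run, not a microsatellite in this sketch
        copies = len(match.group(0)) // len(motif)
        results.append((match.start() + 1, motif, copies))
    return results
```

For instance, in `"GGTACACACACTTGA"` the motif AC is found in 4 adjacent copies starting at position 4.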
In GnpSNP, batches of data are organised in projects. The concept of a project in GnpSNP does not differ from its usual meaning. It corresponds to an entity, with public or private funding, that gathers one or several experiment(s) around one general subject.
In GnpSNP, projects are composed of one or several scientific thematic(s) whose titles describe the scientific context of the experiments. A thematic is connected to the batches whose scientific context corresponds to its title.
An experiment (or batch) in GnpSNP corresponds to the submission of the polymorphism data collected from the sequence alignment (or genotyping) of a variable number of individuals from a Sanger experiment, together with its associated protocol and a reference sequence.
A sequence variation represents, in a Sanger batch or in an NGS run, the value of the allelic variation (allele) and its length, for an individual at a position of the reference sequence. It is also characterised by the type of polymorphism (SNP/DIP/STR).
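The attributes listed in this definition can be pictured as a small record. The sketch below is a hypothetical model written for illustration; the class and field names are assumptions and do not reflect the actual GnpSNP database schema.

```python
from dataclasses import dataclass
from enum import Enum


class PolymorphismType(Enum):
    """The three polymorphism types named in the definition."""
    SNP = "SNP"
    DIP = "DIP"
    STR = "STR"


@dataclass(frozen=True)
class SequenceVariation:
    source: str                           # Sanger batch or NGS run identifier (hypothetical)
    individual: str                       # individual carrying the allele
    reference: str                        # name of the reference sequence
    position: int                         # 1-based position on the reference (assumed convention)
    allele: str                           # value of the allelic variation
    length: int                           # length of the variation, in bases
    polymorphism_type: PolymorphismType

    def __post_init__(self) -> None:
        # Basic sanity checks; the real system may enforce different rules.
        if self.position < 1:
            raise ValueError("position must be 1 or greater")
        if self.length < 0:
            raise ValueError("length cannot be negative")
```

A SNP observed in one individual could then be written as `SequenceVariation("batch_1", "ind_A", "contig_1", 152, "G", 1, PolymorphismType.SNP)`, where all values are made up for the example.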
When a reference sequence (genomic sequence, contigs) is available, the sequence variations of one or several experiments (or runs) can be mapped onto this reference sequence (if this is not already the case) and aggregated (when they are different alleles of the same marker) into a single polymorphic locus. This gives a summary of the diversity across several genotypes of the same species (or between related species).
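The aggregation step can be sketched as a grouping operation. The example below assumes that variations sharing the same reference sequence, position and polymorphism type belong to the same marker; this grouping key, the dictionary keys and the function name `aggregate_loci` are assumptions made for illustration, not the rule actually applied by GnpSNP.

```python
from collections import defaultdict


def aggregate_loci(variations):
    """Group sequence variations into polymorphic loci.

    `variations` is an iterable of dicts with the keys 'reference',
    'position', 'type', 'allele' and 'individual' (hypothetical layout).
    Returns a dict mapping (reference, position, type) to the set of
    alleles and the set of individuals observed at that locus.
    """
    loci = defaultdict(lambda: {"alleles": set(), "individuals": set()})
    for variation in variations:
        key = (variation["reference"], variation["position"], variation["type"])
        loci[key]["alleles"].add(variation["allele"])
        loci[key]["individuals"].add(variation["individual"])
    return dict(loci)
```

Two variations from different experiments at the same position, carrying alleles A and G, would thus end up in one locus with alleles {A, G}, which summarises the diversity seen across the genotypes.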
We then generate a GFF3 file so that these loci can be visualized in GnpGenome (our internal GMOD genome browser), giving an overview of the polymorphisms found in a chosen species.
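For readers unfamiliar with GFF3, the sketch below shows what such an export could look like: a version header followed by one line of nine tab-separated columns per locus. The source label "GnpSNP", the custom `alleles` attribute and the function names are assumptions made for this example; the file actually produced for GnpGenome may differ. Attribute values are assumed not to contain reserved characters (tab, ';', '=', '&'), which would otherwise need percent-encoding.

```python
def locus_to_gff3_line(seqid, start, end, so_type, locus_id, alleles, source="GnpSNP"):
    """Format one polymorphic locus as a GFF3 feature line.

    Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    Score, strand and phase are left undefined ('.') in this sketch.
    """
    attributes = f"ID={locus_id};alleles={','.join(sorted(alleles))}"
    columns = [seqid, source, so_type, str(start), str(end), ".", ".", ".", attributes]
    return "\t".join(columns)


def loci_to_gff3(loci):
    """Build the text of a GFF3 file from an iterable of locus dicts (hypothetical layout)."""
    lines = ["##gff-version 3"]
    for locus in loci:
        lines.append(
            locus_to_gff3_line(
                locus["seqid"], locus["start"], locus["end"],
                locus["type"], locus["id"], locus["alleles"],
            )
        )
    return "\n".join(lines) + "\n"
```

A locus at position 152 of a sequence named contig_1 with alleles A and G would produce a feature line whose last column reads `ID=locus_1;alleles=A,G`, assuming the locus identifier is `locus_1`.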