Study: Whole genome sequence of one individual of Betula nana from Scotland, using Illumina paired-end and mate-paired libraries
Identification
Name
Whole genome sequence of one individual of Betula nana from Scotland, using Illumina paired-end and mate-paired libraries
Identifier
dXJuOkVWQS9zdHVkeS9QUkpFQjU3Ng==
Description
We present a de novo reference genome sequence assembly, from 66× short-read coverage, of Betula nana (dwarf birch), a diploid that is the keystone woody species of sub-arctic scrub communities but is of conservation concern in Britain. Genome sequencing was conducted at the Beijing Genomics Institute, China. Five DNA libraries were constructed: three paired-end libraries with insert sizes of 200 bp, 500 bp and 800 bp, and two mate-paired libraries with insert sizes of 2,000 bp and 5,000 bp. All libraries were sequenced through the Illumina HiSeq 2000 pipeline, the paired-end libraries with a read length of 95 bp and the mate-paired libraries with a read length of 49 bp. Reads were filtered to exclude those with: more than 2% “N” calls, polyA structures, adapter contamination, quality scores of less than 7 for 40% of the reads in paired-end libraries and 60% of the reads in mate-paired libraries, overlapping paired reads in paired-end libraries, and identical sequences at each end of a read pair. A total of 42.05 Gb of raw data was produced, which yielded 29.84 Gb of clean data. This represents 66× coverage of the genome, assuming a genome size of 450 Mb (Anamthawat-Jónsson et al. 2010). The clean data were partitioned among the libraries as follows: 9.20 Gb from the 200 bp insert size library, 7.64 Gb from the 500 bp insert size library, 6.21 Gb from the 800 bp insert size library, 4.83 Gb from the 2,000 bp insert size library, and 1.96 Gb from the 5,000 bp insert size library.
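The coverage figure quoted in the description can be checked from the per-library totals. The following is a minimal sketch, not part of the original study: the variable and function names are illustrative, and it assumes 1 Gb = 1,000 Mb and that coverage is simply clean bases divided by the assumed genome size.

```python
# Clean data per library in Gb, keyed by insert size in bp
# (figures taken from the study description).
CLEAN_GB = {200: 9.20, 500: 7.64, 800: 6.21, 2000: 4.83, 5000: 1.96}

RAW_GB = 42.05          # total raw data reported
GENOME_SIZE_MB = 450.0  # genome size assumed in the description


def coverage(total_gb: float, genome_mb: float) -> float:
    """Fold coverage = sequenced bases / genome size (assumes 1 Gb = 1,000 Mb)."""
    return total_gb * 1000.0 / genome_mb


# Sum the per-library totals; round away floating-point noise.
total_clean_gb = round(sum(CLEAN_GB.values()), 2)
depth = coverage(total_clean_gb, GENOME_SIZE_MB)
retained = total_clean_gb / RAW_GB  # fraction of raw data kept after filtering

print(f"clean data: {total_clean_gb:.2f} Gb")
print(f"coverage:   {depth:.1f}x")
print(f"retained:   {retained:.1%}")
```

The per-library figures sum to the reported 29.84 Gb of clean data, which corresponds to roughly 66× coverage of a 450 Mb genome; about 71% of the raw data survived filtering.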