Chinese Spring Bread Wheat Survey Sequence version 150616
---------------------------------------------------------

Assembly created by scaffolding and gapfilling the IWGSC Chinese Spring Survey sequences. Early release for use by IWGSC membership only; do not distribute.

This work was completed by David JF Konkin, Kevin Koh, Carling Clarke and Andrew Sharpe at the National Research Council of Canada as part of the Genomics Assisted Breeding Pillar of the Canadian Wheat Improvement Flagship. A manuscript describing this work is under development.

Library preparation
-------------------

Seventeen mate pair libraries (16 Illumina Nextera and 1 Lucigen NxSeq) were produced from nuclear DNA of a Chinese Spring + 7EL addition line and sequenced on Illumina HiSeq (2x150 bp) and MiSeq (2x250 bp) platforms. A total of 584 Gbp of raw sequence was acquired.

These sequences were filtered to remove PhiX spike-in and 7EL sequence using Bowtie v1.0, and duplicates using FastUniq and a custom script. Mate pair junctions were removed using a custom script; for the NxSeq library, mates without an identified junction sequence were also removed. Mate pairs that mapped to more than one chromosome arm were removed using Bowtie v1.0 and custom scripts.

Scaffolding and Gapfilling
--------------------------

Scaffolding was completed with SSPACE-STANDARD version 3.0, requiring a minimum of 3 connections to form a junction. Gapfilling was performed with GapCloser using the paired end sequences used for the initial IWGSC assembly.

Annotations
-----------

Gene models - 150616_CSS_MIPS_mapped_hc_transcripts.gff

MIPS high confidence gene models from each chromosome were mapped against each chromosome arm assembly using GMAP version 2013-08-31. No minimum coverage or identity was imposed. No matches were identified for 29 of the 293032 gene models. 98.7% of the gene models were mapped with at least 98% identity and 70% coverage.

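The identity and coverage summary for the gene models can be tallied roughly as follows. This is a minimal sketch, not the script used for this release: it assumes per-alignment identity and coverage values have already been extracted from the mapping output into tuples, and the function name, tuple layout and choice of denominator are all hypothetical.

```python
def fraction_passing(alignments, min_identity=98.0, min_coverage=70.0,
                     total_models=None):
    """Fraction of gene models with at least one alignment meeting both thresholds.

    alignments   -- iterable of (gene_id, identity_percent, coverage_percent)
                    tuples (assumed layout, not a GMAP output format)
    total_models -- denominator to use; if None, only gene models that have
                    at least one alignment are counted (an assumption)
    """
    passed = {}
    for gene_id, identity, coverage in alignments:
        ok = identity >= min_identity and coverage >= min_coverage
        # A gene model passes if any one of its alignments passes.
        passed[gene_id] = passed.get(gene_id, False) or ok

    denominator = total_models if total_models is not None else len(passed)
    if denominator == 0:
        return 0.0
    return sum(passed.values()) / denominator
```

Passing total_models includes gene models without any match in the denominator; leaving it unset reports the fraction among mapped models only.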
Positions of IWGSC v1 scaffolds - wheat_v1_to_v3

Start and end coordinates of the original IWGSC contigs were calculated based on scaffolding and gapfilling logs.

Masked and reduced genomic sequence - CSS_150616_masked.tar.gz

Scaffolds were masked with vmatch, as previously described, against mipsREdat_9.3_Poaceae_TEs.fasta. Scaffolds with less than 100 bp of unmasked sequence or greater than 98% repeat content were then removed.
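
The removal criteria for masked scaffolds can be illustrated with a short sketch. It is not the procedure used to build the archive: it assumes masked bases are written as N, treats every N (including gap Ns) as repeat, and computes the repeat percentage as masked bases over scaffold length. The function names are hypothetical.

```python
def keep_scaffold(seq, min_unmasked_bp=100, max_masked_fraction=0.98,
                  mask_char="N"):
    """Return True if a masked scaffold passes both removal criteria.

    Assumes masked bases are written as mask_char (case-insensitive);
    gap Ns are not distinguished from repeat-masked Ns in this sketch.
    """
    if not seq:
        return False
    masked = seq.upper().count(mask_char.upper())
    unmasked = len(seq) - masked
    if unmasked < min_unmasked_bp:
        return False  # less than 100 bp of unmasked sequence
    if masked / len(seq) > max_masked_fraction:
        return False  # greater than 98% repeat
    return True


def filter_scaffolds(records, **kwargs):
    """Keep (name, sequence) pairs whose sequence passes keep_scaffold."""
    return [(name, seq) for name, seq in records if keep_scaffold(seq, **kwargs)]
```

For example, filter_scaffolds([("s1", "A" * 100), ("s2", "N" * 500)]) keeps only s1.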