Contig reduction in genomic assembly of Pst isolates from Western Canada

We previously assembled the genomes of Pst isolates collected in western Canada from Illumina paired-end sequences. Two isolates, LSW3_2012_SP2 and SWS484_SPF, yielded ~15,150 and ~11,700 contigs when compared with the North American reference PST-78 and the Chinese reference CYR32, respectively. To reduce the number of contigs and thereby obtain longer stretches of contiguous genes, we applied PacBio and Illumina Mate Pair (MP) sequencing to these two isolates. We had to modify our existing protocol for DNA isolation from Pst spores to obtain DNA fragments of ~35 kb suitable for constructing large-insert genomic sequencing libraries. Libraries of 8-10 kb and 3.5-6 kb were used for the PacBio and Illumina MP analyses, respectively. The PacBio data gave 26x coverage of the Pst genome, with a mean size of 7,400 bp and 6,500 bp for the two libraries, and the Illumina MP data gave 190x coverage. We are using the Ray assembler to combine these datasets with the 50x Illumina paired-end data from the previously and independently assembled isolates LSW3_2012_SP2 and SWS484_SPF. The quality of our assembly will be compared with the contigs and supercontigs available for the reference isolates PST-78 and CYR32. These results will also enable us to establish the physical relationship among isolate-specific genes. Finally, we will discuss the impact of the large-insert libraries on the proportion of short paired-end reads that remain unassembled, which was 78% and 50% for LSW3_2012_SP2 and SWS484_SPF, respectively, after assembly of 100 bp paired-end reads.
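
As an illustration of how contig reduction might be quantified when comparing assemblies, the sketch below computes the contig count, total length, largest contig and N50 from a list of contig lengths. N50 is a common contiguity metric, but the abstract does not state which metrics were used, so the choice of N50 and the function names `n50` and `summarize` are assumptions made for this example only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    N50 is the length of the contig at which the running total, taken
    from the longest contig downwards, first reaches half of the total
    assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize(lengths):
    """Hypothetical helper: basic contiguity statistics for one assembly."""
    lengths = list(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

Running `summarize` on the contig lengths of the paired-end-only assembly and of the combined assembly would show whether the contig count fell and the N50 rose; the lengths themselves would have to be read from the assembler's output.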

Agriculture and Agri-Food Canada, Lethbridge Research Centre, Canada
