We used next-generation sequencing to obtain a genome-wide, unbiased view of C. elegans population structure. Through collaboration and generous donations, we obtained a set of 202 wild isolates from throughout the world. To reduce the 100 Mb genome to a manageable size amenable to multiplexing, we employed restriction-site associated DNA (RAD) marker sequencing (1), in which the genome of each strain was cut with EcoRI and sequenced in both directions from each restriction site. This method allowed us to sequence the same eight megabases from each strain in two runs of an Illumina Genome Analyzer. We sequenced each region to an average coverage of 12.6X, mapped the reads to the C. elegans genome using BWA, and identified SNPs using SAMtools. We will present the results of our analysis of population structure, linkage disequilibrium, and indications of genome-wide selection using the roughly 20,000 identified SNPs with minor allele frequencies greater than 5%. So far, we have found that the average pairwise difference between a strain and the reference N2 genome is roughly 1 in 900 base pairs. However, these pairwise differences span a wide range, and some strains are much more divergent from N2 than the Hawaiian strain CB4856 is. These data will allow us to pursue genome-wide association studies and new recombinant inbred line crosses with maximally diverse wild isolates.

(1) Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, et al. (2008) Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE 3(10): e3376.
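
As an illustration of the two summaries mentioned in the abstract (the minor allele frequency cutoff of 5% and the per-base pairwise difference between a strain and N2), the following is a minimal sketch, not the analysis pipeline actually used. It assumes one allele call per strain per site (reasonable for a largely selfing, homozygous species, but an assumption here), with `None` marking a missing call. The function names and the toy data are hypothetical.

```python
def minor_allele_frequency(calls):
    """Frequency of the second most common allele at one site.

    calls: one allele per strain (e.g. "A"), or None where no call was made.
    Returns 0.0 for sites that are monomorphic or have no calls.
    """
    observed = [c for c in calls if c is not None]
    counts = {}
    for allele in observed:
        counts[allele] = counts.get(allele, 0) + 1
    if len(counts) < 2:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    return ranked[1] / len(observed)


def passes_maf_filter(calls, threshold=0.05):
    """Keep a site only if its minor allele frequency exceeds the threshold."""
    return minor_allele_frequency(calls) > threshold


def pairwise_difference_rate(strain, reference):
    """Fraction of jointly called positions at which two sequences differ.

    Positions where either sequence has no call (None) are skipped.
    """
    compared = 0
    differing = 0
    for a, b in zip(strain, reference):
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else float("nan")


if __name__ == "__main__":
    # Toy data only: four strains at one site, and two short sequences.
    site = ["A", "A", "A", "G"]
    print(minor_allele_frequency(site), passes_maf_filter(site))
    print(pairwise_difference_rate(["A", None, "G", "T"], ["A", "C", "G", "A"]))
```

With real data, the denominator of `pairwise_difference_rate` would be the number of sequenced bases called in both the strain and N2, so a value near 1/900 would correspond to the average reported above.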