To better characterize the C. elegans genome, to identify possible cis-regulatory elements, and to define genomic traits associated with hermaphroditism, the Caenorhabditis Genome Analysis Consortium (http://wormgenomes.caltech.edu/consortium.html) is analyzing and comparing the genomes of seven gonochoristic (dioecious) relatives of C. elegans (C. remanei, C. brenneri, C. japonica, C. sp. 5 JU800, C. sp. 7 JU1286, C. sp. 9 JU1422, and C. angaria PS1010) with three hermaphroditic genomes (the published C. elegans and C. briggsae, and the newly sequenced C. sp. 11 JU1373). Gene predictions for newly sequenced or resequenced genomes are being generated with AUGUSTUS, guided by RNA-seq data; genome sequences and annotations will be available through WormBase. Most of the genomic sequence is nonrepetitive, with only 11-20% predicted to comprise repetitive elements; these elements include likely vestiges of horizontal DNA transfer to individual species. In nonrepetitive genomic DNA, we find a core set of 8.8K orthologous multi-exon protein-coding genes conserved in all species examined, with 5.5K genes variably lost in different individual taxa. The evolutionary tree for 245 syntenic strict orthologs of protein-coding genes supports the previous consensus phylogeny, though it indicates rapid radiation of some taxa. Both intron length and codon bias vary substantially between orthologous protein-coding genes in different Caenorhabditis species. There are three prevalent core promoter motifs (TATA, SL1, and Sp1), along with extensive conservation of gene-specific cis-regulatory motifs such as AIY and DAF-12/M-2. Most sex-determination genes are strictly conserved, with sdc-2, dpy-21, and fog-3 showing increased change in hermaphroditic species. We are currently defining which regions of the genome assemblies are most likely to represent minor alleles from unresolved heterozygosity in outbreeding species; once these regions are defined, we expect to be able to resolve which features of hermaphroditic genomes (ranging in size from 90 to 104 Mb) are being consistently lost relative to gonochoristic genomes (ranging in size from 130 to 190 Mb).