The amino acid (aa) sequences of the polypeptides encoded by five collagen genes of the nematode Caenorhabditis elegans, col-6, col-7 (partial), col-8, col-14, and col-19, were determined. These collagen polypeptides, as well as those encoded by the previously sequenced C. elegans collagen genes col-1 and col-2, share a common organization into five domains: an amino-terminal leader, a short (30-33 aa) (Gly-X-Y)n domain, a non(Gly-X-Y) spacer, a long (127-132 aa) (Gly-X-Y)n domain, and a short carboxyl-terminal domain. The domain organizations and intron positions of these polypeptides were compared with those of the polypeptides encoded by Drosophila and Strongylocentrotus type IV, and vertebrate types I, II, III, IV, and IX collagen genes; the C. elegans collagen polypeptides are most similar to the vertebrate type IX collagens. It is suggested that the collagen gene family comprises two divergent subfamilies, one of which includes the vertebrate interstitial collagen genes, and the other of which includes the invertebrate collagen genes and the vertebrate type IV and type IX collagen genes. Only the vertebrate interstitial collagen genes display clear evidence of evolution via the tandem duplication of a 54-bp exon.