We determined the DNA sequences of several collagen genes with different expression patterns during development and compared them to the previously sequenced genes
col-1 and
col-2. The genes chosen for study were
col-1 and
col-14, which are expressed at varying levels throughout development,
col-2 and
col-6, which are dauer-specific, and
col-7, col-8 and col-19, which are expressed primarily in animals molting into adults. Each gene is 1.0 to 1.2 kb in length and includes one or two short introns at variable positions. The presumptive promoter regions contain the expected eukaryotic TATA and CAAT sequences. The sequence TAT CTTTCTCTY TTCTTYCT (Y=C or T) is present 30 bp and 74 bp upstream of the CAAT box in
col-2 and
col-6, respectively. The sequence AAATTT YAYCAATRT TTATT AATT is present 203 and 183 bp upstream of the presumptive CAAT boxes in
col-7 and
col-19, respectively (R=A or G; the relevant region of
col-8 was not sequenced). The correlation between the presence of these sequences and the similar expression profiles of the relevant genes suggests that these sequences may be involved in the developmental regulation of the genes. The domain structure of the collagen polypeptides is similar to that determined for
col-1 and
col-2: each polypeptide contains two main triple-helix-forming (Gly-X-Y)n domains, one of 30-33 amino acids and the other of 127-132 amino acids. The latter domain is interrupted by short (2-8 amino acids) non-Gly-X-Y segments in each polypeptide. Sets of cysteine residues flank the (Gly-X-Y)n domains in all of the polypeptides. The genes can be placed into three families based upon structural features (overall protein length and organization of (Gly-X-Y)n domains), positions of cysteine residues and amino acid sequence homologies. The amino acid sequence homologies are most evident in the non-Gly-X-Y domains. As an example, the C-terminal tail sequences are shown below. Col-1 and
col-2 comprise one family,
col-6 and col-14 comprise a second family, and
col-8 and
col-19, with the less homologous
col-7, comprise the third family. Members of a family can be coordinately regulated as in the case of
col-8 and col-19, or have different expression patterns as in the cases of
col-1 and
col-2 or
col-6 and
col-14. The codon usage in all of the genes is highly asymmetrical, with adenine appearing in the third position of 85% of the Gly codons and 93% of the Pro codons.
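
The third-position bias described above can be tallied directly from a coding sequence. The following is a minimal sketch, assuming an in-frame coding sequence with introns already removed; the function name `third_base_usage` and the toy sequence are hypothetical illustrations and are not taken from the col genes. It relies only on the standard genetic code, in which glycine is encoded by GGN and proline by CCN.

```python
from collections import Counter

# Standard genetic code: glycine codons are GGN, proline codons are CCN.
CODON_FAMILIES = {"Gly": "GG", "Pro": "CC"}


def third_base_usage(cds, prefix):
    """Fraction of each third-position base among codons that start with `prefix`.

    `cds` is assumed to be an in-frame coding sequence (introns removed).
    Trailing bases that do not fill a whole codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codon[2] for codon in codons if codon.startswith(prefix))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


# Made-up toy reading frame for illustration only (not a real col gene).
toy_cds = "GGACCAGGACCAGGTCCAGGACCC"

for amino_acid, prefix in CODON_FAMILIES.items():
    print(amino_acid, third_base_usage(toy_cds, prefix))
```

Applied to each gene's coding sequence, the fraction reported for A in the Gly and Pro families corresponds to the kind of percentage quoted above; the toy sequence is only there to exercise the function.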