A prototype automated DNA sequence analysis system, gm2, is available to laboratories interested in serving as external test sites.
gm2 consists of a set of pattern recognition and statistical analysis modules, together with a geometric modeling system. It accepts as input a DNA sequence, consensus matrices for locating splice sites, translational start sites, and polyadenylation sites, match-quality cutoff values for consensus searches, and base frequency and codon usage standards for coding regions and introns. It produces as output schematic models of the possible genes contained in the sequence that show the locations of the coding sequences, introns, and control signals; it also produces translations of each of the gene models into amino acid sequences. The current version of gm2 generates all possible models of the gene content of a sequence that are consistent with the input parameters. It is capable of analyzing sequences containing partial genes or multiple genes as well as sequences containing a single complete gene.
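To make the idea of a consensus search with a match-quality cutoff concrete, here is a minimal sketch in C. It is not gm2's code: the function names (match_quality, find_sites), the fixed site width SITE_LEN, and the scoring rule (sum of matrix weights for the observed bases, divided by the best score the matrix allows) are all assumptions chosen for illustration. The announcement does not say how gm2 computes match quality.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SITE_LEN 4  /* hypothetical site width; real consensus matrices vary */

/* Map a base to a matrix row (A, C, G, T); return -1 for anything else. */
static int base_index(char b)
{
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/*
 * Assumed scoring rule: sum the matrix weight of each observed base and
 * divide by the sum of the per-column maxima, giving a value in [0, 1].
 * The caller must supply at least SITE_LEN characters in `window`.
 */
double match_quality(const char *window, const double (*matrix)[SITE_LEN])
{
    double score = 0.0, best = 0.0;

    for (size_t col = 0; col < SITE_LEN; col++) {
        double col_max = matrix[0][col];
        for (int row = 1; row < 4; row++)
            if (matrix[row][col] > col_max)
                col_max = matrix[row][col];
        best += col_max;

        int row = base_index(window[col]);
        if (row >= 0)               /* unknown bases contribute nothing */
            score += matrix[row][col];
    }
    return best > 0.0 ? score / best : 0.0;
}

/*
 * Slide the matrix along `seq` and record every start position whose
 * match quality is at least `cutoff`. Stops after `max_hits` positions.
 * Returns the number of positions written to `hits`.
 */
size_t find_sites(const char *seq, const double (*matrix)[SITE_LEN],
                  double cutoff, size_t *hits, size_t max_hits)
{
    size_t len = strlen(seq), found = 0;

    assert(max_hits == 0 || hits != NULL);
    if (len < SITE_LEN)
        return 0;

    for (size_t pos = 0; pos + SITE_LEN <= len && found < max_hits; pos++)
        if (match_quality(seq + pos, matrix) >= cutoff)
            hits[found++] = pos;
    return found;
}
```

A caller would declare a 4 x SITE_LEN table of weights as a `const double` array, with rows in the order A, C, G, T, and pass it to find_sites together with the sequence and a cutoff. Raising the cutoff returns fewer candidate sites, which is one way the search parameters could affect both the number of gene models and the run time.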
gm2 is implemented entirely in C. It employs a simple, prompt-driven user interface; it can also accept input from a file. It prints output to ASCII files. gm2 can be run from a conventional text-only terminal. The system has been tested on Sun 3 and Sun 4 workstations. It will run on a Sun 3/50 with 4 MB of memory; a larger memory improves performance. The system has been tested on a number of C. elegans sequences in the 10 kb size range, and on composite sequences of up to 40 kb. Complete and correct models of multi-exon genes, e.g. myo-2 and unc-15, can be generated on a Sun 3 in run times ranging from less than 1 minute to roughly 30 minutes, depending on the search parameters used. Runs on the Sun 4 are approximately four times faster.
gm2 is available to university or other nonprofit laboratories, under the condition that they do not redistribute the software. Users of the software will be asked to supply us with results of running gm2, descriptions of problems that are encountered, and suggestions for improvements. Source code and documentation for gm2, and for ...