As of March 2001, 16,175 yk clones have been sent to researchers around the world in answer to 3,463 requests. The main progress since the last worm meeting has been the isolation of full-length, stage-specific cDNA libraries selected by the cap. All 83,000 cDNA clones, together with 9,000 sequences collected from other sources including public databases, were co-aligned using the Acembly program on the 99.4 Mb of genome sequence (many thanks to the Consortium). We confirm the very high quality of the genome sequence and estimate, from the proportion of unaligned clones, that only 0.7 to 1.2 Mb of genome is now missing. The base calls of the cDNAs were edited, and cloning artifacts, such as mosaic clones, internal deletions or inversions, internal oligo(dT) priming, or unspliced RNA/DNA, were flagged and treated with appropriate care: these amounted to 1% of the clones from the first yk libraries, 5% of the capped clones and 2% of the clones in GenBank. As of today, we have cDNAs in Acembly/AceView for just over half of the C. elegans genes: 10,126 genes produce 14,154 transcripts through alternative splicing or alternative polyadenylation. We have started to annotate the proteins and to submit the results to the public databases. We will give priority to the genes you ask for. We use names such as 1K18 (mec-8) that encode the chromosome, megabase (as a letter), kilobase number and strandedness (even/odd), and that are easy to distinguish from the predictions. Indeed, in contrast to the outstanding quality of the genome sequence, the Wormpep de novo predictions are usually incorrect: in a sample of 111 newly hit genes, only 33% were predicted exactly; 7% were not predicted at all, the first and last exons were incorrect in 35% and 28% of cases respectively (in particular, 9% of genes touch two predictions), and finally, in the common region, 33% of the genes had at least one internal exon incorrectly predicted. Gene prediction remains a difficult problem. The new capped libraries are a huge step forward for the transcriptome project. More than 80% of the 12,000 capped clones actually contain the entire mRNA, from 5' cap to poly(A). Trans-spliced leaders are present in up to 65% of the genes; the remaining 35% of genes are not trans-spliced (for example, collagens or surface proteins). We identify 12 types of trans-spliced leaders, possibly encoded by 30 SL genes clustered in 15 loci: each SL gene is followed in the genome by a strong donor site. Among the trans-spliced genes, 76% are trans-spliced to SL1 exclusively; the remaining 24% are trans-spliced to SL2/SL12. We confirm the hypothesis of Tom Blumenthal et al. relating minor SLs to operons: the 700 genes with SL2/SL12 are most often located less than 300 bp downstream of an expressed gene in cis. The proportion of pure SL1 genes with such a close neighbor is 20 times lower, and, for genes that are not trans-spliced, 10 times lower. The specific trans-spliced leader used appears to be stage dependent: in particular, usage of the minor leaders increases from 5% in embryos to more than 25% in adults. In a compact genome such as that of C. elegans, close genes tend to be cotranscribed. Such physical constraints on the coexpression of close genes during development could be tuned by controlled availability of minor leaders.
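The gene naming scheme described in this abstract (chromosome, megabase letter, kilobase number, even/odd strandedness) could be split into its parts roughly as follows. This is only a sketch under stated assumptions: the abstract gives a single example, 1K18, so the chromosome alphabet (1 to 5 and X), the mapping of the letter A to the first megabase, and the names `GeneName` and `parse_gene_name` are all hypothetical. The abstract does not say which parity corresponds to which strand, so the sketch only reports the parity.

```python
import re
from dataclasses import dataclass

# Assumed pattern: one chromosome symbol, one capital letter, then digits.
NAME_RE = re.compile(r"^([1-5X])([A-Z])(\d+)$")


@dataclass(frozen=True)
class GeneName:
    chromosome: str       # e.g. "1"
    megabase_letter: str  # e.g. "K"
    megabase_index: int   # assumption: A = 0, B = 1, ...
    number: int           # the kilobase number from the name
    parity: str           # "even" or "odd"; strand meaning not given in the text


def parse_gene_name(name: str) -> GeneName:
    """Split a name such as '1K18' into its components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised gene name: {name!r}")
    chromosome, letter, digits = match.groups()
    number = int(digits)
    return GeneName(
        chromosome=chromosome,
        megabase_letter=letter,
        megabase_index=ord(letter) - ord("A"),
        number=number,
        parity="even" if number % 2 == 0 else "odd",
    )
```

For example, `parse_gene_name("1K18")` yields chromosome `"1"`, megabase letter `"K"`, number `18` and parity `"even"`.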
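The operon observation above (SL2/SL12 genes lying less than 300 bp downstream of an expressed gene in cis) amounts to a nearest-upstream-neighbor test on the same strand. The sketch below illustrates that test only; it is not the procedure used with Acembly. The `Gene` record, its 1-based inclusive coordinates, the function names and the definition of the gap as the number of bases strictly between the two genes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive, start <= end (assumed convention)
    end: int


def upstream_gap(gene: Gene, genes: List[Gene]) -> Optional[int]:
    """Bases between `gene` and its nearest upstream neighbor in cis, or None."""
    gaps = []
    for other in genes:
        if other is gene or other.chrom != gene.chrom or other.strand != gene.strand:
            continue
        if gene.strand == "+" and other.end < gene.start:
            # On the plus strand, upstream means lower coordinates.
            gaps.append(gene.start - other.end - 1)
        elif gene.strand == "-" and other.start > gene.end:
            # On the minus strand, upstream means higher coordinates.
            gaps.append(other.start - gene.end - 1)
    return min(gaps) if gaps else None


def has_close_upstream_neighbor(gene: Gene, genes: List[Gene], max_gap: int = 300) -> bool:
    """True if a gene on the same strand ends less than `max_gap` bp upstream."""
    gap = upstream_gap(gene, genes)
    return gap is not None and gap < max_gap
```

Applying `has_close_upstream_neighbor` separately to SL2/SL12 genes, pure SL1 genes and genes that are not trans-spliced would give the three proportions that the text compares.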