There has been a disturbing tendency in recent times to have gene names that are overly descriptive. We do not have an endless supply of three-letter tags (17,576 to be precise), whereas numbers stretch out into infinity. For example, there are a number of single-member gene names (anc-1, caf-1, ...), which is just plain wasteful. Impressive lists of dafs, dpys, lins, and uncs were not built up by trivially worrying about information content. Where would we be today if we distinguished between daf-constitutives and daf-defectives, muscle uncs and neural uncs, Muvs and Vuls, plain dpys and dosage-compensation dpys? I propose we start some rationalization. Firstly, all the xols, sdcs, dosage compensation dpys, hers, tras, sems, sogs, oocs, sers and spes can become sex. Come to think of it, we can probably throw in ehas, glps, mabs, msps, pals, plgs, sels, sems and vits. This would give a nice
sex-1 through
sex-100 or thereabouts, with a gene-name compaction ratio (g.c.r.) of 19. Then we have a lot of enzymes, cytoskeletal components and suchlike. These can be conveniently recalled by the new designator cel. This takes care of ace, act, ama, ben, cad, cal, cha, deb, ges, gus, hch, kin, mlc, myo, nuc, phm, rpl, rpo, rrn, rrs, the rtxs, sup and sus, to produce
cel-1 to
cel-63 (gcr=22), by my reckoning. Notice that this set was particularly guilty of a low average former gene multiplicity (f.g.m.) of about 3 versus about 5 for the sex set. Next to go are all the drug resistance names (cas, kra, lan, lev, and tpa), now
drg-1 to
drg-11 (gcr=5, fgm=2). With a clear conscience I think behavioral names can be fused with unc, to create the splendid exi for excitable cells. This means aex, bor, che, das, deg, eat, egl, enu, exp, mec, osm, pbo, sns, tax, ttx and unc become
exi-1 to
exi-256 (gcr=16, fgm=16). Now some might argue that this f.g.m. is a bit high, and therefore the compaction not warranted. But in light of the high g.c.r., and the fact that it is far more traditional to have gene names of lower information content, I think we should stick with it. Moving quickly now, ali, bli, dpy, lon, mor, rol, sma and sqt become
shp-1 to
shp-56 (for shape, gcr=8, fgm=7); anc, ced, ces, clr, dig, mig and ncl become
nom-1 to
nom-24 (for Nomarski, gcr=7, fgm=3); cat, flu, pup and srf become
stn-1 to
stn-15 (for staining, gcr=4, fgm=4). Genes defined by cloning become
dna-1 to
dna-26 (from clb, col, hsp and uvt; gcr=4, fgm=6.5); things affecting development become
dev-1 to
dev-68 (from gro, ham, lin, mab and vab; gcr=5, fgm=13); things killing worms are now
kil-1 to
kil-359 (from age, emb, let, mel, par and zyg; gcr=6, fgm=60, which attests to the excellent example set by let). This leaves just him, mei, mut, rad and rec, which become
chr-1 to
chr-30 (gcr=5, fgm=6). The astute reader will note a number of arbitrary assignments, such as ama into cel rather than drg, and pal into sex rather than dev. The astute reader is quite right. After all, do not drugs affect things in cells? Do not things that affect sexual development merely form a subset of all those things which affect development? Thus, in the interests of consistency, I'm forced to propose that drg and stn be absorbed into cel; and sex, shp, nom and kil be absorbed into dev. This yields
cel-1 to
cel-89,
dev-1 to
dev-544,
exi-1 to
exi-256,
dna-1 to
dna-26 and
chr-1 to
chr-30. These last two seem woefully underrepresented, so they may as well be joined with cel to produce
cel-90 to
cel-145. It is sadly true that in reality, all things that affect development or excitable cells are really only affecting cells, and are all components of cells. Hence I think it is inevitable that exi and dev fuse into cel to produce the final
cel-1 to
cel-945. Note also that one can construe 'cel' to mean C. elegans as well, useful in case any restriction enzymes are discovered in C. elegans tissues. I think this is a rather flexible final arrangement, one which can encompass most of the likely future mutant descriptions, and of course one which can tolerate considerable expansion.
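
For the reader who wishes to audit the bookkeeping, here is a minimal sketch in Python. It assumes, since the text never quite spells it out, that the g.c.r. is the number of old three-letter names folded into one new name, and that the f.g.m. is the number of genes in the new series divided by the number of old names. The function and variable names are hypothetical, and only a few of the proposed series are included, with prefix lists and gene counts copied from the text above.

```python
from string import ascii_lowercase

# Supply of three-letter tags over a 26-letter alphabet: 26 ** 3 = 17576.
N_TAGS = len(ascii_lowercase) ** 3

# new name -> (old prefixes merged into it, number of genes in the new series)
# Figures are taken from the proposal above; this is not a complete list.
PROPOSALS = {
    "drg": (["cas", "kra", "lan", "lev", "tpa"], 11),
    "exi": (["aex", "bor", "che", "das", "deg", "eat", "egl", "enu",
             "exp", "mec", "osm", "pbo", "sns", "tax", "ttx", "unc"], 256),
    "shp": (["ali", "bli", "dpy", "lon", "mor", "rol", "sma", "sqt"], 56),
    "dna": (["clb", "col", "hsp", "uvt"], 26),
    "chr": (["him", "mei", "mut", "rad", "rec"], 30),
}


def gcr(old_prefixes):
    """Gene-name compaction ratio (assumed): old names per new name."""
    return len(old_prefixes)


def fgm(old_prefixes, n_genes):
    """Former gene multiplicity (assumed): genes per old name, on average."""
    return n_genes / len(old_prefixes)


if __name__ == "__main__":
    print(f"three-letter tags available: {N_TAGS}")
    for name, (old, n_genes) in PROPOSALS.items():
        print(f"{name}-1 to {name}-{n_genes}: "
              f"gcr={gcr(old)}, fgm={fgm(old, n_genes):.1f}")
```

Under these assumed definitions the exi series comes out at gcr=16 and fgm=16, and the dna series at gcr=4 and fgm=6.5, in agreement with the figures quoted above; the rounder values in the text (for example fgm=2 for drg) appear to be approximations.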