Current attempts to infer regulatory networks from genome-wide expression data require motif detection algorithms to identify transcription factor (TF) binding sites in the regulatory regions of co-expressed genes. These algorithms typically use either over-representation or phylogenetic conservation to predict TF binding sites. While these approaches have been successful in yeast [1,2,5], motif detection in larger genomes is significantly more challenging and will likely require comparative genomics. Several recent algorithms have begun to combine these two sources of information to predict functional sites more accurately [1,2]. However, many rely on an initial global alignment of orthologous regulatory regions as a preprocessing step. Unfortunately, TFs generally recognize sites that are short and degenerate, and global alignment may be overly restrictive for the detection of such sites. We have developed a non-alignment-based Gibbs sampling [3,4] algorithm that searches for over-represented sequence elements that are conserved across orthologous regulatory regions in related species. Motif composition is represented probabilistically by a position weight matrix, and a posterior probability score is used to assess the significance of the motif found. The algorithm employs a stochastic search to converge to optima and, unlike alignment-based techniques, does not require a prior estimate of the evolutionary relationship between the species sampled. We compare the performance of our algorithm on synthetic data to that of existing algorithms that employ over-representation, phylogenetic conservation, or both. We further validate our motif predictions on in vivo binding data in yeast [5]. We present novel motif predictions from sets of co-expressed genes in C. elegans and the related sequenced nematode species C. briggsae and C. remanei.

1. Siddharthan R, Siggia ED, van Nimwegen E. PLOS Comp. Biol. 1, e67 (2005).
2. Sinha S, Blanchette M, Tompa M. BMC Bioinf. 5, 170 (2004).
3. Roth FP, Hughes JD, Estep PW, Church GM. Nat. Biotech. 16, 939-945 (1998).
4. Neuwald AF, Liu JS, Lawrence CE. Prot. Sci. 4, 1618-1632 (1995).
5. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, et al. Nature 431, 99-104 (2004).
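
As an illustration of the position weight matrix representation mentioned in the abstract, the following is a minimal sketch of how a matrix might be estimated from a set of aligned candidate sites and used to score sequence windows. It is not the algorithm described here: the function names (`build_pwm`, `log_odds`, `best_site`), the pseudocount of 1, the uniform background of 0.25, and the example sites are all assumptions made for illustration, and the Gibbs sampling and cross-species steps are omitted entirely.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Estimate per-column base probabilities from equal-length sites.

    The pseudocount (an assumed value) keeps unseen bases from getting
    probability zero.
    """
    width = len(sites[0])
    pwm = []
    for j in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm


def log_odds(pwm, window, background=0.25):
    """Log2 odds of a window under the matrix versus a uniform background."""
    return sum(math.log2(col[base] / background)
               for col, base in zip(pwm, window))


def best_site(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [(log_odds(pwm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    score, start = max(scored)
    return start, score


# Hypothetical example sites, not taken from any dataset in the abstract.
example_sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
example_pwm = build_pwm(example_sites)
example_start, example_score = best_site(example_pwm, "AAAATGACTCAAAA")
```

In this toy example the best-scoring window starts at index 4, where the planted site `TGACTC` sits. A sampler of the kind described would repeatedly re-estimate such a matrix from currently selected sites and resample site positions, which this sketch does not attempt.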