Brain as an Information Machine

Scientific Understanding of Consciousness
Consciousness as an Emergent Property of Thalamocortical Activity

Science, 24 July 2009, Vol. 325, No. 5939, pp. 429-432

Transcriptional Regulatory Circuits: Predicting Numbers from Alphabets

Harold D. Kim,¹ Tal Shay,² Erin K. O’Shea,¹ Aviv Regev²

¹ Howard Hughes Medical Institute, Harvard University Faculty of Arts and Sciences Center for Systems Biology, Departments of Molecular and Cellular Biology and Chemistry and Chemical Biology, Cambridge, MA 02138, USA.
² Department of Biology, Massachusetts Institute of Technology, and Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.

(paraphrase)

Transcriptional regulatory circuits govern how cis and trans factors transform signals into messenger RNA (mRNA) expression levels. With advances in quantitative and high-throughput technologies that allow measurement of gene expression state in different conditions, data that can be used to build and test models of transcriptional regulation is being generated at a rapid pace. Here, we review experimental and computational methods used to derive detailed quantitative circuit models on a small scale and cruder, genome-wide models on a large scale. We discuss the potential of combining small- and large-scale approaches to understand the working and wiring of transcriptional regulatory circuits.

Genome-wide sequencing and profiling efforts have spurred the development of numerous approaches to reconstruct cis regulatory functions that explain observed expression levels by the type, number, and organization of cis regulatory elements in promoters. Linear, Bayesian, and thermodynamic approaches have been used, and each reflects a different set of assumptions about the biochemical basis of transcriptional regulation. Linear models assume that the expression output is a linear combination of cis-element inputs. Probabilistic Bayesian approaches can handle the noisy nature of large-scale data and are able to capture the combinatorial logic and organization of promoters by modeling combinations of motifs, as well as their relative distance and orientation. Some studies cast a linear model in a probabilistic setting, combining the benefits of both approaches. More realistic thermodynamic models have recently emerged. For example, expression patterns in Drosophila segmentation were predicted by calculating the probabilities of all possible configurations of trans factors on the cis regulatory sequence and summing their contributions to expression. Sequence preferences for nucleosomes and transcription factors were used to predict expression in yeast. However, thermodynamic models have not yet been applied on a genomic scale to test their general power. Genome-wide inference of cis regulatory functions critically depends on accurate and comprehensive detection of cis elements in DNA sequences; this is a notoriously difficult problem, especially in higher organisms.
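The difference between a linear and a thermodynamic cis regulatory function can be made concrete with a toy example. The Python sketch below is illustrative only. The function names, the one-factor-per-site rule, independent (non-cooperative) binding, and the multiplicative "fold" effect of each bound factor are all assumptions made for this example. It is not the formulation of any particular study mentioned in the review.

The thermodynamic function works in three steps:
1. It enumerates every configuration of factors on the sites.
2. It weights each configuration by the product of concentration times affinity over its bound factors.
3. It returns the weighted average of the expression produced by each configuration.

```python
import itertools


def linear_expression(motif_counts, weights, baseline=0.0):
    """Linear model: expression is a weighted sum of cis-element counts."""
    return baseline + sum(weights[m] * n for m, n in motif_counts.items())


def thermodynamic_expression(sites, conc, affinity, fold, basal=1.0):
    """Thermodynamic model (toy version): average expression over all
    configurations of trans factors on the cis regulatory sites.

    sites    -- list; entry i is a tuple of TF names that can bind site i
    conc     -- TF name -> concentration (arbitrary units)
    affinity -- TF name -> association constant (inverse of conc units)
    fold     -- TF name -> multiplicative effect on expression when bound
    Assumes at most one TF per site and independent, non-cooperative binding.
    """
    # Each site is either empty (None) or occupied by one of its TFs.
    options = [(None,) + tuple(tfs) for tfs in sites]
    total_weight = 0.0
    weighted_expr = 0.0
    for config in itertools.product(*options):
        weight = 1.0  # statistical weight of this configuration
        expr = basal  # expression level produced by this configuration
        for tf in config:
            if tf is not None:
                weight *= conc[tf] * affinity[tf]
                expr *= fold[tf]
        total_weight += weight
        weighted_expr += weight * expr
    # Probability of a configuration is weight / total_weight.
    return weighted_expr / total_weight


if __name__ == "__main__":
    # Hypothetical promoter: site 0 binds activator A; site 1 binds A or repressor R.
    sites = [("A",), ("A", "R")]
    conc = {"A": 1.0, "R": 2.0}
    affinity = {"A": 1.0, "R": 1.0}
    fold = {"A": 3.0, "R": 0.2}
    print(thermodynamic_expression(sites, conc, affinity, fold))  # about 2.2
    print(linear_expression({"A": 2, "R": 1}, {"A": 1.0, "R": -0.5}, baseline=1.0))
```

In this toy model a bound factor multiplies expression rather than adding to it. As a result, two activator sites raise expression by more than twice the effect of one site. The linear model, by contrast, adds a fixed weight per site, so it cannot represent this kind of combinatorial effect.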

Estimating the success of the models in predicting gene expression is a challenging task. Ideally, large-scale models should be trained on one data set and then tested for their ability to generalize to unseen data. However, most data sets are of limited scale, and systematic experimental follow-up is lacking. Indeed, many studies do not report such objective success rates, whereas others use various measures to assess the quality of their predictions. For example, models that predict module assignment for each gene typically report the percentage of correct assignments of coregulated genes or of genes that share the same function, with success rates ranging from ~30% to 73% in studies in yeast. Other works report the likelihood of the data given the model, but this measure is hard to compare between models of different complexity. The most quantifiable success level to date is reported for models that predict the actual expression of each gene; they typically report the percentage of the variance in gene expression that is accounted for by the model. Both Bayesian and linear models that predict gene expression from cis regulatory sequences have reported high rates of success in yeast [e.g., 51% and 52 to 72%]. However, the success of similar approaches in mammalian cells has been much more modest [e.g., 6% to 24%]. It is important to consider how much of the variance in expression we can expect to explain with a particular model. A recent study with synthetic promoter libraries in yeast estimated that cis regulation can explain at most 65% of the variance in expression and that a thermodynamic small-scale model explains 44 to 59%. Similar work in the urochordate Ciona explains 30 to 89% of the variance at the cis level. Establishing standard approaches and data sets on which we can compare the performance of different models is an important goal. The DREAM project aims to achieve such a fair comparison by posting challenges for the community.
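The "percentage of the variance accounted for" and the train-then-test protocol described above can be sketched in a few lines of Python. This sketch rests on two assumptions:
- The score is the coefficient of determination, 1 - SS_residual / SS_total, computed on held-out genes.
- The one-feature least-squares fit and the function names are hypothetical stand-ins for whatever model a given study actually uses.

```python
def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def variance_explained(observed, predicted):
    """Fraction of the variance in observed expression accounted for by the
    predictions: 1 - SS_residual / SS_total. Can be negative on held-out data
    if the model does worse than predicting the mean."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def held_out_score(train_x, train_y, test_x, test_y):
    """Fit on one set of genes, then score generalization on unseen genes."""
    slope, intercept = fit_line(train_x, train_y)
    predictions = [slope * a + intercept for a in test_x]
    return variance_explained(test_y, predictions)
```

Predicting the observed mean for every gene scores 0, and a perfect prediction scores 1. Because the score is computed only on genes the model never saw, it measures generalization rather than fit to the training set.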

Cis input alone cannot predict expression, because dynamic expression patterns change with environmental conditions, cell type, and cell cycle stage. These changes are accompanied by corresponding changes in trans factors.

The level, activity, and wiring of trans factors can be inferred from mRNA levels because many trans regulators are embedded within transcriptional feedback regulatory loops.

In one approach, temporal expression data were collected to characterize processes such as cell differentiation and responses to environmental stimuli in mammalian systems. These data showed that the transcriptional program was propagated by sequential waves of transcription controlled by different transcription factors (TFs). However, inferring regulatory activity from output levels limits the model's ability to distinguish between causality and correlation.

The most comprehensive and successful model in animals thus far has been reconstructed in the sea urchin. It includes a complete model of the gene regulatory network that specifies the skeletogenic micromere lineage. In this model organism, systematic manipulation and measurement of trans factors and of cis elements in promoters were combined with measurements of mRNA output to devise a detailed, validated model of gene regulation.

Despite advances from small-scale and large-scale analysis of gene regulation, most studies do not yet bridge the gap between these approaches. Small-scale approaches can generate fine, realistic details and extensive validation, but are limited to a few genes (often one). Large-scale approaches, by contrast, examine many genes, but often rely on regulation functions that are biologically unrealistic (e.g., Boolean logic or linearity) and lack validation. Findings from large-scale studies can be followed up in detail with small-scale approaches. For instance, the behavior of a yeast regulon observed in mRNA profiles was explained by small-scale studies on a single-input circuit controlling a representative target gene. Bayesian network approaches, originally developed to study mRNA profiles, have been successfully applied to single-cell measurements of signaling pathways.
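The gap between a Boolean regulation function and a graded one can be illustrated with a two-input AND gate. In the Python sketch below, the graded version multiplies two Hill functions. That product is one common modeling choice, assumed here for illustration. The threshold, half-maximal level k, and Hill coefficient n are arbitrary parameters, not values from the review.

```python
def hill(x, k, n):
    """Graded activation: fraction of maximal output at activator level x.
    k is the half-maximal level and n the Hill coefficient (steepness)."""
    return x**n / (k**n + x**n)


def boolean_and(a, b, threshold):
    """Boolean regulation function: fully on only if both inputs exceed threshold."""
    return 1.0 if (a > threshold and b > threshold) else 0.0


def graded_and(a, b, k, n):
    """Graded AND-like function: product of two Hill terms (an assumed form)."""
    return hill(a, k, n) * hill(b, k, n)


if __name__ == "__main__":
    for a, b in [(0.4, 2.0), (0.6, 0.6), (1.0, 1.0), (3.0, 3.0)]:
        print(a, b, boolean_and(a, b, threshold=0.5),
              round(graded_and(a, b, k=0.5, n=2), 3))
```

Consider inputs just above threshold, (0.6, 0.6). The Boolean function is fully on, while the graded function gives only about a third of maximal output. Mismatches of this kind are one way Boolean abstractions can misrepresent real regulation functions.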

A useful stepping stone will be to construct well-defined subnetworks ("toy models") and then to study their regulation and dynamics. The fundamental challenge for all approaches may be building models that truly generalize to novel states. How successful we can be may depend on our understanding of the underlying features of nonlinearity and combinatorial effects in gene-regulation functions.
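Studying a toy subnetwork's dynamics can be as simple as integrating a pair of rate equations. The Python sketch below simulates a hypothetical two-gene negative-feedback loop in which factor X activates target Y and Y represses X. Several elements are illustrative assumptions, not a model from the review:
- the network itself
- the Hill-function forms
- the parameter values
- the use of forward-Euler integration

```python
def rates(x, y, beta=1.0, alpha=1.0, k=0.5, n=2):
    """Right-hand side of a hypothetical two-gene loop.
    x: level of transcription factor X, repressed by Y.
    y: level of target Y, activated by X.
    beta: maximal synthesis rate; alpha: degradation/dilution rate;
    k: half-maximal regulator level; n: Hill coefficient."""
    dx = beta / (1.0 + (y / k) ** n) - alpha * x
    dy = beta * x**n / (k**n + x**n) - alpha * y
    return dx, dy


def simulate(x0=0.0, y0=0.0, dt=0.01, t_end=50.0, **params):
    """Forward-Euler integration; returns a list of (t, x, y) samples."""
    x, y = x0, y0
    trajectory = [(0.0, x, y)]
    for step in range(1, int(round(t_end / dt)) + 1):
        dx, dy = rates(x, y, **params)
        x += dt * dx
        y += dt * dy
        trajectory.append((step * dt, x, y))
    return trajectory


if __name__ == "__main__":
    traj = simulate()
    for t, x, y in traj[::1000]:
        print(f"t={t:5.1f}  X={x:.3f}  Y={y:.3f}")
```

With the default parameters the trajectory settles to a steady state where both rates are essentially zero. Changing the Hill coefficient, or adding a third gene to the loop, is one way to explore how nonlinearity shapes a toy subnetwork's behavior.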

Transcriptional circuits are tightly coupled to signaling, metabolic, and localization systems that are part of the complex three-dimensional organization of cells and organisms. It is the way in which this complex system processes information and executes functions that ultimately determines the phenotype.

(end of paraphrase)