Scientific Understanding of Consciousness
Science 24 July 2009: Vol. 325. no. 5939, pp. 429 - 432
Transcriptional Regulatory Circuits: Predicting Numbers from Alphabets
Harold D. Kim,1 Tal Shay,2 Erin K. O’Shea,1 Aviv Regev2
1 Howard Hughes Medical Institute, Harvard University Faculty of Arts and Sciences Center for Systems Biology, Departments of Molecular and Cellular Biology and Chemistry and Chemical Biology, Cambridge, MA 02138, USA.
Transcriptional regulatory circuits govern how cis and trans factors transform signals into messenger RNA (mRNA) expression levels. With advances in quantitative and high-throughput technologies that allow measurement of gene expression state in different conditions, data that can be used to build and test models of transcriptional regulation are being generated at a rapid pace. Here, we review experimental and computational methods used to derive detailed quantitative circuit models on a small scale and cruder, genome-wide models on a large scale. We discuss the potential of combining small- and large-scale approaches to understand the working and wiring of transcriptional regulatory circuits.
Genome-wide sequencing and profiling efforts have spurred the development of numerous approaches to reconstruct cis regulatory functions that explain observed expression levels by the type, number, and organization of cis regulatory elements in promoters. Linear, Bayesian, and thermodynamic approaches have been used, and each reflects a different set of assumptions about the biochemical basis of transcriptional regulation. Linear models assume that the expression output is a linear combination of cis-element inputs. Probabilistic Bayesian approaches can handle the noisy nature of large-scale data and are able to capture the combinatorial logic and organization of promoters by modeling combinations of motifs, as well as their relative distance and orientation. Some studies cast a linear model in a probabilistic setting, combining the benefits of both approaches. More realistic thermodynamic models have recently emerged. For example, expression patterns in Drosophila segmentation were predicted by calculating the probabilities of all possible configurations of trans factors on the cis regulatory sequence and summing their contributions to expression. Sequence preferences for nucleosomes and transcription factors were used to predict expression in yeast. However, thermodynamic models have not yet been applied on a genomic scale to test their general power. Genome-wide inference of cis regulatory functions critically depends on accurate and comprehensive detection of cis elements in DNA sequences, a notoriously difficult problem, especially in higher organisms.
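The thermodynamic idea described above (enumerate every binding configuration, weight each by its probability, and sum the contributions to expression) can be illustrated with a minimal sketch. This is not the model used in the Drosophila or yeast studies. The function name, the per-site statistical weights, the additive "effects", the basal term, and the logistic activity function are all hypothetical choices made for illustration, and binding at different sites is assumed to be independent.

```python
from itertools import product
from math import exp, prod


def predicted_expression(weights, effects, basal=-2.0):
    """Average promoter activity over all binding configurations.

    weights[i]: statistical weight of site i being occupied
                (hypothetical; think of it as [TF] / K_d for that site).
    effects[i]: contribution of the factor bound at site i to activity
                (positive for an activator, negative for a repressor).
    basal:      activity offset of the empty promoter (assumed value).
    """
    partition = 0.0   # sum of weights over all configurations (Z)
    weighted = 0.0    # sum of weight * activity over all configurations
    # Each configuration is a tuple of 0/1 occupancies, one per site.
    for config in product((0, 1), repeat=len(weights)):
        # Independent binding: the configuration weight is the product
        # of the weights of the occupied sites (empty product = 1).
        w = prod(q for q, bound in zip(weights, config) if bound)
        drive = basal + sum(e for e, bound in zip(effects, config) if bound)
        activity = 1.0 / (1.0 + exp(-drive))  # assumed logistic output
        partition += w
        weighted += w * activity
    return weighted / partition


if __name__ == "__main__":
    # One activator site versus an activator plus a repressor site.
    print(predicted_expression([1.0], [4.0]))
    print(predicted_expression([1.0, 1.0], [4.0, -4.0]))
```

The enumeration grows as 2^n in the number of sites, which is one practical reason such models are easier to apply to single promoters than genome-wide.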
Estimating the success of the models in predicting gene expression is a challenging task. Ideally, large-scale models should be trained on one data set and then tested for their ability to generalize to unseen data. However, most data sets are of limited scale, and systematic experimental follow-up is lacking. Indeed, many studies do not report such objective success rates, whereas others use various measures to assess the quality of their prediction. For example, models that predict module assignment for each gene typically report the percentage of coregulated genes, or of genes that share the same function, that are assigned correctly; reported success rates range from ~30% to 73% in yeast studies. Other works report the likelihood of the data given the model, but this measure is hard to compare between models of different complexity. The most quantifiable success level to date is reported for models that predict the actual expression of each gene; they typically report the percentage of the variance in gene expression that is accounted for by the model. Both Bayesian and linear models that predict gene expression from cis regulatory sequences have reported high rates of success in yeast [e.g., 51% and 52 to 72%]. However, the success of similar approaches in mammalian cells has been much more modest [e.g., 6% to 24%]. It is important to consider how much of the expression we can expect to explain with a particular model. A recent study with synthetic promoter libraries in yeast estimated that cis regulation can explain at most 65% of the variance in expression and that a thermodynamic small-scale model explains 44 to 59%. Similar work in the urochordate Ciona explains 30 to 89% of the variance at the cis level. Establishing standard approaches and data sets on which we can compare the performance of different models is an important goal. The DREAM project aims to achieve such a fair comparison by posting challenges for the community.
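To make "percentage of the variance accounted for by the model" concrete, here is a hedged sketch of the evaluation protocol described above: fit a linear model on one subset of genes and score it on held-out genes. The data are synthetic and invented for illustration (random motif counts, arbitrary true weights, Gaussian noise), and the function names are hypothetical. Nothing here reproduces the numbers reported in the studies.

```python
import numpy as np


def fit_linear(motif_counts, expression):
    """Least-squares fit of expression = intercept + counts @ weights."""
    X = np.column_stack([np.ones(len(motif_counts)), motif_counts])
    coef, *_ = np.linalg.lstsq(X, expression, rcond=None)
    return coef


def predict(coef, motif_counts):
    """Apply fitted coefficients (intercept first) to new motif counts."""
    X = np.column_stack([np.ones(len(motif_counts)), motif_counts])
    return X @ coef


def variance_explained(observed, predicted):
    """Fraction of variance in `observed` accounted for by `predicted`."""
    observed = np.asarray(observed, dtype=float)
    residual = np.sum((observed - predicted) ** 2)
    total = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / total


# Synthetic data: 200 "genes", 3 motif types (all values are made up).
rng = np.random.default_rng(0)
counts = rng.integers(0, 4, size=(200, 3)).astype(float)
true_weights = np.array([1.5, -0.8, 0.0])
expr = 0.5 + counts @ true_weights + rng.normal(0.0, 1.0, size=200)

# Train on the first 150 genes, evaluate on the 50 held-out genes.
train, test = slice(0, 150), slice(150, 200)
coef = fit_linear(counts[train], expr[train])
r2_test = variance_explained(expr[test], predict(coef, counts[test]))
print(f"held-out variance explained: {r2_test:.2f}")
```

Scoring on held-out genes rather than on the training genes is what separates generalization from fit. With noisy data the held-out score is bounded by how much of the variance the inputs can explain at all, which is the point of the 65% ceiling estimated for cis regulation in yeast.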
Cis input alone cannot predict expression, because dynamic expression patterns change with environmental conditions, cell type, and cell cycle stage. These changes are accompanied by corresponding changes in trans factors.
The level, activity, and wiring of trans factors can be inferred from mRNA levels because many trans regulators are embedded within transcriptional feedback regulatory loops.
In one approach, temporal expression data were collected to characterize processes such as cell differentiation and responses to environmental stimuli in mammalian systems, which showed that the transcriptional program was propagated by sequential waves of transcription controlled by different TFs. However, inferring regulatory activity from output levels limits the model’s ability to distinguish between causality and correlation.
The most comprehensive and successful model in animals, thus far, has been reconstructed in the sea urchin, including a complete model of the gene regulatory network that specifies the skeletogenic micromere lineage. In this model organism, systematic manipulation and measurement of trans factors, cis elements in promoters, and mRNA output measures were combined to devise a detailed validated model of gene regulation.
Despite advances from small-scale and large-scale analysis of gene regulation, most studies do not yet bridge the gap between these approaches. Small-scale approaches can generate fine, realistic details and extensive validation, but are limited to a few genes (often one). Conversely, large-scale approaches examine many genes, but often rely on regulation functions that are biologically unrealistic (e.g., Boolean logic or linearity) and lack validation. Findings from large-scale studies can be followed up in detail with small-scale approaches. For instance, the behavior of a yeast regulon observed in mRNA profiles was explained by small-scale studies on a single input circuit controlling a representative target gene. Bayesian network approaches, originally developed to study mRNA profiles, have been successfully applied to single-cell measurements of signaling pathways.
A useful stepping stone will be to construct well-defined subnetworks ("toy-models"), and then to study their regulation and dynamics. The fundamental challenge for all approaches may be building models that truly generalize to novel states. How successful we can be may depend on our understanding of the underlying features of nonlinearity and combinatorial effects in gene-regulation functions.
Transcriptional circuits are tightly coupled to signaling, metabolic, and localization systems that are part of the complex three-dimensional organization of cells and organisms. It is the way in which this complex system processes information and executes functions that ultimately determines the phenotype.
(end of paraphrase)