Genotype-Tissue Expression (GTEx) pilot analysis
Science 8 May 2015: Vol. 348 no. 6235 pp. 648-660
The Genotype-Tissue Expression (GTEx) pilot analysis
The GTEx Consortium
Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.
Over the past decade, there has been a marked increase in our understanding of the role of genetic variation in complex traits and human disease, especially via genome-wide association studies (GWAS) that have cataloged thousands of common genetic variants affecting human diseases and other traits. However, the molecular mechanisms by which this genetic variation predisposes individuals to disease are still poorly characterized, impeding the development of therapeutic interventions.
The majority of GWAS variants are noncoding, likely manifesting their effects via the regulation of gene expression. Thus, characterization of the regulatory architecture of the human genome is essential, not only for understanding basic biology but also for interpreting GWAS loci. Expression quantitative trait locus (eQTL) analysis is the most common approach used to dissect the effects of genetic variation on gene expression. However, comprehensive eQTL data from a range of human tissues are lacking, and eQTL databases are biased toward the most accessible tissues. Additionally, although many regulatory regions act in a tissue-specific manner, it is unknown whether genetic variants in regulatory regions have tissue-specific effects as well. Complex diseases are often caused by the dysfunction of multiple tissues or cell types, such as pancreatic islets, adipose, and skeletal muscle for type 2 diabetes, so it is not obvious a priori what the causal tissue(s) are for any given GWAS locus or disease. Hence, understanding the role of regulatory variants, and the tissues in which they act, is essential for the functional interpretation of GWAS loci and insights into disease etiology.
The Genotype-Tissue Expression (GTEx) Project was designed to address this limitation by establishing a sample and data resource to enable studies of the relationship among genetic variation, gene expression, and other molecular phenotypes in multiple human tissues. To facilitate the collection of multiple different tissues per donor, the project obtains recently deceased donors through consented next-of-kin donation, from organ donation and rapid autopsy settings. The results described here were generated during the project’s pilot phase, prior to scaling up collection to 900 donors.
We recruited 237 postmortem donors, collecting an average of 28 tissue samples per donor spanning 54 distinct body sites. Blood-derived DNA samples were genotyped at approximately 4.3 million sites, with additional variants imputed using the 1000 Genomes phase I reference panel, resulting in ~6.8 million single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) of ≥5% after quality control.
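The MAF threshold mentioned above can be illustrated with a minimal sketch. It assumes genotypes are encoded as alternate-allele counts (0, 1, or 2) per individual for a biallelic SNP; the function names and the toy genotype vectors are hypothetical and are not taken from the GTEx pipeline.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic SNP.

    genotypes: alternate-allele counts per individual (0, 1, or 2);
    None marks a missing call. This encoding is an assumption made
    for illustration only.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each diploid individual contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1 - alt_freq)


def passes_maf_filter(genotypes, threshold=0.05):
    """Keep SNPs whose MAF is at least the threshold (5% in the text)."""
    return minor_allele_frequency(genotypes) >= threshold


# Toy data (invented): 10 and 20 individuals respectively.
common_snp = [0, 1, 2, 1, 0, 0, 1, 0, 0, 1]  # 6 alt alleles out of 20
rare_snp = [0] * 19 + [1]                    # 1 alt allele out of 40

print(minor_allele_frequency(common_snp), passes_maf_filter(common_snp))
print(minor_allele_frequency(rare_snp), passes_maf_filter(rare_snp))
```

In this toy case the first SNP (MAF 0.3) is retained and the second (MAF 0.025) is dropped.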
We performed 76–base pair (bp) paired-end mRNA sequencing on a total of 1749 samples, of which 1641 samples from 43 sites and 175 donors constituted the final “pilot data freeze” reported on here. Median sequencing depth was 82.1 million mapped reads per sample. The final data freeze included samples from 43 body sites: 29 solid-organ tissues, 11 brain subregions (with two duplicated regions), a whole-blood sample, and two cell lines derived from donor blood [EBV-transformed lymphoblastoid cell lines (LCLs)] and skin samples (cultured fibroblasts). Median sample size for the nine high-priority tissues was 105; median sample size for the other 34 sampled sites was 18.5.
Hierarchical clustering demonstrated that expression profiles accurately recapitulate tissue type, with blood samples forming the primary outgroup. The multiple brain regions cluster strongly together as a single unit, but within that cluster the 11 individual subsampled regions are less distinct. The most distinct brain region is the cerebellum, with preservation method having little impact on that signal.
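As a rough illustration of how hierarchical clustering groups samples by expression profile, the following sketch uses average-linkage agglomerative clustering on a distance of 1 minus the Pearson correlation. Both the distance and the linkage rule are assumptions chosen for illustration (the text does not state which were used), and the sample names and expression values are invented.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def cluster_samples(profiles, n_clusters):
    """Merge samples bottom-up until n_clusters groups remain.

    profiles: dict mapping sample name -> list of expression values.
    Distance is 1 - Pearson r; linkage is the average pairwise distance.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

    def linkage(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Invented toy profiles over four genes.
toy_profiles = {
    "blood_a": [9, 8, 1, 1],
    "blood_b": [8, 9, 1, 2],
    "brain_a": [1, 2, 9, 8],
    "brain_b": [2, 1, 8, 9],
}
print(cluster_samples(toy_profiles, 2))
```

With these toy values the two blood samples and the two brain samples end up in separate clusters, mirroring the idea that expression profiles recapitulate tissue type.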
A primary goal of the GTEx project is to identify eQTLs for all genes for a range of human tissues. Because of our small sample sizes, we primarily examined eQTLs that act in cis to the gene (cis-eQTLs), as the expected effect size of trans-eQTLs is too low to be efficiently detected at this time.
Consistent with previous work, the majority of the significant cis-eQTLs clustered around the transcription start site (TSS) of target genes in all nine tissues.
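The core idea of a cis-eQTL test can be sketched as a regression of a gene's expression on genotype dosage, restricted to variants near the gene's TSS. This is a simplified illustration, not the consortium's method: the 1 Mb window is an assumed value, no covariates or multiple-testing correction are included, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def in_cis_window(variant_pos, tss_pos, window=1_000_000):
    """True if the variant lies within `window` bp of the TSS.

    The 1 Mb default is an assumption for illustration.
    """
    return abs(variant_pos - tss_pos) <= window


def dosage_regression(dosages, expression):
    """Least-squares fit of expression on allele dosage (0, 1, 2).

    Returns (slope, r): the per-allele effect on expression and the
    Pearson correlation between dosage and expression.
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in dosage or expression")
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Invented toy data: six individuals, one variant, one gene.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

if in_cis_window(variant_pos=1_500_000, tss_pos=1_000_000):
    slope, r = dosage_regression(toy_dosages, toy_expression)
    print(f"slope per allele = {slope:.3f}, r = {r:.3f}")
```

A real analysis would also need a significance test and adjustment for covariates such as ancestry and technical factors; those are deliberately left out of this sketch.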
The specificity or sharing of eQTLs among different tissues and cell types is of considerable biological interest, yielding insights into differential genetic regulation among tissues.
We used allele-specific expression (ASE) of genes to indirectly estimate the overall effect of cis-regulatory variants on the expression of nearby genes. Individuals that are heterozygous for a cis-regulatory variant may differentially express each of the two alleles of the affected gene.
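One simple way to quantify allelic imbalance at a heterozygous site is to compare read counts for the two alleles against the 50:50 split expected without a cis-regulatory effect. The sketch below uses an exact two-sided binomial test; this test choice, the function names, and the read counts are assumptions for illustration and are not described in the text.

```python
from math import comb


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads carrying the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely
    than the observed reference-read count under a null ratio of p.
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small tolerance guards against floating-point ties.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Invented read counts at two heterozygous sites.
print(allelic_ratio(18, 2), binomial_two_sided_p(18, 2))    # imbalanced
print(allelic_ratio(10, 10), binomial_two_sided_p(10, 10))  # balanced
```

In practice, read-mapping bias toward the reference allele and overdispersion in read counts usually call for more careful models than a plain binomial test.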
We have described a large in-depth data set of multitissue human gene expression. We assessed the variability of the transcriptome among individuals in a large number of tissues at a resolution that provides unique insights into the diversity and regulation of gene expression among tissues. This analysis provides a unified view of genetic effects on gene expression across a broad range of tissue types, most of which have not been studied for eQTLs previously. We look forward to scaling up the resource to create a data set that will transform our understanding of how genetic variability influences different tissues and biological systems and ultimately complex diseases.
[end of paraphrase]