
From Variants to Functions: New Strategies for the Interpretation of Cancer Genomes

by JT Neal, Ph.D., Jesse S. Boehm, Ph.D., and William C. Hahn, M.D., Ph.D.


UMAP plot of single-cell RNA sequencing data from over 300,000 A549 cells expressing a library of 100 TP53 variants seen in humans.

Credit: Oana Ursu

The large-scale sequencing of cancer genomes has identified hundreds of thousands of genetic variants present in human tumors. This variation is often composed of a small number of “driver” mutations, which contribute to tumor development and progression, buried amongst a large number of “passenger” mutations that confer little or no selective advantage to tumor cells. The function of most cancer variants remains unknown, and the ability to identify and link causal variants to disease biology and available drugs remains a key bottleneck to unlocking the potential of precision medicine in cancer. A major goal of the Dana-Farber Cancer Target Discovery and Development (CTD2) Center is the functional characterization of such variants through the development of key technologies and pipelines that will accelerate the translation of thousands of additional cancer variants into disease mechanisms and actionable therapeutic hypotheses.

A significant hurdle to accomplishing this goal is the lack of scalable assays to distinguish driver mutations from passenger mutations across different genes and cell types. Today this often requires developing a bespoke assay for every gene (and sometimes for every cell type) that one wishes to study, which can mean that a graduate student or postdoc spends years of their training on assay development rather than on studying disease biology. In collaboration with investigators at the Broad Institute, our CTD2 center is developing new cellular profiling methods that use changes in a cell’s gene expression profile to assess whether a variant is impactful. This approach overcomes such bottlenecks by enabling an investigator to test variants in multiple genes from multiple cancer types in a single experiment.

In two recent studies,1,2 we demonstrated the effectiveness of this type of approach. In the first, we looked across cancer types and tested 474 mutant alleles curated from 5,338 tumors using pooled tumor formation assays and arrayed expression profiling. Using these methods, we identified 12 transforming alleles, including two in genes (PIK3CB, POT1) that had not previously been shown to be tumorigenic. Additionally, several alleles that were found only once in the 5,338 sequenced tumors still exhibited potent activity, showing that mutation frequency alone is an unreliable guide to variant impact and underscoring the importance of functional assays. In the second study, we focused on lung adenocarcinoma.2 Here, we used similar approaches to characterize 194 somatic mutations, including rare, clinically actionable somatic variants in EGFR, ARAF, ERBB2, and BRAF.
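
In a pooled tumor formation assay, alleles that drive tumor growth should become over-represented in tumors relative to the pool of cells that was injected. The sketch below scores that enrichment as a log2 fold change in read frequency, assuming each allele can be identified by a sequenced barcode. It is a simplified illustration, not the analysis used in the studies above; the function name, the pseudocount, the cutoff, and the example counts are all hypothetical.

```python
import math


def log2_enrichment(input_counts, tumor_counts, pseudocount=1.0):
    """Log2 fold change of each allele's barcode frequency in tumors vs. the input pool.

    input_counts / tumor_counts map a (hypothetical) allele name to its sequenced
    barcode read count. A pseudocount keeps alleles absent from tumors finite.
    """
    alleles = sorted(set(input_counts) | set(tumor_counts))
    in_total = sum(input_counts.values()) + pseudocount * len(alleles)
    tu_total = sum(tumor_counts.values()) + pseudocount * len(alleles)
    scores = {}
    for allele in alleles:
        f_in = (input_counts.get(allele, 0) + pseudocount) / in_total
        f_tu = (tumor_counts.get(allele, 0) + pseudocount) / tu_total
        scores[allele] = math.log2(f_tu / f_in)
    return scores


# Illustrative counts only: allele "A" expands in tumors, "WT" and "B" do not.
input_counts = {"WT": 1000, "A": 1000, "B": 1000}
tumor_counts = {"WT": 10, "A": 3000, "B": 5}
scores = log2_enrichment(input_counts, tumor_counts)
enriched = [a for a, s in scores.items() if s > 1.0]  # hypothetical cutoff
```

Here allele A rises from a third of the input reads to nearly all tumor reads and scores about +1.6, while WT and B are strongly depleted. A real analysis would use replicate tumors and a statistical test rather than a fixed cutoff.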

We have also shown that transcriptional signatures can be used to quantitatively stratify allele impact into gain-of-function, loss-of-function, or change-of-function categories, providing more resolution than a simple impactful/neutral call. Lastly, we demonstrated that transcriptional signatures can be used to predict more complex phenotypes such as drug resistance and tumorigenesis. These studies demonstrate the value of combining functional assessment with genomic characterization to identify rare tumorigenic variants, and they serve as an important test case for using transcriptional signatures as a proxy for more laborious gene-specific cell-based assays.
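
One simple way to turn signatures into these categories is to compare each mutant's expression signature with the wild-type signature along two axes: how strongly it moves expression away from a control, and how closely its direction matches wild type. The sketch below encodes that logic with hypothetical thresholds and helper names; it is an illustration under these assumptions, not the statistical framework used in the published studies.

```python
import math


def _corr(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def _norm(x):
    return math.sqrt(sum(a * a for a in x))


def classify_allele(mut_sig, wt_sig, min_corr=0.5, low=0.5, high=1.5):
    """Label a mutant allele by comparing its signature with the wild-type signature.

    Both signatures are expression changes relative to a control, measured over
    the same genes; wt_sig must be nonzero. Thresholds are illustrative only.
    """
    ratio = _norm(mut_sig) / _norm(wt_sig)
    if ratio < low:
        return "loss of function"    # signature collapses toward the control
    if _corr(mut_sig, wt_sig) < min_corr:
        return "change of function"  # strong signature, but not wild-type-like
    if ratio > high:
        return "gain of function"    # wild-type-like direction, amplified
    return "neutral"                 # indistinguishable from wild type here


wt = [2.0, -1.0, 0.5, 1.5, -2.0]
label = classify_allele([4.0, -2.0, 1.0, 3.0, -4.0], wt)  # "gain of function"
```

Checking magnitude before direction means a nearly silent allele is called loss of function even if its small residual signal points in a random direction. A production pipeline would replace the fixed cutoffs with replicate-based significance tests.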

We have now extended this work by developing new methods that use droplet-based single-cell RNA sequencing (Figure) instead of bulk RNA sequencing to assess variant impact in a pooled format. These methods could greatly increase the scale and resolution of cellular screens, allowing large perturbation libraries to be assessed across numerous cellular contexts in a single experiment. As a proof of concept, we have validated impactful mutations in the tumor suppressor gene TP53 and the oncogene KRAS using only single-cell transcriptional signatures as a readout, recapitulating phenotypic data that we had previously generated3 using more traditional gene-specific screening approaches. Our long-term goal is for these methods to enable massively multiplexed single-cell transcriptional profiling for the characterization of cancer variants, high-dimensional readout of CRISPR knockout screens for genetic vulnerabilities, and broad characterization of common and rare genetic variation in human health and disease.
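
In a pooled single-cell experiment, variants can be compared cell by cell rather than well by well, assuming the identity of the variant in each cell can be read out from the same droplet as its transcriptome. A minimal sketch of one possible impact score measures how far each variant's average cell sits from the wild-type average, relative to the distance between two halves of the wild-type cells. The scoring rule, function names, and toy data are hypothetical illustrations, not the method used for the TP53 and KRAS validation.

```python
import math
from collections import defaultdict


def centroid(rows):
    """Mean vector of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def impact_scores(cells, wt_label="WT"):
    """Distance of each variant's centroid from wild type, scaled by WT self-noise.

    cells: list of (variant_label, expression_vector) pairs, one per cell,
    assuming each cell has already been assigned to a variant.
    """
    groups = defaultdict(list)
    for label, vec in cells:
        groups[label].append(vec)
    wt = groups[wt_label]
    if len(wt) < 2:
        raise ValueError("need at least two wild-type cells for the baseline")
    wt_center = centroid(wt)
    # Baseline: distance between centroids of two halves of the WT cells.
    baseline = euclid(centroid(wt[0::2]), centroid(wt[1::2])) or 1e-9
    return {
        label: euclid(centroid(vecs), wt_center) / baseline
        for label, vecs in groups.items()
        if label != wt_label
    }


# Toy two-gene example with hypothetical variant labels.
cells = [
    ("WT", [1.0, 0.0]), ("WT", [1.2, 0.0]), ("WT", [0.8, 0.0]), ("WT", [1.0, 0.0]),
    ("variant_A", [0.0, 2.0]), ("variant_A", [0.0, 2.2]),
    ("variant_B", [1.0, 0.1]), ("variant_B", [1.1, 0.0]),
]
scores = impact_scores(cells)
```

A score near 1 means a variant is no farther from wild type than wild-type cells are from each other; much larger scores flag candidate impactful variants. A real analysis would work in a reduced expression space over many more cells and use a permutation-based null.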


JT Neal, Ph.D., Jesse S. Boehm, Ph.D., and William C. Hahn, M.D., Ph.D. from the Broad Institute

Moving forward, we are exploring whether other signature-based approaches, such as image-based optical profiling, can be used to complement our transcriptional profiling studies. Cellular morphology is a rich source of phenotypic information that can be used to interpret a wide range of normal and perturbed states, and recent advances in computational image analysis have dramatically increased the number of cellular morphological features that can be extracted in a single assay.

We have recently demonstrated that these features can be used to classify cancer alleles by function at the single-cell level, using only fluorescent images as a data source,4 at a small fraction of the cost of single-cell RNA sequencing. We are currently piloting approaches to compare image-based optical signatures to transcriptional signatures across a range of cellular perturbations to determine the degree to which these methods provide complementary and/or overlapping information about cellular biology.
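
One way to ask whether two profiling modalities carry overlapping information is to compare the similarity structure each assigns to the same set of perturbations: if perturbations that look alike in images also look alike transcriptionally, the two patterns of pairwise similarity should correlate. The sketch below computes this Mantel-style statistic in plain Python. It is an illustrative technique chosen for this example, not the analysis described above, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def modality_agreement(morph, expr):
    """Correlate the pairwise-similarity structure of two profiling modalities.

    morph, expr: dicts mapping a perturbation name to its feature vector in each
    modality (image and expression vectors need not have the same length).
    Only perturbations profiled in both are compared; use at least three.
    """
    names = sorted(set(morph) & set(expr))
    pairs = list(combinations(names, 2))
    sim_morph = [pearson(morph[a], morph[b]) for a, b in pairs]
    sim_expr = [pearson(expr[a], expr[b]) for a, b in pairs]
    return pearson(sim_morph, sim_expr)


morph = {"a": [1, 2, 3], "b": [1, 2, 4], "c": [3, 2, 1]}
```

Values near 1 suggest the modalities rank perturbations similarly (overlapping information), while values near 0 suggest each captures something the other misses. Because pairwise similarities are not independent, a real comparison would typically use rank correlations and a permutation test.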

In parallel, we are developing new methods to generate cancer variants de novo, using CRISPR-based strategies as an alternative to cDNA overexpression. In particular, we are utilizing CRISPR base editing, which enables the high-efficiency generation of DNA variants with single base pair resolution, without double-strand DNA cleavage.5,6 These editors will enable us to engineer variants in an even larger fraction of the genome, including those in large genes and non-coding regions, which can be difficult to study using overexpression-based approaches. We aim to adapt these editors for use in a wide variety of cell types, including primary cell lines and organoids derived from patient tumors, in order to enable modeling of rare variants and cancer types that are not represented in traditional cell line collections.
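
Base editors act only on bases that fall within a narrow window of the guide's protospacer, which constrains which variants can be installed. Cytidine base editors convert C•G to T•A and adenine base editors convert A•T to G•C.5,6 The sketch below lists editable positions in a sequence, assuming SpCas9's NGG PAM, a 20-nt protospacer counted from its PAM-distal end, and an approximate editing window of positions 4 to 8 (the window commonly reported for BE3). Windows vary by editor and sequence context, reverse-strand sites are ignored for brevity, and the function name is hypothetical.

```python
import re


def base_edit_targets(seq, target_base="C", window=(4, 8)):
    """Return (protospacer_start, position_in_protospacer, sequence_index) tuples
    for each target base inside the editing window of a forward-strand site.

    target_base="C" models a cytidine base editor; "A" models an adenine editor.
    """
    seq = seq.upper()
    hits = []
    # Zero-width lookahead so that overlapping NGG PAMs are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        start = match.start() - 20  # protospacer lies immediately 5' of the PAM
        if start < 0:
            continue
        for pos in range(window[0], window[1] + 1):
            idx = start + pos - 1  # position 1 is the PAM-distal base
            if seq[idx] == target_base:
                hits.append((start, pos, idx))
    return hits


site = "AAAC" + "A" * 16 + "TGG"  # toy 20-nt protospacer followed by a TGG PAM
```

For this toy site, a cytidine editor could reach the single C at protospacer position 4, while an adenine editor could reach the A bases at positions 5 to 8.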

Taken together, these efforts represent a significant step towards the implementation of a first-generation variant-to-function pipeline that will dramatically accelerate the interpretation of genomic variants in cancer and other genetic diseases. Over the next five years, we envision that such a pipeline will enable the generation and functional characterization of tens to hundreds of thousands of cancer variants, moving us closer to our ultimate goals of understanding the functional impact of all genetic variation in cancer, and making precision medicine a reality for all cancer patients.


  1. Kim E, Ilic N, Shrestha Y, et al. Systematic functional interrogation of rare cancer variants identifies oncogenic alleles. Cancer Discovery. 2016 Jul;6(7):714-26. (PMID: 27147599)
  2. Berger AH, Brooks AN, Wu X, et al. High-throughput phenotyping of lung cancer somatic mutations. Cancer Cell. 2016 Aug 8;30(2):214-28. (PMID: 27478040)
  3. Giacomelli AO, Yang X, Lintner RE, et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nature Genetics. 2018 Oct;50(10):1381-1387. (PMID: 30224644)
  4. Rohban MH, Singh S, Wu X, et al. Systematic morphological profiling of human gene and allele function via Cell Painting. eLife. 2017 Mar 18;6. (PMID: 28315521)
  5. Komor AC, Kim YB, Packer MS, et al. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature. 2016 May 19;533(7603):420-4. (PMID: 27096365)
  6. Gaudelli NM, Komor AC, Rees HA, et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature. 2017 Nov 23;551(7681):464-471. (PMID: 29160308)

If you would like to reproduce some or all of this content, see Reuse of NCI Information for guidance about copyright and permissions. In the case of permitted digital reproduction, please credit the National Cancer Institute as the source and link to the original NCI product using the original product's title; e.g., “From Variants to Functions: New Strategies for the Interpretation of Cancer Genomes was originally published by the National Cancer Institute.”