NCI Perspective

The benefits of looking across many cancer genomes: A perspective

  • Posted: September 27, 2013
  • Updated: August 12, 2014

For much of the 20th century, cancer was thought of not as a single entity but as more than 100 complex and distinct diseases, most of which demanded unique treatment strategies. The need for a more modern, comprehensive understanding led, in part, to the launch in 2006 of The Cancer Genome Atlas (TCGA), a joint venture supported by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), both part of the National Institutes of Health (NIH).

From the outset, TCGA has been generating and analyzing data according to the organ in the body from which a tumor first arose. By 2014 the TCGA Research Network had published nearly a dozen papers examining genomic changes in individual tumor types. These included two manuscripts in the summer of 2014 on gastric and lung adenocarcinomas, each a comprehensive characterization of a type of cancer. The organ-specific findings have been revealing, providing new information on cancer development and behavior, as well as new insights into molecular pathways and genetic alterations.

Just as importantly, in many cases researchers have uncovered shared molecular patterns among cancers, including similar genomic changes occurring across tumor types. For example, TCGA’s 2012 breast cancer analysis found evidence that a subtype of breast cancer shows marked similarity to a form of ovarian cancer. This subtype, basal-like breast cancer, shared similar mutation characteristics and other genomic features with high-grade serous ovarian cancer, suggesting that the two cancers have a similar molecular origin and may respond to the same treatments. In fact, genomically speaking, basal-like breast cancer has more in common with high-grade serous ovarian cancer than with other subtypes of breast cancer.

Long tail diagram: individual mutations are ordered along the x-axis by number of occurrences, with frequency on the y-axis. The long, low tail of rarely occurring mutations at the far right is shaded in yellow.

In addition, data from TCGA’s analyses show that most cancer types possess a great number of mutations that each occur at a low frequency. This collection of rare mutations has been termed the “long tail” because, in a graph of the frequency of specific changes, the mutations form a long but low section of the chart. Scientists have found that some of these mutations are shared by sets of tumor types. The long tail and the shared inter-tumor molecular patterns offered the first suggestions that a cross-tumor analysis might yield clinically meaningful new findings.

With such similarities increasingly apparent, TCGA researchers developed a formal project for cross-tumor analysis, called the Pan-Cancer project. Its goal was to assemble TCGA’s wealth of data across tumor types, analyze and interpret those data, and, finally, make both the analyses and the data available. The group, led by Joshua M. Stuart, Ph.D., University of California, Santa Cruz, analyzed 12 cancer types, selected on the basis of the number of samples available and the comprehensiveness of the data as of 2012. The 12 types were glioblastoma multiforme; acute myeloid leukemia; lung squamous cell carcinoma; lung adenocarcinoma; colon and rectal adenocarcinomas; head and neck squamous cell carcinoma; and ovarian, breast, clear cell kidney, endometrial and bladder cancers. Because of the breadth and depth of TCGA data, the Pan-Cancer group believed the analysis would have sufficient statistical power to detect genomic changes across the cancers, find changes specific to each organ of origin, and identify molecular commonalities across tumor types.

The culmination of this effort has been a series of manuscripts tied together through “threads” featured on the website of Nature (see http://www.nature.com/ng/journal/v45/n10/full/ng.2780.html). The same approach was used earlier for papers from another expansive project, the Encyclopedia of DNA Elements (ENCODE). Each thread is centered on a specific theme and gathers relevant information across papers and journals. “Each cross-cutting piece offers perspectives on a topic discussed in several papers. This is a new exciting way to organize information to help bring out themes that unite the work,” Dr. Stuart said. Dr. Stuart also described the genesis, makeup, goals and promise of the Pan-Cancer project in a commentary published online Sept. 26, 2013, in Nature Genetics. As TCGA collects and analyzes more samples, researchers are increasingly able to detect rare mutations that occur across numerous tumor types.

This new perspective of analyzing cancers according to their genomic profiles signals a shift away from organizing cancer by organ of origin. Many clinicians are beginning to imagine a future in which cancers are described by their mutations, such as an ERBB2-amplified tumor or a PI3K-pathway mutant carcinoma. As Dr. Stuart wrote in the Nature Genetics perspective article, which accompanied two of the Pan-Cancer research papers, “Only time will tell whether the integration of molecular characteristics with data on histology, organ site and metastatic location will contribute to an improvement in patient outcomes. But the balance is shifting in this direction.”

Two research papers published in Nature Genetics as part of the first round of Pan-Cancer papers point to the benefits of this type of cross-cancer analysis. In one paper, researchers at Memorial Sloan-Kettering Cancer Center, New York, analyzed data on more than 3,000 tumor samples from 12 cancer types from TCGA and determined that a limited number of genetic alterations are responsible for most cancer subtypes. Regardless of the tissue in which a tumor originated, these alterations fall into two general categories of “oncogenic” signatures: genetic mutations and copy number changes (changes in the number of copies of genes in a cell), each with many smaller subclasses. The scientists hope that these results will eventually help tailor treatment strategies to subsets of patients. One example would be clinical trials that match individual patients whose tumors have been profiled (and whose oncogenic signatures have been identified and classified) with a corresponding drug or combination of therapies.

The second study examined patterns of changes in the number of gene copies in cells, one of the most common types of mutations that lead to cancer. Investigators at The Broad Institute, Cambridge, Mass., Dana-Farber Cancer Institute, Boston, and elsewhere compiled a database of gene copy number changes in 5,000 tumors from 12 cancer types. They used this database to identify 140 regions where these changes occur most often, pointing to genes in those regions that are likely to contribute to cancer formation. The scientists showed that the patterns of these changes can provide clues to how the genes contribute to cancer. These findings clarify how cancers develop and identify genes that are particularly important in cancer initiation and that may serve as effective therapeutic targets.

Pathway activity across Pan-Cancer integrated subtypes: integrated tissue subtypes are shown as rows (e.g., a BRCA breast cancer subtype appears in the third row from the bottom), pathways are shown vertically, and lines connect commonalities. High scores (red) indicate over-activity and low scores (blue) indicate lower activity compared with normal.

The Pan-Cancer project also suggests directions for the future of cross-tumor analysis, several of which are highlighted in Dr. Stuart’s commentary article. These include integrating data sources to increase the power of genomic analyses; using molecular profiles to categorize cancers for treatment decisions; determining whether “predictive signatures” derived from genes hold across tissue types; and determining whether comprehensive protein analyses, using tools such as mass spectrometry, can extend the power of genomic analyses from TCGA.

In August 2014, two additional studies were published that provide even greater insight into Pan-Cancer efforts and into TCGA data analyses collectively. In one of them, Dr. Stuart, Charles Perou, Ph.D. (University of North Carolina, Chapel Hill), and colleagues used six different types of analyses to examine the molecular characteristics of more than 3,500 tumors across 12 tumor types and compare them with one another. The researchers showed that cancers were more likely to be genetically and molecularly similar according to the type of cell in which they originated than according to the tissue in which they arose.

In the study, investigators identified 11 integrated cancer subtypes. Many of the subtypes had molecular profiles tied to a single tissue of origin. Others, however, contained tumors from several different tissues, and some tumor types were split across more than one subtype. For example, the study showed at least three different subtypes of bladder cancer, one closely resembling lung adenocarcinoma and another similar to head and neck and lung squamous cell cancers. The findings also confirmed the differences among breast cancer subtypes, and their similarities to other cancers, seen in earlier research. Basal-like breast cancers, the study found, looked more like ovarian cancer and cancers of squamous cell origin than like other types of breast cancer. In proposing this new approach to classifying cancers, the authors suggest that at least one in 10 cancer patients could be classified differently under the new system.

In a second study, Roel Verhaak, Ph.D., M.D. Anderson Cancer Center, Houston, and colleagues reported results in the journal Oncogene suggesting that cancers from all origins can be classified according to a limited set of gene expression “footprints.” The researchers compared gene expression subtypes across 12 tumor types. They identified eight gene expression “superclusters,” characterized by the disease pathways commonly turned on in cancer and by similarities in the kinds of genes expressed. For example, the investigators found that one of the largest superclusters was marked by a higher frequency of mutations in TP53, a commonly altered tumor suppressor gene that normally helps keep cell growth and proliferation in check. This supercluster was also marked by loss of the gene CDKN2A, which helps control cell proliferation; by increased numbers of DNA double-strand breaks, a hallmark of many types of cancer; and by higher-than-usual expression of cyclin B1 protein, which plays a role in cell division. These kinds of changes affect the cell’s DNA damage response and can turn on certain cell pathways that contribute to cancer development.

The researchers also saw a second pattern in nine of the 11 solid tumor types they examined: a gene expression subtype related to tumor-associated stroma. This supportive tissue can provide a matrix on which tumors grow and can influence cancer progression. Taken together, these results suggest that tumors can be grouped according to gene expression similarities that cut across tumor types, pointing to a limited number of molecular themes in cancer.

Follow-up studies are needed to confirm these findings, and additional tumor samples and tumor types will expand the work. Still, the past year’s initial PanCancer-12 analyses lay the groundwork for a more detailed classification of tumors that, unlike all prior cancer classification systems, includes molecularly defined subtypes.