Over 44,000 AACR Project GENIE Cases Available in the GDC

By Louis M. Staudt, M.D., Ph.D.

Credit: The American Association for Cancer Research

NCI’s Genomic Data Commons (GDC) has released data for 44,756 cancer cases from the American Association for Cancer Research's Project Genomics Evidence Neoplasia Information Exchange, more simply known as AACR Project GENIE. This massive project was launched in 2015 with the goal of building an international, pan-cancer registry with tens of thousands of patients to empower precision oncology.

The urgent need for broad data sharing in the cancer research community spurred the AACR, along with eight global academic leaders in clinical cancer genomics, to initiate AACR Project GENIE. By adding the data to the GDC, we’re giving researchers more ways to access it, further expanding the data's utility and the project's potential impact.

The data released in the GDC covers 294 unique cancer types, including many cases of rare bone and soft-tissue cancers new to the GDC. The contribution has more than doubled the number of cases in the GDC.

Harmonizing Across Different Labs and Platforms: A Technical Feat

More than a dozen targeted-sequencing platforms used by 8 institutions across 4 different countries are represented in the batch of data released at the GDC. And those numbers are increasing, as GENIE continues to add participating institutions and their respective cases.

I commend AACR for the huge amount of work devoted to harmonizing these diverse datasets from a broad collection of institutes. Bringing together data produced on different physical platforms and processed with different analytic methodologies requires heavy lifting, but the work is necessary to reach the numbers and diversity that make a dataset valuable.

Much of the harmonization work was done prior to integration into the GDC, by AACR Project GENIE in collaboration with their partners at Sage Bionetworks, led by Kristen Dang and Thomas Yu, and at Memorial Sloan Kettering Cancer Center (MSKCC), led by Stacy Thomas.

“We made sure to sit down and work with the original contributing sites to develop a common data model utilizing existing standards and definitions where possible,” says Jocelyn Lee, Lead Project Manager of AACR Project GENIE. “This was a key step in making a project of this magnitude work.”

Bringing Together Data Models for Integration into the GDC

Mapping between clinical vocabularies is known to be a particularly difficult task. But having done the legwork of establishing a well-thought-out model, AACR Project GENIE and their partners were able to work with the GDC team to map GENIE data to the GDC data model with relative ease.

“Several cases were straightforward and mapped one-to-one, such as mapping values for sex, race, and ethnicity,” explained Kristen Dang, who is a Principal Scientist at Sage Bionetworks. “And in other cases, such as assay-level details, we had to go back and collect more structured information from GENIE or have the GDC create new data elements.”
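
The one-to-one case described here can be pictured as a simple lookup, with anything that has no direct equivalent set aside for follow-up. The sketch below is only an illustration: the field names, the value sets, and the `map_record` helper are hypothetical and are not the actual GENIE or GDC vocabularies or tooling.

```python
from typing import Dict, List, Tuple

# Hypothetical value sets for illustration only; these are NOT the real
# GENIE or GDC controlled vocabularies.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "sex": {"Male": "male", "Female": "female"},
    "race": {"White": "white", "Asian": "asian"},
}


def map_record(record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map fields that have a one-to-one equivalent; list the rest for review."""
    mapped: Dict[str, str] = {}
    needs_review: List[str] = []
    for field, value in record.items():
        vocab = FIELD_MAPS.get(field)
        if vocab is not None and value in vocab:
            mapped[field] = vocab[value]
        else:
            # No direct equivalent: more structured information or a new
            # data element would be needed before this field can be loaded.
            needs_review.append(field)
    return mapped, needs_review
```

In this sketch, a record such as `{"sex": "Female", "assay": "panel-v2"}` would have `sex` mapped directly, while `assay` would land in the review list, mirroring the two situations in the quote.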

As for genomic data, mapping processed data from the GRCh37 to the GRCh38 human reference genome was essentially all that was required. Again, the relative simplicity of this integration was thanks to the harmonization work already done by AACR Project GENIE, Sage Bionetworks and MSKCC.
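
For readers unfamiliar with what moving coordinates between reference builds involves, the idea can be sketched as follows. This is a toy model, not the GDC's pipeline: the `Block` type, the `lift_position` helper, and the block coordinates are made up for illustration, and real conversions rely on published alignment files and must also handle strand and multiple chromosomes.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One aligned stretch shared by the old and new assemblies (toy model)."""
    src_start: int  # 0-based, inclusive, in the old build
    src_end: int    # exclusive, in the old build
    dst_start: int  # where the stretch begins in the new build


# Hypothetical blocks for a single chromosome; real coordinates differ.
TOY_BLOCKS = [Block(0, 1000, 50), Block(1200, 2000, 1050)]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Return the new-build coordinate, or None if pos falls in an unaligned gap."""
    for block in blocks:
        if block.src_start <= pos < block.src_end:
            # Inside an aligned block the offset from the block start is preserved.
            return block.dst_start + (pos - block.src_start)
    return None
```

Positions that fall between blocks return `None`; in practice, variants at such positions cannot be carried over automatically and need separate handling.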

Greater Numbers for Finding Rarer Driver Mutations

This dataset is extremely large—about three times the number of cases collected by TCGA, NCI’s flagship cancer characterization program. Large sample numbers will enable the discovery of rare mutations in cancer drivers that are recurrent in a small percentage of patients. Such mutations would otherwise be ignored or hidden, but because they are recurrent, they can be implicated as pathogenic. 

For example, the AKT1-E17K mutation is only present in 3–5% of all cancers. AKT inhibitors are showing promise in estrogen receptor positive (ER+) breast cancers with this mutation—a mere 4% of the ER+ breast cancer population. GENIE is enabling researchers to find more of these rare mutations, analyze their clinicopathologic features, and inform the drug development process.

Studies like these will sharpen our precision medicine clinical trials going forward, as it is critical to be able to distinguish between driver mutations and passenger mutations that are functionally irrelevant. 
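
The sample-size argument in this section can be made concrete with a short binomial calculation. The sketch below assumes each case independently carries a given mutation with a fixed frequency, which is a simplification; the 0.1% frequency and the threshold of five observations used in the note afterwards are illustrative choices, not figures from GENIE.

```python
from math import comb


def prob_at_least(n_cases: int, freq: float, k: int) -> float:
    """Probability of seeing a mutation in at least k of n_cases cases,
    assuming each case carries it independently with probability freq."""
    # Sum the binomial probabilities of seeing fewer than k carriers,
    # then take the complement.
    below = sum(
        comb(n_cases, i) * freq**i * (1.0 - freq) ** (n_cases - i)
        for i in range(k)
    )
    return 1.0 - below
```

Under these assumptions, a mutation present in 0.1% of cases is expected about once in a 1,000-case cohort, so `prob_at_least(1000, 0.001, 5)` is well under 1%. In a cohort of 44,756 cases the same mutation is expected about 45 times, and `prob_at_least(44756, 0.001, 5)` is close to 1, which is why recurrence becomes visible only at this scale.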

Understanding Oncogenes Across Different Contexts

The many distinct tumor types represented in the GENIE registry might also provide clues to how the same oncogenes function in different cellular contexts. For example, a recent study suggested that mutations in the prototypical cancer susceptibility gene BRCA may be biologically neutral in many cancer types outside of breast and ovarian.

We may find that the different cancer types contained in the GENIE registry recurrently acquire different mutations in the same oncogenes, and our understanding of this behavior will have implications for how we test and treat each patient.

We look forward to the many exciting, clinically actionable findings that should soon come from the GENIE dataset. Congratulations to AACR and their partners for their continued progress on this project and their efforts to promote cancer data sharing as an engine for cancer research.

Dr. Staudt serves on the external Scientific Advisory Board of AACR Project GENIE.

If you would like to reproduce some or all of this content, see Reuse of NCI Information for guidance about copyright and permissions. In the case of permitted digital reproduction, please credit the National Cancer Institute as the source and link to the original NCI product using the original product's title; e.g., “Over 44,000 AACR Project GENIE Cases Available in the GDC was originally published by the National Cancer Institute.”
