
Build a National Cancer Data Ecosystem

NCI has announced several funding opportunities that align with the Cancer Moonshot.


There has been rapid advancement in the development of databases and analytic tools in the cancer research and care communities. However, without an infrastructure for sharing and integrating these resources, important opportunities for novel insights are being missed. To accelerate progress against cancer, researchers, clinicians, and patients across the country need to collaborate by sharing their collective data and knowledge about the disease.

The goal of this recommendation is to create a National Cancer Data Ecosystem that enables and encourages all participants across the cancer research and care continuum to share, access, combine, and analyze diverse data, leading to new discoveries and reducing the burden of cancer.

The Cancer Data Ecosystem will be supported by a technical infrastructure that allows disparate data to be discovered, integrated, and shared by the cancer research and care communities. Interactive portals will provide access to these data as well as analytic capabilities, and enable researchers, patients, and clinicians to incorporate their own data, enriching available knowledge and advancing precision medicine in cancer.

NCI Cancer Research Data Commons (CRDC)

The NCI CRDC is a data science infrastructure that connects cancer research data collections with analytical tools, leveraging the elastic compute of the cloud. The CRDC is one component of the broader Cancer Data Ecosystem and is central to NCI’s activities that support the Blue Ribbon Panel (BRP) recommendation. Researchers can use the CRDC to store, analyze, share, and visualize cancer research data for improved understanding of and new insights about cancer. It includes projects aligned with the objectives of the National Cancer Data Ecosystem called for by the Cancer Moonshot: the NCI Genomic Data Commons (GDC), the NCI Cloud Resources, and the Data Commons Framework. In addition, NCI is using Moonshot funding to support several research projects that provide foundational components of the CRDC, such as the Imaging Data Commons, the Infrastructure for Semantic Resources, and an initiative to broaden genomic data sharing across the research community.

The GDC is a resource for sharing genomic and clinical data to facilitate research into the genetic drivers of cancer. This knowledge will be used to support the development of novel and precision cancer treatments. NCI is also working with the NCBI Sequence Data Delivery Pilot to broaden sharing of genomic data from NCI-funded research that is not currently available in the GDC. The Imaging Data Commons will be a resource for sharing and analyzing imaging data from clinical and basic cancer research studies.

The Data Commons Framework provides the core components for building and expanding the CRDC, including services for securing data, finding and annotating data, as well as user workspaces for analyzing data and sharing results. The Infrastructure for Semantic Resources provides tools and data dictionaries that help users understand research data and enables linkages across data repositories. This infrastructure is being expanded and will be available to the entire cancer research community to ensure interoperability between the CRDC repositories as well as other community resources.

The NCI Cloud Resources allow researchers to access and analyze large-scale genomic data in the cloud using a variety of analytic tools and pipelines, without the need to download data to their local computers.

Ultimately, the CRDC will provide access to many other cancer research data types, including proteomics, animal models, immuno-oncology, and epidemiological cohorts. The goal of the NCI CRDC is to allow researchers, clinicians, and patients to share important data and resources for advances in cancer research.

In addition, NCI is using Cancer Moonshot funding to support several other research projects that strengthen the ability of the cancer research community to share and analyze data:

Cancer Data Aggregator

To enable integration of data from CRDC repositories, a Cancer Data Aggregator (CDA) is being created that will support search and analysis across distinct data types. The CDA will allow researchers to combine data from diverse scientific domains and perform integrated analyses that can be shared with collaborators. The CDA will be supported by tools to ensure that data are described using common terminologies, which will support the discovery and linkage of data from different repositories.
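
As an illustration of why common terminologies matter for linking data across repositories, the following is a minimal sketch, not a description of how the CDA is actually built. The repository names, field names, and the `COMMON_TERMS` mapping are all hypothetical. Each repository's own field names are translated into shared terms, after which records that describe the same case can be combined.

```python
from collections import defaultdict

# Hypothetical mapping from each repository's field names to shared terms.
# None of these names come from the CDA; they are for illustration only.
COMMON_TERMS = {
    "genomic_repo": {"case": "case_id", "primary_site": "primary_site"},
    "imaging_repo": {
        "PatientID": "case_id",
        "BodyPart": "primary_site",
        "Modality": "imaging_modality",
    },
}


def harmonize(source, record):
    """Rename a record's fields to the shared terms, dropping unmapped fields."""
    mapping = COMMON_TERMS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def aggregate(records_by_source):
    """Combine harmonized records from all sources, keyed by shared case_id."""
    combined = defaultdict(dict)
    for source, records in records_by_source.items():
        for record in records:
            row = harmonize(source, record)
            case_id = row.get("case_id")
            if case_id is None:
                continue  # a record without a case identifier cannot be linked
            combined[case_id].update(row)
    return dict(combined)
```

In this sketch, a genomic record and an imaging record that share a case identifier end up in one combined entry, even though the two sources label that identifier differently.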

Privacy Preserving Patient Record Linkage Software

An important aspect of data sharing and the National Cancer Data Ecosystem is the ability to link data at the patient level across disparate data sources, without exposing identifiable information. NCI is evaluating approaches for generating unique patient identifiers that will enable linkage of patient-level data from different sources without sharing identifiable information beyond organizations authorized to hold such information. The software-generated identifiers will preserve the private information of cancer patients who are sharing their data with the cancer research community.
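
The page does not say which approaches NCI is evaluating. As one illustration of the general idea, the sketch below uses a common technique, keyed hashing (HMAC-SHA256) of normalized identifying fields. The function names, the choice of fields, and the key handling are assumptions made for this example only. Organizations that hold the same secret key would derive the same token for the same patient, while the token itself does not contain the patient's name or birth date.

```python
import hashlib
import hmac
import unicodedata


def normalize(value):
    """Strip accents, lowercase, and collapse whitespace so that minor
    formatting differences do not change the resulting token."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def linkage_token(first_name, last_name, birth_date, secret_key):
    """Derive a hex token from identifying fields using a keyed hash.

    secret_key is a bytes value assumed to be shared only among
    organizations authorized to hold identifiable information.
    """
    message = "|".join(normalize(p) for p in (first_name, last_name, birth_date))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()
```

Two sites that compute `linkage_token` with the same key and the same patient details obtain matching tokens and can link records on that value alone. A keyed hash is used here rather than a plain hash because, without the key, an outsider cannot regenerate tokens from guessed names and dates. Production systems typically add further safeguards that this sketch omits.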

NCI Office of Data Sharing

The NCI Office of Data Sharing was created in response to a Blue Ribbon Panel recommendation to advance the sharing of cancer research data. It coordinates data sharing policies across NCI and the cancer research community. The office manages the processes for submitting data to, and accessing data in, NCI online databases, and it provides education and outreach related to NCI data sharing policies. It also examines the uptake and use of NCI data.

NCI Genomics Evidence Neoplasia Information Exchange (GENIE) Supplements

NCI is promoting the sharing of genomic and clinical data through supplements to Cancer Centers that belong to the GENIE consortium. The American Association for Cancer Research (AACR) GENIE program, which includes NCI-supported Cancer Centers, is working to link genomic data with clinical outcomes from thousands of cancer patients. The program is also establishing standards for collecting and integrating clinical data from cancer patients.

The supplements support NCI-funded Comprehensive Cancer Centers involved in GENIE in sharing their GENIE genomic and clinical data with the GDC, ensuring that researchers outside the GENIE consortium also have access to these data.

Cancer Data Ecosystem Projects Awarded Cancer Moonshot Funding

Awarded Projects
Funding Opportunity | Project Title | Institution | Principal Investigator(s)
Genomic Data Analysis Network: Processing Genomic Data Center | Global Infrastructure for Collaborative High-Throughput Cancer Genomics Analysis | Broad Institute, Inc. | Getz, Gad A

If you would like to reproduce some or all of this content, see Reuse of NCI Information for guidance about copyright and permissions. In the case of permitted digital reproduction, please credit the National Cancer Institute as the source and link to the original NCI product using the original product's title; e.g., “Build a National Cancer Data Ecosystem was originally published by the National Cancer Institute.”