Genomic Data Commons
Cancer is fundamentally a disease of the genome, caused by mutations and other harmful genomic changes that alter gene function and contribute to the malignant behavior of cancer cells. Genomic aberrations can influence how aggressive a tumor is and how it responds to particular drugs.
The NCI Genomic Data Commons (GDC) contains standardized data from more than 33,000 patients with cancer. About 14,500 of those cases are derived from large-scale NCI programs, such as The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET).
The GDC also includes data from about 18,000 cancer patients provided by Foundation Medicine, Inc., a molecular information company, and more than 1,000 patients with multiple myeloma contributed by the Multiple Myeloma Research Foundation (MMRF), a nonprofit advocacy organization.
These represent some of the largest and most comprehensive cancer genomics datasets in the world, together comprising more than three petabytes of data.
Breaking Down Research Barriers
As the success of the landmark TCGA program demonstrates, unlocking the knowledge contained in these huge datasets requires collaboration and data sharing across the cancer research community.
In the current cancer research landscape, several barriers prevent most researchers from fully exploiting the genomic data that are available. The GDC addresses each of these barriers as follows:
- Genomic data from different projects, clinical trials, and cancer types are siloed in different locations with local management systems, making it difficult to share the data.
The GDC brings genomics datasets and associated clinical data into one location that any researcher may access.
- The data are often generated using different methods, so even when researchers can access two different datasets, they often cannot combine them in a single study.
The GDC “harmonizes” the data, enabling datasets generated from different protocols to be studied side-by-side. Combining these datasets also increases their potential analytical power.
- Sophisticated analysis tools that allow researchers to derive useful knowledge from large, complex datasets are not available to all researchers.
The GDC offers Data Analysis, Visualization, and Exploration (DAVE) tools to empower the broader cancer research community. The latest analytic technologies are applied to GDC data, allowing researchers to select a custom cohort of patients to study, perform a variety of analyses, and produce publication-ready figures using an interactive web interface.
The GDC makes these data and tools available from secure servers operated by the University of Chicago Center for Data Intensive Science, and the NCI Cancer Genomics Cloud Pilots make GDC data available in a secure cloud computing environment. By democratizing access to these resources, the GDC makes it possible for any researcher to ask new and fundamental questions about cancer.
A Strong Foundation for Cancer Research
The NCI GDC is more than just a data repository; it is a cancer analysis system that continues to evolve by encouraging independent groups such as clinical research consortia, companies, and advocacy organizations to contribute their own cancer genomic data. Submitters can use GDC tools to analyze their data and compare their results with other datasets in the GDC. Within 6 months, contributed data are made available to qualified researchers. This expands the genomic and clinical data available to the cancer research community, deepens our understanding of cancer mechanisms, and enables advances in cancer diagnosis and treatment.
The GDC will also house data from a new era of NCI programs that will sequence the DNA of patients enrolled in NCI clinical trials. These datasets will lead to a much deeper understanding of which therapies are most effective for individual cancer patients.
With each new addition, the GDC will evolve into a smarter, more comprehensive knowledge base that will foster important discoveries in cancer research and increase the success of cancer treatment for patients.
This NCI initiative is being built and managed by the University of Chicago Center for Data Intensive Science, in collaboration with the Ontario Institute for Cancer Research and under a subcontract with Leidos Biomedical Research.
Learn more about the GDC from CCG Director Louis M. Staudt, M.D., Ph.D., and other GDC experts in their article "Toward a Shared Vision for Cancer Genomic Data," published in the New England Journal of Medicine. To find out more about the resources provided for data access, analysis, and submission, download the GDC Fact Sheet.