Genomic Data Commons
Cancer is fundamentally a disease of the genome, caused by mutations and other harmful genomic changes that alter the genome's function and contribute to the malignant behavior of cancer cells. Genomic aberrations can influence how aggressive a tumor is and how it responds to particular drugs.
The GDC contains standardized data on approximately 14,500 cancer patients, derived from large-scale NCI programs such as:
- The Cancer Genome Atlas (TCGA)
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
These represent some of the largest and most comprehensive cancer genomics datasets in the world, together comprising more than two petabytes of data.
The GDC will also soon expand to provide data on over 30,000 cancer patients. The expansion will come from the submission of data on about 18,000 cancer patients by Foundation Medicine, Inc., a molecular information company, and on over 1,000 patients with multiple myeloma by the Multiple Myeloma Research Foundation (MMRF), a non-profit advocacy organization.
Breaking Down Research Silos
As the success of the landmark TCGA program demonstrates, releasing the knowledge available in these huge datasets requires collaboration and data sharing across the cancer research community.
In today's cancer research framework, several barriers prevent most researchers from fully exploiting the available genomic data, impeding progress:
- Genomic data from different projects, clinical trials, and cancer types are siloed in different locations with local management systems, making it difficult to share the data.
- The data are often generated using different methods, so that even if researchers can access two different datasets, they cannot use both in a single study.
- Sophisticated analysis tools that allow researchers to derive useful knowledge from large, complex datasets are not available to all researchers.
The NCI GDC breaks down these barriers by bringing cancer genomics datasets and associated clinical data into one location that any researcher may access, and “harmonizing” the data so that datasets that were generated with different protocols can be studied side by side. Then, by making these data available using secure and compliant cloud technology through the NCI Cancer Genomics Cloud Pilots, the GDC will make it possible for any researcher to ask new and fundamental questions about cancer.
A Strong Foundation for Cancer Research
The NCI GDC is more than just a data repository; it continues to evolve by encouraging independent groups such as clinical research consortia, companies, and advocacy organizations to submit their own cancer genomic data to the GDC. Submitters can analyze the data that they contributed in concert with other datasets in the GDC, while expanding the resources available to the cancer research community.
The GDC will also house data from a new era of NCI programs that will sequence the DNA of patients enrolled in NCI clinical trials. These datasets will lead to a much deeper understanding of which therapies are most effective for individual cancer patients.
With each new addition, such as that from Foundation Medicine, Inc. and others, the GDC will evolve into a smarter, more comprehensive knowledge base that will foster important discoveries in cancer research and increase the success of cancer treatment for patients.
This NCI initiative is being built and managed by the University of Chicago Center for Data Intensive Science, in collaboration with the Ontario Institute for Cancer Research and under a subcontract with Leidos Biomedical Research.
Learn more about the GDC from CCG Director Louis M. Staudt, M.D., Ph.D., and other GDC experts in their article “Toward a Shared Vision for Cancer Genomic Data,” published in the New England Journal of Medicine. To find out more about the resources provided for data access, analysis, and submission, download the GDC Fact Sheet.