Computational Genomics Research

Shown is Joshua Miller, lead software architect for the Genomic Data Commons, University of Chicago, looking at computer screens of GDC data.

The Genomic Data Commons (GDC) was launched at the University of Chicago on June 6, 2016. GDC is a unified data system that promotes sharing of genomic and clinical data between researchers.

Credit: Univ. of Chicago

Key Questions

  • Can analyzing cancer genomic datasets compiled from an exceedingly large number of patients increase our power to discover new cancer driver mutations?
  • What are the best ways to display cancer genomic data such that cancer researchers can explore and visualize large, complex datasets?
  • How can investigators effectively integrate data from multiple modes of genomic analysis into a unified view of oncogenic pathways?
  • What new technologies, such as single-cell DNA and RNA sequencing, provide fresh views of cancer mechanisms?
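
To make the first question concrete, here is a minimal sketch of why cohort size matters. It assumes a deliberately simplified binomial model in which each patient independently carries a given mutation with a fixed prevalence, and it asks how likely the mutation is to appear in at least a handful of patients. The function name, the prevalence value, and the minimum carrier count are illustrative assumptions, not parameters taken from any CCG analysis, and real driver discovery methods also account for background mutation rates and other factors.

```python
from math import comb


def detection_probability(n_patients: int, prevalence: float, min_carriers: int = 3) -> float:
    """Probability that at least `min_carriers` of `n_patients` carry a mutation.

    Assumes a simple binomial model (hypothetical): each patient carries the
    mutation independently with probability `prevalence`.
    """
    # Probability of seeing fewer than `min_carriers` carriers in the cohort.
    p_fewer = sum(
        comb(n_patients, k) * prevalence**k * (1 - prevalence) ** (n_patients - k)
        for k in range(min_carriers)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Illustrative values only: a mutation present in 0.5% of patients.
    for n in (100, 1_000, 10_000):
        print(n, round(detection_probability(n, prevalence=0.005), 3))
```

Under these assumptions, a rare mutation that would almost never recur in a cohort of 100 patients becomes very likely to recur in a cohort of several thousand, which is the intuition behind compiling data from very large numbers of patients.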

Tools and Methods

Computational genomics applies algorithms and statistical models to large datasets. NCI’s Center for Cancer Genomics (CCG) generates large genomic and clinical datasets through the Genome Characterization Pipeline, shares data through the Genomic Data Commons (GDC), and makes data accessible on commercial clouds by partnering with the NCI Cloud Resources. Members of CCG’s Genomics Data Analysis Network and external researchers from around the world develop and apply a range of computational techniques to analyze data in the GDC.

Programs and Collaborations

NCI Cloud Resources

Three Cloud Resources (formerly Cancer Genomics Cloud Pilots), developed and implemented through contracts from NCI’s Center for Biomedical Informatics & Information Technology, democratize access to CCG data by making the data available in a secure, compliant, and scalable cloud environment. The Cloud Resources reduce the costs of large-scale computation and enable all researchers to perform computationally intensive genomic analyses. Read our blog post discussing how Cancer Genomics Cloud Pilots democratize and expand access to CCG’s data.

Genomic Data Commons

CCG established the NCI Genomic Data Commons as a unified knowledge base that integrates and stores diverse datasets from CCG’s programs as well as data from independent submitters. The GDC facilitates investigation of its robust genomic data by enabling users to repeatedly mine the data and combine it with data from their own research or from third parties, and by providing researchers with co-located visualization and analysis tools.

  • Posted: August 4, 2017
