Computational Genomics Research

Shown is Joshua Miller, lead software architect for the Genomic Data Commons at the University of Chicago, looking at computer screens displaying GDC data.

The Genomic Data Commons (GDC) was launched at the University of Chicago on June 6, 2016. GDC is a unified data system that promotes sharing of genomic and clinical data between researchers.

Credit: Univ. of Chicago

Computational genomics research, also called bioinformatics, derives knowledge from large genomic datasets using statistical and computational approaches. With expert programming, computers can identify exceedingly complex relationships in cancer genomic databases that contain as much information as hundreds of thousands of DVDs, file cabinets piled nearly to the moon, or years of HD video streaming. Beyond sheer size, the data are made more complex by the multiple genomic platforms that generate them, including analyses of DNA sequence, chromosomal copy number, rearrangements, RNA expression, and epigenetics. Computational research on The Cancer Genome Atlas (TCGA) has already enabled the development of algorithms that can improve cancer diagnostics and personalize treatment.

Key Questions

  • Can analyzing cancer genomic datasets compiled from an exceedingly large number of patients increase our power to discover new cancer driver mutations?
  • What are the best ways to display cancer genomic data such that cancer researchers can explore and visualize large, complex datasets?
  • How can investigators effectively integrate data from multiple modes of genomic analysis into a unified view of oncogenic pathways?
  • What new technologies, such as single-cell DNA and RNA sequencing, provide fresh views of cancer mechanisms?

Tools and Methods

Computational genomics applies algorithms and statistical models to big datasets. CCG generates large genomic and clinical datasets through the Genome Characterization Pipeline, shares data through the Genomic Data Commons (GDC), and makes data accessible on commercial clouds by partnering with the Cancer Genomics Cloud Pilots. Members of CCG’s Genomics Data Analysis Network and external researchers from around the world develop and apply a range of computational techniques to analyze data in the GDC.
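The first key question, whether larger cohorts increase the power to find driver mutations, can be made concrete with a simple statistical model. The sketch below is an illustration under stated assumptions, not a method described by CCG: it treats each patient as an independent sample, assumes a fixed background mutation rate for a gene, and applies a one-sided binomial test. The function names, the rates, and the significance threshold are all hypothetical.

```python
from math import exp, lgamma, log


def binom_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so large cohorts do not overflow.
    """
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p)
        )
        total += exp(log_pmf)
    return total


def critical_count(n, background, alpha):
    """Smallest mutated-patient count that background alone rarely reaches."""
    for k in range(n + 2):
        if binom_tail(n, k, background) <= alpha:
            return k
    return n + 1  # not reached: the tail is empty (zero) at k = n + 1


def detection_power(n, background, true_rate, alpha=1e-6):
    """Probability that a gene mutated at true_rate passes the test."""
    k = critical_count(n, background, alpha)
    return binom_tail(n, k, true_rate)


if __name__ == "__main__":
    # Hypothetical gene: 0.5% background rate, mutated in 3% of tumors.
    for cohort in (100, 500, 2000):
        power = detection_power(cohort, background=0.005, true_rate=0.03)
        print(f"{cohort:>5} patients: power = {power:.3f}")
```

Under these assumed rates, a cohort of 100 patients rarely detects the gene, while a cohort of 2,000 detects it almost every time. Real driver-discovery methods model gene-specific and patient-specific background rates, so this shows only the direction of the effect.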

Programs and Collaborations

Cancer Genomics Cloud Pilots

Three Cloud Pilots, developed and implemented through contracts from NCI’s Center for Biomedical Informatics & Information Technology, democratize access to CCG data by making the data available in a secure, compliant, and scalable cloud environment. The Cloud Pilots reduce the costs of large-scale computation and enable all researchers to perform computationally intensive genomic analyses. Read our blog post discussing how Cancer Genomics Cloud Pilots democratize and expand access to CCG's data.

Genomic Data Commons

CCG established the NCI Genomic Data Commons as a unified knowledge base that integrates and stores diverse datasets from CCG’s programs as well as data from independent submitters. The GDC facilitates investigation of its robust genomic data by enabling users to repeatedly mine the data and combine it with data from their own research or from third parties, and by providing researchers with co-located visualization and analysis tools.
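As a rough illustration of how a researcher might mine GDC data programmatically, the sketch below builds a query for TCGA projects. It assumes the public GDC REST API is served at api.gdc.cancer.gov and accepts `filters`, `fields`, `format`, and `size` query parameters on its `projects` endpoint; check these against the current GDC API documentation before relying on them. The helper names `build_projects_query` and `fetch_projects` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the public GDC REST API.
GDC_API = "https://api.gdc.cancer.gov"


def build_projects_query(program="TCGA",
                         fields=("project_id", "name", "primary_site"),
                         size=10):
    """Build a URL that lists projects belonging to one program."""
    # Assumption: filters are a JSON object with "op" and "content" keys.
    filters = {
        "op": "in",
        "content": {"field": "program.name", "value": [program]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "json",
        "size": str(size),
    }
    return f"{GDC_API}/projects?{urlencode(params)}"


def fetch_projects(url, timeout=30):
    """Send the query and return the list of matching project records.

    Assumption: the response JSON nests results under data -> hits.
    """
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["hits"]


if __name__ == "__main__":
    # Needs network access; prints whatever the service returns.
    for project in fetch_projects(build_projects_query()):
        print(project.get("project_id"), "-", project.get("name"))
```

Building the URL separately from sending it lets the query be inspected or logged before any data is requested.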

  • Posted: August 4, 2017

Most text on the National Cancer Institute website may be reproduced or reused freely. The National Cancer Institute should be credited as the source and a link to this page included, e.g., “Computational Genomics Research was originally published by the National Cancer Institute.”

