Computational Genomics Research
Computational genomics research, also called bioinformatics, derives knowledge from large genomic datasets using statistical and computational approaches. With expert programming, computers can identify exceedingly complex relationships in cancer genomic databases that contain as much information as hundreds of thousands of DVDs, file cabinets piled nearly to the moon, or years of HD video streaming. Beyond their size, the data are made more complex by the multiple genomic platforms that generate them, including analyses of DNA sequence, chromosomal copy number, rearrangements, RNA expression, and epigenetics. Computational research on The Cancer Genome Atlas (TCGA) has already enabled the development of algorithms that can improve cancer diagnostics and personalize treatment.
Key research questions in this field include:
- Can analyzing cancer genomic datasets compiled from very large numbers of patients increase our power to discover new cancer driver mutations?
- What are the best ways to display cancer genomic data such that cancer researchers can explore and visualize large, complex datasets?
- How can investigators effectively integrate data from multiple modes of genomic analysis into a unified view of oncogenic pathways?
- What new technologies, such as single-cell DNA and RNA sequencing, provide fresh views of cancer mechanisms?
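The first question above, on cohort size and discovery power, can be made concrete with a simple calculation. The sketch below is illustrative only: it assumes a gene is mutated by chance in a fraction background_rate of tumors and in a fraction driver_rate when it is a true driver, and it uses a one-sided exact binomial test as a stand-in for the more elaborate methods used in practice. The function names, rates, and significance threshold are hypothetical and are not taken from TCGA analyses.

```python
from math import exp, lgamma, log


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that large cohorts do not overflow.
    """
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def critical_count(n, background_rate, alpha):
    """Smallest mutated-patient count whose chance probability is <= alpha."""
    for k in range(n + 1):
        if binom_tail(k, n, background_rate) <= alpha:
            return k
    return n + 1  # no achievable count is significant at this alpha


def detection_power(n, background_rate, driver_rate, alpha=1e-6):
    """Probability that a driver gene exceeds the significance threshold."""
    k = critical_count(n, background_rate, alpha)
    return binom_tail(k, n, driver_rate)


if __name__ == "__main__":
    # Hypothetical rates: 1% chance mutations versus 3% in a driver gene.
    for cohort in (200, 500, 1000, 2000):
        power = detection_power(cohort, 0.01, 0.03)
        print(f"{cohort:>5} patients: power = {power:.3f}")
```

Under these assumed rates, power rises steeply with cohort size, which is the intuition behind pooling data from very large numbers of patients.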
Tools and Methods
Computational genomics applies algorithms and statistical models to large datasets. CCG generates large genomic and clinical datasets through the Genome Characterization Pipeline, shares data through the Genomic Data Commons (GDC), and makes data accessible on commercial clouds by partnering with the Cancer Genomics Cloud Pilots. Members of CCG’s Genomic Data Analysis Network and external researchers from around the world develop and apply a range of computational techniques to analyze data in the GDC.
Programs and Collaborations
Three Cloud Pilots, developed and implemented through contracts from NCI’s Center for Biomedical Informatics & Information Technology, democratize access to CCG data by making the data available in a secure, compliant, and scalable cloud environment. The Cloud Pilots reduce the costs of large-scale computation and enable all researchers to perform computationally intensive genomic analyses.
CCG established the NCI Genomic Data Commons as a unified knowledge base that integrates and stores diverse datasets from CCG’s programs as well as data from independent submitters. The GDC facilitates investigation of its robust genomic data by enabling users to repeatedly mine the data and combine it with data from their own research or from third parties, and by providing researchers with co-located visualization and analysis tools.
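As an illustration of what programmatic mining of the GDC can look like, the sketch below builds a query against the GDC's public REST API to list projects belonging to one program. It is a minimal example under stated assumptions: the base URL, the projects endpoint, the filter field program.name, and the requested fields reflect the public GDC API documentation as understood here and should be checked against the current documentation before use. The helper names build_projects_query and fetch_projects are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public GDC API; verify against current GDC docs.
GDC_API = "https://api.gdc.cancer.gov"


def build_projects_query(program="TCGA",
                         fields=("project_id", "name", "primary_site"),
                         size=100):
    """Build a URL that asks the projects endpoint for one program's projects."""
    # GDC filters are JSON objects of the form {"op": ..., "content": {...}}.
    filters = {
        "op": "in",
        "content": {"field": "program.name", "value": [program]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API}/projects?{urlencode(params)}"


def fetch_projects(url):
    """Send the request and return the list of matching project records.

    Requires network access; assumes the response nests results
    under data -> hits.
    """
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["data"]["hits"]


if __name__ == "__main__":
    query_url = build_projects_query()
    print(query_url)
    # Uncomment to query the live service:
    # for project in fetch_projects(query_url):
    #     print(project.get("project_id"), project.get("name"))
```

Separating URL construction from the network call keeps the query easy to inspect and reuse, for example when combining GDC results with data from a researcher's own study.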