A brief tour of DAVE: The Genomic Data Commons analysis toolkit
September 12, 2017, by Zhining Wang, Ph.D.
At the end of June, the Genomic Data Commons (GDC) released DAVE: Data Analysis, Visualization, and Exploration Tools. As Project Officer for the GDC and Biomedical Informatics Specialist at NCI’s Center for Cancer Genomics (CCG), I am excited about the major milestone that DAVE represents.
When we launched the GDC, genomic and clinical data from thousands of cancer patients became accessible to researchers via download, but sophisticated tools and skills were still required to analyze the data. Our ultimate goal with DAVE is to make the data readily usable by anyone in the research community. With DAVE, users can perform instant analyses right in the web browser. That means a range of researchers and clinicians can integrate the data into their research without specialized bioinformatics knowledge or skills.
Getting Started with DAVE
To launch DAVE, go to portal.gdc.cancer.gov/exploration, or click the purple “Exploration” button from the main data portal. Importantly, there are two sets of tabs: on the left are tabs with filtering options, and towards the center are tabs with viewing or output options.
To view cases by primary site, select one or more sites under the Cases filter menu, such as “Breast” and “Ovary.”
Clicking on the Genes view tab in the center of the screen reveals the following for the selected cases: a plot of the most frequently mutated genes, a plot of the overall survival, and a table of mutated genes sorted by the percentage of affected cases.
Each gene entry in the table includes a Survival button, which will redraw the survival plot with patients grouped according to the presence or absence of a mutation in that gene.
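For readers curious about what such a regrouped plot represents, the idea can be sketched as a Kaplan-Meier estimate computed separately for cases with and without a mutation. This is a minimal Python illustration, not DAVE's implementation: the post does not say which estimator the portal uses, so Kaplan-Meier is an assumption, and the function names and follow-up values below are hypothetical toy data.

```python
from typing import Dict, List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps for one group.

    times  -- follow-up time for each case
    events -- True if the case died at that time, False if censored
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        leaving = 0
        # Gather every case whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            deaths += int(order[i][1])
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk cases that survive past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_by_mutation(
    cases: List[Tuple[float, bool, bool]]
) -> Dict[str, List[Tuple[float, float]]]:
    """Split (time, died, mutated) records by mutation status and fit each group."""
    groups = {True: ([], []), False: ([], [])}
    for time, died, mutated in cases:
        groups[mutated][0].append(time)
        groups[mutated][1].append(died)
    return {
        ("mutated" if flag else "not mutated"): kaplan_meier(t, e)
        for flag, (t, e) in groups.items()
    }


# Hypothetical toy records: (follow-up time, died?, gene mutated?)
toy_cases = [
    (5, True, True), (8, True, True), (12, False, True), (20, True, True),
    (10, True, False), (15, False, False), (30, False, False), (40, True, False),
]
curves = survival_by_mutation(toy_cases)
for label, curve in curves.items():
    print(label, curve)
```

Each curve is a list of steps: the survival probability drops only at times when a death is observed, while censored cases simply leave the at-risk pool.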
Many studies show that the location and type of mutation can make a major difference in the mutation’s role in the disease. To view the effect of a particular mutation, go to the Genes filter menu on the left and type the name of a gene, e.g., “TP53.” Then go to the Mutations view tab in the center for the table of TP53 mutations. Click on the Survival button at the end of a row to view how a particular mutation affects survival in the cohort of patients under investigation.
Click on any DNA change and scroll to the bottom of the page for the protein “lolliplot.” This plot overlays the mutation and frequency information on top of the protein functional domains for the chosen transcript model.
Lastly, the OncoGrid displays the cases with the most mutations and the 50 most mutated genes.
In addition to visualizing the most frequently occurring mutations, this tool can also help identify combinations of mutations that may be biologically meaningful, such as the mutual exclusivity of EGFR and KRAS in lung adenocarcinoma.
To highlight this example, clear any previous selections and select “TCGA-LUAD” from “Project” under the Cases filter menu. Under the Genes filter menu, select “True” for “Is Cancer Gene Census.” Select the OncoGrid view. Along the left edge of the OncoGrid, drag and drop “EGFR” so that it is beneath “KRAS.” The cases will automatically reorder, revealing that there is only a single case with mutations in both genes.
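The pattern that the reordered grid makes visible can also be expressed as a simple tally of which cases carry a mutation in each gene. The Python sketch below is only an illustration of that counting step: the function name and the case identifiers are hypothetical placeholders, not GDC data, and the portal itself requires no code.

```python
from typing import Dict, Set


def cooccurrence(cases_a: Set[str], cases_b: Set[str], all_cases: Set[str]) -> Dict[str, int]:
    """Count cases mutated in both genes, in only one gene, or in neither."""
    return {
        "both": len(cases_a & cases_b),
        "only_a": len(cases_a - cases_b),
        "only_b": len(cases_b - cases_a),
        "neither": len(all_cases - cases_a - cases_b),
    }


# Hypothetical case identifiers, for illustration only.
cohort = {f"case-{n}" for n in range(1, 11)}
kras_mutated = {"case-1", "case-2", "case-3", "case-4"}
egfr_mutated = {"case-4", "case-5", "case-6"}

counts = cooccurrence(kras_mutated, egfr_mutated, cohort)
print(counts)
```

A small "both" count relative to the "only" counts is the signature of mutual exclusivity that the OncoGrid shows visually; a formal claim would still need a statistical test on real cohort data.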
What Sets GDC and DAVE Apart From Other Genomic Data Platforms?
While some other useful genomics tools exist, we are striving to make DAVE a uniquely valuable resource:
- The GDC is one of the most comprehensive sources of high-quality, harmonized cancer genomics data. Over the last year, the GDC team has collected, quality checked, and processed over 450 TB of data, a feat in itself. All of the data are aligned to the latest genome build and processed with state-of-the-art, standardized pipelines, facilitating cross-project comparisons. In most cases, genomic data are associated with rich clinical data.
- Multiple robustly tested mutation caller pipelines. Identifying somatic mutations from various types of cancer sequencing data is a complex and continuously evolving craft. After extensive testing, our experts have selected four different methods to help ensure that we call a comprehensive list of mutations.
- Customizable cohorts let users interrogate the cases that are relevant to their work. This is my favorite feature. Researchers need this flexibility because, in many cases, cancers are better classified by molecular features than by tissue of origin or histology. Researchers can build a custom cohort based on tissue type, gene, mutation, and patient features using the Cases filter menu.
Help Us Help the Research Community
We’re very proud that the DAVE release has generated a lot of excitement from the research community. “I really like the way DAVE enables access to different aspects of data per cohort, patient, gene, and variant with different pages linking to each other,” said JianJiong Gao, Lead Scientist for Knowledge Systems at Memorial Sloan Kettering Cancer Center, who participated in user acceptance testing. “DAVE is a great start, but there [is] a lot to be done in order to serve the diverse user base in the research community,” he continued.
Indeed, the recent DAVE release is just the beginning, and the GDC team is working continuously to build a transformative research tool. We have recently added features, such as uploading user-defined gene sets and incorporating longitudinal data into the GDC model. We also have major plans for additional tools related to gene expression, alternative splicing, copy number analyses, further integration with external databases, and much more.
We truly want GDC data and resources to help as many people as possible, from cancer biologists to clinicians to patients. In addition, the GDC data model and interface are being adopted by other initiatives, for example to support investigations of the microbiome. Going forward, input from the community will play an important role in shaping our software development, and we welcome suggestions for new features from GDC users. With your help and feedback, we hope DAVE will become an invaluable resource that changes how genomic data are used in cancer research and beyond.
Please send questions and comments to firstname.lastname@example.org.