Introducing DAVE: Online Analysis Tools for the Genomic Data Commons
June 29, 2017, by Louis Staudt, M.D., Ph.D.
In the year since its launch, the Genomic Data Commons (GDC) has collected and harmonized more than 4.5 petabytes of cancer genomics data. These data have been fundamental to recent progress against cancer and hold promise for continued improvement in our ability to diagnose, treat, and care for patients.
But our vision for the GDC has been much grander than creating a data repository; we have been building the foundation for a knowledge system capable of in-depth analyses that will extend the reach and utility of the data to a wide community of scientists. Today, in transforming the GDC into this knowledge system with DAVE (Data Analysis, Visualization, and Exploration), we also hope to transform the research community.
Making the GDC Accessible and Usable for Many
The growth in cancer genomics has been one of the most exciting scientific and technological developments in cancer research, spurring significant advances in patient care and laying the groundwork for many future advances.
As a data-sharing platform, the GDC helps to standardize data collected from a diversity of patients and tumor types. The end result is a catalog of harmonized genomic and clinical data, all of which must meet rigorous quality standards and undergo processing using the latest methods. This repository of accurate and robust data will continue to grow as NCI and the research community submit new data to it.
Now, as a data-analysis system, the GDC is taking major steps toward engaging the broader research community and encouraging further collaboration and data sharing. Central to NCI’s decision to make such a large trove of data available to the public was the understanding that NCI and its grantees do not have all the skills, tools, and knowledge necessary to unearth every hidden gem in the data. To make diverse discoveries about an exceedingly complex disease, we need great scientific minds across different disciplines—from laboratory scientists to statisticians to drug developers—to work together.
DAVE helps the GDC fulfill its mission of making research data widely accessible and usable. It brings the information technology infrastructure required for downloading, storing, and analyzing big data directly to researchers, so that anyone in the cancer research community can easily work with the data available in the GDC. DAVE also makes it easier for experts in diverse areas of biology and other disciplines to incorporate cancer genomic data into their research.
DAVE: Data Analysis, Visualization, and Exploration
DAVE is a new web interface for exploring and analyzing cancer genomic data online, in real time, without the need to download or process the data. Researchers can navigate from project cohorts to individual patients to specific genes and mutations of interest. DAVE includes specialized graphs to help researchers visualize genomic “signatures” of cancer and identify potential drivers of disease. Users can also plot patient survival curves and identify the molecular consequence of a mutation for the resulting protein.
Notably, DAVE provides an unprecedented level of flexibility in exploring the data. Researchers can create custom cohorts for analysis by selecting patients with particular altered genes or other relevant biological and clinical features. And researchers are no longer bound to analyzing patients only in the context of their original project cohorts—a powerful innovation given the recent evidence that a tumor’s molecular features are far more accurate and informative for cancer subtyping than tissue of origin or histology.
This level of customization allows researchers to dive deeply into their area of expertise to answer a host of research questions.
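To make the idea of a custom cohort concrete, here is a minimal sketch in Python of selecting patients by altered genes across project boundaries. It is an illustration only and does not use or describe DAVE's actual implementation or any GDC interface: the `Case` record, the `build_cohort` function, the case IDs, and the project names are all hypothetical, and the sample records are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Hypothetical patient record; not a real GDC data structure."""
    case_id: str
    project: str
    primary_site: str
    mutated_genes: frozenset


def build_cohort(cases, required_genes, primary_sites=None):
    """Return cases that carry mutations in every required gene.

    If primary_sites is given, also restrict to those sites.
    The originating project is deliberately ignored, so the
    cohort can span several projects.
    """
    required = frozenset(required_genes)
    cohort = []
    for case in cases:
        # Keep only cases whose mutated genes include all required genes.
        if not required <= case.mutated_genes:
            continue
        # Optional clinical filter on the primary site.
        if primary_sites is not None and case.primary_site not in primary_sites:
            continue
        cohort.append(case)
    return cohort


# Invented example records (assumption: not real patient data).
CASES = [
    Case("case-001", "PROJECT-A", "lung", frozenset({"TP53", "KRAS"})),
    Case("case-002", "PROJECT-A", "lung", frozenset({"EGFR"})),
    Case("case-003", "PROJECT-B", "colon", frozenset({"TP53", "KRAS", "APC"})),
    Case("case-004", "PROJECT-C", "pancreas", frozenset({"KRAS"})),
]

# Cohort defined by molecular features rather than by original project.
cohort = build_cohort(CASES, {"TP53", "KRAS"})
cohort_ids = [c.case_id for c in cohort]
cohort_projects = sorted({c.project for c in cohort})
print(cohort_ids, cohort_projects)
```

In this toy example the resulting cohort draws patients from two different projects, which mirrors the point above that analysis no longer has to follow the original project cohorts.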
The Future of DAVE
NCI and GDC staff will work with the community to develop additional DAVE tools and features, and the community's response will play an important role in how this resource evolves.
We encourage your feedback on the function and usefulness of the research tools included in this initial release of what we believe will be a valuable resource. We’re hopeful that many in the research community will use the GDC and the DAVE tools to answer important research questions that will rapidly improve our understanding of cancer and, ultimately, benefit patients.