Developer Interview: Vincent Ferretti
November 21, 2016, by Amy E. Blum, M.A.
For this Developer Interview, CCG Communications Manager Amy E. Blum, M.A., spoke with Vincent Ferretti, Ph.D., Co-Principal Investigator for the Genomic Data Commons (GDC) from the Ontario Institute for Cancer Research (OICR).
Amy E. Blum: Tell me a little bit about what you do and your role in the GDC.
Vincent Ferretti: I lead a team of bioinformaticians and software developers at OICR, and we design and develop the GDC web interfaces, including the data portal and the data submission system. We collaborate with the University of Chicago, which works on the “back end” of the data, while we focus on the “front end,” which is what you see on the web.
AEB: How is the GDC data portal organized? And how does its organization improve the user experience?
VF: Right now the GDC has over 270,000 files. The most important consideration in the design is that researchers coming to the GDC need to be able to find the specific files – the genomic and clinical information – that interest them quickly and easily. To facilitate that, we organized the portal into projects, which have a descriptive page about them and the data they contain, and into cases, which contain all of the files associated with a particular patient.
We created a faceted search tool for the GDC portal that provides researchers with a custom list of facets, each of which gives an aggregate view of one variable. Researchers can use these facets to filter their search results with mouse clicks. Although simple, faceted browsing is surprisingly powerful. For example, users can readily formulate complex queries such as “all somatic mutation files called using the MUSE algorithm and generated from whole genome sequencing experiments in female patients with stage I brain cancer.” With another click, users can put the resulting files into a shopping cart, just as they would in an online store, and then download the files in the cart.
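The faceted browsing described above can be illustrated with a minimal Python sketch. It is not the GDC portal's implementation: the records, the field names (data_type, workflow, strategy, gender, stage, site) and the helper functions facet_counts and apply_filters are all hypothetical, chosen only to mirror the example query in the interview.

```python
from collections import Counter

# Hypothetical file records. Field names and values are illustrative only
# and are NOT the GDC's actual schema.
FILES = [
    {"file_id": "f1", "data_type": "somatic mutation", "workflow": "MUSE",
     "strategy": "WGS", "gender": "female", "stage": "I", "site": "brain"},
    {"file_id": "f2", "data_type": "somatic mutation", "workflow": "MUSE",
     "strategy": "WXS", "gender": "female", "stage": "I", "site": "brain"},
    {"file_id": "f3", "data_type": "somatic mutation", "workflow": "MuTect2",
     "strategy": "WGS", "gender": "male", "stage": "II", "site": "brain"},
    {"file_id": "f4", "data_type": "gene expression", "workflow": "HTSeq",
     "strategy": "RNA-Seq", "gender": "female", "stage": "I", "site": "bladder"},
]


def facet_counts(files, field):
    """Aggregate view of one facet: number of files carrying each value."""
    return Counter(f[field] for f in files)


def apply_filters(files, selected):
    """Keep files whose value for every selected facet is among the chosen values.

    `selected` maps a facet name to the set of values the user clicked.
    """
    return [
        f for f in files
        if all(f[field] in values for field, values in selected.items())
    ]


# Each click adds a value to one facet; together they form the complex query.
selected = {
    "data_type": {"somatic mutation"},
    "workflow": {"MUSE"},
    "strategy": {"WGS"},
    "gender": {"female"},
    "stage": {"I"},
    "site": {"brain"},
}
hits = apply_filters(FILES, selected)

# The "shopping cart" is simply the list of file IDs the user chose to keep.
cart = [f["file_id"] for f in hits]
```

In this sketch, values within one facet are combined with OR and different facets with AND, which is the usual convention in faceted search. Whether the portal uses exactly that rule is an assumption here.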
AEB: What was your creative process for designing the portal?
VF: Our best design tools are use-cases. These are hypothetical research scenarios that get us thinking about what kind of system would facilitate these particular cases. For example, cancer researchers often focus on a particular cancer type, and within that cancer, a modality of genomic analysis. A use-case might be a researcher who wants to download files describing the DNA mutations of all patients with bladder cancer from the TCGA project. We then built the portal around these use-cases.
Another important source of guidance on the portal comes from our contract with the University of Chicago. There are certain functionalities of the portal that are required, and those helped give us a strong foundation from which we can design additional functions.
AEB: What’s next for the GDC portal?
VF: We are working on expanding the search capabilities of the portal as well as incorporating data visualization tools.
To enhance the portal’s searchability, we are annotating individual genes and mutations and incorporating them into the search function, and we are creating a page for every cancer gene and mutation. These pages will be automatically and dynamically populated by information from the GDC, such as the rate of a particular mutation across cancers or the most common mutations of a certain gene. This information will be interconnected with all of the other GDC data, making it easy to search, for example, for all lung cancer cases that have a particular KRAS mutation, or to view a summary of the KRAS gene.
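The kinds of summaries mentioned here (the rate of a mutation across cancers, the most common mutations of a gene, and the cases carrying a given mutation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observations and case totals below are made up, and the function names most_common_mutations, mutation_rate and cases_with are hypothetical rather than part of any GDC interface.

```python
from collections import Counter, defaultdict

# Made-up observations: (case_id, cancer_type, gene, mutation).
# Each case appears at most once per mutation in this toy data.
OBS = [
    ("c1", "lung", "KRAS", "G12C"),
    ("c2", "lung", "KRAS", "G12D"),
    ("c3", "lung", "EGFR", "L858R"),
    ("c4", "pancreas", "KRAS", "G12D"),
    ("c5", "pancreas", "KRAS", "G12D"),
]

# Made-up total number of cases per cancer type, including cases
# that carry none of the mutations listed above.
CASES_PER_CANCER = {"lung": 4, "pancreas": 2}


def most_common_mutations(obs, gene, n=3):
    """Most frequently observed mutations of one gene, as (mutation, count) pairs."""
    counts = Counter(m for _, _, g, m in obs if g == gene)
    return counts.most_common(n)


def mutation_rate(obs, totals, gene, mutation):
    """Fraction of cases in each cancer type that carry the given mutation."""
    carriers = defaultdict(set)
    for case_id, cancer, g, m in obs:
        if g == gene and m == mutation:
            carriers[cancer].add(case_id)
    return {cancer: len(carriers[cancer]) / total
            for cancer, total in totals.items()}


def cases_with(obs, cancer, gene, mutation):
    """Case IDs of one cancer type that carry the given mutation."""
    return sorted(c for c, ct, g, m in obs
                  if ct == cancer and g == gene and m == mutation)


top_kras = most_common_mutations(OBS, "KRAS")
g12d_rate = mutation_rate(OBS, CASES_PER_CANCER, "KRAS", "G12D")
lung_g12c_cases = cases_with(OBS, "lung", "KRAS", "G12C")
```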
Once researchers have found their cohort of interest, we want them to be able to derive insight about that cohort directly from the GDC. To accomplish this, my team is designing data visualization tools to interface with user searches. These tools will provide, for example, a map of the DNA mutations in a particular set of GDC cases, including both what the mutations are and where they occur in the genome. The visualization tools will also integrate different types of genomic data, such as copy number alterations, epigenetics, and gene expression, into a dynamic and readable output. We also want users to be able to visualize the differences between two cohorts. For instance, a researcher could examine how head and neck cancers caused by HPV infection differ from HPV-negative cancers in terms of their molecular profiles. Our team began our design by creating use-cases for data visualizations, and we are now implementing tools that will let future GDC users answer those types of research questions.