What is Data Visualization?
Data visualization is the representation of data in the form of elements such as charts, pictures, networks, etc.
Whether you’re working with data that you’ve collected yourself (which you want to present in a way that makes sense) or you want to explore existing data to form new hypotheses, data visualization can help!
At this stage in the data science lifecycle, you choose from a variety of available tools to gain a deeper understanding of the data you’re working with and clearly communicate insights with others.
Why is Data Visualization Important for Cancer Research?
Effectively analyzing data and sharing research results is essential to advancing cancer research. By embracing data visualization, you and many other individuals across the cancer research and care continuum can better analyze and understand diverse data.
Data visualization can:
- clarify complex or large data.
- generate broader interest from the research community.
- improve analysis.
- tell a story.
- identify relationships between data.
- reveal trends.
- communicate results.
- enable insights.
- reveal outliers.
What Do I Need to Know?
Data Visualization Concepts
You’ll want to know the basics of some of the most popular charts and how they’re used in data visualization for cancer research!
Example Graphic | Definition | Example of Use in Cancer Research Data Visualization |
---|---|---|
Bar Chart | Bar Chart: Uses horizontal or vertical bars to show discrete, numerical comparisons across categories. One axis of the chart shows the categories, and the other is a discrete value scale. | The bar chart is a frequently used chart type. Bar charts are good for use with genomics data where you want to know if the expression of a gene is up, down, etc. |
Gantt Chart | Gantt Chart: Displays a list of activities or tasks with their duration over time for organizational purposes. | Gantt charts are useful when presenting timelines in a grant proposal or funding request. |
Heat Map | Heat Map: Visualizes data through variation in coloring applied to a tabular format. | Heat maps are good for showing value across multiple variables to reveal patterns. This graphic is common for visualizing genomics data. |
Histogram Diagram | Histogram: Visualizes the distribution of data over a continuous interval by grouping values into ranges (bins) and showing how many values fall in each. | A histogram may be useful to compare age range data for your cancer research (e.g., adults 18–25, adults 26–35, etc.). |
Network Diagram | Network Diagram: Shows how things are interconnected by linking nodes of data with lines to represent their connections. | A network diagram can help analyze relationships between cancer occurrences in various communities. |
Pie Chart | Pie Chart: Breaks a circle into segments to illustrate proportions and percentages between categories. | Pie charts can be helpful for showing population data. |
Scatter Plot | Scatterplot: Places points on a Cartesian coordinate system to show the relationship between two sets of data. | The scatterplot is a frequently used chart type. Once you make a scatterplot, you can draw a curve through the data points using a mathematical formula. You might use this chart for dose response curves. |
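
To make the histogram row above more concrete, here is a minimal Python sketch of the binning step that sits behind any histogram. The ages are made-up illustrative values, not real study data. The first two age ranges follow the table's example, while the later ranges and the function names `bin_label` and `histogram_counts` are assumptions introduced only for this illustration. It uses the standard library only and prints a simple text histogram rather than a graphic.

```python
from collections import Counter

# Hypothetical patient ages; illustrative values only, not real study data.
ages = [19, 22, 24, 27, 31, 33, 35, 38, 41, 44, 45, 52]

# Inclusive (low, high) age ranges. The first two follow the table's example;
# the remaining ranges are assumed for illustration.
bins = [(18, 25), (26, 35), (36, 45), (46, 55)]


def bin_label(age, bins):
    """Return the 'low-high' label of the bin containing age, or None."""
    for low, high in bins:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def histogram_counts(ages, bins):
    """Count how many ages fall into each bin, keeping empty bins at 0."""
    counts = Counter(bin_label(a, bins) for a in ages)
    return {f"{low}-{high}": counts.get(f"{low}-{high}", 0) for low, high in bins}


counts = histogram_counts(ages, bins)
for label, n in counts.items():
    # One '#' per patient gives a quick text-only histogram.
    print(f"{label:>6} | {'#' * n} ({n})")
```

A plotting tool would draw one bar per range with a height equal to these counts. Choosing the ranges is the main design decision: bins that are too wide can hide patterns, and bins that are too narrow can make the chart noisy.
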
Fundamental Tips for Effective Data Visualization
- Be clear about your purpose. What do you want to learn about the data? How will you use visualization to explore or explain data? Does the visual clearly depict your message?
- Know what resources are available. There are many platforms and/or tools you can use to perform your visualization, such as the tools available in the NCI Cancer Research Data Commons (CRDC).
- Selecting the right tool is key. The graph, chart, or image should clearly convey the results to the audience.
- Use more than one visualization approach to fully address your research question, when it makes sense. Make sure all the pieces fit together to give you useful information.
- Be mindful when preparing your data. Prepare the data according to the tool you’ve selected and the type of visual you’re creating.
- Keep it straightforward. Don’t overcomplicate the data or the visual.
NCI Data Visualization Resources and Initiatives
Now that you have a sense of the basics, use the following resources to discover more about the topic and understand NCI’s investment in this stage of the data science lifecycle.
Recurring Events
- DataViz + Cancer: Supported by Cancer Moonshot℠, this event series explores the intersection of data visualization and cancer research. Check out past event recordings, as well as upcoming micro-labs.
- NCI Emerging Technologies Seminar Series: Discover novel technologies supported by NCI awards that seek to transform cancer research and clinical care. These seminars may spotlight additional tools that can be used in visualization efforts.
Resources and Tools
- NCI’s CRDC: This infrastructure gives you access to a comprehensive collection of cancer research data, along with visualization tools for analyzing it, either within many of the data portals or through cloud resources. For specific data types and their relevant visualization tools, look at the Genomic, Imaging, Integrated Canine, and Proteomic Data Commons.
- 3DVizSNP: Use this tool to visually evaluate large numbers of missense mutations in three-dimensional structural context. The tool enables rapid screening of mutations taken from a variant call format (VCF) file using the iCn3D protein structure and sequence viewing platform.
- Minerva: This is a light-weight, narrative image browser for multiplexed tissue images.
- UpSetR: This is an R package for generating UpSet plots, a technique that visualizes set intersections in a matrix layout.
- UCSC Xena: Use this online exploration tool for public and private multi-omic and clinical/phenotype data.
Blogs
- Visualizing Genetic Mutations in Three-Dimensions—Pro-Tips from a Structural Biology Perspective: Learn how the 3DVizSNP tool can help you visualize genetic data in a three-dimensional format.
- Visualizing Data Using Circular Heatmaps and Biplots—Pro-Tips from NCI Researchers: Drs. Arashdeep Singh and Sridhar Hannenhalli explain what circular heatmaps and biplots are and how to use them.
Publications
- GenomicSuperSignature Facilitates Interpretation of RNA-seq Experiments Through Robust, Efficient Comparison to Public Databases. Nature Communications, 2022. | Explore this method for interpreting new transcriptomic data sets through comparison to public data sets without high-performance computing requirements.
- Longitudinal Collection of Patient-Reported Outcomes and Activity Data during CAR-T Therapy: Feasibility, Acceptability, and Data Visualization. Cancers, 2022. | See an example of how researchers integrated data visualization into their study.
Additional Data Visualization Resources
Keep an eye on the NIH Library Course Catalog for upcoming courses on data visualization. Here, we’ve highlighted some of the classes in which you may be interested.
- Principles of Effective Data Visualization: This class will provide an overview of how to construct data visualizations and how to create visualizations that are appealing and informative.
Next
- Ready to start your project? Get an overview of the lifecycle and what you should do in each stage.
- Want to learn the basic skills for cancer data science? Check out our basic skills video course.
- Need answers to data science questions? Visit our Training Guide Library.