Training Guide Library

Need quick instructions for a stage of the cancer data science lifecycle? Browse this list of guides and resources.

Data Generation and Collection

Identify and gather the data you need to address a problem.

  • Beginner
  • Advanced
    • [Training Video Recording] WebMeV | Discover how to use this intuitive, web-based bioinformatics analysis toolkit designed for non-bioinformaticians. This walkthrough shows how to upload data files, run a single-cell analysis (using the tools available within the toolkit), and navigate or create public data sets within WebMeV.

Data Cleaning

Fix discrepancies and handle missing values in your data.

  • Beginner

Data Exploration and Analysis

Study your data, then form a hypothesis.

  • Beginner
    • [Article] Exploring and Analyzing Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
    • [Article] How to Use the Cancer Data Aggregator | Read about this resource and how it can help you in your search for data across NCI's Cancer Research Data Commons.
    • [Article] Copy Number Variation (CNV) Calling: The Basics | Get tips on generating CNV data and how to choose the right tool.
    • [Article] Spatial Transcriptomics (ST): The Basics | Learn the basics of using ST for cancer research, with a spotlight on an NCI-funded tool that doesn’t require specialized skills in bioinformatics.
    • [Training Video Recording] WebMeV | Watch a demonstration of how to upload data to this web-based software for genomic data analysis and perform analyses such as normalization, clustering, and principal component analysis.
    • [Training Video Recording] XNAT | Learn about this open source imaging informatics software platform that enables data ingestion, curation, annotation, quality control, and computational workflows using Docker containers.
    • [Training Video Recording] User-Friendly Analysis of Spatial Transcriptomics with spatialGE | Learn more about this user-friendly web application that integrates the spatial R package. The package, enhanced with additional ST analysis methods (such as SpaGCN, STdeconvolve, and InSituType), makes the application more valuable to the cancer research community.
    • [Training Video Recording] RNAseq Data Analysis in Qlucore | Import and analyze RNA-sequencing data in Qlucore Omics Explorer—software that visualizes the data in 3D plots and can help you identify hidden structures and patterns.
    • [Training Video Recording] An Introduction to Bioconductor for Genomic Data Science | Learn the basics of integrative data containers for genome-scale experiments and components of analytic workflows for transcriptomics and epigenetics. You will also discover resources for annotation of genomic data.
    • [Training Video Recording] Introduction to FlowJo™ Software | Learn about FlowJo’s workspace—including how to load files, evaluate sample quality, draw gates, and generate tabular and graphical layouts to perform single-cell flow cytometry analysis.
    • [Training Video Recording] QIAGEN Ingenuity Pathway Analysis (IPA) Webinar | Learn how to analyze large data sets (including RNA-sequencing and proteomics) using QIAGEN IPA to perform interactive core and pathway analyses.
  • Advanced

Predictive Modeling

Use computational tools like machine learning models to make predictions with your data.

Data Visualization

Communicate your data findings using interactive images, plots, and charts.

  • Beginner
  • Advanced
    • [Blog] Visualizing Data Using Circular Heatmaps and Biplots—Pro-Tips From NCI Researchers | Discover how to use these plots and why they are valuable.
    • [Training Video Recording] Data Visualization with R | Learn how to use the ggplot2 package in the programming language R to create plots that can form the basis of analysis. Note: This video is one of six that make up a course series exclusive to NCI staff and provided by the Bioinformatics Training and Education Program. In this recording, RStudio is accessed via DNAnexus.
    • [Training Video Recording] DNASTAR Lasergene Software | Learn about this software and its applications in molecular biology, including topics such as enzyme labels, primer design, cloning processes, construct analysis, and clone verification using Sanger sequencing.
    • [Training Video Recording] Next-Generation Clustered Heat Maps (NG-CHMs) | Learn how NG-CHMs can help you navigate large omic databases, zoom in on patterns, access external metadata resources, produce high-resolution graphics, and save metadata for later use. NG-CHMs play a valuable role in NIH projects, encompassing phenotypic and genotypic data at DNA, RNA, protein, and metabolite levels in bulk and single-cell studies.

Data Sharing

Accelerate discovery by making your data available to others.

Share Ideas

Help us help you! If you believe content is missing or needs modifying, email NCI CBIIT.

If you would like to reproduce some or all of this content, see Reuse of NCI Information for guidance about copyright and permissions. In the case of permitted digital reproduction, please credit the National Cancer Institute as the source and link to the original NCI product using the original product's title; e.g., “Training Guide Library was originally published by the National Cancer Institute.”
