
Analysis Tools

Working across NCI, our team is developing new approaches and technologies for using data to predict, diagnose, and treat cancer. We've supported the following informatics and data science tools, which are available to all researchers.

Tool | Description | Category | Experience Level | Project
ATOM Modeling PipeLine (AMPL)

AMPL is an open-source pipeline for building machine learning models on molecular data for in silico drug discovery.

You can use this tool to:

  • train models to predict chemical properties and drug response.
  • support cancer drug discovery workflows.
  • standardize and share modeling workflows across teams.

Code Repository: Available 
Origin Project: Accelerating Therapeutics for Opportunities in Medicine (ATOM) | NCI-Department of Energy (DOE) Collaboration

Category: Modeling
Experience Level: Advanced
Project: NCI-DOE
FORGEdb

FORGEdb is an online platform with an API, designed to help researchers interpret genetic variants associated with disease at the tissue and cell type level.

You can use this tool to:

  • link variants to regulatory features (e.g., enhancers, promoters, DNase I hypersensitivity peaks).
  • identify likely target genes and mechanisms.
  • prioritize variants for lab testing using the FORGEdb score (calculated from the amount of evidence suggesting that the variant plays a regulatory role).
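
The idea of an evidence-based score can be sketched in a few lines of Python. This is an illustration only, not the FORGEdb implementation or its API: the evidence names, the weights, and the variant IDs below are all hypothetical placeholders, and the real scoring rules should be taken from the FORGEdb documentation.

```python
# Hypothetical evidence types and weights, chosen only for illustration.
# They are NOT taken from this page or from the FORGEdb code base.
EVIDENCE_WEIGHTS = {
    "dnase_hotspot": 2,
    "histone_mark": 2,
    "eqtl": 2,
    "enhancer_gene_link": 2,
    "tf_motif": 1,
    "allele_specific_binding": 1,
}


def evidence_score(evidence):
    """Sum the weights of every evidence type flagged True for one variant."""
    unknown = set(evidence) - set(EVIDENCE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown evidence types: {sorted(unknown)}")
    return sum(EVIDENCE_WEIGHTS[name] for name, present in evidence.items() if present)


def rank_variants(variants):
    """Return variant IDs ordered from most to least supporting evidence."""
    return sorted(variants, key=lambda vid: evidence_score(variants[vid]), reverse=True)


# Made-up variants, used only to exercise the functions above.
example_variants = {
    "variant_a": {"dnase_hotspot": True, "tf_motif": True, "eqtl": False},
    "variant_b": {"dnase_hotspot": True, "histone_mark": True, "eqtl": True},
    "variant_c": {"tf_motif": True},
}
ranking = rank_variants(example_variants)
```

A variant with more independent lines of regulatory evidence receives a higher score, so it sorts earlier in the ranking and would be a stronger candidate for follow-up.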

Code Repository: Available 

Category: Exploration/Analysis
Experience Level: Low/No-code
Multi-omics Pathway Workflow (MOPAW)

MOPAW is a fully automated, multi-omics pathway workflow for exploring specific genes and pathways associated with different types of cancer. The tool is currently available through the Cancer Genomics Cloud (CGC).

You can use the tool to:

  • run multi-omics analysis with your own or publicly available data sets on the CGC (e.g., The Cancer Genome Atlas and Clinical Proteomic Tumor Analysis Consortium).
  • identify unique pathways for sample groups using multivariate single sample gene set analysis (MOGSA).

Code Repository: Not available

Category: Exploration/Analysis
Experience Level: Low/No-code
Multiscale Machine-Learned Modeling Infrastructure (MuMMI)

MuMMI is a machine learning–driven molecular dynamics simulation framework for studying protein–membrane interactions across scales.

You can use this tool to:

  • simulate how cancer-related proteins (e.g., RAS) interact with cell membranes.
  • link molecular- and macro-scale models to study signaling dynamics.
  • generate large simulation data sets to support hypothesis testing and modeling.

Code Repository: Available
Origin Project: Artificial Intelligence-Driven Multiscale Investigation of the RAS/RAF Activation Lifecycle (ADMIRRAL) | NCI-DOE Collaboration

Category: Modeling
Experience Level: Advanced
Project: NCI-DOE
OmicCircos

OmicCircos is an R-based application that helps you manage, analyze, and visualize omics data with high-quality circular plots.

You can use the tool to:

  • create circular plots that integrate multi-omics data.
  • compare data across chromosomes or samples.
  • spot gene variation and expression.

Code Repository: Available

Category: Visualization
Experience Level: Intermediate
Rapid 3D Visualization of SNP Mutations (3DVizSNP)

3DVizSNP is an online platform that visualizes SNP mutations on three-dimensional protein structures, starting from a single variant call format (VCF) file.

You can use the tool to:

  • identify mutations that affect a single amino acid.
  • explore mutant versus wild-type protein structures.
  • sort, filter, and prioritize variants for lab testing.
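
To make the VCF input concrete, here is a minimal sketch of pulling single-nucleotide variants out of VCF text with plain Python. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, ...); the function name and the example records are made up for illustration, and this is not how 3DVizSNP itself reads the file.

```python
def parse_vcf_snvs(vcf_text):
    """Return (chrom, pos, ref, alt) tuples for single-nucleotide variants.

    Header lines (starting with '#') are skipped, and multi-allelic ALT
    fields (comma-separated) are expanded to one tuple per allele.
    """
    bases = {"A", "C", "G", "T"}
    snvs = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF data line
        chrom, pos, _variant_id, ref, alt = fields[:5]
        for allele in alt.split(","):
            # Keep only substitutions of one base by another base.
            if ref in bases and allele in bases:
                snvs.append((chrom, int(pos), ref, allele))
    return snvs


# Made-up records: one SNV, one deletion, one multi-allelic SNV site.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t.\tPASS\t.",
    "chr2\t300\t.\tC\tT,G\t.\tPASS\t.",
])
example_snvs = parse_vcf_snvs(EXAMPLE_VCF)
```

Only the substitutions survive the filter; the deletion at position 200 is dropped because its REF allele is longer than one base.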

Code Repository: Available

Category: Visualization
Experience Level: Low/No-code
RNA-Seq Latent Featurizer Using Center Loss Cost Function (CLRNA)

CLRNA is a Python package providing a deep learning algorithm that researchers can use to learn generalized gene expression features for drug response.

You can use this tool to:

  • reduce noise and batch effects in gene expression data.
  • generate features that improve drug response modeling.
  • leverage labeled and unlabeled data to strengthen analysis.
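
The "center loss" in the tool's name refers to a general technique that pulls each sample's learned features toward the center of its class. The NumPy sketch below shows that technique in isolation, assuming a batch-mean form of the loss and a simple damped center update; the function names and toy arrays are hypothetical, and none of this is CLRNA's own code or API.

```python
import numpy as np


def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    diffs = features - centers[labels]  # integer indexing picks one center per sample
    return 0.5 * np.sum(diffs ** 2) / features.shape[0]


def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center a fraction alpha toward its current members."""
    new_centers = centers.copy()
    for j in range(centers.shape[0]):
        members = features[labels == j]
        # The +1 in the denominator keeps the update finite for empty classes.
        delta = (centers[j] - members).sum(axis=0) / (1 + len(members))
        new_centers[j] = centers[j] - alpha * delta
    return new_centers


# Toy latent features for three samples in two classes (made-up numbers).
toy_features = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
toy_labels = np.array([0, 0, 1])
initial_centers = np.zeros((2, 2))

loss_before = center_loss(toy_features, toy_labels, initial_centers)
loss_after = center_loss(
    toy_features, toy_labels, update_centers(toy_features, toy_labels, initial_centers)
)
```

In a full model this term is typically added to the main training loss, so that samples with the same label end up close together in the latent space while the centers are updated alongside the network.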

Code Repository: Available
Origin Project: Modeling Outcomes Using Surveillance Data and Scalable Artificial Intelligence for Cancer (Pilot 1 Cellular Level Pilot) | NCI-DOE Collaboration

Category: Exploration/Analysis
Experience Level: Advanced
Project: NCI-DOE
scCorr

scCorr is an R package that improves gene-to-gene correlation estimates in single-cell RNA sequencing data by reducing the impact of zero values that can bias your analyses.

You can use the tool to:

  • compare multiple gene pairs to see which ones are co-expressed in specific tumor types.
  • reduce the noise from “dropout” values.
  • use alongside other single-cell tools.
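
The effect of zero values on correlation can be seen in a small simulation. The Python sketch below is an illustration under stated assumptions, not scCorr's algorithm or API: two simulated genes follow a shared cell state, random dropout sets about half of the values to zero, and cells are then pooled into groups of similar state before correlating. Grouping by the known simulated state is a stand-in for a real clustering step, and every name and parameter here is hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_cells(n_cells=2000, dropout=0.5, seed=0):
    """Simulate (state, gene_a, gene_b) per cell with random dropout zeros."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        state = rng.random()  # shared driver of both genes
        gene_a = max(0.0, 5 * state + rng.gauss(0, 0.5))
        gene_b = max(0.0, 5 * state + rng.gauss(0, 0.5))
        if rng.random() < dropout:
            gene_a = 0.0  # dropout: the value is lost and recorded as zero
        if rng.random() < dropout:
            gene_b = 0.0
        cells.append((state, gene_a, gene_b))
    return cells


def raw_correlation(cells):
    """Correlate the two genes cell by cell, zeros included."""
    return pearson([c[1] for c in cells], [c[2] for c in cells])


def pooled_correlation(cells, group_size=40):
    """Average each gene within groups of similar cells, then correlate."""
    ordered = sorted(cells, key=lambda c: c[0])
    groups = [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]
    mean_a = [sum(c[1] for c in g) / len(g) for g in groups]
    mean_b = [sum(c[2] for c in g) / len(g) for g in groups]
    return pearson(mean_a, mean_b)


cells = simulate_cells()
raw_r = raw_correlation(cells)
pooled_r = pooled_correlation(cells)
```

In this simulation the cell-by-cell correlation is pulled toward zero by the dropout events, while the pooled estimate is much closer to the relationship that was simulated, because the zeros are averaged out within each group.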

Code Repository: Available

Category: Exploration/Analysis
Experience Level: Advanced
Synthetic Data Generator (SYNDATA)

SYNDATA is a suite of machine learning tools for creating realistic synthetic clinical text data.

You can use this tool to:

  • generate data sets when real patient data is limited.
  • train and test models while protecting patient privacy.
  • simulate scenarios to evaluate model performance.

Code Repository: Available
Origin Project: Modeling Outcomes Using Surveillance Data and Scalable Artificial Intelligence for Cancer (Pilot 3 Population Level Pilot) | NCI-DOE Collaboration

Category: Modeling
Experience Level: Advanced
Project: NCI-DOE