Project Overview
For almost 10 years, the National Cancer Institute (NCI) and the U.S. Department of Energy (DOE) engaged in a strategic, interagency collaboration to simultaneously accelerate advances in precision oncology and advanced scientific computing, including the use of artificial intelligence and machine learning (AI/ML).
The NCI-DOE Collaboration was part of the Cancer Moonshot℠, dedicated to ending cancer as we know it.
Outputs and Impact
The Collaboration's Projects
NCI and DOE fostered a growing, predictive oncology community via four collaborative projects.
AI-Driven Multi-Scale Investigation of the RAS/RAF Activation Lifecycle (ADMIRRAL) Project
Summary
ADMIRRAL (originally named “Pilot 2” or the “Molecular Level Pilot”) aimed to develop a more comprehensive, mechanistic understanding of RAS-RAF-driven cancer initiation and growth. The intent was to develop more effective treatments targeting RAS. Combining ML, molecular dynamics, high performance computing, and experimentation, the project involved delineating large-scale domain rearrangement (with molecular resolution) of the RAS-RAF complex and simulating the activation of RAF kinase. In short, ADMIRRAL used molecular dynamics coupled with AI to develop effective strategies for treating RAS-driven cancers.
Code Repository
Visit the MuMMI GitHub to access the Multiscale Machine-learned Modeling Infrastructure (MuMMI), the methodology ADMIRRAL developed to study the interaction of active KRAS with the plasma membrane on large time and length scales.
Select Publications
- Computational Lipidomics of the Neuronal Plasma Membrane. Biophysical Journal, 2017.
- Flux: Overcoming Scheduling Challenges for Exascale Workflows. FGCS, 2020.
- Machine Learning–driven Multiscale Modeling Reveals Lipid-dependent Dynamics of RAS Signaling Proteins. PNAS, 2022.
- DeepDriveMD: Deep-Learning Driven Adaptive Molecular Simulations for Protein Folding. IEEE, 2019.
- Capturing Phase Behavior of Ternary Lipid Mixtures with a Refined Martini Coarse-Grained Force Field. Journal of Chemical Theory and Computation, 2018.
- A Massively Parallel Infrastructure for Adaptive Multiscale Simulations: Modeling RAS Initiation Pathway for Cancer. Conference Paper, 2019.
- Deep Clustering of Protein Folding Simulations. BMC Bioinformatics, 2018.
Innovative Methodologies and New Data for Predictive Oncology Model Evaluation (IMPROVE) Project
Summary
IMPROVE (originally named “Pilot 1” or the “Cellular Level Pilot”) helped address challenges in data-driven modeling for predicting cancer drug response. The project team did this by:
- establishing a framework for comparing and evaluating prediction models.
- enhancing ML models through novel data integration.
Code Repository
Visit the IMPROVE GitHub for more information on the framework. You can also find Cellular Level Pilot software tools on GitHub (e.g., Learning Curves, Enhanced Co-Expression Extrapolation, and Autoencoder Node Saliency).
Additionally, you can find the drug response prediction models (e.g., Uno, Combo), classification models (e.g., TULIP), software, and data sets on the NCI Predictive Oncology Model and Data Clearinghouse website.
Select Publications
- Improving FAIRness of Computational Approaches for Cancer: The Computational Resources for Cancer Research Portal. American Association for Cancer Research, 2024.
- Graph Convolutional Networks for Drug Response Prediction. IEEE/ACM, 2022.
- Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinformatics, 2019.
- Toward Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-Based Convolutional Encoders. Molecular Pharmaceutics, 2019.
- Converting Tabular Data into Images for Deep Learning with Convolutional Neural Networks. Nature, 2021.
- A Cross-Study Analysis of Drug Response Prediction in Cancer Cell Lines. Briefings in Bioinformatics, 2022.
- Pandemic Drugs at Pandemic Speed: Infrastructure for Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers. The Royal Society, 2021.
Modeling Outcomes Using Surveillance Data and Scalable AI for Cancer (MOSSAIC) Project
Summary
MOSSAIC developed AI solutions—including natural language processing, foundation models, and multimodal algorithms—to facilitate near real-time cancer surveillance through the NCI SEER program. The project also developed resources for both creating AI-ready cancer data and extracting structured data from clinical text documents.
Code Repository
Access MOSSAIC repositories via GitHub:
- Batch-processing Abstraction for Raw Data Integration (BARDI)
- The Framework for Exploring Scalable Computational Oncology (FrESCO)
- Whole-slide-image (WSI) Informative Slide Selection
- MOSSAIC NNCR Treatment Database
Select Publications
- Limitations of Transformers on Clinical Text Classification. IEEE, 2021.
- Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. American Association for Cancer Research, 2019.
- Hierarchical Attention Networks for Information Extraction from Cancer Pathology Reports. JAMIA, 2017.
- Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports. IEEE, 2018.
- Using GANs with Adaptive Training Data to Search for New Molecules. Journal of Cheminformatics, 2021.
- Class Imbalance in Out-of-Distribution Datasets: Improving the Robustness of the TextCNN for the Classification of Rare Cancer Types. Journal of Biomedical Informatics, 2022.
- A Pre-Training and Self-Training Approach for Biomedical Named Entity Recognition. PLOS, 2021.
Accelerating Therapeutics for Opportunities in Medicine (ATOM) Public-Private Partnership
Summary
ATOM sought to accelerate drug discovery by developing an open-source platform integrated with AI, high performance computing, and biomedical data. The ATOM Modeling Pipeline (AMPL)—an open-source, modular, and extensible software pipeline—enabled both advanced and emerging ML approaches for creating FAIR (findable, accessible, interoperable, and reusable) computational models. It extended the functionality of the open-source library DeepChem. ATOM employed active learning to identify and optimize new compounds to satisfy multiple pharmaceutical parameters concurrently.
Code Repository
Access the AMPL repository on GitHub for:
- instructions on how to install and run AMPL on your computer.
- tutorials on AMPL’s features.
Select Publications
- Rethinking Drug Design in the Artificial Intelligence Era. Nature, 2019.
- Improved Protein-Ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference. Journal of Chemical Information and Modeling, 2021.
- Artificial Intelligence and Pharmacometrics: Time to Embrace, Capitalize, and Advance? CPT, 2019.
- Enabling Rapid COVID-19 Small Molecule Drug Design through Scalable Deep Learning of Generative Models. Sage Journals, 2021.
- Machine Learning Models to Predict Inhibition of the Bile Salt Export Pump. Journal of Chemical Information and Modeling, 2021.
- High-Throughput Virtual Screening of Small Molecule Inhibitors for SARS-CoV-2 Protein Targets with Deep Fusion Models. ACM, 2021.
- Reimagining Dots and Dashes: Visualizing Structure and Function of Organelles for High-Content Imaging Analysis. Cell Chemical Biology, 2021.
An Interagency Effort
The interdisciplinary projects under the NCI-DOE Collaboration were led jointly by NCI and DOE, with representation from the agencies’ Federally Funded Research and Development Centers:
- NCI’s Frederick National Laboratory for Cancer Research (FNLCR)
- Argonne National Laboratory (DOE)
- Lawrence Livermore National Laboratory (DOE)
- Los Alamos National Laboratory (DOE)
- Oak Ridge National Laboratory (DOE)
Multiple NCI divisions and centers provided leadership and subject matter expertise for the NCI-DOE Collaboration projects:
- The Center for Biomedical Informatics and Information Technology (CBIIT), in conjunction with NCI’s FNLCR, provided overarching strategic direction, program management, engagement, cross-disciplinary workshops, and leadership in community building and development of new collaborative research areas.
- The Division of Cancer Control and Population Sciences (DCCPS) and the SEER Program provided scientific leadership, subject matter expertise, and data for the MOSSAIC project.
- The Division of Cancer Biology (DCB) provided leadership and subject matter expertise for the IMPROVE project.
The Collaboration’s Infrastructure
NCI and DOE built their interdisciplinary projects on an open-source software platform and a public repository.
- CANcer Distributed Learning Environment (CANDLE): Provides deep learning methodologies for accelerating cancer research. You can access CANDLE via GitHub as well as on the NIH Biowulf cluster.
- NCI Predictive Oncology Model and Data Clearinghouse (MoDaC): Although the NCI-DOE Collaboration is finished, you can still access computational resources, data sets, and associated models via MoDaC. You can download and use annotated data sets and models in this publicly searchable resource.
Learn More
If you have additional questions about the past project or its outputs, email NCI CBIIT.