Category | Data Collection Name | Description |
---|---|---|
Biospecimens | Biospecimen Research Database (BRD) | BRD is a free, publicly accessible literature database that contains peer-reviewed primary and review articles and Standard Operating Procedures (SOPs) in the field of human biospecimen science. You can find SOPs in a system with Biospecimen Evidence Based Practices (BEBP). |
Cancer Screening Trial | Cancer Data Access System (CDAS) | CDAS is a submission and tracking system for the National Lung Screening Trial (NLST) data and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial data. |
Clinical | Clinical and Translational Data Commons (CTDC) | CTDC provides access to a vast array of clinical and translational data from NCI-funded clinical trials, correlative studies, and interventional studies. While some August 2024 Update: |
Clinical | NCI National Clinical Trials Network (NCTN)/Community Oncology Research Program (NCORP) Data Archive | This centralized, controlled-access After approval of a signed Data Use Agreement (DUA), you can download patient level clinical |
Clinical | NCTN Biobanks | Make a request for well-annotated biospecimen samples, derived from phase II and phase III NCTN clinical trials. For your secondary research studies, use the newly available “NCTN Biospecimen Catalog” for comprehensive data searches. |
Clinical, Genomics | Personalized Cancer Therapy | The Personalized Cancer Therapy website is a tool for physicians and patients to assess potential therapy options based on specific tumor biomarkers. The focus is on the potential therapy strategies for tumors harboring certain genomic alterations regardless of disease site. The available data includes: |
Drug Discovery | CellMinerCDB: National Center for Advancing Translational Sciences (NCATS) | The NCI Center for Cancer Research developed this |
Drug Discovery | NCI Panel of 60 Human Tumor Cell Lines (NCI-60) | You can find and analyze NCI-60 data in CellMiner. Since 1990, NCI’s Developmental Therapeutics Program has used this panel of 60 diverse human cancer cell lines to screen over 100,000 chemical compounds and natural products. You can download gene expression data files from NCI-hosted FTP sites: |
Epidemiology | DCCPS Cancer Epidemiology Descriptive Cohort Database (CEDCD) | Use this database for information about cohort studies that follow groups of persons over time for cancer incidence, mortality, and other health outcomes. Information includes: You can access CEDCD on the Division of Cancer Control and Population Sciences (DCCPS) website. |
Epidemiology | Surveillance, Epidemiology and End Results (SEER) database | SEER collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 50% of the U.S. population. The SEER You can access SEER statistics and the DCCPS data resources on the DCCPS website. |
Epidemiology | SEER - CAHPS | The SEER-CAHPS data resource links data from NCI’s Surveillance, Epidemiology and End Results (SEER) cancer registry program, the Centers for Medicare & Medicaid Services’ (CMS) Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS®) patient experience surveys, and longitudinal Medicare claims data on utilization and costs of care for Fee-For-Service beneficiaries. You can request data access by emailing the required documents to NCISEERCAHPS@nih.gov. |
Epidemiology | SEER - MHOS | The SEER-MHOS You can email the required documents to SEER-MHOS@hcqis.org to request access to the data. |
Genomics | All of Us Researcher Workbench | You can register to access data and tools including whole genome sequencing (WGS) and genome-wide genotyping data. The NIH-wide All of Us initiative collects this data. |
Genomics | NCI Genomic Data Sets Available in Database of Genotypes and Phenotypes (dbGaP) | NCI developed dbGaP to archive and distribute the data and results from studies that investigated the interaction of genotype and phenotype in humans. You can request July 2024 Update: Request access to recently released data referenced in the study, “Childhood Cancer Data Initiative (CCDI): Single-Cell Atlas of NF1 Nerve Sheath Tumors.” The data includes 63 clinically annotated NF1-associated peripheral nerve sheath tumors. |
Genomics | Genomic Data Commons (GDC) | GDC provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG) including: |
Genomics | Integrated Canine Data Commons (ICDC) | ICDC provides the cancer research community with data that enables a comparative analysis between human and canine cancers. You can explore the |
Genomics | Cancer Genome Characterization Initiative (CGCI) | CGCI researchers develop and apply advanced sequencing and other genome-based methods to identify novel genetic abnormalities in both adult and pediatric cancers. The genetic profiles inform better cancer diagnosis and treatment. |
Genomics | Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis, I-SPY1 | The I-SPY 1 TRIAL sought to identify indicators of response to neoadjuvant chemotherapy that predict survival in women with high-risk breast cancer. |
Genomics | Molecular Targets for Cancer | Researchers have measured thousands of molecular targets in the NCI panel of 60 human tumor cell lines. You can search for or browse through a list of targets. |
Genomics | NCI Brain Neoplasia Data (Rembrandt Database) | NCI Brain Neoplasia Data (Rembrandt |
Genomics | TARGET: Therapeutically Applicable Research to Generate Effective Treatments | TARGET applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers. The goal is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications. The TARGET data matrix includes: |
Genomics | The Cancer Genome Atlas (TCGA) | The Cancer Genome Atlas (TCGA) is a comprehensive effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. Through the TCGA Data Portal, you can search, download, and analyze data from over 30 different types of cancer. It contains: |
Genomics | The NCI Director’s Challenge Adenocarcinoma Lung Study | A large, multi-site, blinded validation study with a training-testing design, conducted to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The study looked at whether microarray measurements of gene expression, either alone or combined with basic clinical covariates (stage, age, sex), predicted overall survival in lung cancer subjects. |
Imaging | Imaging Data Commons (IDC) | IDC provides the cancer research community with a cloud-based repository of cancer imaging data, image annotations, and analysis results. IDC uses the Digital Imaging and Communication in Medicine (DICOM) standard to harmonize the data. You can find imaging data from the following NIH/NCI projects: January 2025 Update: The repository now includes DICOM-converted whole hematoxylin and eosin-stained images from CCDI’s Molecular Characterization Initiative. The initiative offers molecular testing to newly diagnosed children, adolescents, and young adults with central nervous system tumors, soft tissue sarcomas, some rare cancers, and certain neuroblastomas treated at a Children’s Oncology Group-affiliated hospital. |
Imaging | The Cancer Imaging Archive (TCIA) | TCIA is a curated archive of medical images that you can download. It includes data from the National Lung Screening Trial (NLST) and many subjects from The Cancer Genome Atlas (TCGA). You can find data divided into collections and grouped by common cancer types or research aims. You can also search these collections by modality, anatomic location, or various acquisition parameters. You can access pathology imaging, patient demographics/outcomes, expert-derived segmentations/annotations, genomics, and other available supporting data. |
Imaging | SLICE-3D | Use this data set, which contains more than 400,000 skin lesion image crops extracted from 3D total body photography (TBP), for skin cancer detection. Metadata entries include the following: |
Multiple | General Commons (GC) | GC provides you with data storage and sharing capabilities for NCI-funded studies that meet particular requirements. You can find a variety of data types (the current majority is genomic and imaging data) in the GC that are both open and |
Nanomaterial Characterizations | caNanoLab | caNanoLab includes over 1,000 curated nanomaterials relevant in cancer with detailed characterizations and associated nanotechnology protocols and publications. |
Networks | The Network Data Exchange (NDEx) | NDEx allows you to share, store, manipulate, and publish biological network information. The project maintains a public NDEx server and is a joint effort of the UC San Diego School of Medicine and the Cytoscape Consortium. |
Pediatric, Adolescent, and Young Adult (AYA) | Childhood Cancer Data Initiative (CCDI) Childhood Cancer Data Catalog (CCDC) | The CCDI CCDC is an inventory of pediatric oncology data resources. This includes childhood cancer repositories, registries, knowledge bases, and catalogs that either manage or refer to data. May 2025 Update: The Catalog now includes the “Childhood Cancer Catalog of Extrachromosomal DNA” data set and resource. Also, be sure to check out the new data set from pediatric in vivo models, along with updates to these existing resources: |
Pediatric, Adolescent, and Young Adult (AYA) | CCDI Hub Resources | CCDI Hub Explore Dashboard: This integrated tool provides you with the search functionality to connect participants with files and samples. It enables you to find data within a single study or across multiple studies, and create synthetic cohorts based on filtered metrics of interest (e.g., demographics, diagnosis, samples). March 2025 Update: The Hub now features a modified data model and enhanced tools for browsing and selecting cohorts. An updated “Explore Dashboard Participants” table means customizable columns, a feature for creating cohorts, and a map of associated information from the Cancer Participant Index. New TARGET CCDI Molecular Target Platform (MTP): Use this tool to browse and identify associations between molecular targets, diseases, and drugs specific for childhood cancers. Childhood Cancer Clinical Data Commons (C3DC): This open-access web application allows you to find harmonized demographic and clinical data, create custom cohorts, and download data for local analysis. March 2025 Update: C3DC’s latest release features new and expanded CCDI Data Federation Resource: Search for de-identified individual-level data through the API, which provides an |
Proteomics | Proteomic Data Commons (PDC) | You can get April 2025 Update: PDC released additional CPTAC Pan-Cancer analysis data, including both proteome and phosphoproteome data from the “PTRC Triple-negative Breast Cancer Mitotic Vulnerability Study.” |
Proteomics | The Clinical Proteomic Tumor Analysis Consortium (CPTAC) | CPTAC analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs). |
Target Discovery | Cancer Target Discovery and Development (CTD2) | CTD2 bridges the gap between the enormous volumes of data generated by genomic characterization studies and the ability to use these data for the development of human cancer therapeutics. It specializes in using computational and functional genomic approaches to translate next-generation sequencing data, and data from high-throughput and high content small molecule and genetic screens, into clinical applications. All data generated are |
NCI Data Catalog
The NCI Data Catalog is a listing of data collections resulting from major NCI initiatives and other widely used data sets. Data collections in the catalog meet the following criteria:
- Products of NCI intramural researchers or major NCI initiatives, or regularly referenced NCI-funded extramural research data
- Available to all researchers and may be Open or Controlled Access (requiring approval by a Data Access Committee)
- Well documented and available for download
Although this is not a comprehensive listing of data sets available from NCI, you can expect quarterly updates.