| Category | Data Collection Name | Description |
|---|---|---|
| Biospecimens | Biospecimen Research Database (BRD) | BRD is a free, publicly accessible literature database that contains peer-reviewed primary and review articles and Standard Operating Procedures (SOPs) in the field of human biospecimen science. |
| Cancer Screening Trial | Cancer Data Access System (CDAS) | CDAS is a submission and tracking system for the National Lung Screening Trial (NLST) data and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial data. |
| Clinical | Clinical and Translational Data Commons (CTDC) | CTDC provides access to a vast array of clinical and translational data from NCI-funded clinical trials, correlative studies, and interventional studies. Some data sets are publicly accessible; others are in the registered-access or controlled-access tier. |
| Clinical | NCI National Clinical Trials Network (NCTN)/Community Oncology Research Program (NCORP) Data Archive | This centralized, controlled-access database is a repository of de-identified, patient-level data from phase III clinical trials conducted by NCI’s NCTN network, NCI’s NCORP network, and the Canadian Cancer Trials Group. After approval of a signed Data Use Agreement, you can download patient-level clinical data sets and their associated data dictionaries. |
| Clinical | NCTN Biobanks | Request well-annotated biospecimen samples derived from phase II and phase III NCTN clinical trials. For your secondary research studies, use the “NCTN Biospecimen Catalog” for comprehensive data searches. |
| Clinical, Genomics | Personalized Cancer Therapy | This web tool is for physicians and patients to assess potential therapy options based on specific tumor biomarkers. The focus is on the potential therapy strategies for tumors harboring certain genomic alterations regardless of disease site. The available data includes: |
| Drug Discovery | CellMinerCDB: National Center for Advancing Translational Sciences (NCATS) | The NCI Center for Cancer Research developed this database that details how 2,600+ different compounds affect cancer cell growth. NIH NCATS tested 183 cancer cell lines, and you can find drug response data from that work. |
| Drug Discovery | NCI Panel of 60 Human Tumor Cell Lines (NCI-60) | You can find and analyze NCI-60 in CellMiner. Use the three-step process to select your analysis type, select your input format, and provide your email address to receive the results. Since 1990, NCI’s Developmental Therapeutics Program has used this panel of 60 diverse human cancer cell lines to screen over 100,000 chemical compounds and natural products. |
| Epidemiology | DCCPS Cancer Epidemiology Descriptive Cohort Database (CEDCD) | Use this database for information about cohort studies that follow groups of people over time for cancer incidence, mortality, and other health outcomes. Information includes: You can access CEDCD on the Division of Cancer Control and Population Sciences (DCCPS) website. |
| Epidemiology | Surveillance, Epidemiology and End Results (SEER) database | SEER collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 50% of the U.S. population. The SEER database includes incidence and population data associated by: You can access SEER statistics and the Division of Cancer Control and Population Sciences (DCCPS) data resources on the DCCPS website. |
| Epidemiology | Surveillance, Epidemiology and End Results (SEER)—Consumer Assessment of Healthcare Providers and Systems (CAHPS®) | The SEER-CAHPS data resource links data from NCI’s SEER cancer registry program; the Centers for Medicare & Medicaid Services’ (CMS’) Medicare CAHPS patient experience surveys; and longitudinal Medicare claims data on utilization and costs of care for Fee-For-Service beneficiaries. You can request data access by emailing the required documents to NCISEERCAHPS@nih.gov. |
| Epidemiology | SEER—Medicare Health Outcomes Survey (MHOS) | This database links data from both NCI’s SEER cancer registry program and the CMS’ MHOS to provide information about the health-related quality of life (HRQOL) of Medicare Advantage Organization (MAO) enrollees. NCI and CMS sponsor the database. You can email the required documents to SEER-MHOS@hcqis.org to request access to the data. |
| Genomics | All of Us Researcher Workbench | You can register to access data and tools including whole genome sequencing (WGS) and genome-wide genotyping data. The NIH-wide All of Us initiative collects this data. |
| Genomics | NCI Genomic Data Sets Available in the Database of Genotypes and Phenotypes (dbGaP) | The National Center for Biotechnology Information (NCBI) developed dbGaP to archive and distribute the data and results from studies that investigated the interaction of genotype and phenotype in humans. Follow these instructions to request controlled-access data from over 150 NCI studies. |
| Genomics | Genomic Data Commons (GDC) | GDC provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including: |
| Genomics | Integrated Canine Data Commons (ICDC) | ICDC provides the cancer research community with data that enables a comparative analysis between human and canine cancers. You can explore the open access data within the ICDC portal, and you may analyze the associated data files in the Seven Bridges Cancer Genomics Cloud. |
| Genomics | Cancer Genome Characterization Initiative (CGCI) | CGCI researchers develop and apply advanced sequencing and other genome-based methods to identify novel genetic abnormalities in both adult and pediatric cancers. The genetic profiles inform better cancer diagnosis and treatment. You can access CGCI data through the project data matrix. |
| Genomics | Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis, I-SPY1 | The I-SPY 1 TRIAL sought to identify indicators of response to neoadjuvant chemotherapy that predict survival in women with high-risk breast cancer. You can download gene expression data files from an NCI-hosted FTP site. |
| Genomics | Molecular Targets for Cancer | Researchers have measured thousands of molecular targets in the NCI panel of 60 human tumor cell lines. You can search for or browse through a list of targets. |
| Genomics | NCI Brain Neoplasia Data (REMBRANDT Data Set) | This data set integrates clinical and functional genomics data from clinical trials involving brain tumor patients. It provides the ability to perform ad hoc querying, reporting, and analysis across multiple data domains (including gene expression, gene copy number, and clinical data). |
| Genomics | TARGET: Therapeutically Applicable Research to Generate Effective Treatments | TARGET applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers. The goal is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications. The TARGET data matrix includes: |
| Genomics | The Cancer Genome Atlas (TCGA) | TCGA is a comprehensive effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. Through the TCGA Data Portal, you can search, download, and analyze data from over 30 different types of cancer. It contains: |
| Genomics | The NCI Director’s Challenge (DC) Adenocarcinoma Lung Study | This large, multi-site, blinded training-testing validation study characterized the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The study looked at whether microarray measurements of gene expression (either alone or combined with basic clinical covariates like stage, age, and sex) predicted overall survival in lung cancer subjects. The DC Lung Study data set is available for analysis in Gene Expression Omnibus (GEO). |
| Imaging | Imaging Data Commons (IDC) | IDC provides the cancer research community with a cloud-based repository of cancer imaging data, image annotations, and analysis results. IDC uses the Digital Imaging and Communications in Medicine (DICOM) standard to harmonize the data. You can find imaging data from the following NIH/NCI projects: |
| Imaging | The Cancer Imaging Archive (TCIA) | TCIA is a curated archive of medical images that you can download. It includes data from the National Lung Screening Trial (NLST) and many subjects from The Cancer Genome Atlas (TCGA). You can find data divided into collections and grouped by common cancer types or research aims. You can also search these collections by modality, anatomic location, or various acquisition parameters. You can access pathology imaging, patient demographics/outcomes, expert-derived segmentations/annotations, genomics, and other available supporting data. |
| Imaging | SLICE-3D | Use this data set, which contains more than 400,000 skin lesion image crops extracted from 3D total body photography (TBP), for skin cancer detection. Metadata entries include: |
| Multiple | General Commons (GC) | GC provides you with data storage and sharing capabilities for NCI-funded studies that meet particular requirements. The GC holds a variety of data types (currently mostly genomic and imaging data), some open access and some controlled access. Before requesting access, consider searching and browsing the data via the GC Portal (no login required). |
| Nanomaterial Characterizations | caNanoLab | caNanoLab includes over 1,000 curated nanomaterials relevant to cancer, with detailed characterizations and associated nanotechnology protocols and publications. You can perform web-based queries and download reports for re-use and additional analysis. |
| Networks | The Network Data Exchange (NDEx) | NDEx allows you to share, store, manipulate, and publish biological network information. The project maintains a public NDEx server and is a joint effort of the UC San Diego School of Medicine and the Cytoscape Consortium. |
| Pediatric, Adolescent, and Young Adult (AYA) | Childhood Cancer Data Initiative (CCDI) Childhood Cancer Data Catalog (CCDC) | The CCDI CCDC is an inventory of pediatric oncology data resources. This includes childhood cancer repositories, registries, knowledge bases, and catalogs that either manage or refer to data. |
| Pediatric, Adolescent, and Young Adult (AYA) | CCDI Hub Resources | CCDI Hub Explore Dashboard: This integrated tool provides search functionality that connects participants with files and samples. It enables you to find data within a single study or across multiple studies and to create synthetic cohorts based on filtered metrics of interest (e.g., demographics, diagnosis, samples). CCDI Molecular Target Platform (MTP): Use this tool to browse and identify associations between molecular targets, diseases, and drugs specific to childhood cancers. Childhood Cancer Clinical Data Commons (C3DC): This open-access web application allows you to find harmonized demographic and clinical data, create custom cohorts, and download data for local analysis. CCDI Data Federation Resource: Search for de-identified individual-level data through the API, which provides an open-access subset of the metadata and gives you the location of the complete data set. To determine data access, check the policies for the resource that submitted the data (currently the Kids First Data Resource Center, the Pediatric Cancer Data Commons, St. Jude Cloud, and the Treehouse Childhood Cancer Data Initiative). |
| Proteomics | Proteomic Data Commons (PDC) | You can get open access, highly curated, and standardized biospecimen, clinical, and proteomic data from PDC. You can analyze PDC data files using tools found in the CRDC Cloud Resources. Data sets include the following: |
| Proteomics | The Clinical Proteomic Tumor Analysis Consortium (CPTAC) | CPTAC analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers. |
| Target Discovery | Cancer Target Discovery and Development (CTD2) | CTD2 bridges the gap between the enormous volumes of data generated by genomic characterization studies and the ability to use these data for the development of human cancer therapeutics. It specializes in using computational and functional genomic approaches to translate next-generation sequencing data, and data from high-throughput and high content small molecule and genetic screens, into clinical applications. All data generated are open access. The CTD2 Data Portal consists of raw and analyzed primary data. |

NCI Data Catalog

The NCI Data Catalog is a listing of data collections resulting from major NCI initiatives and other widely used data sets. Data collections in the catalog meet the following criteria:
- Products of NCI intramural researchers or major NCI initiatives, or regularly referenced NCI-funded extramural research data
- Available to all researchers, under either Open or Controlled Access (the latter requires approval by a Data Access Committee)
- Well documented and available for download