NCI's Genome Characterization Pipeline

NCI’s Center for Cancer Genomics (CCG) coordinates research teams across the United States and Canada to produce rich cancer genomic and clinical datasets for the cancer research community. CCG implements this collaborative effort through an efficient and standardized workflow called the Genome Characterization Pipeline. Learn more about how it works below.

Tissue Collection and Processing

  • CCG partners with clinical trials and community oncology groups that collect tumor tissue samples and normal tissue (usually blood) from patients who choose to participate. Tumor tissues are preserved in one of two ways: most of the samples that CCG studies are formalin-fixed paraffin-embedded (FFPE) tissue, and the rest are frozen tissue.
  • CCG’s Biospecimen Core Resource (BCR) has two components:

Genome Characterization

  • The Genome Characterization Centers (GCCs) generate data for samples received from the BCR.
  • Each GCC provides support for distinct genomic or epigenomic pipelines:
  • The GCCs send the raw sequencing data, associated metadata, and other characterization data they generate to the Genomic Data Commons (GDC), where the data are shared with the Genomic Data Analysis Network (GDAN) and the research community.

Genomic Data Analysis

  • The CCG Genomic Data Analysis Network (GDAN) is a diverse team of scientists from 13 institutions across the United States and Canada. The GDAN takes the raw output of the genomic characterization techniques used by CCG’s Genome Characterization Centers (GCCs) and performs analyses that turn these large datasets into biological insight.
  • The GDAN has a wide range of expertise, from identifying genomic abnormalities to integrating and visualizing multi-omics data.
  • The GDAN works in CCG’s Analysis Working Groups and publishes results in scientific journals.

Data Sharing and Discovery

  • CCG’s Analysis Working Groups (AWGs), collaborative teams of scientists and clinicians, analyze data from the Genomic Data Analysis Network (GDAN) and produce novel analyses. The resulting peer-reviewed publications advance their fields and spur future research.
  • The Genomic Data Commons (GDC) harmonizes sequencing and other characterization data and applies the latest bioinformatic pipelines to produce derived data that a broad range of researchers can utilize.
  • The data generated by the CCG pipeline are made publicly available in the GDC for use by researchers around the world. Some data are released as open access and are available to anyone; others are controlled access, and users must apply for access before they can download them. See Data Access Processes and Tools for details.
  • By bringing data to one location and increasing accessibility, the GDC aims to accelerate the rate of discovery about cancer and facilitate precision oncology. Learn more about how the GDC is providing an expandable data sharing platform.

If you would like to reproduce some or all of this content, see Reuse of NCI Information for guidance about copyright and permissions. In the case of permitted digital reproduction, please credit the National Cancer Institute as the source and link to the original NCI product using the original product's title; e.g., “NCI's Genome Characterization Pipeline was originally published by the National Cancer Institute.”