NCI's Genome Characterization Pipeline
NCI’s Center for Cancer Genomics (CCG) coordinates research teams across the United States and Canada to produce rich cancer genomic and clinical datasets for the cancer research community. CCG implements this collaborative effort through an efficient and standardized workflow called the Genome Characterization Pipeline. Learn more about how it works below.
Tissue Collection and Processing
- CCG partners with clinical trials and community oncology groups that collect tumor tissue samples and normal tissue, usually blood, from patients who choose to participate. Tumor tissues are preserved in one of two ways: most of the samples that CCG studies are formalin-fixed paraffin-embedded (FFPE) tissues, and some are frozen tissues.
- CCG’s Biospecimen Core Resource (BCR) has two components:
  - The Biospecimen Processing Center at Nationwide Children’s Hospital processes all tissues to ensure that they meet rigorous quality standards and isolates DNA, RNA, proteins, and other analytes to send to CCG’s Genome Characterization Centers (GCCs).
  - The Clinical Data Center at Information Management Services, Inc. oversees informed consent to protect patients’ rights and de-identifies all clinical data about patients to safeguard their privacy.
Genome Characterization
- The Genome Characterization Centers (GCCs) generate data for samples received from the Biospecimen Core Resource (BCR).
- Each GCC provides support for distinct genomic or epigenomic pipelines:
  - The Broad Institute performs whole genome sequencing and whole exome sequencing.
  - The University of North Carolina performs total RNA and microRNA sequencing.
  - The New York Genome Center performs methylation arrays and single-cell sequencing.
  - Azenta Life Sciences performs whole exome sequencing.
- Additional GCCs with other specialties may be utilized as new molecular technologies become available.
- The GCCs send the raw sequencing data, associated metadata, and other characterization data they generate to the Genomic Data Commons (GDC), where these data are shared with the Genomic Data Analysis Network (GDAN) and the research community.
Genomic Data Analysis
- The CCG Genomic Data Analysis Network (GDAN) is a diverse team of scientists from 13 institutions across the United States and Canada that takes the raw output of genomic characterization techniques from CCG’s Genome Characterization Centers (GCCs) and performs analyses that transform these large datasets into biological insight.
- The GDAN has a wide range of expertise, from identifying genomic abnormalities to integrating and visualizing multi-omics data.
- The GDAN works in CCG’s Analysis Working Groups and publishes results in scientific journals.
Data Sharing and Discovery
- CCG’s Analysis Working Groups (AWGs), collaborative teams of scientists and clinicians, analyze data from the Genomic Data Analysis Network (GDAN) and produce novel analyses, resulting in peer-reviewed publications that advance their fields and spur future research.
- The Genomic Data Commons (GDC) harmonizes sequencing and other characterization data and applies the latest bioinformatic pipelines to produce derived data that a broad range of researchers can utilize.
- The rich data generated by the CCG pipeline are made publicly available in the GDC for use by researchers across the world. Some data are released as open access and are available to anyone; others are controlled access and require users to apply for access. See Data Access Processes and Tools for details.
- By bringing data to one location and increasing accessibility, the GDC aims to accelerate the rate of discovery about cancer and facilitate precision oncology. Learn more about how the GDC is providing an expandable data sharing platform.
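To illustrate the open-access tier described above, here is a minimal sketch of how a researcher might build a query against the GDC's public REST API to list open-access files for one project. The helper name `build_open_access_query`, the example project ID, and the chosen field names are assumptions made for illustration; check them against the current GDC API documentation before relying on them. The sketch only constructs the request URL and does not contact the server.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_open_access_query(project_id: str, size: int = 5) -> str:
    """Return a GDC /files URL that lists open-access files for one project.

    Hypothetical helper for illustration; field names are assumptions
    to verify against the GDC API documentation.
    """
    # Nested filter: the file must belong to the project AND be open access.
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {"field": "access", "value": ["open"]},
            },
        ],
    }
    params = {
        "filters": json.dumps(filters),  # filters are passed as a JSON string
        "fields": "file_id,file_name,data_type,access",
        "format": "JSON",
        "size": str(size),  # maximum number of records to return
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # "TCGA-BRCA" is used here only as an example project ID.
    print(build_open_access_query("TCGA-BRCA", size=3))
```

The resulting URL could then be fetched with any HTTP client. Controlled-access files would additionally require approved access and an authorization token, which this sketch does not cover.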