OCG's Genome Characterization Pipeline

NCI’s Office of Cancer Genomics (OCG) coordinates research teams across the United States and Canada to produce rich cancer genomic and clinical datasets for the cancer research community. OCG implements this collaborative effort through an efficient and standardized workflow called the Genome Characterization Pipeline. Learn more about the components and their roles below.

Genome Characterization Pipeline Overview

Tissue Collection and Processing

OCG partners with Tissue Source Sites (TSSs): clinical trials and community oncology groups that collect tumor tissue samples and normal tissue, usually blood, from patients who choose to participate. Tumor tissues are preserved in one of two ways:
- The majority of the samples are formalin-fixed paraffin-embedded (FFPE) tissues.
- Some samples are frozen tissue.
OCG’s Biospecimen Core Resource (BCR) is responsible for collecting and processing samples, as well as collecting and curating clinical data. The BCR has two components:
- The Biospecimen Processing Center at Nationwide Children’s Hospital processes all tissues to ensure that they meet rigorous quality standards and isolates DNA, RNA, proteins, and other analytes to send to OCG’s Genome Characterization Centers (GCCs).
- The Clinical Data Center at Information Management Services, Inc. works with collection sites to ensure that clinical data are properly collected and correctly formatted, and helps identify standardized clinical data points. Importantly, the Clinical Data Center also oversees informed consent to protect patients’ rights and de-identifies all clinical data to safeguard patients' privacy.

Genome Characterization

The Genome Characterization Centers (GCCs) generate data from samples received from the Biospecimen Core Resource (BCR).
Each GCC provides support for distinct genomic or epigenomic pipelines:
- The Broad Institute performs whole genome sequencing and whole exome sequencing.
- The University of North Carolina performs total RNA and microRNA sequencing.
- The New York Genome Center performs methylation arrays and single-cell sequencing.
- Azenta Life Sciences performs whole exome sequencing.
- Additional GCCs with other specialties may be added as new molecular technologies become available.
The GCCs send raw sequencing data, associated metadata, and other characterization data to the Genomic Data Commons (GDC), where the data are shared with the Genomic Data Analysis Network (GDAN) and the research community.

Genomic Data Analysis

The Genomic Data Analysis Network (GDAN) is a diverse and collaborative team of scientists from institutions across the United States and Canada. The GDAN takes the raw data generated by the Genome Characterization Centers (GCCs) and performs analyses that transform these large datasets into biological insights.
The GDAN has a wide range of expertise, from identifying genomic abnormalities to integrating and visualizing multi-omics data.
The GDAN forms Analysis Working Groups to develop and apply novel analyses on the data. Results and findings are published in peer-reviewed scientific journals.
The GDAN's overarching goal is to produce resources that advance research across the cancer genomics community: detailed findings are published, and data are made available in the Genomic Data Commons.

The Genomic Data Commons (GDC) harmonizes genomic data by applying a standardized set of data processing protocols and bioinformatic pipelines. Resulting data may be utilized by researchers in many ways.
Data generated by the Genome Characterization Pipeline are made available to the public via the GDC for use by researchers worldwide. Some data are released as open access, while others are controlled access and require users to apply for authorization. GDC's Data Access Processes and Tools provide details on data access.
By bringing harmonized data to one location and increasing accessibility, the GDC aims to accelerate the rate of cancer discoveries and facilitate precision oncology.
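
The distinction between open-access and controlled-access data can be illustrated with a small query sketch. The example below is not part of OCG's or the GDC's documentation; it is a minimal illustration that assumes the public GDC API `files` endpoint and its JSON filter syntax, and the helper name `build_files_query` and the chosen fields are hypothetical choices made for this example. It only builds a query URL and makes no network request; retrieving controlled-access files would additionally require an approved authorization, which is not shown here.

```python
import json
from urllib.parse import urlencode

# Assumed public GDC API endpoint for file metadata; check the GDC API
# documentation before relying on it.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(access="open", size=5):
    """Build a GDC file-metadata query URL filtered by access level.

    `access` must be "open" or "controlled". This hypothetical helper only
    constructs the URL; it does not contact the GDC.
    """
    if access not in ("open", "controlled"):
        raise ValueError("access must be 'open' or 'controlled'")

    # GDC-style filter: keep files whose `access` field matches the request.
    filters = {
        "op": "in",
        "content": {"field": "access", "value": [access]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,access,data_type",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Print a query URL for a handful of open-access file records.
    print(build_files_query("open"))
```

Opening the printed URL in a browser or passing it to an HTTP client should return metadata for a few matching files, assuming the endpoint and field names are as described above.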