All the Childhood Cancer Data in the GDC (So Far)

By Peggy I. Wang, Ph.D. and Pamela C. Birriel, Ph.D.

New data is available at the Genomic Data Commons (GDC).

NCI’s Genomic Data Commons (GDC) recently released over 55 TB of newly harmonized childhood cancer data. By coincidence, the release came during Childhood Cancer Awareness Month, a time when we remind others and ourselves of the children and families affected by these rare and devastating diseases.

The bulk of the childhood cancer data in the GDC comes from NCI’s Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Since 2007, TARGET has been collecting and curating molecular, genomic, and clinical data from children and young adults and, importantly, working to make the data available to researchers. Another NCI initiative characterizing childhood cancers is the Cancer Genome Characterization Initiative’s Burkitt Lymphoma Genome Sequencing Project.

The table below summarizes the 5,062 cases of TARGET data available so far in the GDC. With last month’s data release, the amount grew by nearly 43%, including an expansion of Acute Lymphoblastic Leukemia cases and new cases of Acute Leukemias of Ambiguous Lineage. New somatic variant calls generated with the Pindel algorithm were also released for many existing cases.

TARGET childhood cancer data types available in the GDC
Platform: Cancer types (count)
Whole-exome sequencing (alignments and somatic variant calls): ALL (873), AML (22), NBL (221), WT (45)
Whole-genome sequencing (alignments): ALL (133), AML (227), CCSK (13), NBL (9), OS (31), RT (69), WT (81)
RNA sequencing (alignments and expression quantifications): ALL (591), AML (179), CCSK (13), NBL (155), OS (88), RT (64), WT (125)
microRNA sequencing (alignments and expression quantifications): ALL (229), AML (701), RT (66), WT (127)
Abbreviations: ALL, Acute Lymphoblastic Leukemia; AML, Acute Myeloid Leukemia; CCSK, Clear Cell Sarcoma of the Kidney; NBL, Neuroblastoma; OS, Osteosarcoma; RT, Rhabdoid Tumor; WT, Wilms Tumor
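
For readers who want to work with the numbers in the table above, the counts can be tallied with a short script. The following is only an illustrative Python sketch: the dictionary is transcribed by hand from the table, and the names `TARGET_COUNTS`, `platform_totals`, and `platforms_by_cancer_type` are hypothetical helpers, not part of any GDC tool. Because the same case may be profiled on more than one platform, the per-platform totals should not be summed to estimate the number of unique cases.

```python
# Counts transcribed by hand from the table above:
# platform -> {cancer type abbreviation: number of cases}.
TARGET_COUNTS = {
    "Whole-exome sequencing": {"ALL": 873, "AML": 22, "NBL": 221, "WT": 45},
    "Whole-genome sequencing": {
        "ALL": 133, "AML": 227, "CCSK": 13, "NBL": 9,
        "OS": 31, "RT": 69, "WT": 81,
    },
    "RNA sequencing": {
        "ALL": 591, "AML": 179, "CCSK": 13, "NBL": 155,
        "OS": 88, "RT": 64, "WT": 125,
    },
    "microRNA sequencing": {"ALL": 229, "AML": 701, "RT": 66, "WT": 127},
}


def platform_totals(counts):
    """Sum the listed case counts within each platform."""
    return {platform: sum(by_type.values()) for platform, by_type in counts.items()}


def platforms_by_cancer_type(counts):
    """Map each cancer type to the platforms that list data for it."""
    coverage = {}
    for platform, by_type in counts.items():
        for cancer_type in by_type:
            coverage.setdefault(cancer_type, []).append(platform)
    return coverage


if __name__ == "__main__":
    for platform, total in platform_totals(TARGET_COUNTS).items():
        print(f"{platform}: {total} cases")
    for cancer_type, platforms in sorted(platforms_by_cancer_type(TARGET_COUNTS).items()):
        print(f"{cancer_type}: {', '.join(platforms)}")
```

A tally like this makes it easy to see, for example, which cancer types currently have data on all four platforms and which are covered by only some of them.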

To help researchers access genomic data and use GDC resources, the GDC team hosts a monthly support webinar. In addition to touring the various tools and components of the GDC, members of the research community can ask team members directly about anything related to genomic data.

For additional pediatric data, the Pediatric Genomic Data Inventory (PGDI) provides a “cheat sheet” of large-scale data sets collected internationally, including whom to contact to access them. PGDI also gives researchers an opportunity to list their own data sets and help connect their patients’ information with the research community.

Cancer remains an important cause of child mortality, with an estimated 80,000 cancer-related deaths worldwide in children under age 20 annually [1]. Researchers are finding that pediatric cancer may be fundamentally different from malignancies of the same name in an adult population, thereby requiring dedicated molecular characterization studies like TARGET to identify unique therapeutic strategies.


1. Bhakta N, Force LM, Allemani C et al. Childhood cancer burden: a review of global estimates. Lancet Oncol. 2019 Jan;20(1):e42-e53. doi: 10.1016/S1470-2045(18)30761-7.