Analysis of Ugandan Cervical Carcinomas Identifies Human Papillomavirus (HPV) Clade-specific Epigenome and Transcriptome Landscapes

By Nick Griner, Ph.D. and Cindy Kyi, Ph.D.

The Cancer Genome Characterization Initiative (CGCI) characterizes pediatric and rare adult cancers.

Cervical cancer is the leading cause of cancer death among women in Africa.1 This is largely because patients are diagnosed at later stages of the disease, when prognosis is poor. Other contributors to poor prognosis include a high prevalence of genital human papillomavirus (HPV) infection and limited treatment options. Low vaccine uptake and limited use of Pap smear screening for prevention in developing countries like Uganda have led to predictions of a 50% increase in cervical cancer mortality by 2040.2

The rate of human immunodeficiency virus (HIV) infection remains relatively high in developing countries. HIV is classified as an indirect carcinogen: the immune suppression it causes can lead to a number of specific cancers in AIDS patients. HIV is also linked to higher rates of HPV acquisition, and co-infection reduces the likelihood of HPV clearance, increasing the risk of cervical cancer. HPV infection is necessary, but not sufficient, for cervical cancer.3 Several genomic studies of cervical cancer, conducted primarily in non-African patients, have described its molecular profile, including copy number amplifications and somatic alterations affecting a number of different pathways.4,5 However, these studies have not adequately addressed the role of HIV.

The HIV+ Tumor Molecular Characterization Project (HTMCP) was initiated by the Office of Cancer Genomics, along with the Office of HIV and AIDS Malignancies, to gain insight into the genetic events driving HIV-associated cancers. A recent HTMCP publication in Nature Genetics6 describes a large-scale genomic and transcriptomic study of 118 tumors from Ugandan patients, 72 of which were HIV+. The results of this study reveal distinct HPV clade-specific genomic, epigenomic, and transcriptomic landscapes in this cohort.

Figure 1. Uganda Cancer Institute in Kampala, Uganda

Figure 2. Flowchart of HIV+ Tumor Molecular Characterization Project

The study consisted of 212 cervical cancer cases, of which 118 were included in the discovery cohort and 89 in the extension cohort for validation of mutations. Whole genome sequencing (WGS) of the 118 HIV+ and HIV- tumors in the discovery cohort revealed an average of 22,942 somatic mutations per case, of which 311 were coding mutations. An APOBEC mutation signature, consistent with a mutational process driven by the cellular response to viral infection, was detected. However, there were no differences in mutation burden or mutation signatures (characteristic combinations of mutation types arising from specific mutagenic processes) between HIV+ and HIV- cases.
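To put these figures in proportion, the short Python sketch below recomputes a few ratios from the numbers quoted in this article. It uses no data beyond those reported values, and the variable names are illustrative only. Note that the discovery and extension cohorts sum to 207; this summary does not say how the remaining cases among the 212 were used.

```python
# Figures quoted in this article; nothing here is new data.
discovery_tumors = 118
hiv_positive_tumors = 72        # HIV+ tumors within the discovery cohort
extension_tumors = 89
mean_somatic_mutations = 22_942  # average per case
mean_coding_mutations = 311      # average per case

# Simple proportions derived from the reported values.
hiv_positive_share = hiv_positive_tumors / discovery_tumors
coding_share = mean_coding_mutations / mean_somatic_mutations
sequenced_total = discovery_tumors + extension_tumors

print(f"HIV+ share of discovery cohort: {hiv_positive_share:.1%}")   # 61.0%
print(f"Coding share of somatic mutations: {coding_share:.2%}")      # 1.36%
print(f"Discovery plus extension tumors: {sequenced_total}")         # 207
```

In other words, roughly 1 in 74 somatic mutations per tumor fell in coding sequence, which is one reason the study also examined non-coding regions.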

PIK3CA, the most recurrent significantly mutated gene (SMG), was mutated in a higher proportion of HIV- tumors, and its expression was 1.3 times higher in HIV- tumors than in HIV+ tumors. Copy number alteration landscapes were largely similar between HIV+ and HIV- samples, although HIV+ samples exhibited more unique focal amplifications and deletions. Comparing the copy number (CN) landscapes of the HIV- samples with those of TCGA cervical cancers revealed some differences: TCGA samples exhibited a larger number of significantly deleted regions, affecting 11 chromosomes, while the HIV- samples of this cohort exhibited 3 unique amplified regions. These results suggest that genetic background may contribute to the CN differences between this cohort and the largely North American TCGA cohort.

Analysis of non-coding mutations revealed seven high-confidence non-coding “hotspots”, including two in the TERT promoter and two in a potential intronic enhancer of ADGRG6, which have also been reported in breast and bladder cancers. All observed hotspots were present in both HIV- and HIV+ cases.

Identification of HPV clades

WGS detected 17 HPV types, with most infections involving the highly oncogenic types HPV16 (clade A9) and HPV18 and HPV45 (both clade A7). Clade A7 was more prevalent in this cohort than in the TCGA study, particularly among squamous cell carcinomas (SCCs). HPV types were similar between HIV+ and HIV- tumors.

Next, cluster analyses were performed to characterize expression and DNA methylation landscapes correlated with tumor features. Three gene expression clusters enriched for adenocarcinomas, non-keratinizing SCCs and keratinizing SCCs were identified. Two of the DNA methylation clusters distinguished clade A9-infected SCCs from clade A7-infected squamous and non-squamous cell carcinomas.

Differential methylation analysis comparing clade A7-infected samples with clade A9-infected samples detected over 100,000 differentially methylated probes, which were distributed differently with respect to genomic features and proximity to CpG islands. Clade A9 samples were enriched for expression of keratin family genes, which have roles in epithelial differentiation, virus production during HPV infection, and uncontrolled cell growth. In contrast, clade A7 samples had increased expression of genes linked to extracellular matrix organization, cell adhesion and migration pathways.
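As a rough illustration of what a "differentially methylated probe" means, the Python sketch below flags probes whose mean methylation level (beta value, from 0 to 1) differs between two groups by at least a chosen amount. This is a deliberately simplified stand-in, not the method used in the publication: the function name, the 0.2 cutoff, and the toy beta values are all assumptions made for this example, and a real analysis would also apply statistical testing and multiple-testing correction.

```python
from statistics import mean


def differential_probes(beta_a7, beta_a9, min_delta=0.2):
    """Return {probe: delta} for probes whose group means differ by >= min_delta.

    delta is mean(A7 betas) - mean(A9 betas); min_delta is an assumed cutoff.
    """
    hits = {}
    # Only compare probes measured in both groups.
    for probe in beta_a7.keys() & beta_a9.keys():
        delta = mean(beta_a7[probe]) - mean(beta_a9[probe])
        if abs(delta) >= min_delta:
            hits[probe] = round(delta, 3)
    return hits


# Hypothetical toy beta values (three samples per clade), for illustration only.
toy_a7 = {
    "probe_1": [0.80, 0.75, 0.85],
    "probe_2": [0.30, 0.35, 0.32],
    "probe_3": [0.10, 0.15, 0.12],
}
toy_a9 = {
    "probe_1": [0.20, 0.25, 0.30],
    "probe_2": [0.31, 0.33, 0.34],
    "probe_3": [0.60, 0.55, 0.65],
}

hits = differential_probes(toy_a7, toy_a9)
for probe, delta in sorted(hits.items()):
    direction = "higher in A7" if delta > 0 else "higher in A9"
    print(probe, delta, direction)
```

In this toy input, probe_1 and probe_3 pass the cutoff in opposite directions, while probe_2 is essentially unchanged between groups.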

Since HPV viral genes regulate epithelial cell differentiation and promote tumorigenesis, unsupervised clustering of viral E1, E2, E6 and E7 RNA transcripts was performed to study their association with different clades. Three clusters with different levels of gene expression and clade-specific enrichment were identified for clade A7-infected samples and A9-infected samples. This suggests that clade-enriched tumor gene expression patterns may be influenced by expression of HPV genes. The more aggressive viral expression profile associated with clade A7 is consistent with the clinical phenotype as A7-infected patients have poorer prognosis compared to A9-infected patients.

The clade-specific nature of the DNA methylation results and the large number of somatic mutations observed in chromatin-modifying genes prompted the HTMCP to investigate clade-specific differences in histone modifications. ChIP-seq and cluster-of-clusters analyses using histone modification “marks” revealed four clusters. Clade-specific tumors separated mainly into two of them, one enriched in clade A9-infected tumors and the other in clade A7-infected tumors, while the remaining clusters were enriched for non-SCC tumors. Differences in the abundance of histone marks at active promoter and enhancer regulatory regions were observed between clade A7-infected and clade A9-infected samples, suggesting that DNA methylation and other epigenetic modification patterns are altered in an HPV clade-specific manner.

Next, the project identified sites of HPV integration into the genome and analyzed their association with the expression of nearby genes. Clade A7 integration events contained more integration sites per event than clade A9 events, which likely explains the more pronounced effect on gene expression observed in clade A7 samples. Fold changes in histone mark enrichment near integration events were also positively correlated with gene expression changes. A higher number of HPV integration sites was associated with increased expression of nearby genes. These results suggest that HPV integration events are associated with altered histone modifications and altered expression of genes near the integration sites.
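To make "positively correlated" concrete, the Python sketch below computes a Pearson correlation coefficient between two paired lists of fold changes. It is only an illustration: the article does not state which correlation statistic the study used, and the function name and toy log2 fold-change values are assumptions invented for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical log2 fold changes near five integration events (toy values).
histone_mark_lfc = [0.2, 0.8, 1.5, 2.1, 2.9]
gene_expression_lfc = [0.1, 0.5, 1.9, 1.6, 3.2]

r = pearson_r(histone_mark_lfc, gene_expression_lfc)
print(f"Pearson r = {r:.2f}")  # a value near +1 indicates a strong positive correlation
```

A coefficient close to +1 means that events with larger gains in histone mark signal also tend to show larger gains in nearby gene expression; it does not by itself establish which change causes the other.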

Nearly a third of integration events in the tumors were not within 10 kb of a protein coding gene. Thus, the HTMCP sought other genomic features influenced by these events and identified endogenous retroviral sequences (ERVs) near 44% of integrations. ERVs are epigenetically silenced in the genome and their reactivation is associated with induction of antiviral pathways. ERV expression near integration events correlated with the number of HPV insertions within the event, and upregulation of ERVs near integration events was associated with histone modification changes. However, increased ERV expression was not associated with immune cell presence in samples with an ERV integration event.
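The "within 10 kb of a protein coding gene" criterion can be sketched as a simple interval-distance check. The Python example below is a minimal illustration under stated assumptions: the coordinates, chromosome names, and function names are hypothetical, genes are reduced to (chromosome, start, end) tuples, and only the 10 kb window comes from the article.

```python
WINDOW_BP = 10_000  # "within 10 kb", as stated in the article


def distance_to_gene(position, gene_start, gene_end):
    """Distance in bp from a position to a gene interval (0 if inside it)."""
    if position < gene_start:
        return gene_start - position
    if position > gene_end:
        return position - gene_end
    return 0


def is_near_gene(site, genes, window=WINDOW_BP):
    """True if the (chrom, position) site lies within `window` bp of any gene."""
    chrom, position = site
    return any(
        gene_chrom == chrom and distance_to_gene(position, start, end) <= window
        for gene_chrom, start, end in genes
    )


# Hypothetical gene intervals and integration sites, for illustration only.
toy_genes = [("chr1", 100_000, 150_000), ("chr2", 500_000, 520_000)]
toy_sites = [
    ("chr1", 95_000),    # 5 kb upstream of the chr1 gene: near
    ("chr1", 300_000),   # 150 kb away: not near
    ("chr2", 531_000),   # 11 kb past the chr2 gene: not near
    ("chr3", 10_000),    # no genes listed on chr3: not near
]

not_near = [site for site in toy_sites if not is_near_gene(site, toy_genes)]
print(f"{len(not_near)} of {len(toy_sites)} toy sites are not within 10 kb of a gene")
```

Sites that fail this check are the ones for which other nearby features, such as ERVs, become the relevant candidates.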

This study outlines genomic, transcriptomic and epigenomic differences between HIV+ and HIV- samples in Ugandan cervical cancer patients. For the first time, molecular and epigenetic characteristics associated with HPV clades were observed in A9- and A7-infected tumor samples. Histone modification profiles also distinguished infected samples by HPV clade. Finally, histone modification changes at HPV integration sites were correlated with the upregulation of nearby genes and endogenous retroviruses. The clade-specific differences observed in cervical tumors suggest a model in which A7-infected tumors are less differentiated, more likely to have viral transcripts integrated into the genome, and more aggressive, with a poor prognosis. In contrast, A9-infected tumors show more epithelial differentiation and have more episomal viral transcripts (less genomic integration), resulting in a better prognosis. The results of this study provide a better understanding of HPV infection-induced genomic, transcriptomic and epigenomic changes in cervical cancers.


  1. Bodily, J. & Laimins, L. A. Persistence of human papillomavirus infection: keys to malignant progression. Trends in Microbiology. 2011; 19: 33–39. PMID: 21050765
  2. Zubizarreta, E. H., Fidarova, E., Healy, B. & Rosenblatt, E. Need for Radiotherapy in Low and Middle Income Countries – The Silent Crisis Continues. Clinical Oncology. 2015; 27: 107–114. PMID: 25455407
  3. Torre, L. A., Islami, F., Siegel, R. L., Ward, E. M. & Jemal, A. Global Cancer in Women: Burden and Trends. Cancer Epidemiol. Biomarkers Prev. 2017; 26: 444–457. PMID: 28223433
  4. The Cancer Genome Atlas Network, Integrated genomic and molecular characterization of cervical cancer. Nature. 2017; 543: 378–384. PMID: 28112728
  5. Ojesina, A. I., Lichtenstein, L., Freeman, S.S. et al. Landscape of genomic alterations in cervical carcinomas. Nature. 2014; 506: 371–375. PMID: 24390348
  6. Gagliardi, A., Porter, V. L., Zong, Z. et al. Analysis of Ugandan cervical carcinomas identifies human papillomavirus clade-specific epigenome and transcriptome landscapes. Nat. Genet. 2020; 52: 800–810. doi:10.1038/s41588-020-0673-7 PMID: 32747824

If you would like to reproduce some or all of this content, see Reuse of NCI Information for guidance about copyright and permissions. In the case of permitted digital reproduction, please credit the National Cancer Institute as the source and link to the original NCI product using the original product's title; e.g., “Analysis of Ugandan Cervical Carcinomas Identifies Human Papillomavirus (HPV) Clade-specific Epigenome and Transcriptome Landscapes was originally published by the National Cancer Institute.”