An Introduction to Proteogenomics: Appreciating the Complexity of the Central Dogma

September 24, 2020, by Anna Roberts-Pilgrim, Ph.D.

A simplified model of the central dogma of molecular biology.

Credit: National Cancer Institute

The Central Dogma of Molecular Biology is probably familiar to you if you are reading this post on this website. But for those who need a quick refresher: DNA is the body’s instruction manual. From those instructions, RNA transcripts are made, which in turn are translated into proteins. The proteins then carry out much of the work of the cell.

The funny thing about the Central Dogma is that it’s not a one-to-one flow. That means that Your Favorite Gene (YFG) doesn’t necessarily produce just one YFG RNA transcript and then one YFG protein. In fact, one gene, through the process of alternative splicing, can produce several variant RNA transcripts. Each variant transcript can then be translated into many copies of a variant protein, and those copies can be further modified in distinct ways, ultimately creating a multiplicative increase from the number of genes to the number of protein products.

Complicating matters further, there are additional layers of regulation at the transcriptional and post-transcriptional levels by microRNAs and other functional RNA products, and at the translational level, including by circular RNAs.

The total number of proteins in the human proteome is estimated at over 1 million, while the number of protein-encoding genes is estimated at 20,000–25,000 and the number of transcripts in the transcriptome at about 300,000. In short, the process from DNA to protein, as we now understand it, is really complex, a point researchers may forget to appreciate.
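To put those estimates in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable and function names are our own, and because the protein count is a lower bound ("over 1 million"), the protein ratios should be read as lower bounds too.

```python
# Rough fold-increase from genes to transcripts to proteins,
# using only the estimates quoted in this post.
GENES_LOW, GENES_HIGH = 20_000, 25_000   # protein-encoding genes (estimated range)
TRANSCRIPTS = 300_000                    # transcriptome estimate
PROTEINS_MIN = 1_000_000                 # "over 1 million", so a lower bound


def fold_range(numerator, denom_low, denom_high):
    """Return (smallest, largest) ratio of numerator to a denominator range."""
    return numerator / denom_high, numerator / denom_low


transcripts_per_gene = fold_range(TRANSCRIPTS, GENES_LOW, GENES_HIGH)
proteins_per_gene = fold_range(PROTEINS_MIN, GENES_LOW, GENES_HIGH)
proteins_per_transcript = PROTEINS_MIN / TRANSCRIPTS

print(f"Transcripts per gene: {transcripts_per_gene[0]:.0f} to {transcripts_per_gene[1]:.0f}")
print(f"Proteins per gene (at least): {proteins_per_gene[0]:.0f} to {proteins_per_gene[1]:.0f}")
print(f"Proteins per transcript (at least): {proteins_per_transcript:.1f}")
```

These are averages across the whole genome, not a statement about any particular gene: some genes yield a single transcript, while others yield many.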

Post-Translational Modifications: Protein-Level Regulation

On top of everything that affects protein synthesis at the genomic level, translated proteins undergo yet another layer of regulation called post-translational modifications (PTMs). PTMs transform a newly formed protein into a fully “decorated” mature entity, dramatically changing its biological function in response to a cell’s needs. These mature proteins can then perform a myriad of tasks needed by the cell, such as moving about the cell or activating a signaling cascade.

With an estimated 200-plus types, PTMs are the key mechanisms that increase proteomic diversity far beyond both genomic and transcriptomic diversity.

PTMs include the covalent addition of functional groups to specific amino acids or peptide linkages, enzyme-assisted protein folding, and proteolytic processing of a protein. Many of these modifications are mediated by enzymes (which are themselves proteins) such as kinases, phosphatases, transferases and ligases, which add functional groups, proteins, lipids or sugars to amino acid side chains or remove them.

Recent studies of PD-L1, a central protein in cancer immune suppression, suggest that PTMs such as glycosylation, phosphorylation, ubiquitination, sumoylation, and acetylation all play an important role in regulating the protein’s stability, translocation and protein-protein interactions. Changes to these PTMs can therefore directly influence PD-L1-mediated immune resistance.

“PTMs, protein-protein interactions, sub-cellular localization of proteins and their dynamic folding and unfolding mechanisms, etc., all contribute to disease states,” comments Dr. Emily Boja, Program Director of the Clinical Proteogenomic Tumor Analysis Consortium (CPTAC) program at NCI’s Office of Cancer Clinical Proteomics Research (OCCPR). “Therefore, understanding all facets of proteins and their fundamental roles in the evolution of diseases such as cancer is critical.”

Neither Genomics nor Proteomics Can Stand Alone in the Search for Effective Cancer Drugs

Though researchers have been working valiantly to find new drugs against cancer, many experimental drugs are plagued by low efficacy. For example, researchers have been trying to take advantage of the fact that many cancers have increasingly unstable genomes, an “enabling characteristic” that promotes tumor-sustaining mutations. Unfortunately, many experimental drugs targeting this instability have had low patient response rates, proving effective in only a subset of patients.

There are two widely used general mechanisms of action for such drugs. To root out the problem where it begins, researchers can develop drugs that target a particular mutation in the tumor genome and prevent it from triggering downstream effects; examples include RNA interference drugs, exon-skipping drugs, viral-mediated gene therapies, mRNA-based drugs and genome-editing CRISPR therapeutics. Other drugs are developed to target proteins, usually ones that result from a mutated gene.

Unfortunately, both approaches have shown only moderate efficacy. Why? One possibility is the unanticipated complexity of the process from gene to protein and the regulation at each step. Many of these targeted genes and proteins turn out to have no effect on a tumor’s survival, to be parts of redundant or peripheral pathways that have no direct effect on the tumor when targeted alone, to affect unanticipated pathways, or to face unanticipated protein regulation that renders the drug ineffective.

“The single gene targeting strategy disregards the complexity of cancer,” says Dr. Mehdi Mesri, CPTAC program coordinator and Program Director at OCCPR. “Understanding the biological context of a particular mutation is vital to finding an effective therapy.”

Enter Proteogenomics: A Whole-Picture Approach

Before targeting one end of the central dogma or the other, researchers need a way to take a careful look at everything that happens in between, in order to account for all the genomic variation and proteomic regulation and for how these change from patient to patient and cancer to cancer. Combining information gathered from genomics with transcriptomics and proteomics, a strategy we refer to as proteogenomics, allows researchers to take a more comprehensive look at each alteration in a tumor. That includes the identification, localization, and functional analysis of the resulting proteins and their relationship to the larger tumor environment.

Proteogenomics enhances our understanding of cancer genome biology by helping prioritize genomic alterations, subtyping tumors by their proteomic features, illuminating the PTM alterations responsible for the dysregulation of cancer signaling networks, and improving the understanding of drug response and resistance to therapies.

NCI’s CPTAC program does just that, gleaning powerful information on cancer development and progression. Through the study of the tumor microenvironment and immune landscape, CPTAC researchers have been able to proteogenomically characterize numerous cancer types using proteomic, phosphoproteomic, methylomic, acetylomic and glycomic analyses in conjunction with well-established sequencing approaches. Researchers have discovered new tumor subtypes, tumor microenvironment variations and new potential protein targets for drug therapy. In our next post we will tell you how this type of research is done!