Update: Mutations in Human Cancers Through the Lens of KRAS
by Jim Hartley and Ming Yi
Authors' correction: An alert reader notified us that the data in the Broad Institute's Genome Data Analysis Center show that EGFR genes in lung adenocarcinomas are indeed highly mutated only when KRAS is wild type (0/66, mutant/wild type). Our script failed to report this bias because of a #DIV/0! error. The text below, Figure 2A, and the Spreadsheet have now been corrected.
In 2015 we published a RAS Dialogue that used TCGA (The Cancer Genome Atlas) data to analyze the kinds of KRAS mutations found in cancers, and the genes that were frequently co-mutated (mutated along with KRAS), frequently contra-mutated (mutated when KRAS was wild type), or frequently mutated independent of KRAS status. Our data were downloaded from the Broad Institute’s GDAC (Genome Data Analysis Center), and comprised 122 pancreas (PAAD, pancreatic ductal adenocarcinoma), 226 lung (LUAD, lung adenocarcinoma), 149 colon (COAD, colon adenocarcinoma), and 217 rectal (READ, rectum adenocarcinoma) samples, for a total of 714 patient tumors.
We have now repeated our analysis, taking advantage of the larger number of tumor samples available at the Broad’s GDAC site, as well as at Memorial Sloan Kettering (cBioPortal) and through the AACR’s Genie project. In addition, the cBioPortal and Genie data include a combined category for colon and rectal (COADREAD) samples. (This is a distinct set of samples, not a combination of the colon and rectal sets.) The analyses of these newer data are presented in graphical form (see Figures) and in numerical form (see Spreadsheet).
Key to the figures
- As in 2015, we have represented the significant genes in each patient’s tumor with color coding. Each column represents an individual patient’s tumor.
- Each row represents a gene that is significantly mutated (see Methods, below).
- Gene names are colored according to the following scheme:
- Blue gene names represent genes that are at least four times more likely to be mutated when KRAS is mutant.
- Red gene names represent genes that are at least four times more likely to be mutated when KRAS is wild type.
- Black gene names represent genes that are mutated in at least 20% of patient samples, independent of KRAS status.
- Colored rectangles represent genes that have missense mutations. White rectangles represent genes that are wild type. Black rectangles appear only in rows dedicated to a specific mutation, and represent other mutations in the same gene: for example, BRAF mutations that are not V600E, or KRAS mutations that are not G12D.
- The color of each rectangle denotes the particular amino acid change in that gene. Note that the colors are arbitrary, thus similar colors do not have any meaning except within a particular gene. For example, green rectangles in one row may represent mutations of valine to glutamic acid, while green rectangles in a different row may represent glycine mutated to aspartic acid.
- When a particular amino acid change in a gene was present often enough to pass a significance filter (for example BRAF V600E), it was given its own row.
- The “counts” on the right end of each gene’s row, a/b, represent a) the number of times that gene was mutated in samples that contained mutant KRAS; and b) the number of times that gene was mutated in samples that contained wild-type KRAS.
- The data used to generate the figures are contained within Spreadsheet. The gene names in the Spreadsheet are colored according to the same system as the gene names in the grids.
The number of patient samples available for analysis is much larger now than in 2015, totaling 20,503 tumor samples. Data from the three sources were analyzed for genes that were mutated in patient tumor samples when KRAS genes were mutated or wild-type, and for genes that were highly mutated regardless of KRAS mutation status. Results for pancreas, lung, colon, rectal, and colorectal cancers are displayed as a grid of rectangles colored to represent amino acid changes in each tumor sample, and are also summarized in Spreadsheet. The proportions of mutant KRAS in each data set are presented in Table 1.
Table 1. Number of samples (% mutant KRAS)

|Sample type|Broad GDAC|MSK cBioPortal|AACR Genie|
|---|---|---|---|
|Pancreas (PAAD)|184 (76%)|766 (89%)|1973 (90%)|
|Lung (LUAD)|523 (31%)|890 (32%)|7429 (36%)|
|Colon (COAD)|347 (45%)|132 (38%)|3679 (44%)|
|Rectal (READ)|121 (51%)|327 (45%)|1021 (44%)|
|Colorectal (COADREAD)|---|1378 (39%)|1733 (51%)|
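As a quick check on the totals quoted in the text, the sample counts in Table 1 can be summed directly. The sketch below is only an illustration in Python (the dictionary layout and function names are ours, not part of the original analysis); the counts are transcribed from Table 1, and 714 is the 2015 total given above.

```python
# Sample counts transcribed from Table 1; None marks the missing Broad COADREAD set.
TABLE_1 = {
    "PAAD":     {"Broad GDAC": 184,  "MSK cBioPortal": 766,  "AACR Genie": 1973},
    "LUAD":     {"Broad GDAC": 523,  "MSK cBioPortal": 890,  "AACR Genie": 7429},
    "COAD":     {"Broad GDAC": 347,  "MSK cBioPortal": 132,  "AACR Genie": 3679},
    "READ":     {"Broad GDAC": 121,  "MSK cBioPortal": 327,  "AACR Genie": 1021},
    "COADREAD": {"Broad GDAC": None, "MSK cBioPortal": 1378, "AACR Genie": 1733},
}

SAMPLES_2015 = 714  # total patient tumors in the 2015 analysis


def totals_by_source(table):
    """Sum the sample counts for each data source, skipping missing entries."""
    totals = {}
    for row in table.values():
        for source, n in row.items():
            if n is not None:
                totals[source] = totals.get(source, 0) + n
    return totals


def total_samples(table):
    """Sum the sample counts over every tumor type and every source."""
    return sum(totals_by_source(table).values())


if __name__ == "__main__":
    print(totals_by_source(TABLE_1))
    print(total_samples(TABLE_1))                            # 20503
    print(round(total_samples(TABLE_1) / SAMPLES_2015, 1))   # 28.7
```

The per-source totals make it easy to see that the AACR Genie data contribute roughly three quarters of all samples.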
General observations on the method
- The colored grid format for displaying data from individual patients is less useful as the number of patients gets large, because the grid squares get so small that the colors become indistinct. Compare the Broad GDAC data on rectal adenocarcinoma (READ, N=121) with the AACR Genie data on lung adenocarcinoma (LUAD, N=7429).
- As the number of patient tumor samples increases, the number of genes that pass our significance filters decreases. For example, the Broad GDAC data for pancreas adenocarcinoma (PAAD, N=184) yield 14 genes that are more likely to be mutated when KRAS is mutated, 24 genes that are more likely to be mutated when KRAS is wild type, and 40 genes that are highly mutated regardless of whether KRAS is mutant or wild type. In contrast, the MSK cBioPortal and AACR Genie data for PAAD are based on 766 and 1973 samples, respectively, and after filtering those data no co-mutated and only one contra-mutated gene remained.
- For convenience, all the different KRAS mutants are listed, along with their number and percent contribution to each cancer in the “KRAS mutants” pages of Spreadsheet.
Pancreas (PAAD) adenocarcinomas
- Mutant KRAS remains the dominant driver in PAAD (Table 1, above).
- As the number of samples analyzed increased (184 -> 766 -> 1973), the number of genes that passed our tests decreased dramatically (78 -> 3 -> 2). See “data” pages of the Spreadsheet.
- TP53 is mutated in 60-70% of KRAS-mutant PAAD tumors, and at about half that frequency when KRAS is wild type.
- The most frequent KRAS mutants in the three sets of PAAD data are the same: G12D > G12V > G12R > Q61H. See PAAD KRAS mutants page of the Spreadsheet.
- See figures 1A-1C in Figures.
Lung (LUAD) adenocarcinomas
- As the number of patient samples increased dramatically (523 -> 890 -> 7429), the proportion of samples containing mutant KRAS remained at about one third (Table 1).
- G12C mutations account for about 40% of the KRAS mutants found in LUAD, and thus about 12% of all LUAD tumors.
- EGFR mutations are much more common in lung samples, but not in other tissues, when KRAS is wild type (see Varmus, et al., 2018).
- In contrast to pancreas tumors, lung tumors are more likely to contain mutant TP53 genes when KRAS is not mutated.
- According to our tests, both Broad GDAC and MSK cBioPortal data yield very similar sets of genes that are highly mutated in LUAD regardless of KRAS status: all 8 of the highly mutated genes in the cBioPortal data (CSMD3, LRP1B, MUC16, RYR2, TP53, TTN, USH2A, ZFHX4) are also found in the 11 hits in the Broad data. However, only TP53 passes our filters in the much larger AACR Genie data set.
- See figures 2A-2C in Figures.
Colon (COAD), rectal (READ), and colorectal (COADREAD) adenocarcinomas
- The COADREAD tumor data found in the MSK cBioPortal and AACR Genie data sets represent separate and distinct tumors, not the sum of COAD and READ.
- KRAS genes are mutant in ~45% of these tumors (Table 1).
- Despite large differences in the sample numbers, the spectrum of KRAS mutations is similar among the three tissue designations: about 60% of the KRAS mutations are at G12, but G13D and A146T mutants are far more abundant in colorectal tumors than in PAAD or LUAD (Spreadsheet, “KRAS mutants” pages).
- BRAF mutations, about three quarters of them BRAF V600E, contribute to 15-20% of the colorectal cancers in which KRAS is wild type.
- PIK3CA, along with the familiar colorectal cofactors APC and TP53, appears as highly mutated independent of KRAS status as the number of samples increases.
- See figures 3A-5B in Figures.
The number of patient samples that have been analyzed for missense mutations has increased more than 28-fold since our analysis of TCGA data in 2015. Our filters for detecting genes that were co-mutated or contra-mutated with mutant KRAS genes found many hits in 2015 that were not replicated as the number of samples increased, and missed other genes whose significance is now hard to overlook (e.g., EGFR mutants in lung cancers).
For pancreas adenocarcinomas the incidence of mutant KRAS is around 90%, meaning that teasing out genes that are significantly mutated in the much smaller number of samples in which KRAS is wild-type is difficult. For example, even with ten-fold more samples (AACR Genie data), the significance of mutant BRAF is difficult to assess: 28 cases when KRAS was wild type (N=193), 9 cases when KRAS was mutant (N=1780). Missense mutations in TP53 are frequent in pancreas (and lung and colorectal) cancers, whether KRAS is mutant or wild type.
The signal for EGFR being mutated only when KRAS is wild type is very strong in all three databases. Now that KRAS G12C inhibitors are in phase 3 clinical trials, tens of thousands of patients each year with G12C or EGFR mutations (14% and 27% of the 7429 AACR Genie LUAD samples) will benefit.
We are not pathologists and would appreciate any comments on how the COADREAD category of tumors is defined. G13D mutants are relatively common in this group, comprising ~15% of all KRAS mutations (compared to 3% in LUAD and 1% in PAAD), as are A146Ts (~6% in the colorectal group but <1% in PAAD and LUAD). Both the G13D and A146T forms of KRAS exchange GDP for GTP in the absence of the GEFs (guanine nucleotide exchange factors) much more extensively than other mutants, but why this might be more highly selected in colorectal tissue is not clear.
Methods
Data were downloaded from each site and formatted so they could be analyzed using Structured Query Language (SQL). We wrote an R script that does the following:
Test 1: Some of the samples contain high numbers (thousands) of mutated genes. We reasoned that most of the mutations in such samples were likely to be passengers, rather than drivers of cancer phenotypes. Therefore the mean number of mutant genes in samples from each tumor type was determined, and samples with numbers of mutant genes greater than two standard deviations above the mean were excluded from further tests.
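The exclusion rule in Test 1 can be sketched as follows. This is a minimal Python illustration, not the R script itself; the function name, the input layout (sample ID mapped to its number of mutant genes), and the choice of the sample standard deviation are all assumptions, since the text does not say which form of the standard deviation was used.

```python
from statistics import mean, stdev


def exclude_hypermutated(mutant_gene_counts, n_sd=2.0):
    """Keep samples whose mutant-gene count is at most mean + n_sd * SD.

    mutant_gene_counts: dict mapping a sample ID to its number of mutant genes
    (hypothetical layout). The sample standard deviation is an assumption.
    """
    counts = list(mutant_gene_counts.values())
    if len(counts) < 2:
        # A standard deviation is undefined for fewer than two samples.
        return dict(mutant_gene_counts)
    cutoff = mean(counts) + n_sd * stdev(counts)
    return {s: n for s, n in mutant_gene_counts.items() if n <= cutoff}


if __name__ == "__main__":
    # Made-up counts for one tumor type: seven typical samples and one outlier.
    counts = {"s1": 40, "s2": 55, "s3": 60, "s4": 45,
              "s5": 50, "s6": 52, "s7": 48, "s8": 3000}
    print(sorted(exclude_hypermutated(counts)))  # "s8" is dropped
```

Note that the outlier itself inflates both the mean and the standard deviation, so with very few samples a single extreme value may not exceed the cutoff.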
Test 2: We chose KRAS as the index gene for co-mutation analysis. The script selected only missense mutations in exons.
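A minimal sketch of the selection in Test 2 is shown below, in Python rather than R. It assumes the mutation tables carry MAF-style (Mutation Annotation Format) columns named `Hugo_Symbol`, `Variant_Classification`, and `Tumor_Sample_Barcode`; whether the formatted tables used by the script had these names is an assumption, and the function names are ours.

```python
from collections import defaultdict

MISSENSE = "Missense_Mutation"  # MAF-style classification label (assumed input format)
INDEX_GENE = "KRAS"


def missense_genes_by_sample(rows):
    """Map each sample ID to the set of genes carrying a missense mutation.

    Samples with no missense mutation at all do not appear in the result,
    so a real analysis would need the full sample list from elsewhere.
    """
    genes = defaultdict(set)
    for row in rows:
        if row["Variant_Classification"] == MISSENSE:
            genes[row["Tumor_Sample_Barcode"]].add(row["Hugo_Symbol"])
    return dict(genes)


def split_by_index_gene(genes_by_sample, index_gene=INDEX_GENE):
    """Return (mutant_ids, wild_type_ids) according to the index gene's status."""
    mutant = {s for s, g in genes_by_sample.items() if index_gene in g}
    wild_type = set(genes_by_sample) - mutant
    return mutant, wild_type


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "KRAS", "Variant_Classification": "Missense_Mutation"},
        {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "TP53", "Variant_Classification": "Nonsense_Mutation"},
        {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "EGFR", "Variant_Classification": "Missense_Mutation"},
        {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "TTN", "Variant_Classification": "Silent"},
    ]
    by_sample = missense_genes_by_sample(rows)
    print(by_sample)
    print(split_by_index_gene(by_sample))
```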
Test 3: Co-mutated or contra-mutated genes were found by requiring candidate genes to pass two tests. Test 3a: Genes must be mutated in at least 10% of the samples where either KRAS is mutant or where KRAS is wild-type. Test 3b: Genes were called co-mutated with KRAS when they were mutant at least 4 times as often (by frequency) when KRAS was mutant, and contra-mutated with KRAS when they were mutant at least 4 times as often (by frequency) when KRAS was wild-type.
Test 4: Genes that were highly mutated (>20%) regardless of KRAS status were also included.
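Tests 3 and 4 can be illustrated with a short Python sketch. It is not the R script: the function name, the order in which the tests are applied, and the use of the overall mutation frequency for Test 4 are assumptions, and the 4-fold rule is read as "at least four times", as in the key to the figures. The comparison is written as a multiplication rather than a division, so that a count of zero in one group (such as the 0/66 EGFR case in the authors' correction) cannot cause a divide-by-zero error.

```python
def classify_gene(mut_in_kras_mut, n_kras_mut, mut_in_kras_wt, n_kras_wt,
                  min_freq=0.10, fold=4.0, high_freq=0.20):
    """Classify one gene from its mutation counts in KRAS-mutant and KRAS-wild-type samples.

    Returns "co-mutated", "contra-mutated", "highly mutated", or "not significant".
    Thresholds follow Tests 3a, 3b, and 4; their exact application is assumed.
    """
    # Frequencies within each group; an empty group contributes a frequency of 0.
    f_mut = mut_in_kras_mut / n_kras_mut if n_kras_mut else 0.0
    f_wt = mut_in_kras_wt / n_kras_wt if n_kras_wt else 0.0

    # Test 3a: mutated in at least 10% of either group.
    if max(f_mut, f_wt) >= min_freq:
        # Test 3b: compare f_a >= fold * f_b instead of f_a / f_b >= fold,
        # so a zero frequency in the other group is handled safely.
        if f_mut >= fold * f_wt:
            return "co-mutated"
        if f_wt >= fold * f_mut:
            return "contra-mutated"

    # Test 4: highly mutated regardless of KRAS status (overall frequency, assumed).
    total = n_kras_mut + n_kras_wt
    if total and (mut_in_kras_mut + mut_in_kras_wt) / total >= high_freq:
        return "highly mutated"
    return "not significant"


if __name__ == "__main__":
    # BRAF in the AACR Genie PAAD data, counts quoted in the text:
    # 9 of 1780 KRAS-mutant samples, 28 of 193 KRAS-wild-type samples.
    print(classify_gene(9, 1780, 28, 193))   # contra-mutated
    # A gene never mutated with mutant KRAS (group sizes here are made up).
    print(classify_gene(0, 100, 20, 100))    # contra-mutated
```

Under this reading of the thresholds, the BRAF counts quoted for pancreas tumors pass the frequency filter even though they rest on only 37 mutated samples, which is one way to see why such calls remain hard to interpret.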
Test 5: Single missense mutations that satisfy either Test 3 or Test 4 are displayed in separate rows.
Results from the above tests are plotted as a grid of rectangles, in which each rectangle represents one gene in one patient tumor sample. When a gene is mutated in a particular sample, that rectangle is colored for that particular missense mutation. White rectangles show that the gene was wild-type in that tumor sample. Black rectangles appear only in rows in which a specific amino acid change passed the above tests, and represent mutations in that gene other than the specified mutation. For example, in lung adenocarcinomas the G12C mutation of KRAS passed the above tests, and such samples appear as green rectangles in the KRAS G12C row. Patient samples with other KRAS mutations, such as G12V, G13D, etc., are represented as black rectangles in that row.
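The coloring rule for a single rectangle can be written out as a small Python sketch. The function name and the data layout (one amino acid change per gene per sample) are simplifying assumptions made for illustration; the actual plotting code is not shown in this article.

```python
def cell_state(row_gene, row_change, sample_mutations):
    """Return the display state of one rectangle in the grid.

    row_gene:         gene shown in this row, e.g. "KRAS"
    row_change:       specific amino acid change for a mutation-specific row
                      (e.g. "G12C"), or None for a whole-gene row
    sample_mutations: dict mapping gene -> amino acid change for one sample
                      (assumes at most one missense change per gene)
    """
    change = sample_mutations.get(row_gene)
    if change is None:
        return "white"                 # gene is wild type in this sample
    if row_change is not None and change != row_change:
        return "black"                 # mutated, but not the row's specific change
    return "colored:" + change         # color is chosen per amino acid change


if __name__ == "__main__":
    sample = {"KRAS": "G12V"}
    print(cell_state("KRAS", "G12C", sample))  # black
    print(cell_state("KRAS", None, sample))    # colored:G12V
    print(cell_state("EGFR", None, sample))    # white
```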
For further information contact Dr. Ming Yi at email@example.com.