Methods of Genetic Analysis and Gene Discovery
The recognition that cancer clusters within families has led many investigators to collect data on multiple-case families with the goal of localizing cancer susceptibility genes through linkage studies.
Linkage studies are typically performed on high-risk kindreds, in which multiple cases of a particular disease have occurred, in an effort to identify disease susceptibility genes. Linkage analysis statistically compares the genotypes of affected and unaffected individuals and looks for evidence that known genetic markers are inherited along with the disease trait. If such evidence (linkage) is found, it provides statistical support for the hypothesis that the chromosomal region near the marker also harbors a disease susceptibility gene. Once a genomic region of interest has been identified through linkage analysis, additional studies are required to prove that there truly is a susceptibility gene at that position. Linkage analysis is affected by the following:
- Family size and having a sufficient number of family members who volunteer to contribute DNA.
- The number of disease cases in each family.
- Factors related to age at disease onset (e.g., utilization of screening).
- Gender differences in disease risk (not relevant in gender-specific cancers).
- Heterogeneity of disease in cases (e.g., aggressive vs. nonaggressive phenotype).
- The accuracy of family history information.
- Prevalence of phenocopies.
An additional issue in linkage studies is the background rate of sporadic cancer in the context of family studies. For example, because a man’s lifetime risk of prostate cancer is one in seven, it is possible that families under study have both inherited and sporadic prostate cancer cases. Thus, men who do not inherit the prostate cancer susceptibility gene that is segregating in their family may still develop prostate cancer.
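The strength of evidence for linkage is conventionally summarized as a LOD score, the base-10 logarithm of the likelihood of the data at a given recombination fraction relative to the likelihood under no linkage (recombination fraction 0.5). The text above does not name this statistic, so the following is only a minimal sketch under strong simplifying assumptions: every meiosis is phase-known and fully informative, and there are no phenocopies or reduced penetrance. The function names `lod_score` and `max_lod` and the example counts are hypothetical.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses (a simplification;
    real pedigrees need full likelihood methods).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("n_recombinants must be between 0 and n_meioses")
    k, n = n_recombinants, n_meioses
    # log10 likelihood if marker and disease locus are linked at theta
    log_linked = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    # log10 likelihood if they segregate independently (theta = 0.5)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(n_meioses: int, n_recombinants: int, grid: int = 500):
    """Grid search over theta in (0, 0.5]; returns (best_theta, best_lod)."""
    thetas = [0.5 * i / grid for i in range(1, grid + 1)]
    best = max(thetas, key=lambda t: lod_score(n_meioses, n_recombinants, t))
    return best, lod_score(n_meioses, n_recombinants, best)


if __name__ == "__main__":
    # Illustrative counts only: 10 informative meioses, 1 recombinant.
    theta_hat, lod = max_lod(10, 1)
    print(f"theta = {theta_hat:.3f}, LOD = {lod:.2f}")
```

Under this simplified model, a phenocopy in a family behaves like an apparent recombinant, which lowers the maximum LOD score. This is one way to see why sporadic cases in high-risk families weaken linkage signals.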
One way to address inconsistencies between linkage studies is to require inclusion criteria that define clinically significant disease.[2-4] This approach attempts to define a homogeneous set of cases and families to increase the likelihood of identifying a linkage signal. It also excludes screen-detected cases in families that may be considered clinically insignificant.
Investigators have also incorporated clinical parameters into linkage analyses with the goal of identifying genes that may influence disease severity.[5,6] This type of approach, however, has not yet led to the identification of consistent linkage signals across datasets.[7,8]
Genome-wide Association Studies (GWAS)
GWAS are showing great promise in identifying common, low-penetrance susceptibility alleles for many complex diseases, including cancer. This approach can be contrasted with linkage analysis, which searches for genetic-risk variants cosegregating within families that have a high prevalence of disease. While linkage analyses are designed to uncover rare, highly penetrant variants that segregate in predictable inheritance patterns (e.g., autosomal dominant, autosomal recessive, X-linked, and mitochondrial), GWAS are best suited to identify multiple, common, low-penetrance genetic polymorphisms. GWAS are conducted under the assumption that the genetic underpinnings of complex phenotypes, such as prostate cancer, are governed by many alleles, each conferring modest risk. Most genetic polymorphisms genotyped in GWAS are common, with minor allele frequencies greater than 1% to 5% within a given population (e.g., men of European ancestry). GWAS capture a large portion of common variation across the genome.[10,11] The strong correlation between many alleles located close to one another on a given chromosome (called linkage disequilibrium) allows one to “scan” the genome without having to test all 10 million known single nucleotide polymorphisms (SNPs). With GWAS, researchers can test 500,000 to 1 million SNPs per study and ascertain almost all common inherited variants in the genome.
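The correlation between nearby alleles described above is often quantified with the r² statistic, which ranges from 0 (no correlation) to 1 (one SNP perfectly predicts the other). The sketch below is illustrative only: it assumes phased haplotype counts are available for a pair of biallelic SNPs, and the function name `ld_r_squared` and the example counts are hypothetical.

```python
def ld_r_squared(n11: int, n10: int, n01: int, n00: int) -> float:
    """r^2 between two biallelic SNPs from phased haplotype counts.

    n11: haplotypes carrying allele 1 at both SNPs
    n10: allele 1 at the first SNP, allele 0 at the second
    n01: allele 0 at the first SNP, allele 1 at the second
    n00: allele 0 at both SNPs
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_a = (n11 + n10) / total  # frequency of allele 1 at the first SNP
    p_b = (n11 + n01) / total  # frequency of allele 1 at the second SNP
    # D: deviation of the observed haplotype frequency from independence
    d = n11 / total - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    # Illustrative counts only.
    print(ld_r_squared(50, 0, 0, 50))    # alleles always travel together
    print(ld_r_squared(25, 25, 25, 25))  # alleles are independent
```

When r² between a genotyped SNP and an untyped variant is close to 1, testing the genotyped SNP is nearly equivalent to testing the untyped one, which is why a subset of SNPs can stand in for the rest.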
In a GWAS, allele frequency for each SNP is compared between cases and controls. Promising signals (those in which allele frequencies deviate significantly between case and control populations) are validated in replication cohorts. To have adequate statistical power to identify variants associated with a phenotype, large numbers of cases and controls, typically thousands of each, are studied. Because up to 1 million SNPs are evaluated in a GWAS, false-positive findings are expected to occur frequently when using standard statistical thresholds. Therefore, stringent statistical rules are used to declare a positive finding, usually using a threshold of P < 1 × 10⁻⁷.[12-14]
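The case-control comparison and the need for a stringent threshold can be sketched with a simple allelic chi-square test on a 2 × 2 table of allele counts. This is one common way to test a single SNP, not necessarily the method used in any cited study; real analyses typically also adjust for covariates. The function names and all counts below are hypothetical, and the threshold is the one quoted in the text.

```python
import math

GENOME_WIDE_ALPHA = 1e-7  # threshold quoted in the text above


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square (1 df, no continuity correction) on allele counts.

    Returns (chi2, p_value).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives if no SNP is truly associated."""
    return n_tests * alpha


if __name__ == "__main__":
    # Illustrative counts only: 2,000 cases and 2,000 controls,
    # i.e., 4,000 alleles per group.
    chi2, p = allelic_chi_square(1200, 2800, 1000, 3000)
    print(f"chi2 = {chi2:.2f}, P = {p:.2e}, "
          f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
    print(expected_false_positives(1_000_000, 0.05))
    print(expected_false_positives(1_000_000, GENOME_WIDE_ALPHA))
```

Comparing the two calls to `expected_false_positives` shows the motivation for the stricter rule: at a conventional threshold of 0.05, a scan of 1 million SNPs would be expected to produce tens of thousands of chance findings, whereas at the genome-wide threshold the expected number falls below one.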
To date, hundreds of cancer-risk variants have been identified by well-powered GWAS and validated in independent cohorts. These studies have revealed convincing associations between specific inherited variants and cancer risk. However, the findings should be qualified with a few important considerations:
- GWAS reported thus far have been designed to identify relatively common genetic polymorphisms. It is very unlikely that an allele with high frequency in the population by itself contributes substantially to cancer risk. This, coupled with the polygenic nature of tumorigenesis, means that the contribution by any single variant identified by GWAS to date is quite small, generally with an odds ratio for disease risk of less than 1.5. In addition, despite extensive genome-wide interrogation of common polymorphisms in tens of thousands of cases and controls, GWAS findings to date do not account for even half of the genetic component of cancer risk.
- Variants uncovered by GWAS are not likely to directly contribute to disease risk. As mentioned above, SNPs exist in linkage disequilibrium blocks and are merely proxies for a set of variants—both known and previously undiscovered—within a given block. The causal allele is located somewhere within that linkage disequilibrium block.
- Admixture of groups with different ancestry can confound GWAS findings (i.e., a statistically significant finding could reflect a disproportionate number of subjects of one ancestry among the cases versus the controls, rather than a true association with disease). Therefore, GWAS are typically powered to analyze a single predominant ancestral group. As a result, many populations remain underrepresented in genome-wide analyses.
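The odds ratio mentioned in the first point, and the ancestry confounding described in the last point, can be illustrated together. In the sketch below, two hypothetical ancestry groups differ in both allele frequency and in their share of cases, but the allele has no effect on disease within either group. All counts are invented for illustration, and the names `odds_ratio`, `pooled_counts`, and `STRATA` are hypothetical.

```python
def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Allelic odds ratio from a 2 x 2 table of allele counts."""
    if case_other == 0 or control_risk == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return (case_risk * control_other) / (case_other * control_risk)


# Invented allele counts per ancestry group:
# (case_risk, case_other, control_risk, control_other)
STRATA = {
    # Risk-allele frequency 0.5; group contributes mostly cases
    "ancestry_A": (800, 800, 200, 200),
    # Risk-allele frequency 0.2; group contributes mostly controls
    "ancestry_B": (80, 320, 320, 1280),
}


def pooled_counts(strata):
    """Sum the four cells across strata, ignoring ancestry."""
    return tuple(sum(cells) for cells in zip(*strata.values()))


if __name__ == "__main__":
    for name, counts in STRATA.items():
        print(name, odds_ratio(*counts))
    print("pooled", odds_ratio(*pooled_counts(STRATA)))
```

Within each group the odds ratio is 1, yet the pooled odds ratio is well above the value of 1.5 cited as typical for GWAS findings. The apparent association arises only because the group with the higher allele frequency is overrepresented among cases.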
The implications of these points are discussed in greater detail in the PDQ summaries on Genetics of Breast and Gynecologic Cancers; Genetics of Colorectal Cancer; and Genetics of Prostate Cancer. Additional details can be found elsewhere.
1. American Cancer Society: Cancer Facts and Figures 2015. Atlanta, Ga: American Cancer Society, 2015. Available online. Last accessed April 1, 2015.
2. Stanford JL, McDonnell SK, Friedrichsen DM, et al.: Prostate cancer and genetic susceptibility: a genome scan incorporating disease aggressiveness. Prostate 66 (3): 317-25, 2006.
3. Chang BL, Isaacs SD, Wiley KE, et al.: Genome-wide screen for prostate cancer susceptibility genes in men with clinically significant disease. Prostate 64 (4): 356-61, 2005.
4. Lange EM, Ho LA, Beebe-Dimmer JL, et al.: Genome-wide linkage scan for prostate cancer susceptibility genes in men with aggressive disease: significant evidence for linkage at chromosome 15q12. Hum Genet 119 (4): 400-7, 2006.
5. Witte JS, Goddard KA, Conti DV, et al.: Genomewide scan for prostate cancer-aggressiveness loci. Am J Hum Genet 67 (1): 92-9, 2000.
6. Witte JS, Suarez BK, Thiel B, et al.: Genome-wide scan of brothers: replication and fine mapping of prostate cancer susceptibility and aggressiveness loci. Prostate 57 (4): 298-308, 2003.
7. Slager SL, Zarfas KE, Brown WM, et al.: Genome-wide linkage scan for prostate cancer aggressiveness loci using families from the University of Michigan Prostate Cancer Genetics Project. Prostate 66 (2): 173-9, 2006.
8. Slager SL, Schaid DJ, Cunningham JM, et al.: Confirmation of linkage of prostate cancer aggressiveness with chromosome 19q. Am J Hum Genet 72 (3): 759-62, 2003.
9. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447 (7145): 661-78, 2007.
10. The International HapMap Consortium: The International HapMap Project. Nature 426 (6968): 789-96, 2003.
11. Thorisson GA, Smith AV, Krishnan L, et al.: The International HapMap Project Web site. Genome Res 15 (11): 1592-3, 2005.
12. Evans DM, Cardon LR: Genome-wide association: a promising start to a long race. Trends Genet 22 (7): 350-4, 2006.
13. Cardon LR: Genetics. Delivering new disease genes. Science 314 (5804): 1403-5, 2006.
14. Chanock SJ, Manolio T, Boehnke M, et al.: Replicating genotype-phenotype associations. Nature 447 (7145): 655-60, 2007.
15. Chang CQ, Yesupriya A, Rowell JL, et al.: A systematic review of cancer GWAS and candidate gene meta-analyses reveals limited overlap but similar effect sizes. Eur J Hum Genet 22 (3): 402-8, 2014.
16. Ioannidis JP, Castaldi P, Evangelou E: A compendium of genome-wide associations for cancer: critical synopsis and reappraisal. J Natl Cancer Inst 102 (12): 846-58, 2010.
17. Jorgenson E, Witte JS: Genome-wide association studies of cancer. Future Oncol 3 (4): 419-27, 2007.