Crunching Numbers: What Cancer Screening Statistics Really Tell Us
Over the past several years, the conversation about cancer screening has started to change within the medical community. Be it breast, prostate, or ovarian cancer, the trend is to recommend less routine screening, not more. These recommendations are based on an emerging—if counterintuitive—understanding that more screening does not necessarily translate into fewer cancer deaths and that some screening may actually do more harm than good.
Much of the confusion surrounding the benefits of screening comes from misreading the statistics that are often used to describe the results of screening studies. An improvement in survival (how long a person lives after a cancer diagnosis) among people who have undergone a cancer screening test is often taken to imply that the test saves lives.
But survival cannot be used accurately for this purpose because of several sources of bias.
Sources of Bias
Lead-time bias occurs when screening finds a cancer earlier than that cancer would have been diagnosed because of symptoms, but the earlier diagnosis does nothing to change the course of the disease. (See the graphic on the right for further explanation.)
Because survival is counted from the date of diagnosis, moving the diagnosis earlier lengthens measured survival even if the patient dies at exactly the same time. Lead-time bias is therefore inherent in any analysis comparing survival after detection. It makes 5-year survival after screen detection, and after earlier cancer diagnosis in general, an inherently inaccurate measure of whether screening saves lives. Unfortunately, the perception of longer life after detection can be very powerful for doctors, noted Dr. Donald Berry, professor of biostatistics at the University of Texas MD Anderson Cancer Center.
"I had a brilliant oncologist say to me, 'Don, you have to understand: 20 years ago, before mammography, I'd see a patient with breast cancer, and 5 years later she was dead. Now, I see breast cancer patients, and 15 years later they're still coming back, they haven't recurred; it's obvious that screening has done wonders,'" he recounted. "And I had to say no—that biases could completely explain the difference between the two [groups of patients]."
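The arithmetic behind lead-time bias can be sketched for a single hypothetical patient. The ages below are invented for illustration and do not come from any study; the only assumption is the one in the definition of lead-time bias, namely that screening moves the diagnosis earlier but leaves the date of death unchanged.

```python
# Hypothetical illustration of lead-time bias: the same patient, with the same
# age at death, is followed from two different dates of diagnosis.
# All ages are invented for illustration; they are not data from any study.


def survival_years(age_at_diagnosis: float, age_at_death: float) -> float:
    """Survival is measured from the date of diagnosis, not from a fixed start."""
    return age_at_death - age_at_diagnosis


def is_five_year_survivor(age_at_diagnosis: float, age_at_death: float) -> bool:
    """True if the patient lives at least 5 years after diagnosis."""
    return survival_years(age_at_diagnosis, age_at_death) >= 5


AGE_AT_DEATH = 70.0                # assumed unchanged by screening
AGE_DIAGNOSED_BY_SYMPTOMS = 67.0   # no screening: found when symptoms appear
AGE_DIAGNOSED_BY_SCREENING = 60.0  # screening: found 7 years earlier

without_screening = survival_years(AGE_DIAGNOSED_BY_SYMPTOMS, AGE_AT_DEATH)
with_screening = survival_years(AGE_DIAGNOSED_BY_SCREENING, AGE_AT_DEATH)

print(f"Survival without screening: {without_screening:.0f} years")
print(f"Survival with screening:    {with_screening:.0f} years")
print("5-year survivor without screening:",
      is_five_year_survivor(AGE_DIAGNOSED_BY_SYMPTOMS, AGE_AT_DEATH))
print("5-year survivor with screening:   ",
      is_five_year_survivor(AGE_DIAGNOSED_BY_SCREENING, AGE_AT_DEATH))
print(f"Age at death in both cases: {AGE_AT_DEATH:.0f}")
```

Measured survival jumps from 3 years to 10 years, and the patient becomes a "5-year survivor," yet she dies at 70 either way. This is the pattern Dr. Berry warns can explain an apparent improvement without any life being extended.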
Another confounding phenomenon in screening studies is length-biased sampling (or "length bias"). Length bias refers to the fact that screening is more likely to pick up slower-growing, less aggressive cancers, which can exist in the body longer than fast-growing cancers before symptoms develop.
Dr. Berry likens screening to reaching into a bag of potato chips—you're more likely to pick a larger chip because it's easier for your hand to find, he explained. Similarly, with a screening test "you're going to pick up the slower-growing cancers disproportionately, because the preclinical period when they can be detected by screening—the so-called sojourn time—is longer."
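Dr. Berry's point about sojourn time can be checked with a small simulation of a single screening round. This is a sketch under assumptions chosen purely for illustration: two hypothetical tumor types ("fast" and "slow") arise equally often, and each stays detectable before symptoms for a fixed sojourn time of 1 or 4 years. A screen catches a tumor only if it falls on a date inside that tumor's preclinical window.

```python
import random

# Hypothetical sojourn times (years a tumor is detectable before symptoms).
# These values are illustrative assumptions, not estimates for any cancer.
SOJOURN_YEARS = {"fast": 1.0, "slow": 4.0}


def simulate_screen(n_tumors: int = 100_000, window_years: float = 20.0,
                    screen_time: float = 10.0, seed: int = 1) -> dict:
    """Count tumors of each type that are detectable on the screening date."""
    rng = random.Random(seed)
    detected = {kind: 0 for kind in SOJOURN_YEARS}
    for _ in range(n_tumors):
        kind = rng.choice(["fast", "slow"])     # equal incidence of both types
        onset = rng.uniform(0.0, window_years)  # start of the preclinical phase
        # The screen finds the tumor only inside its preclinical window.
        if onset <= screen_time < onset + SOJOURN_YEARS[kind]:
            detected[kind] += 1
    return detected


counts = simulate_screen()
slow_share = counts["slow"] / sum(counts.values())

# With equal incidence, the expected share of each type among screen-detected
# tumors is proportional to its sojourn time: 4 / (1 + 4) = 0.8.
expected_share = SOJOURN_YEARS["slow"] / sum(SOJOURN_YEARS.values())

print(f"Slow-growing share of screen-detected tumors: {slow_share:.2f}")
print(f"Expected share from sojourn times:            {expected_share:.2f}")
```

Although half of all tumors in this hypothetical population are slow-growing, about 80 percent of the tumors a screen finds are slow-growing, simply because they spend four times as long in the detectable window.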
The extreme example of length bias is overdiagnosis, in which screening finds a slow-growing cancer that would never have caused harm or required treatment during the patient's lifetime. Because of overdiagnosis, the number of cancers found at an earlier stage is also an inaccurate measure of whether a screening test can save lives. (See the graphic on the left for further explanation.)
The effects of overdiagnosis are usually not as extreme in real life as in the worst-case scenario shown in the graphic; many cancers detected by screening tests do need to be treated. But some do not. For example, recent studies have estimated that 15 to 25 percent of screen-detected breast cancers and 20 to 70 percent of screen-detected prostate cancers are overdiagnosed.
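How overdiagnosis can inflate survival and case counts without changing deaths can be shown with a hypothetical cohort. Every number here is invented for illustration, and the sketch deliberately assumes the worst case: screening changes nothing about the outcome of any cancer and only adds overdiagnosed cases.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Hypothetical cohort; all figures are illustrative, not trial data."""
    population: int
    diagnosed: int                # cancers diagnosed in the cohort
    cancer_deaths_within_5y: int  # deaths from the cancer within 5 years

    def five_year_survival(self) -> float:
        """Share of diagnosed patients alive 5 years after diagnosis."""
        return 1 - self.cancer_deaths_within_5y / self.diagnosed

    def cancer_mortality(self) -> float:
        """Cancer deaths per person in the whole cohort."""
        return self.cancer_deaths_within_5y / self.population


OVERDIAGNOSED = 50  # assumed harmless cancers found only by screening

unscreened = Cohort(population=10_000, diagnosed=100, cancer_deaths_within_5y=40)
screened = Cohort(population=10_000, diagnosed=100 + OVERDIAGNOSED,
                  cancer_deaths_within_5y=40)

for label, cohort in (("Unscreened", unscreened), ("Screened", screened)):
    print(f"{label}: {cohort.diagnosed} cancers found, "
          f"5-year survival {cohort.five_year_survival():.1%}, "
          f"cancer mortality {cohort.cancer_mortality():.2%}")
```

Screening appears to find 50 percent more cancers and raises 5-year survival from 60.0 percent to 73.3 percent, but cancer mortality stays at 0.40 percent in both groups: the same 40 people die.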
How to Measure Lives Saved
Because of these biases, the only reliable way to know if a screening test saves lives is through a randomized trial that shows a reduction in cancer deaths in people assigned to screening compared with people assigned to a control (usual care) group. In the NCI-sponsored randomized National Lung Screening Trial (NLST), for example, screening with low-dose spiral CT scans reduced lung cancer deaths by 20 percent relative to chest x-rays in heavy smokers. (Previous studies had shown that screening with chest x-rays does not reduce lung cancer mortality.)
However, improvements in mortality caused by screening often look small—and they are small—because the chance of a person dying from a given cancer is, fortunately, also small. "If the chance of dying from a cancer is small to begin with, there isn't that much risk to reduce. So the effect of even a good screening test has to be small in absolute terms," said Dr. Lisa Schwartz, professor of medicine at the Dartmouth Institute for Health Policy and Clinical Practice and co-director of the Veterans Affairs Outcomes Group in White River Junction, VT.
For example, in NLST, a 20 percent relative reduction in the risk of dying of lung cancer translated to an absolute reduction in lung cancer mortality of about 0.4 percentage point (from 1.7 percent in the chest x-ray group to 1.3 percent in the CT scan group) after about 6.5 years of follow-up, explained Dr. Barry Kramer, director of NCI's Division of Cancer Prevention.
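The difference between relative and absolute risk reduction, and Dr. Schwartz's point that a small baseline risk limits the absolute benefit, can be worked through with the rounded NLST percentages quoted above. The helper names are hypothetical, and "number needed to screen" is a standard derived quantity added here for illustration, not a figure from the article. Because the quoted percentages are rounded, a relative reduction computed from them comes out near 24 percent rather than exactly 20 percent; the gap reflects rounding, not a different trial result.

```python
# Rounded NLST figures quoted in the text: lung cancer mortality after about
# 6.5 years of follow-up in each arm.
XRAY_RISK = 0.017  # chest x-ray group
CT_RISK = 0.013    # low-dose CT group


def absolute_risk_reduction(control_risk: float, screened_risk: float) -> float:
    """Difference in risk, expressed as a proportion of all people screened."""
    return control_risk - screened_risk


def relative_risk_reduction(control_risk: float, screened_risk: float) -> float:
    """Reduction as a fraction of the control group's risk."""
    return (control_risk - screened_risk) / control_risk


def number_needed_to_screen(control_risk: float, screened_risk: float) -> float:
    """People screened per cancer death averted over the follow-up period."""
    return 1 / absolute_risk_reduction(control_risk, screened_risk)


arr = absolute_risk_reduction(XRAY_RISK, CT_RISK)
rrr = relative_risk_reduction(XRAY_RISK, CT_RISK)
nns = number_needed_to_screen(XRAY_RISK, CT_RISK)

print(f"Absolute reduction: {arr * 100:.1f} percentage point")
print(f"Relative reduction from rounded inputs: {rrr:.1%}")
print(f"Number needed to screen: {nns:.0f}")

# The same 20% relative reduction applied to two hypothetical baseline risks:
# the absolute benefit scales with how much risk there was to begin with.
for baseline in (0.017, 0.17):
    print(f"Baseline {baseline:.1%}: a 20% relative reduction saves "
          f"{baseline * 0.20 * 100:.2f} percentage points")
```

On these rounded figures, roughly 250 people would need to be screened over the follow-up period to avert one lung cancer death. Applying the same 20 percent relative reduction to a tenfold higher baseline risk yields a tenfold larger absolute benefit.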
A study published March 6, 2012, in the Annals of Internal Medicine by Dr. Schwartz and her colleagues showed how these relatively small—but real—reductions in mortality from screening can confuse even experienced doctors when pitted against large—but potentially misleading—improvements in survival.
Tricky Even for Experienced Doctors
To test community physicians' understanding of screening statistics, Dr. Schwartz, Dr. Steven Woloshin (also of Dartmouth and co-director of the Veterans Affairs Outcomes Group), and their collaborators from the Max Planck Institute for Human Development in Germany developed an online questionnaire based on two hypothetical screening tests. They then administered the questionnaire to 412 doctors specializing in family medicine, internal medicine, or general medicine who had been recruited from the Harris Interactive Physician Panel.
The effects of the two hypothetical tests were described to the participants in two different ways: in terms of 5-year survival and in terms of mortality reduction. The participants also received additional information about the tests, such as the number of cancers detected and the proportion of cancer cases detected at an early stage.
The results of the survey showed widespread misunderstanding. Almost as many doctors believed, incorrectly, that an improvement in 5-year survival shows that a test saves lives (76 percent of those surveyed) as believed, correctly, that mortality data provide that evidence (81 percent).
About half of the doctors erroneously thought that simply finding more cases of cancer in a group of people who underwent screening compared with an unscreened group showed that the test saved lives. (In fact, a screening test can only save lives if it advances the time of diagnosis and earlier treatment is more effective than later treatment.) And 68 percent of doctors surveyed said they were even more likely to recommend the test if evidence showed that it detected more cancers at an early stage.
Doctors were three times more likely to say they would recommend the test supported by irrelevant survival data than the test supported by relevant mortality data.
In short, "the majority of primary care physicians did not know which screening statistics provide reliable evidence on whether screening works," Dr. Schwartz and her colleagues wrote. "They were more likely to recommend a screening test supported by irrelevant evidence…than one supported by the relevant evidence: reduction in cancer mortality with screening."
- The U.S. Preventive Services Task Force (USPSTF) revised its breast cancer screening recommendations in December 2009. The Task Force is in the process of updating the recommendations again.
- In May 2012, the USPSTF updated its prostate cancer screening recommendation.
- The USPSTF updated its cervical cancer screening recommendation in March 2012.
- In December 2013, the USPSTF updated its lung cancer screening recommendation.
Teaching the Testers
"In some ways these results weren't surprising, because I don't think [these statistics] are part of the standard medical school curriculum," said Dr. Schwartz.
"When we were in medical school and in residency, this wasn't part of the training," Dr. Woloshin agreed.
"We should be teaching residents and medical students how to correctly interpret these statistics and how to see through exaggeration," added Dr. Schwartz.
Some schools have begun to do this. The University of North Carolina (UNC) School of Medicine has introduced a course called the Science of Testing, explained Dr. Russell Harris, professor of medicine at UNC. The course includes modules on 5-year survival and mortality outcomes.
The UNC team also recently received a research grant from the Agency for Healthcare Research and Quality to form a Research Center for Excellence in Clinical Preventive Services. "Part of our mandate is to talk not only to medical students but also to community physicians, to help them begin to understand the pros and cons of screening," said Dr. Harris.
Drs. Schwartz and Woloshin also think that better training for reporters, advocates, and anyone who disseminates the results of screening studies is essential. "A lot of people see those [news] stories and messages, so people writing them need to understand," said Dr. Woloshin.
Patients also need to know the right questions to ask their doctors. "Always ask for the right numbers," he recommended. "You see these ads with numbers like '5-year survival changes from 10 percent to 90 percent if you're screened.' But what you always want to ask is: 'What's my chance of dying [from the disease] if I'm screened or if I'm not screened?'"