Cumulative Incidence of False-Positive Test Results in Lung Cancer Screening

A Randomized Trial

Annals Int Med April 20, 2010 vol. 152 no. 8 505-512

Background: Direct-to-consumer promotion of lung cancer screening has increased, especially for low-dose computed tomography (CT). However, screening exposes healthy persons to potential harms, and cumulative false-positive rates for low-dose CT have never been formally reported. Objective: To quantify the cumulative risk that a person who participated in a 1- or 2-year lung cancer screening examination would receive at least 1 false-positive result, as well as rates of unnecessary diagnostic procedures.

Design: Randomized, controlled trial of low-dose CT versus chest radiography. Setting: Feasibility study for the ongoing National Lung Screening Trial. Patients: Current or former smokers, aged 55 to 74 years, with a smoking history of 30 pack-years or more and no history of lung cancer (n = 3190).

Intervention: Random assignment to low-dose CT or chest radiography with baseline and 1 repeated annual screening; 1-year follow-up after the final screening. Randomization was centralized and stratified by age, sex, and study center. Measurements: False-positive screenings, defined as a positive screening with a completed negative work-up or 12 months or more of follow-up with no lung cancer diagnosis.

Results: By using a Kaplan–Meier analysis, a person's cumulative probability of 1 or more false-positive low-dose CT examinations was 21% after 1 screening and 33% after 2. The rates for chest radiography were 9% and 15%, respectively. A total of 7% of participants with a false-positive low-dose CT examination and 4% with a false-positive chest radiography had a resulting invasive procedure. Limitations: Screening was limited to 2 rounds. Follow-up after the second screening was limited to 12 months. The false-negative rate is probably an underestimate.

Conclusion: Risks for false-positive results on lung cancer screening tests are substantial after only 2 annual examinations, particularly for low-dose CT. Further study of resulting economic, psychosocial, and physical burdens of these methods is warranted.

Context

  • The ongoing National Lung Screening Trial aims to define the effectiveness of screening for lung cancer. However, imaging studies to screen for lung cancer are currently marketed to patients.

Contribution

  • These data from a pilot study for the National Lung Screening Trial show a 33% cumulative incidence of false-positive results after 2 computed tomography examinations and 15% after 2 chest radiography examinations. Substantial proportions of patients (7% for computed tomography and 4% for chest radiography) with false-positive results required invasive testing to determine that the screening-detected lesion was not cancer.

Implication

  • Physicians and patients should bear in mind high false-positive rates when considering screening for lung cancer with computed tomography or chest radiography.

—The Editors

Despite the lack of a completed randomized, controlled trial demonstrating the efficacy of low-dose computed tomography (CT) in reducing mortality from lung cancer, its use as a screening tool has gained increasing attention in the past several years. Some hospitals and advocacy organizations have actively promoted CT screening to the public. A 2007 New York Times article quotes the director of surgical oncology at Greenwich Hospital, Greenwich, Connecticut, as predicting that “within the next five years, lung cancer screening will be routine, like mammography and colonoscopy”. One advocacy group mounted a national “Demand a CAT Scan” billboard campaign in 2008. The Lung Cancer Mortality Reduction Act of 2009 Senate bill states that “significant and rapid improvements in lung cancer mortality can be expected through greater use and access to lung cancer screening tests”.

Utilization rates of chest radiography or CT screening in the community setting have not been well studied. The Dutch–Belgian randomized lung cancer (NELSON) screening trial reported that 3.1% of participants received a screening chest radiography or CT examination outside the trial by 24 months after randomization (this may not be representative of U.S. rates). Surveys of U.S. community physicians have demonstrated high enthusiasm for screening. One study found that two thirds of family practitioners, internists, and gynecologists and 82% of general surgeons recommended chest radiography for lung cancer screening every 1 to 2 years.

However, as is the case with all medical interventions, screening tests may generate both benefits and harms. If lung cancer screening becomes national health policy, we must have solid evidence not only about the benefits but also the harms of testing. This is particularly important because asymptomatic persons are the target population. Major harms associated with screening include the risk for overdiagnosis (the discovery of indolent lung cancer that would not lead to a person's death, or of cancer in a person who would die of a competing cause first) and the risk for false-positive results. False-positive results are important because they may have negative psychological effects, affect future adherence to other preventive health measures, and generate physical harms and economic costs from surveillance visits and confirmatory procedures.

Although the Lung Screening Study has reported the total positivity rate, we have not previously examined cumulative false-positivity rates by using formal statistical methods, and the false-positive component most accurately represents a clinically important burden of screening. We focus on the probability of false-positive test results and resulting diagnostic procedures when chest radiography and CT are used as early detection strategies for lung cancer.

Design

The Lung Screening Study was a 2-year study conducted by 6 centers participating in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. It was a feasibility study for the ongoing National Lung Screening Trial.

Setting and Participants

The Lung Screening Study had a goal of randomly assigning 3000 participants at elevated risk for lung cancer. Enrollment was achieved through mass mailings of recruitment materials, along with public service announcements, posters, and physician recruitment efforts. A total of 3318 persons were randomly assigned from September 2000 to January 2001 to have chest radiography or low-dose CT. All participants signed a consent form approved by the institutional review board before randomization.

Eligible participants were aged 55 to 74 years, had a cigarette smoking history of 30 pack-years or more, and were current smokers or had quit in the past 10 years. Exclusion criteria included chest CT within 24 months of enrollment, previous lung cancer, removal of part or all of a lung, current treatment of any cancer except nonmelanoma skin cancer, and ongoing participation in a cancer prevention or screening trial other than a smoking cessation study.

Randomization and Interventions

Once eligibility was established and consent was obtained by a study center, participants were randomly assigned to a treatment group through a single centralized, secure, Web-based system (which generated random code) operated by the trial coordinating center. This process ensured allocation concealment for study site investigators. Randomization was stratified by age group (in 5-year categories), sex, and study center by using variable block sizes. Once randomization occurred, participants and study investigators were not blinded to the screening method received.

Two screening examinations were possible: baseline (T0) and repeated examination (T1) 1 year later. Participants were eligible for the second screening if they did not receive a diagnosis of lung cancer after the first examination. For inclusion in this analysis of false-positive results, participants had to adhere to at least 1 screening.

Low-dose CT scans were obtained with the following technical parameters: 120 to 140 kV peak, 60 mA, scan time of 1 s, 5-mm collimation, pitch of 2 or equivalent, and contiguous reconstructions. Chest radiography consisted of single posteroanterior views and was obtained by using high-kilovolt equipment at a tube-to-receiver distance of 6 to 10 feet.

Each study center had 1 or more (range, 1 to 14) board-certified radiologists interpreting the examinations. As a quality-control measure, a second radiologist blinded to the initial interpretation reread a small sample of films (n = 20) at each center.

Outcomes and Follow-up

For CT, the definition of a positive screening result changed slightly between the T0 and T1 scans to better match accumulating prognostic evidence: Noncalcified nodules larger than 3 mm at T0 scan or 4 mm or larger at T1 scan were considered suspicious for cancer. Other abnormalities (including spiculated noncalcified nodules of any size; focal parenchymal opacification; endobronchial lesions; hilar, mediastinal, bony, or pleural masses; and major atelectasis) could also be deemed positive according to the radiologist's judgment. For chest radiography, nodules with circular opacity of 3.0 cm or less in diameter, masses greater than 3.0 cm, hilar or mediastinal lymph node enlargement (excepting calcified nodes), major atelectasis, infiltrates or consolidation, and pleural masses were considered suggestive of cancer.
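The size rule for CT nodules can be written as a small predicate. This is only a sketch: the function name, argument names, and the "T0"/"T1" string labels are illustrative assumptions, and it covers only the nodule-size criterion, not the other abnormalities left to the radiologist's judgment.

```python
def ct_nodule_is_positive(diameter_mm: float, screening_round: str) -> bool:
    """Apply the noncalcified-nodule size threshold described in the text.

    T0 (baseline): larger than 3 mm. T1 (repeat): 4 mm or larger.
    Other findings judged positive by the radiologist are not modeled here.
    """
    if screening_round == "T0":
        return diameter_mm > 3.0
    if screening_round == "T1":
        return diameter_mm >= 4.0
    raise ValueError("screening_round must be 'T0' or 'T1'")
```

Under this rule a 3.5-mm noncalcified nodule would count as positive at baseline but not at the repeated examination.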

We defined a false-positive screening result as a positive screening with a completed negative work-up or follow-up of at least 12 months with no diagnosis of lung cancer. Because performing biopsy of all screening-detected lung abnormalities was impractical and undesirable (because of the potential for harm), we had to choose a definition of a false-positive test result that relied on diagnostic work-ups for suspicious examinations. For persons who did not receive definitive testing, given that lung cancer as a general rule is one of the more aggressive tumors, we felt that a 12-month monitoring period was a reasonable cutoff for a false-positive result.
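The false-positive definition can be expressed as a decision rule applied to each positive screening. The sketch below is illustrative only: the record fields, the function name, and the "incomplete follow-up" label are assumptions introduced here, not part of the study's data system.

```python
from dataclasses import dataclass


@dataclass
class PositiveScreen:
    cancer_diagnosed: bool           # lung cancer diagnosed during follow-up
    workup_completed_negative: bool  # diagnostic work-up finished, no cancer found
    followup_months: float           # months of follow-up after the positive screen


def classify_positive_screen(screen: PositiveScreen) -> str:
    """Label a positive screening result using the definition in the text."""
    if screen.cancer_diagnosed:
        return "true positive"
    # Completed negative work-up, or at least 12 months with no cancer diagnosis.
    if screen.workup_completed_negative or screen.followup_months >= 12:
        return "false positive"
    # Hypothetical label for positives that meet neither criterion.
    return "incomplete follow-up"
```

The third category is kept separate here so that an analysis can decide explicitly how to treat positives with less than 12 months of follow-up.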

We defined a false-negative examination result as a negative screening associated with a diagnosis of lung cancer within 1 year of the examination. This definition is limited in its ability to discern between types of cancer truly missed by screening and aggressive interval tumors that may develop between tests. Furthermore, because the Lung Screening Study was a feasibility trial, follow-up of negative examination findings was not done in the same systematic manner as for positive test results. Persons in whom screening was negative at T1 did not continue to have follow-up in the trial; reported false-negative rates are limited to the period between the T0 and T1 examinations.

All positive results were communicated by telephone and mailed to participants and their designated physician within 3 weeks. The Lung Screening Study did not specify a diagnostic algorithm for follow-up of positive results; centers would provide recommendations for diagnostic action if requested. Center personnel abstracted medical records relating to follow-up of positive screening results. This process began after a positive screening result and continued until a conclusive diagnosis was made or 12 months had passed. In addition, study participants completed a study update form at the T1 screening to identify any interval cases of lung cancer.

Classifications for diagnostic follow-up were divided into categories by author consensus: imaging examinations (noninvasive), minimally invasive procedures (bronchoscopy), moderately invasive procedures (for example, biopsy, thoracentesis, video-assisted thoracoscopy), and major surgical procedures (thoracotomy or lung resections).

Statistical Analysis

Our study attempts to answer the question, “What is the probability that a person entering a lung cancer screening program involving 1 or 2 screening tests will have at least 1 false-positive CT or chest radiography?” A person could contribute to the cumulative risk curve only once, after a first false-positive test result; this avoided double-counting of suspicious nodules and artificial inflation of the curve. Kaplan–Meier analysis generated cumulative incidence curves based on estimated probability of a first false-positive result at baseline or second screening received. For the base-case analysis, we considered persons with incomplete follow-up (<12 months) after a positive screening to have received a false-positive result. As a sensitivity analysis, we assumed that a proportion of persons with insufficient follow-up after a positive examination result did have cancer; this proportion was set to the published positive predictive values (from other trials) of CT (7%) and chest radiography (2%) for screening detection of lung cancer.
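With only 2 discrete screening rounds, a Kaplan–Meier estimate of the cumulative incidence of a first false-positive result reduces to a product of per-round conditional probabilities. The following is a minimal sketch of that calculation, not the authors' analysis code; the function name is an assumption and the counts are hypothetical, chosen only for illustration.

```python
def cumulative_first_false_positive(rounds):
    """Kaplan-Meier-style cumulative incidence over discrete screening rounds.

    rounds: list of (n_at_risk, n_first_false_positives), one tuple per round,
    where n_at_risk counts persons screened in that round who have had no
    earlier false-positive result.
    Returns the cumulative probability of at least 1 false positive after each round.
    """
    still_free = 1.0  # probability of no false positive so far
    cumulative = []
    for n_at_risk, n_events in rounds:
        still_free *= 1.0 - n_events / n_at_risk
        cumulative.append(1.0 - still_free)
    return cumulative


# Hypothetical counts, not trial data.
example = cumulative_first_false_positive([(1000, 210), (700, 105)])
```

Because each person leaves the risk set after a first false-positive result, no one is counted twice, which matches the rule described above.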

Logistic regression was done through 2 models to identify potential participant characteristics associated with increased odds of a false-positive examination result after the first screening or the second screening (if the first screening was negative). Variables included age, current versus former smoking status, and smoking history of 60 pack-years or more versus 30 to 59 pack-years. We adjusted logistic regression analyses for screening center. We calculated odds ratios (ORs) separately for CT and chest radiography. We also examined variance in the false-positive rate by study center.

Results

Demographic Characteristics

Participants were more likely to be men (59%) and current smokers (57%). As expected, randomization achieved balance in baseline patient characteristics.

Adherence Rates

A total of 1610 participants underwent at least 1 CT, and 1580 participants underwent at least 1 chest radiography. A total of 1374 participants in the CT group (97%) and 1287 participants in the chest radiography group (95.2%) received both examinations. Adherence was lower in both groups for the second screening than for the baseline test.

False-Positive and False-Negative Results

A total of 31% (n = 506) of participants in the CT group and 14% (n = 216) of participants in the chest radiography group received at least 1 false-positive result. In comparison, screening CT was true-positive in 38 instances (2% of participants) and chest radiography was true-positive in 16 instances (1% of participants).

At baseline screening, the risk for a false-positive result was 21% (95% CI, 19% to 23%) for CT and 9% (CI, 8% to 11%) for chest radiography. These risks increased to 33% (CI, 31% to 35%) and 15% (CI, 13% to 16%), respectively, at second examination. Sensitivity analysis, which assumed that 7% of participants in the CT group and 2% of participants in the chest radiography group with insufficient follow-up after a positive screening had true-positive results, yielded identical rates. Four examinations with false-negative results were reported between the baseline and T1 examinations. All were in the chest radiography group (0.2% of participants in this group).
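The two cumulative estimates also imply a conditional risk at the second examination among persons whose baseline screening was not false-positive. The sketch below back-calculates it from the rounded percentages reported above, assuming the product form of a 2-round Kaplan–Meier estimate; the function name is illustrative and the outputs are approximations, not reported trial results.

```python
def implied_second_round_risk(cum_after_1: float, cum_after_2: float) -> float:
    """Conditional risk of a first false positive at round 2.

    Solves cum_after_2 = 1 - (1 - cum_after_1) * (1 - r) for r.
    """
    return 1.0 - (1.0 - cum_after_2) / (1.0 - cum_after_1)


ct_risk = implied_second_round_risk(0.21, 0.33)   # roughly 0.15
cxr_risk = implied_second_round_risk(0.09, 0.15)  # roughly 0.07
```

On these rounded inputs, the implied risk of a first false-positive result at the second examination is somewhat lower than the baseline risk for both methods.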

Diagnostic Follow-up and Invasive Procedures

Of persons with at least 1 false-positive CT or chest radiography, 61% (n = 308) and 51% (n = 110), respectively, received 1 or more secondary imaging tests. The overall percentage of participants who had at least 1 invasive procedure as a result of a false-positive result was 7% for CT and 4% for chest radiography. Bronchoscopies were the most common invasive procedure resulting from false-positive CT (5% of participants with a false-positive result). The proportion of participants who had a moderately invasive procedure was 4% for false-positive CT and 3% for false-positive chest radiography. Rates of participants who had major surgical procedures for benign disease were similar (2%), although absolute numbers differed by a factor of 2 (8 CT recipients and 4 chest radiography recipients).
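To put the invasive-procedure percentages on a per-screened-person scale, the crude false-positive proportions can be multiplied by the proportion of false-positive recipients who had an invasive procedure. This is a rough illustration built from the rounded percentages in the text (31% and 14% with at least 1 false-positive result; 7% and 4% of those with an invasive procedure); the function name is an assumption, and the outputs are approximations rather than reported results.

```python
def invasive_per_1000(fp_fraction: float, invasive_given_fp: float) -> float:
    """Approximate persons per 1000 screened who have an invasive procedure
    as a result of a false-positive result."""
    return 1000 * fp_fraction * invasive_given_fp


ct_per_1000 = invasive_per_1000(0.31, 0.07)   # roughly 22 per 1000
cxr_per_1000 = invasive_per_1000(0.14, 0.04)  # roughly 6 per 1000
```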

Participant Characteristics and False-Positive Risk

We did multivariable analyses to investigate potential participant characteristics associated with increased odds of a false-positive result after first or second screening received (in which the first screening was negative). Our models included age (65 to 74 years vs. 55 to 64 years), smoking status (current vs. former), smoking history (≥60 pack-years vs. 30 to 59 pack-years), and study center.

First Screening Received

For CT, older versus younger age demonstrated a non–statistically significant trend toward increased odds of a false-positive screening result (OR, 1.28 [CI, 0.98 to 1.67]). This effect was not seen in the chest radiography group (OR, 1.11 [CI, 0.77 to 1.60]). Current versus former smoking status did not show an association with false-positive results (OR, 1.15 [CI, 0.88 to 1.49] for CT and 1.31 [CI, 0.92 to 1.88] for chest radiography). Greater number of pack-years smoked was not associated with increased odds of a false-positive result (OR, 0.92 [CI, 0.71 to 1.19] for CT and 1.02 [CI, 0.72 to 1.45] for chest radiography).

The estimated probability of a false-positive result on the first screening received varied by center. For the combined categories of age (65 to 74 years), current smoker, and smoking history of 60 pack-years or more, the estimate for CT varied from 10% to 42% across centers. For chest radiography, the estimated false-positive rate for the combined categories of age (65 to 74 years), current smoker, and smoking history of 60 pack-years or more varied from 3% to 19% across centers.

Second Screening Received

Older versus younger age was not associated with odds of false-positive CT (OR, 1.20 [CI, 0.83 to 1.74]) but was associated with twice the odds of false-positive chest radiography (OR, 2.03 [CI, 1.23 to 3.36]). Current versus former smoking status was not associated with odds of false-positive CT (OR, 1.07 [CI, 0.75 to 1.52]) or chest radiography (OR, 0.85 [CI, 0.51 to 1.40]). More pack-years of smoking was associated with 1.5 times increased odds of a false-positive result for CT (OR, 1.53 [CI, 1.08 to 2.18]), but this was not observed in the chest radiography group (OR, 1.12 [CI, 0.68 to 1.85]).
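The odds ratios above can be read by checking whether each confidence interval excludes 1. The sketch below does this and, under the assumption that the intervals are symmetric on the log-odds scale (which the text does not state), recovers the point estimate and standard error implied by the interval limits. All function names are illustrative.

```python
import math


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when the interval for an odds ratio does not contain 1."""
    return lower > 1.0 or upper < 1.0


def log_scale_midpoint(lower: float, upper: float) -> float:
    """Point estimate implied by an interval symmetric on the log scale."""
    return math.sqrt(lower * upper)


def implied_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of the log odds ratio implied by a 95% interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


age_cxr_second = ci_excludes_one(1.23, 3.36)    # age, chest radiography, second screening
age_ct_first = ci_excludes_one(0.98, 1.67)      # age, CT, first screening
```

For example, the interval 1.23 to 3.36 has a log-scale midpoint close to the reported odds ratio of 2.03, which is consistent with the symmetry assumption.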

Discussion

To our knowledge, our study is the first to formally evaluate a person's cumulative risk for receiving at least 1 false-positive test result in a program of low-dose CT screening for lung cancer over several years. The probability of a false-positive result is substantial after 2 annual examinations (33%). Of participants with a false-positive CT scan, 7% had an unnecessary invasive procedure and 2% had major surgery for benign disease.

Cumulative false-positive rates associated with screening tests have been infrequently reported; most studies of this nature have focused on mammography. A larger evaluation of false-positive rates in screening programs that used multiple testing methods estimated the cumulative false-positive rate for chest radiography in lung cancer screening at about 14% after 2 screenings and 22% after 4 screenings, which aligns with our findings. Previous studies have generally reported discovery rates of all noncalcified nodules in persons who had CT screening. These rates have varied widely, depending on screening frequency and other factors; the average range is 25% to 50% of participants.

Clinical experience suggests that, in general, the larger the nodule, the greater the suspicion that the lesion is or will become cancerous. Despite this rule of thumb, no uniform consensus on how best to categorize and manage lesions detected on CT exists. Several studies of CT for lung cancer screening used protocols that called for diagnostic follow-up of varying intensity for all detected nodules, although most lesions detected by low-dose CT are smaller than 4 mm. Fleischner Society guidelines for management of incidental nodules detected on nonscreening CT suggest that lesions smaller than 4 mm pose minimal risk, and as such, generally recommend no further follow-up for low-risk patients and a single repeated screening at 12 months for high-risk patients with no intervention if the lesion is unchanged. The definition of a positive test result used in the Lung Screening Study (≥3 mm or 4 mm, depending on year) attempted to rule out lesions that were least likely to be indicative of cancer; the study design sought to minimize the number of false-positive results. The relatively high cumulative risk for a false-positive result after 2 examinations may represent a conservative estimate of rates in community practice, in which all nodules, regardless of size, may be more likely to receive follow-up.

Multislice scanners were emerging at the time of the Lung Screening Study, and some, but perhaps not all, of the study sites had 4-row scanners in use. It is not known what the effect of single versus multirow detectors would be on false-positive rates, nor the effect of additional (for example, 16 or 64) slices.

Previous studies have reported on surgery rates for benign disease; the proportion of persons with a screening-detected nonmalignant lesion who had surgery has ranged from about 0.6% to 2.7% for CT. This is consistent with our findings.

More than half of participants with false-positive chest radiography or CT had at least 1 additional imaging examination (some at higher radiation doses than that of the original test), which exposed these persons to a theoretical risk for radiation-induced carcinogenesis. This is potentially concerning in the target population because current evidence indicates that radiation and smoking damage interact synergistically, and the interaction is near multiplicative. Consensus has not been reached on a single diagnostic algorithm for positive screening examinations; however, repeated scans (of varying dose) at 1, 3, 6, 12, and/or 24 months have been advocated. Although no long-term longitudinal data on the effects of cumulative radiation exposure are available, cancer risks have been estimated by using relevant data from long-term cohort studies of atomic bomb survivors. One estimate found that among female smokers aged 60 years, a series of 3 low-dose chest CTs would generate an excess lung cancer risk for about 1.5 women per 1000 exposed. The National Lung Screening Trial is collecting information on cumulative radiation exposure among participants and should help clarify this important issue.

Negative psychological consequences of false-positive screening are also of concern. Although data for lung cancer screening are limited, 1 study examining the effect of CT on quality of life found that about half of participants had “discomfort and dread” while waiting for confirmation of screening results. Because positive lung cancer screening results may encompass diagnostic uncertainty for up to 24 months, further investigation of the long-term psychological effects of false-positive test results would be important.

A high rate of false-positive test results may place economic burdens on persons and the health care system. One study that investigated medical care expenditures triggered by false-positive test results found that the adjusted mean difference in spending after such a test was an additional $1024 for a woman and $1171 for a man in 1 year. Because 31% of participants in the CT group received at least 1 false-positive result in 2 years, the effect these rates could have on the cost-effectiveness of CT (if proven effective) is apparent.
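A back-of-the-envelope calculation shows the scale involved. The sketch below assumes, purely for illustration, that the per-person spending differences from the cited study apply unchanged to false-positive CT results in this setting; the text does not establish this, and the function name is hypothetical.

```python
def added_spending_per_1000(fp_fraction: float, added_cost_per_person: float) -> float:
    """Approximate added 1-year spending (in dollars) per 1000 persons screened,
    assuming every person with a false-positive result incurs the added cost."""
    return 1000 * fp_fraction * added_cost_per_person


low_estimate = added_spending_per_1000(0.31, 1024)   # roughly $317,000
high_estimate = added_spending_per_1000(0.31, 1171)  # roughly $363,000
```

Under these assumptions, false-positive results alone would add on the order of $300,000 to $400,000 in spending per 1000 persons screened with CT over 2 rounds.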

Our study has several limitations. As with all cancer screening trials, a “healthy volunteer effect” is likely to be present. This effect has been documented in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, which includes study centers that participated in the Lung Screening Study.

The Lung Screening Study was a pilot study and, as such, was limited to 2 rounds of screening and 1 year of follow-up after the last examination. Although it is possible that this study might overestimate the cumulative risk for a false-positive result, it is unlikely that a large proportion of false-positive results would convert to diagnoses of cancer. A small proportion (0.04% for each method) of Lung Screening Study participants was lost to follow-up after positive examination findings; this study's baseline assumption was that those findings were false positive. Because this could potentially overestimate the cumulative false-positive risk, a sensitivity analysis was done; however, the cumulative risk estimates remained unchanged.

Because the Lung Screening Study was a feasibility trial, negative screening results were not systematically followed in the same manner as positive screening results. Recording of false-negative results was limited to the period between the T0 and T1 examinations, and the estimated false-negative rate is crude and probably an underestimate.

The analysis of associations between patient characteristics and false-positive risk should be considered exploratory. The Lung Screening Study had a relatively modest sample size; subgroup populations were small; and, as in any multivariable analysis, multiplicity potentially inflated the probability of a type I error. The variation in estimated false-positive rates by center is probably due to the effects of interobserver variability in the assessment of a positive examination among centers. Studies of interobserver agreement on interpretation of chest CT scans have demonstrated notable variations among readers. Semiautomated volumetric determinations of nodule size or computer-aided detection programs have been proposed as methods that could theoretically reduce observer variability, although the ultimate degree of effect this might have on false-positive rates is unknown.

Given the relatively high probability of a false-positive low-dose CT lung cancer screening examination, it is important that providers have careful discussions with patients who request this technology to help them weigh known harms against currently theoretical benefits. Further investigation into the physical, psychological, and economic ramifications of false-positive low-dose CT screening test results is warranted.