Automation bias in medical decision making: A study of unreliable computer advice in breast cancer screening

Eugenio Alberdi, Andrey Povyakalo, Lorenzo Strigini
Centre for Software Reliability, City University, London, UK

Peter Ayton
Psychology Department, City University, London, UK

Abstract:

PURPOSE: To investigate the effects of unreliable computer advice on the decisions of clinicians. The computerised tool under investigation is Computer Aided Detection (CAD) for mammography. CAD marks (prompts) features on mammograms that are potential indicators of cancer, in order to reduce the likelihood of oversights by mammogram readers. The study focused on a particular type of unreliable computer output: CAD's failure to detect cancers, either by failing to prompt them or by prompting them incorrectly.

METHOD: 39 experienced mammogram readers (radiologists, radiographers and breast clinicians) were asked to examine the mammograms of 60 patients and to decide whether each patient should be recalled for further investigation. In the experimental condition, 20 of the participants examined the mammograms with the aid of CAD; in the control condition, the remaining 19 participants saw the same cases but without the aid of CAD. The mammograms of 30 of the patients contained signs of cancer and the other 30 were “normal cases” (no signs of cancer). The test set was designed to contain a large proportion (50%) of cancers for which CAD had generated incorrect output.
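The design figures above can be tallied in a short Python sketch. It is illustrative only: the variable names are hypothetical, the 50% figure is assumed to refer to the share of the 30 cancer cases, and each reader is assumed to have examined all 60 cases.

```python
# Study design figures as stated in the METHOD paragraph.
n_readers = 39
n_cad_readers = 20                      # experimental condition (with CAD)
n_control_readers = n_readers - n_cad_readers

n_cases = 60
n_cancers = 30
n_normals = n_cases - n_cancers

# Assumption: "50%" is the share of cancer cases with incorrect CAD output.
incorrect_cad_share = 0.5
n_cancers_incorrect_cad = round(n_cancers * incorrect_cad_share)

# Assumption: every reader examined every case, so the number of recall
# decisions per condition is readers times cases.
decisions_cad = n_cad_readers * n_cases
decisions_control = n_control_readers * n_cases

print(n_control_readers, n_normals, n_cancers_incorrect_cad)
print(decisions_cad, decisions_control)
```

Under these assumptions, about half of the cancer cases in the test set are ones where a reader relying on CAD would receive no correct prompt.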

RESULTS: The average proportion of cancers (52%) recalled by the participants in the experimental condition (using CAD) was significantly lower than the average proportion of cancers (68%) recalled by the control group (ANOVA p<0.001). The difference was most marked for cancers for which CAD failed to provide correct prompting (46% with CAD vs. 88% without; ANOVA p<0.000001).
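To make the size of the two reported differences explicit, the following Python sketch computes the gap in percentage points between the conditions. It uses only the rounded percentages quoted above, and it assumes that in each reported pair the lower figure belongs to the CAD group; the dictionary keys are hypothetical labels.

```python
# Average proportion of cancers recalled (%), as reported in RESULTS.
# Assumption: in each pair, the lower figure is the CAD (experimental) group.
recall = {
    "all cancers": {"cad": 52, "control": 68},
    "cancers with incorrect CAD output": {"cad": 46, "control": 88},
}

# Gap in percentage points: control recall minus CAD recall.
gaps = {label: r["control"] - r["cad"] for label, r in recall.items()}

for label, gap in gaps.items():
    print(f"{label}: {gap} percentage points lower with CAD")
```

These are simple differences of group averages; they do not reproduce the ANOVA reported in the abstract.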

CONCLUSIONS: Our results strongly suggest that the readers using CAD were biased by the incorrect computer output. The effects are similar to those reported in the human factors literature as “automation bias”, namely, people’s failure to take appropriate action because an automated tool fails to flag an event. While these effects are typically reported in studies conducted with students working in laboratory settings, our participants were experts working in a relatively realistic setting relevant to their area of expertise. In our study, the participants using CAD appeared to use the absence of computer prompts on ambiguous cases as support for “non-cancer” decisions. Arguably, this is a rational strategy for decisions about normal cases, as unprompted cases are nearly always normal. However, it can damage decisions for unprompted, difficult-to-detect cancers, with potentially serious consequences for patients.

Proc. 27th Annual Meeting of the Society for Medical Decision Making (SMDM05), San Francisco, October, 2005.