Design for What I Do, Not for What I Say: The Reality of Participant-Reported Root Causes
Event Type
Oral Presentations
Time: Tuesday, April 13, 4:10pm - 4:30pm EDT
Location: Medical and Drug Delivery Devices
“How did you decide to do that?” Interviewing participants about their perspectives on the root cause of a use event they have committed is a fundamental component of Human Factors (HF) validation testing and HF FDA submissions. As HF practitioners, we delicately extract these participant-reported root causes through carefully worded questions in an effort to reduce bias and increase accuracy. However, as over a century of scientific research on human cognition has demonstrated, all self-report data must be interpreted with great care because humans are easily influenced and have fragile memory systems. We theorize that participants' self-reported perspectives on why a use error occurred may often be confabulations or “best guesses” rather than true justifications for their actions. If so, then reliance on these self-reports may ultimately detract from medical device safety by providing false data and distracting from true root causes.

In this presentation we use novel experimental data, real-world use data from government databases, and literature from the social sciences to support our thesis and address its ramifications. We begin by contextualizing our view in the existing literature on the inaccuracy of self-reported, subjective data, then demonstrate through an analysis of the MAUDE database that current HF regulations (including the 2016 guidance requesting self-reported root causes for HF FDA submissions) do not appear to have substantially reduced fatal medical device use errors, even for devices well known for high use-error potential, such as infusion and insulin pumps. Next, we present the results of a novel experiment designed to tease apart whether participant-reported root causes are accurate reports of participants' memories of a specific use event they encountered, or instead merely reasonable justifications based on logic and schemas. Finally, we propose two methodological alternatives to current HF validation testing - one that we feel is a “least burdensome approach,” and one that is more rigorous but offers the benefit of accurately incorporating probability into risk management (as is standard for non-use-related risks) to help reduce user and patient harm.

Novel Experimental Data:
To investigate the accuracy of participant-reported root causes, we conducted a mock HF validation study with a medical device in two parts: Group A was tested according to current best practices in HF validation testing (i.e., with traditional root cause probing), while Group B was simply told about the use errors experienced by Group A participants and asked why those use errors might have occurred. If participant-reported root causes represent real memories of or justifications for actions, Group A's data should be highly variable from person to person, specific to the context at hand, and not easily intuited by others (after all, the reason we conduct root cause probing is purportedly to gain information that we do not already have). On the other hand, if participant-reported root causes are simply educated guesses about why one might have experienced the use error, as we suggest, then people who have not experienced the use error (like the participants in Group B) should give responses similar to those of Group A.

In an initial pilot study (N = 4) of a health app, Group B provided 78% of the justifications that Group A reported during root cause probing, without having experienced the use events themselves. Data collection is underway for a larger-sample study of a blood pressure monitor. Preliminary data from N = 18 participants are consistent with the pilot study and with our hypothesis: nine Group B participants reported approximately 90% of the knowledge and performance task root causes provided by nine Group A participants (as well as some additional potential root causes). In short, the high degree of overlap suggests that the responses given by participants in both groups were driven by the schemas and logic available to all participants. Moreover, this overlap indicates a high degree of functional equivalence for the two groups, suggesting that little-to-no new information regarding a root cause is revealed by asking the person who actually experienced the use event.
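The overlap measure described above is straightforward to compute. A minimal sketch follows; the root-cause labels here are invented purely for illustration and are not data from our study:

```python
# Hypothetical root-cause labels for illustration only (not study data).
group_a = {"button too small", "ambiguous icon", "skipped instructions", "glare on screen"}
group_b = {"button too small", "ambiguous icon", "skipped instructions", "unclear beep"}

# Proportion of Group A's reported root causes that Group B also produced
# without ever experiencing the use errors themselves.
overlap = len(group_a & group_b) / len(group_a)
print(f"Overlap: {overlap:.0%}")  # 3 of 4 shared -> prints "Overlap: 75%"
```

A high overlap on this measure is what would be expected if both groups are drawing on shared schemas and logic rather than event-specific memory.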

A Path Forward:
What are the implications of the issues we raise with root cause probing? Should we scrap it altogether - or is there still valuable information to be learned from this subjective feedback?

Option A: The Least Burdensome Approach
While we argue that participant-reported root causes do not always represent “true” memories of or justifications for an event, we hold that educated guesses based on heuristics, common sense, and design principles are absolutely useful when identifying harmful design deficiencies - but there is no reason to believe that this same information could not be obtained from a detailed heuristic review. Moreover, we still recognize participant feedback in formative testing as a key part of the design process - just not as the central data for a final HF validation test.

Option B: The Most Rigorous Approach
In an ideal scenario, the most stringent solution is a quantitative approach: observing performance on critical tasks from a statistically valid sample of potential users, calculating error rates on each task, and comparing them to predetermined acceptability levels. The larger sample size would allow both severity and probability to be used to determine these acceptable error rates, similar to the risk management process for non-use-related risks (e.g., biocompatibility, electrical hazards, moving parts; see ISO 14971). This method eliminates the issues with participant-reported root causes by shifting the focus to objective, performance-based data, and it allows manufacturers to statistically predict the safety and efficacy of a device before it is released.
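To make the quantitative approach concrete, the sketch below compares an upper confidence bound on each task's observed error rate against a predetermined acceptability level. This is an illustration of the general idea, not part of our proposal itself; the task names, counts, threshold, and the choice of a Wilson score bound are all assumptions for the example:

```python
import math

def wilson_upper_bound(errors: int, n: int, z: float = 1.96) -> float:
    """Wilson score upper bound on a task's true error rate,
    given `errors` observed use errors among `n` participants."""
    p_hat = errors / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center + half

# Hypothetical critical-task results: (observed use errors, participants tested)
tasks = {"prime the pump": (2, 100), "set the dose": (0, 100)}

# Illustrative threshold; in practice it would be derived from harm severity
# during risk analysis, per ISO 14971.
ACCEPTABLE_ERROR_RATE = 0.10

for task, (errors, n) in tasks.items():
    upper = wilson_upper_bound(errors, n)
    verdict = "acceptable" if upper <= ACCEPTABLE_ERROR_RATE else "needs mitigation"
    print(f"{task}: observed {errors}/{n}, upper bound {upper:.3f} -> {verdict}")
```

The key design point is that the comparison is made against the confidence bound rather than the raw observed rate, so rare-but-severe errors are not dismissed simply because a small sample happened not to exhibit them.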

We discuss both proposed methods, including the potential pitfalls and practical implications of each. These proposals are not presented to denounce current HF regulations, but rather to start a dialogue within the HF community around how we can continuously improve our methods and promote safe and effective devices.