Introduction
When designing a product, especially one as critical as a medical device or the user interface of an essential software application, ensuring its usability is paramount. Human factors (HF) studies play a crucial role in this process by identifying issues that real users might face. A fundamental question often arises, however: how many participants do we need in these studies to obtain reliable and actionable insights? Determining the appropriate sample size means balancing statistical rigor, resource constraints, and the specifics of the task at hand. In this blog post, we will delve into the science and art of sample size determination in human factors studies. We will look at the key scientific publications behind the standard approach, best practice for selecting a sample size in marketing claims studies, and practical guidelines to help you make informed decisions for your usability testing.
Scientific publications that shaped the understanding of required sample size in Human Factors studies (formative, summative, comparative (ANDA))
From early device design through Formative/Verification and Summative/Validation Human Factors studies to comparative ANDA studies, every Human Factors professional determining a sample size is accustomed to these sources:
- “A Mathematical Model of the Finding of Usability Problems” by Jakob Nielsen and Thomas K. Landauer (1993).
- “Why You Only Need to Test with 5 Users” by Jakob Nielsen (2000).
- “Refining the Test Phase of Usability Evaluation: How Many Subjects Is Enough?” by Robert A. Virzi (1992).
- “Sample Sizes for Usability Studies: Additional Considerations” by James R. Lewis (1994).
- “Applying Human Factors and Usability Engineering to Medical Devices” by FDA (2016).
- “Human Factors Engineering – Design of Medical Devices”, ANSI/AAMI HE75:2009.
- The “Tango” method used for comparative Abbreviated New Drug Application (ANDA) studies [Tango, Statist. Med. 17, pp. 891-908 (1998)].
These are the mothers and fathers of everything there is to reference when determining sample size for Human Factors studies. Neither the FDA in the USA, the MHRA in the UK, nor any notified body in the EU is likely to dispute the statistical soundness of this literature.
Of course, there are exceptions, and I have encountered many of them in my practice. In those cases the standard numbers (5 to 8 participants for Formative HF studies, 15 to 20 for Summative HF studies, 80 to 100 for comparative (ANDA) studies) needed to be adjusted, based on the following principles:
- Complexity of the Task: More complex tasks may require a larger sample size to uncover all usability issues.
- Variability of Users: If the user population is highly diverse, a larger sample size may be necessary to capture this variability.
- Risk Level: For high-risk devices or systems, larger sample sizes are often recommended to ensure safety and effectiveness.
- Statistical Power: Consider the desired statistical power and confidence level for detecting differences or usability issues.
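To see why task complexity and problem visibility move these numbers, it helps to look at the cumulative problem-discovery model popularized by Nielsen and Landauer: the expected share of problems seen at least once by n participants is 1 - (1 - p)^n, where p is the chance that a single participant runs into a given problem. The sketch below is only an illustration of that formula. The function names are hypothetical, p = 0.31 is the average Nielsen reports, and p = 0.10 is an assumed value standing in for a harder-to-spot problem.

```python
import math


def discovery_rate(p: float, n: int) -> float:
    """Expected share of problems seen at least once by n participants,
    assuming each participant hits a given problem with probability p."""
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float) -> int:
    """Smallest n for which discovery_rate(p, n) reaches the target share."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# p = 0.31: average reported by Nielsen; p = 0.10: assumed, harder-to-spot problem
for p in (0.31, 0.10):
    print(
        f"p={p:.2f}: 5 participants find {discovery_rate(p, 5):.0%}, "
        f"85% discovery needs {participants_needed(p, 0.85)} participants"
    )
```

With p = 0.31, five participants already uncover most problems, which is where the familiar formative numbers come from. When p drops, as it tends to with complex tasks or diverse user groups, the same target needs a much larger sample.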
A good example is a Summative Human Factors study that I had the opportunity to be involved in. We worked on a needle shield protection device for subcutaneous and intramuscular injections. Such a device is used all over the world by doctors, nurses and pharmacists, and in such numbers that deciding on the right number of representative users is extremely hard. In this case, the standard approach might not have convinced the regulatory personnel, as a sample of 15 participants per user group could be seen as not fully representative of real-life use. A simple fix was to increase the sample size to 25 participants per user group, which satisfied both the FDA and the study team. However, when my team and I conducted a contextual inquiry study to determine how much each user group was involved, we discovered that doctors would use the device the least. In the end, following our research, we decreased the sample size of the doctors' user group to 10, which resulted in a substantial cost reduction for the sponsor.
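One way to get a feel for what moving between 10, 15 and 25 participants per group buys is to ask: if nobody in the group makes a given use error, how high could the true error rate still plausibly be? The sketch below uses the exact one-sided binomial bound for zero observed failures, 1 - alpha^(1/n). This is a generic statistical illustration, not the reasoning used in the study described above, and the function name is hypothetical.

```python
def upper_bound_zero_failures(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on the true failure rate
    when 0 of n participants fail (solves (1 - p)^n = 1 - confidence)."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


# Group sizes mentioned in the example above
for n in (10, 15, 25):
    bound = upper_bound_zero_failures(n)
    print(f"n={n}: true failure rate could still be up to {bound:.1%}")
```

At 95% confidence the bound falls from roughly 26% with 10 participants to about 18% with 15 and 11% with 25, which gives a sense of how much residual uncertainty each group size leaves.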
Selecting sample size in marketing claims studies
It is quite popular amongst medical device manufacturers to piggyback a marketing claims study on a Formative or Summative Human Factors study. It is resource friendly and can bring valuable insights when trying to establish a product on the market.
This is exactly what the study sponsor wanted when we were asked to perform the Summative Human Factors study for the needle shield protection device. To tackle this, we created a questionnaire that collected subjective comparative data on a Likert scale. To ensure statistical significance with subjective data, we calculated a sufficient sample size using the methodology defined in:
“A Note on Determination of Sample Size for a Likert Scale”, Jinwoo Park and Misook Jung, Communications of the Korean Statistical Society, 2009, Vol. 16, No. 4, 669–673.
Unfortunately, without the help of a statistician, it was quite complicated to extract the right information from the scientific paper. However, with some guidance we cracked the method and implemented it effectively! Last but not least, the calculated sample size did not exceed the one already chosen for the Summative Human Factors study, so the marketing claims data could be collected without any extra participants or resources.
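The Park and Jung procedure itself is not reproduced here. As a rough illustration of the general shape of such a calculation, the sketch below uses the generic textbook formula for estimating a mean to within a chosen margin of error, n = (z * sd / margin)^2. It assumes the Likert responses are treated as approximately interval data, and the standard deviation and margin shown are hypothetical placeholders, not values from our study.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sd: float, margin: float, confidence: float = 0.95) -> int:
    """Participants needed to estimate a mean score to within +/- margin,
    given an assumed standard deviation (normal approximation)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # two-sided critical value
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical inputs: SD of 1.0 point on a 5-point scale, +/- 0.5 point precision
print(sample_size_for_mean(sd=1.0, margin=0.5))
```

The required sample grows with the square of the assumed spread and shrinks with the square of the acceptable margin, so a pilot estimate of the standard deviation, or a statistician's input, matters a great deal.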
Conclusions
Calculating the sample size for your Human Factors study correctly is vital and can sometimes be very tricky. The standard approach to selecting a sample size works very well, but there are times when it might not be enough. Fortunately, there is plenty of published guidance that can be adapted to each project. Accurate sample size calculation is a foundational element in the design of HF studies, affecting the validity, reliability, and applicability of the findings. Researchers must carefully consider the factors influencing sample size and use robust methods to estimate it, so that the study is sufficiently powered to address its research objectives. Balancing ethical considerations, resource constraints, and scientific rigor is key to conducting effective and meaningful HF research. Medical device manufacturers and relevant stakeholders must be very mindful when selecting the sample size for their Human Factors studies, as wrong choices could lead to delays in submissions, unnecessary extra costs, or the need to repeat the human factors study.