Studying the repeatability and reproducibility of decisions made during forensic examinations is important for understanding variation in those decisions and for establishing confidence in procedures. For disciplines that rely on comparisons made by trained examiners, such as latent prints, handwriting, and cartridge cases, it has been recommended that ‘black-box’ studies be used to estimate the reliability and validity of decisions. In a typical black-box study, examiners are asked to judge samples of evidence as they would in practice, and their decisions are recorded; the ground truth about the samples is known to the study designers. The design of such studies includes assessments of the same forensic samples by different examiners, and it is also common for a subset of examiners to provide repeated assessments of the same evidence samples. We demonstrate a statistical approach to analysing the data collected across these repeated trials that offers the following advantages: (i) it allows joint inference about repeatability and reproducibility using both the intra-examiner and inter-examiner data, and (ii) it accounts for examiner–sample interactions that may affect the decision-making process. We first demonstrate the approach for outcomes that can be treated as continuous, such as decisions made on an ordinal scale with many categories. We then apply it to binary decisions and present results for data from two black-box studies.
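To make the continuous-outcome case concrete, the following is a minimal sketch of one standard way to separate intra-examiner from inter-examiner variation: a crossed random-effects decomposition estimated by the method of moments. It is not necessarily the model used in the paper. The function names, the simulated data, and the mapping of "repeatability" to the repeat-to-repeat variance and "reproducibility" to the examiner plus examiner–sample interaction variance are all assumptions made for illustration. It also assumes a fully balanced design in which every examiner scores every sample the same number of times, whereas in the studies described above only a subset of examiners provides repeats.

```python
import numpy as np


def simulate_scores(n_examiners, n_samples, n_repeats,
                    sd_examiner, sd_sample, sd_interaction, sd_repeat, seed=0):
    """Simulate hypothetical scores y[i, j, k] for examiner i, sample j, repeat k."""
    rng = np.random.default_rng(seed)
    examiner = rng.normal(0.0, sd_examiner, size=(n_examiners, 1, 1))
    sample = rng.normal(0.0, sd_sample, size=(1, n_samples, 1))
    interaction = rng.normal(0.0, sd_interaction, size=(n_examiners, n_samples, 1))
    noise = rng.normal(0.0, sd_repeat, size=(n_examiners, n_samples, n_repeats))
    # Broadcasting adds each effect to every cell it applies to.
    return examiner + sample + interaction + noise


def variance_components(y):
    """Method-of-moments variance components for a balanced crossed design.

    y has shape (examiners, samples, repeats) with at least two repeats.
    """
    n_ex, n_sa, n_rep = y.shape
    grand = y.mean()
    ex_mean = y.mean(axis=(1, 2))   # one mean per examiner
    sa_mean = y.mean(axis=(0, 2))   # one mean per sample
    cell = y.mean(axis=2)           # one mean per examiner-sample pair

    # Mean squares of the two-way ANOVA with replication.
    ms_ex = n_sa * n_rep * np.sum((ex_mean - grand) ** 2) / (n_ex - 1)
    ms_sa = n_ex * n_rep * np.sum((sa_mean - grand) ** 2) / (n_sa - 1)
    inter = cell - ex_mean[:, None] - sa_mean[None, :] + grand
    ms_int = n_rep * np.sum(inter ** 2) / ((n_ex - 1) * (n_sa - 1))
    ms_rep = np.sum((y - cell[:, :, None]) ** 2) / (n_ex * n_sa * (n_rep - 1))

    # Equate mean squares to their expectations; truncate negatives at zero.
    var_rep = ms_rep
    var_int = max(0.0, (ms_int - ms_rep) / n_rep)
    var_ex = max(0.0, (ms_ex - ms_int) / (n_sa * n_rep))
    var_sa = max(0.0, (ms_sa - ms_int) / (n_ex * n_rep))

    return {
        "examiner": float(var_ex),
        "sample": float(var_sa),
        "interaction": float(var_int),
        "repeat": float(var_rep),
        # Assumed definitions, following common gauge R&R usage.
        "repeatability": float(var_rep),
        "reproducibility": float(var_ex + var_int),
    }


if __name__ == "__main__":
    scores = simulate_scores(40, 40, 2, 1.0, 1.0, 0.5 ** 0.5, 0.5, seed=1)
    for name, value in variance_components(scores).items():
        print(f"{name}: {value:.3f}")
```

Because all four components are estimated from the same data, the intra-examiner repeats and the inter-examiner comparisons both contribute to the result, and the interaction term keeps examiner–sample effects from being absorbed into the repeat-to-repeat variance. Unbalanced designs and binary decisions would call for a fitted mixed model rather than these closed-form estimates.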
Publisher Copyright: © The Authors (2023). Published by Oxford University Press.