Online Remote Verbal IQ Testing for Large-Scale Autism Studies and the Issue of Cheating
Objectives: To develop and test an algorithm for detecting item responses suggestive of cheating during an online test.
Methods: We collected CARAT-V data from N=3383 anonymous participants, excluding N=151 whose speeding compromised effort (responses faster than the item median on >80% of items and accuracy <30%). The current analyses focused on developing a detection algorithm using self-reported cheating as the gold standard. In the final analyses to be reported at IMFAR, we will refine the algorithm using a more reliable gold standard: participants recruited for a randomized trial who receive instructions that cheating either is or is not permitted. For a subset of participants aged 8-10 years, we recorded whether the participant left the “full-screen test mode” during a question (allowing internet searching); this feature will be extended to adult samples in upcoming trials.
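The speeding exclusion above could be implemented roughly as follows. This is a minimal sketch, not the authors' code: the function name, the use of per-item sample medians, and the NumPy representation are illustrative assumptions; only the >80% fast-response and <30% accuracy thresholds come from the abstract.

```python
import numpy as np

def flag_speeded(rt, correct, rt_medians, fast_frac=0.80, acc_cutoff=0.30):
    """Return True if a participant's effort appears compromised:
    faster than the sample median on more than `fast_frac` of items
    AND overall accuracy below `acc_cutoff`.

    rt         : this participant's response time per item
    correct    : boolean item-level accuracy for this participant
    rt_medians : sample-wide median response time per item
    """
    rt = np.asarray(rt, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    faster_than_median = rt < np.asarray(rt_medians, dtype=float)
    return faster_than_median.mean() > fast_frac and correct.mean() < acc_cutoff
```

In practice the item medians would be computed once over the full sample before screening individual participants.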
Results: The incidence of self-reported cheating differed significantly among adults, adolescents, and children (χ² = 49.74, p < .001). Self-reported cheating among adults was so infrequent (Table 1) that further investigation is needed. For adolescents, we created an algorithm that i) divided items into quintiles by response time and ii) flagged participants whose accuracy on items in the slowest quintile was more than one standard error greater than their accuracy on all other quintiles combined; this rule correctly captured 45% of adolescents who self-reported cheating. For children, leaving full-screen mode during an item was a compelling indicator of potential cheating: item accuracy was 7.3% higher when the participant left the testing screen than when they stayed on it (t = 3.85, p < .001). Self-reported cheating was not associated with total score for any age group, perhaps because baseline ability and the frequency of cheating vary across participants.
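The adolescent detection rule can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name is hypothetical, quintiles are formed by sorting the participant's own response times, and the standard error is approximated as the binomial SE of the slow-quintile accuracy (the abstract does not specify which SE was used).

```python
import numpy as np

def flag_slow_item_cheating(rt, correct):
    """Split a participant's items into response-time quintiles and flag the
    participant if accuracy on the slowest quintile exceeds accuracy on all
    other quintiles combined by more than one standard error.

    rt      : this participant's response time per item
    correct : boolean item-level accuracy for this participant
    """
    rt = np.asarray(rt, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(rt)
    k = len(rt) // 5                 # items per quintile (assumes n divisible by 5)
    slow, rest = order[-k:], order[:-k]
    acc_slow = correct[slow].mean()
    acc_rest = correct[rest].mean()
    # Binomial standard error of the slow-quintile accuracy (an assumption)
    se = np.sqrt(acc_slow * (1 - acc_slow) / k)
    return acc_slow - acc_rest > se
```

A flagged participant is one whose accuracy jumps specifically on the items they lingered over, a pattern consistent with pausing to look up answers.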
Conclusions: Our analyses suggest that tracking when participants exit full-screen mode, together with our response-time algorithm, offers promising tools for helping researchers determine if and when an online result is suspect for child and adolescent participants. A limitation of our approach, however, is that the current gold standard yields an all-or-none decision for each test taker, whereas we would ideally flag only the suspect items within each test. We will address these problems and further refine our algorithms in upcoming trials. The ability to screen for cheating is a prerequisite to confidently deploying a test remotely to study samples, thus allowing wider adoption and larger-scale data collection, which is especially attractive for genetic studies.