Online Remote Verbal IQ Testing for Large-Scale Autism Studies and the Issue of Cheating

Friday, May 13, 2016: 5:30 PM-7:00 PM
Hall A (Baltimore Convention Center)
A. Zoltowski (1), C. C. Clements (2,3), L. Bateman (1), N. Stein (4) and R. T. Schultz (5,6); (1) The Center for Autism Research, Children’s Hospital of Philadelphia, Philadelphia, PA, (2) Center for Autism Research, Children’s Hospital of Philadelphia, Philadelphia, PA, (3) Psychology, University of Pennsylvania, Philadelphia, PA, (4) Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA, (5) The Center for Autism Research, The Children’s Hospital of Philadelphia, Philadelphia, PA, (6) Departments of Pediatrics and Psychiatry, University of Pennsylvania, Philadelphia, PA
Background: Online phenotyping for autism research has many attractive properties, especially when the desired sample size is large and in-person testing is burdensome. For skill testing, computer adaptive testing (CAT) based on item response theory can be an accurate and efficient alternative. We have developed an online Verbal IQ test (CARAT-V; Clements et al., 2015) that can be taken remotely with the precision of a standard IQ test. However, remote testing without supervision by study personnel leaves open the possibility that test-takers will ignore instructions to complete the test without seeking help from others or the internet.

Objectives: To develop and test an algorithm for detecting item responses suggestive of cheating during an online test.

Methods: We collected CARAT-V data from N=3383 anonymous participants, excluding N=151 who sped through the test at such a rate that effort was compromised (responses faster than the median on >80% of items; accuracy <30%). The current analyses focused on developing a detection algorithm using self-reported cheating as the gold standard. In the final analyses to be reported at IMFAR, we will refine the algorithm using a more reliable gold standard: participants recruited for a randomized trial who are instructed either that cheating is permitted or that it is not. For a subset of participants aged 8-10 years, the data recorded whether the participant left the “full-screen test mode” during a question (which would allow internet searching); this feature will be extended to adult samples in upcoming trials.
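The speeding exclusion rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's code: the function name `is_speeder` and its parameters are hypothetical, "median" is assumed to mean the per-item median response time across the sample, and the two criteria are assumed to apply jointly.

```python
def is_speeder(response_times, correct, item_medians,
               fast_frac=0.80, min_accuracy=0.30):
    """Return True if a participant meets the speeding exclusion rule.

    response_times: this participant's response time on each item.
    correct:        booleans, True where the item was answered correctly.
    item_medians:   ASSUMPTION: sample-wide median response time per item.
    Both criteria are ASSUMED to be required together.
    """
    n = len(response_times)
    if n == 0:
        return False
    # Fraction of items answered faster than that item's median time.
    n_fast = sum(rt < med for rt, med in zip(response_times, item_medians))
    accuracy = sum(correct) / n
    return (n_fast / n > fast_frac) and (accuracy < min_accuracy)
```

For example, a participant who answers every item faster than the median and gets 1 of 10 items right would be excluded under these assumptions, while an equally fast participant with perfect accuracy would not.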

Results: Incidence of self-reported cheating differed significantly among adults, adolescents, and children (χ² = 49.74, p < .001). Self-reported cheating among adults was so infrequent (Table 1) that further investigation is needed. For adolescents, we created an algorithm that i) divided items into quintiles by response time and ii) flagged participants whose accuracy on items in the slowest quintile was more than 1 standard error greater than their accuracy on all other quintiles combined; it correctly captured 45% of adolescents who self-reported cheating. For children, leaving full-screen mode during an item was a compelling indication of potential cheating: item accuracy was 7.3% higher when the participant did so than when the participant stayed on the testing screen (t = 3.85, p < .001). Self-reported cheating was not associated with total score in any age group, perhaps because baseline abilities and the frequency of this behavior vary.
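The adolescent response-time rule can be illustrated with a minimal Python sketch. The abstract does not say which standard error was used, so this sketch assumes the unpooled standard error of the difference between two proportions; the function name `flag_slow_item_advantage` and the handling of item counts not divisible by five are likewise hypothetical choices, not the authors' implementation.

```python
import math


def flag_slow_item_advantage(response_times, correct, n_se=1.0):
    """Flag a participant whose slowest items are unusually accurate.

    Items are ranked by response time; the slowest fifth forms the
    "slowest quintile" and is compared with all other items combined.
    ASSUMPTION: the standard error is that of the difference between
    two proportions (unpooled); the source does not specify this.
    """
    n = len(response_times)
    if n != len(correct) or n < 5:
        raise ValueError("need at least 5 items with matching lengths")
    # Item indices ordered from fastest to slowest response.
    order = sorted(range(n), key=lambda i: response_times[i])
    k = n // 5  # size of the slowest quintile (hypothetical rounding rule)
    slow, rest = order[n - k:], order[:n - k]
    p_slow = sum(correct[i] for i in slow) / len(slow)
    p_rest = sum(correct[i] for i in rest) / len(rest)
    se = math.sqrt(p_slow * (1 - p_slow) / len(slow)
                   + p_rest * (1 - p_rest) / len(rest))
    # Flag when the slow-item advantage exceeds n_se standard errors.
    return (p_slow - p_rest) > n_se * se
```

Under these assumptions, a participant who answers only the two slowest of ten items correctly is flagged, whereas one with uniform accuracy across response times is not.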

Conclusions: Our analyses suggest that tracking when participants exit full-screen mode and our response time algorithm are promising tools to help researchers determine if and when an online result is suspect for child and adolescent participants. A limitation of our approach, however, is that our current gold standard leads to an all-or-none detection strategy, whereas we would ideally flag only select data from each test-taker as suspect. We will address this limitation and further refine our algorithms in upcoming trials. The ability to screen for cheating is a prerequisite for confidently deploying a test remotely to study samples, which would allow wider adoption and larger-scale data collection, an especially attractive prospect for genetic studies.