Development and Validation of a Harmonized Scale of Autism Symptom Severity Across ADOS Modules 1 and 2: A BSRC Study

Saturday, May 14, 2016: 10:30 AM
Room 308 (Baltimore Convention Center)
A. Gross1, L. Kalb2, G. S. Young3, E. Stuart2, R. Landa4, T. Charman5, K. Chawarska6, T. Hutman7, D. S. Messinger8, S. Ozonoff9, W. L. Stone10, H. Tager-Flusberg11 and L. Zwaigenbaum12, (1)Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, (2)Johns Hopkins School of Public Health, Baltimore, MD, (3)Psychiatry & Behavioral Sciences, UC Davis MIND Institute, Sacramento, CA, (4)The Kennedy Krieger Institute, Baltimore, MD, (5)Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom, (6)Yale Child Study Center, Yale University School of Medicine, New Haven, CT, (7)University of California Los Angeles, Los Angeles, CA, (8)Psychology, University of Miami, Coral Gables, FL, (9)UC Davis MIND Institute, Sacramento, CA, (10)Department of Psychology, University of Washington, Seattle, WA, (11)Boston University, Boston, MA, (12)University of Alberta, Edmonton, AB, Canada
Background: The Autism Diagnostic Observation Schedule-Generic (ADOS-G) is a gold standard diagnostic assessment for autism spectrum disorders (ASD). The ADOS-G overall calibrated severity score (CSS), which provides a 10-point common metric across all ADOS modules, has several measurement limitations, including ceiling/floor effects, limited scoring range, collapsing of symptom domains, and questionable measurement invariance across modules. Failure of the overall CSS to capture heterogeneity in children with ASD led to research that culminated in defining separate CSS scores for social affect (SA-CSS) and restricted and repetitive behavior (RRB-CSS) domains. However, there is substantial overlap in distributions of domain calibrated scores of children with and without ASD. An alternate metric of ASD severity with expanded range may be more sensitive to differences in ASD behavioral phenotype and to changes in ASD severity over time.

Objectives: The goal of this study was to derive an alternative to the overall CSS, usable across ADOS-G modules 1 and 2, for longitudinal research. Despite the introduction of the ADOS-2, many sites and consortia hold substantial longitudinal ADOS-G data. We used item response theory (IRT) to produce a finer-grained, more precise score on a metric common to modules 1 and 2.

Methods: Using data from the Autism Speaks Baby Sibs Research Consortium (N=2191 with 6237 observations, age range 24-48 months) with a broad range of autism severity (N=755 low-risk siblings, N=1436 high-risk siblings), we applied graded response IRT models to equate the 21 items of the ADOS that were highly consistent across modules 1 and 2 in both the nature of the behavior being scored and the scoring criteria. Data analyses were conducted on participants having 36-month diagnostic confirmation assessments. Measurement invariance by module, reliability, and convergent as well as criterion validity of the IRT-based autism severity score were assessed.
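For readers unfamiliar with the graded response model named above, the following is a minimal Python sketch of how such a model assigns probabilities to the ordered score categories of a single item. The discrimination and threshold values are hypothetical and chosen only for illustration; they are not estimates from this study, and the abstract does not state which software or parameterization was used.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under a graded response model.

    theta: latent severity of one child (scalar)
    a:     item discrimination (hypothetical value)
    b:     increasing thresholds, one per boundary between adjacent categories
    """
    b = np.asarray(b, dtype=float)
    # P(score >= k | theta) for k = 1..K-1, using the logistic function
    p_ge = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(score >= 0) = 1 and P(score >= K) = 0, then difference
    cum = np.concatenate(([1.0], p_ge, [0.0]))
    return cum[:-1] - cum[1:]

# Hypothetical item with four ordered score categories (0, 1, 2, 3)
probs = grm_category_probs(theta=0.0, a=1.5, b=[-1.0, 0.5, 2.0])
print(probs)
```

Because each item carries its own thresholds, items shared across modules can anchor a common latent scale, which is the general idea behind the equating step described in the Methods.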

Results: A total of 21 items, drawn from each of the ADOS domains (Social, Communication, and Restricted/Repetitive Behavior), were used to develop the harmonized IRT ASD severity score (mean 50, SD 10, range 21.9 to 115.6, 2716 distinct values). This IRT-derived severity score was highly correlated (r=0.86, p<.001) with the classic 10-point ADOS severity score, and was highly internally consistent (Cronbach’s α=0.88). The IRT-derived score demonstrated strong criterion validity against clinical best estimates of ASD (sensitivity=0.84, specificity=0.82, AUC=0.93), which was marginally superior to the CSS (AUC=0.90). Whereas 46% (2023/4396) of CSS scores were at the ceiling or floor, 0.2% of IRT-derived scores (15/6237) fell at the ceiling or floor (see Figure). The IRT-derived score could also be computed for the 30% of records (1841/6237) in which the CSS could not be scored because of item missingness.
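The proportions reported above can be reproduced from the stated counts, as in the short Python sketch below. The sketch also shows one common way to place latent scores on a mean-50, SD-10 metric. That the authors used this particular linear rescaling is an assumption, and the latent scores in the example are hypothetical.

```python
import numpy as np

def percent(count, total):
    """Return count / total expressed as a percentage."""
    return 100.0 * count / total

# Counts taken directly from the Results paragraph
css_extreme = percent(2023, 4396)   # CSS scores at ceiling or floor
irt_extreme = percent(15, 6237)     # IRT-derived scores at ceiling or floor
irt_only = percent(1841, 6237)      # records scorable by IRT but not by CSS

# The 1841 records match the gap between all observations and CSS-scored ones
records_without_css = 6237 - 4396

def to_t_scale(theta):
    """Linearly rescale scores to mean 50, SD 10 (assumed, not confirmed)."""
    theta = np.asarray(theta, dtype=float)
    z = (theta - theta.mean()) / theta.std()
    return 50.0 + 10.0 * z

# Hypothetical latent scores, for illustration only
t = to_t_scale([-1.2, -0.3, 0.0, 0.4, 2.1])

print(round(css_extreme, 1), round(irt_extreme, 2), round(irt_only, 1))
```

The rounded values agree with the 46%, 0.2%, and 30% figures in the text.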

Conclusions: Compared with the classic 10-point severity score, the IRT-derived ASD severity measure demonstrated strong psychometric characteristics: reduced ceiling and floor effects, finer differentiation (particularly in the non-ASD range of severity), and a linkage across ADOS modules. This measure may provide important advantages when characterizing trajectories and treatment responses during early, critical stages of development.