The “True” Interrater Reliability of the ADOS in Clinical Settings – A Never-Ending Story?
The Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1999, 2012) is a gold-standard diagnostic instrument in both research and clinical practice for autism spectrum disorder (ASD). Excellent interrater reliability has been demonstrated for the ADOS under optimal conditions, i.e., with a group of highly trained “research reliable” examiners in a research setting (Lord et al., 1999). However, the ADOS’ spontaneous interrater reliability among clinically trained users across multiple sites in everyday clinical routine remains an open question.
Objectives:
To study the interrater reliability of the ADOS among clinically trained examiners with varying experience, using a naturalistic multicenter approach across Sweden.
Methods:
Five raters (at least four blinded to diagnosis) per administration independently rated 10 videotaped ADOS administrations per module (modules 1–4). The filmed administrations represented clear cases of autistic disorder (13/40) as well as milder cases of Asperger’s disorder (6/40), PDD-NOS (9/40), and no ASD (12/40). A total of 15 raters from 9 sites across Sweden participated. The raters were clinical psychologists and pediatricians with varying training (all with basic clinical ADOS training) and experience with the ADOS. Fleiss’ kappa for multiple raters (κ), intraclass correlation coefficients (ICCs), and exact percentage agreement (the median of the ten pairwise comparisons) were calculated.
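To make the statistics concrete, the following is a minimal sketch of the three reliability measures computed on hypothetical data (Python, using the statsmodels and pingouin packages). The simulated scores and all variable names are assumptions for illustration, not the study’s actual ratings or analysis code.

# Minimal sketch of the three reliability statistics on hypothetical data;
# the simulated scores below are illustrative only, not the study's ratings.
from itertools import combinations

import numpy as np
import pandas as pd
import pingouin as pg
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
n_videos, n_raters = 10, 5  # 10 administrations per module, 5 raters each

# Hypothetical ordinal item scores (0-2): one row per video, one column per rater.
item_scores = rng.integers(0, 3, size=(n_videos, n_raters))

# Intraclass correlation: pingouin reports the common ICC variants
# (ICC1, ICC2, ICC3 and their average-rater forms) from long-format data.
long = (pd.DataFrame(item_scores, columns=[f"rater{j}" for j in range(n_raters)])
        .rename_axis("video").reset_index()
        .melt(id_vars="video", var_name="rater", value_name="score"))
print(pg.intraclass_corr(data=long, targets="video", raters="rater",
                         ratings="score")[["Type", "ICC"]])

# Hypothetical binary diagnostic classifications (1 = ASD, 0 = non-ASD).
dx = rng.integers(0, 2, size=(n_videos, n_raters))

# Fleiss' kappa: chance-corrected agreement among all five raters.
table, _ = aggregate_raters(dx)
print(f"Fleiss' kappa = {fleiss_kappa(table, method='fleiss'):.2f}")

# Exact agreement: median of the C(5,2) = 10 pairwise percent agreements.
pairs = [np.mean(dx[:, i] == dx[:, j]) for i, j in combinations(range(n_raters), 2)]
print(f"Exact agreement = {np.median(pairs):.0%}")

The single-rater, absolute-agreement form (ICC2 in pingouin’s output) is a typical choice when, as here, each video is scored by multiple interchangeable raters; which ICC variant the study actually used is not stated in the abstract.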
Results:
In module 1, agreement was good to excellent for items (ICC = .52–.91) as well as for the domain totals (ICC = .60–.94). In module 2, agreement ranged from poor to excellent for items (ICC = .31–.91) but was excellent for domain totals (ICC = .75–.87). Agreement was lower in module 3, ranging from poor to excellent for items (ICC = .22–.89) and from fair to excellent for domain totals (ICC = .58–.84). Analyses of module 4 showed similar figures, with poor to excellent agreement on item ratings (ICC = .22–.97) but excellent agreement on the domain scores (ICC = .81–.92). For diagnostic classification (ASD vs. non-ASD), modules 1 and 2 showed fair agreement (exact agreement = 80%, κ = .50, and exact agreement = 70%, κ = .56, respectively), module 3 borderline agreement (exact agreement = 70%, κ = .23), and module 4 fair to good agreement (exact agreement = 80%, κ = .59).
Conclusions:
The interrater reliability of the ADOS at the item, domain-total, and diagnostic-classification levels was overall fair to excellent, even among clinicians with only basic ADOS training across multiple sites, although lower than the values reported for highly trained ADOS users. The results endorse the clinical use of the ADOS, but also underline the need for, and value of, adequate and continuing training on the instrument.