21737
Acoustic Predictors of Trained and Naive Rater Impressions of Speech Qualities in ASD
Objectives: Our aim was to combine human perception with acoustic analyses of a single set of speech productions, to reveal which acoustic cues seem to guide listener perceptions of atypicality.
Methods: Participants included 15 adolescents with high-functioning ASD and 15 with typical development (TD). Groups did not differ in age, full-scale IQ (all scores>80), or gender, p’s>.50; standardized language (CELF) scores were all in the typical range, but with a trend for higher scores in the TD group. We elicited a set of eight spoken sentences, all with similar sentence structures and vocabulary items, by giving participants a card with the printed sentence, asking them to learn it by heart, and then asking them to speak it aloud in their normal voice. Stimuli were recorded for subsequent analyses using Praat. Acoustic analyses to date included measures of minimum, maximum, and median pitch; pitch range, excursion, and standard deviation (SD); and mean pitch divided by SD.
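As a rough illustration of the pitch measures listed above, the following Python sketch summarizes a fundamental-frequency (F0) contour that has already been exported as a list of values in Hz. It is not the study's Praat procedure. The function name `pitch_summary`, the convention that unvoiced frames are coded as 0 or None, and the definition of excursion as the max-to-min range in semitones are all assumptions made for the example; the abstract does not define excursion.

```python
import math
import statistics


def pitch_summary(f0_hz):
    """Summarize an F0 contour given as a list of values in Hz.

    Assumption: frames coded as 0 or None are unvoiced and are ignored.
    """
    voiced = [f for f in f0_hz if f is not None and f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")

    lo, hi = min(voiced), max(voiced)
    mean = statistics.fmean(voiced)
    sd = statistics.stdev(voiced)  # sample standard deviation

    return {
        "min_hz": lo,
        "max_hz": hi,
        "median_hz": statistics.median(voiced),
        "range_hz": hi - lo,
        # Assumed definition: excursion as the range expressed in semitones.
        "excursion_st": 12 * math.log2(hi / lo),
        "sd_hz": sd,
        # Mean pitch divided by SD; infinite for a perfectly flat contour.
        "mean_over_sd": mean / sd if sd > 0 else math.inf,
    }


# Toy contour for illustration only (not study data).
summary = pitch_summary([0, 100, 200, 0, 150])
print(summary)
```

Expressing excursion in semitones puts speakers with different baseline pitch on a comparable scale, which is one reason a log-scaled measure might be reported alongside the raw range in Hz.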
In addition, the speech samples were presented to 15 undergraduates and 10 clinicians with ASD expertise, all of whom were naïve to study hypotheses. Raters were asked to determine whether a given sample sounded “atypical or unusual” (in the undergraduate group) or “ASD-like” (in the clinician group), versus "typical," on a 1-3 scale (with “2” anchored as “somewhat unusual”).
Results: Both groups of raters gave significantly higher “atypicality” scores to the ASD group (p’s<.01, with CELF score as a covariate). Grouping the slightly (2) or very (3) atypical ratings into a single category, sensitivity for naïve raters was .80 and specificity was .73; for expert clinicians, sensitivity was .86 and specificity was .86. Interestingly, again holding CELF scores constant, there were no mean group differences in acoustic variables. Thus, both naïve and expert raters were highly accurate in sorting the groups on the basis of speech samples, although acoustic analyses failed to distinguish between groups. In a regression analysis predicting atypicality ratings, the best predictors were not IQ, age, or symptom severity, but rather slow speech rate and high median pitch.
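The dichotomization and accuracy calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: it assumes one summary rating per participant (the abstract does not say how ratings were aggregated across raters), the function names are hypothetical, and the ratings are invented so that two participants per group are misclassified, matching the clinician error count given in the Conclusions.

```python
def is_atypical(rating):
    """Collapse the 1-3 scale: ratings of 2 or 3 count as atypical."""
    return rating >= 2


def sensitivity_specificity(ratings, has_asd):
    """Return (sensitivity, specificity) for dichotomized ratings."""
    tp = fn = tn = fp = 0
    for rating, asd in zip(ratings, has_asd):
        flagged = is_atypical(rating)
        if asd and flagged:
            tp += 1  # ASD participant rated atypical
        elif asd:
            fn += 1  # ASD participant rated typical
        elif flagged:
            fp += 1  # TD participant rated atypical
        else:
            tn += 1  # TD participant rated typical
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both groups must be represented")
    return tp / (tp + fn), tn / (tn + fp)


# Invented ratings for illustration only (not study data):
# 13 of 15 ASD participants rated 2 or 3; 13 of 15 TD participants rated 1.
asd_ratings = [3] * 8 + [2] * 5 + [1] * 2
td_ratings = [1] * 13 + [2] * 2
ratings = asd_ratings + td_ratings
has_asd = [True] * 15 + [False] * 15

sens, spec = sensitivity_specificity(ratings, has_asd)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With two errors per group, both values come out to 13/15, in line with the .86 reported for clinicians.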
Conclusions: Both trained and naïve listeners appear to be able to detect ASD based only on speech. The trained clinicians showed high sensitivity and specificity, misclassifying 4/30 participants (2 per group). Naïve undergraduates were slightly less accurate, misclassifying 7/30. Speech rate and median pitch appeared to drive these perceptions. Future work should compare these findings with those from children with other speech and language delays or developmental concerns. These findings suggest that speech rate and fundamental frequency may provide useful targets for acoustic analysis and for intervention.