
Intonation Differences of Children with ASD or SLI

Friday, 3 May 2013: 14:00-18:00
Banquet Hall (Kursaal Centre)
G. Kiss, J. van Santen, E. T. Prud'hommeaux and L. M. Black, Center for Spoken Language Understanding, Oregon Health & Science University, Beaverton, OR

Prosody is often atypical in Autism Spectrum Disorder (ASD), but few studies have characterized this atypicality quantitatively. Studies examining intonation (i.e., pitch variation) generally analyze only overall statistical properties of the pitch values in a speech sample, such as the mean and variance; these are commonly higher in the speech of children with ASD than in that of children with typical development (TD). However, no studies have investigated how these statistical properties relate to the shapes of the pitch contours of individual utterances – the “melodies of speech” – which may be key to how we perceive the intonation of individuals with ASD.


The purpose of this study is to analyze atypical prosody of children with ASD by inferring the shapes of pitch contours from overall statistical properties of pitch.


The data consisted of transcripts of Autism Diagnostic Observation Schedule (ADOS) recordings of 111 children, ages 4-8. Participants included children with typical development (TD); ASD meeting the criteria for language impairment (ALI); ASD without language impairment (ALN); and specific language impairment (SLI). An iterative algorithm created four pairs of groups matched on specific measures: TD (25) vs. ASD (23), matched on chronological age (CA) and nonverbal IQ (NVIQ); ALN (18) vs. TD (19), matched on CA, NVIQ, and verbal IQ (VIQ); ALI (18) vs. SLI (17), matched on CA, NVIQ, and VIQ; and ALI (18) vs. ALN (20), matched on CA, the Social Communication Questionnaire, and the ADOS Severity Score.

We used the Simplified Linear Alignment Model (SLAM) of intonation to parameterize contour shape, using ten parameters: phrase start, middle, and end, as well as different levels of accents.
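To make the idea of parameterizing contour shape concrete, the following is a minimal sketch and not the SLAM model itself. It assumes a toy phrase curve drawn as straight lines through three anchor values (start, middle, end) and omits accents entirely; the function name and the example values in Hz are hypothetical.

```python
def phrase_contour(start_hz, middle_hz, end_hz, n_points):
    """Toy phrase curve: piecewise-linear pitch through three anchors.

    This is an illustrative simplification, not the SLAM implementation.
    """
    if n_points < 3:
        raise ValueError("need at least 3 points")
    mid = (n_points - 1) / 2  # position of the middle anchor
    contour = []
    for i in range(n_points):
        if i <= mid:
            # First half: interpolate from start to middle.
            frac = i / mid
            contour.append(start_hz + frac * (middle_hz - start_hz))
        else:
            # Second half: interpolate from middle to end.
            frac = (i - mid) / (n_points - 1 - mid)
            contour.append(middle_hz + frac * (end_hz - middle_hz))
    return contour


# A falling contour sampled at five points (values are made up).
example = phrase_contour(250.0, 220.0, 180.0, 5)
```

Under this simplification, a higher phrase start or middle raises the early part of every utterance, which in turn shifts the overall pitch statistics of a speech sample.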

We created 2000 sets of SLAM parameters by randomly sampling each parameter from a realistic range, mimicking different speaking styles. For each of the 2000 parameter sets, we used the CSLU speech synthesizer to synthesize pitch curves for 1000 sentences chosen randomly from the transcriptions of the children’s speech, giving 2 million pitch curves. For each parameter set, we calculated robust statistical features of the curves: statistics such as the median and interquartile range computed from all pitch values, as well as statistics of the per-utterance statistics. We then trained machine learning models (linear regression, support vector regression) to relate these features to the SLAM parameters, and validated the effectiveness of the models in a ten-fold cross-validation scheme.
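The two kinds of features described above can be sketched as follows. This is an illustration under stated assumptions, not the study's feature set: the function and key names are hypothetical, only the median and interquartile range are shown, and the quartile convention (`method="inclusive"`) is an arbitrary choice.

```python
import statistics


def iqr(values):
    """Interquartile range (needs at least two values)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def pitch_features(utterances):
    """Robust features for one speaker or one simulated parameter set.

    utterances: list of pitch curves, each a list of pitch values in Hz.
    """
    pooled = [v for utt in utterances for v in utt]
    utt_medians = [statistics.median(utt) for utt in utterances]
    utt_iqrs = [iqr(utt) for utt in utterances]
    return {
        # Statistics over all pitch values pooled together.
        "pooled_median": statistics.median(pooled),
        "pooled_iqr": iqr(pooled),
        # Statistics of the per-utterance statistics.
        "median_of_utt_medians": statistics.median(utt_medians),
        "iqr_of_utt_medians": iqr(utt_medians),
        "median_of_utt_iqrs": statistics.median(utt_iqrs),
    }


# Three made-up utterances.
features = pitch_features([
    [100, 110, 120, 130, 140],
    [200, 210, 220, 230, 240],
    [150, 160, 170, 180, 190],
])
```

Pooled statistics describe the sample as a whole, while statistics of per-utterance statistics separate variation within utterances from variation between them.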

In a second step, we calculated the same features for the ADOS recordings, and used the previously trained machine learning models to estimate the SLAM parameters for each child. Finally, we examined whether the groups differed in their estimated SLAM parameters.
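The overall simulate, train, and apply logic can be sketched in a few lines. Everything here is a simplified assumption: a straight-line contour stands in for the synthesizer, a single feature (the median) predicts a single parameter (phrase start), plain least squares replaces the models named above, and the parameter ranges and the observed value are invented for illustration.

```python
import random
import statistics


def toy_curve(start_hz, end_hz, n_points=20):
    """Hypothetical stand-in for synthesis: a straight-line contour."""
    step = (end_hz - start_hz) / (n_points - 1)
    return [start_hz + i * step for i in range(n_points)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_validated_mae(xs, ys, k=10):
    """Mean absolute error over k held-out folds."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors += [abs(a + b * xs[i] - ys[i]) for i in held_out]
    return statistics.fmean(errors)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Step 1: sample parameters, synthesize curves, extract a feature.
starts, medians = [], []
for _ in range(200):
    start = rng.uniform(180.0, 320.0)       # assumed range, Hz
    end = start - rng.uniform(20.0, 60.0)   # assumed declination, Hz
    starts.append(start)
    medians.append(statistics.median(toy_curve(start, end)))

# Validate the feature-to-parameter mapping with ten-fold cross-validation.
mae = cross_validated_mae(medians, starts, k=10)

# Step 2: apply the trained model to a feature measured from a recording.
a, b = fit_line(medians, starts)
observed_median = 230.0  # hypothetical value for one child
estimated_start = a + b * observed_median
```

The design point is that the model is trained entirely on simulated data, where the true parameters are known, and is then used to infer parameters that cannot be observed directly in real recordings.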


We found that the phrase start and phrase middle parameters of the TD group were significantly lower than those of the other three groups, whereas the ALN, ALI, and SLI groups did not differ significantly from each other. None of the other phrase and accent curve parameters differed significantly.


We conclude, first, that overall statistics can be used to draw inferences about individual pitch contours; second, that these groups have different pitch contour shapes; and third, that these features may not be specific to ASD.
