It is widely acknowledged that people with autism spectrum disorder (ASD) modulate aspects of speech and voice atypically, including pitch, fluency, and voice quality. ASD speech has at times been described as “odd”, “mechanical”, or “monotone”. However, this oddness has proven difficult to quantify and explain.
Objectives:
In this project, we quantify how the speech patterns of people with Asperger’s Syndrome (AS) differ from those of matched controls. To do so, we employed both (1) traditional measures (pitch range and standard deviation, pause duration, and so on) and (2) non-linear techniques measuring the structure (regularity and complexity) of verbal, prosodic, and fluency behaviour. Our aims were (1) to achieve a more fine-grained understanding of speech patterns in AS than has previously been achieved with traditional, linear measures of prosody and fluency, and (2) to use the results in a supervised machine-learning process (discriminant function) to classify speech production as belonging to either the control or the AS group, based solely on acoustic features.
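The abstract names a discriminant function but does not give its exact form. The sketch below assumes a two-class linear (Fisher) discriminant with equal priors, written in plain NumPy. The helper names `fit_lda`, `predict`, and `leave_one_subject_out_accuracy` are hypothetical. Each participant contributed 8 narratives, so this sketch holds out all of one participant's narratives in each fold. That way a speaker's own recordings never appear in both the training and the test set. The abstract does not say whether its "repeated leave-one-out" procedure did this.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class Fisher linear discriminant with a pooled within-class covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled covariance; atleast_2d keeps the single-feature case a (1, 1) matrix.
    S = (np.atleast_2d(np.cov(X0, rowvar=False)) * (len(X0) - 1)
         + np.atleast_2d(np.cov(X1, rowvar=False)) * (len(X1) - 1)) / (len(X) - 2)
    S = S + 1e-6 * np.eye(X.shape[1])  # small ridge for numerical stability (assumption)
    w = np.linalg.solve(S, mu1 - mu0)   # discriminant direction
    b = -w @ (mu0 + mu1) / 2            # threshold midway between class means (equal priors)
    return w, b

def predict(w, b, X):
    """Label 1 (e.g. AS) if the discriminant score is positive, else 0 (control)."""
    return (X @ w + b > 0).astype(int)

def leave_one_subject_out_accuracy(X, y, subjects):
    """Hold out every narrative of one participant per fold; return overall accuracy."""
    correct = 0
    for s in np.unique(subjects):
        test = subjects == s
        w, b = fit_lda(X[~test], y[~test])
        correct += int((predict(w, b, X[test]) == y[test]).sum())
    return correct / len(y)
```

Each row of `X` holds one narrative's feature vector, for example the entropy, Hurst, and recurrence measures of its F0 and speech/pause series. `y` is 1 for AS and 0 for controls, and `subjects` holds one participant ID per row.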
Methods:
Our analysis was based on previously acquired narratives of the Frith-Happé animated triangles, produced by 10 people with AS and 10 matched controls (8 narratives per participant). Transcripts were time-coded, and the pitch (F0) contour and the speech/pause sequence were extracted. For each of these time series we calculated the mean, standard deviation, Shannon entropy, noise structure (Hurst exponent), and recurrence quantification measures. These measures were used to train a supervised machine-learning algorithm to classify each description as belonging to either the AS or the control group, using repeated leave-one-out cross-validation.
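The following is a minimal sketch of how the non-linear measures named above could be computed. It assumes each F0 contour or speech/pause sequence is a 1-D NumPy array. The abstract does not give the binning, the Hurst estimator, or the recurrence parameters. The choices below are illustrative assumptions: 10 histogram bins, rescaled-range (R/S) analysis, embedding dimension 3, delay 1, and radius 0.1 × SD. The authors may have used other estimators, such as detrended fluctuation analysis for the Hurst exponent.

```python
import numpy as np

def shannon_entropy(x, bins=10):
    """Shannon entropy (bits) of the binned amplitude distribution of x.
    The number of bins is an assumption."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def hurst_rs(x, min_window=8):
    """Hurst exponent via rescaled-range (R/S) analysis: the slope of
    log(mean R/S) against log(window size), with window sizes doubling."""
    x = np.asarray(x, dtype=float)
    sizes, rs = [], []
    size = min_window
    while size <= len(x) // 2:
        vals = []
        for start in range(0, len(x) - size + 1, size):
            seg = x[start:start + size]
            dev = np.cumsum(seg - seg.mean())  # cumulative deviation from the mean
            s = seg.std()
            if s > 0:
                vals.append((dev.max() - dev.min()) / s)
        if vals:
            sizes.append(size)
            rs.append(np.mean(vals))
        size *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs), 1)
    return float(slope)

def recurrence_rate(x, dim=3, delay=1, radius=None):
    """Fraction of pairs of time-delay-embedded points closer than `radius`.
    dim, delay, and the default radius (0.1 x SD) are assumptions."""
    x = np.asarray(x, dtype=float)
    m = len(x) - (dim - 1) * delay
    emb = np.column_stack([x[i * delay: i * delay + m] for i in range(dim)])
    if radius is None:
        radius = 0.1 * x.std()
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    rec = dists < radius
    np.fill_diagonal(rec, False)  # exclude trivial self-recurrences
    return float(rec.sum() / (m * (m - 1)))
```

Higher Hurst values indicate stronger correlation between neighbouring data points. A higher recurrence rate means that short stretches of the series, such as a pitch movement, repeat more often.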
Results:
Overall, we found that the AS group tended to have shorter utterances (p<0.05, t(18)=2.70, effect size=0.29) and longer pauses (p<0.05, t(18)=-2.24, effect size=0.22). The AS group was also characterized by a higher microscale regularity of prosody and fluency: higher Shannon entropy (fluency: p<0.001, t(18)=6.80, effect size=0.72; prosody: p<0.001, t(18)=9.74, effect size=0.85), stronger correlation of neighbouring data points in prosody (fluency Hurst exponent: p<0.01, t(18)=3.11, effect size=0.43), and more frequent repetition of short sequences of pitch (Recurrence Rate: p<0.001, t(18)=9.62, effect size=0.83) and of speech/pause patterns (Recurrence Rate: p<0.01, t(18)=-3.11, effect size=0.35). Finally, transcripts in the AS group showed higher regularity, for example stereotyped ways of starting the narratives (p<0.001, t(18)=12.61, effect size=0.90). Using these features, we were able to automatically classify speech production as belonging to either the control or the AS group with 78.9% accuracy.
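The abstract does not name its effect-size measure. With 10 participants per group, an independent-samples t-test has df = 10 + 10 - 2 = 18, which matches the reported t(18). Several of the reported effect sizes are consistent with eta squared computed from the t statistic, η² = t² / (t² + df). For example, t(18) = 2.70 gives 7.29 / 25.29 ≈ 0.29. This is an inference from the numbers, not something the authors state. The sketch below is therefore a way to check the reported values, not a reconstruction of their analysis. The helper names are hypothetical.

```python
import math

def pooled_t(a, b):
    """Student's independent-samples t-test with pooled variance.
    Returns (t, df), where df = n_a + n_b - 2."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    df = na + nb - 2
    sp2 = ((na - 1) * va + (nb - 1) * vb) / df  # pooled variance
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

def eta_squared(t, df):
    """Proportion of variance explained, derived from a t statistic."""
    return t * t / (t * t + df)
```

For instance, `eta_squared(-2.24, 18)` is about 0.218, which rounds to the reported 0.22 for pause duration.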
Conclusions:
The study points to the usefulness of non-linear time-series analysis techniques for picking out the subtle differences that characterize the unusual voice of people with AS. It also points to interesting parallels between the speech patterns of people with AS and the repetitiveness of their motor behaviour. Future work includes further validation of the approach and a more detailed investigation of the relation between speech patterns and other symptoms.