International Meeting for Autism Research: Automated Analysis of Natural Language Samples: Comparison of Children with Autism Spectrum Disorders, Developmental Language Disorders, and Typical Development


Friday, May 13, 2011
Elizabeth Ballroom E-F and Lirenta Foyer Level 2 (Manchester Grand Hyatt)
1:00 PM
R. W. Sproat, L. M. Black, E. T. Prud'hommeaux, J. van Santen and B. Roark, Center for Spoken Language Understanding, Oregon Health & Science University, Beaverton, OR
Background: Autism spectrum disorders (ASD) and developmental language disorders (DLD) overlap in several areas that complicate differential diagnosis; one such area is language impairment. Reports vary on the types, severity, and incidence of language impairments in ASD. These studies generally use structured, decontextualized instruments, yet Language Sampling and Analysis (LSA) methods may provide information that critically complements such instruments. Because the scarcity of LSA-based studies is likely due to the labor-intensiveness of LSA, automated methods are urgently needed.

Objectives: First, to demonstrate the feasibility of automating the analysis of natural language samples, focusing on the IPSyn (Scarborough, 1991). Second, to apply these methods to document morphology and syntax in high-functioning verbal children with ASD, children with DLD, and typically developing (TD) children.

Methods: Children aged 4 to 8 years were given a comprehensive battery of language and neurocognitive measures. Classification into the ASD group (N=36) used the revised ADOS algorithm, the Social Communication Questionnaire, and a DSM-IV-TR-based clinical consensus diagnosis. Classification into the DLD group (N=20) used either Tomblin’s Epi-SLI criteria or a CELF index score at -1 SD plus a spontaneous language measure at -1 SD, together with a DSM-IV-TR-based diagnosis. Stringent exclusion criteria were applied to all groups; in addition, children with neurodevelopmental disorders, neuropsychiatric disorders, or a sibling with ASD or DLD were excluded from the TD group. The ASD group was divided into an “ASD+DLD” group (N=25), whose members also met DLD criteria, and an “ASD-DLD” group (N=11), whose members did not. The groups were well matched on age, but only the DLD and ASD+DLD groups were matched on verbal and performance IQ (VIQ and PIQ).
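The DLD criteria and the ASD subgrouping described above amount to a small decision rule. The following is a minimal, hypothetical sketch of that rule: the `Child` fields, the use of z-scores, and the reading of "at -1 SD" as "at or below -1 SD" are assumptions, and the ADOS, SCQ, and DSM-IV-TR clinical judgments are not modeled. The code only splits a child who already has an ASD diagnosis by whether the DLD language criteria are met.

```python
from dataclasses import dataclass

# Assumption: "at -1 SD" is read as "at or below -1 SD" (z-score units).
CUTOFF_Z = -1.0


@dataclass
class Child:
    """Hypothetical per-child record; scores are z-scores (SD units)."""
    meets_epi_sli: bool            # outcome of Tomblin's Epi-SLI criteria, computed elsewhere
    celf_index_z: float            # a CELF index score expressed as a z-score
    spontaneous_language_z: float  # a spontaneous language measure as a z-score


def meets_dld_language_criteria(child: Child) -> bool:
    """Epi-SLI criteria, OR both the CELF index and the spontaneous
    language measure at or below the cutoff."""
    both_low = (child.celf_index_z <= CUTOFF_Z
                and child.spontaneous_language_z <= CUTOFF_Z)
    return child.meets_epi_sli or both_low


def asd_subgroup(child: Child) -> str:
    """Assign a child who already has an ASD diagnosis to a subgroup."""
    return "ASD+DLD" if meets_dld_language_criteria(child) else "ASD-DLD"


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    print(asd_subgroup(Child(False, -1.3, -1.1)))  # ASD+DLD
    print(asd_subgroup(Child(False, -1.3, -0.4)))  # ASD-DLD
```

In the second example the spontaneous language measure is above the cutoff and Epi-SLI is not met, so the child falls into ASD-DLD even though the CELF index is low.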

ADOS recordings were transcribed manually into uncoded text.

The automated system, applied directly to these transcripts, comprised: (1) text normalization tools for cleaning up and normalizing the transcriptions; (2) a morphological analyzer; (3) a syntactic parser, which also assigns parts of speech; and (4) a scoring module that maps the language analysis output to IPSyn scores. The top-level IPSyn measures computed were Noun Phrases, Questions and Negations, Sentence Complexity, Verb Phrases, and Total (the sum of these four measures), along with MLU (mean length of utterance, in words per utterance).
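Once the scoring module has produced subscale scores, the top-level Total and MLU reduce to simple arithmetic. This is a minimal sketch, assuming the four subscale scores are already available from a scoring stage (not shown) and that words are counted by whitespace tokenization of normalized utterances; the function and key names are hypothetical, and the item-level IPSyn scoring itself is not reproduced here.

```python
from typing import Dict, List

# The four top-level IPSyn subscales named in the abstract (key names hypothetical).
SUBSCALES = ("noun_phrases", "verb_phrases",
             "questions_negations", "sentence_complexity")


def ipsyn_total(subscale_scores: Dict[str, int]) -> int:
    """IPSyn Total as the sum of the four subscale scores."""
    missing = [name for name in SUBSCALES if name not in subscale_scores]
    if missing:
        raise ValueError(f"missing subscale scores: {missing}")
    return sum(subscale_scores[name] for name in SUBSCALES)


def mlu_words(utterances: List[str]) -> float:
    """Mean length of utterance in words.

    Assumes utterances are already normalized and that whitespace
    tokenization gives an adequate word count.
    """
    if not utterances:
        raise ValueError("need at least one utterance")
    total_words = sum(len(u.split()) for u in utterances)
    return total_words / len(utterances)


if __name__ == "__main__":
    # Made-up scores and utterances for illustration only.
    scores = {"noun_phrases": 20, "verb_phrases": 25,
              "questions_negations": 12, "sentence_complexity": 18}
    print(ipsyn_total(scores))                                   # 75
    print(mlu_words(["the dog runs", "no", "i see a big cat"]))  # 3.0
```

Whitespace tokenization is the weakest assumption here: how contractions, fillers, and partial words are counted would depend on the normalization stage.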

The system was compared with manual coding by a team of trained linguists on a subset of the data.
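The abstract does not state which agreement statistic was used. One hypothetical way to operationalize "the automated system differed as little from the manual coders as they differed from each other" is to compute the mean absolute score difference between the system and each coder and compare it with the mean difference between pairs of coders. The function names and the choice of mean absolute difference are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple


def mean_abs_diff(a: List[float], b: List[float]) -> float:
    """Mean absolute difference between two raters' scores on the same samples."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and aligned")
    return mean(abs(x - y) for x, y in zip(a, b))


def compare_system_to_coders(system: List[float],
                             coders: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return (mean system-vs-coder difference, mean coder-vs-coder difference)."""
    if len(coders) < 2:
        raise ValueError("need at least two human coders")
    system_vs_human = mean(mean_abs_diff(system, scores)
                           for scores in coders.values())
    human_vs_human = mean(mean_abs_diff(coders[a], coders[b])
                          for a, b in combinations(coders, 2))
    return system_vs_human, human_vs_human


if __name__ == "__main__":
    # Made-up IPSyn Total scores for three transcripts, for illustration only.
    coders = {"coder_a": [40, 52, 61], "coder_b": [42, 50, 60]}
    system = [41, 51, 62]
    sys_diff, human_diff = compare_system_to_coders(system, coders)
    # The system "differs as little" if sys_diff is no larger than human_diff.
    print(sys_diff <= human_diff)  # True
```

Mean absolute difference is only one option; a correlation or intraclass correlation coefficient computed per measure would be a natural alternative.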

Results: (1) The automated system differed from the manual coders as little as the coders differed from each other. (2) The TD group scored higher than the ASD+DLD group on IPSyn Total and MLU, and higher than the DLD group on MLU. (3) The DLD group scored higher than the ASD+DLD group on IPSyn Total but not on MLU. (4) ASD-DLD and DLD group scores were similar, despite substantial VIQ and PIQ differences. (5) The ASD+DLD and ASD-DLD groups tended to have different IPSyn score patterns.

Conclusions: The data show that the automated method is as accurate as human coding, in that it agrees with human coders as closely as they agree with each other. Surprisingly, the automated analyses showed that both ASD groups performed more poorly on IPSyn measures than would be expected from their IQ characteristics. These results argue for the importance of developing additional automated LSA methods.
