Objectives: This work has three objectives: (1) to establish that children with ASD use unexpected or inappropriate words in their narratives; (2) to investigate methods of identifying such words automatically and objectively using existing language analysis technology; and (3) to determine whether these automated methods are an adequate substitute for manual analysis for distinguishing children with ASD from their TD peers.
Methods: Participants in this study included 37 children with TD and 21 children with ASD, who were diagnosed via clinical consensus according to the DSM-IV criteria and the established threshold scores on the ADOS and the SCQ. There were no significant between-group differences in age (mean=6.4) or full-scale IQ (mean=114). The Narrative Memory subtest of the NEPSY, in which a child hears a brief story and must retell the story to the examiner, was administered to each child. Each retelling was recorded and then transcribed. Two annotators, blind to diagnosis, identified every word in each retelling transcript that was unexpected or inappropriate given the context of the story. In order to identify such words automatically, we then calculated for each word the tf-idf score, which is based on a comparison between the frequency of that word in the child's retelling and the frequency of that word in a large corpus of retellings collected from neurotypical adults. A high tf-idf score indicates that a word is very unlikely or unexpected in that particular context.
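The tf-idf scoring described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the smoothed inverse document frequency used here, the function name `tfidf_scores`, and the toy token lists are all assumptions for illustration only; they are not the study's data or implementation.

```python
import math
from collections import Counter


def tfidf_scores(retelling_tokens, reference_docs):
    """Score each word in one retelling against a reference corpus.

    Assumed formula (not taken from the abstract):
        tf  = count of the word in the child's retelling
        idf = log((1 + N) / (1 + df))
    where N is the number of reference retellings and df is the number
    of reference retellings that contain the word.
    """
    n_docs = len(reference_docs)

    # Document frequency: in how many reference retellings each word occurs.
    doc_freq = Counter()
    for doc in reference_docs:
        doc_freq.update(set(doc))

    term_freq = Counter(retelling_tokens)
    return {
        word: count * math.log((1 + n_docs) / (1 + doc_freq[word]))
        for word, count in term_freq.items()
    }


# Hypothetical toy data standing in for the adult reference retellings.
reference_docs = [
    ["jim", "climbed", "the", "tree"],
    ["the", "boy", "climbed", "a", "tree"],
    ["jim", "fell", "from", "the", "tree"],
]
# Hypothetical child retelling containing one out-of-context word.
child_retelling = ["jim", "climbed", "the", "dinosaur"]

scores = tfidf_scores(child_retelling, reference_docs)
most_unexpected = max(scores, key=scores.get)
```

In this sketch, a word that appears in every reference retelling scores zero, while a word absent from the reference corpus scores highest. Words could then be flagged as unexpected by comparing their scores against a cutoff, which would need to be chosen empirically.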
Results: First, we found that children with ASD do, in fact, produce significantly more manually identified unexpected and inappropriate words in their narrative retellings (p < 0.05). Second, the set of words selected as unexpected using the automatic tf-idf score corresponds well with the set of words manually identified as unexpected (precision=84%, recall=53%). Finally, using the set of automatically identified words, we found again that children with ASD produce significantly more unexpected words than children with TD (p < 0.05).
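For readers unfamiliar with the precision and recall figures reported above, the following sketch shows how the two measures compare an automatically selected word set with a manually annotated one. The function name `precision_recall` and the example word sets are hypothetical and are not drawn from the study.

```python
def precision_recall(predicted, gold):
    """Compare automatically flagged words with manually flagged words.

    precision = share of automatically flagged words that annotators also flagged
    recall    = share of manually flagged words that the automatic method found
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical word sets for illustration only.
auto_flagged = {"dinosaur", "rocket"}
manually_flagged = {"dinosaur", "spaceship", "wizard", "pizza"}

precision, recall = precision_recall(auto_flagged, manually_flagged)
```

Here one of the two automatically flagged words is also in the manual set, giving a precision of 0.5, and one of the four manually flagged words is recovered, giving a recall of 0.25. High precision with lower recall, as reported above, means that most automatically flagged words were also flagged by the annotators, while a sizable share of the manually flagged words were not found automatically.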
Conclusions: These results demonstrate that identifying specific instances of atypical language at the lexical level can reveal patterns of language use that are characteristic of ASD. The word likelihood score presented here captures these patterns accurately enough to allow for automated analysis of language that can serve as a proxy for manual identification of unexpected words. This work underscores the potential of automated techniques for improving our understanding of the linguistic features associated with ASD.