International Meeting for Autism Research: Automatic Detection of Idiosyncratic Word Use in Autism Spectrum Disorders

Thursday, May 20, 2010

Franklin Hall B Level 4 (Philadelphia Marriott Downtown)

11:00 AM

E. T. Prud'hommeaux, Center for Spoken Language Understanding, Oregon Health & Science University, Beaverton, OR

J. van Santen, Center for Spoken Language Understanding, Oregon Health & Science University, Beaverton, OR

L. M. Black, Center for Spoken Language Understanding, Oregon Health & Science University, Beaverton, OR

B. Roark, Center for Spoken Language Understanding, Oregon Health & Science University, Beaverton, OR

Background: Stereotyped and idiosyncratic use of words and phrases is one of the coded behaviors used in the Autism Diagnostic Observation Schedule (ADOS) algorithm for autism diagnosis. This behavior is characterized by the use of neologisms and of “inappropriately formal” or pedantic words. Existing methods for assessing these language features during ADOS administration typically rely on overall impressions and observations of apparent trends in a child's speech. Natural language processing (NLP) techniques can be used to quantify these features and automate their detection.

Objectives: The goals of this study are to apply principles of NLP in order to 1) identify neologisms and patterns of overly formal word use; 2) compare the relative prevalence of these features in spontaneous language samples of children with Autism Spectrum Disorders (ASD) and Typical Development (TD); and 3) determine whether these features can be used to distinguish the two groups.

Methods: The ADOS was administered to children ages 4 to 8 with TD and with ASD. (Module 3 was administered to the majority of subjects; module 2 was used only for those subjects whose expressive language age equivalency was less than 4.0.) The two groups were roughly matched on several measures of utterance complexity and acceptability, computed with standard NLP methods. The entire ADOS for each child was recorded and digitized for analysis. The subject utterances from the following ADOS activities were transcribed from these audio recordings: Make-Believe Play, Joint Interactive Play, Description of a Picture, Telling a Story From a Book, and Conversation and Reporting.

Relative frequencies of occurrence of single words and word-sequences were generated from two corpora: 1) the Wall Street Journal training corpus of the Penn Treebank, and 2) the Child Language Data Exchange System (CHILDES) database of child speech. For each word a child produced, we determined its relative frequency in each of the two corpora. Words whose relative frequency was zero (i.e., those that do not occur in a given corpus, known as out-of-vocabulary words, or OOVs) are likely to be neologisms.
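The frequency lookup and OOV count described above can be illustrated with a minimal sketch. It assumes whitespace-tokenized text and single words only; the function names (`relative_frequencies`, `oov_rate`) and the toy sentences are hypothetical and are not taken from the study, whose tokenization and normalization steps are not described here.

```python
from collections import Counter

def relative_frequencies(corpus_tokens):
    """Map each word type to its count divided by the total token count."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def oov_rate(child_tokens, rel_freq):
    """Fraction of a child's tokens whose relative frequency in the corpus is zero."""
    if not child_tokens:
        return 0.0
    oov = [w for w in child_tokens if rel_freq.get(w, 0.0) == 0.0]
    return len(oov) / len(child_tokens)

# Toy data (hypothetical): a tiny "reference corpus" and one child's transcript.
reference = "the dog ran to the park and the cat sat".split()
child = "the dog glorped to the park".split()

freqs = relative_frequencies(reference)
rate = oov_rate(child, freqs)  # "glorped" is the only word absent from the reference
```

In practice the same function would be run once per reference corpus, giving each child one OOV rate against the Wall Street Journal text and one against CHILDES.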

Results: The average relative frequencies of the words, based on either corpus, were not significantly different in the two groups. However, neologism use, as measured by OOV rate, was significantly higher in the ASD group than in the TD group, using both the Wall Street Journal corpus and the CHILDES corpus. Very low-frequency words from the Wall Street Journal corpus were also used significantly more often in ASD speech. This trend was not observed with the CHILDES corpus. Because the difference appears only for words that are rare in formal newspaper text, this suggests that ASD speech is characterized not only by neologisms but also by the use of very infrequent formal words.
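As a rough illustration of how a per-child rate of very low-frequency words could be computed and compared across groups, the sketch below uses a frequency cutoff and a one-sided permutation test. Both are assumptions made for illustration: the abstract does not state what cutoff defined "very low-frequency" or which statistical test was used, and the names `low_frequency_rate`, `permutation_p_value`, and `threshold` are hypothetical.

```python
import random

def low_frequency_rate(child_tokens, rel_freq, threshold=1e-6):
    """Fraction of in-vocabulary tokens whose relative frequency is below threshold.

    The threshold value is an illustrative assumption, not the study's cutoff.
    """
    in_vocab = [w for w in child_tokens if rel_freq.get(w, 0.0) > 0.0]
    if not in_vocab:
        return 0.0
    rare = [w for w in in_vocab if rel_freq[w] < threshold]
    return len(rare) / len(in_vocab)

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided p-value for mean(group_a) > mean(group_b), by shuffling group labels."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

Each child would contribute one rate per reference corpus, and the two lists of per-child rates (ASD and TD) would be passed to the comparison function. A permutation test is used here only because it makes no distributional assumptions; any standard two-sample test could take its place.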

Conclusions: Neologistic and formal word use, which are both characteristic of ASD speech, can be identified automatically using natural language processing techniques. Incorporating automated analysis of speech could enhance the coding of these behaviors and reveal word distribution properties that might go unrecognized during examination.
