Audiovisual Integration of Speech and Gesture in Adolescents with ASD

Saturday, May 16, 2015: 3:04 PM
Grand Ballroom C (Grand America Hotel)
L. B. Silverman1, N. Gebhard1, A. R. Canfield2, J. T. Foley1 and A. P. Salverda3, (1)University of Rochester Medical Center, Rochester, NY, (2)University of Connecticut, Storrs, CT, (3)University of Rochester, Rochester, NY
Background: The integration of information across sensory modalities is essential for successful social and communicative functioning in everyday life. Impaired sensory integration abilities in ASD could contribute to the core social and communicative impairments that typify the disorder.

Objectives: The purpose of this study was to characterize the nature of sensory integration abilities in ASD, using a set of stimuli that systematically varied the relationship between speech and co-expressive gesture. We had two primary aims. The first was to test whether participants integrated speech and complementary gesture when gesture was not necessary for understanding the speaker's message (i.e., speech and gesture provided similar semantic information). The second was to test sensory integration of speech and supplementary gesture, when integration was necessary for understanding (i.e., speech alone did not provide adequate information).

Methods: Participants were 26 adolescents with high-functioning ASD and 25 typically developing (TD) controls matched on age, gender, and VIQ. In Experiment 1, speech and gesture integration was assessed through quantitative analyses of eye fixations during a video-based task. Participants watched a man describing one of four shapes (target, speech competitor, gesture competitor, distractor) shown on a computer screen. Half of the videos depicted natural speech-and-gesture combinations, while the other half depicted speech-only descriptions. Participants clicked on the shape that the speaker described. Since gesture typically precedes speech, we hypothesized that TD controls would visually fixate the target shape earlier on speech-and-gesture trials compared to speech-only trials, indicating immediate integration of audiovisual information. We hypothesized that the ASD group would not show this effect. In Experiment 2, participants watched a man describing one of four pictures, this time including a homophone pair (e.g., baseball-bat or animal-bat). The man's co-expressive gestures provided semantic information required for reference resolution. Participants clicked on the picture that they thought he described. Item accuracy scores were analyzed to determine whether participants appropriately integrated speech and gesture.

Results: Data from Experiment 1 were analyzed using mixed logit models, which predicted the probability that the participant fixated the speech competitor at any point from 200 to 800 ms after the onset of the target word. The model had fixed effects for group (ASD versus control) and condition (speech-only versus speech-and-gesture), and random intercepts and slopes for participants and items. The interaction between group and condition was significant (β=.65, z=2.1, p<.05). For controls, the proportion of trials with a fixation to the speech competitor was significantly reduced in the speech-and-gesture condition (.39) compared to the speech-only condition (.56), suggesting audiovisual integration (β=-.82, z=-3.6, p<.0005). For the ASD group, the effect of condition was nonsignificant (speech-and-gesture: .42 versus speech-only: .40; z<1). For Experiment 2, accuracy scores were analyzed using a mixed logit model, which indicated no group differences.
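The group × condition analysis above can be illustrated with a minimal sketch in Python using statsmodels. This is not the authors' analysis: the data below are synthetic (binomial draws loosely mirroring the reported fixation proportions), and the sketch fits a fixed-effects-only logit, omitting the random intercepts and slopes for participants and items that the original mixed logit models included.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic trial-level data (assumption: proportions taken from the
# abstract; real data would have participant/item structure as well).
rng = np.random.default_rng(0)
n = 400
group = rng.choice(["ASD", "control"], n)
condition = rng.choice(["speech_only", "speech_gesture"], n)

# Probability of fixating the speech competitor, by cell.
p = np.where(
    (group == "control") & (condition == "speech_gesture"), 0.39,
    np.where(
        group == "control", 0.56,
        np.where(condition == "speech_gesture", 0.42, 0.40),
    ),
)
fixated = rng.binomial(1, p)
df = pd.DataFrame({"fixated": fixated, "group": group, "condition": condition})

# Logit model with a group-by-condition interaction; a full mixed logit
# would add random effects for participants and items.
model = smf.logit("fixated ~ group * condition", data=df).fit(disp=0)
print(model.summary())
```

The key quantity is the interaction coefficient: a reliable group × condition interaction indicates that the gesture-driven reduction in speech-competitor fixations differs between the control and ASD groups, which is the pattern the abstract reports.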

Conclusions: These findings suggest that individuals with ASD fail to integrate gesture and speech when integration is beneficial but not necessary for understanding (Experiment 1). However, once speech becomes ambiguous, as with homophones, individuals with ASD can utilize semantic information embodied in gesture to improve their comprehension (Experiment 2).