International Meeting for Autism Research

Face Benefit in Auditory-Only Speech and Speaker Recognition in Asperger Syndrome and High-Functioning Autism

Saturday, May 14, 2011
Elizabeth Ballroom E-F and Lirenta Foyer Level 2 (Manchester Grand Hyatt)
10:00 AM
S. Schelinski, P. Riedel and K. von Kriegstein, Max Planck Research Group Neural Mechanisms of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Background: Successful human social interaction is based on the fast and accurate online perception of communication signals. It is traditionally assumed that in auditory-only conditions, e.g., when talking on the phone, this online perception relies solely on the auditory sensory system (auditory-only model) (e.g., Hickok & Poeppel, 2007). In contrast, recent research suggests that the brain exploits previously encoded audio-visual correlations to improve behavioural performance in auditory-only perceptual tasks (auditory-visual model) (von Kriegstein et al., 2008). For example, watching a specific person talk for only about two minutes improves subsequent auditory-only speaker and speech recognition for this person. This effect is called the face benefit. The improvement is based on face-specific visual areas, which are instrumental for auditory recognition even when no visual input is available. These findings challenge auditory-only models because they imply that ecologically valid auditory-only input is processed with audiovisual strategies. Here we test the predictions of the auditory-visual model in a group of individuals with Asperger syndrome or high-functioning autism (AS/HFA). These conditions are associated with impaired face processing. The auditory-visual model would therefore predict that in AS/HFA speaker-specific facial information is not available to improve auditory-only recognition.

Objectives: The aim of the present study is to investigate whether individuals with AS/HFA use speaker-specific audiovisual information to improve auditory-only speech and speaker recognition.

Methods: We trained individuals with AS/HFA (n=14) and typically developing controls (n=14; matched for age, gender, and IQ) to identify six speakers by name and voice. Three of the speakers were learned from a video showing their talking face (voice-face learning); the other three were learned together with an occupation symbol (voice-occupation learning). During auditory-only testing, sentences spoken by the same six speakers were presented. Participants decided whether a visually presented name matched the voice (speaker task) or whether a visually presented word appeared within the sentence (speech task). Additionally, a lip-reading and a face-recognition experiment were performed.
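
As a rough illustration of this design only, the sketch below scores auditory-only test trials separately by task and learning condition. All names, labels, and data structures are our own assumptions for the example and are not taken from the study's materials.

```python
# Illustrative sketch of the 2 (task) x 2 (learning condition) design:
# six speakers, half learned with a talking-face video, half with an
# occupation symbol, followed by auditory-only speaker and speech tasks.
from dataclasses import dataclass
from collections import defaultdict

LEARNING = {  # assumed assignment of speakers to learning conditions
    "spk1": "voice-face", "spk2": "voice-face", "spk3": "voice-face",
    "spk4": "voice-occupation", "spk5": "voice-occupation", "spk6": "voice-occupation",
}

@dataclass
class Trial:
    speaker: str   # which of the six trained speakers produced the sentence
    task: str      # "speaker" (name-voice match) or "speech" (word-in-sentence)
    correct: bool  # participant's response scored against the ground truth

def accuracy_by_condition(trials):
    """Per-participant accuracy split by task and learning condition."""
    hits, counts = defaultdict(int), defaultdict(int)
    for t in trials:
        key = (t.task, LEARNING[t.speaker])
        counts[key] += 1
        hits[key] += t.correct
    return {key: hits[key] / counts[key] for key in counts}

# Example with two made-up trials for one participant
trials = [Trial("spk1", "speaker", True), Trial("spk5", "speech", False)]
print(accuracy_by_condition(trials))
```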

Results: In contrast to the control group, the AS/HFA group did not benefit from voice-face learning relative to voice-occupation learning (learning x group interaction, F(1,26) = 10.41, p = .003; face benefit in controls: speaker task, t(13) = 2.96, p = .011; speech task, t(13) = 2.47, p = .028). In the speech task, individuals with AS/HFA performed worse after voice-face learning than after voice-occupation learning (t(13) = -2.88, p = .013). This was paralleled by poorer lip-reading performance in the AS/HFA group compared to controls (t(26) = 2.38, p = .025). Face identity recognition was within the normal range.
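
A minimal sketch of how comparisons of this kind can be computed, assuming one accuracy score per participant and learning condition, is given below (SciPy; all variable names and the simulated inputs are illustrative assumptions, not the study's data). With two levels per factor, testing the group difference in per-participant face-benefit scores is equivalent to the learning x group interaction of a 2 x 2 mixed ANOVA.

```python
import numpy as np
from scipy import stats

def face_benefit(acc_voice_face, acc_voice_occupation):
    """Face benefit = accuracy after voice-face minus after voice-occupation learning."""
    return np.asarray(acc_voice_face) - np.asarray(acc_voice_occupation)

def compare_groups(controls_face, controls_occ, as_hfa_face, as_hfa_occ):
    # Within-group face benefit: paired t-test across the two learning conditions
    t_ctrl, p_ctrl = stats.ttest_rel(controls_face, controls_occ)
    t_as, p_as = stats.ttest_rel(as_hfa_face, as_hfa_occ)

    # With two levels per factor, the learning x group interaction of a 2 x 2
    # mixed ANOVA is equivalent to an independent-samples t-test on the
    # per-participant benefit scores (F = t squared).
    t_int, p_int = stats.ttest_ind(face_benefit(controls_face, controls_occ),
                                   face_benefit(as_hfa_face, as_hfa_occ))
    return {"face benefit controls": (t_ctrl, p_ctrl),
            "face benefit AS/HFA": (t_as, p_as),
            "learning x group interaction": (t_int, p_int)}

# Usage with simulated accuracies (14 per group), for illustration only
rng = np.random.default_rng(0)
ctrl_occ = rng.uniform(0.6, 0.8, 14)
ctrl_face = ctrl_occ + 0.05 + rng.normal(0, 0.03, 14)  # simulated benefit
as_occ = rng.uniform(0.6, 0.8, 14)
as_face = as_occ + rng.normal(0, 0.03, 14)             # simulated: no benefit
print(compare_groups(ctrl_face, ctrl_occ, as_face, as_occ))
```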

Conclusions: The findings indicate that in AS/HFA, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition, as predicted by the auditory-visual model. Because facial speech processing is a key requirement for robust human speech processing, less successful communication in AS/HFA might be linked to deficiencies in the facial speech network.

References: Hickok & Poeppel (2007), Nature Reviews Neuroscience, 8; von Kriegstein et al. (2008), PNAS, 105.
