International Meeting for Autism Research

A Study on Audio Patterns of Natural Environment for Children with Autism

Friday, May 13, 2011
Elizabeth Ballroom E-F and Lirenta Foyer Level 2 (Manchester Grand Hyatt)
2:00 PM
D. Xu1, J. Gilkerson2 and J. A. Richards3, (1)LENA Foundation, Boulder, CO, (2)LENA Foundation, Boulder, CO, (3)Research, LENA Foundation, Boulder, CO
Background: Previous work has shown the effectiveness of daylong audio recordings for child vocalization analysis and environment monitoring. In this scheme, a child wears a digital recorder that collects his or her own sounds and other sounds in the natural environment. Speech processing and pattern recognition technologies automatically detect different sound segments in a recording, including key-child, adult, noise, overlapped sound and others, producing a sequence of segment labels. This framework provides rich information about the child, the environment, and how the two interact, and it can be applied to childhood autism risk estimation, home activity monitoring and treatment monitoring. A previous study (presented at IMFAR 2009) showed that child vocalization features discriminate well among children with autism, children with language delay and typically developing children.
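As an illustration of the kind of segment-label output this framework produces, the sketch below (in Python) represents one stretch of a daylong recording as a time-ordered list of labeled segments; the label names, fields and values are hypothetical and are not the LENA system's actual output format.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One automatically detected sound segment in a daylong recording."""
    label: str      # e.g. "key_child", "adult_female", "noise", "overlap"
    start_s: float  # segment start time, in seconds from the start of the recording
    end_s: float    # segment end time, in seconds

# Hypothetical label sequence for a short stretch of one recording
recording = [
    Segment("key_child", 12.0, 13.4),
    Segment("adult_female", 13.4, 15.1),
    Segment("noise", 15.1, 15.9),
    Segment("key_child", 15.9, 16.8),
]
```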

Objectives: In addition to child vocalization, this study investigates other features for autism risk estimation by examining audio-based environmental and interactive information. It was found that parents and children with autism tend to have more “near-distance talk” than “far-distance talk”, as the impairments in social interaction and communication may make it even more difficult for them to interact from a distance. Since the loudness of a sound is correlated with distance, the dB-level statistics of environmental sounds in daylong recordings are investigated as one characterization of environmental and interactive information.
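As a rough illustration of how dB-level statistics for a single segment might be computed, the sketch below takes the frame-wise RMS of the segment's samples and converts it to dB; the frame length and the dB reference are assumptions, since the abstract does not specify the calibration used in the study.

```python
import numpy as np

def segment_db_stats(samples, frame_len=1024):
    """Average and peak dB level of one audio segment (frame-wise RMS in dB).

    `samples` is a 1-D array of audio samples in [-1, 1]; the 0 dB reference
    (full scale) and the frame length are illustrative assumptions.
    """
    n_frames = max(1, len(samples) // frame_len)
    frames = samples[:n_frames * frame_len].reshape(n_frames, -1)
    rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)  # small floor avoids log(0) on silence
    db = 20.0 * np.log10(rms)
    return db.mean(), db.max()
```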

Methods: Segments of adult male, adult female, noise and overlapped sound are considered interactive when they are adjacent to a key-child segment. The average dB level and the peak dB level of these segments are calculated. Since segments can be further categorized as “clear” or “faint”, there are eventually 24 parameters for each recording. These parameters are analyzed with linear discriminant analysis (LDA) and Ada-Boosting analysis (an approximation to logistic regression), and are further combined with the child vocalization parameters to examine their joint effect on autism discrimination and risk estimation.
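The Python sketch below shows one way such per-recording dB features could be assembled from the segment-label sequence; the dict keys, the adjacency rule and the exact layout of the 24-parameter set are assumptions for illustration only.

```python
from itertools import product
import numpy as np

INTERACTIVE = ("adult_male", "adult_female", "noise", "overlap")

def recording_features(segments):
    """Per-recording dB feature vector (illustrative layout, not the study's exact 24 parameters).

    `segments` is the time-ordered label sequence; each entry is a dict with
    'label', 'clarity' ("clear" or "faint"), 'avg_db' and 'peak_db'.  A segment
    is treated as interactive when it is adjacent to a key-child segment.
    """
    buckets = {}
    for i, seg in enumerate(segments):
        neighbours = segments[max(0, i - 1):i] + segments[i + 1:i + 2]
        adjacent_to_child = any(n["label"] == "key_child" for n in neighbours)
        if seg["label"] in INTERACTIVE and adjacent_to_child:
            key = (seg["label"], seg["clarity"])
            buckets.setdefault(key, []).append((seg["avg_db"], seg["peak_db"]))
    vec = []
    for label, clarity in product(INTERACTIVE, ("clear", "faint")):
        vals = np.array(buckets.get((label, clarity), [(0.0, 0.0)]))
        vec.extend([vals[:, 0].mean(), vals[:, 1].mean()])  # mean of average dB, mean of peak dB
    return np.array(vec)
```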

Results: The data set contains 802 recordings from 106 typically developing children, 333 recordings from 49 children with language delay, and 228 recordings from 71 children with autism. No recording is shorter than 9 hours, and none contains therapy sessions. Equal sensitivity and specificity (ESS) is used as the performance measure. With the 24 dB-level parameters, LDA gives 79.6% ESS for recordings and 85.8% ESS for children, while Ada-Boosting gives 80.3% and 83.2%. With the 113 child vocalization parameters reported previously, LDA gives 86.4% ESS for recordings and 90.1% for children, while Ada-Boosting gives 88.2% and 89.1%. With the combined parameters, LDA gives 88.9% ESS for recordings and 91.6% for children, while Ada-Boosting gives 90.6% and 93.0%. All results are obtained with leave-one-out cross-validation.
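A minimal sketch of the evaluation procedure, assuming binary labels (autism vs. non-autism), per-recording feature vectors, and scikit-learn's LDA and AdaBoost implementations as stand-ins for the classifiers named above; the threshold sweep is one common way to locate the point where sensitivity and specificity are approximately equal.

```python
import numpy as np
from sklearn.base import clone
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import LeaveOneOut

def loocv_scores(X, y, model):
    """Leave-one-out cross-validation: score each recording with a model trained on the rest."""
    scores = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        scores[test_idx] = fitted.predict_proba(X[test_idx])[:, 1]
    return scores

def equal_sensitivity_specificity(y, scores):
    """Approximate ESS: sweep thresholds and keep the point where sensitivity is closest to specificity."""
    best_gap, ess = np.inf, 0.0
    for thr in np.unique(scores):
        pred = scores >= thr
        sens = pred[y == 1].mean()      # true positive rate
        spec = (~pred)[y == 0].mean()   # true negative rate
        if abs(sens - spec) < best_gap:
            best_gap, ess = abs(sens - spec), (sens + spec) / 2.0
    return ess

# Usage (X: recordings-by-features matrix, y: 0/1 numpy array of labels); shapes are hypothetical
# ess_lda = equal_sensitivity_specificity(y, loocv_scores(X, y, LinearDiscriminantAnalysis()))
# ess_ada = equal_sensitivity_specificity(y, loocv_scores(X, y, AdaBoostClassifier()))
```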

Conclusions: This study shows that daylong audio recordings contain rich information for childhood autism risk estimation. The dB-level statistics of sounds surrounding child vocalizations characterize certain aspects of the interaction between the child and the environment, and they significantly improve autism risk estimation performance, achieving 93% ESS when combined with child vocalization features. More detailed investigation is warranted in the future.
