19545
Toward Improved Clinically Useful Automated Vocal Assessments for the Prediction of ASD

Thursday, May 14, 2015: 12:00 PM
Grand Ballroom A (Grand America Hotel)
D. K. Oller (1), P. J. Yoder (2), D. Xu (3), J. A. Richards (4), J. Gilkerson (3) and S. S. Gray (5); (1) Konrad Lorenz Institute for Evolution and Cognition Research, Klosterneuburg, Austria; (2) Special Education, Vanderbilt University, Nashville, TN; (3) Department of Speech, Language and Hearing Sciences, University of Colorado, Boulder, CO; (4) LENA Research Foundation, Boulder, CO; (5) Mobility Core Research, Nuance Communications, Dracut, MA
Background:

Oller et al. (2010), Xu et al. (2014), and Richards et al. (2008) have provided three alternatives for automated analysis of child vocalizations measured in day-long home recordings. The three methods differ in theoretical orientation, method of derivation, and number of vocal characteristics considered, but all three used the massive quantities of data available from the LENA Research Foundation and have been shown to predict age and to differentiate groups that differ in presence and type of disability (including ASD). The clinical utility of the measures would be enhanced by testing which of the three is most strongly associated with variables more directly relevant to language level.

Objectives:

We test which of the three methods of automated vocal analysis is most strongly associated with communication and language in three groups of preschoolers who differ in the presence and type of disability.

Methods:

The dataset is based on multiple day-long recordings from 106 typically developing (TD) children, 77 children diagnosed with ASD, and 49 children diagnosed with language delay (LD) but not ASD, a total of 1,486 day-long recordings. Children’s ages ranged from 20 to 48 months. Communication and language measures were collected via parent report at approximately the same time as the day-long recordings. The communication measure, the LENA Developmental Snapshot (LDS) developmental age, correlates highly with age and with other measures of early cognitive, social, and language development (Gilkerson et al., 2008). Ninety-five percent of parents also completed the Child Development Inventory (CDI; Ireton & Glascoe, 1995); we used the developmental age scores from its expressive language (CDI-Exp) and language comprehension (CDI-Comp) subscales as our language measures. The three models of vocal analysis are: (a) a linear combination of 12 vocal characteristics (here called parameters), mostly based on infraphonological theory (12P; Oller et al., 2010); (b) a linear combination of 4 variables derived from the Sphinx automated speech recognition system (SASRS; Xu et al., 2014); and (c) a single variable derived from SASRS with mathematical adjustments based on the relation of the earlier scores to age and language measures (the AVA developmental age score; Richards et al., 2008).
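To make the model comparison concrete, here is a minimal sketch of how three such models can be fit and compared on adjusted R². The data, sample size, and variable names below are synthetic placeholders for illustration only, not the authors' actual pipeline: each model is fit as an ordinary least squares linear combination of its predictors against a language outcome.

```python
# Illustrative sketch only: all data here are synthetic and all names are
# hypothetical; this is not the study's actual analysis code.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200                                   # placeholder sample size

# Hypothetical predictor sets for the three models:
X_12p   = rng.normal(size=(n, 12))        # 12 vocal parameters (12P)
X_sasrs = rng.normal(size=(n, 4))         # 4 ASR-derived variables (SASRS)
x_ava   = rng.normal(size=(n, 1))         # single AVA developmental age score

# Hypothetical outcome, e.g., a CDI expressive-language developmental age.
y = X_12p @ rng.normal(size=12) + rng.normal(scale=2.0, size=n)

def adjusted_r2(y, X):
    """Fit an OLS linear combination of the columns of X; return adj. R^2."""
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    return fit.rsquared_adj

for name, X in [("12P", X_12p), ("SASRS", X_sasrs), ("AVA", x_ava)]:
    print(f"{name}: adjusted R^2 = {adjusted_r2(y, X):.2f}")
```

Comparing the models on adjusted rather than raw R² matters here because the three models use very different numbers of predictors (12 vs. 4 vs. 1).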

Preliminary Results:

All three models are significantly associated with age and with the communication/language variables. The 12P model is more strongly associated with the communication and language variables than with age. In the TD group, the 12P and AVA models are most strongly associated with the CDI language variables, with adjusted R² values ≥ 0.74. In the LD group, 12P and AVA yield lower but still excellent adjusted R² values > 0.50. In the ASD group, the 12P model is most strongly associated with CDI-Exp and CDI-Comp, with adjusted R² = 0.61 and 0.54, respectively.
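For reference, adjusted R² rescales R² by the number of predictors p, so the 12-parameter model is not credited merely for having more terms than the single-variable AVA model. A minimal illustration follows; the numbers plugged in are hypothetical, not values from the study.

```python
# Adjusted R^2 penalizes R^2 for model size: n observations, p predictors.
# The inputs below are illustrative only, not figures from the study.
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(r2=0.78, n=106, p=12))  # ~0.75 for a 12-predictor fit
```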

Conclusions:

Combined models using 12P and SASRS have not yet been thoroughly evaluated, but they may yield a much improved score that can be made publicly available for clinical work. The potential for improved, clinically relevant automated tools of vocal analysis is clear. A new generation of vocal assessment requiring no human intervention seems just around the corner.