Automated Prediction of a Child's Response to Name from Audio and Video

Thursday, May 15, 2014
Atrium Ballroom (Marriott Marquis Atlanta)
J. Bidwell (1), A. Rozga (1), J. C. Kim (2), H. Rao (2), M. A. Clements (2), I. Essa (1) and G. D. Abowd (1); (1) School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA; (2) School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA
Background: Evidence has shown that a child’s failure to respond to name is an early warning sign for autism, and response to name is measured as part of standard assessments, e.g. the ADOS [1,2].

Objectives: To build a fully automated system for measuring a child’s response to his or her name being called, given audio and video recorded during a social interaction. Our initial goal is to enable this measurement in a naturalistic setting; the long-term goal is to obtain finer-grained behavior measurements, such as the latency between a name call and the child’s response.

Methods: We recorded 40 social interactions between an examiner and children aged 15-24 months. Six of the 40 child participants showed signs of developmental delay based on standardized parent-report measures (M-CHAT, CSBS-ITC, CBCL language development survey). The child sat at a table with a toy to play with. The examiner wore a lapel microphone and called the child’s name up to 3 times while standing to the right of and slightly behind the child. Each interaction was recorded with two cameras, an overhead Kinect color and depth camera and a front-facing color camera, which we used together with the examiner’s audio to predict when the child responded. Response to name was measured by 1) detecting when the examiner called the child’s name and 2) evaluating whether the child turned to make eye contact with the examiner. Examiner name calls were detected using a speech detection algorithm, and the child’s head turns were tracked using the two cameras. These speech and head-turn measurements were used to train a binary classifier that automatically predicts if and when a child responds to his or her name being called. The result is a system that predicts the child’s response to name automatically from recorded audio and video of the session.
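
As an illustration of how the two measurement streams could be combined, the following is a minimal sketch, not the trained binary classifier described above, whose features and parameters are not given here. It substitutes a simple threshold rule: a response is declared when the child’s head yaw toward the examiner exceeds a threshold within a fixed window after a detected name call. The function names, the yaw representation, the 45-degree threshold and the 3-second window are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for the trained classifier: a fixed threshold rule.
# yaw_samples holds (timestamp_s, yaw_deg) pairs sorted by time, where larger
# yaw means the head is turned further toward the examiner (an assumption).


def first_response_time(
    call_time: float,
    yaw_samples: Sequence[Tuple[float, float]],
    yaw_threshold_deg: float = 45.0,  # assumed value, not from the abstract
    window_s: float = 3.0,            # assumed value, not from the abstract
) -> Optional[float]:
    """Return the time of the first head turn after a name call, or None."""
    for t, yaw in yaw_samples:
        in_window = call_time <= t <= call_time + window_s
        if in_window and yaw >= yaw_threshold_deg:
            return t
    return None


def response_latencies(
    call_times: Sequence[float],
    yaw_samples: Sequence[Tuple[float, float]],
) -> List[Optional[float]]:
    """Latency in seconds from each name call to the response (None if absent)."""
    latencies: List[Optional[float]] = []
    for call_time in call_times:
        response = first_response_time(call_time, yaw_samples)
        latencies.append(None if response is None else response - call_time)
    return latencies


if __name__ == "__main__":
    samples = [(0.0, 0.0), (1.0, 10.0), (1.5, 50.0), (2.0, 60.0), (6.0, 5.0), (7.0, 10.0)]
    print(response_latencies([1.0, 6.0], samples))
```

A learned classifier would replace the fixed threshold with a decision function fitted to labeled sessions; the sketch only shows how name-call times and head-turn tracks line up in time.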

Results: The system was evaluated against human coding of the child’s response to name from video. If the automated prediction fell within +/- 1 second of the human-coded response, we recorded a match. Across our 40 sessions there were 56 name calls, 35 responses, and 5 children who did not respond to name. Our software predicted children’s response to name with a precision of 90% and a recall of 85%.
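
The matching rule above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the evaluation code used in the study: the one-to-one greedy matching strategy and the function names are hypothetical, and only the +/- 1 second tolerance comes from the text.

```python
from typing import List, Sequence, Tuple


def match_predictions(
    predicted: Sequence[float],
    coded: Sequence[float],
    tolerance_s: float = 1.0,  # +/- 1 second, as stated in the Results
) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives.

    Each human-coded response time can be matched by at most one prediction
    (an assumed one-to-one rule); the closest unmatched coded time within
    the tolerance is chosen for each prediction.
    """
    unmatched: List[float] = sorted(coded)
    true_pos = 0
    for p in sorted(predicted):
        candidates = [c for c in unmatched if abs(p - c) <= tolerance_s]
        if candidates:
            best = min(candidates, key=lambda c: abs(p - c))
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    return true_pos, false_pos, false_neg


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy timestamps in seconds, not data from the study.
    tp, fp, fn = match_predictions([2.4, 10.0, 30.2], [3.0, 20.0, 30.0])
    print(tp, fp, fn, precision_recall(tp, fp, fn))
```

Precision here is the fraction of automated predictions that matched a human-coded response, and recall is the fraction of human-coded responses that the system found.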

Conclusions: This work presents an automated system for predicting child response to name, with the aim of benefiting existing clinical and research communities. In the future this could be extended to support a wide range of use cases, such as gathering data on children’s response to name in naturalistic settings (e.g. pediatric waiting rooms, day care centers and the home), as well as providing psychologists with a second source of information during clinical assessments.


1. G. Dawson et al. Children with autism fail to orient to naturally occurring social stimuli. J. Autism Dev. Disord. 28 (1998)
2. Nadig et al. A prospective study of response to name in infants at risk for autism. Arch. Pediatr. Adolesc. Med. (2007)