International Meeting for Autism Research: Salient Feature Extraction From Video Stimuli for Diagnostic Gaze Tracking Paradigms

Friday, May 13, 2011
Elizabeth Ballroom E-F and Lirenta Foyer Level 2 (Manchester Grand Hyatt)
1:00 PM
D. Conant, R. Stoner, E. Musker, S. Marinero, E. Borchert and K. Pierce, Neurosciences and UCSD Autism Center of Excellence, University of California, San Diego, La Jolla, CA

Free-standing eye tracking technology is a powerful tool that has shown promise in early identification efforts. For example, a previous study by our center identified a clear subgroup of toddlers with autism, as young as 14 months, who display unusually prolonged visual fixation patterns in response to dynamic geometric images (Pierce et al., 2010). Visual fixation patterns alone were clear enough to discriminate toddlers with an ASD from those who were typically developing with 100% positive predictive value. From this work, however, it was unclear which segments of the dynamic video and/or which salient features were driving the classification. To produce a robust diagnostic tool, it is critical to identify the most salient components. Doing so will facilitate the rapid development of powerful stimuli for early identification and the discovery of other gaze-specific phenotypes.


To develop technology for use with eye tracking that automatically identifies the salient features in dynamic video that best discriminate toddlers with autism from those who are developmentally delayed or typically developing.


To date, ninety toddlers (44 ASD; 41 Typical; 27 LD; 5 DD, mean age = 27 months) ranging in age between 12 and 42 months have participated in a series of eye tracking experiments, each containing dynamic video. Results from the first eye tracking paradigm, which examined visual fixation to an actor engaged in various gestures, served as the test sample for the current experiment.

During the “gesture experiment” each toddler was presented with a 43-second video containing 8 scenes of an actor performing socially engaging actions. These actions vary from a simple 'wave hello' to emotionally salient actions, e.g., crying 'boo hoo', to bids for shared attention. Gaze fixation was monitored continuously with a TOBII T120 eye tracker. Areas of interest (AOI) for body, head, eye, mouth and hand regions were created on a per-scene basis and applied within TOBII and Matlab code.

To extract salient regions from the video stimuli, AOI data were grouped by diagnostic category and normalized by the number of valid samples per group. Difference metrics per scene were calculated and used to select individual frames of greatest deviation by diagnostic group and AOI. Mean gaze location was then determined and used as a secondary metric of comparison.
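The frame-selection step described above can be illustrated with a minimal Python sketch. It is not the authors' TOBII/Matlab code: the sample layout `(group, frame, aoi)`, the function names, and the choice of an absolute difference in per-frame proportions as the "difference metric" are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict

def aoi_proportions(samples, group, aoi):
    """Per-frame share of a group's valid gaze samples that fall in `aoi`.

    `samples` is a list of (group, frame, aoi_label) tuples (hypothetical
    layout); aoi_label is None for an invalid sample such as track loss.
    """
    hits = defaultdict(int)
    valid = defaultdict(int)
    for g, frame, label in samples:
        if g != group or label is None:
            continue  # skip other groups and invalid samples
        valid[frame] += 1
        if label == aoi:
            hits[frame] += 1
    # Normalize by the number of valid samples the group has at each frame.
    return {f: hits[f] / valid[f] for f in valid}

def top_deviating_frames(samples, group_a, group_b, aoi, k=3):
    """Return the k frames where the two groups' AOI proportions differ most."""
    pa = aoi_proportions(samples, group_a, aoi)
    pb = aoi_proportions(samples, group_b, aoi)
    shared = sorted(set(pa) & set(pb))  # frames with valid data in both groups
    diffs = {f: abs(pa[f] - pb[f]) for f in shared}
    return sorted(shared, key=lambda f: diffs[f], reverse=True)[:k]
```

Calling `top_deviating_frames(samples, "ASD", "Typical", "eyes")` would rank frames by how differently the two groups looked at the eye region; the group and AOI labels here are placeholders for whatever coding the data set actually uses.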


Based on preliminary feature extraction algorithms, multiple segments during the gesture experiment strongly discriminated visual fixation patterns between at-risk and control groups (p < 0.005 for mean vertical position). In contrast, when the duration of gaze fixation across primary areas of interest was averaged across an entire scene, as is common in most eye tracking experiments, the group differences disappeared (p > 0.05 for all AOIs compared).
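One way a group difference can be visible at the frame level yet vanish in a whole-scene average is simple cancellation, which the following Python sketch illustrates for mean vertical gaze position. The data layout (one list of per-frame vertical positions per toddler, with `None` for invalid samples) and the function names are assumptions, and the sketch does not reproduce the significance tests reported above.

```python
from statistics import mean

def frame_means(gaze_y):
    """Group mean vertical gaze position at each frame.

    `gaze_y` holds one list per toddler, one value per frame
    (None marks an invalid sample). Hypothetical layout.
    """
    n_frames = len(gaze_y[0])
    out = []
    for f in range(n_frames):
        valid = [row[f] for row in gaze_y if row[f] is not None]
        out.append(mean(valid) if valid else None)
    return out

def frame_and_scene_gap(group_a, group_b):
    """Return (per-frame gaps, scene-averaged gap) between two groups."""
    a, b = frame_means(group_a), frame_means(group_b)
    # Keep only frames where both groups have at least one valid sample.
    gaps = [x - y for x, y in zip(a, b) if x is not None and y is not None]
    return gaps, mean(gaps)
```

With toy values in which one group looks 90 pixels higher than the other on one frame and 90 pixels lower on the next, the per-frame gaps are large while the scene-averaged gap is zero, which is the kind of effect a frame-level analysis is meant to expose.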


Using salient feature extraction, we were able to identify segments of video stimuli that produced deviations in gaze location between at-risk and control toddlers that would not be apparent in broader analyses. The current algorithms are easy to use and could be of considerable benefit in the creation of maximally potent early eye tracking tests for autism.
