Application of Semi-Automated Video Scene Coding for Reducing Manual ROI Marking in Eye-Tracking-Based ASD Studies
Objectives: To develop a semi-automated approach capable of identifying and marking any type of region of interest (ROI) in any video sequence automatically and with minimal human operator involvement.
Methods: The proposed mechanism for ROI tracking utilizes Speeded-Up Robust Features (SURF), a scale- and rotation-invariant object detector/descriptor. The procedure has two passes: 1) the algorithm detects each ROI's new location/orientation in every frame and defines new ROIs for objects not yet captured by existing ROIs; 2) the operator reviews the results and re-evaluates weakly detected ROIs using new manual markups. Nine video files, used in an activity monitoring eye-tracking paradigm in toddlers with ASD (Shic et al., 2013) and containing 9 to 16 ROIs each, were considered. These videos showed two people (P1 & P2), surrounded by multiple objects, conducting a controlled activity. The ROIs included 1) dynamic regions: head (H), body (B), and upper/lower face (UF & LF), and 2) static regions: images on the wall (I) and toys on the ground (T).
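The core of pass 1 can be illustrated with a minimal sketch: given feature points matched (e.g., by SURF descriptors) between the base frame and a later frame, estimate a 2-D similarity transform (scale, rotation, translation) by least squares and carry the ROI polygon into the new frame. The matched coordinates and ROI below are hypothetical stand-ins, not the study's data:

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares 2-D similarity transform mapping src points to dst points.
    Both inputs are (N, 2) arrays of matched feature coordinates."""
    # Model: dst = M @ src + t, with M = [[a, -b], [b, a]] = scale * rotation.
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    A[0::2, 0] = src[:, 0]; A[0::2, 1] = -src[:, 1]; A[0::2, 2] = 1  # x rows
    A[1::2, 0] = src[:, 1]; A[1::2, 1] =  src[:, 0]; A[1::2, 3] = 1  # y rows
    p, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    a, b, tx, ty = p
    return np.array([[a, -b], [b, a]]), np.array([tx, ty])

def map_roi(polygon, M, t):
    """Carry an ROI polygon (rows are vertices) from the base frame forward."""
    return polygon @ M.T + t

# Hypothetical matched feature coordinates (base frame -> current frame):
src = np.array([[0., 0.], [10., 0.], [10., 10.], [0., 10.]])
theta = np.deg2rad(30.0)
R = 1.5 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])   # scale 1.5, rotate 30 deg
dst = src @ R.T + np.array([5.0, -2.0])

M, t = estimate_similarity(src, dst)
roi = np.array([[2., 2.], [8., 2.], [8., 8.], [2., 8.]])
new_roi = map_roi(roi, M, t)  # ROI relocated/reoriented in the current frame
```

In practice the feature matching itself would be done with a SURF implementation (e.g., OpenCV's), with outlier rejection before fitting; this sketch covers only the transform step.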
Results:
The ROIs with potentially faulty detections, and the percentage of frames in which those faulty detections occurred, are presented in the following table. Note that these results represent only the first pass of the algorithm; additional runs with a corrected base ROI can re-detect/re-track the ROIs in question for frames with potentially faulty detections. Moreover, although the ROIs below were automatically flagged as less successfully detected, closer investigation indicated that a fair percentage of the potentially faulty detections are likely acceptable without any need for re-tracking.
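A per-ROI report of this kind can be sketched as follows: frames whose match quality falls below a threshold are flagged as potentially faulty and routed to the operator for review. The scores and the threshold here are hypothetical, not values from the study:

```python
def flag_faulty(scores, threshold=0.5):
    """Given per-frame match-quality scores for each ROI (e.g., the fraction
    of base-frame features re-matched), return the percentage of frames
    flagged as potentially faulty per ROI."""
    report = {}
    for roi, per_frame in scores.items():
        flagged = [i for i, s in enumerate(per_frame) if s < threshold]
        report[roi] = 100.0 * len(flagged) / len(per_frame)
    return report

# Hypothetical scores for an upper-face ROI and a wall-image ROI:
scores = {"UF": [0.9, 0.4, 0.3, 0.8], "I": [0.95, 0.9, 0.88, 0.97]}
flag_faulty(scores)  # {'UF': 50.0, 'I': 0.0}
```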
It is noteworthy that the majority of inaccurately tracked/detected ROIs are located within facial regions. This is likely due to 1) the smaller size of these regions, which makes finding robust representative features more difficult, and 2) differences between the viewing angle and orientation of the head in the first frame (the basis of ROI detection) and in later frames of the video. Data from thirty-six 4- to 8-year-olds, ASD n = 14 (11 males, MDQ = 88.36, SD = 19.84) and non-ASD n = 22 (13 males, MDQ = 108.91, SD = 12.38), show similarities between eye-tracking outcome variables computed through ROIs developed with this semi-automated process and those computed with standard ROI specification techniques.
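One such outcome variable, the proportion of gaze samples falling inside an ROI, can be computed the same way for semi-automated and manual ROI polygons and then compared. The gaze samples and polygons below are hypothetical illustrations only:

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: is point pt inside the polygon given as a vertex list?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray at y
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def fixation_proportion(gaze, poly):
    """Fraction of gaze samples falling inside the ROI polygon."""
    hits = sum(point_in_polygon(p, poly) for p in gaze)
    return hits / len(gaze)

# Hypothetical gaze samples and two versions of the same head ROI:
gaze = [(5, 5), (12, 5), (5, 12), (20, 20)]
manual_roi = [(0, 0), (10, 0), (10, 10), (0, 10)]
auto_roi   = [(0, 0), (13, 0), (13, 13), (0, 13)]
fixation_proportion(gaze, manual_roi)  # 0.25
fixation_proportion(gaze, auto_roi)    # 0.75
```

Comparing such proportions across ROI versions (e.g., by correlation across participants) is one way to quantify how closely the semi-automated ROIs reproduce manually coded results.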
Conclusions: We have designed a computational method for ROI tracking that dramatically lowers the time burden of defining ROIs for complex, dynamic videos. Although this approach does not yield ROIs identical to those from manual coding, it nonetheless shows similar performance on certain outcome measures. Further work will aim to expand the scope and flexibility of our region tracker and to explore the use of this technique on fully unconstrained video inputs, such as those acquired from head-mounted eye trackers during natural interactions.