The Role of Feature Engineering in Developing Clinical Diagnostic Machine Learning Algorithms for ASD Screening

Friday, May 12, 2017: 12:00 PM-1:40 PM
Golden Gate Ballroom (Marriott Marquis Hotel)
J. W. Wade1, A. Vehorn2 and Z. Warren3, (1)Autos Consulting, Murfreesboro, TN, (2)Vanderbilt University Medical Center, Nashville, TN, (3)Vanderbilt University, Nashville, TN
Background:  Supervised machine learning (ML) methods train models that map multi-dimensional numerical data to categorical target variables, given a labeled data set. This type of ML has recently been applied to the problem of predicting ASD diagnosis from gold-standard measures such as the Autism Diagnostic Observation Schedule. However, these assessments are time- and resource-intensive. Demonstrating reasonably accurate predictions from ML methods on a reduced data set would provide immediate value to clinicians and families alike, in the form of patient queue sorting and more rapid feedback.

Objectives:  In this work, we present the design and performance of ML models trained on clinical diagnostic data to predict the presence or absence of ASD, using the labels ASD and Non-ASD, respectively. We first identify a small subset of features to be used in model training, thus reducing the number of items required to make predictions. Second, we use feature engineering methods to craft composite features that are more explanatory than their individual components. Finally, we train decision tree ML models and report their performance across a variety of analyses.
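A decision tree classifies by applying nested threshold tests to feature values. A one-level tree (a decision stump) on a single hypothetical total score illustrates the mechanics; the score name and cutoff below are illustrative only, not the trees, features, or cutoffs learned in the study:

```python
# A decision tree reduces to nested threshold tests on feature values.
# This one-level "stump" on a single hypothetical total score shows the
# idea; the study's actual learned trees and cutoffs are not reported here.

def stump_predict(total_score, cutoff=8):
    """Classify as ASD when the (hypothetical) total score meets the cutoff."""
    return "ASD" if total_score >= cutoff else "Non-ASD"

print(stump_predict(12))  # ASD
print(stump_predict(3))   # Non-ASD
```

A trained tree stacks many such tests, which is what makes small, interpretable feature sets attractive for this kind of model.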

Methods:  A database of best-estimate clinical diagnoses for toddlers assessed with the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) was used for model training. This data set contained N=737 samples (514 ASD and 223 Non-ASD) and 41 features. Dozens of engineered features were created based on exploratory analysis of the data. The most discriminating of these was each sample's distance to the Non-ASD median centroid, which was composed of the medians of ADOS-2 codes A2, B1, and B5 for the Non-ASD subset.
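The centroid-distance feature can be sketched as follows. The code values are invented for illustration, and the choice of Euclidean distance is an assumption; the abstract names the component codes (A2, B1, B5) but not the distance metric:

```python
import math
from statistics import median

# Sketch of the centroid-distance feature described above. The sample
# values and the Euclidean metric are assumptions for illustration.

def median_centroid(points):
    """Per-dimension median of a list of equal-length vectors."""
    return [median(dim) for dim in zip(*points)]

def centroid_distance(sample, centroid):
    """Euclidean distance from one sample to the centroid."""
    return math.dist(sample, centroid)

# Hypothetical [A2, B1, B5] code values for the Non-ASD subset:
non_asd = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
centroid = median_centroid(non_asd)            # -> [0, 0, 0]
print(centroid_distance([2, 3, 2], centroid))  # sqrt(17) ≈ 4.12
```

The appeal of such a composite is that a single scalar summarizes a sample's departure from typical Non-ASD code values, letting one tree split stand in for several.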

Results:  The metric unweighted average recall (UAR) has been suggested as a more robust measure of a model’s reliability than overall accuracy, particularly in the presence of unbalanced data. To compare the performance of our reduced feature set against the gold-standard measures, we trained a model on the full set of ADOS-2 codes (accuracy=90.5%, UAR=0.89) and another solely on the ADOS-2 total (accuracy=93.1%, UAR=0.92). Our best-performing model, using only 7 features (the three components of the Non-ASD median centroid and ADOS-2 codes A3, D1, D2, and D5), yielded comparable results (accuracy=92.4%, UAR=0.90), and an even simpler model using just 3 features (those composing the Non-ASD median centroid distance) performed respectably with just 7.3% of the available feature set (accuracy=91.3%, UAR=0.88).
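UAR is the mean of per-class recalls, weighting ASD and Non-ASD equally regardless of class size. A minimal sketch with invented labels (not the study's data) shows why it is more robust than accuracy on unbalanced data:

```python
# Unweighted average recall (UAR): the mean of per-class recalls,
# treating each class equally regardless of how many samples it has.
# The labels below are illustrative, not taken from the study's data.

def unweighted_average_recall(y_true, y_pred, classes=("ASD", "Non-ASD")):
    recalls = []
    for c in classes:
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        if preds_for_c:  # skip classes absent from y_true
            recalls.append(sum(p == c for p in preds_for_c) / len(preds_for_c))
    return sum(recalls) / len(recalls)

# Toy unbalanced example: a model that always predicts ASD scores
# 80% accuracy (8 of 10 correct) but only 0.5 UAR, exposing that it
# never identifies the minority Non-ASD class.
y_true = ["ASD"] * 8 + ["Non-ASD"] * 2
y_pred = ["ASD"] * 10
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

With 514 ASD versus 223 Non-ASD samples, this distinction matters: a degenerate majority-class model would already approach 70% accuracy while its UAR stayed at 0.5.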

Conclusions:  Using the principles of feature engineering, we demonstrated that certain composite features (e.g., distance to the Non-ASD median centroid) outperformed their component features in predicting best-estimate clinical diagnosis. Moreover, we showed that a small subset of ADOS-2 codes had predictive power comparable to the full set of codes, as well as to the ADOS-2 total score. These findings support the claim that a simplified clinical diagnostic tool could be used to facilitate faster turnaround times for clinicians, patients, and their families.