High Diagnostic Prediction Accuracy for ASD Using Functional Connectivity MRI Data and Random Forest Machine Learning

Thursday, May 15, 2014: 11:30 AM
Imperial A (Marriott Marquis Atlanta)
C. P. Chen(1,2), B. A. Bailey(3), C. L. Keown(1,2,4) and R. A. Müller(2); (1) Computational Science Research Center, San Diego State University, San Diego, CA, (2) Brain Development Imaging Laboratory, Dept. of Psychology, San Diego State University, San Diego, CA, (3) Department of Mathematics and Statistics, San Diego State University, San Diego, CA, (4) Dept. of Cognitive Science, University of California San Diego, La Jolla, CA
Background: Although considered a neurological disorder, ASD is diagnosed based on behavioral criteria because unique brain biomarkers are not known. Mounting evidence from functional connectivity MRI (fcMRI) suggests aberrant connectivity in ASD involving multiple brain networks. Machine learning provides promising tools for diagnostic classification (ASD vs. typically developing [TD]) and for the identification of complex biomarkers. 

Objectives: To predict diagnostic status from large-sample resting-state fcMRI data, using random forest (RF) feature selection to identify functional connectivity biomarkers of ASD.

Methods: We used resting-state fMRI data from 252 low-motion participants (126 ASD, 126 TD) from the Autism Brain Imaging Data Exchange (ABIDE), matched on age, nonverbal IQ, and head motion. fMRI data were corrected for motion and field-map distortion, aligned to high-resolution anatomical images, standardized to the MNI152 template, and smoothed to a global full-width-at-half-maximum of 6 mm. Six rigid-body motion parameters and signals from white matter and ventricles were modeled as nuisance regressors. Time points with motion >0.25 mm (and their neighbors) were censored. We chose 220 regions of interest (ROIs) from Power et al. (2011), using 10 mm spheres to extract an average signal from each. For each ROI pair, a feature was defined as the functional connectivity (signal correlation), yielding 24,090 features in total. Using the randomForest package in R for feature selection, we used the top-ranked features for binary classification. Random forest is a multivariate machine learning method that builds an ensemble of classification and regression trees (CART) and aggregates their results, capitalizing on linear and nonlinear interactions of brain networks and connectivity. RF limits overfitting and contains an internal validation step: each tree is built from a bootstrap sample, and the ‘out of bag’ (OOB) sample is then used to determine error and variable importance (analogous to leave-one-out and other cross-validation procedures).
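The feature construction described above can be illustrated with a minimal sketch. It is written in plain Python rather than the R tooling named in the abstract, and it is not the authors' pipeline: the function names (`pearson`, `connectivity_features`) and the toy time series are hypothetical, chosen only to show how one correlation per unordered ROI pair gives 24,090 features for 220 ROIs.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length signal sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def connectivity_features(roi_signals):
    """Return one correlation per unordered ROI pair (i < j)."""
    return {
        (i, j): pearson(roi_signals[i], roi_signals[j])
        for i, j in combinations(range(len(roi_signals)), 2)
    }


# Number of unique pairs among 220 ROIs: 220 * 219 / 2.
n_features = math.comb(220, 2)

# Toy example with three hypothetical ROI time series (not real data).
toy_signals = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # rises with ROI 0
    [4.0, 3.0, 2.0, 1.0],  # falls as ROI 0 rises
]
toy_features = connectivity_features(toy_signals)
```

In a real analysis each participant would contribute one such feature vector, computed from censored and nuisance-regressed ROI signals, and the vectors would then be passed to the classifier.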

Results: Using RF feature reduction, we examined the 100 features with the highest importance scores. OOB errors reached a plateau at ≤15% with >40 features. A maximum classification accuracy of 90.1% (OOB error 9.9%) was achieved using 69 features, with 89.8% sensitivity and 90.4% specificity. Selected connections had an average Euclidean distance of 74.1 mm and were primarily inter-network (85.5%) as opposed to intra-network (14.5%). Of the connections with a between-group difference at p<.05, 80% were underconnected (ASD < TD) and 20% were overconnected. Of the ROIs included in these informative connections, 24% were in the default mode, 13% in somatosensory/motor, and 10% in visual networks.
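As a consistency check on the reported figures, the sketch below shows how sensitivity, specificity, and accuracy relate. It assumes ASD is the positive class and relies on the fact that, with equal group sizes (126 ASD, 126 TD), accuracy equals the mean of sensitivity and specificity. The abstract does not state how the metrics were computed, so this is an illustration only; the function names and the confusion-matrix counts in the toy example are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of positive cases detected
    specificity = tn / (tn + fp)  # fraction of negative cases detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def accuracy_equal_groups(sensitivity, specificity):
    """Accuracy when both classes have the same number of participants."""
    return (sensitivity + specificity) / 2


# Reported values: 89.8% sensitivity and 90.4% specificity.
implied_accuracy = accuracy_equal_groups(0.898, 0.904)
implied_oob_error = 1 - implied_accuracy

# Toy counts (hypothetical, not from the study): 10 cases per group.
toy_sens, toy_spec, toy_acc = classification_metrics(tp=9, fn=1, tn=8, fp=2)
```

Under these assumptions the implied accuracy is about 0.901 and the implied error about 0.099, in line with the 90.1% accuracy and 9.9% OOB error reported above.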

Conclusions: A 90% classification accuracy suggests that resting-state functional connectivity may be a promising source of ASD biomarkers. The most informative features were broadly consistent with the literature, primarily implicating long-distance connectivity, in particular of the default mode network but also of sensorimotor networks. Informative connections were predominantly (but not exclusively) characterized by underconnectivity in ASD.