Objectives: We aim to predict Social Responsiveness Scale (SRS) levels with a new bioinformatics platform for quantitative phenotype prediction. Although coverage of the SRS phenotypes in the AGRE cohort is currently limited, they form a powerful set of indicators that can be used to determine individual trajectories, the ultimate goal of our analysis. Here we set up a bioinformatics experiment in which all unfiltered variant positions in the genome are used as potential markers and training is based on extreme-value cases.
Methods: From the 2,883 AGRE samples genotyped by the Broad Institute on the Affymetrix 5.0 platform (399,197 SNPs), we first identified 803 individuals with only an ADI- or ADOS-confirmed autism diagnosis and 1,446 healthy controls not tested for ADI. Individuals with a teacher-administered SRS questionnaire were then selected, leaving 144 cases and 19 controls. We retained the 17 highest and 18 lowest SRS total scores (only cases and only controls, respectively). A linear regression model with l1l2 regularization was trained on all features, using the SRS total score as the target. The experimental protocol followed the FDA’s MAQC-II 10x5 procedure (5-fold cross-validation repeated 10 times). For the l1l2 parameter set with the best average R^2 score on the CV test portions, we evaluated the area under the curve (AUC) for classification from the real-valued predictions (Wilcoxon-Mann-Whitney) and ranked the weights of the selected SNPs.
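The resampling and scoring steps of this protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual code: the function names `repeated_kfold` and `auc_mann_whitney` are hypothetical, the toy scores are invented for the example, and the l1l2 model fit itself is not shown. The AUC is obtained from the Mann-Whitney U statistic as U / (n_cases * n_controls), with ties counted as one half.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated n_repeats times.

    Each repeat reshuffles the samples before partitioning them into folds.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Fold k takes every n_splits-th shuffled index starting at offset k.
        folds = [idx[k::n_splits] for k in range(n_splits)]
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test


def auc_mann_whitney(scores, labels):
    """AUC from real-valued predictions via the Mann-Whitney U statistic.

    labels: 1 for cases, 0 for controls. Ties contribute 0.5 to U.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return u / (len(pos) * len(neg))


# 17 high-SRS cases + 18 low-SRS controls = 35 training samples.
splits = list(repeated_kfold(35, n_splits=5, n_repeats=10))

# Toy predictions (invented): three cases followed by three controls.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_mann_whitney(toy_scores, toy_labels)
```

In a full run, the regression model would be fitted on each `train` split, its predictions on the matching `test` split would be scored with R^2, and the AUC would be computed from the same real-valued predictions.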
Results: AUC was 0.723 (95% CI: 0.684-0.768), with a fit of R^2 = 0.237 (95% CI: 0.155-0.331). The same 51,744 SNPs were consistently selected in all experiments. Ranked by regression weights, the top 30 markers all had an average rank position higher than 150. Of these, 24 belong to only five regions: 3p12.2 (3), 8q21.11 (10), 11p12 (3), 11p14.1 (5), Xp11.4 (3). Nearby loci on chromosomes 8 and 11 had previously been identified for SRS by Duvall et al. (2007). Within the top 500 SNPs, we also found 12 SNPs at loci 5p14.1 (2), 14q21.1 (4) and Xq21.1 (6) indicated as candidate markers for autism (Wang et al., 2009).
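One way to summarize marker stability across cross-validation runs is to average each SNP's rank position over the runs in which it was selected. The sketch below is illustrative only: `average_rank_positions` is a hypothetical helper, the SNP identifiers and weights are invented, and ranking by absolute weight is an assumption (the ranking criterion is stated here only as "regression weights").

```python
def average_rank_positions(weight_runs):
    """Mean rank per SNP across runs, for SNPs selected in every run.

    weight_runs: list of dicts {snp_id: weight}, one dict per CV run.
    A SNP counts as selected when its weight is nonzero.
    Rank 1 is the largest absolute weight (assumption, see lead-in).
    """
    common = set(weight_runs[0])
    for run in weight_runs:
        common &= {snp for snp, w in run.items() if w != 0.0}
    totals = {snp: 0 for snp in common}
    for run in weight_runs:
        selected = [snp for snp, w in run.items() if w != 0.0]
        # Sort by decreasing |weight|; break ties by identifier for determinism.
        ordered = sorted(selected, key=lambda snp: (-abs(run[snp]), snp))
        for position, snp in enumerate(ordered, start=1):
            if snp in common:
                totals[snp] += position
    return {snp: totals[snp] / len(weight_runs) for snp in common}


# Invented weights from two hypothetical runs.
runs = [
    {"rsA": 0.9, "rsB": -0.5, "rsC": 0.1},
    {"rsA": 0.4, "rsB": -0.8, "rsC": 0.0, "rsD": 0.6},
]
avg_rank = average_rank_positions(runs)
```

Only `rsA` and `rsB` are selected in both toy runs, so only they receive an average rank; markers dropped in any run are excluded from the stable set.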
Conclusions: This study represents the first application of a regression method to an autism-related quantitative phenotype. When trained on extreme values of the SRS score, the l1l2 method discriminated cases from controls fairly well and explained 23.7% of the variance. Moreover, the selected markers were stable and consistent with the literature. Top-ranked markers are being investigated.