Objectives: This study applies multiple clustering methods to 123 ADI-R items, which cover a broad spectrum of behaviors and functions, in a large population in order to identify subgroups of autistic probands with clinically relevant behavioral phenotypes. The aim is to isolate more homogeneous groups of subjects for gene expression analyses.
Methods: ADI-R score sheets were downloaded for 1954 individuals with autism from the Autism Genetic Resource Exchange phenotype database. The scores were adjusted to fit a 0-3 numerical scale and subjected to multiple analyses, including principal components analysis (PCA), hierarchical clustering (HCL), and k-means clustering (KMC), to subgroup individuals on the basis of ADI-R item scores. A figure of merit (FOM) analysis was also conducted to estimate the optimal number of clusters.
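As a rough illustration of the k-means step only, the following minimal sketch clusters a small synthetic matrix of 0-3 item scores using the Python standard library. It is not the software or the data used in the study: the toy profiles, the six-item width, the random seeds, and every function name are assumptions made for illustration, and the within-cluster sum of squares printed at the end is a simpler quantity than the FOM.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Item-wise mean of a non-empty list of score vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def k_means(rows, k, max_iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen individuals as initial centroids.
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iterations):
        # Assign every individual to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def within_cluster_ss(rows, labels, centroids):
    """Sum of squared distances from each individual to its centroid."""
    return sum(
        squared_distance(row, centroids[label])
        for row, label in zip(rows, labels)
    )


def make_toy_scores(seed=1):
    """Hypothetical data: 4 made-up profiles x 10 individuals x 6 items."""
    rng = random.Random(seed)
    profiles = [
        [3, 3, 3, 1, 1, 1],
        [0, 1, 0, 1, 0, 1],
        [1, 1, 1, 3, 3, 3],
        [2, 2, 2, 2, 2, 2],
    ]
    rows = []
    for profile in profiles:
        for _ in range(10):
            # Jitter each item by -1, 0, or +1 and clamp to the 0-3 scale.
            rows.append(
                [min(3, max(0, v + rng.choice([-1, 0, 0, 0, 1]))) for v in profile]
            )
    return rows


scores = make_toy_scores()
labels, centroids = k_means(scores, k=4)

# Baseline: everyone in a single cluster centred on the overall mean.
total_ss = within_cluster_ss(scores, [0] * len(scores), [mean_vector(scores)])
clustered_ss = within_cluster_ss(scores, labels, centroids)
print("cluster sizes:", [labels.count(c) for c in range(4)])
print("sum of squares, 1 cluster vs 4 clusters:", total_ss, clustered_ss)
```

Because k-means depends on its random starting centroids, repeated runs with different seeds can yield different partitions, which is one reason a separate analysis of the number of clusters is useful.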
Results: The analyses revealed readily recognizable distinctions among the groups based on the severity of scores in different domains. Guided by the FOM analysis, KMC was used to divide the samples into 4 clusters. One cluster is characterized by severe language deficits, while another exhibits milder symptoms across the domains. A third group shows noticeable savant skills, and the fourth exhibits intermediate severity across all domains. When the cluster assignments were superimposed on the PCA plot, the groups showed a clear, though not perfect, separation.
Conclusions: Grouping autistic individuals by multivariate cluster analysis of ADI-R scores reveals meaningful phenotypic subgroups within the autistic spectrum, which we show in a related study to be associated with distinct gene expression profiles.