Objectives: We hypothesized that variants of small effect, which are not detected by the transmission disequilibrium test, are confined to a limited number of biological pathways. We therefore applied a network-based approach to the Autism Genome Project (AGP) consortium GWAS to identify biological networks associated with autism spectrum disorder (ASD) risk, together with high-priority candidate genes.
Methods: We combined family-based association data from 2258 ASD families, genotyped on the Illumina 1M SNP array as part of the AGP GWAS, with human protein-protein interaction (PPI) data compiled from public databases. We analyzed a range of association P-value thresholds to test whether potentially relevant risk genes can still be identified at higher (less stringent) thresholds using network properties, such as the percentage of direct interactions and the size of the largest connected component (LCC) of the network. A sample of 943 ASD families from the Autism Genetic Resource Exchange, genotyped using the Illumina HumanHap550 BeadChip (Wang et al., 2009), was used for replication. The networks identified in the two datasets were compared, and a functional enrichment analysis was performed.
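The two network properties named above can be illustrated with a minimal sketch. It assumes the PPI data are available as a list of undirected protein pairs and that "direct interaction" means a candidate protein has at least one PPI partner that is also a candidate; the function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def network_properties(candidates, adj):
    """Return (fraction of candidates with a direct partner, LCC size).

    Both values are computed on the subnetwork induced by the candidates.
    """
    candidates = set(candidates)
    # Keep only neighbours that are themselves candidates.
    sub = {p: adj.get(p, set()) & candidates for p in candidates}

    n_direct = sum(1 for nbrs in sub.values() if nbrs)
    fraction = n_direct / len(candidates) if candidates else 0.0

    # Breadth-first search over each unvisited protein to size its component.
    seen = set()
    largest = 0
    for start in sub:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in sub[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        largest = max(largest, size)
    return fraction, largest


# Toy data (hypothetical): proteins F and G interact but are not candidates.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("F", "G")]
toy_candidates = ["A", "B", "C", "D", "E", "H"]
fraction, lcc_size = network_properties(toy_candidates, build_adjacency(toy_edges))
```

In the toy example, five of the six candidates have a candidate partner and the largest component is A-B-C, so the sketch reports a direct-interaction fraction of 5/6 and an LCC of size 3.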
Results: By comparing the network properties of our data with those of random samples, we determined that P=0.01 is the threshold at which meaningful biology can still be inferred from the data. Among genes containing SNPs with P<0.01, 50% of the encoded proteins directly interact with each other (P=5×10⁻⁵) and form a connected subnetwork of 453 proteins (P<0.001). Using the same threshold, these observations were replicated in the independent dataset, with 219 proteins interconnected in the LCC (P<0.001) and 39.4% of proteins directly interacting (P=0.008). The proteins in the LCCs of the two datasets are mainly localized in the synaptic compartment and expressed in the brain. Both are enriched in pathways related to focal adhesion, with 64 genes in common, including 5 established autism candidate genes.
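The comparison with random samples can be sketched as an empirical P-value calculation. The abstract does not state how the random samples were drawn or how many were used, so uniform sampling from a background gene list and the add-one estimator below are assumptions; `null_distribution` and `empirical_p_value` are hypothetical helper names.

```python
import random


def null_distribution(statistic, background, set_size, n_samples, seed=0):
    """Evaluate `statistic` on random gene sets drawn uniformly from `background`.

    Assumption: uniform sampling without replacement; a real analysis might
    instead match random sets on properties such as gene size or node degree.
    """
    rng = random.Random(seed)
    background = list(background)
    return [statistic(rng.sample(background, set_size)) for _ in range(n_samples)]


def empirical_p_value(observed, null_values):
    """One-sided empirical P-value: share of null values >= observed.

    The add-one correction keeps the estimate above zero, so with N random
    samples the smallest attainable value is 1 / (N + 1).
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Toy null distribution (hypothetical values, e.g. LCC sizes of random sets).
toy_null = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p_tied = empirical_p_value(9, toy_null)    # one null value reaches 9
p_above = empirical_p_value(10, toy_null)  # no null value reaches 10
```

Here `statistic` would be any function that maps a gene set to a single number, such as the LCC size or the fraction of directly interacting proteins.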
Conclusions: Using the AGP GWAS, we showed that network analysis is an effective strategy for uncovering, from GWAS data, loci of small effect size that cannot be detected by single-SNP analysis. We found that many potentially relevant susceptibility loci with P<0.01 are being overlooked and should be explored further. Using two independent autism GWAS datasets, we found that autism-associated proteins are functionally related and involved in a small number of interconnected biological processes. The genes shared between the two analyses are being investigated further for their specificity to autism.