20279
Identifying Non-Protein Coding Autism Risk Variants By Computational Analysis of a Large Case-Control Sequencing Cohort
Objectives: The goal of this study is to computationally assess the functionality of single nucleotide variants (SNVs) identified in a large case-control sequencing study and to prioritize those for future molecular follow-up.
Methods: Our sequencing cohort consisted of 2071 ASD cases, diagnosed by DSM-IV criteria and supported by the Autism Diagnostic Interview, and 904 age-matched controls, all of European ethnicity. We sequenced 17Mb consisting of exons including untranslated regions, evolutionarily conserved intronic areas, and 5kb flanking regulatory regions of 681 genes, as well as evolutionarily conserved regions in 694 associated intergenic loci identified by ASD-GWAS Noise Reduction analysis (Hussman et al., 2011). We applied three recently published computational methods for prioritization of noncoding variants. 1) FunSeq2 (Function based Prioritization of Sequence Variants v2) scores SNVs by integrating ENCODE annotation, known binding motif sequences, evolutionary conservation, and population variant frequencies (Fu et al., 2014). 2) GWAVA (Genome Wide Annotation of Variants) uses noncoding variants in the Human Gene Mutation Database to train a model that scores SNVs based on distance to gene start sites or known pathogenic noncoding mutations (Ritchie et al., 2014). 3) CADD (Combined Annotation Dependent Depletion) implements a machine learning model using evolutionary analyses to score SNVs relative to all possible genomic variations (Kircher et al., 2014). Novel risk variants and genes were identified by clustering of case-specific functional SNVs with the highest scores.
Results: We identified 507,924 noncoding SNVs across our entire dataset. To identify individual ASD noncoding variants and genes, we determined the intersection of the top 10% of scores for all three programs and identified 158 candidate SNVs scored highly by all three computational methods. From these, we identified clusters of at least three rare noncoding SNVs (1000 Genomes MAF < 0.01) unique to ASD cases within 15bp. We identified 10 such clusters, including four in regulatory regions of genes playing important roles in neuronal processes implicated in ASD physiology. These included clusters within 2kb of the start sites of the neuronal migration gene ASTN1 and the cell adhesion molecule gene CDH2, and two within the 5’ UTRs of the MAP kinase MAP3K8 and the intracellular solute transporter SLC35A2.
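The prioritization steps described above (intersecting the top 10% of scores from the three tools, then clustering rare case-only SNVs) can be illustrated with a minimal Python sketch. This is not the study's pipeline: the `Snv` record, its field names, and both function names are hypothetical, and the sketch assumes that "within 15bp" means each SNV in a cluster lies at most 15bp from the previous one on the same chromosome.

```python
from collections import namedtuple

# Hypothetical record for one noncoding SNV; field names are illustrative only.
Snv = namedtuple("Snv", "chrom pos funseq2 gwava cadd maf case_only")


def top_fraction(snvs, score_field, fraction):
    """Return the (chrom, pos) keys of SNVs ranked in the top `fraction` by one score."""
    ranked = sorted(snvs, key=lambda s: getattr(s, score_field), reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return {(s.chrom, s.pos) for s in ranked[:n_keep]}


def top_score_intersection(snvs, fraction=0.10):
    """Keep SNVs that fall in the top `fraction` for all three scoring tools."""
    keep = (
        top_fraction(snvs, "funseq2", fraction)
        & top_fraction(snvs, "gwava", fraction)
        & top_fraction(snvs, "cadd", fraction)
    )
    return [s for s in snvs if (s.chrom, s.pos) in keep]


def find_clusters(snvs, window=15, min_size=3, max_maf=0.01):
    """Group rare, case-only SNVs into runs where neighbours are <= `window` bp apart.

    Assumption: a cluster is a chain of SNVs on one chromosome in which each
    SNV is at most `window` bp from the previous one.
    """
    rare = sorted(
        (s for s in snvs if s.case_only and s.maf < max_maf),
        key=lambda s: (s.chrom, s.pos),
    )
    clusters, current = [], []
    for s in rare:
        # Close the current run when the chromosome changes or the gap is too large.
        if current and (
            s.chrom != current[-1].chrom or s.pos - current[-1].pos > window
        ):
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(s)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters
```

In this sketch the candidate list would come from `top_score_intersection(all_snvs)` and be passed to `find_clusters`. A stricter reading of "within 15bp" (all SNVs of a cluster inside a single 15bp window) would require a sliding-window variant instead of the chaining used here.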
Conclusions: As lower costs allow generation of sequencing data beyond exomes, application of computational tools for variant prioritization is essential for interpretation of noncoding SNVs. In this study, these tools suggested potential novel genes related to ASD pathophysiology. As the first large-scale study of noncoding variation in ASD, this investigation adds noncoding variants to the compendium of ASD risk loci and implicates potential novel biological mechanisms, beyond protein-coding variation, that contribute to ASD genetic risk.