17438
Integrated Analyses of Genome-Wide Association and Targeted Sequencing Data Identify Loss-of-Function and Noncoding Regulatory Rare Variants Contributing to Autism Spectrum Disorder
Objectives: To identify rare coding and noncoding ASD risk variants by integrating genome-wide association study (GWAS) results with custom targeted massively parallel sequencing of candidate regions.
Methods: Candidate regions were chosen from GWAS Noise Reduction (GWAS-NR) analyses of two autism datasets (Hussman et al., 2011). We sequenced 17 Mb on the Illumina HiSeq2000 using a custom Agilent SureSelect probe set targeting exons, UTRs, conserved intronic areas, and regulatory regions of 681 GWAS-NR-associated genes, as well as conserved regions in 694 associated intergenic loci. Our cohort consisted of 2,112 ASD cases and 834 controls of white ethnicity, as determined by Eigenstrat. Bioinformatic processing included alignment with bwa, genotype calling with the GATK Unified Genotyper, annotation of coding variants with SeattleSeq137, PolyPhen2, and SIFT, and annotation of noncoding variants with the ENCODE, VISTA, and GENCODE databases. We first identified rare (MAF≤0.01) loss-of-function (LOF) alterations (stop gains and losses, splice changes, and frameshifts), genes with multiple damaging variants in the same individual, and noncoding variants that RegulomeDB predicted likely to affect binding. For LOF variants in ASD candidate genes, we determined inheritance status by Sanger sequencing. We used the Sequence Kernel Association Test (SKAT) for gene- and noncoding-element-based association testing between sets of rare variants and ASD.
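The SKAT statistic behind the gene- and element-based tests can be sketched as follows. This is a minimal pure-Python illustration, assuming a binary case-control phenotype with an intercept-only null model (no covariates such as ancestry components) and SKAT's default Beta(MAF; 1, 25) variant weights; the function names are hypothetical. SKAT itself derives p-values analytically from a mixture of chi-square distributions (by default Davies' method); here a simpler label-permutation p-value is swapped in.

```python
import math
import random


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density at the minor allele frequency (SKAT default a=1, b=25)."""
    beta_fn = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return maf ** (a - 1) * (1 - maf) ** (b - 1) / beta_fn


def skat_q(genotypes, phenotypes, a=1.0, b=25.0):
    """SKAT score statistic Q = sum_j w_j * (sum_i G_ij * (y_i - mu))^2.

    genotypes:  per-sample lists, genotypes[i][j] = allele count (0, 1, 2)
    phenotypes: 0/1 labels (1 = case); with an intercept-only null model
                the fitted mean mu is simply the fraction of cases.
    Because the residuals sum to zero, counting the other allele only
    flips the sign of each score and leaves Q unchanged.
    """
    n = len(phenotypes)
    mu = sum(phenotypes) / n
    resid = [y - mu for y in phenotypes]
    q = 0.0
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        freq = sum(col) / (2 * n)
        maf = min(freq, 1 - freq)
        # SKAT sets sqrt(w_j) = Beta(MAF_j; a, b), so rarer variants weigh more.
        sqrt_w = beta_weight(maf, a, b)
        score = sum(g * r for g, r in zip(col, resid))
        q += (sqrt_w * score) ** 2
    return q


def skat_permutation_pvalue(genotypes, phenotypes, n_perm=999, seed=1):
    """Permutation p-value: shuffle case/control labels and recompute Q."""
    rng = random.Random(seed)
    observed = skat_q(genotypes, phenotypes)
    labels = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance so permutations tied with the observed Q count as hits.
        if skat_q(genotypes, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy gene with two rare variants in six individuals (two cases).
    G = [[1, 0], [1, 0], [0, 1], [0, 0], [0, 0], [0, 0]]
    y = [1, 1, 0, 0, 0, 0]
    print(skat_q(G, y))
    print(skat_permutation_pvalue(G, y))
```

Permutation keeps the sketch dependency-free but becomes slow at the small p-values relevant for thousands of genes, which is one reason SKAT uses the analytic null distribution.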
Results: We identified 547,716 single nucleotide variants (SNVs) and 94,326 indels, including 29,734 coding variants and 2,653 rare LOF alterations. Cases and controls did not differ overall in the number of variants or of multiple-hit genes, but LOF variants in ASD candidate genes were significantly enriched in cases (p=0.003). These included a confirmed de novo premature stop codon in the candidate gene RBFOX1. RegulomeDB predicted that 55 case-unique noncoding variants affect transcription factor binding motifs; two of these, both in the glutamate receptor gene GRIK4, alter SP1 and CTCF binding sites. Association testing yielded only nominally significant associations between rare exonic variants and ASD in 30 genes, including the transcription factors ZNF24 (p=0.00192) and ZNF519 (p=0.00465), and a suggestive association (p=0.07) with rare SNVs in enhancer hs658, which drives expression in the midbrain of mouse embryos. Further analyses of variant subsets are ongoing.
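The abstract does not state which test produced the p-value for case enrichment of LOF variants in ASD candidate genes. One common way to compare carrier counts between cases and controls is a one-sided Fisher's exact test; the sketch below assumes that test, counts each individual at most once as a carrier, and uses a hypothetical function name.

```python
from math import comb


def lof_enrichment_pvalue(case_carriers, n_cases, ctrl_carriers, n_controls):
    """One-sided Fisher's exact test for excess LOF carriers among cases.

    With the table margins fixed, the number of carriers falling among cases
    follows a hypergeometric distribution under the null; the p-value is
    P(X >= case_carriers).
    """
    total = n_cases + n_controls
    carriers = case_carriers + ctrl_carriers
    denom = comb(total, n_cases)
    upper = min(carriers, n_cases)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom  # exact integer arithmetic until this final division


if __name__ == "__main__":
    # Toy 2x2 table: 3 of 4 cases vs 1 of 4 controls carry an LOF variant.
    print(lof_enrichment_pvalue(3, 4, 1, 4))
```

For this cohort the margins would be 2,112 cases and 834 controls, with carrier counts taken from the filtered LOF calls (these counts are not reported in the abstract). Python's arbitrary-precision integers keep the binomial coefficients exact even at that scale.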
Conclusions: The enrichment of LOF variants in ASD candidate genes corroborates findings from other ASD datasets, and the de novo LOF variant in RBFOX1 further supports its role as an ASD gene. As the first large-scale study of noncoding variation in ASD, this investigation adds noncoding variants to the compendium of ASD risk loci and implicates potential novel biological mechanisms in ASD etiology. Overall, this study highlights the value of integrating association and sequencing data to identify rare ASD risk alleles.