Whole Genome Sequencing and Identical By Descent Filtering of Autism Spectrum Disorder Extended Families Reveals Novel ASD Risk Variants

Friday, May 13, 2016: 11:30 AM-1:30 PM
Hall A (Baltimore Convention Center)
A. J. Griswold1, H. N. Cukier1, D. Van Booven1, P. L. Whitehead2, N. K. Hofmann1, J. M. Lee2, E. R. Martin1, M. L. Cuccaro2, J. R. Gilbert2, J. P. Hussman3 and M. A. Pericak-Vance2, (1)John P Hussman Institute for Human Genomics, University of Miami, Miami, FL, (2)John P. Hussman Institute for Human Genomics, University of Miami, Miami, FL, (3)Hussman Institute for Autism, Inc., Catonsville, MD
Background:  Massively parallel sequencing in autism spectrum disorder (ASD) has focused primarily on whole exome (WES) or whole genome sequencing (WGS) in trio cohorts for identification of de novo loss of function protein coding variants. These studies have only used simplex families and the available WGS analyses have reported largely only on protein coding portions of the genome.

Objectives:  Our study applies WGS to extended, multiplex families with at least two affected cousins likely to carry rare, partially penetrant inherited alterations. We hypothesize that identical by descent (IBD) filtering in these large, multiplex pedigrees would define genomic regions of shared ASD risk and allow identification of coding variants missed by exome sequencing, functional variants in the 98% of the genome that is noncoding, as well as structural variation, to identify potential new ASD loci.

Methods:  We performed WGS on at least two affected cousins across six ASD extended families (15 individuals). Sequencing was performed on the Illumina HiSeq2500 and analyzed through pipelines including BWA-MEM alignment, quality recalibration by GATK, and variant calling with the GATK HaplotypeCaller. Structural variants (SVs) were called with the SWAN algorithm. Annotations were applied with ANNOVAR including functional predictions for noncoding variants (GWAVA, CADD, FATHMM-MKL). We determined IBD regions using whole genome genotyping data and the MERLIN package and used these regions to filter variations for each family. Variants were prioritized by sharing in all affected individuals per family, rarity of the variant in the population (< 1%), and evidence of functionality from computational predictions.
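The prioritization step described above can be illustrated with a minimal sketch. It assumes variants have already been annotated and IBD regions already computed; the `Variant` fields, the `prioritize` function, and the toy data are hypothetical and are not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical record for one annotated variant (field names are assumptions).
@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    pop_freq: float              # population allele frequency
    predicted_functional: bool   # summary of computational predictions
    carriers: FrozenSet[str]     # IDs of sequenced individuals carrying the variant

# An IBD region as (chromosome, start, end), inclusive coordinates.
Region = Tuple[str, int, int]

def in_ibd_region(v: Variant, regions: List[Region]) -> bool:
    """True if the variant position falls inside any IBD region."""
    return any(c == v.chrom and s <= v.pos <= e for c, s, e in regions)

def prioritize(variants: List[Variant], regions: List[Region],
               affected: FrozenSet[str], max_freq: float = 0.01) -> List[Variant]:
    """Keep variants that lie in an IBD region, are shared by all affected
    individuals in the family, are rare (< max_freq), and are predicted functional."""
    return [
        v for v in variants
        if in_ibd_region(v, regions)
        and affected <= v.carriers
        and v.pop_freq < max_freq
        and v.predicted_functional
    ]

# Toy family with two affected cousins and one IBD region.
affected = frozenset({"cousin1", "cousin2"})
regions = [("chr3", 100, 200)]
variants = [
    Variant("chr3", 150, 0.002, True, frozenset({"cousin1", "cousin2"})),  # passes all filters
    Variant("chr3", 150, 0.200, True, frozenset({"cousin1", "cousin2"})),  # too common
    Variant("chr3", 500, 0.002, True, frozenset({"cousin1", "cousin2"})),  # outside IBD region
    Variant("chr3", 160, 0.002, True, frozenset({"cousin1"})),             # not shared by all affected
]
kept = prioritize(variants, regions, affected)
print(len(kept))
```

In this sketch the filters are applied as a simple conjunction; a real analysis might instead rank variants by a functional score rather than using a single boolean flag.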

Results:  We sequenced each genome to ~40X coverage and identified more than four million single nucleotide variants (SNVs) and small indels and more than 100 SVs per individual. IBD and sharing filtering within each family limited the total number of SNVs for analysis to 289-655,355, depending on family structure. Among coding SNVs, ~94% concordance was found with existing WES (Cukier et al., 2014); however, WGS identified ~10% more coding variant calls. These include a family with a rare missense mutation in the neurogenesis growth factor GDF11 and another with a frameshift in the axonal development gene SLAIN1. Evaluation of the impact of noncoding variation is ongoing and has so far identified two shared, likely functional variants in the putative promoter of the ASD candidate gene CNTN4 in one family and two others upstream of the potassium channel gene KCTD1 in another. Finally, rare copy number variants disrupting the promoter of the neurodevelopmental gene WWOX and deleting an exon of the lincRNA FIRRE, which is involved in chromosomal organization, were each found in single families.

Conclusions:  By studying these unique pedigrees, applying cutting-edge sequencing and analysis methods, and employing IBD filtering, we establish that WGS in extended families can be used to identify ASD risk alterations. Such methods extend the scope of ASD genetic risk beyond de novo protein coding variants to functional noncoding SNVs, SNVs not captured by WES, and SVs that might be conferring ASD risk. Taken together, WGS identifies new ASD candidate genes and pathways.

See more of: Genetics