25250
Prioritization of ASD-Associated Genes By Variant Annotation Identifies Trends in Genetic Variant Discovery
Background:
The search for genetic causes of autism spectrum disorders (ASD) has led to the identification of hundreds of genes containing thousands of variants that differ in their mode of inheritance, effect size, and frequency in cases and controls, and exhibit a broad range of putative functional effects. These data are richly annotated in AutDB (also known as SFARI Gene), an open-access database for ASD-associated genetic variation.
Objectives:
We previously described a scoring algorithm for the prioritization of candidate genes based on the cumulative strength of evidence from each ASD-associated variant curated in AutDB. Here, we present the results of applying this algorithm to an expanded and up-to-date dataset of ASD-associated variants.
Methods:
A total of 1176 annotated research articles were analyzed to generate a dataset of 3557 ASD-associated rare variants and 861 ASD-associated common variants distributed across 787 candidate genes (September 2016 data freeze). Under our algorithm, each variant is manually annotated with multiple attributes extracted from the original report and is then scored using a set of standardized scoring parameters; the variant scores are summed to yield a single score for each gene in the database. We also performed a time-course analysis of ASD-associated gene scores to identify trends in rare and common variant discovery and to assess how newly discovered variants affected the scores of the genes in which they were identified.
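The aggregation step described above can be sketched roughly as follows. This is a minimal illustration only: the attribute names, the weights in `HYPOTHETICAL_WEIGHTS`, and the gene labels are assumptions made for the example and are not the scoring parameters used in AutDB.

```python
from collections import defaultdict

# Hypothetical scoring parameters. The abstract does not list the actual
# attributes or weights, so these values are placeholders for illustration.
HYPOTHETICAL_WEIGHTS = {
    ("rare", "de_novo"): 3.0,
    ("rare", "inherited"): 1.0,
    ("common", "association"): 0.5,
}


def score_variant(variant):
    """Look up the score of one annotated variant (illustrative weights)."""
    key = (variant["class"], variant["attribute"])
    return HYPOTHETICAL_WEIGHTS.get(key, 0.0)


def gene_scores(variants):
    """Sum per-variant scores into a single cumulative score per gene."""
    totals = defaultdict(float)
    for variant in variants:
        totals[variant["gene"]] += score_variant(variant)
    return dict(totals)


# Toy input: three annotated variants across two made-up genes.
variants = [
    {"gene": "GENE_A", "class": "rare", "attribute": "de_novo"},
    {"gene": "GENE_A", "class": "common", "attribute": "association"},
    {"gene": "GENE_B", "class": "rare", "attribute": "inherited"},
]
scores = gene_scores(variants)
```

With these placeholder weights, a gene's score grows as more variants are curated for it, which is the behavior the time-course analysis tracks.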
Results:
We observed wide variation in gene scores, with a mean gene score of 23.53 ± 38.73. We identified a set of 14 high-confidence candidate genes with scores deviating more than two standard deviations (SDs) from the mean score of all genes. The gene scores generated by our approach once again correlated significantly with other ASD candidate gene ranking systems, including the expert-mediated SFARI Gene scoring initiative and gene prioritization based on Transmission and De Novo Association (TADA) analysis of ASD cohorts. Finally, time-course analysis of ASD-associated gene scores identified a subset of candidate genes whose scores increased markedly following the identification of novel genetic variants in those genes.
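As a worked illustration of the two-SD criterion, the sketch below computes the cutoff implied by the reported mean and SD and applies the same rule to a toy score table. It assumes that "deviating" means exceeding the mean (upper tail) and that the sample standard deviation is used; neither detail is stated in the abstract, and the gene labels and scores are invented.

```python
import statistics

# Cutoff implied by the reported summary statistics (upper tail assumed).
REPORTED_MEAN = 23.53
REPORTED_SD = 38.73
reported_cutoff = REPORTED_MEAN + 2 * REPORTED_SD  # about 100.99


def high_confidence(scores, n_sd=2.0):
    """Return genes whose score exceeds mean + n_sd * SD of all gene scores.

    Uses the sample SD (statistics.stdev); this choice is an assumption.
    """
    values = list(scores.values())
    cutoff = statistics.mean(values) + n_sd * statistics.stdev(values)
    return sorted(gene for gene, score in scores.items() if score > cutoff)


# Toy data: nine low-scoring genes and one clear outlier.
toy_scores = {f"GENE_{i}": 10.0 for i in range(9)}
toy_scores["GENE_9"] = 200.0
top_genes = high_confidence(toy_scores)
```

Because the reported SD exceeds the mean, the distribution is strongly right-skewed, so under this reading only genes scoring above roughly 101 would qualify as high-confidence candidates.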
Conclusions:
Altogether, our scoring algorithm continues to provide a framework for assessment of diverse types of ASD-associated genetic variants that are likely to be important for defining the genetic risk architecture of ASD.