16089
Integration of Copy Number and Exome Sequence Data in a Queryable Database for the Investigation of ASDs

Friday, May 16, 2014
Atrium Ballroom (Marriott Marquis Atlanta)
E. McArthur1,2, X. Zhang1, E. R. Gamazon1, J. S. Sutcliffe3, E. H. Cook4, L. K. Davis1 and N. J. Cox1, (1)University of Chicago, Chicago, IL, (2)University of North Carolina at Chapel Hill, Chapel Hill, NC, (3)Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, (4)University of Illinois at Chicago, Chicago, IL
Background: Although ASD has a significant genetic component, its heterogeneous etiology makes gene discovery complex. With the advent of efficient and affordable sequencing tools, genomic data are becoming widely available in public databases. Two data types of particular relevance for ASD investigation are copy number variants (CNVs) and whole exome sequence data. Exome sequence data have previously been shown to help identify causative genes and variants in complex disorders. De novo CNVs have been found in dosage-sensitive regions, which are typically stable in healthy controls. Because ASD is heterogeneous, integrating CNV and exome sequence data is necessary to help identify candidate genes.

Objectives: To develop a queryable, scalable pilot database that integrates whole exome sequence data with CNV data in order to identify additional genes and variants of interest in ASD.

Methods: A pilot database was developed using available data for 24 probands (23 male and 1 female). Exome sequence variants were called with GATK and TrioCaller, exported in variant call format (VCF), and annotated with ANNOVAR. The VCF contained information such as position, phased genotype, raw read depth, and multiple measures of deleteriousness and conservation. CNV calls were generated by CNVision from Illumina 1M array data and included position, copy number, ancestry, gender, and the calling algorithm used for detection (PennCNV, GNOSIS, and/or QuantiSNP). A MySQL database with four relational tables containing all variant and proband data was developed. The database was queried for rare, hemizygous, deleterious variants by selecting SNVs that have a minor allele frequency (MAF) below 5% and a possibly deleterious LJB_PolyPhen2 score, and that lie within regions of CNV deletion.
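The query described above can be sketched roughly as follows. This is a minimal illustration, not the authors' schema or code: the table and column names (`snv`, `cnv`, `maf`, `polyphen2`, `copy_number`), the score cutoff of 0.5, and the toy rows are all assumptions, and SQLite (through Python's `sqlite3` module) stands in for MySQL so that the example runs without a server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE snv (
            proband_id TEXT, chrom TEXT, pos INTEGER,
            rsid TEXT, maf REAL, polyphen2 REAL
        );
        CREATE TABLE cnv (
            proband_id TEXT, chrom TEXT,
            start_pos INTEGER, end_pos INTEGER, copy_number INTEGER
        );
    """)
    conn.executemany("INSERT INTO snv VALUES (?, ?, ?, ?, ?, ?)", [
        ("P01", "5", 1500, "snv_a", 0.01, 0.95),  # rare, high score, inside a deletion
        ("P01", "5", 9000, "snv_b", 0.01, 0.95),  # rare, high score, outside any CNV
        ("P02", "5", 1600, "snv_c", 0.30, 0.95),  # common variant, excluded by MAF
        ("P02", "5", 1700, "snv_d", 0.02, 0.10),  # low score, excluded
        ("P03", "5", 1500, "snv_e", 0.01, 0.95),  # inside a duplication, not a deletion
    ])
    conn.executemany("INSERT INTO cnv VALUES (?, ?, ?, ?, ?)", [
        ("P01", "5", 1000, 2000, 1),  # one-copy deletion
        ("P02", "5", 1000, 2000, 1),  # one-copy deletion
        ("P03", "5", 1000, 2000, 3),  # duplication
    ])
    return conn


# Join each SNV to CNVs in the same proband that contain its position,
# then keep rare, high-scoring SNVs that fall in a deletion (copy number < 2).
QUERY = """
SELECT DISTINCT s.proband_id, s.rsid
FROM snv AS s
JOIN cnv AS c
  ON c.proband_id = s.proband_id
 AND c.chrom = s.chrom
 AND s.pos BETWEEN c.start_pos AND c.end_pos
WHERE c.copy_number < 2
  AND s.maf < :max_maf
  AND s.polyphen2 >= :min_score
ORDER BY s.proband_id, s.rsid
"""


def rare_deleterious_in_deletions(conn, max_maf=0.05, min_score=0.5):
    """Return (proband_id, rsid) pairs; min_score is an illustrative cutoff only."""
    params = {"max_maf": max_maf, "min_score": min_score}
    return conn.execute(QUERY, params).fetchall()
```

Because the thresholds are passed as parameters, the same query can be rerun with a different MAF ceiling or score cutoff without editing the SQL.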

Results: Eight rare, hemizygous, putatively deleterious SNVs were found among the probands. Three of the eight (rs59056023, rs17844333, rs115218749) were found in CNV deletion regions within the gene PCDHA9, which encodes protocadherin alpha 9. Further queries showed that nine of the 24 individuals (37.5%) have at least one region of deletion within the PCDHA9 gene.
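A follow-up query of this kind (how many probands carry at least one deletion overlapping a gene) reduces to an interval-overlap test plus a distinct count. The sketch below is hypothetical throughout: the `cnv` table layout, the gene coordinates, and the rows are invented for illustration, and SQLite stands in for MySQL. Only the 9-of-24 arithmetic is taken from the results above.

```python
import sqlite3


def count_probands_with_deletion(conn, chrom, gene_start, gene_end):
    """Count distinct probands with a deletion (copy number < 2) overlapping the gene.

    Two intervals overlap when each one starts before the other ends.
    """
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT proband_id)
        FROM cnv
        WHERE chrom = ?
          AND copy_number < 2
          AND start_pos <= ?
          AND end_pos >= ?
        """,
        (chrom, gene_end, gene_start),
    ).fetchone()
    return row[0]


def percent(count, total):
    """Express count as a percentage of total."""
    return 100.0 * count / total


def demo():
    """Run the count on toy data with a made-up gene interval of 140-200."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cnv (proband_id TEXT, chrom TEXT, "
        "start_pos INTEGER, end_pos INTEGER, copy_number INTEGER)"
    )
    conn.executemany("INSERT INTO cnv VALUES (?, ?, ?, ?, ?)", [
        ("P01", "5", 100, 150, 1),  # deletion overlapping the start of the gene
        ("P01", "5", 180, 190, 1),  # second deletion in the same proband, counted once
        ("P02", "5", 300, 400, 1),  # deletion outside the gene
        ("P03", "5", 120, 260, 3),  # duplication, ignored
        ("P04", "5", 50, 500, 0),   # two-copy deletion spanning the whole gene
    ])
    return count_probands_with_deletion(conn, "5", 140, 200)
```

With the figures reported above, `percent(9, 24)` gives 37.5.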

Conclusions: Through this investigation of rare, hemizygous, deleterious variants in ASD probands, we present an efficient approach for integrating different data types and demonstrate that the approach is simple yet robust, even with a small sample. Database queries are easily customized on multiple parameters, including deleteriousness, copy number, conservation, exonic function, and MAF. Our initial application of the methodology suggests a need for further characterization of rare SNVs within CNV regions. Further attention to PCDHA9 is warranted, especially given its predicted role in establishing and maintaining neuronal cell-cell connections. Other applications of this database framework, and the integration of additional types of genomic and phenotypic data for ASD investigation, will be considered as future directions.

See more of: Genetics