Building the Open Information Commons for Autism Research and Discovery: The Hartwell Autism Research and Technology Initiative (iHART)

Friday, May 13, 2016: 11:30 AM-1:30 PM
Hall A (Baltimore Convention Center)
J. Y. Jung (1), L. Perez-Cano (2), M. Duda (1), D. Kashef-Haghighi (1), J. Kosmicki (1), J. K. Lowe (2), E. K. Ruzzo (2), S. Sharma (1), D. H. Geschwind (2) and D. Wall (1); (1) Department of Pediatrics, Stanford University, Stanford, CA; (2) Department of Neurology, UCLA, Los Angeles, CA
Background: The search for markers and therapeutic targets has been hampered by the complexity of autism as an integrated system of behaviors, genetics, and environment. This complexity highlights the urgent need for an ever-larger collection of shared data that supports open investigation both within and across autism’s systems.

Objectives: We have created the first openly available, cloud-hosted “sandbox” of autism data. The sandbox is built on whole-genome DNA sequences from nearly 5000 individuals, enriched for multiplex families, twins, and females, coupled with rich phenotypic information. It has already begun to complement the field's large collection of autism exomes, using large-scale whole-genome data to enable the identification of de novo variants and large structural variants and the study of the effects of variants in non-coding regions. Ultimately, we hope that this openly accessible, cloud-hosted database will enable the field to work together to define the forms of autism and to make faster gains towards robust diagnostic tools and targets for therapy.

Methods: We have built a cloud resource that uses the massively parallel and distributed computing capabilities of Google to integrate genetic markers with phenotype within biological networks. We have also built a system for querying the data easily, as well as a universal interface for sharing and integrating new datasets, with the goal of both growing and sharing the data openly now and in the future.

Results: The iHART now contains nearly a petabyte of data, including 30X whole-genome sequences (Illumina HiSeq X10) of ~5000 individuals. We intend to grow the collection to over 10,000 individuals spanning roughly 2500 families by 2017.

Conclusions: The iHART is a prime example of what is both possible and necessary to reach a truly open “Information Commons” for autism through interdisciplinary collaboration. The effort involves collaborators from Stanford, UCLA, the Simons Foundation Autism Research Initiative (SFARI), and the New York Genome Center. In our talk at IMFAR, we will describe progress on the initiative, demonstrate the iHART user interface, showcase preliminary discoveries, and announce its availability for use by our growing community of open-access autism researchers.

See more of: Genetics