Integrated Data Management Processes for Autism Research: Empowering Efficient Data Reuse and Sharing Across Multiple Studies, Data Types and Sites

Friday, 3 May 2013: 09:00-13:00
Banquet Hall (Kursaal Centre)
11:00
J. Hawthorne, D. Voccola, N. Sinanis, F. Farach and L. Rozenblit, Prometheus Research, LLC, New Haven, CT
Background: For the past 25 years, a typical data management challenge has been organizing data from a single study with a single data type for analysis. Upon completion, the data were archived with no expectation of reuse. Just-in-time (JIT) data management practices, such as cleaning and organizing data immediately prior to use, were adequate. Today, however, the demands of translational research in autism require investigators to (1) integrate multiple data types from multiple sites and/or studies and (2) reuse and share their data. When applied to modern translational autism research, JIT processes are inefficient and compromise data quality. We sought to optimize these activities through a set of procedural changes, made possible by an integrated data management system (iDMS).

Objectives: Our goals were to (1) make high quality data sets accessible within minutes rather than weeks or months; (2) reduce redundant data manipulations for simple revisions to a research question; and (3) implement a transparent and reproducible mechanism for data regeneration.

Methods: We developed an iDMS that can accommodate multiple studies, heterogeneous data types, and multi-site collaboration. We implemented processes that encouraged up-front data cleaning at the time of acquisition, including data integrity constraints, improved electronic data capture, and data quality reporting. The iDMS and processes have been adopted by several research groups; we reviewed the use of the iDMS by two data managers to fulfill a range of investigator requests for clean data sets. The data managers produced multiple data sets over the past year, and the average time taken to complete each was recorded. The recorded time included investigator communications and query revisions.
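
As an illustration of what "data integrity constraints" at the time of acquisition can look like, the following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and value ranges are hypothetical and are not taken from the iDMS described here; the point is only that invalid records are rejected when they are entered rather than discovered just before analysis.

```python
import sqlite3


def create_schema(conn):
    """Create two hypothetical tables whose constraints reject bad rows on entry."""
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE participant (
            participant_id TEXT PRIMARY KEY,
            site           TEXT NOT NULL,
            birth_year     INTEGER NOT NULL
                           CHECK (birth_year BETWEEN 1900 AND 2013)
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE assessment (
            assessment_id  INTEGER PRIMARY KEY,
            participant_id TEXT NOT NULL
                           REFERENCES participant(participant_id),
            measure        TEXT NOT NULL,
            score          REAL NOT NULL CHECK (score >= 0)
        )
        """
    )


def try_insert(conn, sql, params):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    ok = try_insert(
        conn,
        "INSERT INTO participant VALUES (?, ?, ?)",
        ("P001", "Site A", 2005),
    )
    rejected = try_insert(
        conn,
        "INSERT INTO assessment (participant_id, measure, score) VALUES (?, ?, ?)",
        ("P999", "hypothetical_measure", 12.0),  # unknown participant
    )
    print(ok, rejected)
```

In this sketch a rejected row surfaces immediately to the person entering it, which is the moment at which the source record is easiest to check.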

Results: Our solution was to (1) centralize all data sources into a single database; (2) clean the data once, up-front; and (3) enable data managers to easily construct and save complex queries. These three procedural changes reduced the time required to produce a data set. Though quantitative measures are unavailable for the “before” state, anecdotal evidence strongly suggests that producing each data set using the JIT process took weeks or months. With an iDMS, the average time per request was 5.8 hours across the 85 ad hoc data set requests, and 33% of requests took 30 minutes or less to complete. Additionally, data quality was higher as a consequence of up-front cleaning, and staff allocation was lower because redundant data cleaning activities were eliminated. The queries could be saved in the system and later reused for revisions and comparisons.
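
The idea of saving a query so that it can be rerun for a revised request can be sketched as follows. This is a hedged illustration using Python's sqlite3 module, not the iDMS's actual query mechanism; the `saved_query` and `score` tables, their columns, and the helper functions are all assumptions made for the example.

```python
import sqlite3


def setup_demo(conn):
    """Create a hypothetical centralized score table plus a table of saved queries."""
    conn.execute(
        "CREATE TABLE score (participant_id TEXT, site TEXT, measure TEXT, value REAL)"
    )
    conn.execute(
        "CREATE TABLE saved_query (name TEXT PRIMARY KEY, query_text TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO score VALUES (?, ?, ?, ?)",
        [
            ("P001", "A", "m1", 8.0),
            ("P002", "A", "m1", 12.0),
            ("P003", "B", "m1", 20.0),
            ("P003", "B", "m2", 5.0),
        ],
    )
    conn.commit()


def save_query(conn, name, query_text):
    """Store (or overwrite) a named query so it can be regenerated later."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO saved_query (name, query_text) VALUES (?, ?)",
            (name, query_text),
        )


def run_saved_query(conn, name, params=()):
    """Look up a saved query by name and run it with the given parameters."""
    row = conn.execute(
        "SELECT query_text FROM saved_query WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return conn.execute(row[0], params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    setup_demo(conn)
    save_query(
        conn,
        "mean_by_site",
        "SELECT site, AVG(value) FROM score WHERE measure = ? "
        "GROUP BY site ORDER BY site",
    )
    # The same saved query serves a revised request for a different measure.
    print(run_saved_query(conn, "mean_by_site", ("m1",)))
    print(run_saved_query(conn, "mean_by_site", ("m2",)))
```

Because the query text is stored alongside the data, a revised request changes only the parameters, and the data set can be regenerated in the same way each time.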

Conclusions: Adopting an integrated data management (iDM) process with a compatible system can significantly reduce costs and increase data quality. The greatest challenge was integrating data sources with varying degrees of cleanliness. Data cleaning and organization are likely to remain challenging, as the number of data sources to manage continues to increase. We expect iDM practices will be most valuable when both the cost of data acquisition and the probability of reuse are high. We believe the efficiency gains described here will accelerate scientific discovery and translation.
