20116
A Novel Approach for Efficient Submission of Research Data to the National Database for Autism Research (NDAR)
Objectives: We sought to dramatically reduce the time and money required to establish and maintain data interoperability between research centers, with particular attention to interfacing with NDAR. The ideal solution would maintain high data quality on both sides of the exchange, so that all beneficiaries of the data-sharing agreement could have high confidence in the accuracy of the data. The solution also needed to scale to large volumes of collected research data without a substantial increase in the effort required to establish or maintain the sharing connection.
Methods: We created a process in which manual recoding of data is replaced by data-sharing instructions in the form of extraction and transformation scripts. These scripts are stored in revision-controlled file repositories accessible to all members of the data-sharing team. The source data from transmitting researchers are stored in relational databases built on the RexDB® platform, while the data-sharing team works exclusively on the extraction and transformation scripts. This division ensures that the researchers' data are never altered. NDAR receives flat data files that fully conform to its published definitions of data structures and permitted assessment values, while researchers continue to enter and analyze data in the structures with which they are familiar.
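The separation between source data and transformation logic can be illustrated with a minimal sketch. It assumes a SQLite database as a stand-in for the RexDB® platform, and the mapping tables `FIELD_MAP` and `VALUE_MAP`, the source columns, and the target element names are all hypothetical; the actual NDAR element names and file layout are defined by NDAR's published data dictionary and are not reproduced here. The script only reads the source table and writes a flat CSV, so the researchers' data are never modified.

```python
import csv
import io
import sqlite3

# Hypothetical mapping from source column names to target (NDAR-style) element names.
FIELD_MAP = {
    "subj_id": "src_subject_id",
    "sex": "sex",
    "age_months": "interview_age",
}

# Hypothetical recoding of source values into the target's permitted values.
# Columns without an entry are passed through unchanged.
VALUE_MAP = {
    "sex": {"1": "M", "2": "F"},
}


def transform_row(row):
    """Rename fields and recode values for one source record."""
    out = {}
    for src, dst in FIELD_MAP.items():
        value = str(row[src])
        out[dst] = VALUE_MAP.get(src, {}).get(value, value)
    return out


def extract_and_transform(conn, query):
    """Run a read-only SELECT against the source and return a flat CSV string."""
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    rows = conn.execute(query).fetchall()
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(FIELD_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(transform_row(row))
    return buf.getvalue()
```

Because the mappings live in a script rather than in recoded copies of the data, rerunning the same script after more subjects are collected produces an updated submission file with no additional manual work, and the script itself can be kept under revision control.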
Results: Across seven typical data submissions to NDAR (20–60 subjects and 400–1000 fields each), the need to duplicate, retranscribe, or restructure the source data was eliminated entirely. Separating the extraction and transformation scripts from the data files also meant that additional data collection no longer increased the time required to repeat a successful transmission. Revision-controlled management of these scripts provides a further benefit: traceability of the transformation process itself. Any earlier version of the extraction scripts can now be retrieved, along with explanations for each modification to the data-sharing interface. None of the datasets submitted to NDAR within this framework drew audits or revision requests, which indicates the high level of data quality achieved.
Conclusions: This method has proven successful and efficient for interfacing research data with NDAR. It has little or no impact on the transmitting investigators' data, ensures high data integrity, removes most of the complexity of repeatedly transforming a dataset that grows over time, and introduces traceability to the collaborative process of integrating two data collections. Anecdotal evidence indicates that NDAR submissions have changed from a time-consuming process into an opportunity to improve the quality of research data for future analyses. As this framework becomes more widely used, we expect broader adoption of ad hoc data-sharing agreements within the research community, allowing researchers to investigate questions of increasing complexity.