Friday, May 13, 2011: 4:00 PM
Elizabeth Ballroom GH (Manchester Grand Hyatt)
3:45 PM
M. Bresnahan1, K. W. Carter2, R. W. Francis2, M. Gissler3, T. Grønborg4, R. Gross5, M. Hornig1, C. Hultman6, A. Langridge7, H. Leonard7, A. Nyman8, E. T. Parner9, M. Posada10, A. Reichenberg11, S. Sandin6, D. E. Schendel12, A. Sourander13, C. Stoltenberg14, P. Surén14 and E. Susser1, (1)Columbia University, New York, NY, United States, (2)UWA Centre for Child Health Research, Subiaco, Australia, (3)THL National Institute for Health and Welfare, Helsinki, Finland, (4)University of Aarhus, Aarhus, Denmark, (5)Columbia University, New York, NY, (6)Karolinska Institutet, Stockholm, Sweden, (7)Telethon Institute for Child Health Research, West Perth, Australia, (8)Karolinska Institutet, Stockholm, Sweden, (9)Department of Biostatistics, School of Public Health, University of Aarhus, Aarhus, Denmark, (10)Carlos III Health Institute, Madrid, (11)Kings College, London, England, (12)National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA, (13)Dept. of Child Psychiatry, University of Turku, Turku, Finland, (14)Norwegian Institute of Public Health, Oslo
Background: International collaborations using linked, population-based disease and birth registries are increasingly common in health research. The adoption of this approach by the International Collaboration for Autism Registry Epidemiology (iCARE) provides unique opportunities, but it also poses challenges. iCARE collaborators represent six countries or states, with a total of 26,317 cases of ASD arising from 8.2 million births. At each site, data are drawn from two or more sources (e.g., autism registry, birth registry, and other health and population data sets). Little of the iCARE registry or administrative data is standardized in definition or method of collection, so a large investment in data harmonization across sites is required before combined analysis.
Objectives: To create a unified iCARE dataset consisting of variables that measure the same construct across sites, while obtaining and retaining sufficient information to investigate artefacts and potential sources of bias.
Methods: To develop comprehensive site variable descriptions, a series of questionnaires was distributed to each data-contributing site, moving from the general to the specific: a) a description of each registry and its inclusion criteria; b) a description of potential sources for specific variables; c) for each potential source, the variable definition, method of collection, modifications to the data during archiving, method of measurement, unit of measurement, missingness and patterns of missingness; d) the availability of, and changes in, variable characteristics over time.
Results: Variations in definition, measurement and availability across sites and over time were synthesized and presented in table format. Inquiries identified three levels of harmonization: a) variables that required little or no modification from the site-specific source variables, b) constructed variables based on one or more source variables, and c) constructed variables based on careful examination of the data across all sites. A master algorithm was prepared and distributed to all sites for each iCARE variable, specifying the variable name, definition, missing values, eligible range and values. Each site created an iCARE dataset in accordance with the algorithm and generated a log that tracked iCARE-variable creation for purposes of quality control.
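As an illustration only, the idea of a per-variable specification applied at each site with an accompanying log might be sketched as follows. This is not the iCARE master algorithm: the variable (`birth_weight_g`), the source field (`bw_kg`), the missing-value code, the eligible range, and the helper names `VariableSpec` and `harmonize` are all hypothetical, chosen to show how a name, definition, missing value, and eligible range could drive both the recoding and the quality-control log.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VariableSpec:
    """Hypothetical specification for one harmonized variable."""
    name: str            # harmonized variable name
    definition: str      # plain-language definition shared by all sites
    missing_code: int    # value stored when the source is missing or invalid
    valid_min: float     # lower bound of the eligible range (inclusive)
    valid_max: float     # upper bound of the eligible range (inclusive)


def harmonize(records: list[dict], source_field: str, spec: VariableSpec,
              convert: Callable[[float], float]) -> tuple[list, list]:
    """Build one harmonized variable from a site-specific source field.

    Returns the harmonized values and a log of every record that was
    recoded to the missing code, so the step can be audited later.
    """
    values, log = [], []
    for index, record in enumerate(records):
        raw: Optional[float] = record.get(source_field)
        if raw is None:
            values.append(spec.missing_code)
            log.append((index, spec.name, "missing at source"))
            continue
        value = convert(raw)
        if not (spec.valid_min <= value <= spec.valid_max):
            values.append(spec.missing_code)
            log.append((index, spec.name, f"out of range: {value}"))
            continue
        values.append(value)
    return values, log


# Hypothetical example: a site that records birth weight in kilograms.
spec = VariableSpec(
    name="birth_weight_g",
    definition="Birth weight in grams",
    missing_code=-9,
    valid_min=200,
    valid_max=7000,
)
site_records = [{"bw_kg": 3.4}, {"bw_kg": None}, {"bw_kg": 34.0}]
values, log = harmonize(site_records, "bw_kg", spec,
                        convert=lambda kg: round(kg * 1000))
```

Keeping the log separate from the data means a site can rerun the step and compare logs without touching the source registry extract.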
Conclusions: Data harmonization was achieved for a basic set of variables across all sites and for additional variables across two or more sites. Given the complexity and diversity of material involved in the harmonization process, it will be essential to maintain archival documentation throughout the studies.