20292
A Large-Scale Analysis of ASD Cases Using Electronic Medical Records

Saturday, May 16, 2015: 11:30 AM-1:30 PM
Imperial Ballroom (Grand America Hotel)
N. Connolly1, K. W. Burkett2 and K. A. Bowers3, (1)Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, (2)Developmental and Behavioral Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, (3)Division of Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
Background: Electronic Medical Records (EMRs) are an emerging resource for studies evaluating risk factors, diagnosis trends, co-occurring conditions and other areas of research on autism spectrum disorder (ASD). EMRs hold great potential for a number of reasons, including the amount of data, the ability to integrate data of various types, and the possibility of analyzing large numbers of cases. The latter is particularly relevant for ASD, which is characterized by both clinical and genetic heterogeneity.

Objectives:  Our objectives were as follows:

a) to develop a method of identifying ASD cases by combining past approaches of querying ICD codes with natural language processing of the clinical notes to provide a more accurate assessment of case status;

b) to investigate demographic and socio-economic factors influencing the age of first ASD diagnosis and follow-up; and

c) to follow the temporal evolution of diagnosis ("diagnostic shift") as extracted by the above methods.

Methods: We queried the Epic-hosted EMR of Cincinnati Children’s Hospital Medical Center (CCHMC) to identify all patients with a diagnosis of ASD. We included all patients with a 299.* ICD9 code anywhere in their EMR from 2009-2014 that was recorded by CCHMC's Division of Developmental Disabilities and Behavioral Pediatrics (DDBP). DDBP houses a specialized ASD diagnosis and treatment center and has a rigorous and uniform ASD assessment pipeline. We used several natural language processing (NLP) systems to confirm (or disprove) the ASD diagnosis in the clinical narrative, including the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES), yTEX, and MetaMap. We assessed the accuracy of case selection by ICD9 code and by NLP diagnosis extraction by comparing each to a manual review of the clinical notes. We also traced the temporal sequence of diagnoses for a given patient over multiple DDBP encounters. Finally, we correlated our findings with the patients' demographic and socio-economic status. Data will additionally be combined with hospital-based controls, including children with developmental disabilities and neurotypical children, to facilitate multivariate analyses.
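As an illustration of the case-selection and validation steps described in the Methods, the following is a minimal sketch and not the pipeline used in the study. It assumes that ICD9 codes are available as dotted strings (for example "299.00") and that each method's case status and the manual-review status are available as parallel lists of booleans. The names `has_asd_code` and `validation_metrics` are hypothetical.

```python
import re

# Assumption: codes are stored as dotted strings such as "299.00" or "299.8".
ASD_ICD9_PATTERN = re.compile(r"^299(\.\d{1,2})?$")


def has_asd_code(icd9_codes):
    """Return True if any code in the iterable belongs to the 299.* family."""
    return any(ASD_ICD9_PATTERN.match(code.strip()) for code in icd9_codes)


def validation_metrics(predicted, reference):
    """Compare one method's case labels against manual review.

    predicted: list of bool, case status assigned by ICD9 code or by NLP.
    reference: list of bool, case status from manual chart review.
    Returns sensitivity, positive predictive value, and accuracy;
    a metric is None when its denominator is zero.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    tn = sum(1 for p, r in zip(predicted, reference) if not p and not r)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(predicted)),
    }
```

Running `validation_metrics` once with the ICD9-based labels and once with the NLP-based labels, each against the same manual-review labels, would give a side-by-side comparison of the two selection methods.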

Results: We assembled a sample of 3,878 ASD cases, selected by ICD9 code and confirmed with NLP. The sample consisted of 3,159 male and 719 female (~4:1 ratio) children between the ages of 2 and 18. The racial composition was: 78.3% white, 10.6% black, 5.7% other, 2.3% two or more races, 1.6% Asian, 1.2% unknown, 0.23% Native Hawaiian or Indian/Alaska Native. A subsample of n=100 observations was validated by comparing ICD9 codes and NLP diagnosis extraction to a manual review of the clinical notes. As shown in Fig. 1, there is an indication of a discrepancy between the number of African American children in our cohort and the number expected based on U.S. Census data. Certain geographical areas appear to be particularly affected.
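One simple way to express the observed-versus-expected comparison is a representation ratio per geographic area. The sketch below is an assumption about how such a comparison could be computed, not the method behind Fig. 1. The function names are hypothetical, and the population share must come from actual census data for the area in question.

```python
def expected_count(cohort_size, population_share):
    """Number of cohort members expected from a group, given the group's
    share of the reference population (a fraction between 0 and 1)."""
    if not 0 <= population_share <= 1:
        raise ValueError("population_share must be between 0 and 1")
    return cohort_size * population_share


def representation_ratio(observed, cohort_size, population_share):
    """Observed count divided by expected count.

    Values below 1 indicate lower representation than expected;
    values above 1 indicate higher representation.
    """
    expected = expected_count(cohort_size, population_share)
    if expected == 0:
        raise ValueError("expected count is zero; ratio is undefined")
    return observed / expected
```

For example, with purely hypothetical inputs, an area contributing 400 cases of which 30 belong to a group that makes up 10% of the area's population would have an expected count of 40 and a ratio of 0.75.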

Conclusions: Preliminary results suggest a slight racial bias, with lower representation of African American children in some, but not all, geographic areas. Further multivariate analyses will control for age, IQ, gender and additional socioeconomic factors; we will additionally evaluate diagnostic trends over time.

See more of: Epidemiology