20292
A Large-Scale Analysis of ASD Cases Using Electronic Medical Records
Objectives: Our objectives were:
a) to develop a method of identifying ASD cases that combines the established approach of querying ICD codes with natural language processing of the clinical notes, to provide a more accurate assessment of case status;
b) to investigate demographic and socio-economic factors influencing the age of first ASD diagnosis and follow-up; and
c) to follow the temporal evolution of diagnosis ("diagnostic shift") as extracted by these methods.
Methods: We queried the Epic-hosted EMR of Cincinnati Children's Hospital Medical Center (CCHMC) to identify all patients with a diagnosis of ASD. We included every patient with a 299.* ICD-9 code anywhere in their EMR between 2009 and 2014 that had been recorded by CCHMC's Division of Developmental Disabilities and Behavioral Pediatrics (DDBP). DDBP houses a specialized ASD diagnosis and treatment center with a rigorous, uniform ASD assessment pipeline. We used several natural language processing (NLP) systems to confirm or refute the ASD diagnosis in the clinical narrative, including the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES), yTEX, and MetaMap. We assessed the accuracy of case selection by ICD-9 code and by NLP diagnosis extraction by comparing each against a manual review of the clinical notes. We also traced the temporal sequence of diagnoses for each patient across multiple DDBP encounters. Finally, we correlated our findings with the patients' demographic and socio-economic status. The data will additionally be combined with hospital-based controls, including children with other developmental disabilities and neurotypical children, to facilitate multivariate analyses.
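The case-selection, validation, and diagnosis-sequence steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: it assumes a hypothetical flattened record layout (`DiagnosisRecord`, holding a patient ID, ICD-9 code, recording division, and encounter date), since the EMR schema and query are not described here. The NLP step is represented only by a set of patient IDs that some NLP system flagged as ASD-positive; how cTAKES, yTEX, or MetaMap are invoked is deliberately not shown. All function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout; the real EMR schema is not described in the abstract.
@dataclass(frozen=True)
class DiagnosisRecord:
    patient_id: str
    icd9_code: str        # e.g. "299.00"
    division: str         # recording division, e.g. "DDBP"
    encounter_date: date


def icd9_candidates(records, division="DDBP", first_year=2009, last_year=2014):
    """Patients with any 299.* ICD-9 code recorded by `division` within the year window."""
    return {
        r.patient_id
        for r in records
        if r.icd9_code.startswith("299")
        and r.division == division
        and first_year <= r.encounter_date.year <= last_year
    }


def confirmed_cases(icd9_ids, nlp_positive_ids):
    """Assumed case definition: ICD-9 candidate AND ASD confirmed by NLP.

    `nlp_positive_ids` is a placeholder for the output of an NLP system.
    """
    return icd9_ids & nlp_positive_ids


def agreement(selected, manual_positive, reviewed):
    """Compare a selection method with manual chart review on the reviewed subsample."""
    selected = selected & reviewed
    manual_positive = manual_positive & reviewed
    tp = len(selected & manual_positive)
    fp = len(selected - manual_positive)
    fn = len(manual_positive - selected)
    tn = len(reviewed) - tp - fp - fn  # reviewed patients flagged by neither
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "accuracy": (tp + tn) / len(reviewed) if reviewed else float("nan"),
    }


def diagnosis_sequence(records, patient_id):
    """Chronological diagnosis codes for one patient, collapsing consecutive repeats."""
    sequence = []
    patient_records = (r for r in records if r.patient_id == patient_id)
    for r in sorted(patient_records, key=lambda r: r.encounter_date):
        if not sequence or sequence[-1] != r.icd9_code:
            sequence.append(r.icd9_code)
    return sequence
```

Collapsing consecutive repeats in `diagnosis_sequence` keeps only the points where the recorded diagnosis changes, which is one plausible way to expose a diagnostic shift; other definitions (for example, keeping every encounter) would work equally well with the same record layout.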
Results: Our sample comprised 3,878 ASD cases selected by ICD-9 code and confirmed with NLP: 3,159 males and 719 females (a ratio of roughly 4:1), aged 2 to 18 years. The racial composition was 78.3% white, 10.6% black, 5.7% other, 2.3% two or more races, 1.6% Asian, 1.2% unknown, and 0.23% Native Hawaiian or American Indian/Alaska Native. A subsample of n = 100 cases was validated by comparing ICD-9 codes and NLP diagnosis extraction against a manual review of the clinical notes. As shown in Fig. 1, there is an indication of a discrepancy between the number of African American children in our cohort and the number expected from U.S. Census data. Certain geographic areas appear to be particularly affected.
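One simple way to quantify a discrepancy like the one reported for Fig. 1 is to compare the observed count of a group in the cohort with the count expected from its population share. The Python sketch below computes an observed/expected ratio and a one-proportion z-score using the normal approximation to the binomial. The abstract does not state which statistic was used, so this choice is an assumption; the function name `representation` is hypothetical, and the population share (for example, from U.S. Census data for a given area) must be supplied by the analyst. No values from Fig. 1 are reproduced here.

```python
import math


def representation(observed, cohort_size, population_share):
    """Observed/expected ratio and normal-approximation z-score for one group.

    expected = cohort_size * population_share, where population_share is the
    group's share of the reference population (e.g. from Census data).
    """
    if not 0 < population_share < 1:
        raise ValueError("population_share must be strictly between 0 and 1")
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    expected = cohort_size * population_share
    # Binomial standard deviation of the count under the population share.
    sd = math.sqrt(cohort_size * population_share * (1 - population_share))
    return observed / expected, (observed - expected) / sd
```

Applied separately to each geographic area (the choice of areal unit, such as county or ZIP code, is an open design decision), ratios well below 1 together with large negative z-scores would flag the areas where a group appears under-represented. The normal approximation is reasonable only when the expected counts are not small; an exact binomial test would be the safer choice for sparsely populated areas.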
Conclusions: Preliminary results suggest a slight racial bias, with African American children under-represented in some, but not all, geographic areas. Further multivariate analyses will control for age, IQ, gender, and additional socio-economic factors; we will also evaluate diagnostic trends over time.