Objectives: We aim to build a publication search engine that identifies target disorders and the human genes investigated in connection with them. Here we apply it to autism: the goal is to retrieve, with minimal human intervention, all original research articles and reviews in PubMed that examine links between autism and genotype data. The primary output of the pipeline is a list of autism candidate genes with significant results, together with the supporting publications and the main statements of those publications.
Methods: First, we built a disorder-specific PubMed query from MeSH terms, expanded disorder aliases, and publication-type filters to retrieve autism-related research articles. Next, we selected the genetics-related publications among them by matching genetics keywords extracted from a training set of genetics-focused articles. We then built and applied a rule-based text-mining algorithm that analyzes titles, abstracts, and MeSH terms to identify human gene symbols, negation/structures in the title and abstract text, and characteristics of the study (e.g., linkage analysis, gene expression, genome-wide association, copy number variation). Finally, we identified the main candidate gene(s) of each publication using the structural information obtained in the previous step, and assessed the collective significance of each candidate gene based on the number and importance of related publications and the type of study.
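The query-building and rule-based annotation steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the alias list, negation cues, study-type keywords, toy gene lexicon, and the function names `build_query` and `analyze` are all assumptions made for illustration. Only the PubMed field tags (`[MeSH Terms]`, `[Title/Abstract]`, `[Publication Type]`) are standard PubMed search syntax.

```python
import re

# All lists below are hypothetical examples; the abstract does not give the real ones.
DISORDER_ALIASES = ["autism", "autistic disorder", "autism spectrum disorder"]
NEGATION_CUES = ("no association", "not associated", "no evidence", "failed to replicate")
STUDY_TYPE_KEYWORDS = {
    "linkage": ("linkage analysis", "linkage study"),
    "expression": ("gene expression", "mrna level"),
    "gwas": ("genome-wide association",),
    "cnv": ("copy number variation", "copy number variant"),
}
# Toy lexicon; a real pipeline would load a full table of approved human gene symbols.
KNOWN_GENE_SYMBOLS = {"SHANK3", "NLGN4X", "MECP2"}


def build_query(mesh_term, aliases):
    """Combine a MeSH term, free-text aliases, and publication-type filters."""
    alias_clause = " OR ".join(f'"{alias}"[Title/Abstract]' for alias in aliases)
    type_clause = '"Journal Article"[Publication Type] OR "Review"[Publication Type]'
    return f'("{mesh_term}"[MeSH Terms] OR {alias_clause}) AND ({type_clause})'


def analyze(title, abstract):
    """Annotate one record with gene symbols, a negation flag, and study types."""
    text = f"{title} {abstract}"
    # Candidate symbols: runs of 2-10 uppercase letters/digits starting with a letter.
    candidates = set(re.findall(r"\b[A-Z][A-Z0-9]{1,9}\b", text))
    genes = sorted(candidates & KNOWN_GENE_SYMBOLS)
    lowered = text.lower()
    negated = any(cue in lowered for cue in NEGATION_CUES)
    study_types = sorted(
        name
        for name, keywords in STUDY_TYPE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )
    return {"genes": genes, "negated": negated, "study_types": study_types}


query = build_query("Autistic Disorder", DISORDER_ALIASES)
record = analyze(
    "No association between SHANK3 variants and autism",
    "A genome-wide association study found no evidence of association.",
)
```

Matching candidate tokens against a known-symbol lexicon, rather than trusting the regular expression alone, keeps common uppercase abbreviations from being counted as genes. Simple substring cues like these will miss many negation patterns, so they should be read as a placeholder for the rule set the abstract describes.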
Results: Of 20,921 autism-related articles in PubMed, we identified 12,900 research-oriented publications, of which 5,155 were genetics-related, including 959 reviews. About half of these (2,542) included names or symbols that mapped to human genes. Among them, we found 784 research articles (excluding reviews), covering 576 genes, that report a significant test result in either the title or the results/conclusions section of the abstract. We compared our set of candidate genes and supporting publications with those of SFARI (manually curated) and HuGE Navigator Phenopedia (algorithm-based). Our sets cover more publications, including ones reporting negative associations, and we also found a number of significant candidate genes missing from both sites.
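A comparison of this kind reduces to set operations on gene symbols. The sketch below assumes each resource has already been exported as a set of symbols; the function name `compare_gene_sets` and the placeholder symbols are hypothetical and do not reflect the actual contents of any of the three resources.

```python
def compare_gene_sets(ours, reference):
    """Split two gene-symbol sets into shared and resource-specific symbols."""
    return {
        "shared": sorted(ours & reference),
        "only_ours": sorted(ours - reference),
        "only_reference": sorted(reference - ours),
    }


# Placeholder symbols for illustration only, not real results.
pipeline_genes = {"GENE_A", "GENE_B", "GENE_C"}
curated_genes = {"GENE_B", "GENE_C", "GENE_D"}
overlap = compare_gene_sets(pipeline_genes, curated_genes)
```

Genes that land in `only_ours` for every reference resource correspond to the candidates described as missing from both sites.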
Conclusions: We implemented a PubMed search engine that extensively searches and summarizes research articles for a given target disorder, with a focus on human genomics and genetics. This engine enables us to “see” the genetic networks of autism in terms of research publications, and serves as a basis for cross-disorder analysis between autism and other related complex disorders.