Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning

Aryan Arbabi; David R Adams; Sanja Fidler; Michael Brudno

Abstract

Automatic recognition of medical concepts in unstructured text is an important component of many clinical and research applications, and its accuracy has a large impact on electronic health record analysis. Mining medical concepts is complicated by the broad use of synonyms and nonstandard terms in medical documents. We present a machine learning model for concept recognition in large unstructured text that optimizes the use of ontological structures and can identify previously unobserved synonyms for concepts in the ontology.

Specifically, we present a neural dictionary model that can be used to predict whether a phrase is synonymous with a concept in a reference ontology. The model, called the Neural Concept Recognizer (NCR), uses a convolutional neural network to encode input phrases into an embedding space and then ranks medical concepts by their similarity to the encoded phrase in that space. It uses the hierarchical structure of the biomedical ontology as an implicit prior to better learn embeddings of the various terms. We trained the model on two biomedical ontologies: the Human Phenotype Ontology (HPO) and the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT).

We tested the HPO-trained model on two different data sets, 288 annotated PubMed abstracts and 39 clinical reports, and achieved F1-scores 1.7%-3% higher than those of our strongest manually engineered rule-based baselines (P=.003). We also tested the SNOMED-CT-trained model on 2000 intensive care unit discharge summaries from MIMIC (Multiparameter Intelligent Monitoring in Intensive Care) and achieved F1-scores 0.9%-1.3% higher than those of our baseline.

The results of our experiments show the high accuracy of our model as well as the value of using the taxonomy structure of the ontology in concept recognition. Most popular medical concept recognizers rely on rule-based models, which cannot generalize well to unseen synonyms. In addition, most machine learning methods require large corpora of annotated text covering all classes of concepts, which can be extremely difficult to obtain for biomedical ontologies. Without relying on large-scale labeled training data or requiring any custom training, our model generalizes efficiently to new synonyms and performs as well as or better than state-of-the-art methods custom built for specific ontologies.

©Aryan Arbabi, David R Adams, Sanja Fidler, Michael Brudno. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 10.05.2019.
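The abstract summarizes a core mechanism: encode a phrase with a convolutional neural network, score every ontology concept by its similarity to that encoding, and let the ontology's hierarchy act as a prior on the concept embeddings. The following is a minimal plain-Python sketch of that idea, not the authors' NCR implementation. Everything in it is an assumption for illustration: the toy ontology, word vectors, and filter weights are hand-set hypothetical values rather than trained parameters; the hierarchy prior is modeled by summing a concept's raw vector with the raw vectors of all of its ancestors; and similarity is a dot product turned into probabilities with a softmax. The helper names (`ancestors`, `concept_embedding`, `encode_phrase`, `rank_concepts`) are hypothetical.

```python
import math

# Hypothetical toy ontology: concept -> list of parent concepts (is-a edges).
# These names are illustrative labels, not real HPO or SNOMED-CT identifiers.
PARENTS = {
    "phenotypic_abnormality": [],
    "abnormality_of_the_eye": ["phenotypic_abnormality"],
    "abnormality_of_the_ear": ["phenotypic_abnormality"],
    "cataract": ["abnormality_of_the_eye"],
    "hearing_loss": ["abnormality_of_the_ear"],
}

# Hypothetical "raw" per-concept vectors (3 dims); a trained model would learn these.
RAW = {
    "phenotypic_abnormality": [0.0, 0.0, 0.0],
    "abnormality_of_the_eye": [1.0, 0.0, 0.0],
    "abnormality_of_the_ear": [0.0, 1.0, 0.0],
    "cataract": [0.0, 0.0, 1.0],
    "hearing_loss": [0.0, 1.0, 0.0],
}

# Hypothetical 2-dim word vectors; unknown words map to the zero vector.
WORD_VECS = {
    "cloudy": [1.0, 0.0],
    "lens": [0.0, 1.0],
    "reduced": [0.0, -1.0],
    "hearing": [-1.0, 0.0],
}

# Hypothetical convolution filters: each spans 2 words x 2 dims = 4 weights,
# and there are 3 filters, so phrase encodings live in the 3-dim concept space.
KERNEL = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, -1.0, -1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]


def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def concept_embedding(concept, raw):
    """Assumed taxonomy prior: sum the raw vectors of the concept and its ancestors."""
    dim = len(next(iter(raw.values())))
    out = [0.0] * dim
    for a in ancestors(concept):
        out = [x + y for x, y in zip(out, raw[a])]
    return out


def encode_phrase(words, word_vecs, kernel, width=2):
    """1-D convolution over word vectors, then ReLU and max pooling over positions."""
    dim = len(next(iter(word_vecs.values())))
    vecs = [word_vecs.get(w, [0.0] * dim) for w in words]
    while len(vecs) < width:  # pad short phrases so at least one window exists
        vecs.append([0.0] * dim)
    pooled = [0.0] * len(kernel)  # ReLU outputs are >= 0, so 0.0 is a safe start
    for start in range(len(vecs) - width + 1):
        window = [x for v in vecs[start:start + width] for x in v]  # concatenate words
        for f, weights in enumerate(kernel):
            act = max(0.0, sum(w * x for w, x in zip(weights, window)))  # ReLU
            pooled[f] = max(pooled[f], act)  # max pooling over window positions
    return pooled


def rank_concepts(phrase_vec, raw):
    """Score each concept by dot product, then normalize the scores with a softmax."""
    scores = {
        c: sum(p * e for p, e in zip(phrase_vec, concept_embedding(c, raw)))
        for c in raw
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return sorted(((c, e / z) for c, e in exps.items()), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for phrase in (["cloudy", "lens"], ["reduced", "hearing"]):
        vec = encode_phrase(phrase, WORD_VECS, KERNEL)
        best, prob = rank_concepts(vec, RAW)[0]
        print(" ".join(phrase), "->", best, round(prob, 3))
```

Running the sketch ranks `cataract` first for the phrase "cloudy lens" and `hearing_loss` first for "reduced hearing". Because a child's embedding contains its ancestors' vectors, related concepts share components, which is one plausible way a hierarchy can help a model score synonyms it never saw during training.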

Bibliographic details

Source: JMIR Medical Informatics, 2019, Issue 2, 15 pages
Authors: Aryan Arbabi; David R Adams; Sanja Fidler; Michael Brudno
Original format: PDF
Keywords: biomedical ontologies; concept recognition; human phenotype ontology; machine learning; medical text mining; phenotyping

Similar articles

1. A machine learning based approach to identify protected health information in Chinese clinical text [J]. Du Liting, Xia Chenxi, Deng Zhaohua. International Journal of Medical Informatics. 2018, Issue AUGa.
2. Using Biomedical Text as Data and Representation Learning for Identifying Patients with an Osteoarthritis Phenotype in the Electronic Medical Record [J]. Christopher Meaney, Jessica Widdifield, Liisa Jaakkimainen. International Journal of Population Data Science. 2018, Issue 4.
3. Comparing Deep Learning and Classical Machine Learning Approaches for Predicting Inpatient Violence Incidents from Clinical Text [J]. Vincent Menger, Floor Scheepers, Marco Spruit. Applied Sciences. 2018, Issue 6.
4. Identifying Clinical Terms in Free-Text Notes Using Ontology-Guided Machine Learning [C]. Aryan Arbabi, David R. Adams, Sanja Fidler. International Conference on Research in Computational Molecular Biology. 2019.
5. Machine Learning for Drug Development: Integrating Genomic, Chemical, and Clinical Data to Identify Drug Targets, Efficacies, Adverse Events, and Combinations [D]. Neel S. Madhukar. 2017.
6. MLTI-05. Identifying Brain Metastatic Cases from Free Text Clinical Narratives with Refinement of Semantic Heterogeneity Using Machine Learning [O]. Michael Wells, Adam Robin, Laila Poisson. 2019.
7. SocialTERM-Extractor: Identifying and Predicting Social-Problem-Specific Key Noun Terms from a Large Number of Online News Articles Using Text Mining and Machine Learning Techniques [O]. Jong Hwan Suh. 2019.
