Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning


Abstract

Automatic recognition of medical concepts in unstructured text is an important component of many clinical and research applications, and its accuracy has a large impact on electronic health record analysis. The mining of medical concepts is complicated by the broad use of synonyms and nonstandard terms in medical documents.

We present a machine learning model for concept recognition in large unstructured text, which optimizes the use of ontological structures and can identify previously unobserved synonyms for concepts in the ontology.

We present a neural dictionary model that can be used to predict whether a phrase is synonymous with a concept in a reference ontology. Our model, called the Neural Concept Recognizer (NCR), uses a convolutional neural network to encode input phrases into an embedding space and then ranks medical concepts based on their similarity in that space. It uses the hierarchical structure provided by the biomedical ontology as an implicit prior embedding to better learn the embeddings of various terms. We trained our model on two biomedical ontologies: the Human Phenotype Ontology (HPO) and the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT).

We tested our model trained on HPO by using two different data sets: 288 annotated PubMed abstracts and 39 clinical reports. We achieved 1.7%-3% higher F1-scores than those for our strongest manually engineered rule-based baselines (P=.003). We also tested our model trained on SNOMED-CT by using 2000 Intensive Care Unit discharge summaries from MIMIC (Multiparameter Intelligent Monitoring in Intensive Care) and achieved 0.9%-1.3% higher F1-scores than those of our baseline. The results of our experiments show the high accuracy of our model as well as the value of using the taxonomy structure of the ontology in concept recognition.

Most popular medical concept recognizers rely on rule-based models, which cannot generalize well to unseen synonyms. In addition, most machine learning methods typically require large corpora of annotated text that cover all classes of concepts, which can be extremely difficult to obtain for biomedical ontologies. Without relying on large-scale labeled training data or requiring any custom training, our model can be efficiently generalized to new synonyms and performs as well as or better than state-of-the-art methods custom built for specific ontologies.

©Aryan Arbabi, David R Adams, Sanja Fidler, Michael Brudno. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 10.05.2019.
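
The ranking idea in the abstract (encode a phrase, then rank ontology concepts by similarity, with the hierarchy shaping the concept embeddings) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the four-concept ontology, the hand-written vectors, the choice to build each concept's embedding by summing its own vector with those of its ancestors, and the use of cosine similarity. A real system would learn the vectors and obtain the phrase vector from a trained encoder such as the convolutional network the abstract mentions.

```python
import math

# Hypothetical toy ontology: child -> parent (None marks the root).
PARENT = {
    "abnormality": None,
    "heart_abnormality": "abnormality",
    "arrhythmia": "heart_abnormality",
    "kidney_abnormality": "abnormality",
}

# Made-up "raw" vectors, one per concept; a trained model would learn these.
RAW = {
    "abnormality": [0.1, 0.1, 0.1],
    "heart_abnormality": [1.0, 0.0, 0.0],
    "arrhythmia": [0.0, 1.0, 0.0],
    "kidney_abnormality": [0.0, 0.0, 1.0],
}


def ancestors_and_self(concept):
    """Return the concept followed by its chain of ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def concept_embedding(concept):
    """Assumed hierarchy prior: sum the raw vectors of a concept and its ancestors,
    so related concepts share components inherited from common ancestors."""
    dims = len(RAW[concept])
    return [sum(RAW[c][i] for c in ancestors_and_self(concept)) for i in range(dims)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_concepts(phrase_vector):
    """Rank all concepts from most to least similar to the encoded phrase."""
    scores = {c: cosine(phrase_vector, concept_embedding(c)) for c in RAW}
    return sorted(scores, key=scores.get, reverse=True)


# Stand-in for the encoder output of a phrase such as "irregular heartbeat".
ranking = rank_concepts([0.9, 1.0, 0.0])
print(ranking)
```

In this sketch a phrase that was never listed as a synonym can still be matched, because the comparison happens in the shared vector space and not through string lookup, which is the property the abstract attributes to the learned model.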
