JMIR Medical Informatics

Combining Contextualized Embeddings and Prior Knowledge for Clinical Named Entity Recognition: Evaluation Study

Abstract

Background: Named entity recognition (NER) is a key step in clinical natural language processing (NLP). Traditionally, rule-based systems leverage prior knowledge to define rules that identify named entities. More recently, deep learning-based NER systems have become increasingly popular. Contextualized word embeddings, a newer type of word representation, capture word sense dynamically from the surrounding context and have proven successful in many deep learning-based systems in both the general and the medical domains. However, very few studies have investigated the effect of combining multiple contextualized embeddings with prior knowledge on the clinical NER task.

Objective: This study aims to improve the performance of NER in clinical text by combining multiple contextualized embeddings and prior knowledge.

Methods: We investigate the effect of combining multiple contextualized word embeddings with a classic word embedding in deep neural networks to predict named entities in clinical text. We also investigate whether a semantic lexicon can further improve the performance of the clinical NER system.

Results: By combining contextualized embeddings such as ELMo and Flair, our system achieves an F1 score of 87.30% when trained on only a portion of the 2010 Informatics for Integrating Biology and the Bedside (i2b2) NER task dataset. After the medical lexicon was incorporated into the word embedding, the F1 score increased further to 87.44%. Our system also still achieved an F1 score of 85.36% when the training data were reduced to 40%.

Conclusions: Combining contextualized embeddings can benefit the clinical NER task. Moreover, a semantic lexicon can be used to further improve the performance of a clinical NER system.
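The abstract does not say exactly how the embeddings and the lexicon are combined. One common approach, assumed here, is to concatenate each token's vectors from every source (plus a lexicon indicator) before passing them to a sequence tagger. The sketch below is illustrative only: `static_embedder` stands in for a classic embedding such as GloVe, `toy_contextual_embedder` is a hypothetical stand-in for ELMo/Flair, and the dictionary-match lexicon feature is a hypothetical simplification; none of this is the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

# An "embedder" maps a tokenized sentence to one vector per token.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def static_embedder(table: Dict[str, List[float]], dim: int) -> Embedder:
    """Classic (context-free) lookup: a word always gets the same vector."""
    def embed(tokens: Sequence[str]) -> List[List[float]]:
        # Out-of-vocabulary words fall back to a zero vector.
        return [list(table.get(tok.lower(), [0.0] * dim)) for tok in tokens]
    return embed


def toy_contextual_embedder(tokens: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for ELMo/Flair: a token's vector depends on its neighbours.

    Real contextual models run a pretrained language model; this toy version only
    uses the lengths of the previous, current and next token, so it is deterministic.
    """
    vectors = []
    for i, tok in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else "<s>"
        right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        vectors.append([len(left) / 10, len(tok) / 10, len(right) / 10])
    return vectors


def lexicon_features(tokens: Sequence[str], lexicon: Dict[str, str],
                     types: Sequence[str]) -> List[List[float]]:
    """One-hot semantic-type indicator per token from a single-word dictionary lookup."""
    return [[1.0 if lexicon.get(tok.lower()) == t else 0.0 for t in types]
            for tok in tokens]


def stack_features(tokens: Sequence[str], embedders: Sequence[Embedder],
                   lexicon: Dict[str, str], types: Sequence[str]) -> List[List[float]]:
    """Concatenate every embedder's vector and the lexicon one-hot for each token."""
    sources = [embed(tokens) for embed in embedders]
    sources.append(lexicon_features(tokens, lexicon, types))
    return [[x for source in sources for x in source[i]] for i in range(len(tokens))]


if __name__ == "__main__":
    TYPES = ("problem", "test", "treatment")  # concept types of the 2010 i2b2 task
    lexicon = {"pneumonia": "problem", "ct": "test", "aspirin": "treatment"}  # hypothetical lexicon
    glove_like = static_embedder({"aspirin": [0.5, 0.1], "pneumonia": [0.2, 0.9]}, dim=2)
    tokens = ["Start", "aspirin", "for", "pneumonia"]
    feats = stack_features(tokens, [glove_like, toy_contextual_embedder], lexicon, TYPES)
    for tok, vec in zip(tokens, feats):
        print(tok, vec)
```

Each token ends up with a 2 + 3 + 3 = 8-dimensional vector. In a real system the contextual parts would come from pretrained language models, and the stacked vectors would feed a tagger such as a BiLSTM-CRF; concatenation keeps each source's information intact at the cost of a wider input layer.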