Journal: BMC Medical Informatics and Decision Making

Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition



Abstract

This paper presents a conditional random fields (CRF) method that captures specific high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical entities in a sentence are usually separated from each other, and the textual descriptions in clinical narrative documents frequently indicate causal or posterior relationships that can be used to facilitate clinical named entity recognition. However, the CRF generally used for named entity recognition is a first-order model, which restricts label transition dependencies to adjoining labels under the Markov assumption. Building on the first-order structure, our proposed model uses the non-entity tokens between separated entities as an information transmission medium by applying a label induction method. The model is referred to as a precursor-induced CRF because its non-entity state memorizes precursor entity information, and the model's structure allows that information to propagate forward through the label sequence. We compared the proposed model with both first- and second-order CRFs in terms of F1 score, using two clinical named entity recognition corpora (the i2b2 2012 challenge and the Seoul National University Hospital electronic health record). The proposed model demonstrated better entity recognition performance than both the first- and second-order CRFs and was also more efficient than the higher-order model. The proposed precursor-induced CRF, which uses non-entity labels as label transition information, improves the entity recognition F1 score by exploiting long-distance transition factors without exponentially increasing the computational time. In contrast, a conventional second-order CRF model that uses longer-distance transition factors showed even worse results than the first-order model and required the longest computation time. Thus, the proposed model could offer a considerable performance improvement over current clinical named entity recognition methods based on CRF models.
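The label induction idea described in the abstract can be illustrated with a small sketch. The abstract does not give the exact label encoding, so everything here is an assumption made for illustration: a BIO tagging scheme, the `O|TYPE` naming for induced non-entity labels, the entity type names, and the helper names `induce_precursor_labels` and `strip_precursor_labels`. The sketch only shows how non-entity labels might be rewritten to remember the preceding entity type; it is not the authors' implementation.

```python
from typing import List, Optional

OUTSIDE = "O"  # non-entity label in an assumed BIO scheme


def induce_precursor_labels(labels: List[str], sep: str = "|") -> List[str]:
    """Rewrite each non-entity label so it carries the most recent entity type.

    Hypothetical encoding: "O" after a PROBLEM entity becomes "O|PROBLEM".
    Non-entity labels before the first entity stay as plain "O".
    """
    induced: List[str] = []
    precursor: Optional[str] = None  # type of the most recent entity, if any
    for label in labels:
        if label == OUTSIDE:
            induced.append(OUTSIDE if precursor is None else f"{OUTSIDE}{sep}{precursor}")
        else:
            # "B-PROBLEM" or "I-PROBLEM" -> "PROBLEM"
            _, _, entity_type = label.partition("-")
            precursor = entity_type or label
            induced.append(label)
    return induced


def strip_precursor_labels(labels: List[str], sep: str = "|") -> List[str]:
    """Map induced labels back to the original label set for scoring."""
    return [OUTSIDE if label.startswith(OUTSIDE + sep) else label for label in labels]


if __name__ == "__main__":
    gold = ["O", "B-PROBLEM", "I-PROBLEM", "O", "O", "B-TREATMENT", "O"]
    induced = induce_precursor_labels(gold)
    print(induced)
    print(strip_precursor_labels(induced) == gold)
```

Under this assumed encoding, a first-order transition such as `O|PROBLEM` to `B-TREATMENT` lets an ordinary first-order CRF see which entity type came before the gap, even though the two entities are several tokens apart. The label set grows only by one extra non-entity label per entity type, which fits the abstract's claim that long-distance information is used without the cost of a higher-order model.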
