Journal: Knowledge and Information Systems

Detecting duplicate biological entities using Markov random field-based edit distance


Abstract

Detecting duplicate entities in biological data is an important research task. In this paper, we propose a novel and context-sensitive Markov random field-based edit distance (MRFED) for this task. We apply the Markov random field theory to the Needleman–Wunsch distance and combine MRFED with TFIDF, a token-based distance algorithm, resulting in SoftMRFED. We compare SoftMRFED with other distance algorithms such as Levenshtein, SoftTFIDF, and Monge–Elkan for two matching tasks: biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other edit distance algorithms on several test data collections. In addition, the performance of SoftMRFED is superior to token-based distance algorithms in two matching tasks.
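For orientation, the baselines named in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's MRFED or SoftMRFED: it shows a plain Needleman–Wunsch alignment cost (which reduces to Levenshtein distance when substitutions and gaps both cost 1) and a simple Monge–Elkan style token score built on top of it. The function names, cost parameters, and the normalisation by the longer string length are assumptions made for this example.

```python
def needleman_wunsch(a, b, match=0, mismatch=1, gap=1):
    """Global alignment cost between a and b (lower means more similar).

    With the default unit costs this equals the Levenshtein distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        cost[i][0] = i * gap
    for j in range(1, cols):
        cost[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                cost[i - 1][j] + gap,      # gap in b
                cost[i][j - 1] + gap,      # gap in a
            )
    return cost[-1][-1]


def similarity(a, b):
    """Map the unit-cost alignment distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - needleman_wunsch(a, b) / longest


def monge_elkan(tokens_a, tokens_b):
    """Average, over tokens of A, of the best similarity to any token of B."""
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(similarity(x, y) for y in tokens_b) for x in tokens_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    print(needleman_wunsch("kitten", "sitting"))  # 3
    print(monge_elkan("protein kinase C".split(), "kinase C protein".split()))
```

In this sketch the edit cost of each operation is fixed and independent of neighbouring characters; the abstract's "context-sensitive" wording suggests that MRFED relaxes exactly this assumption, though the details are only given in the full paper.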