Journal of Integrative Bioinformatics

Supervised Learning and Knowledge-Based Approaches Applied to Biomedical Word Sense Disambiguation


Abstract

Word sense disambiguation (WSD) is an important step in biomedical text mining: it assigns an unequivocal concept to an ambiguous term, which improves the accuracy of biomedical information extraction systems. In this work we followed supervised and knowledge-based disambiguation approaches, and the supervised approach gave the best results. In the supervised method we used bag-of-words as local features and word embeddings as global features. In the knowledge-based method we combined word embeddings, textual concept definitions extracted from the UMLS database, and concept association values calculated from MeSH co-occurrence counts in MEDLINE articles. In the knowledge-based method we also tested different word embedding averaging functions for calculating the surrounding context vectors, with the goal of giving more importance to the words closest to the ambiguous term. We evaluated our methods on the MSH WSD dataset, the most common dataset for evaluating biomedical concept disambiguation. We obtained a top accuracy of 95.6% by supervised means, while the best knowledge-based accuracy was 87.4%. Our results show that word embedding models improved disambiguation accuracy, proving to be a powerful resource for the WSD task.
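The knowledge-based idea described above (a context vector built from word embeddings, weighted toward the words nearest the ambiguous term, then compared with concept definitions) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-dimensional toy embeddings, the inverse-distance weighting, the concept labels `common_cold` and `cold_temperature`, and the short definition strings. The abstract does not say which averaging functions or similarity measure the authors used, and the sketch leaves out the MeSH co-occurrence association values entirely.

```python
import math

# Toy 2-D embeddings (hypothetical values, not taken from any real model).
EMBEDDINGS = {
    "patient": [0.9, 0.1], "virus": [1.0, 0.0], "infection": [0.9, 0.2],
    "fever": [0.8, 0.1], "temperature": [0.1, 0.9], "winter": [0.0, 1.0],
    "weather": [0.1, 1.0], "low": [0.2, 0.8],
}

# Hypothetical concept definitions standing in for UMLS definition text.
DEFINITIONS = {
    "common_cold": "virus infection fever",
    "cold_temperature": "low temperature weather",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_average(tokens, weights, embeddings):
    """Weighted mean of the embeddings of known tokens; unknown tokens are skipped."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token, weight in zip(tokens, weights):
        vec = embeddings.get(token)
        if vec is None:
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def context_vector(tokens, target_index, embeddings):
    """Context vector with inverse-distance weights (an assumed weighting scheme)."""
    context, weights = [], []
    for i, token in enumerate(tokens):
        if i == target_index:
            continue  # the ambiguous term itself is not part of its context
        context.append(token)
        weights.append(1.0 / abs(i - target_index))
    return weighted_average(context, weights, embeddings)


def disambiguate(tokens, target_index, embeddings, definitions):
    """Return the concept whose definition vector is most similar to the context."""
    ctx = context_vector(tokens, target_index, embeddings)
    scores = {}
    for concept, definition in definitions.items():
        words = definition.split()
        def_vec = weighted_average(words, [1.0] * len(words), embeddings)
        scores[concept] = cosine(ctx, def_vec)
    return max(scores, key=scores.get), scores


sentence = "the virus gave the patient a cold and fever".split()
best, scores = disambiguate(sentence, sentence.index("cold"), EMBEDDINGS, DEFINITIONS)
print(best)
```

In this toy sentence the nearby words "patient" and "fever" receive the largest weights, so the context vector points toward the disease-related definition. Swapping the inverse-distance weights for uniform weights gives a plain average, which is one way to compare averaging functions on the same input.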
