Conference: Annual Meeting of the Association for Computational Linguistics
Phrase2VecGLM: Neural generalized language model-based semantic tagging for complex query reformulation in medical IR

Abstract

In fact-based information retrieval, state-of-the-art performance is traditionally achieved by knowledge graphs driven by knowledge bases, as they can represent facts about entities and capture relationships between them very well. However, in domains such as medical information retrieval, where addressing the specific information needs of complex queries may require understanding query intent by capturing novel associations between potentially latent concepts, these systems can fall short. In this work, we develop a novel, completely unsupervised, neural language model-based ranking approach for semantic tagging of documents. The document to be tagged is used as a query into the model to retrieve candidate phrases from top-ranked related documents, thus associating every document with novel related concepts extracted from the text. To do this, we extend the word embedding-based generalized language model (GLM) of Ganguly et al. (2015) to employ phrasal embeddings, and use the semantic tags thus obtained for downstream query expansion, both directly and in feedback loop settings. Our method, evaluated using the TREC 2016 clinical decision support challenge dataset, shows statistically significant improvement not only over various baselines that use standard MeSH terms and UMLS concepts for query expansion, but also over baselines using human expert-assigned concept tags for the queries, on top of a standard Okapi BM25-based document retrieval system.
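The tagging pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' Phrase2VecGLM model: it replaces the generalized language model with plain cosine similarity over phrase-embedding centroids, and the corpus `DOCS`, the two-dimensional vectors in `EMB`, and the function names `tag_document` and `expand_query` are all hypothetical, chosen only to show the flow of "use the document as a query, rank related documents, harvest candidate phrases, then expand a query with the resulting tags".

```python
import math

# Toy phrase embeddings (hypothetical values, not learned from any corpus).
EMB = {
    "chest pain": [1.0, 0.1],
    "myocardial infarction": [0.9, 0.2],
    "troponin": [0.8, 0.3],
    "skin rash": [0.1, 1.0],
    "dermatitis": [0.2, 0.9],
    "antihistamine": [0.3, 0.8],
}

# Toy corpus: each document is already segmented into phrases.
DOCS = {
    "d1": ["chest pain", "myocardial infarction"],
    "d2": ["myocardial infarction", "troponin"],
    "d3": ["skin rash", "dermatitis", "antihistamine"],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is empty or zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(phrases, emb):
    """Mean embedding of the phrases that have a vector."""
    vecs = [emb[p] for p in phrases if p in emb]
    return [sum(col) / len(vecs) for col in zip(*vecs)] if vecs else []


def tag_document(doc_id, docs, emb, top_docs=2, top_tags=2):
    """Use one document as a query and return candidate tags for it.

    Simplification: similarity is cosine over centroids, standing in for
    the language-model scoring used in the paper.
    """
    query_vec = centroid(docs[doc_id], emb)
    # Rank every other document by similarity to the query document.
    ranked = sorted(
        (d for d in docs if d != doc_id),
        key=lambda d: cosine(query_vec, centroid(docs[d], emb)),
        reverse=True,
    )[:top_docs]
    # Candidate tags: phrases from top-ranked documents that the query
    # document does not already contain.
    own = set(docs[doc_id])
    candidates = {p for d in ranked for p in docs[d] if p not in own and p in emb}
    # Highest similarity first; ties broken alphabetically for determinism.
    scored = sorted(candidates, key=lambda p: (-cosine(query_vec, emb[p]), p))
    return scored[:top_tags]


def expand_query(query_terms, tags):
    """Append tags that are not already in the query (direct expansion)."""
    return list(query_terms) + [t for t in tags if t not in query_terms]


if __name__ == "__main__":
    tags = tag_document("d1", DOCS, EMB, top_docs=1, top_tags=1)
    print(expand_query(["chest pain"], tags))
```

In a setup like the one evaluated in the abstract, the expanded term list would then be passed to a separate retrieval system such as Okapi BM25; that retrieval step is outside this sketch.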
