Conference: Annual Meeting of the Association for Computational Linguistics; Workshop on Biomedical Natural Language Processing

Phrase2VecGLM: Neural generalized language model-based semantic tagging for complex query reformulation in medical IR

Abstract

In fact-based information retrieval, state-of-the-art performance has traditionally been achieved by knowledge graphs built on knowledge bases, which represent facts about entities and capture the relationships between them well. However, in domains such as medical information retrieval, addressing the specific information needs of complex queries may require understanding query intent by capturing novel associations between potentially latent concepts, and here these systems can fall short. In this work, we develop a novel, completely unsupervised, neural language model-based ranking approach for semantic tagging of documents. The document to be tagged is used as a query into the model to retrieve candidate phrases from top-ranked related documents, thus associating every document with novel related concepts extracted from the text. To this end, we extend the word embedding-based generalized language model (GLM) of Ganguly et al. (2015) to employ phrasal embeddings, and use the semantic tags thus obtained for downstream query expansion, both directly and in feedback loop settings. Evaluated on the TREC 2016 clinical decision support challenge dataset, our method shows statistically significant improvement not only over various baselines that use standard MeSH terms and UMLS concepts for query expansion, but also over baselines that use human expert-assigned concept tags for the queries, all on top of a standard Okapi BM25-based document retrieval system.
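
The tagging idea described in the abstract (use the document as a query, rank related documents with an embedding-smoothed language model, then harvest phrases from the top-ranked ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy embeddings, the corpus, the interpolation weights `lam` and `alpha`, the unnormalized similarity term, and the vote-based tag ordering are all assumptions made for illustration, and the scoring is a simplified stand-in for the GLM rather than its exact formulation.

```python
import math
from collections import Counter

# Toy phrase embeddings; the vectors are made up for illustration only.
EMB = {
    "heart attack": [0.9, 0.1, 0.0],
    "myocardial infarction": [0.88, 0.15, 0.05],
    "chest pain": [0.7, 0.3, 0.1],
    "aspirin": [0.5, 0.6, 0.1],
    "beta blocker": [0.55, 0.6, 0.2],
    "skin rash": [0.05, 0.2, 0.9],
    "antihistamine": [0.1, 0.4, 0.85],
}

# Hypothetical corpus: each document is a list of already-extracted phrases.
CORPUS = [
    ["heart attack", "chest pain", "aspirin"],
    ["myocardial infarction", "chest pain", "beta blocker"],
    ["skin rash", "antihistamine"],
    ["myocardial infarction", "aspirin", "beta blocker"],
]


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def phrase_score(phrase, doc, coll_counts, coll_len, lam=0.5, alpha=0.3):
    """Interpolate exact-match, embedding-similarity, and collection evidence.

    Simplified assumption: the similarity term is an unnormalized weighted
    average, so the result is a smoothing score, not a true probability.
    """
    counts = Counter(doc)
    p_mle = counts[phrase] / len(doc)
    p_emb = sum(
        cosine(EMB[phrase], EMB[other]) * n / len(doc)
        for other, n in counts.items()
    )
    p_coll = coll_counts[phrase] / coll_len
    return lam * p_mle + alpha * p_emb + (1 - lam - alpha) * p_coll


def rank_documents(query_idx, corpus):
    """Rank all other documents, using the document at query_idx as the query."""
    coll_counts = Counter(p for doc in corpus for p in doc)
    coll_len = sum(coll_counts.values())
    scores = {}
    for idx, doc in enumerate(corpus):
        if idx == query_idx:
            continue
        scores[idx] = sum(
            math.log(phrase_score(p, doc, coll_counts, coll_len))
            for p in corpus[query_idx]
        )
    return sorted(scores, key=scores.get, reverse=True)


def tag_document(query_idx, corpus, top_k=2):
    """Collect phrases from the top_k ranked documents that the query lacks."""
    seen = set(corpus[query_idx])
    votes = Counter()
    for idx in rank_documents(query_idx, corpus)[:top_k]:
        votes.update(p for p in set(corpus[idx]) if p not in seen)
    # Order by number of supporting documents, then alphabetically.
    return [p for p, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]


ranking = rank_documents(0, CORPUS)
tags = tag_document(0, CORPUS, top_k=2)
print(ranking, tags)
```

In this toy setup the cardiology document is tagged with phrases it never mentions ("myocardial infarction", "beta blocker"), which could then be appended to a query as expansion terms before a standard retrieval pass. The dermatology document ranks last and contributes nothing.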