
Cluster-based Query Expansion Using Language Modeling for Biomedical Literature Retrieval.


Abstract

The enormous volume of biomedical literature, scientists' specific information needs, long multi-word terms, and the fundamental problems of synonymy and polysemy have long challenged researchers in the biomedical information retrieval community. Search engines have significantly improved the efficiency and effectiveness of biomedical literature searching. They are, however, known to return many results that are irrelevant to the intent of a user's query; in other words, they do not perform well in terms of precision and recall. To further improve the precision and recall of biomedical information retrieval, various query expansion strategies are widely used. In this thesis, we concentrate on empirical comparison, experiments, and evaluation in investigating query expansion methods, and we use the findings as an empirical justification for cluster-based query expansion. We have broadly investigated many query expansion methods, such as local analysis, global analysis, and ontology-based term reweighting, across various search engines and obtained important insights. Among the results, we present a two-stage concept-based latent semantic analysis strategy and cluster-based query expansion; the proposed method uses the Singular Value Decomposition (SVD) technique from Latent Semantic Indexing (LSI). In contrast to other query expansion methods, our strategy selects the terms that are most similar to the concepts in the query as well as in the related documents, rather than terms that are similar to the query terms only. Furthermore, we propose a novel framework for cluster-based query expansion, and we have designed and implemented a novel and efficient computational approach to cluster-based query expansion using language modeling.
Through our experiments on the TREC Genomics Track ad hoc retrieval task, we demonstrate that clusters created from either the whole collection or the documents initially returned for the original query can be used to perform query expansion and ultimately improve the overall effectiveness and performance of an information retrieval system for biomedical literature. Lastly, we believe the principles of this strategy may be extended and applied in other domains.

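Bibliographic record

  • Author

    Xu, Xuheng

  • Author affiliation

    Drexel University

  • Degree-granting institution: Drexel University
  • Subject: Information Science
  • Degree: Ph.D.
  • Year: 2011
  • Pagination: 103 p.
  • Total pages: 103
  • Original format: PDF
  • Language of text: English (eng)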

