Exploiting query features in language modeling approach for information retrieval.


Abstract

Recent advances in Information Retrieval (IR) are based on statistical language models. Most retrieval experiments demonstrating the language modeling approach use smoothed unigram language models that exploit only term-occurrence statistics in probability estimation. Experiments with additional features such as bigrams have met with limited success. However, language models incorporating n-grams, word triggers, topic of discourse, and syntactic and semantic features have shown significant improvements in speech recognition.

The main thrust of this dissertation is to identify the need to design language models for IR that satisfy its specific modeling requirements, and to demonstrate this by designing language models that (1) incorporate IR-specific features (the biterm language model), (2) correspond to better document and query representations (the concept language model), and (3) combine evidence from different information sources (language features) toward modeling the relevance of a document to a given query (maximum entropy language models for IR).

Illustrating the difference between the language modeling requirements of speech recognition and information retrieval, the dissertation proposes the biterm language model, which identifies term co-occurrence, rather than the order of term occurrence, as an important feature for IR. Biterm language models handle local variation in the surface form of the words that express a concept of interest. It is, however, these concepts that need to be modeled in queries to improve retrieval performance. The concept language models proposed here model the user's information need as a sequence of concepts and the query as an expression of such concepts of interest. Empirical results demonstrate significant improvements in retrieval performance.

While mixture models, which combine statistical evidence from different information sources to estimate a probability distribution, are easy to implement, they seem to make suboptimal use of their components.
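For context, the smoothed unigram query-likelihood baseline the abstract refers to can be sketched roughly as follows. This is a minimal illustration assuming Dirichlet smoothing, one common smoothing choice; the function name and the prior `mu` are illustrative, not taken from the dissertation:

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, mu=2000.0):
    """Rank score: log p(q|d) under a Dirichlet-smoothed unigram model.

    p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu), where p(w|C) is the
    word's relative frequency in the whole collection.
    """
    doc_tf = Counter(doc)          # term counts in the document
    coll_tf = Counter(collection)  # term counts in the collection
    doc_len, coll_len = len(doc), len(collection)
    score = 0.0
    for w in query:
        p_wc = coll_tf[w] / coll_len                     # collection model
        p_wd = (doc_tf[w] + mu * p_wc) / (doc_len + mu)  # smoothed doc model
        if p_wd > 0:                                     # skip unseen vocabulary
            score += math.log(p_wd)
    return score
```

Documents are ranked by this log-likelihood; a biterm model, as proposed in the dissertation, would additionally score unordered word pairs rather than single terms only.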
A natural method of combining information sources based on the Maximum Entropy Principle, which has been shown to be effective in speech recognition, is proposed here as a solution to the information retrieval problem. In the context of document-likelihood models, the maximum entropy language model for information retrieval provides a better mechanism for incorporating external knowledge and additional syntactic and semantic features of the language into language models for IR.
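The maximum entropy combination described above amounts to a log-linear model over feature functions. A rough sketch of ranking with such a model follows; the feature names and weights are invented for illustration, and the dissertation's actual features and training procedure are not reproduced here:

```python
import math

def maxent_rank(doc_features, weights):
    """Conditional log-linear (maximum entropy) ranking:
    p(d|q) = exp(sum_i lambda_i * f_i(d, q)) / Z(q).
    """
    # Unnormalized log score: weighted sum of feature values per document.
    log_scores = {d: sum(weights[f] * v for f, v in feats.items())
                  for d, feats in doc_features.items()}
    z = sum(math.exp(s) for s in log_scores.values())  # partition function Z(q)
    return {d: math.exp(s) / z for d, s in log_scores.items()}
```

In practice the weights lambda_i would be trained (e.g. by iterative scaling) to maximize likelihood on relevance judgments; each feature f_i might be the log score from a component model such as the unigram or biterm model.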

Record Details

  • Author

    Srikanth, Munirathnam.

  • Affiliation

    State University of New York at Buffalo.

  • Degree grantor: State University of New York at Buffalo.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 151 p.
  • Total pages: 151
  • Format: PDF
  • Language: eng
  • CLC classification: Automation technology, computer technology
  • Keywords
