Exploiting query features in language modeling approach for information retrieval.


Abstract

Recent advances in Information Retrieval (IR) are based on statistical language models. Most retrieval experiments demonstrating the language modeling approach use smoothed unigram language models that exploit only term-occurrence statistics in probability estimation. Experiments with additional features such as bigrams have met with limited success. However, language models incorporating n-grams, word triggers, topic of discourse, and syntactic and semantic features have shown significant improvements in speech recognition.

The main thrust of this dissertation is to identify the need to design language models for IR that satisfy its specific modeling requirements, and to demonstrate this by designing language models that (1) incorporate IR-specific features (the biterm language model), (2) correspond to better document and query representations (the concept language model), and (3) combine evidence from different information sources (language features) toward modeling the relevance of a document to a given query (maximum entropy language models for IR).

Illustrating the difference between the language modeling requirements of speech recognition and information retrieval, the dissertation proposes the biterm language model, which identifies term co-occurrence, rather than the order of term occurrence, as an important feature for IR. Biterm language models handle local variation in the surface form of the words that express a concept of interest. It is, however, these concepts that need to be modeled in queries to improve retrieval performance. The concept language models proposed here model the user's information need as a sequence of concepts and the query as an expression of such concepts of interest. Empirical results demonstrate significant improvements in retrieval performance.

While mixture models, which combine statistical evidence from different information sources to estimate a probability distribution, are easy to implement, they seem to make suboptimal use of their components.
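For context, the smoothed unigram query-likelihood baseline the abstract refers to can be sketched roughly as follows. This is a minimal illustration assuming Dirichlet smoothing, one common smoothing choice; the function name and the prior `mu` are illustrative, not taken from the dissertation:

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, mu=2000.0):
    """Rank score: log p(q|d) under a Dirichlet-smoothed unigram model.

    p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu), where p(w|C) is the
    word's relative frequency in the whole collection.
    """
    doc_tf = Counter(doc)          # term counts in the document
    coll_tf = Counter(collection)  # term counts in the collection
    doc_len, coll_len = len(doc), len(collection)
    score = 0.0
    for w in query:
        p_wc = coll_tf[w] / coll_len                     # collection model
        p_wd = (doc_tf[w] + mu * p_wc) / (doc_len + mu)  # smoothed doc model
        if p_wd > 0:                                     # skip unseen vocabulary
            score += math.log(p_wd)
    return score
```

Documents are ranked by this log-likelihood; a biterm model, as proposed in the dissertation, would additionally score unordered word pairs rather than single terms only.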
A natural method of combining information sources based on the Maximum Entropy Principle, which has been shown to be effective in speech recognition, is proposed here as a solution to the information retrieval problem. In the context of document-likelihood models, the maximum entropy language model for information retrieval provides a better mechanism for incorporating external knowledge and additional syntactic and semantic features of the language into language models for IR.
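The maximum entropy combination described above amounts to a log-linear model over feature functions. A rough sketch of ranking with such a model follows; the feature names and weights are invented for illustration, and the dissertation's actual features and training procedure are not reproduced here:

```python
import math

def maxent_rank(doc_features, weights):
    """Conditional log-linear (maximum entropy) ranking:
    p(d|q) = exp(sum_i lambda_i * f_i(d, q)) / Z(q).
    """
    # Unnormalized log score: weighted sum of feature values per document.
    log_scores = {d: sum(weights[f] * v for f, v in feats.items())
                  for d, feats in doc_features.items()}
    z = sum(math.exp(s) for s in log_scores.values())  # partition function Z(q)
    return {d: math.exp(s) / z for d, s in log_scores.items()}
```

In practice the weights lambda_i would be trained (e.g. by iterative scaling) to maximize likelihood on relevance judgments; each feature f_i might be the log score from a component model such as the unigram or biterm model.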

Record Details

  • Author

    Srikanth, Munirathnam.

  • Affiliation

    State University of New York at Buffalo.

  • Degree grantor: State University of New York at Buffalo.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 151 p.
  • Total pages: 151
  • Format: PDF
  • Language: eng
  • CLC classification: Automation technology, computer technology
  • Keywords
