A regression framework for learning to rank in web information retrieval.

Abstract

Machine learning approaches to learning ranking functions have recently attracted much interest from both the web information retrieval community and the machine learning community. They hold the promise of improved search-engine relevance and a reduced need for manual parameter tuning. We focus on developing a regression framework for learning to rank with complex loss functions. More specifically, the framework first applies a functional iterative (boosting) algorithm to compute updates for a given loss function and then fits those updates with a standard regression base learner. Drawing on supervised learning methodology from machine learning, we distinguish two types of relevance judgments used as training data: (1) absolute relevance judgments arising from explicit labeling of query-document pairs; and (2) relative relevance judgments extracted from user click-throughs of search results or converted from absolute relevance judgments.

Within this framework, we propose three novel ranking algorithms and illustrate their application to web search ranking. The first calibrates the existing point-wise (univariate) regression loss to incorporate query differences by introducing nuisance parameters into the statistical model, and we present an alternating optimization method that simultaneously learns the retrieval function and the nuisance parameters; this improves on existing approaches to learning to rank with a point-wise regression loss. The second extends gradient boosting methods from point-wise regression losses to complex (multivariate) loss functions. It is based on optimizing quadratic upper bounds of the loss functions, which allows a rigorous convergence analysis of the algorithm; we illustrate its application to pair-wise preference learning to rank for web search, combining preference data with labeled data. The third is a list-wise approach based on minimum effort optimization that takes into account the entire training data within a query at each iteration. We tackle this optimization problem using functional iterative methods in which the update at each iteration is computed by solving an isotonic regression problem. This more global approach yields faster convergence and significantly improved performance of the learned ranking functions over existing state-of-the-art methods.

Experiments are carried out both on data sets obtained from a commercial search engine and on widely used IR benchmark data, namely OHSUMED and TREC. Our results show significant improvements of the proposed methods over existing state-of-the-art methods.
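The core loop of such a regression framework can be illustrated in a few lines. The sketch below assumes NumPy and scikit-learn, uses a placeholder squared-error loss, and introduces hypothetical helper names; it only shows the generic pattern the abstract describes, namely computing an update (here, the negative functional gradient of the loss) and fitting it with a standard regression base learner.

```python
# Minimal sketch of the generic framework: at each boosting iteration,
# compute an update direction from the loss and fit it with a standard
# regression base learner. DecisionTreeRegressor stands in for the base
# learner; the squared-error loss is only a placeholder for the complex
# losses the dissertation targets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_ranker(X, y, loss_grad, n_rounds=100, shrinkage=0.1, max_depth=3):
    """Fit an additive model F(x) = sum_m shrinkage * h_m(x) by functional
    gradient descent on an arbitrary differentiable loss."""
    F = np.zeros(len(y))                        # current document scores
    trees = []
    for _ in range(n_rounds):
        residuals = -loss_grad(F, y)            # pseudo-responses (updates)
        h = DecisionTreeRegressor(max_depth=max_depth)
        h.fit(X, residuals)                     # regression base learner
        F += shrinkage * h.predict(X)
        trees.append(h)
    return trees

def predict(trees, X, shrinkage=0.1):
    return shrinkage * sum(t.predict(X) for t in trees)

# Placeholder point-wise loss: squared error, with gradient F - y.
squared_grad = lambda F, y: F - y
```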
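For the second algorithm, the abstract only names the key ingredients: quadratic upper bounds of a pair-wise preference loss, fitted by regression. Below is a minimal sketch of one way such a pair-wise update could look; the margin `tau`, the violation test, and the function names are illustrative assumptions rather than the dissertation's exact formulation.

```python
# Hedged sketch of a pair-wise boosting round: for preference pairs
# (i preferred over j) whose current scores violate a margin tau, the pair
# contributes regression targets that push the two scores apart, and a
# standard regression learner is fit to those targets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def pairwise_round(X, pairs, F, tau=1.0, max_depth=3):
    """One boosting round from preference data.

    X     : (n_docs, n_features) feature matrix
    pairs : list of (i, j) with document i preferred over document j
    F     : current scores for the n_docs documents
    """
    rows, targets = [], []
    for i, j in pairs:
        if F[i] < F[j] + tau:                # margin violated: add targets
            rows += [i, j]
            targets += [F[j] + tau,          # pull the preferred doc up
                        F[i] - tau]          # push the other doc down
    if not rows:
        return None                          # all preferences satisfied
    h = DecisionTreeRegressor(max_depth=max_depth)
    h.fit(X[rows], np.asarray(targets))      # fit the update with regression
    return h
```

A full learner would iterate such rounds and blend each fitted tree into the current scores, much as in the generic boosting loop sketched above.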

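The third, list-wise algorithm computes each update by solving an isotonic regression problem. The standard building block for that subproblem is the pool-adjacent-violators algorithm, sketched below as a self-contained function; how a query's documents are mapped onto such a problem is not specified in the abstract, so this is only the generic subroutine.

```python
# Pool-adjacent-violators (PAV): find the non-decreasing sequence closest,
# in weighted squared error, to the input sequence y.
import numpy as np

def isotonic_regression(y, w=None):
    """Return the non-decreasing fit minimizing sum w_i * (fit_i - y_i)^2."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    blocks = []                               # each block: [mean, weight, length]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

# e.g. isotonic_regression([3, 1, 2, 5, 4]) -> [2., 2., 2., 4.5, 4.5]
```
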
Bibliographic record

  • Author: Zheng, Zhaohui
  • Author affiliation: State University of New York at Buffalo
  • Degree-granting institution: State University of New York at Buffalo
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pagination: 88 p.
  • Total pages: 88
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
