Journal: Computers in Industry

Enhancing passage retrieval in log files by query expansion based on explicit and pseudo relevance feedback

Abstract

Passage retrieval is usually defined as the task of searching for passages that may contain the answer to a given query. While passage retrieval approaches are very efficient on text, they usually return irrelevant or useless results when applied to log files (i.e. semi-structured data containing both numerical and symbolic information). One appealing way to improve these results is query expansion, which automatically or semi-automatically adds information to the query in order to improve the reliability and accuracy of the returned results. In this paper, we present a new approach for enhancing the relevance of queries during passage retrieval in log files. It is based on two relevance feedback steps. In the first step, we determine explicit relevance feedback by identifying the context of the requested information within a learning process. The second step is a new kind of pseudo relevance feedback: it relies on a novel term weighting measure that assigns a weight to each term according to its relatedness to the query. This measure, called TRQ (Term Relatedness to Query), is used to identify the most relevant expansion terms. The main advantage of our approach is that it can be applied both to log files and to documents from general domains. Experiments conducted on real data from logs and from documents show that our query expansion protocol enables the retrieval of relevant passages.
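The abstract does not give the TRQ formula or the details of the explicit feedback step, so the sketch below implements neither. It only illustrates the generic pseudo relevance feedback loop that the second step builds on: rank passages for the initial query, assume the top k are relevant, weight the other terms found in those passages, and append the best-weighted terms to the query. The term weight used here (term frequency in the feedback passages times a smoothed inverse document frequency) is a common stand-in chosen for illustration, not the paper's measure; the function names `tokenize`, `idf_table`, `rank_passages`, `expand_query` and the sample log lines are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on any run of characters that is not a letter,
    # digit or underscore, so log tokens like "vdd" or "3v3" stay whole.
    return [t for t in re.split(r"[^a-z0-9_]+", text.lower()) if t]


def idf_table(passages: list[list[str]]) -> dict[str, float]:
    # Smoothed inverse document frequency over the whole passage collection
    # (always >= 1.0, so every matching term contributes a positive score).
    n = len(passages)
    df = Counter()
    for p in passages:
        df.update(set(p))
    return {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}


def rank_passages(query_terms, passages, idf) -> list[int]:
    # Score each passage by the summed idf of the distinct query terms it
    # contains; passages that match no query term are dropped.
    q = set(query_terms)
    scored = []
    for i, p in enumerate(passages):
        score = sum(idf[t] for t in q & set(p))
        if score > 0:
            scored.append((score, i))
    # Highest score first; ties broken by original passage order.
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [i for _, i in scored]


def expand_query(query: str, raw_passages: list[str],
                 k: int = 2, n_terms: int = 3) -> list[str]:
    # Pseudo relevance feedback: treat the top-k passages as relevant and
    # append the n_terms highest-weighted non-query terms found in them.
    passages = [tokenize(p) for p in raw_passages]
    idf = idf_table(passages)
    q = tokenize(query)
    feedback = rank_passages(q, passages, idf)[:k]
    weights = Counter()
    for i in feedback:
        for term, tf in Counter(passages[i]).items():
            if term not in q:
                # Illustrative weight: tf in the feedback passage times
                # collection idf. This is NOT the paper's TRQ measure.
                weights[term] += tf * idf[term]
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda x: (-x[1], x[0]))
    return q + [t for t, _ in ranked[:n_terms]]


# Hypothetical log passages, invented for this example.
LOGS = [
    "power supply voltage 3v3 out of range on rail vdd",
    "fan speed nominal temperature 41 C",
    "voltage drop detected rail vdd brownout reset",
    "user login ok session opened",
]

if __name__ == "__main__":
    print(expand_query("voltage rail", LOGS))
```

Running the script prints `['voltage', 'rail', 'vdd', '3v3', 'brownout']`: `vdd` ranks first because it occurs in both feedback passages. The sketch does no stopword filtering, so with a larger `n_terms` function words such as "of" can surface; a measure that scores relatedness to the query, as TRQ is described to do, is meant to avoid this kind of noise.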
