Information Processing & Management

Term disambiguation techniques based on target document collection for cross-language information retrieval: An empirical comparison of performance between techniques


Abstract

Dictionary-based query translation for cross-language information retrieval often yields multiple translation candidates with different meanings for a single source term in the query. This paper examines methods for resolving such translation ambiguity using only the target document collection. First, we discuss two kinds of disambiguation technique: (1) methods using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback (PRF). Next, these techniques are empirically compared using the CLEF 2003 test collection for German-to-Italian bilingual searches, which are executed using English as a pivot language. The experiments showed that a variant of the term co-occurrence techniques, in which the best sequence algorithm for selecting translations is combined with the Cosine coefficient, performs best, and that the PRF method achieves comparably high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for French-to-Italian (pivot) and English-to-Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verify our findings. Again, similar results were observed, except that the Dice coefficient slightly outperforms the Cosine coefficient for co-occurrence-based disambiguation in English-to-Italian searches. (c) 2006 Elsevier Ltd. All rights reserved.
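As a rough illustration of co-occurrence-based disambiguation, the sketch below computes the Dice and Cosine coefficients from document-level co-occurrence counts in a target collection and then picks one translation per source term. It assumes each document is a set of tokens and uses an exhaustive search over candidate combinations; that exhaustive search is a simple stand-in chosen for clarity, not the best sequence algorithm evaluated in the paper. The toy documents, candidate lists, and function names (`dice`, `cosine`, `best_combination`) are hypothetical.

```python
import math
from itertools import product


def doc_freqs(docs, a, b):
    """Return (df(a), df(b), df(a and b)) over a list of token sets."""
    na = sum(1 for d in docs if a in d)
    nb = sum(1 for d in docs if b in d)
    nab = sum(1 for d in docs if a in d and b in d)
    return na, nb, nab


def dice(docs, a, b):
    """Dice coefficient: 2 * df(a,b) / (df(a) + df(b))."""
    na, nb, nab = doc_freqs(docs, a, b)
    return 2 * nab / (na + nb) if na + nb else 0.0


def cosine(docs, a, b):
    """Cosine coefficient: df(a,b) / sqrt(df(a) * df(b))."""
    na, nb, nab = doc_freqs(docs, a, b)
    return nab / math.sqrt(na * nb) if na and nb else 0.0


def best_combination(candidates, docs, sim=cosine):
    """Choose one translation per source term so that the sum of pairwise
    co-occurrence scores is maximal. Exhaustive (hypothetical stand-in for
    the paper's best sequence algorithm); only practical for short queries."""
    best, best_score = (), -1.0
    for combo in product(*candidates):
        score = sum(sim(docs, combo[i], combo[j])
                    for i in range(len(combo))
                    for j in range(i + 1, len(combo)))
        if score > best_score:
            best, best_score = combo, score
    return list(best)


# Hypothetical toy target collection (Italian documents as token sets).
docs = [
    {"banca", "tasso", "mutuo"},
    {"banca", "tasso"},
    {"panchina", "parco"},
    {"interesse", "storia"},
    {"banca", "interesse"},
]
# Hypothetical candidates for two source terms (e.g. German "Bank", "Zins").
candidates = [["banca", "panchina"], ["interesse", "tasso"]]

print(best_combination(candidates, docs))             # ['banca', 'tasso']
print(best_combination(candidates, docs, sim=dice))   # ['banca', 'tasso']
```

Because the number of combinations grows as the product of the candidate-set sizes, an exhaustive search like this one quickly becomes expensive for longer queries, which is one motivation for approximate, sequential selection strategies.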