Improving query translation for cross-language information retrieval using statistical models

Abstract

Dictionaries have often been used for query translation in cross-language information retrieval (CLIR). However, this approach faces the problem of translation ambiguity: a dictionary often stores multiple translations for a single word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve on previous dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques yield significant improvements over simple dictionary approaches and achieve even better performance than a high-quality machine translation system.
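The second step, selecting translations by cohesion, can be illustrated with a small sketch. The abstract does not define the cohesion measure or the search procedure, so everything below is an assumption made for illustration: cohesion is taken to be a symmetric pairwise score between target-language words (for example, one derived from co-occurrence statistics), the best combination is found by exhaustive search, and the function name `select_translations`, the romanized toy dictionary entries, and the scores are all hypothetical.

```python
from itertools import combinations, product


def select_translations(candidates, cohesion):
    """Pick one translation per query word, maximizing total pairwise cohesion.

    candidates: list of lists; candidates[i] holds the dictionary
                translations of the i-th query word.
    cohesion:   dict mapping frozenset({word_a, word_b}) to a score
                (assumed symmetric; missing pairs count as 0.0).
    """
    def score(choice):
        # Sum the cohesion over every pair of chosen translation words.
        return sum(
            cohesion.get(frozenset(pair), 0.0)
            for pair in combinations(choice, 2)
        )

    # Exhaustive search over all combinations. This is exponential in the
    # query length, so it only suits short queries; a greedy or
    # approximate search would be needed for longer ones.
    return max(product(*candidates), key=score)


# Hypothetical toy data for the English query "bank interest".
candidates = [
    ["yinhang", "hean"],   # "bank": financial institution / riverbank
    ["lilv", "xingqu"],    # "interest": interest rate / hobby
]
cohesion = {
    frozenset(["yinhang", "lilv"]): 0.9,
    frozenset(["yinhang", "xingqu"]): 0.1,
    frozenset(["hean", "lilv"]): 0.05,
    frozenset(["hean", "xingqu"]): 0.05,
}

best = select_translations(candidates, cohesion)
print(best)
```

With these toy scores, the financial senses reinforce each other and are chosen together, which is the intuition behind resolving translation ambiguity jointly rather than word by word.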

