Dictionaries are often used for query translation in cross-language information retrieval (CLIR). However, this approach suffers from translation ambiguity: a dictionary typically lists multiple translations for a single word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve on previous dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole, using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques yield significant improvements over simple dictionary approaches, and even outperform a high-quality machine translation system.
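To make the second idea concrete, the following is a minimal sketch of cohesion-based translation selection. It is an illustration under stated assumptions, not the method evaluated in the paper: the function names `cohesion` and `select_translations`, the toy candidate lists, and the similarity scores are all hypothetical, and the exhaustive search used here would need to be replaced by an approximation for longer queries.

```python
from itertools import product


def cohesion(selection, sim):
    """Sum of pairwise similarity scores among the chosen translations."""
    total = 0.0
    for i in range(len(selection)):
        for j in range(i + 1, len(selection)):
            # Similarity is symmetric, so pairs are keyed by an unordered set.
            pair = frozenset((selection[i], selection[j]))
            total += sim.get(pair, 0.0)
    return total


def select_translations(candidates, sim):
    """Pick one translation per query word so that total cohesion is maximal.

    Exhaustive search over all combinations: exponential in query length,
    acceptable only for short queries (an assumption of this sketch).
    """
    best = max(product(*candidates), key=lambda sel: cohesion(sel, sim))
    return list(best)


# Hypothetical dictionary entries for the English query "bank interest".
candidates = [
    ["银行", "河岸"],  # "bank": financial institution / river bank
    ["利息", "兴趣"],  # "interest": money paid on a loan / curiosity
]

# Hypothetical similarity scores, e.g. derived from co-occurrence
# statistics in a target-language corpus. Missing pairs count as 0.
sim = {
    frozenset(("银行", "利息")): 0.9,
    frozenset(("银行", "兴趣")): 0.2,
    frozenset(("河岸", "兴趣")): 0.1,
}

result = select_translations(candidates, sim)
print(result)
```

With these toy scores, the financial senses of both words are chosen together because they are the pair that co-occurs most strongly, which is the intuition behind resolving ambiguity through the cohesion of the translation words.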