Information Processing & Management

Empirical studies on the impact of lexical resources on CLIR performance


Abstract

In this paper, we compile and review several experiments measuring cross-lingual information retrieval (CLIR) performance as a function of the following resources: bilingual term lists, parallel corpora, machine translation (MT), and stemmers. Our CLIR system uses a simple probabilistic language model; the studies used TREC test corpora for Chinese, Spanish, and Arabic. Our findings include:

  • One can achieve acceptable CLIR performance using only a bilingual term list (70-80% on Chinese and Arabic corpora).
  • If both a bilingual term list and parallel corpora are available, CLIR performance can rival monolingual performance.
  • If no parallel corpus is available, pseudo-parallel texts produced by an MT system can partially overcome the lack of parallel text.
  • While stemming is normally useful, with a very large Arabic-English parallel corpus stemming hurt performance in our empirical studies with Arabic, a highly inflected language.

(C) 2004 Elsevier Ltd. All rights reserved.
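The abstract does not spell out the form of the probabilistic language model, so the following is only a minimal sketch of one common way such a model can combine a bilingual term list with query-likelihood scoring. Everything here is an assumption made for illustration: the mixture form, the uniform translation probabilities, the weight `alpha`, the function names `translation_probs` and `score`, and the toy Spanish-English data. It is not the authors' implementation.

```python
import math
from collections import Counter


def translation_probs(term_list):
    """Build uniform P(query_term | doc_term) from a bilingual term list.

    term_list maps each document-language term to a list of its
    query-language translations (a hypothetical toy format).
    """
    probs = {}
    for doc_term, translations in term_list.items():
        unique = set(translations)
        for query_term in unique:
            # Assumption: every listed translation is equally likely.
            probs[(query_term, doc_term)] = 1.0 / len(unique)
    return probs


def score(query, doc, probs, background, alpha=0.3):
    """Return log P(query | doc) under a two-component mixture.

    Each query term is generated either from a background model of the
    query language (weight alpha) or by picking a document term and
    translating it (weight 1 - alpha).
    """
    counts = Counter(doc)
    total = len(doc)
    log_p = 0.0
    for q in query:
        translated = 0.0
        if total > 0:
            translated = sum(
                (count / total) * probs.get((q, d), 0.0)
                for d, count in counts.items()
            )
        # A small floor keeps unseen query terms from producing log(0).
        p = alpha * background.get(q, 1e-6) + (1 - alpha) * translated
        log_p += math.log(p)
    return log_p


# Toy data: Spanish documents, English query (illustrative only).
term_list = {"gato": ["cat"], "perro": ["dog"], "banco": ["bank", "bench"]}
probs = translation_probs(term_list)
background = {"cat": 0.01, "dog": 0.01, "bank": 0.01, "bench": 0.01}

doc_pets = ["gato", "perro"]
doc_bank = ["banco", "banco"]
query = ["cat"]

ranked = sorted(
    [("doc_pets", doc_pets), ("doc_bank", doc_bank)],
    key=lambda item: score(query, item[1], probs, background),
    reverse=True,
)
```

In this sketch, a parallel corpus would enter only by replacing the uniform values in `probs` with estimated translation probabilities; the scoring function itself would stay the same.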
