Conference: Multilingual information access in South Asian languages

Improving Cross-Language Information Retrieval by Transliteration Mining and Generation


Abstract

The retrieval performance of a Cross-Language Information Retrieval (CLIR) system depends on the coverage of its translation lexicon. Unfortunately, most translation lexicons do not provide good coverage of proper nouns and common nouns, which are often the most information-bearing terms in a query. As a consequence, many queries cannot be translated without a substantial loss of information, and the retrieval performance of the CLIR system is less than satisfactory for those queries. However, proper nouns and common nouns very often appear in their transliterated forms in the target-language document collection. In this work, we study two techniques that leverage this fact to address the problem: Transliteration Mining and Transliteration Generation. The first technique attempts to mine the transliterations of out-of-vocabulary query terms from the document collection, whereas the second generates the transliterations. We systematically study the effectiveness of both techniques in the context of the Hindi-English and Tamil-English ad hoc retrieval tasks at FIRE 2010. The results of our study show that both techniques are effective in addressing the problem posed by out-of-vocabulary terms, with Transliteration Mining giving better results than Transliteration Generation.
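
To make the mining idea concrete, here is a minimal sketch of looking up an out-of-vocabulary query term in the vocabulary of the target-language collection. It is not the authors' method: it assumes the source term has already been romanized by some upstream step (not shown), and it substitutes a plain string-similarity score from Python's standard `difflib` for whatever transliteration-similarity model the paper uses. The function name `mine_transliterations`, the `threshold` value, and the sample vocabulary are all hypothetical.

```python
from difflib import SequenceMatcher


def mine_transliterations(romanized_term, vocabulary, threshold=0.8):
    """Return (candidate, score) pairs from `vocabulary` that resemble the term.

    Assumption: `romanized_term` is a rough Latin-script rendering of an
    out-of-vocabulary query term, and `vocabulary` holds terms observed in
    the target-language document collection. The similarity measure is a
    generic stand-in, not the model from the paper.
    """
    candidates = []
    for word in vocabulary:
        # ratio() is 2*M/T, where M is the number of matched characters
        # and T is the combined length of both strings.
        score = SequenceMatcher(None, romanized_term, word).ratio()
        if score >= threshold:
            candidates.append((word, score))
    # Best-scoring candidates first; ties broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates


if __name__ == "__main__":
    # Hypothetical collection vocabulary and a roughly romanized query term.
    collection_vocabulary = ["tendulkar", "tender", "cricket", "mumbai"]
    for word, score in mine_transliterations("tendulakar", collection_vocabulary):
        print(f"{word}\t{score:.3f}")
```

In a CLIR pipeline, the mined candidates would be added to the translated query in place of the untranslatable term. Transliteration Generation would instead produce candidate spellings directly, without consulting the collection vocabulary.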
