Conference proceedings: Multilingual information access in South Asian languages

Term Conflation and Blind Relevance Feedback for Information Retrieval on Indian Languages

Abstract

For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, Information Retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation, Blind Relevance Feedback (BRF), and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conflation always improves Mean Average Precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments, indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemming approach achieves a higher MAP. For Bengali, the LM retrieval model with the rule-based stemmer achieves a higher (but not significantly higher) MAP than BM25 with a corpus-based stemmer (0.4583 vs. 0.4526). In all experiments, BRF yields considerably higher MAP than the corresponding experiments without it. Bilingual IR experiments (English to Bengali and English to Hindi) are based on query translations obtained from native speakers and from the Google Translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower than in experiments with manual query translations. The bilingual English to Bengali (English to Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments.
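To make two of the terms in the abstract concrete, the following is a minimal sketch of n-prefix term conflation and of the MAP measure. It is not the authors' implementation: the function names are hypothetical, the standard textbook definition of average precision is assumed, and prefixes are cut by Unicode code point, which may differ from how the paper counts characters in Indic scripts.

```python
from typing import Dict, List, Set


def prefix_conflate(token: str, n: int = 5) -> str:
    """Truncate a token to its first n code points (an n-prefix).

    Assumption: characters are counted as Unicode code points, so a
    combining vowel sign in an Indic script counts as one character.
    """
    return token[:n]


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant documents.
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two inflected forms map to the same index term under 5-prefix conflation.
    print(prefix_conflate("retrieval"), prefix_conflate("retrieving"))

    # Toy run and relevance judgments (illustrative data only).
    runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
    print(round(mean_average_precision(runs, qrels), 4))
```

Prefix truncation needs no language-specific rules or training corpus, which is presumably why it serves as a point of comparison for the stemmers mentioned above.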
