In: Multilingual information access in South Asian languages (conference proceedings)

Frequent Case Generation in Ad Hoc Retrieval of Three Indian Languages - Bengali, Gujarati and Marathi

Abstract

This paper presents results of a generative method for managing the morphological variation of query keywords in Bengali, Gujarati and Marathi. The method is called Frequent Case Generation (FCG). It is based on the skewed distributions of word forms in natural languages and is suitable for languages that have either a fair amount of morphological variation or are morphologically very rich. We participated in the ad hoc task at FIRE 2011 and applied the FCG method to monolingual Bengali, Gujarati and Marathi test collections. Our evaluation was carried out with the title and description fields of the test topics, using the Lemur search engine. We used a plain, unprocessed word index as the baseline, and n-gramming and stemming as competing methods. The evaluation results show relative mean average precision improvements of 30%, 16% and 70% for Bengali, Gujarati and Marathi, respectively, when the FCG method is compared to plain words. The method shows competitive performance in comparison to n-gramming and stemming.
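
The idea behind the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the coverage threshold, the English toy suffix counts and the MAP values below are all assumptions made for illustration. The sketch picks the few most frequent inflectional endings (exploiting the skewed distribution of word forms), generates those forms for a query keyword, and shows how a relative MAP improvement is computed.

```python
from collections import Counter


def frequent_suffixes(suffix_counts, coverage=0.8):
    """Return the most frequent suffixes whose cumulative share of
    occurrences reaches `coverage` (hypothetical threshold)."""
    total = sum(suffix_counts.values())
    chosen, covered = [], 0
    for suffix, count in Counter(suffix_counts).most_common():
        if covered / total >= coverage:
            break
        chosen.append(suffix)
        covered += count
    return chosen


def expand_keyword(stem, suffixes):
    """Generate the frequent forms of a keyword for an OR-style query."""
    return [stem + s for s in suffixes]


def relative_improvement(baseline_map, method_map):
    """Relative improvement of method_map over baseline_map, in percent."""
    return (method_map - baseline_map) / baseline_map * 100


# Toy suffix statistics (invented; English used only for readability).
toy_counts = {"": 50, "s": 30, "'s": 15, "s'": 5}
top = frequent_suffixes(toy_counts)   # the two most frequent endings
forms = expand_keyword("book", top)   # forms to add to the query
# Invented MAP values, not results from the paper.
gain = relative_improvement(0.20, 0.26)
```

Because a handful of forms usually accounts for most occurrences of a word, the generated query stays short while still matching most inflected occurrences in an unprocessed word index.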