Conference proceedings: Multilingual information access in South Asian languages

UTA Stemming and Lemmatization Experiments in the FIRE Bengali Ad Hoc Task



Abstract

UTA participated in the monolingual Bengali ad hoc track at FIRE 2010. As Bengali is highly inflectional, we experimented with three language normalizers: one stemmer, YASS, and two lemmatizers, GRALE and StaLe. YASS is a corpus-based, unsupervised statistical stemmer capable of handling several languages through suffix removal. GRALE is a novel graph-based lemmatizer for Bengali that can be extended to other agglutinative languages. StaLe is a statistical rule-based lemmatizer that has been implemented for several languages. We analyze 9 runs, applying each of the three systems to title (T), title-and-description (TD), and title-description-and-narrative (TDN) queries. The T runs were the least effective, with a MAP of about 0.34 (P@10 about 0.30). All the TD runs delivered a MAP close to 0.45 (P@10 about 0.37), while the TDN runs gave a MAP of 0.50 to 0.52 (P@10 about 0.41). The three normalizers perform close to each other, but they have different strengths in other aspects. These results compare well with those obtained by other groups in the monolingual Bengali ad hoc track at FIRE 2010.
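For readers unfamiliar with the two measures quoted above, the following is a minimal sketch of how MAP and P@10 are conventionally computed from a ranked result list and a set of relevant documents. The function names, document IDs, and relevance judgments are hypothetical illustrations, not FIRE 2010 data, and this is not a description of the evaluation tooling the authors used.

```python
from typing import Dict, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision values at each relevant hit, divided by the
    total number of relevant documents for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    topics: Dict[str, Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of per-topic average precision (MAP)."""
    scores = [average_precision(r, rel) for r, rel in topics.values()]
    return sum(scores) / len(scores)


# Hypothetical toy topics: (ranked result list, relevant document set).
toy_topics = {
    "topic1": (["d3", "d1", "d7", "d2", "d5"], {"d1", "d2", "d9"}),
    "topic2": (["a", "b"], {"a"}),
}

if __name__ == "__main__":
    ranked, relevant = toy_topics["topic1"]
    print("P@10:", precision_at_k(ranked, relevant, k=10))
    print("AP:  ", average_precision(ranked, relevant))
    print("MAP: ", mean_average_precision(toy_topics))
```

In this sketch a relevant document that is never retrieved (here `d9`) still counts in the denominator of average precision, which is why MAP rewards recall as well as early precision, whereas P@10 looks only at the first ten results.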

Bibliographic details

  • Conference location: Gandhinagar (IN); Bombay (IN)
  • Author affiliations

    School of Information Sciences, University of Tampere, Finland; Indian Statistical Institute, Kolkata, India

  • Original format: PDF
  • Language: English
