Journal: Information Processing & Management

Searching strategies for the Hungarian language

Abstract

This paper examines the IR problems raised by the complex morphology and compound constructions of the Hungarian language. It evaluates two general stemming strategies for this language and shows that a light stemming approach can be quite effective. Based on searches run against the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce a better MAP. Compared with an IR scheme without stemming, or with one based only on a light stemmer, the differences are statistically significant. Among the probabilistic, vector-space and language models compared, the Okapi model yields the best retrieval effectiveness. Its MAP is about 35% better than that of the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure to both queries and documents significantly improves IR performance (+10%) compared with word-based indexing strategies.
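The abstract contrasts light stemming with decompounding but does not list the rules used. The following minimal sketch illustrates both ideas. It is not the paper's stemmer or decompounder. The suffix list `CASE_SUFFIXES`, the minimum stem length `MIN_STEM`, the toy `lexicon`, and the functions `light_stem`, `decompound` and `index_terms` are all hypothetical. They assume that a "light" stemmer removes at most one common case ending, and that a compound is split greedily into known dictionary words.

```python
# Hypothetical suffix list: a few common Hungarian case endings
# (inessive -ban/-ben, illative -ba/-be, dative -nak/-nek, elative -ból/-ből, ...).
# It is NOT the suffix list of the stemmers evaluated in the paper.
CASE_SUFFIXES = [
    "ból", "ből", "ban", "ben", "nak", "nek", "val", "vel",
    "ról", "ről", "tól", "től", "hoz", "hez", "höz",
    "ba", "be", "ra", "re", "on", "en", "ön", "n", "t",
]
MIN_STEM = 3  # assumed minimum stem length, guards against over-stripping


def light_stem(word: str) -> str:
    """Remove at most one case ending, trying longer endings first."""
    w = word.lower()
    for suffix in sorted(CASE_SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def decompound(word: str, lexicon: set[str], min_part: int = 3) -> list[str]:
    """Split a word entirely into lexicon entries, preferring the longest head.

    Returns [] when no complete split exists.
    """
    w = word.lower()
    if w in lexicon:
        return [w]
    # Try every head length that leaves at least min_part characters on each side.
    for i in range(len(w) - min_part, min_part - 1, -1):
        head, tail = w[:i], w[i:]
        if head in lexicon:
            rest = decompound(tail, lexicon, min_part)
            if rest:
                return [head] + rest
    return []


def index_terms(word: str, lexicon: set[str]) -> list[str]:
    """Stem first, then add the compound's components as extra index terms."""
    stem = light_stem(word)
    parts = decompound(stem, lexicon)
    return [stem] + parts if len(parts) > 1 else [stem]
```

For example, `index_terms("vasútállomáson", {"vasút", "állomás"})` strips the superessive ending `-on` and then splits "vasútállomás" (railway station), yielding `["vasútállomás", "vasút", "állomás"]`. Indexing both the whole compound and its components is an assumed design choice. A list this short will also over-strip some words, such as nouns that happen to end in `-t` or `-n`.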

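The abstract reports that the Okapi model outperforms the classical tf-idf approach. The following sketch shows the two weighting schemes side by side. It assumes the common BM25 form with `k1 = 1.2` and `b = 0.75` and an idf with a `+1` inside the logarithm, which keeps it non-negative. The tf-idf baseline is an unnormalized tf × log(N/df). These parameter values and variants are illustrative assumptions, not the settings used in the paper, and `bm25_scores` and `tfidf_scores` are hypothetical names.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            # idf variant with +1 inside the log (assumed, keeps idf >= 0)
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # tf saturation (k1) and document-length normalization (b)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def tfidf_scores(query: list[str], docs: list[list[str]]) -> list[float]:
    """Unnormalized tf * log(N/df) baseline, with no length normalization."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append(sum(tf[t] * math.log(n / df[t]) for t in query if tf[t] > 0))
    return scores
```

Consider the toy documents `[["ház", "kert"], ["ház", "kert", "autó", "autó", "autó", "autó"], ["autó"]]` and the query `["kert"]`. BM25 ranks the short first document above the longer second one, whereas this tf-idf baseline gives the two documents the same score, because it ignores document length.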