Searching strategies for the Hungarian language

Jacques Savoy

Abstract

This paper reports on the underlying information retrieval (IR) problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach could be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better mean average precision (MAP). When compared to an IR scheme without stemming or one based on only a light stemmer, we find the differences to be statistically significant. Comparing probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is found to be about 35% better than the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure for both queries and documents significantly improves IR performance (+10%), compared to word-based indexing strategies.

Bibliographic details

Source
Information Processing & Management | 2008, Issue 1 | pp. 310-324 | 15 pages
Author
Jacques Savoy
Affiliation

Computer Science Department, University of Neuchatel, Rue Emile Argand 11, 2009 Neuchatel, Switzerland

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Library science and librarianship; information science and information work
Keywords
Hungarian information retrieval; Hungarian language; CLEF; evaluation; decompounding; n-gram indexing

Similar literature

1. Indexing and Searching Strategies for the Russian Language [J]. Ljiljana Dolamic, Jacques Savoy. Journal of the American Society for Information Science and Technology. 2009, Issue 13
2. Searching strategies for the Bulgarian language [J]. Jacques Savoy. Information Retrieval. 2007, Issue 6
3. Bibliographic database searching by graduate students in language and literature: Search strategies, system interfaces, and relevance judgments [J]. Debora Shaw. Library & Information Science Research. 1995, Issue 4
4. Analysis of Chinese-English Mixed Language Query Reformulation Strategies and Patterns During Web Searching [C]. Hengyi Fu. ASIS&T Annual Meeting. 2016
5. Term selection process in subject searching: End-user interactions with information retrieval systems and indexing languages [D]. Athena Salaba. 2005
6. Cross-cultural differences in foreign language learning strategy preferences among Hungarian, Chinese and Mongolian University students [O]. Anita Habók, Yunjun Kong, Jargaltuya Ragchaa. 2021
7. Searching Strategies for the Hungarian Language [O]. Jacques Savoy. 2008
