Indexing and Searching Strategies for the Russian Language

Ljiljana Dolamic; Jacques Savoy


Abstract

This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light one and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), and a statistical language model (LM), as well as two vector-space approaches, namely, the classical tf idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the mean average precision (MAP) by more than 50%, and these differences are always significant. When applying an n-gram approach, performance is usually lower than with an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.
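As a rough illustration of two notions the abstract relies on, the following Python sketch shows how a word can be split into overlapping character n-grams (the language-independent indexing unit) and how mean average precision (MAP) is computed from ranked result lists. It is a minimal sketch and not the authors' implementation: the function names, the document identifiers, and the choice n = 4 are assumptions made here for illustration only.

```python
from typing import Dict, List, Sequence, Set


def char_ngrams(word: str, n: int = 4) -> List[str]:
    """Split a word into overlapping character n-grams.

    Words no longer than n are kept whole. The default n = 4 is an
    illustrative assumption, not a value taken from the abstract.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, Sequence[str]],
    judgments: Dict[str, Set[str]],
) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    scores = [
        average_precision(ranked, judgments.get(query, set()))
        for query, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical word and document ids, used only to exercise the functions.
    print(char_ngrams("книга"))
    print(average_precision(["d1", "d2", "d3"], {"d1", "d3"}))
```

Under this definition, a statement such as "ignoring stemming reduces the MAP by more than 50%" compares the `mean_average_precision` value obtained over the same query set under two different indexing strategies.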


Bibliographic details

Source
Journal of the American Society for Information Science and Technology, 2009, issue 13, pp. 2540-2547 (8 pages)
Authors
Ljiljana Dolamic; Jacques Savoy
Author affiliations

Computer Science Department, University of Neuchatel, Rue Emile Argand 11, 2009 Neuchatel, Switzerland;

Original format: PDF
Language of the text: English
