The Effectiveness of a Graph-Based Algorithm for Stemming

Abstract

In Information Retrieval (IR), stemming enables the matching of query and document terms that share a meaning but appear in different morphological variants. In this paper we propose and evaluate a statistical graph-based algorithm for stemming. Considering that a word is formed by a stem (prefix) and a derivation (suffix), the key idea is that strongly interlinked prefixes and suffixes form a community of sub-strings. Discovering these communities means searching for the word splits that yield the best word stems. We conducted experiments on the CLEF 2001 test sub-collections for the Italian language. The results show that stemming improves IR effectiveness. They also show that the effectiveness of our algorithm is comparable to that of an algorithm based on a-priori linguistic knowledge. This is an encouraging result, particularly in a multi-lingual context.
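The prefix/suffix idea can be illustrated with a minimal Python sketch. The abstract does not say how the communities are discovered, so the scoring used here (a HITS-style mutual reinforcement iteration over the bipartite prefix-suffix graph) is an assumption, not necessarily the authors' method. The function names, the `min_stem` and `iterations` parameters, and the toy Italian vocabulary are all hypothetical.

```python
import math
from collections import defaultdict


def candidate_splits(word, min_stem=3):
    """Yield each (prefix, suffix) split of `word` with a non-empty suffix."""
    for i in range(min_stem, len(word)):
        yield word[:i], word[i:]


def _normalize(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def score_substrings(words, iterations=20, min_stem=3):
    """Score prefixes and suffixes by mutual reinforcement (assumed scheme)."""
    suffixes_of = defaultdict(set)  # prefix -> suffixes it co-occurs with
    prefixes_of = defaultdict(set)  # suffix -> prefixes it co-occurs with
    for word in set(words):
        for prefix, suffix in candidate_splits(word, min_stem):
            suffixes_of[prefix].add(suffix)
            prefixes_of[suffix].add(prefix)

    prefix_score = {p: 1.0 for p in suffixes_of}
    suffix_score = {s: 1.0 for s in prefixes_of}
    for _ in range(iterations):
        # A prefix is good if it links to good suffixes, and vice versa.
        prefix_score = _normalize(
            {p: sum(suffix_score[s] for s in ss) for p, ss in suffixes_of.items()}
        )
        suffix_score = _normalize(
            {s: sum(prefix_score[p] for p in ps) for s, ps in prefixes_of.items()}
        )
    return prefix_score, suffix_score


def stem(word, prefix_score, suffix_score, min_stem=3):
    """Return the prefix of the highest-scoring split, or the word itself."""
    def split_score(split):
        prefix, suffix = split
        return prefix_score.get(prefix, 0.0) * suffix_score.get(suffix, 0.0)

    best = max(candidate_splits(word, min_stem), key=split_score, default=None)
    if best is None or split_score(best) == 0.0:
        return word  # too short to split, or no known prefix/suffix pair
    return best[0]


# Toy vocabulary (hypothetical), not data from the paper.
vocabulary = [
    "parlare", "parlato", "parlano", "parli",
    "cantare", "cantato", "cantano", "canti",
]
prefix_score, suffix_score = score_substrings(vocabulary)
for w in ("parlare", "cantato"):
    print(w, "->", stem(w, prefix_score, suffix_score))
```

On this toy vocabulary the prefixes `parl` and `cant` share the suffixes `are`, `ato`, `ano`, and `i`, so they form the most densely linked group and win over competing splits such as `parla` + `re`. A real collection would need a more careful treatment of empty suffixes, short words, and competing communities.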
