Journal: Information Research

A survey of stemming algorithms in information retrieval


Abstract

Over the last fifty years, improved information retrieval techniques have become necessary because of the huge amount of information available to people, which continues to grow rapidly with the use of new technologies and the Internet. Stemming is one of the processes that can improve information retrieval in terms of accuracy and performance. This paper provides a detailed assessment of the current status of stemming within the field of information retrieval by tracing its historical evolution. Papers presenting the first approaches to stemming were reviewed to extract their main features, benefits and drawbacks. Papers dealing with stemmers for non-English languages, or with more recent proposals, were also consulted and compiled. Finally, experimental papers defining the best-known methods and metrics for evaluating and classifying stemmers were taken into account to present their contributions and results. Although not all researchers agree on the general benefits and drawbacks of using stemming in an information retrieval process, many agree on its benefits in specific contexts, such as when the language is highly inflected, when documents are short or when storage space is limited. Some researchers also state that the nature of the documents can influence the performance and accuracy of the stemmer. Despite many years of research in this field, some questions remain open, such as how to evaluate a stemmer independently of the information retrieval process, or how much a stemmer improves the speed of an information retrieval application. In summary, some guidelines are also provided to help readers determine which stemmer best suits their needs and the tasks they have to carry out.
