Journal: International Journal on Digital Libraries

A New Full-Text Indexing Model with Low Space Overhead for Chinese Text Retrieval



Abstract

Text retrieval systems require an index to retrieve documents efficiently, at the cost of some storage overhead. This paper proposes a novel full-text indexing model for Chinese text retrieval based on the adjacency matrix of a directed graph. With this model, a retrieval system needs to keep only the indexing data, rather than both the indexing data and the original text as traditional retrieval systems do. In addition, occurrences of an index term are identified by the labels of the so-called s-strings in which the term appears, rather than by its positions as in traditional indexing models. Consequently, the overall space cost of the system can be reduced drastically while retrieval efficiency remains satisfactory. Experiments on several real-world Chinese text collections demonstrate the effectiveness and efficiency of the model. Beyond Chinese, the proposed indexing model is also effective and efficient for text retrieval in other Oriental languages, such as Japanese and Korean. It is especially useful for digital library applications where storage is very limited (e.g., e-books and CD-based text retrieval systems).
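
The abstract does not define s-strings or the layout of the adjacency matrix, so the following is only an illustrative sketch of the general idea, not the paper's actual method. It assumes (hypothetically) that an s-string is a segment of text in which no character repeats, that characters are graph vertices, and that each directed edge (a, b) stores the labels of the segments in which b directly follows a. The names `split_segments`, `build_index`, `search`, and `reconstruct` are invented for this example.

```python
from collections import defaultdict


def split_segments(text):
    """Greedily cut text into segments in which no character repeats.

    ASSUMPTION: this stands in for the paper's s-strings, whose exact
    definition is not given in the abstract.
    """
    segments, current, seen = [], [], set()
    for ch in text:
        if ch in seen:  # a repeat would start a cycle, so close the segment
            segments.append("".join(current))
            current, seen = [], set()
        current.append(ch)
        seen.add(ch)
    if current:
        segments.append("".join(current))
    return segments


def build_index(text):
    """Build a sparse adjacency 'matrix': (a, b) -> set of segment labels."""
    edges = defaultdict(set)
    starts = []  # starts[label] = first character of that segment
    for label, seg in enumerate(split_segments(text)):
        starts.append(seg[0])
        for a, b in zip(seg, seg[1:]):
            edges[(a, b)].add(label)
    return edges, starts


def search(edges, term):
    """Return labels of segments that contain `term` entirely.

    Terms that straddle a segment boundary are NOT found by this sketch.
    """
    if len(term) < 2:
        raise ValueError("this sketch only handles terms of two or more characters")
    pairs = list(zip(term, term[1:]))
    labels = set(edges.get(pairs[0], ()))
    for pair in pairs[1:]:
        labels &= edges.get(pair, set())
    return labels


def reconstruct(edges, starts):
    """Rebuild the original text from the index alone."""
    succ = defaultdict(dict)  # succ[label][a] = character that follows a
    for (a, b), labels in edges.items():
        for label in labels:
            succ[label][a] = b
    out = []
    for label, ch in enumerate(starts):
        out.append(ch)
        while ch in succ[label]:  # follow the unique labelled path
            ch = succ[label][ch]
            out.append(ch)
    return "".join(out)


text = "中文文本检索文本索引"
edges, starts = build_index(text)
print(split_segments(text))   # ['中文', '文本检索', '文本索引']
print(search(edges, "文本"))   # segment labels 1 and 2
print(reconstruct(edges, starts) == text)  # True
```

Under these assumptions, a hit is reported as a segment label rather than a character offset, and the text can be regenerated by following labelled edges, which is why the original text need not be stored alongside the index. A real system would also need to handle terms that cross segment boundaries, which this sketch deliberately leaves out.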
