Source: International Journal of Innovative Computing Information and Control

A SINGLE-SCAN ALGORITHM FOR MINING SEQUENTIAL PATTERNS FROM DATA STREAMS


Abstract

Sequential pattern mining (SPAM) is one of the most interesting research issues in data mining. This paper defines a new research problem: mining data streams for sequential patterns. A data stream is an unbounded sequence of data elements arriving at a rapid rate. Because of these characteristics, mining sequential patterns from data streams is more difficult than mining them from large static databases, which makes it a challenging research issue in data mining and knowledge discovery. To address it, an efficient single-pass algorithm called IncSpam (Incremental Sequential pattern mining of streaming itemset-sequences) is proposed for discovering sequential patterns from streaming itemset-sequences over extended sliding window models. Within the IncSpam framework, a new sliding window model called CSW-BV (Customer Sliding Window with Bit-Vectors) and an extended lexicographic tree-based data structure called LexSeq-Tree (Lexicographic Sequence Tree) are developed to reduce the time and memory needed to slide the windows over streaming data and to maintain all sequential patterns of the current sliding windows. Experimental results show that the proposed method is an efficient single-pass algorithm for mining sequential patterns from streaming data.
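
The abstract does not give the internals of CSW-BV or LexSeq-Tree, so the following is only a minimal sketch of the general idea of a per-customer sliding window backed by bit vectors, not the paper's algorithm. The class name `BitVectorWindow`, its methods, and the choice of one Python integer per (customer, item) pair are all assumptions made for illustration.

```python
from collections import defaultdict


class BitVectorWindow:
    """Hypothetical sketch: keeps the last `width` transactions per customer.

    Each (customer, item) pair maps to an integer used as a bit vector.
    Bit 0 is the newest transaction; higher bits are older transactions.
    """

    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1
        self.bits = defaultdict(lambda: defaultdict(int))

    def append(self, customer, itemset):
        """Slide the customer's window by one transaction and record `itemset`."""
        items = self.bits[customer]
        for item in list(items):
            # Shift left by one position; the mask drops the expired bit.
            items[item] = (items[item] << 1) & self.mask
            if items[item] == 0:
                del items[item]
        for item in itemset:
            items[item] |= 1

    def support(self, first, second):
        """Count customers whose window has `first` strictly before `second`."""
        count = 0
        for items in self.bits.values():
            a = items.get(first, 0)
            b = items.get(second, 0)
            if a == 0 or b == 0:
                continue
            # The oldest occurrence of `first` is the highest set bit of `a`.
            oldest_a = a.bit_length() - 1
            # Bits below that position are transactions that came later.
            later_mask = (1 << oldest_a) - 1
            if b & later_mask:
                count += 1
        return count


window = BitVectorWindow(width=3)
window.append("c1", {"a"})
window.append("c1", {"b"})
window.append("c2", {"b"})
window.append("c2", {"a"})
print(window.support("a", "b"))
```

Under these assumptions, sliding a window costs one shift and one mask per stored item, and checking a two-item sequence is a single bitwise AND per customer, which is the kind of saving in time and memory that the abstract attributes to a bit-vector window model.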
