Journal: Computer Applications and Software (《计算机应用与软件》)

An Algorithm for Mining High Utility Itemsets over Data Streams

Abstract

In recent years, mining high utility itemsets in data streams has become an important research topic. Existing algorithms generate a large number of candidate itemsets during mining, which makes it difficult for users to pick out useful information from the huge set of candidate patterns. To address this, we present HUIDE (High-Utility Itemsets over Data Streams), an algorithm for mining high utility itemsets over data streams. First, the algorithm proposes an effective utility measure that comprehensively considers the information characteristics of the data. Then, it uses a time-based sliding window to describe the data distribution more accurately and constructs a tree structure called HUI-tree (High Utility Itemsets tree). Finally, it traverses the constructed HUI-tree to mine high utility itemsets. Experimental results on synthetic and real data streams show that the algorithm obtains its mining results with a single database scan, reducing candidate generation as well as time and space consumption. The algorithm can effectively mine high utility itemsets over data streams.
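To make the problem setting concrete, the following is a minimal Python sketch of high utility itemset mining over a time-based sliding window. It is not the HUIDE algorithm: the abstract does not specify its utility measure or the layout of the HUI-tree, so the sketch assumes the conventional definition of utility (item quantity multiplied by a per-item unit profit) and enumerates itemsets by brute force instead of building a tree. The class name `SlidingWindowMiner` and all parameter names are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations


class SlidingWindowMiner:
    """Brute-force baseline, NOT the HUI-tree method from the paper."""

    def __init__(self, window_seconds, unit_profit):
        self.window_seconds = window_seconds  # length of the time window
        self.unit_profit = unit_profit        # assumed: item -> profit per unit
        self.window = deque()                 # (timestamp, {item: quantity})

    def add(self, timestamp, transaction):
        """Append a transaction and evict those older than the window."""
        self.window.append((timestamp, transaction))
        cutoff = timestamp - self.window_seconds
        while self.window and self.window[0][0] <= cutoff:
            self.window.popleft()

    def itemset_utility(self, itemset):
        """Sum quantity * unit profit over transactions containing the itemset."""
        total = 0
        for _, tx in self.window:
            if all(item in tx for item in itemset):
                total += sum(tx[item] * self.unit_profit[item] for item in itemset)
        return total

    def high_utility_itemsets(self, min_utility):
        """Return every itemset in the window whose utility meets the threshold."""
        items = sorted({item for _, tx in self.window for item in tx})
        result = {}
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                utility = self.itemset_utility(itemset)
                if utility >= min_utility:
                    result[itemset] = utility
        return result
```

The exhaustive enumeration is exponential in the number of distinct items, which is the kind of candidate explosion the abstract says HUIDE is designed to reduce; the sketch is useful only as a reference definition on small inputs.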
