A Parallel Frequent Itemset Mining Algorithm Based on a Binary Tree

陈静; 郑彦

Abstract

With the arrival of the big data era, demands on data processing speed and data utilization have risen. For frequent itemset mining, Count Distribution (CD) and Data Distribution (DD) are the classical parallel algorithms, but because they require large storage space and incur heavy communication overhead during mining, their efficiency is not ideal. This paper proposes a parallel frequent itemset mining algorithm based on a binary tree that exploits the parallelism of MapReduce. First, all subsets of a fixed size in the database are found by traversing a binary tree. Next, the number of occurrences of each subset is counted and compared with a threshold set in advance. A subset whose occurrence count exceeds the threshold is a frequent itemset. Comparative analysis of the experimental results shows that the proposed algorithm completes the mining in a single MapReduce pass, makes full use of the cluster's parallelism, requires no iteration, and clearly outperforms the CD and DD algorithms; that is, it achieves higher mining efficiency.
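The steps described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not give the tree construction or the MapReduce job layout, so the include/exclude binary tree used here, and the hypothetical names `k_subsets`, `map_phase`, `reduce_phase`, and `mine`, are assumptions made for illustration. The map and reduce phases are simulated in memory rather than run on a cluster.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Itemset = Tuple[str, ...]


def k_subsets(items: Sequence[str], k: int) -> Iterator[Itemset]:
    """Yield every k-item subset by walking an include/exclude binary tree.

    Assumption: each tree level decides one item; the left branch includes
    it and the right branch excludes it.
    """
    def walk(depth: int, chosen: List[str]) -> Iterator[Itemset]:
        if len(chosen) == k:
            yield tuple(chosen)
            return
        # Prune branches that cannot reach size k with the items left.
        if len(items) - depth < k - len(chosen):
            return
        chosen.append(items[depth])          # left branch: include the item
        yield from walk(depth + 1, chosen)
        chosen.pop()
        yield from walk(depth + 1, chosen)   # right branch: exclude the item

    yield from walk(0, [])


def map_phase(transaction: Iterable[str], k: int) -> Iterator[Tuple[Itemset, int]]:
    """Emit (subset, 1) for each k-subset of one transaction."""
    items = sorted(set(transaction))  # canonical order so equal subsets match
    for subset in k_subsets(items, k):
        yield subset, 1


def reduce_phase(pairs: Iterable[Tuple[Itemset, int]]) -> Counter:
    """Sum the emitted counts per subset."""
    counts: Counter = Counter()
    for subset, n in pairs:
        counts[subset] += n
    return counts


def mine(transactions: Iterable[Iterable[str]], k: int, threshold: int) -> Dict[Itemset, int]:
    """Single pass: map every transaction, reduce once, then filter."""
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    counts = reduce_phase(pairs)
    # The abstract says "more than the threshold", so the test is strict.
    return {s: c for s, c in counts.items() if c > threshold}


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "d"]]
    print(mine(db, k=2, threshold=1))
    # {('a', 'b'): 3, ('a', 'c'): 2}
```

Because every transaction is mapped independently and the counts are merged once, no candidate-generation loop over itemset sizes is needed, which matches the abstract's claim of a single, non-iterative MapReduce pass. The cost is that the number of emitted subsets grows combinatorially with transaction length and `k`.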

Bibliographic information

Source
《计算机技术与发展》 (Computer Technology and Development) | 2015, No. 10 | pp. 80-83, 87 | 5 pages
Authors
陈静; 郑彦
Author affiliation

School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003

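Original format: PDF
Language of text: Chinese
CLC classification: Programming; software engineering
Keywords
frequent itemset mining; MapReduce; parallel computing; binary tree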

