Research on Hadoop-Based Association Rule Mining Algorithms: The Apriori Algorithm as an Example

Journal: 《计算机技术与发展》 (Computer Technology and Development)

         

Abstract

Traditional association rule mining algorithms can no longer meet the efficiency and scalability demands of mining large volumes of data. Taking the classic Apriori algorithm as an example, this paper first parallelizes the algorithm on the Hadoop platform using the MapReduce programming model. It then optimizes the parallel algorithm with a transaction reduction technique to further improve mining efficiency. A Hadoop cluster is set up to evaluate both the mining results and the mining efficiency through four groups of experiments: verification of the parallel mining results, an efficiency comparison between the serial and parallel versions, the relationship between mining time and the number of nodes, and the relationship between mining time and data volume. The results show that the implemented Apriori algorithm mines frequent itemsets accurately and offers higher mining performance and better scalability than the traditional serial algorithm. It is therefore better suited to large datasets and can efficiently mine frequent itemsets and association rules from them.
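The abstract does not show the implementation, so the following is only a minimal single-process sketch of the general idea: counting Apriori candidates in a map phase and a reduce phase, and pruning transactions between passes. It does not use the Hadoop API. The function names (`map_count`, `reduce_count`, `gen_candidates`, `apriori`), the `n_splits` parameter, and the particular reduction rule are assumptions made for illustration, not the authors' code.

```python
from collections import Counter
from itertools import combinations


def map_count(transactions, candidates):
    """Map phase: emit (candidate, 1) for every candidate found in a transaction."""
    for t in transactions:
        for c in candidates:
            if c <= t:
                yield c, 1


def reduce_count(pairs, min_count):
    """Reduce phase: sum the counts per candidate and keep the frequent ones."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return {k: v for k, v in totals.items() if v >= min_count}


def gen_candidates(frequent, k):
    """Build k-item candidates whose (k-1)-item subsets are all frequent."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    ]


def apriori(transactions, min_count, n_splits=2):
    """Return {itemset: support count} for all frequent itemsets."""
    data = [frozenset(t) for t in transactions]
    candidates = [frozenset([i]) for i in sorted({i for t in data for i in t})]
    result, k = {}, 1
    while candidates:
        # Each split stands in for the input of one mapper (simulated sequentially).
        splits = [data[i::n_splits] for i in range(n_splits)]
        pairs = (p for s in splits for p in map_count(s, candidates))
        frequent = reduce_count(pairs, min_count)
        if not frequent:
            break
        result.update(frequent)
        # Transaction reduction (assumed rule): a transaction with fewer than
        # k + 1 items, or containing no frequent k-itemset, cannot contribute
        # to any frequent (k + 1)-itemset, so it is dropped before the next pass.
        data = [t for t in data if len(t) > k and any(f <= t for f in frequent)]
        k += 1
        candidates = gen_candidates(frequent, k)
    return result
```

Calling `apriori(transactions, min_count)` on a list of item lists returns a dictionary that maps each frequent itemset (a `frozenset`) to its support count. On a real cluster each split would be processed by a separate mapper, and the candidate set would have to be distributed to every mapper; this sketch leaves both out.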
