International Journal of Computer Science and Network Security

An Enhanced Apriori Algorithm Using Hybrid Data Layout Based on Hadoop for Big Data Processing


Abstract

Frequent itemset mining is a data mining method for finding frequent patterns, with applications in prediction, association rule mining, classification, and related tasks. The Apriori algorithm is an iterative method for discovering frequent itemsets in a transactional dataset. It scans the entire dataset in every iteration to generate the frequent itemsets of each cardinality, which is acceptable for small datasets but impractical for big data. To avoid rescanning the dataset in every iteration, we present an algorithm called Hybrid Frequent Itemset Mining on Hadoop (HFIMH), which uses the vertical layout of the dataset. The vertical dataset carries the information needed to determine the support of every itemset, and that support is computed by set intersection. We compare the execution of HFIMH with another Hadoop-based implementation of the Apriori algorithm on different datasets. Experimental results demonstrate that our approach performs better.
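
The idea of computing support by set intersection on a vertical layout can be illustrated with a minimal single-machine sketch. This is not the HFIMH implementation: it leaves out Hadoop and the hybrid layout entirely, and the function names (`to_vertical`, `frequent_itemsets`) and the `min_support` parameter are assumptions made for illustration. After one pass that builds the vertical layout, each larger itemset's support is obtained by intersecting transaction-id sets, without rescanning the transactions.

```python
from collections import defaultdict
from itertools import combinations


def to_vertical(transactions):
    """Build the vertical layout: item -> set of ids of transactions containing it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support.

    Illustrative sketch only (hypothetical helper, not the paper's algorithm).
    The transactions are scanned once, in to_vertical; every later level works
    on transaction-id sets alone.
    """
    vertical = to_vertical(transactions)

    # Level 1: frequent single items and their transaction-id sets.
    current = {
        frozenset([item]): tids
        for item, tids in vertical.items()
        if len(tids) >= min_support
    }

    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})

        # Join pairs of frequent k-itemsets that differ in exactly one item.
        next_level = {}
        for a, b in combinations(current, 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Support of the candidate is the size of the intersection
            # of the two parents' transaction-id sets.
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level

    return result
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item sets returns a dictionary keyed by `frozenset`. In a distributed setting the transaction-id sets would have to be partitioned and combined across nodes, which this sketch does not attempt.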