Journal: Future Generation Computer Systems

BigFCM: Fast, precise and scalable FCM on Hadoop


Abstract

Clustering plays an important role in mining big data, both as a modeling technique and as a preprocessing step in many data mining process implementations. Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing each data record to belong to more than one cluster to some degree. However, a serious challenge in fuzzy clustering is the lack of scalability. Massive datasets in emerging fields such as geosciences, biology, and networking require high-performance parallel and distributed computation to solve real-world problems. Although some clustering methods have already been adapted to run on big data platforms, their execution time increases sharply for very large datasets. In this paper, a scalable Fuzzy C-Means (FCM) clustering method named BigFCM is proposed and designed for the Hadoop distributed data platform. Based on the MapReduce programming model, the proposed algorithm exploits several mechanisms, including an efficient caching design, to achieve a reduction of several orders of magnitude in execution time. The performance of BigFCM is compared with Apache Mahout K-Means and Fuzzy K-Means through an evaluation framework developed in this research. Extensive evaluation using multi-gigabyte datasets, including SUSY and HIGGS, shows that BigFCM is scalable while preserving the quality of clustering.
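
For readers unfamiliar with the terms, the following is a minimal sketch of textbook Fuzzy C-Means arranged in a map/reduce style. It is not the BigFCM algorithm: the abstract does not give its details, and the caching design and Hadoop integration are not represented here. All function names (`memberships`, `map_partition`, `reduce_partials`, `fcm`) and the fuzzifier default `m=2.0` are illustrative assumptions. The sketch only shows why the center update can be split across data partitions: each partition contributes partial weighted sums that are simply added together.

```python
import math

def memberships(point, centers, m=2.0):
    """Textbook FCM membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists]

def map_partition(points, centers, m=2.0):
    """'Map' step (hypothetical name): weighted sums for one data partition."""
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]   # sum of u^m * x per cluster
    den = [0.0] * len(centers)             # sum of u^m per cluster
    for p in points:
        for j, u in enumerate(memberships(p, centers, m)):
            w = u ** m
            den[j] += w
            for d in range(dim):
                num[j][d] += w * p[d]
    return num, den

def reduce_partials(partials):
    """'Reduce' step (hypothetical name): add partial sums, then divide."""
    nums, dens = zip(*partials)
    k, dim = len(dens[0]), len(nums[0][0])
    centers = []
    for j in range(k):
        total_w = sum(den[j] for den in dens)
        centers.append([sum(num[j][d] for num in nums) / total_w
                        for d in range(dim)])
    return centers

def fcm(partitions, centers, m=2.0, iterations=20):
    """Run a fixed number of FCM iterations over partitioned data."""
    for _ in range(iterations):
        partials = [map_partition(part, centers, m) for part in partitions]
        centers = reduce_partials(partials)
    return centers
```

Because the reduce step only adds sums, the result does not depend on how the records are split into partitions (up to floating-point rounding), which is the property that makes this kind of computation a natural fit for MapReduce.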
