Conference: 2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare

Parallel clustering of large data set on Hadoop using data mining techniques

Abstract

Traditional data processing techniques cannot keep up with rapidly growing data, and Hadoop can be used to process such large datasets. K-means is a traditional clustering method that is simple, scalable, and easy to implement. However, it converges to a local minimum determined by its starting position, which makes it sensitive to the choice of initial centers, and it requires the number of clusters to be specified in advance. Particle Swarm Optimization (PSO) is an algorithm that mimics the collective behavior of swarms, such as bird flocks. It is used to introduce the connectivity principle into the centroid-based clustering algorithm, producing optimal centroids and hence better clusters. We used PSO to find the initial centroids and K-means to find better clusters. Hadoop is used for fast, parallel processing of large datasets.
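The abstract does not give the paper's algorithm details or code, so the following is only a minimal Python sketch of the general PSO-then-K-means idea, under stated assumptions. It assumes a standard global-best PSO in which each particle encodes k candidate centroids and is scored by the sum of squared errors (SSE), followed by ordinary Lloyd-style K-means seeded with the best particle. The function names (`pso_init`, `kmeans_map`, `kmeans_reduce`, `kmeans`), the SSE fitness, and the parameter values (`w`, `c1`, `c2`, particle and iteration counts) are hypothetical choices for illustration. The connectivity principle mentioned in the abstract is not modeled. The K-means assignment and update steps are written as map and reduce functions to suggest how one iteration could be split across Hadoop mappers and reducers, but the code runs locally and does not use any Hadoop API.

```python
import random

# Hypothetical sketch: PSO picks initial centroids, then K-means refines them.
# Fitness (SSE) and all PSO parameters are illustrative assumptions.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def pso_init(points, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO where each particle is a list of k candidate centroids."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Start each particle at k distinct data points chosen at random.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in pos]
    pbest_fit = [sse(points, particle) for particle in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            # Standard velocity update: inertia + pull toward personal and global bests.
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(points, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest

def kmeans_map(point, centroids):
    """Map step (hypothetical): emit (index of nearest centroid, point)."""
    idx = min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))
    return idx, point

def kmeans_reduce(points):
    """Reduce step (hypothetical): new centroid is the mean of its assigned points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def kmeans(points, centroids, max_iters=100):
    """Lloyd-style K-means seeded with the given centroids; runs locally."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iters):
        groups = {}
        for p in points:  # "map" phase: group points by nearest centroid
            key, value = kmeans_map(p, centroids)
            groups.setdefault(key, []).append(value)
        # "reduce" phase; a centroid with no assigned points keeps its position.
        new = [kmeans_reduce(groups[j]) if j in groups else centroids[j]
               for j in range(len(centroids))]
        if new == centroids:  # assignments stable, so centroids stopped moving
            break
        centroids = new
    return centroids
```

For example, on two well-separated groups of 2-D points, `kmeans(points, pso_init(points, k=2))` should return one centroid per group. Seeding K-means with the PSO result means K-means never starts from two centroids in the same group here, which is the initialization sensitivity the abstract describes. In an actual Hadoop job, one common design is to run each K-means iteration as one MapReduce pass, with the current centroids shared with every mapper.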