Parallel clustering of large data set on Hadoop using data mining techniques


Abstract

Traditional data processing techniques cannot keep up with rapidly growing data; Hadoop can be used to process data at this scale. K-means is a traditional clustering method that is simple, scalable, and easy to implement, but it converges to a local minimum determined by its starting position, is sensitive to the initial centers, and requires the number of clusters to be specified in advance. Particle Swarm Optimization (PSO) is an algorithm based on mimicking behavior; it is used to introduce the connectivity principle into the centroid-based clustering algorithm, which yields optimum centroids and hence better clusters. We use PSO to find the initial centroids and K-means to find better clusters. Hadoop is used for fast, parallel processing of large datasets.
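The two-stage idea in the abstract (PSO chooses the initial centroids, K-means then refines them) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names, the use of the sum of squared errors as the PSO fitness, and the parameter values `w`, `c1`, `c2`, `n_particles`, and `n_iters` are all assumptions made for illustration, and the Hadoop parallelization is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def pso_initial_centroids(points, k, n_particles=10, n_iters=30,
                          w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a good set of k starting centroids with a basic PSO.

    Each particle is one candidate set of k centroids; its fitness is the
    SSE of the data under those centroids (an assumed fitness function).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise every particle from k randomly chosen data points.
    particles = [[list(p) for p in rng.sample(points, k)]
                 for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in part] for part in particles]
    pbest_cost = [sse(points, part) for part in particles]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = [c[:] for c in pbest[g]], pbest_cost[g]

    for _ in range(n_iters):
        for i, part in enumerate(particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + personal + global pull.
                    velocities[i][j][d] = (
                        w * velocities[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - part[j][d])
                        + c2 * r2 * (gbest[j][d] - part[j][d])
                    )
                    part[j][d] += velocities[i][j][d]
            cost = sse(points, part)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = [c[:] for c in part], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = [c[:] for c in part], cost
    return gbest


def kmeans(points, centroids, n_iters=20):
    """Refine the given centroids with Lloyd's K-means iterations."""
    centroids = [list(c) for c in centroids]
    for _ in range(n_iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

A call such as `kmeans(data, pso_initial_centroids(data, k=3))` runs both stages on a list of numeric tuples. In a MapReduce setting, one plausible arrangement (again an assumption, not taken from the paper) is to perform the nearest-centroid assignment in the map phase and the centroid averaging in the reduce phase.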


Bibliographic details

Source
2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare, 2016, pp. 1-4 (4 pages)
Conference location: Coimbatore (IN)
Authors
Kaustubh S. Chaturbhuj; Gauri Chaudhary;
Author affiliations

Dept. of Computer Science and Engineering, YCCE, Nagpur, India;

Dept. of Computer Science and Engineering, YCCE, Nagpur, India;

Original format: PDF
Language: English
Keywords
Clustering algorithms; Particle swarm optimization; Programming; Data mining; Partitioning algorithms; Parallel processing;


Similar literature


1. High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework [J]. Guru Prasad M S, Nagesh H R, Swathi Prabhu. International Journal of Intelligent Systems and Applications, 2017, No. 1
2. Data Encoding and Parallelization Porting Techniques to Transform Binary Data Formats to Hadoop/MapReduce [J]. NASA Tech Briefs, 2016, No. 5
3. Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework [J]. Journal of Grid Computing, 2020, No. 2
4. Parallel clustering of large data set on Hadoop using data mining techniques [C]. Kaustubh S. Chaturbhuj, Gauri Chaudhary. World Conference on Futuristic Trends in Research and Innovation for Social Welfare, 2016
5. Computational intelligence and data mining techniques using the fire data set [D]. Storer, Jeremy. 2016
6. Efficient clustering of large EST data sets on parallel computers [O]. Anantharaman Kalyanaraman, Srinivas Aluru, Suresh Kothari. 2003
7. Clustering of Cardiovascular Disease Patients Using Data Mining Techniques with Principal Component Analysis and K-Medoids [O]. Edy Irwansyah, Ebiet Salim Pratama, Margaretha Ohyver. 2020
8. Cluster Analysis-Based Approaches for Geospatiotemporal Data Mining of Massive Data Sets for Identification of Forest Threats [R]. Mills, R. T., Hoffman, F. M., Kumar, J. 2011
