A hybrid reciprocal model of PCA and K-means with an innovative approach of considering sub-datasets for the improvement of K-means initialization and step-by-step labeling to create clusters with high interpretability

Anaraki Seyed Alireza Mousavian; Haeri Abdorrahman; Moslehi Fateme

Abstract

The K-means algorithm is a popular clustering method that is sensitive to its initialization and to the choice of the number of clusters. Its performance is also considerably affected on high-dimensional datasets. Principal component analysis (PCA) is a linear dimensionality reduction method that is closely related to the K-means algorithm. Dimension reduction allows the initial centers to be selected in a smaller space, which offers one solution to the initialization problem. The present study investigates the reciprocal relationship between K-means and PCA and adopts an innovative approach of creating sub-datasets and applying step-by-step labeling in the hybrid execution of both algorithms, proposing two methods, namely K-P and P-K. The clusters obtained from the two proposed methods are highly interpretable, as verified by the step-by-step labeling results on a human resource dataset. Interpretability was evaluated via the distribution of features of interest (FoI), suggesting improved results for both datasets. In addition to the qualitative improvement, the study reports improvements in the sum of squared estimate of errors (SSE) divided by N (the total number of data points) and in the silhouette score on 10 datasets, relative to eight initialization methods from previous studies. The P-K method gave better results and a shorter run time than the K-P method.
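
The general idea of choosing initial centers with the help of PCA, and then scoring the result with SSE/N and the silhouette, can be illustrated with a minimal sketch. This is not the paper's K-P or P-K procedure, which the abstract does not describe in detail. It assumes NumPy is available; the helper names (`pca_project`, `pca_guided_init`, `kmeans`, `sse_per_point`, `mean_silhouette`), the equal-size split along the first principal component, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def pca_project(X, n_components):
    """Project centered data onto its top principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def pca_guided_init(X, k):
    """Assumed heuristic: sort points along the first principal component,
    split them into k equal-size groups, and use the group means
    (in the original space) as initial centers."""
    scores = pca_project(X, 1)[:, 0]
    groups = np.array_split(np.argsort(scores), k)
    return np.array([X[g].mean(axis=0) for g in groups])

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return assign(X, centers), centers

def sse_per_point(X, labels, centers):
    """Sum of squared distances to the assigned center, divided by N."""
    return float(((X - centers[labels]) ** 2).sum() / len(X))

def mean_silhouette(X, labels):
    """Mean silhouette with Euclidean distance; assumes at least two
    non-empty clusters. O(N^2) memory, so only suitable for small data."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # a singleton's silhouette is defined as 0
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

# Synthetic example: three well-separated blobs in 10 dimensions.
rng = np.random.default_rng(0)
blob_means = np.array([[0.0] * 10, [8.0] * 10, [-8.0] * 10])
X = np.vstack([m + rng.normal(size=(50, 10)) for m in blob_means])

labels, centers = kmeans(X, pca_guided_init(X, 3))
print("SSE/N:", sse_per_point(X, labels, centers))
print("mean silhouette:", mean_silhouette(X, labels))
```

The split along the first principal component is deterministic, so repeated runs start from the same centers. Lower SSE/N and a higher silhouette indicate tighter, better-separated clusters, which is how the two measures named in the abstract are usually read.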

Bibliographic details

Source: Pattern Analysis and Applications, 2021, Issue 3, pp. 1387-1402 (16 pages)
Authors: Anaraki Seyed Alireza Mousavian; Haeri Abdorrahman; Moslehi Fateme
Author affiliations: Iran Univ Sci & Technol, Dept Ind Engn, Tehran, Iran (all three authors)
Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords: K-means; PCA; Reciprocal relationship; Step-by-step labeling; Interpretability
