Pattern Analysis and Applications

A hybrid reciprocal model of PCA and K-means with an innovative approach of considering sub-datasets for the improvement of K-means initialization and step-by-step labeling to create clusters with high interpretability

Abstract

The K-means algorithm is a popular clustering method, but it is sensitive to the choice of initial samples (centers) and to the number of clusters, and its performance on high-dimensional datasets is considerably affected. Principal component analysis (PCA) is a linear dimensionality reduction method that is closely related to the K-means algorithm. Reducing the dimensionality allows the initial centers to be selected in a smaller space, which offers a way to address the initialization problem. The present study investigates the reciprocal relationship between K-means and PCA and proposes two methods, K-P and P-K, which execute both algorithms in a hybrid manner using an innovative approach of creating sub-datasets and applying step-by-step labeling. The clusters obtained from the two proposed methods are highly interpretable, as verified by the step-by-step labeling results on a human resource dataset. Interpretability was evaluated via the distribution of features of interest (FoI), with improved results for both datasets. Beyond these qualitative improvements, the proposed methods improved the sum of squared estimate of errors divided by the total number of data points (SSE/N) and the silhouette score on 10 datasets, compared with eight initialization methods from previous studies. P-K outperformed K-P in both results and run time.
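The abstract does not spell out how K-P and P-K build sub-datasets or perform step-by-step labeling, so the following is only a generic NumPy sketch of the idea they build on: selecting K-means initial centers in a PCA-reduced space, then scoring the final clustering with SSE/N. It is not the authors' method. The helper names (`pca_initialized_kmeans`, `sse_per_point`, and the others), the seed rule (points at evenly spaced ranks along the first principal component), and the synthetic data are assumptions for illustration.

```python
import numpy as np


def pca_project(X, n_components):
    """Project X onto its top principal components (PCA via SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for every row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_kmeans(X, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # keep the previous center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)


def pca_initialized_kmeans(X, k, n_components=2):
    """Hypothetical helper: choose K-means initial centers in PCA space, then refine in full space."""
    Z = pca_project(X, n_components)
    # Assumed seed rule: points at evenly spaced ranks along the first principal component.
    order = np.argsort(Z[:, 0])
    seed_idx = order[np.linspace(0, len(X) - 1, k).astype(int)]
    _, z_labels = lloyd_kmeans(Z, Z[seed_idx])
    # Lift the reduced-space partition back to full-space centroids.
    init = np.array([
        X[z_labels == j].mean(axis=0) if np.any(z_labels == j) else X[seed_idx[j]]
        for j in range(k)
    ])
    return lloyd_kmeans(X, init)


def sse_per_point(X, centers, labels):
    """SSE/N: sum of squared distances to the assigned centers, divided by N."""
    return float(((X - centers[labels]) ** 2).sum() / len(X))


# Synthetic demo data: three well-separated Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
means = np.array([[0.0] * 10, [5.0] * 10, [-5.0] * 10])
X = np.vstack([m + rng.normal(scale=0.5, size=(50, 10)) for m in means])
centers, labels = pca_initialized_kmeans(X, k=3)
print(f"SSE/N = {sse_per_point(X, centers, labels):.3f}")
```

Running K-means to convergence in the reduced space and lifting its partition back to full-space centroids is only one simple way to use PCA for initialization. A silhouette score (for example, scikit-learn's `silhouette_score(X, labels)`) could be computed on the same `labels` to mirror the abstract's second metric.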