Chinese journal: 《计算机应用》 (Journal of Computer Applications)

Soft Subspace Clustering Algorithm for Imbalanced Data

Abstract

Current K-means-type soft subspace clustering algorithms cannot effectively cluster imbalanced data because of the uniform effect (the tendency of K-means to produce clusters of similar size). To address this problem, a new partition-based algorithm for soft subspace clustering of imbalanced data was proposed. First, a bi-weighting method was proposed, in which each attribute is assigned a feature weight and each cluster is assigned a cluster weight that measures its importance in the clustering. Second, a new distance measure for mixed-type data was proposed to balance the differences between attributes of different types, and between categorical attributes with different numbers of categories. Third, an objective function for subspace clustering of imbalanced data was defined on the basis of the bi-weighting method, and expressions for optimizing both the cluster weights and the feature weights were derived. A series of experiments on real-world data sets showed that the bi-weighting method used in the new algorithm learns more accurate soft subspaces for the clusters hidden in imbalanced data. Compared with existing K-means-type soft subspace clustering algorithms, the proposed algorithm achieves higher clustering accuracy on imbalanced data, with an improvement of nearly 50% on the bioinformatics data used in the experiments.
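
The abstract does not give the paper's formulas, so the following is only a minimal sketch of what a bi-weighted soft subspace objective can look like, under explicit assumptions. Each cluster k has its own feature weights w_kj (summing to 1) and a cluster weight c_k that scales its within-cluster cost. The mixed-type distance is a plain placeholder (squared difference for numeric values, 0/1 mismatch for categorical values), not the paper's balanced measure. The feature-weight update is borrowed from the entropy-weighting idea of EWKM, adapted here to include c_k; it is a swapped-in technique, not the expression derived in the paper. Cluster weights are taken as given inputs because the abstract does not say how they are computed. All function names are hypothetical.

```python
import math
from typing import List, Union

Value = Union[float, str]  # a numeric or categorical attribute value


def attr_dist(a: Value, b: Value) -> float:
    """Per-attribute distance (placeholder, NOT the paper's mixed-type measure):
    squared difference for numbers, 0/1 mismatch for categories."""
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0
    return (a - b) ** 2


def dispersions(data, labels, centers, k) -> List[float]:
    """D_kj: summed distance of cluster k's members to its center on attribute j."""
    members = [x for x, lab in zip(data, labels) if lab == k]
    return [sum(attr_dist(x[j], centers[k][j]) for x in members)
            for j in range(len(centers[k]))]


def objective(data, labels, centers, feat_w, clus_w, gamma=1.0) -> float:
    """Hypothetical bi-weighted objective:
    J = sum_k [ c_k * sum_j w_kj * D_kj + gamma * sum_j w_kj * log(w_kj) ]."""
    total = 0.0
    for k in range(len(centers)):
        disp = dispersions(data, labels, centers, k)
        total += clus_w[k] * sum(w * d for w, d in zip(feat_w[k], disp))
        total += gamma * sum(w * math.log(w) for w in feat_w[k] if w > 0)
    return total


def update_feature_weights(data, labels, centers, clus_w, k, gamma=1.0) -> List[float]:
    """Closed-form minimiser of J over w_k subject to sum_j w_kj = 1
    (EWKM-style entropy weighting): w_kj proportional to exp(-c_k * D_kj / gamma)."""
    disp = dispersions(data, labels, centers, k)
    scaled = [clus_w[k] * d / gamma for d in disp]
    m = min(scaled)  # shift by the minimum for numerical stability
    expo = [math.exp(-(s - m)) for s in scaled]
    norm = sum(expo)
    return [e / norm for e in expo]


if __name__ == "__main__":
    # Toy mixed-type data: one numeric and one categorical attribute.
    data = [[0.0, "a"], [0.2, "a"], [1.0, "b"], [0.9, "c"]]
    labels = [0, 0, 1, 1]
    centers = [[0.1, "a"], [0.95, "b"]]
    clus_w = [1.0, 1.0]  # assumed given; the paper derives these
    for k in range(len(centers)):
        print(k, update_feature_weights(data, labels, centers, clus_w, k))
```

In the toy run, cluster 1's members agree closely on the numeric attribute but disagree on the categorical one, so the numeric attribute receives the larger weight in that cluster. Because the entropy term makes J strictly convex in each w_k, the update is the exact minimiser for fixed assignments, centers and cluster weights, which is why it is used here as a stand-in for the paper's derived expressions.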
