《电子学报》 (Acta Electronica Sinica)

An sIB Algorithm for Categorical Data

Abstract

The sIB algorithm has so far been applied only to co-occurrence data, so it cannot directly analyze categorical data that do not take the form of co-occurrences of two variables X and Y. To address this problem, this paper proposes CD-sIB, an sIB-based algorithm that analyzes categorical data automatically. Because categorical data are discrete and each attribute takes a finite number of distinct values, CD-sIB extends the attributes of the dataset and binarizes them, then computes the joint distribution of X and Y from the occurrence of attribute values. This allows sIB to be applied effectively to categorical data. Experimental results show that CD-sIB clearly outperforms GAClust and K-modes, two existing clustering algorithms for categorical data, and that it is both efficient and accurate, especially on categorical data whose attributes are highly generalized and whose class distributions are relatively balanced.
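The preprocessing step described in the abstract can be illustrated with a minimal sketch. It assumes that X ranges over records, that Y ranges over binary features obtained by turning each (attribute, value) pair into its own feature, and that the joint distribution is estimated by normalizing the resulting 0/1 occurrence matrix so that it sums to 1. The abstract does not specify the normalization, and the function name `joint_distribution` and the toy records are hypothetical, so this is not the paper's exact procedure.

```python
def joint_distribution(records):
    """Binarize categorical records and estimate p(x, y).

    Assumption (not from the paper): every occurrence of an attribute
    value in a record carries equal weight, so p(x, y) is the 0/1
    occurrence matrix divided by its total number of ones.
    """
    # Attribute extension: each distinct (attribute index, value) pair
    # becomes one binary feature y.
    features = sorted({(a, v) for rec in records for a, v in enumerate(rec)})
    index = {f: j for j, f in enumerate(features)}

    # Binarization: counts[i][j] is 1 if record i has feature j, else 0.
    counts = [[0] * len(features) for _ in records]
    for i, rec in enumerate(records):
        for a, v in enumerate(rec):
            counts[i][index[(a, v)]] = 1

    # Normalize so that all entries sum to 1.
    total = sum(sum(row) for row in counts)
    p = [[c / total for c in row] for row in counts]
    return features, p


# Toy data: three records with two categorical attributes (color, size).
records = [("red", "small"), ("red", "large"), ("blue", "small")]
features, p = joint_distribution(records)
```

The resulting matrix `p` has the same shape as a co-occurrence table, which is the input form the abstract says sIB expects; the clustering step itself is not shown here.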
