Comparison of Fuzzy C-Means and K-Means Clustering Performance: An Application on Household Budget Survey Data

Abstract

National Household Budget Survey (HBS) data include sociodemographic and financial indicators that are elements of government public policy actions. Finding the optimal grouping of households in a given, sufficiently large dataset is a challenging task for policymakers. Soft classification techniques such as Fuzzy C-means (FCM) provide a deep understanding of hidden patterns in the variable set. This study aims to compare FCM and k-means (KM) classification performance for grouping households in terms of sociodemographic and out-of-pocket (OOP) health expenditure variables. Health expenditure variables have heavily skewed distributions, and the shape of a variable's distribution has a measurable effect on classifiers. Incorporating Bayesian data generation procedures into the variable transformation process will increase the ability to deal with skewness and improve model performance. However, little is known about how the Bayesian data generation approach performs when embedded in unsupervised learning applied to health expenditures. This study applied the aforementioned strategy to Turkish HBS data for the year 2015 while comparing FCM and KM classification performance. Normality test results show that the distributions of the logarithmic (KS = 0.006; p > 0.05) and Box-Cox transformed (KS = 0.006; p > 0.05) health expenditure variables, which were generated using lognormal distributions from a Bayesian viewpoint, are close to normal. Moreover, KM clustering (Sil = 0.48) gives better results than FCM (Sil = 0.4198) for classifying households. The optimal number of household groups is 20. Further studies will compare the cluster-seeking performance of other unsupervised learning algorithms while incorporating arbitrary health expenditure variables into the study model.
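The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the study's code or data: it uses synthetic lognormal "expenditure" values, a plain log transform in place of the Bayesian data generation step, and an arbitrary choice of three clusters. The function names (`kmeans`, `fuzzy_cmeans`, `silhouette`) and all parameter values are hypothetical, and only NumPy is assumed.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns hard labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with fuzzifier m; returns memberships and centroids."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def silhouette(X, labels):
    """Mean silhouette width over all points (Euclidean distance)."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return 0.0                             # sketch's choice: undefined case mapped to 0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue                           # singleton cluster: score stays 0
        a = D[i, own].sum() / (n_own - 1)      # mean distance within own cluster
        b = min(D[i, labels == j].mean() for j in ids if j != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Synthetic data (assumption): three household groups with skewed spending.
rng = np.random.default_rng(42)
spend = np.concatenate([rng.lognormal(mean=mu, sigma=0.3, size=100)
                        for mu in (4.0, 6.0, 8.0)])
hh_size = np.concatenate([rng.normal(loc=s, scale=0.5, size=100)
                          for s in (2.0, 4.0, 6.0)])

# Log-transform the skewed variable, then standardize both features.
X = np.column_stack([np.log(spend), hh_size])
X = (X - X.mean(axis=0)) / X.std(axis=0)

km_labels, _ = kmeans(X, k=3)
U, _ = fuzzy_cmeans(X, c=3)
fcm_labels = U.argmax(axis=1)                  # harden memberships for scoring

sil_km = silhouette(X, km_labels)
sil_fcm = silhouette(X, fcm_labels)
print(f"k-means silhouette: {sil_km:.3f}")
print(f"fuzzy c-means silhouette: {sil_fcm:.3f}")
```

Hardening the fuzzy memberships with `argmax` is one simple way to put both algorithms on the same silhouette scale; the abstract does not say how the study scored FCM, so this step is an assumption.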


Bibliographic record

Source
International Conference on Intelligent and Fuzzy Systems | 2021 | xxiii, 930 pages | 9 pages in total
Author
Songul Cinaroglu
Original format
PDF
Chinese Library Classification
TP311.13-532
Keywords
Fuzzy C-means; K-means; Classification; Health expenditure; Household Budget Survey; Bayesian data generation; Unsupervised learning


