Conference: International Conference on Intelligent and Fuzzy Systems

Comparison of Fuzzy C-Means and K-Means Clustering Performance: An Application on Household Budget Survey Data

Abstract

National Household Budget Survey (HBS) data include sociodemographic and financial indicators that are elements of government public policy actions. Finding the optimal grouping of households in a given, sufficiently large data set is a challenging task for policymakers. Soft classification techniques such as fuzzy c-means (FCM) provide a deep understanding of hidden patterns in the variable set. This study aims to compare FCM and k-means (KM) classification performance for the grouping of households in terms of sociodemographic and out-of-pocket (OOP) health expenditure variables. Health expenditure variables have heavily skewed distributions, and the shape of a variable's distribution has a measurable effect on classifiers. Incorporating Bayesian data generation procedures into the variable transformation process will increase the ability to deal with skewness and improve model performance. However, little is known about how well the Bayesian data generation approach performs when embedded in unsupervised learning and applied to health expenditures. This study applied the aforementioned strategy to Turkish HBS data for the year 2015 while comparing FCM and KM classification performance. Normality test results show that the logarithmic (KS = 0.006; p > 0.05) and Box-Cox transformed (KS = 0.006; p > 0.05) health expenditure variables, which were generated using lognormal distributions from a Bayesian viewpoint, are close to normally distributed. Moreover, KM clustering (Sil = 0.48) results are better than FCM (Sil = 0.4198) for classifying households. The optimal number of household groups is 20. Further studies will compare the cluster-seeking performance of other unsupervised learning algorithms while incorporating arbitrary health expenditure variables into the study model.
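
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (three made-up household groups with a hypothetical `age` column and a lognormal `spend` column), a plain log transform stands in for the transformation step, the Bayesian data generation procedure is not reproduced, and "Sil" is assumed to denote the mean silhouette coefficient. The function names `kmeans`, `fuzzy_cmeans`, and `silhouette` are illustrative, written with NumPy only.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with one random init; returns hard labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old centroid if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM updates; returns membership matrix U (n x c) and centroids."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))  # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def silhouette(X, labels):
    """Mean silhouette coefficient from pairwise Euclidean distances."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ks = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # D[i, i] == 0, so self is excluded
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Synthetic stand-in data (assumption): three groups, one skewed expenditure column.
rng = np.random.default_rng(42)
groups = []
for mu_age, mu_log_spend in [(30, 3.0), (45, 5.0), (60, 7.0)]:
    age = rng.normal(mu_age, 3.0, size=100)
    spend = rng.lognormal(mean=mu_log_spend, sigma=0.4, size=100)
    groups.append(np.column_stack([age, spend]))
raw = np.vstack(groups)

# Log-transform the skewed column, then standardize both columns.
X = np.column_stack([raw[:, 0], np.log(raw[:, 1])])
X = (X - X.mean(axis=0)) / X.std(axis=0)

km_labels, _ = kmeans(X, k=3)
U, _ = fuzzy_cmeans(X, c=3)
fcm_labels = U.argmax(axis=1)  # harden fuzzy memberships for scoring

sil_km = silhouette(X, km_labels)
sil_fcm = silhouette(X, fcm_labels)
print(f"k-means silhouette: {sil_km:.3f}")
print(f"FCM silhouette:     {sil_fcm:.3f}")
```

Hardening the FCM memberships with `argmax` is one common way to put both methods on the same silhouette scale; it discards the degree-of-membership information that makes FCM a soft method. The number of clusters here (3) is chosen for the toy data and is unrelated to the 20 groups reported in the abstract.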
