Conference: UKM FST Postgraduate Colloquium

Outliers detection for Pareto distributed data


Abstract

This study examines the presence of outliers in the upper tail of the Malaysian income distribution under the assumption that the data follow a Pareto model. Three types of boxplot are considered for this purpose: the standard boxplot, the adjusted boxplot and the generalized boxplot. Their performance is assessed in a simulation study in which data were simulated from the Pareto distribution P(1, α) with α = 2, 3, 4 and then contaminated by replacing a proportion ε (3%, 5%, 10%) of randomly selected observations. The generalized boxplot was found to give higher power than the standard and adjusted boxplots. It was therefore used to detect outliers in the upper tail of the income distribution, with the threshold for Pareto tail modelling determined by Van Kerm's formula. In the household income data exceeding the threshold, the generalized boxplot detected 0.4%, 0.4%, 0.9% and 1.2% outliers for the years 2007, 2009, 2012 and 2014, respectively.
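The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and covers only the standard boxplot; the adjusted and generalized boxplots are not implemented here. Several choices are assumptions made for illustration: P(1, α) is read as a Pareto distribution with scale 1 and shape α, the replacement values are set to a multiple of the clean sample maximum (the abstract does not say how contaminated values are generated), and "power" is taken to be the share of contaminated points that the rule flags. The function names and the `factor` parameter are hypothetical.

```python
import random
import statistics


def simulate_contaminated_pareto(n, alpha, eps, factor, rng):
    """Draw n values from Pareto(scale=1, shape=alpha), then contaminate a share eps.

    ASSUMPTION: each replaced value equals factor * (maximum of the clean sample).
    The abstract does not specify the contamination mechanism.
    """
    data = [rng.paretovariate(alpha) for _ in range(n)]
    clean_max = max(data)
    n_bad = round(eps * n)
    bad_idx = set(rng.sample(range(n), n_bad))
    for i in bad_idx:
        data[i] = clean_max * factor
    return data, bad_idx


def standard_upper_fence(data):
    """Upper fence of the standard (Tukey) boxplot: Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 + 1.5 * (q3 - q1)


def evaluate(data, bad_idx):
    """Return (power, false_rate) for the standard boxplot rule.

    ASSUMPTION: power is the share of contaminated points flagged as outliers;
    false_rate is the share of clean points flagged as outliers.
    """
    fence = standard_upper_fence(data)
    flagged = {i for i, x in enumerate(data) if x > fence}
    n_clean = len(data) - len(bad_idx)
    power = len(flagged & bad_idx) / len(bad_idx)
    false_rate = len(flagged - bad_idx) / n_clean
    return power, false_rate


if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (2, 3, 4):
        for eps in (0.03, 0.05, 0.10):
            data, bad = simulate_contaminated_pareto(1000, alpha, eps, 10, rng)
            power, false_rate = evaluate(data, bad)
            print(alpha, eps, round(power, 3), round(false_rate, 3))
```

Because the Pareto distribution is heavily right-skewed, the symmetric fence of the standard boxplot tends to flag genuine tail observations as well as contaminated ones, which is the kind of behaviour that motivates skewness-aware alternatives such as the adjusted and generalized boxplots.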
