Journal: Statistics and Computing

Outlyingness: Which variables contribute most?


Abstract

Outlier detection is an unavoidable step in most statistical data analyses. However, merely detecting an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, typically flag the entire case as outlying or attribute a specific case weight to the entire case. In practice, particularly in high-dimensional data, an outlier will most likely not be outlying along all of its variables, but only along a subset of them. If so, the scientific question of why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness. It thereby helps the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably by the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well on both simulated data and real-life examples.
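
The least squares reformulation mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the direction of maximal outlyingness is taken to be proportional to S^{-1}(x - mean), with mean and covariance S estimated from the clean cases; the regression response is a 0/1 indicator that equals 1 only for the outlier; and the simulated data, the variable names and the shifted coordinates are all hypothetical. The sparse SPLS/SNIPLS step is not reproduced here; the variables are simply ranked by absolute loading.

```python
import numpy as np

# Hypothetical simulated data: n clean cases in p dimensions plus one
# outlier that is shifted only in variables 2 and 7 (an assumption made
# for illustration, not taken from the paper).
rng = np.random.default_rng(0)
n, p = 500, 10
X_clean = rng.standard_normal((n, p))
x_out = np.zeros(p)
x_out[[2, 7]] = 10.0

# (a) Direct estimate, assuming the direction of maximal outlyingness
#     is proportional to S^{-1} (x_out - mean) on the clean cases.
d = x_out - X_clean.mean(axis=0)
S = np.cov(X_clean, rowvar=False)
a = np.linalg.solve(S, d)
a /= np.linalg.norm(a)

# (b) Least squares estimate: regress a 0/1 indicator (1 = outlier)
#     on all cases with an intercept, then norm the slope vector.
X_all = np.vstack([X_clean, x_out])
y = np.zeros(n + 1)
y[-1] = 1.0
A = np.column_stack([np.ones(n + 1), X_all])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b = coef[1:] / np.linalg.norm(coef[1:])

# The two unit vectors should be parallel up to numerical error.
cosine = float(a @ b)

# Non-sparse stand-in for the SPLS/SNIPLS step: rank variables by the
# absolute value of their loading in the normed direction.
top = np.argsort(-np.abs(b))[:2]
print("cosine between (a) and (b):", cosine)
print("two largest loadings at variables:", sorted(top.tolist()))
```

Under these assumptions both routes give the same direction, because adding a single outlier changes the pooled covariance only by a rank-one term along x_out - mean. In higher dimensions, or when many small loadings obscure the picture, a sparse estimator such as the SPLS regression proposed in the abstract would take the place of the plain ranking used here.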
