Conference: Knowledge-Based Systems for Safety Critical Applications

LOCI: fast outlier detection using the local correlation integral

Abstract

Outlier detection is an integral part of data mining and has attracted much attention recently [M. Breunig et al., (2000)], [W. Jin et al., (2001)], [E. Knorr et al., (2000)]. We propose a new method for evaluating outlierness, which we call the local correlation integral (LOCI). As with the best previous methods, LOCI is highly effective for detecting outliers and groups of outliers (a.k.a. micro-clusters). In addition, it offers the following advantages and novelties: (a) It provides an automatic, data-dictated cut-off to determine whether a point is an outlier; in contrast, previous methods force users to pick cut-offs, without any hints as to what cut-off value is best for a given dataset. (b) It can provide a LOCI plot for each point; this plot summarizes a wealth of information about the data in the vicinity of the point, determining clusters, micro-clusters, their diameters and their inter-cluster distances. None of the existing outlier-detection methods can match this feature, because they output only a single number for each point: its outlierness score. (c) Our LOCI method can be computed as quickly as the best previous methods. (d) Moreover, LOCI leads to a practically linear approximate method, aLOCI (for approximate LOCI), which provides fast, highly accurate outlier detection. To the best of our knowledge, this is the first work to use approximate computations to speed up outlier detection. Experiments on synthetic and real-world data sets show that LOCI and aLOCI can automatically detect outliers and micro-clusters, without user-required cut-offs, and that they quickly spot both expected and unexpected outliers.
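
The abstract does not spell out how the data-dictated cut-off in (a) is computed. As a rough illustration only, the sketch below assumes the definition commonly associated with LOCI: for a point p, compare its neighbor count within radius αr against the average of that count over all points within radius r, giving MDEF = 1 - n(p, αr) / n̂(p, r, α), and flag p when MDEF exceeds k_σ times the normalized standard deviation σ_MDEF = σ_n̂ / n̂. The function name `loci_flags`, the single fixed radius `r`, and the defaults `alpha=0.5` and `k_sigma=3.0` are assumptions made for this sketch; the full method examines a range of radii rather than one, and this is not aLOCI.

```python
import numpy as np

def loci_flags(points, r, alpha=0.5, k_sigma=3.0):
    """Single-radius LOCI-style test (hypothetical helper, not the full method).

    Returns (flags, mdef, sigma_mdef), one entry per input point.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances; O(n^2) memory, acceptable for a small sketch.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # n(p, alpha*r): size of each point's counting neighborhood (includes itself).
    counts = (dist <= alpha * r).sum(axis=1)

    n = len(pts)
    flags = np.zeros(n, dtype=bool)
    mdef = np.zeros(n)
    sigma_mdef = np.zeros(n)
    for i in range(n):
        # Sampling neighborhood: all points within r of point i (includes i).
        sampling = dist[i] <= r
        n_hat = counts[sampling].mean()        # average count over the neighborhood
        sigma_n_hat = counts[sampling].std()   # spread of those counts
        mdef[i] = 1.0 - counts[i] / n_hat
        sigma_mdef[i] = sigma_n_hat / n_hat
        # Cut-off derived from the data, not from a user-chosen score threshold.
        flags[i] = mdef[i] > k_sigma * sigma_mdef[i]
    return flags, mdef, sigma_mdef
```

Under these assumptions, a tight cluster plus one distant point yields a near-zero MDEF for cluster members and an MDEF close to 1 for the isolated point, so only the latter crosses the cut-off. Recording `mdef` and `sigma_mdef` over many radii, instead of one, is what would give the per-point plot mentioned in (b).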
