Statistical Methods and Applications

robROSE: A robust approach for dealing with imbalanced data in fraud detection

Abstract

A major challenge when trying to detect fraud is that fraudulent activities form a minority class that makes up a very small proportion of the data set. In most data sets, fraud typically occurs in fewer than 0.5% of the cases. Detecting fraud in such a highly imbalanced data set typically leads to predictions that favor the majority group, causing fraud to remain undetected. We discuss some popular oversampling techniques that address the problem of imbalanced data by creating synthetic samples that mimic the minority class. A frequent problem when analyzing real data is the presence of anomalies or outliers. When such atypical observations are present in the data, most oversampling techniques are prone to creating synthetic samples that distort the detection algorithm and spoil the resulting analysis. A useful tool for anomaly detection is robust statistics, which aims to find the outliers by first fitting the majority of the data and then flagging observations that deviate from it. In this paper, we present a robust version of ROSE (Random Over-Sampling Examples), called robROSE, which combines several promising approaches to cope simultaneously with the problem of imbalanced data and the presence of outliers. The proposed method enhances the presence of the fraud cases while ignoring anomalies. The good performance of our new sampling technique is illustrated on simulated and real data sets, and it is shown that robROSE can provide better insight into the structure of the data. The source code of the robROSE algorithm is made freely available.
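The abstract only outlines how robROSE works. As a rough illustration of the two ingredients it names, ROSE-style oversampling (draw a minority case as a seed, then sample around it from a Gaussian kernel) and outlier flagging with robust statistics, here is a minimal Python sketch. It is not the authors' robROSE algorithm. The per-feature median/MAD screening with a cutoff of 3, the rule-of-thumb kernel bandwidth scaled by the MAD instead of the standard deviation, and the names `robust_outlier_mask` and `robust_rose_like_oversample` are all assumptions made for this example.

```python
import numpy as np

# Makes the MAD consistent with the standard deviation for normally distributed data.
MAD_TO_SD = 1.4826


def robust_outlier_mask(X, cutoff=3.0):
    """Return True for rows whose robust z-score exceeds `cutoff` in any column.

    Hypothetical helper: simple per-feature median/MAD screening, chosen for illustration.
    """
    med = np.median(X, axis=0)
    scale = MAD_TO_SD * np.median(np.abs(X - med), axis=0)
    scale = np.where(scale > 0, scale, 1.0)  # guard against constant columns
    z = np.abs(X - med) / scale
    return (z > cutoff).any(axis=1)


def robust_rose_like_oversample(X_min, n_new, cutoff=3.0, seed=None):
    """Smoothed-bootstrap oversampling of the minority class after dropping flagged outliers.

    Hypothetical helper for illustration; not the robROSE reference implementation.
    """
    rng = np.random.default_rng(seed)
    clean = X_min[~robust_outlier_mask(X_min, cutoff)]
    if len(clean) == 0:
        raise ValueError("every minority observation was flagged as an outlier")
    n, d = clean.shape
    # Robust per-feature spread used in place of the standard deviation (an assumption).
    med = np.median(clean, axis=0)
    spread = MAD_TO_SD * np.median(np.abs(clean - med), axis=0)
    # Rule-of-thumb Gaussian kernel bandwidth, one value per feature.
    h = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4)) * spread
    # Draw seeds with replacement from the retained cases, then jitter each with Gaussian noise.
    seeds = clean[rng.integers(0, n, size=n_new)]
    return seeds + rng.normal(size=(n_new, d)) * h
```

Because the screening runs before resampling, a gross outlier among the fraud cases is never drawn as a seed, so no synthetic cases are generated around it; a plain smoothed bootstrap would reuse it like any other minority observation.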
