SIAM International Conference on Data Mining

Efficient Selection of Globally Optimal Rules on Large Imbalanced Data Based on Rule Coverage Relationship Analysis

Abstract

Rule-based anomaly and fraud detection systems often raise massive numbers of false alerts when applied to a huge number of enterprise transactions. A crucial and challenging problem is to effectively select a globally optimal rule set that can capture very rare anomalies dispersed among large-scale background transactions. Existing rule selection methods suffer significantly from complex rule interactions and overlaps in large imbalanced data, and often lead to a very high false positive rate. In this paper, we analyze the interactions and relationships between rules and their coverage of transactions, and propose a novel metric, Max Coverage Gain. Max Coverage Gain selects the optimal rule set by evaluating the contribution of each rule to overall performance, cutting out rules that are locally significant but globally redundant without any negative impact on recall. An effective algorithm, MCGminer, is then designed with a series of built-in mechanisms and pruning strategies to handle complex rule interactions and reduce the computational complexity of identifying the globally optimal rule set. Substantial experiments on 13 UCI data sets and a real-time online banking transactional database demonstrate that MCGminer achieves significant improvements in accuracy, scalability, stability, and efficiency on large imbalanced data compared with several state-of-the-art rule selection techniques.
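The abstract does not define Max Coverage Gain or MCGminer, so the following is only a minimal sketch of the general idea of dropping rules that are "locally significant but globally redundant". It is not the authors' algorithm. It assumes each rule is represented by the set of transaction ids it fires on, and it uses a plain greedy marginal-coverage heuristic; the function name `select_rules`, the rule names, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set


def select_rules(coverage: Dict[str, Set[int]], positives: Set[int]) -> List[str]:
    """Greedily keep only rules that still add uncovered anomalies.

    coverage maps a (hypothetical) rule name to the transaction ids it fires on;
    positives holds the ids of the true anomalies.
    This is an illustrative heuristic, not the paper's Max Coverage Gain metric.
    """
    selected: List[str] = []
    covered: Set[int] = set()      # every transaction flagged by selected rules
    remaining = dict(coverage)     # copy so the caller's dict is not modified

    while remaining:
        def gain(name: str):
            new = remaining[name] - covered
            # Prefer more newly caught anomalies, then fewer new false alerts.
            return (len(new & positives), -len(new - positives))

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=gain)
        if gain(best)[0] == 0:
            break                  # no remaining rule catches a new anomaly
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Toy data: ids 1-4 are anomalies, ids 10-13 are normal transactions.
coverage = {
    "r1": {1, 2, 10, 11},  # catches two anomalies but adds two false alerts
    "r2": {2, 3},          # precise, but fully subsumed by r3
    "r3": {1, 2, 3, 12},
    "r4": {4, 13},
}
positives = {1, 2, 3, 4}
chosen = select_rules(coverage, positives)
```

On this toy data the sketch keeps `r3` and `r4`. Together they still cover all four anomalies, so recall matches that of the full rule set, while the false alerts on ids 10 and 11 contributed by `r1` are avoided.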