Journal: IEEE Transactions on Multimedia

Boosting Positive and Unlabeled Learning for Anomaly Detection With Multi-Features

Abstract

A key challenge in machine learning-based anomaly detection is the difficulty of obtaining anomaly data for training, because such data are usually rare, diversely distributed, and hard to collect. To address this challenge, we formulate anomaly detection as a Positive and Unlabeled (PU) learning problem, in which only labeled positive (normal) data and unlabeled (normal and anomalous) data are required to learn an anomaly detector. Because this semi-supervised method does not require labeled anomaly data for training, it is easy to deploy in a variety of applications. Since the unlabeled data can be extremely unbalanced, we introduce a novel PU learning method that handles the case where the unlabeled set consists mostly of positive instances. We first use a linear model to extract the most reliable negative instances, and then run a self-learning process that adds reliable negative and positive instances at different speeds, based on the estimated positive class prior. Furthermore, when feedback is available, we adopt boosting in the self-learning process to exploit the instability of PU learning. The classifiers produced during self-learning are combined, weighted by their estimated error rates, to build the final classifier. Extensive experiments on six real datasets and one synthetic dataset show that our methods achieve better results than existing methods under different conditions.
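
The pipeline outlined in the abstract (linear model for reliable negatives, self-learning at two speeds, error-weighted combination) can be illustrated with a minimal sketch. This is not the authors' implementation. Several pieces are assumptions made for illustration: a nearest-centroid scorer stands in for the linear model, the positive class prior is passed in as the hypothetical parameter `prior` (strictly between 0 and 1) instead of being estimated, the error rate is approximated by the misclassification rate on the pseudo-labeled set, and classifiers are combined with an AdaBoost-style weight. The feedback-driven boosting step is omitted. All function and parameter names are hypothetical.

```python
from math import log

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length points."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_linear(pos, neg):
    """Nearest-centroid linear scorer (assumed stand-in for the linear model)."""
    cp, cn = centroid(pos), centroid(neg)
    w = [p - n for p, n in zip(cp, cn)]
    # Bias places the decision boundary at the midpoint of the two centroids.
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, cp, cn))
    return w, b

def score(model, x):
    """Positive score means the point looks positive (normal)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def pu_self_training(labeled_pos, unlabeled, prior, rounds=3, n_neg=1):
    # Step 1: treat every unlabeled point as negative, fit a linear model,
    # and keep the lowest-scoring unlabeled points as reliable negatives.
    seed = fit_linear(labeled_pos, unlabeled)
    pool = sorted(unlabeled, key=lambda x: score(seed, x))
    pos, neg, pool = list(labeled_pos), pool[:n_neg], pool[n_neg:]
    # Per round, positives and negatives are added in the ratio prior : (1 - prior).
    n_pos = max(1, round(n_neg * prior / (1 - prior)))
    models = []
    for _ in range(rounds):
        model = fit_linear(pos, neg)
        # Assumed error estimate: misclassification rate on the pseudo-labeled set.
        wrong = (sum(score(model, x) <= 0 for x in pos)
                 + sum(score(model, x) > 0 for x in neg))
        err = min(max(wrong / (len(pos) + len(neg)), 1e-3), 1 - 1e-3)
        # AdaBoost-style weight: lower estimated error gives a larger vote.
        models.append((0.5 * log((1 - err) / err), model))
        if not pool:
            break
        # Step 2: move the most confident unlabeled points into each class.
        ranked = sorted(pool, key=lambda x: score(model, x))
        new_neg = [x for x in ranked[:n_neg] if score(model, x) < 0]
        rest = ranked[len(new_neg):]
        new_pos = [x for x in rest[::-1][:n_pos] if score(model, x) > 0]
        pos, neg = pos + new_pos, neg + new_neg
        pool = rest[:len(rest) - len(new_pos)]

    def predict(x):
        """True for normal, False for anomaly, by weighted vote of all rounds."""
        vote = sum(a * (1 if score(m, x) > 0 else -1) for a, m in models)
        return vote > 0

    return predict
```

Calling `pu_self_training(normal_points, mixed_points, prior=0.8)` returns a `predict` function. With `prior=0.8` and `n_neg=1`, each round admits up to four new positives for every new negative, which mirrors the idea of growing the two classes at different speeds when the unlabeled set is mostly normal.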