Journal: Informatica

Study of Multi-Class Classification Algorithms' Performance on Highly Imbalanced Network Intrusion Datasets



Abstract

This paper addresses class imbalance in machine learning, focusing on the detection of rare intrusion classes in computer networks. Class imbalance occurs when examples of one class heavily outnumber those of the other classes. We are particularly interested in classifiers, since pattern recognition and anomaly detection can be framed as classification problems. Because most traffic in any organization's network is still benign and malicious traffic is rare, researchers have to deal with class imbalance. Substantial research has been undertaken to identify methods or data features that allow these attacks to be identified accurately. The usual tactic, however, is to label all malicious traffic as one class and then solve a binary classification problem. In this paper we instead choose not to group or drop rare classes, and investigate what can be done to achieve good multi-class classification performance. Rare-class records were up-sampled to preset target ratios using the SMOTE method (Chawla et al., 2002). Experiments were performed on three network traffic datasets, CIC-IDS2017, CSE-CIC-IDS2018 (Sharafaldin et al., 2018) and LITNET-2020 (Damasevicius et al., 2020), with the aim of reliably recognizing the rare malicious classes in these datasets. Popular machine learning algorithms were compared on their ability to support rare-class detection. Algorithm hyperparameters were tuned over a wide range of values, different feature selection methods were used, and tests were run with and without over-sampling to measure multi-class classification performance on rare classes.
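The SMOTE up-sampling step mentioned above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function name `smote_upsample`, the neighbour count `k`, the toy points and the 0.25 target ratio are all assumptions chosen for illustration. The idea it shows is the core of SMOTE: each synthetic record is placed at a random position on the line segment between a rare-class record and one of its nearest rare-class neighbours.

```python
import math
import random


def smote_upsample(minority, target_count, k=5, seed=0):
    """Return synthetic minority points (hypothetical helper, not from the paper).

    Enough points are generated so that the original plus synthetic
    records together number `target_count`.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For every minority record, find its k nearest minority neighbours.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    while len(minority) + len(synthetic) < target_count:
        i = rng.randrange(len(minority))   # pick a rare-class record
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # position along the segment
        x, y = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, y)))
    return synthetic


# Toy data: 4 rare-class records against an assumed 100 benign records.
rare = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
benign_count = 100
target_ratio = 0.25                        # assumed preset ratio
target_count = int(target_ratio * benign_count)
synthetic = smote_upsample(rare, target_count, k=3)
```

With these assumed values the rare class grows from 4 to 25 records, and every synthetic point stays inside the region spanned by the original rare-class records. In practice a maintained library implementation would normally be preferred over a hand-written one.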
A ranking of the machine learning algorithms based on Precision, Balanced Accuracy Score, G, and the bias-variance decomposition of the prediction error shows that decision tree ensembles (AdaBoost, Random Forest and Gradient Boosting Classifier) performed best on the network intrusion datasets used in this research.
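To see why imbalance-aware scores are used for the ranking instead of plain accuracy, the following sketch computes balanced accuracy (the mean of per-class recall) on toy labels. The abstract does not define "G"; the sketch assumes it denotes the geometric mean of per-class recall, which is a common choice but is not confirmed by the text. The class names, counts and helper names are made up for illustration.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / total[c] for c in total}


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed meaning of 'G')."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Toy imbalanced labels: 90 benign, 8 DoS, 2 botnet records.
y_true = ["benign"] * 90 + ["dos"] * 8 + ["botnet"] * 2
# A degenerate classifier that labels everything as benign.
y_pred = ["benign"] * 100

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
g = g_mean(y_true, y_pred)
```

Here plain accuracy is 0.9 even though no attack is ever detected, while balanced accuracy drops to one third and the geometric mean to zero, so both scores penalize a classifier that ignores the rare classes.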
