Study of Multi-Class Classification Algorithms' Performance on Highly Imbalanced Network Intrusion Datasets

Viktoras BULAVAS; Virginijus MARCINKEVICIUS; Jacek RUMINSKI


Abstract

This paper addresses the problem of class imbalance in machine learning, focusing on the detection of rare intrusion classes in computer networks. Class imbalance occurs when examples of one class heavily outnumber examples of the other classes. We are particularly interested in classifiers, since pattern recognition and anomaly detection can be posed as classification problems. Because most of the network traffic of any organization is benign and malignant traffic is rare, researchers have to deal with class imbalance. Substantial research has been undertaken to identify methods or data features that allow these attacks to be identified accurately. The usual tactic for dealing with imbalanced classes, however, is to label all malignant traffic as one class and then solve a binary classification problem. In this paper, we choose not to group or drop rare classes but instead investigate what can be done to achieve good multi-class classification efficiency. Rare class records were up-sampled to preset target ratios using the SMOTE method (Chawla et al., 2002). Experiments were performed on three network traffic datasets, namely CIC-IDS2017, CSE-CIC-IDS2018 (Sharafaldin et al., 2018) and LITNET-2020 (Damasevicius et al., 2020), with the aim of reliably recognizing the rare malignant classes present in these datasets. Popular machine learning algorithms were chosen and compared on their readiness to support rare class detection. The hyperparameters of these algorithms were tuned over a wide range of values, different feature selection methods were used, and tests were run with and without over-sampling to measure multi-class classification performance on the rare classes.
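The up-sampling step described above can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' code: the function names `smote_sketch` and `n_needed`, the toy data, and the reading of "preset ratio" as a minority-to-majority count ratio are all assumptions made for illustration. The core idea follows Chawla et al. (2002): each synthetic record lies on the line segment between a rare-class record and one of its nearest rare-class neighbours.

```python
import math
import random

def n_needed(n_minority, n_majority, target_ratio):
    """Synthetic records needed so that minority/majority reaches target_ratio
    (hypothetical helper; the ratio definition is an assumption)."""
    return max(0, math.ceil(target_ratio * n_majority) - n_minority)

def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly chosen
    minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy rare class with four records and two features (illustrative data only).
rare = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra = n_needed(len(rare), n_majority=40, target_ratio=0.25)
new_points = smote_sketch(rare, extra, k=2)
```

In practice a maintained implementation such as the one in the imbalanced-learn package would normally be preferred over a hand-written loop; the sketch only shows where the synthetic records come from.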
A ranking of the machine learning algorithms based on Precision, Balanced Accuracy Score, G, and the bias and variance decomposition of the prediction error shows that decision tree ensembles (AdaBoost, Random Forest and Gradient Boosting Classifier) performed best on the network intrusion datasets used in this research.
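To show why imbalance-aware scores are used for ranking, here is a small sketch that computes balanced accuracy and a geometric mean of per-class recalls from label lists. It assumes that "G" in the abstract refers to the geometric mean of recalls, which the abstract does not spell out; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall of each class: correctly predicted records / records of that class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}

def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed meaning of 'G')."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))

# Toy example: 8 benign records, 2 records of a rare attack class.
y_true = ["benign"] * 8 + ["dos"] * 2
y_pred = ["benign"] * 8 + ["dos", "benign"]  # one rare record is missed

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal = balanced_accuracy(y_true, y_pred)
g = g_mean(y_true, y_pred)
```

On this toy data plain accuracy is 0.9 while balanced accuracy is 0.75, because missing one of only two rare records halves the recall of that class. Both balanced accuracy and the geometric mean weight every class equally regardless of its size, which is why they suit rare-class evaluation better than plain accuracy.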


Bibliographic record

Source
Informatica, 2021, No. 3, pp. 441-475 (35 pages)
Authors
Viktoras BULAVAS; Virginijus MARCINKEVICIUS; Jacek RUMINSKI
Author affiliations

Institute of Data Science and Digital Technologies Vilnius University Akademijos str. 4 LT-08663 Vilnius Lithuania;

Institute of Data Science and Digital Technologies Vilnius University Akademijos str. 4 LT-08663 Vilnius Lithuania;

Faculty of Electronics Telecommunications and Informatics Gdansk University of Technology 11/12 Gabriela Narutowicza 80-233 Gdansk Poland;

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords
network intrusion detection; multi-class classification; imbalanced learning; bias and variance decomposition; SMOTE;


