Computers & Security

A cost analysis of machine learning using dynamic runtime opcodes for malware detection

Abstract

The ongoing battle between malware distributors and those seeking to prevent the onslaught of malicious code has, so far, favored the former. Anti-virus methods are faltering in the face of the rapid evolution and distribution of new malware, and obfuscation and detection-evasion techniques exacerbate the issue. Recent research has monitored low-level opcodes to detect malware. Such dynamic analysis reveals the code at runtime, allowing its true behaviour to be examined. While previous research has used machine learning techniques to accurately detect malware from dynamic runtime opcodes, the underlying datasets have been poorly sampled and inadequate in size. Further, these datasets are always fixed in size, and no attempt, to our knowledge, has been made to examine the cost of retraining malware classification models on datasets that grow continually. Researchers in the literature discuss the explosion of malware, yet opcode analyses have used fixed-size datasets, with no regard for how such models would cope with retraining on escalating datasets. The research presented here examines this problem and makes several novel contributions to the current body of knowledge. First, the performance of 23 machine learning algorithms is investigated on the largest run-trace dataset in the literature. Second, following an extensive hyperparameter selection process, the classifiers are compared on both accuracy and computational cost (CPU time). Lastly, the cost of retraining and testing updatable and non-updatable classifiers, both parallelized and non-parallelized, is examined with simulated escalating datasets. This provides insight into how deployed malware classifiers would perform as their datasets escalate. We find that parallelized RandomForest, using 4 cores, provides the best performance, with high accuracy and low training and testing times.

© 2019 Elsevier Ltd. All rights reserved.
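The abstract contrasts two strategies as the dataset escalates: retraining a non-updatable classifier from scratch on all data, versus folding only the new samples into an updatable one. The following is a minimal sketch of that distinction under stated assumptions, not the authors' pipeline. It assumes each run trace is reduced to a vector of opcode counts over a small hypothetical vocabulary (`OPCODES`), uses a hand-written multinomial naive Bayes (`IncrementalNB`, hypothetical) as the updatable model, and generates synthetic traces with `random_trace` (also hypothetical). The "non-updatable" path is simulated by refitting a fresh model on the cumulative data each round; CPU time is measured with `time.process_time()`.

```python
import math
import random
import time
from collections import Counter

# Hypothetical opcode vocabulary (assumption; the paper's feature set is not given here).
OPCODES = ["mov", "push", "pop", "call", "ret", "jmp", "cmp", "xor"]


def opcode_counts(trace):
    """Turn a run trace (list of opcode strings) into a fixed-length count vector."""
    c = Counter(trace)
    return [c[op] for op in OPCODES]


class IncrementalNB:
    """Multinomial naive Bayes whose per-class counts can be updated in place."""

    def __init__(self, n_features, alpha=1.0):
        self.alpha = alpha                 # Laplace smoothing
        self.n_features = n_features
        self.class_docs = Counter()        # label -> number of samples seen
        self.feature_totals = {}           # label -> summed opcode counts

    def partial_fit(self, X, y):
        for x, label in zip(X, y):
            self.class_docs[label] += 1
            totals = self.feature_totals.setdefault(label, [0] * self.n_features)
            for i, v in enumerate(x):
                totals[i] += v
        return self

    def predict(self, x):
        n_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for label, totals in self.feature_totals.items():
            denom = sum(totals) + self.alpha * self.n_features
            score = math.log(self.class_docs[label] / n_docs)  # log prior
            score += sum(v * math.log((t + self.alpha) / denom) for v, t in zip(x, totals))
            if score > best_score:
                best, best_score = label, score
        return best


def random_trace(malicious, rng, length=200):
    # Synthetic data only: "malicious" traces are skewed towards jmp/xor for illustration.
    weights = [1, 1, 1, 1, 1, 3, 1, 3] if malicious else [3, 2, 2, 2, 2, 1, 2, 1]
    return rng.choices(OPCODES, weights=weights, k=length)


def simulate(n_batches=5, batch_size=200, seed=0):
    """Grow the dataset batch by batch and time both retraining strategies."""
    rng = random.Random(seed)
    X_all, y_all = [], []
    updatable = IncrementalNB(len(OPCODES))
    costs = []  # (cumulative size, full-refit CPU s, incremental-update CPU s)
    for _ in range(n_batches):
        y_new = [rng.random() < 0.5 for _ in range(batch_size)]  # True = malicious
        X_new = [opcode_counts(random_trace(m, rng)) for m in y_new]
        X_all += X_new
        y_all += y_new

        t0 = time.process_time()
        retrained = IncrementalNB(len(OPCODES)).partial_fit(X_all, y_all)  # refit on everything
        t1 = time.process_time()
        updatable.partial_fit(X_new, y_new)  # fold in only the new batch
        t2 = time.process_time()
        costs.append((len(X_all), t1 - t0, t2 - t1))
    return retrained, updatable, costs
```

Usage: `retrained, updatable, costs = simulate()` and then `for n, full, inc in costs: print(n, full, inc)`. Because naive Bayes keeps only per-class counts, the incremental update yields exactly the same model as the full refit, so only the cost differs: the refit grows with the cumulative dataset, while the update scales with the batch size. A RandomForest, which the abstract reports as the best performer, would generally follow the full-refit path.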