Journal: International Journal of Distributed Sensor Networks

Fuzzy–synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets



Abstract

Previous work on Android malware detection has widely used imbalanced datasets that contain more benign samples (the majority class) than malicious ones (the minority class). These imbalanced datasets bias learning toward the majority class, so minority class examples are more likely to be misclassified. To address this problem, we propose a new oversampling method called fuzzy–synthetic minority oversampling technique, which is based on fuzzy set theory and the synthetic minority oversampling technique. As the sample size of the majority class increases relative to that of the minority class, fuzzy–synthetic minority oversampling technique generates more synthetic examples for each minority class example in the fuzzy region, where minority examples have a low degree of membership in the minority class and are more likely to be misclassified. Using the new synthetic examples, the classifiers build larger decision regions that contain more minority examples, and they are no longer biased toward the majority class. Compared with the synthetic minority oversampling technique and Borderline–synthetic minority oversampling technique methods, fuzzy–synthetic minority oversampling technique achieves higher accuracy on both the minority class and the datasets as a whole.
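
The abstract does not give the membership function or the rule for deciding how many synthetic examples to generate, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that membership is the share of minority samples among the k nearest neighbours, and that the number of synthetic examples per minority sample is `(1 - membership) * ratio`, where `ratio` is the majority-to-minority size ratio. The function name `fuzzy_smote_sketch` and all parameters are hypothetical.

```python
import numpy as np

def fuzzy_smote_sketch(X_min, X_maj, k=5, seed=0):
    """Illustrative oversampler; membership and allocation rules are assumptions."""
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    # The first len(X_min) rows of X_all are the minority samples.
    is_min = np.arange(len(X_all)) < len(X_min)
    # Imbalance ratio: grows as the majority class grows relative to the minority.
    ratio = len(X_maj) / len(X_min)
    k_all = min(k, len(X_all) - 1)
    k_min = min(k, len(X_min) - 1)

    synthetic = []
    for i, x in enumerate(X_min):
        # Distances from x to every sample; index i is x itself, so exclude it.
        d_all = np.linalg.norm(X_all - x, axis=1)
        d_all[i] = np.inf
        nn_all = np.argsort(d_all)[:k_all]
        # Assumed membership: share of minority samples among the nearest neighbours.
        membership = is_min[nn_all].mean()
        # Assumed allocation: lower membership and higher ratio give more new points.
        n_new = int(round((1.0 - membership) * ratio))
        if n_new == 0:
            continue
        # SMOTE step: interpolate between x and a randomly chosen minority neighbour.
        d_min = np.linalg.norm(X_min - x, axis=1)
        d_min[i] = np.inf
        nn_min = np.argsort(d_min)[:k_min]
        for _ in range(n_new):
            neighbour = X_min[rng.choice(nn_min)]
            synthetic.append(x + rng.random() * (neighbour - x))

    if not synthetic:
        return np.empty((0, X_min.shape[1]))
    return np.array(synthetic)
```

Under these assumptions, minority samples surrounded only by other minority samples receive no synthetic neighbours, while those sitting among majority samples receive the most, and the total grows with the imbalance ratio. The returned array can be stacked onto the minority class before training a classifier.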
