Journal: Data Technologies and Applications

Feature distillation and accumulated selection for automated fraudulent publisher classification from user click data of online advertising


Abstract

© 2021, Emerald Publishing Limited.

Purpose: Classifying fraudulent publishers in online advertising requires choosing the most useful features from the hundreds derived from time-series user click data. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common, but they neglect the correlations among features. Wrapper approaches, conversely, could not be applied because of their complexity. Moreover, existing feature selection methods could not handle such data, which is one of the major causes of instability in feature selection.

Design/methodology/approach: To overcome these issues, a majority-voting-based hybrid feature selection method, feature distillation and accumulated selection (FDAS), is proposed to find the optimal subset of relevant features for analyzing a publisher's fraudulent conduct. FDAS works in two phases: (1) feature distillation, in which significant features from standard filter and wrapper feature selection methods are obtained by majority voting; and (2) accumulated selection, in which an accumulated evaluation of the relevant feature subsets is carried out with effective machine learning (ML) models to search for an optimal feature subset.

Findings: Empirical results show improved classification performance with the proposed features in terms of average precision, recall, F1-score and AUC for publisher identification and classification.

Originality/value: FDAS is evaluated on the FDMA2012 user-click data and nine other benchmark datasets to gauge how well it generalizes, considering first the original features, second the relevant feature subsets selected by feature selection (FS) methods, and third the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.
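
The two phases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the voting threshold, the individual filter and wrapper selectors, or how the accumulated evaluation is scored. The sketch assumes a strict majority (more than half of the selectors), treats "accumulated" as growing the feature set one feature at a time in vote order, and uses hypothetical feature names and a toy scoring function in place of a cross-validated ML model.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple


def distill_features(selections: Iterable[Sequence[str]]) -> List[str]:
    """Phase 1 (assumed form): keep features picked by a strict majority
    of the selectors, ordered from most to fewest votes."""
    picked = [set(s) for s in selections]
    votes = Counter(f for s in picked for f in s)
    threshold = len(picked) / 2  # assumption: strict majority
    kept = [f for f, v in votes.items() if v > threshold]
    # Descending vote count, then name, so the order is deterministic.
    return sorted(kept, key=lambda f: (-votes[f], f))


def accumulated_selection(
    ranked: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Phase 2 (assumed form): evaluate growing prefixes of the ranked
    features and return the best-scoring prefix with its score."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        candidate = list(ranked[:k])
        s = score(candidate)
        if s > best_score:
            best_subset, best_score = candidate, s
    return best_subset, best_score


# Hypothetical output of three selectors (feature names are made up).
picks = [
    ["clicks_per_ip", "night_ratio", "dup_clicks"],
    ["clicks_per_ip", "dup_clicks", "agent_count"],
    ["clicks_per_ip", "night_ratio", "country_count"],
]

# Toy stand-in for a model's validation score: per-feature gain minus a
# small penalty per feature. A real run would train and validate a model.
GAIN = {"clicks_per_ip": 0.5, "dup_clicks": 0.2, "night_ratio": 0.01}


def toy_score(subset: List[str]) -> float:
    return sum(GAIN.get(f, 0.0) for f in subset) - 0.05 * len(subset)


ranked = distill_features(picks)
best, best_score = accumulated_selection(ranked, toy_score)
print(ranked)  # ['clicks_per_ip', 'dup_clicks', 'night_ratio']
print(best)    # ['clicks_per_ip', 'dup_clicks']
```

In this toy run the third feature is dropped because its gain does not cover the per-feature penalty. With real data, `toy_score` would be replaced by a metric such as average precision or AUC from a held-out evaluation.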
