Feature distillation and accumulated selection for automated fraudulent publisher classification from user click data of online advertising

Sisodia D.; Sisodia D.S.


Abstract

© 2021, Emerald Publishing Limited.

Purpose: In online advertising, classifying fraudulent publishers requires choosing the most useful features from hundreds of features derived from time-series user click data. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common; however, they neglect the correlations among features. Wrapper approaches, conversely, could not be applied because of their complexity. Moreover, existing feature selection methods could not handle such data, which is one of the major causes of instability in feature selection.

Design/methodology/approach: To overcome these issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to find the optimal subset of relevant features for analyzing publishers' fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; (2) accumulated selection, where an accumulated evaluation of the relevant feature subset is enumerated to search for an optimal feature subset using effective machine learning (ML) models.

Findings: Empirical results show enhanced classification performance with the proposed features in average precision, recall, F1-score and AUC for publisher identification and classification.

Originality/value: FDAS is evaluated on FDMA2012 user-click data and nine other benchmark datasets to gauge how well it generalizes: first, with the original features; second, with relevant feature subsets selected by feature selection (FS) methods; third, with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.
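The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the voting threshold, the feature ordering, or how subsets are evaluated, so the strict-majority rule, the vote-count ordering, the prefix-by-prefix search, and all names here (`distill_features`, `accumulated_selection`, `toy_score`, and the feature names) are assumptions made for illustration. In a real setting `score_subset` would wrap a cross-validated ML model; here it is a hypothetical stand-in.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def distill_features(
    selections: Sequence[Iterable[str]],
    min_votes: Optional[int] = None,
) -> List[str]:
    """Phase 1 (assumed form): keep features chosen by a majority of selectors."""
    selected_sets = [set(s) for s in selections]
    if min_votes is None:
        min_votes = len(selected_sets) // 2 + 1  # strict majority (assumption)
    votes = Counter(f for s in selected_sets for f in s)
    kept = [f for f, v in votes.items() if v >= min_votes]
    # Order by descending vote count, then by name so the result is deterministic.
    return sorted(kept, key=lambda f: (-votes[f], f))


def accumulated_selection(
    ranked: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Phase 2 (assumed form): grow the subset one feature at a time, keep the best prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked:
        current.append(feature)
        score = score_subset(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical outputs of two filter methods and one wrapper method.
filter_a = {"clicks_per_min", "unique_ips", "dup_click_ratio", "hour_of_day"}
filter_b = {"clicks_per_min", "unique_ips", "country_count"}
wrapper = {"clicks_per_min", "dup_click_ratio", "hour_of_day"}


def toy_score(subset: List[str]) -> int:
    """Stand-in for a model's validation score: per-feature gain minus a size penalty."""
    gain = {"clicks_per_min": 5, "dup_click_ratio": 3, "hour_of_day": 0, "unique_ips": 1}
    return sum(gain.get(f, 0) for f in subset) - len(subset)


ranked = distill_features([filter_a, filter_b, wrapper])
best, score = accumulated_selection(ranked, toy_score)
print(ranked)
print(best, score)
```

With these toy inputs, `country_count` is dropped because only one of the three selectors chose it, and the search keeps the two-feature prefix because adding further features lowers the toy score. An empty ranking returns an empty subset with a score of negative infinity.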


Bibliographic details

Source: Data Technologies and Applications, 2022, Issue 4, pp. 602-625 (24 pages)
Authors: Sisodia D.; Sisodia D.S.
Affiliation: Computer Science and Engineering, National Institute of Technology Raipur
Original format: PDF
Language: English
