Feature engineering for detecting spammers on Twitter: Modelling and analysis


Abstract

Twitter is a social networking website that has gained wide popularity around the world over the last decade. This popularity has made Twitter a common target for spammers and malicious users, who use it to spread unwanted advertisements, viruses and phishing attacks. In this article, we review recent research to determine the most effective features investigated for spam detection in the literature. These features are collected to build a comprehensive data set that can be used to develop more robust and accurate spammer detection models. The new data set is tested using popular classifiers (Naive Bayes, support vector machines, multilayer perceptron neural networks, decision trees, random forests and k-Nearest Neighbour). The prediction performance of these classifiers is evaluated and compared using several evaluation metrics. A further analysis identifies the features that have the greatest impact on the accuracy of spam detection. Three techniques are used and compared for this analysis: change of mean square error (CoM), information gain (IG) and the Relief-F method. The top five features identified by each technique are then used to rebuild the detection models. Experimental results show that most of the developed classifiers achieved high evaluation scores on the comprehensive data set constructed in this work. The experiments also reveal the important role of features such as the reputation of the account, the average length of a tweet, the average number of mentions per tweet, the age of the account and the average time between posts in identifying spammers in the social network.
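As a rough illustration of one of the three ranking techniques named in the abstract, the sketch below scores features by information gain and keeps the top-ranked ones. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature names, the eight toy accounts, the median-based binarization and the helper names (`entropy`, `information_gain`, `binarize`, `rank_features`) are all hypothetical and chosen only to make the idea concrete.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting the accounts by the value of a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def binarize(values):
    """Turn a numeric feature into a boolean one by splitting at the
    upper median (an assumption; the paper may discretize differently)."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return [v >= median for v in values]


def rank_features(features, labels, top_k=5):
    """Return the top_k (name, score) pairs, highest score first."""
    scores = {name: information_gain(binarize(column), labels)
              for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical toy data: 1 = spammer, 0 = legitimate account.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "account_age_days": [3, 10, 5, 8, 400, 900, 650, 1200],
    "avg_tweet_length": [120, 60, 130, 70, 125, 65, 128, 62],
}

ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data the account age separates the two classes perfectly while the average tweet length does not separate them at all, so the ranking places account age first. Real features would rarely be this clean, and the reduced feature set would then be passed back to the classifiers for retraining.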

Bibliographic record

  • Source
    Journal of Information Science | 2018, Issue 2 | pp. 230-247 | 18 pages
  • Author affiliations

    Business Information Technology, King Abdullah II School of Information Technology, The University of Jordan, Jordan;

    Business Information Technology, King Abdullah II School of Information Technology, The University of Jordan, Jordan;

    Business Information Technology, King Abdullah II School of Information Technology, The University of Jordan, Jordan;

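  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords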

    Classifiers; detection; feature engineering; spam; spam features; spammers; Twitter;
