Conference: 2017 XLIII Latin American Computer Conference

Cost-sensitive classifier for spam detection on news media Twitter accounts


Abstract

Social media are increasingly used as sources in mainstream news coverage. However, because news updates so rapidly, it is easy to fall into the trap of accepting everything as true. Spam content usually refers to information that goes viral and skews users' views on a subject. To this end, this paper introduces a new approach for detecting spam tweets using a cost-sensitive classifier that incorporates Random Forest. Tweets were first annotated manually, and four different sets of features were then extracted from them. Afterward, four machine learning algorithms were cross-validated to determine the best base classifier for spam detection. Finally, the class imbalance problem was addressed by resampling and by incorporating arbitrary misclassification costs into the learning process. The results show that the proposed approach helped mitigate overfitting and reduced classification error, achieving an overall accuracy of 89.14% in training and 76.82% in testing.
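The two imbalance-handling steps named in the abstract, resampling and misclassification costs, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cost values, the function names, and the choice of random oversampling and a minimum-expected-cost decision rule are all assumptions made for illustration, and the class probabilities are assumed to come from some base classifier such as a Random Forest.

```python
import random
from collections import Counter

HAM, SPAM = 0, 1

# Hypothetical cost matrix, COSTS[true_class][predicted_class].
# The values are illustrative only; the abstract does not state the costs used.
COSTS = [
    [0.0, 1.0],  # true ham:  flagging it as spam costs 1
    [5.0, 0.0],  # true spam: letting it through as ham costs 5
]


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def expected_costs(probs, costs):
    """Expected cost of each possible prediction, given class probabilities."""
    n = len(costs)
    return [sum(probs[t] * costs[t][p] for t in range(n)) for p in range(n)]


def min_cost_class(probs, costs):
    """Pick the prediction with the lowest expected cost instead of the most likely class."""
    ec = expected_costs(probs, costs)
    return min(range(len(ec)), key=ec.__getitem__)


if __name__ == "__main__":
    # A tweet judged 70% ham / 30% spam is still flagged, because missing spam is costlier.
    print(min_cost_class([0.7, 0.3], COSTS))
    rows, labels = oversample(["a", "b", "c", "d", "e"], [HAM, HAM, HAM, HAM, SPAM])
    print(Counter(labels))
```

With these assumed costs, a prediction can differ from the most probable class: the rule trades a few extra false alarms for fewer missed spam tweets, which is the general idea behind folding misclassification costs into the decision.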
