Published in: Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Spam detection of Twitter traffic: A framework based on random forests and non-uniform feature sampling


Abstract

Law enforcement agencies play a crucial role in the analysis of open data and need effective techniques to filter out troublesome information. In a real scenario, law enforcement agencies analyze social networks such as Twitter, monitoring events and profiling accounts. Unfortunately, among the huge number of Internet users, some people use microblogs to harass others or to spread malicious content. Classifying users and identifying spammers is a useful technique for relieving Twitter traffic of uninformative content. This work proposes a framework that exploits non-uniform feature sampling inside a gray-box machine learning system, using a variant of the Random Forests algorithm to identify spammers in Twitter traffic. Experiments are conducted on a popular Twitter dataset and on a new dataset of Twitter users. The newly provided dataset consists of users labeled as spammers or legitimate users, each described by 54 features. Experimental results demonstrate the effectiveness of the enriched feature sampling method.
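The abstract does not say how the non-uniform feature sampling is carried out, so the following is only an illustrative sketch of the general idea: instead of drawing each tree's candidate features uniformly, as a standard random forest does, features are drawn with unequal probabilities. The function name `sample_features`, the weight values, and the choice of weighted sampling without replacement (the Efraimidis-Spirakis scheme) are all assumptions made for illustration, not the authors' method. Only the figure of 54 features comes from the abstract.

```python
import math
import random
from typing import List, Sequence


def sample_features(weights: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Draw k distinct feature indices, favoring features with larger weights.

    Hypothetical helper: uses the Efraimidis-Spirakis scheme, where each
    feature gets the key u ** (1 / w) for a uniform u, and the k largest
    keys are selected. Keys are compared in log space: log(u) / w.
    """
    if not 0 < k <= len(weights):
        raise ValueError("k must be between 1 and the number of features")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    keys = []
    for index, w in enumerate(weights):
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        keys.append((math.log(u) / w, index))
    keys.sort(reverse=True)  # largest keys first
    return sorted(index for _, index in keys[:k])


# Assumed setup: 54 features, the first 10 treated as more informative.
rng = random.Random(42)
weights = [5.0] * 10 + [1.0] * 44
k = round(math.sqrt(len(weights)))  # 7 features per tree

# Each tree of the forest would be trained on its own weighted draw.
for tree_id in range(3):
    print(tree_id, sample_features(weights, k, rng))
```

In a full pipeline, each drawn index list would restrict the columns seen by one decision tree. Setting all weights equal recovers the uniform sampling of an ordinary random forest.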
