Future Generation Computer Systems

Identifying malicious social media contents using multi-view Context-Aware active learning



Abstract

This paper presents a semi-supervised, multi-view, active learning method, which uses an optimized set of the most informative samples and utilizes domain-specific context information to efficiently and effectively identify malicious forum content on web-based social media platforms. As research shows, the task of automated identification of malicious forum posts, which also helps in detecting their associated key suspects in web forums, faces numerous challenges: (1) Online data, particularly social media data, originate from diverse and heterogeneous sources and are largely unstructured; (2) Online data characteristics evolve quickly; and (3) There are limited amounts of ground truth data to support the development of effective classification technologies in a strictly supervised scenario. To address these challenges, the proposed human-machine collaborative, semi-supervised learning method is designed to efficiently and effectively identify harmful, provocative, or fabricated forum content by observing only a small number of annotated samples. Our learning framework is initiated by modeling initial view-dependent classifiers from a limited labeled data collection and allows each classifier, in an interactive manner, to evolve dynamically into a more sophisticated model by observing data patterns from a shared shortlist of the most informative samples, identified via a graph-based optimization method solved by a maximum-flow algorithm. By designing a context-rich metric definition in a data-driven manner, the proposed framework is able to learn a sufficiently robust classification model that utilizes only a small number of human-annotated samples, typically 1-2 orders of magnitude fewer than a fully supervised solution. We validate our method using a large collection of flagged words from a wide range of origins, frequently appearing in web-based forums and manually verified by multiple experienced, independent domain experts. (C) 2019 Elsevier B.V. All rights reserved.
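To make the active-learning loop described in the abstract concrete, below is a minimal sketch, not the authors' implementation: it trains one classifier per view on a small labeled pool and repeatedly queries a human annotator for a shared shortlist of informative unlabeled samples. Here informativeness is approximated by cross-view prediction uncertainty, whereas the paper selects the shortlist via a graph-based optimization solved with a maximum-flow algorithm. All names and parameters (multi_view_active_learning, oracle, shortlist_size) are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

def multi_view_active_learning(X_views, y, labeled_idx, unlabeled_idx,
                               oracle, rounds=10, shortlist_size=20):
    """X_views: one feature matrix per view (e.g. post text vs. posting-context features).
    y: label array, filled in for labeled_idx; oracle(i) returns a human label for sample i."""
    labeled = list(labeled_idx)
    unlabeled = list(unlabeled_idx)
    models = [LogisticRegression(max_iter=1000) for _ in X_views]

    for _ in range(rounds):
        # Train one view-dependent classifier on the current labeled pool.
        for model, X in zip(models, X_views):
            model.fit(X[labeled], y[labeled])

        # Approximate informativeness by cross-view uncertainty: the closer the
        # averaged positive-class probability is to 0.5, the more informative.
        avg_prob = np.mean(
            [m.predict_proba(X[unlabeled])[:, 1] for m, X in zip(models, X_views)],
            axis=0)
        order = np.argsort(np.abs(avg_prob - 0.5))  # most uncertain first
        queried = [unlabeled[i] for i in order[:shortlist_size]]

        # Ask the human annotator to label the shared shortlist, then grow the pool.
        for i in queried:
            y[i] = oracle(i)
        labeled.extend(queried)
        unlabeled = [i for i in unlabeled if i not in set(queried)]
    return models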


