Journal: IEEE Transactions on Knowledge and Data Engineering

WebGuard: a Web filtering engine combining textual, structural, and visual content-based analysis

Abstract

Along with the ever-growing Web comes the proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable Web content. In this paper, we investigate this problem and describe WebGuard, an automatic machine learning-based pornographic Web site classification and filtering system. Unlike most commercial filtering products, which are mainly based on textual content-based analysis such as indicative keywords detection or manually collected black list checking, WebGuard relies on several major data mining techniques associated with textual, structural content-based analysis, and skin color related visual content-based analysis as well. Experiments conducted on a testbed of 400 Web sites including 200 adult sites and 200 nonpornographic ones showed WebGuard's filtering effectiveness, reaching a 97.4 percent classification accuracy rate when textual and structural content-based analysis was combined with visual content-based analysis. Further experiments on a black list of 12,311 adult Web sites manually collected and classified by the French Ministry of Education showed that WebGuard scored a 95.62 percent classification accuracy rate. The basic framework of WebGuard can apply to other categorization problems of Web sites which combine, as most of them do today, textual and visual content.
