Conference: Machine Vision, 2009. ICMV '09

A New Approach towards Text Filtering


Abstract

The problem of malicious content in blogs has reached epic proportions, and various efforts are underway to fight it. Blog classification using machine learning techniques is a key method for doing so. We have devised a machine learning algorithm in which features are created from individual sentences in the body of a blog by taking one word at a time. Weights are assigned to the features based on the strength of their predictive capabilities for illegitimate/legitimate determination. The predictive capabilities are estimated from the frequency of occurrence of each feature in the illegitimate and legitimate collections. During classification, the total illegitimate and legitimate evidence in the blog is obtained by summing the weights of the extracted features for each class, and the blog is classified into whichever class accumulates the greater sum. We compared the algorithm against the popular naïve Bayes algorithm (in [8]) and found that its performance is no worse than that of naïve Bayes, both in catching blog spam and in reducing false positives.
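The classification scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the exact feature construction nor the weight formula, so the function names (`extract_features`, `train`, `classify`), the single-word features, the relative-frequency weights, and the tie-breaking rule are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_features(text):
    """Split text into sentences, then emit one lowercase word per feature.

    Assumption: a feature is a single word; the paper may build richer
    features from each sentence.
    """
    features = []
    for sentence in re.split(r"[.!?]+", text):
        features.extend(re.findall(r"[a-z']+", sentence.lower()))
    return features


def train(illegit_docs, legit_docs):
    """Return {feature: (illegitimate_weight, legitimate_weight)}.

    Assumption: a feature's weight for a class is that class's share of the
    feature's per-document occurrence rate across both collections.
    """
    illegit_counts = Counter(f for d in illegit_docs for f in extract_features(d))
    legit_counts = Counter(f for d in legit_docs for f in extract_features(d))
    weights = {}
    for feat in set(illegit_counts) | set(legit_counts):
        rate_i = illegit_counts[feat] / max(len(illegit_docs), 1)
        rate_l = legit_counts[feat] / max(len(legit_docs), 1)
        total = rate_i + rate_l  # > 0, since feat occurs in at least one collection
        weights[feat] = (rate_i / total, rate_l / total)
    return weights


def classify(text, weights):
    """Sum the per-class weights of the extracted features; the larger sum wins.

    Assumption: unseen features contribute nothing, and ties are resolved
    as "legitimate".
    """
    illegit_sum = legit_sum = 0.0
    for feat in extract_features(text):
        w_i, w_l = weights.get(feat, (0.0, 0.0))
        illegit_sum += w_i
        legit_sum += w_l
    return "illegitimate" if illegit_sum > legit_sum else "legitimate"
```

Resolving ties as "legitimate" is a design choice of this sketch, in keeping with the abstract's emphasis on reducing false positives.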
