A comparison of statistical spam detection techniques.


Abstract

Spam (unsolicited and undesirable email) has become a significant problem for email users. This study investigated the current state of the art in statistical spam filtering. Established methods, inspired by the work of Paul Graham, were examined, and new techniques were introduced and tested. Tests were conducted using two private corpora of email messages and one publicly available corpus.

A base configuration of a spam filter program, similar in technique to a popular production spam filter, was implemented and tested. This configuration achieved high accuracy while maintaining a low false positive rate. One main objective of this paper was to develop a new weighted token probability function. The data contained in header fields are important, and it was believed that weighting header data more heavily than data in the body of the message could improve accuracy. This new weighted token probability function strengthens or weakens header and phrase tokens. Header weighting applies the weight to any token from a header field, while all body tokens are given unit weight. (Abstract shortened by UMI.)
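The abstract does not give the weighted token probability function itself, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes a per-token estimate loosely modeled on Paul Graham's "A Plan for Spam" and one hypothetical way to apply a weight: scaling a token's log-odds, so that a weight above 1 strengthens a token, a weight below 1 weakens it, and a weight of 1 leaves it unchanged. The names `token_probability`, `spam_score`, and `header_weight`, and the default value 2.0, are illustrative assumptions.

```python
import math


def token_probability(spam_count, ham_count, n_spam, n_ham):
    """Graham-style estimate of P(spam | token), clamped to [0.01, 0.99]."""
    spam_freq = min(1.0, spam_count / n_spam)
    # Graham doubles legitimate-mail counts to bias against false positives.
    ham_freq = min(1.0, 2 * ham_count / n_ham)
    if spam_freq + ham_freq == 0:
        return 0.4  # Graham's default for tokens never seen in training
    p = spam_freq / (spam_freq + ham_freq)
    return max(0.01, min(0.99, p))


def spam_score(tokens, counts, n_spam, n_ham, header_weight=2.0):
    """Combine token evidence into a spam probability.

    tokens: iterable of (token, is_header) pairs.
    counts: dict mapping token -> (spam_count, ham_count).
    header_weight is a hypothetical parameter: header tokens have their
    log-odds multiplied by it, while body tokens get unit weight.
    """
    total = 0.0
    for token, is_header in tokens:
        spam_count, ham_count = counts.get(token, (0, 0))
        p = token_probability(spam_count, ham_count, n_spam, n_ham)
        weight = header_weight if is_header else 1.0
        total += weight * math.log(p / (1.0 - p))
    # Numerically stable logistic function of the summed log-odds.
    if total >= 0:
        return 1.0 / (1.0 + math.exp(-total))
    e = math.exp(total)
    return e / (1.0 + e)


# Toy training counts (made up for illustration only).
counts = {"free": (20, 1), "meeting": (1, 30)}
as_header = spam_score([("free", True)], counts, n_spam=100, n_ham=100)
as_body = spam_score([("free", False)], counts, n_spam=100, n_ham=100)
print(as_header > as_body)
```

Working in log-odds space keeps the unweighted case equivalent to the usual naive Bayes combination of token probabilities, so the weight acts purely as a strength adjustment on header evidence.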

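Bibliographic record

  • Author: Brown, Kevin Alan
  • Author affiliation: Oklahoma State University
  • Degree-granting institution: Oklahoma State University
  • Discipline: Computer Science
  • Degree: M.S.
  • Year: 2006
  • Pages: 107 p.
  • Total pages: 107
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology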
