Near-Duplicate Mail Detection Based on URL Information for Spam Filtering

Abstract

Because spam techniques change quickly to evade detection, we argue that multiple spam detection strategies should be developed to combat spam effectively. In the literature, many proposed spam detection schemes use similar strategies based on supervised classification techniques such as Naive Bayes, SVM, and k-NN, but only a few works address the strategy of detecting duplicate copies. In this paper, we propose a new duplicate-mail detection scheme based on the similarity of mail context between incoming mails, especially the context of URL information. We discuss different design strategies to counter the tricks spammers may use to avoid detection. We also compare our approach with four approaches available in the literature: the octet-based histogram method, I-Match, Winnowing, and identical matching. Using thousands of real mails that we collected as test data, our experimental results show that the proposed strategy outperforms the others. Without considering compulsory misses, over 97% of near-duplicate mails are detected correctly.
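To make the idea of comparing mails by their URL information concrete, here is a minimal sketch. It is not the authors' scheme, which the abstract does not specify. It assumes that URL context can be approximated by the set of normalized host-plus-path strings in a mail body and that similarity is measured with the Jaccard index. The names `url_features`, `jaccard`, and `is_near_duplicate`, the regular expression, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: a simple pattern for http/https URLs in plain-text mail bodies.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def url_features(mail_body):
    """Return the set of normalized 'host/path' strings found in a mail body.

    Query strings and fragments are dropped on the assumption that spammers
    may randomize them per recipient; this is an illustrative choice only.
    """
    features = set()
    for raw in URL_RE.findall(mail_body):
        try:
            parts = urlsplit(raw.rstrip(".,;"))
        except ValueError:
            continue  # skip malformed URLs instead of failing
        host = parts.hostname or ""  # hostname is already lowercased
        if host:
            features.add(host + parts.path.rstrip("/"))
    return frozenset(features)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(mail_a, mail_b, threshold=0.5):
    """Flag two mails as near-duplicates when their URL sets overlap enough."""
    return jaccard(url_features(mail_a), url_features(mail_b)) >= threshold


if __name__ == "__main__":
    a = "Buy now at http://Shop.example.com/deal?id=123 today!"
    b = "Hello friend, visit http://shop.example.com/deal?id=987."
    c = "Meeting notes are at https://intranet.example.org/notes"
    print(is_near_duplicate(a, b))
    print(is_near_duplicate(a, c))
```

In this sketch, two mails whose text differs but which point to the same landing page are treated as near-duplicates, while mails without any URL never match. A real filter would also have to handle the evasion tricks the abstract alludes to, such as obfuscated or redirected links.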

Bibliographic information

Source
Information Networking: Advances in Data Communications and Wireless Networks; Lecture Notes in Computer Science; 3961, 2006, pp. 842-851 (10 pages)
Conference location: Sendai (JP)
Authors
Chun-Chao Yeh; Chia-Hui Lin;
Affiliation

Department of Computer Science, National Taiwan Ocean University, Taiwan;

Original format: PDF
Language: English
Chinese Library Classification: Communications

Similar literature

1. An Efficient Model Of Detection And Filtering Technique Over Malicious And Spam E-Mails [J] . V S Kumar, Ravi kumar International Journal of Engineering Trends and Technology . 2013, No. 1
2. Minimizing the Time of Spam Mail Detection by Relocating Filtering System to the Sender Mail Server [J] . Alireza Nemaney Pour, Raheleh Kholghi, Soheil Behnam Roudsari International Journal of Network Security & Its Applications . 2012, No. 2
3. An Email Modelling Approach for Neural Network Spam Filtering to Improve Score-based Anti-spam Systems [J] . Yahya Alamlahi, Abdulrahman Muthana International Journal of Computer Network and Information Security . 2018, No. 12
4. Near-Duplicate Mail Detection Based on URL Information for Spam Filtering [C] . Chun-Chao Yeh, Chia-Hui Lin Information Networking: Advances in Data Communications and Wireless Networks; Lecture Notes in Computer Science; 3961 . 2006
5. Spam e-mail filtering via global and user-level dynamic ontologies. [D] . Youn, Seongwook. 2009
6. Machine learning for email spam filtering: review approaches and open research problems [O] . Emmanuel Gbenga Dada, Joseph Stephen Bassi, Haruna Chiroma, 2019
7. An Improved Spam Filter for Filtering Repeated Spam E-mails [O] . 2014
