Distinguishing Real Web Crawlers from Fakes: Googlebot Example

Abstract

Web crawlers are programs or automated scripts that scan web pages methodically to create indexes. Search engines such as Google and Bing use crawlers to provide web surfers with relevant information. Today there are also many crawlers that impersonate well-known web crawlers. For example, it has been observed that Google's Googlebot crawler is heavily impersonated. This raises ethical and security concerns, as the impersonators can potentially be used for malicious purposes. In this paper, we present an effective methodology to detect fake Googlebot crawlers by analyzing web access logs. We propose using Markov chain models to learn profiles of real and fake Googlebots based on their patterns of web resource access sequences. We calculated log-odds ratios for a given set of crawler sessions, and our results show that the higher the log-odds score, the higher the probability that a given sequence comes from the real Googlebot. Experimental results show that, at a threshold log-odds score, we can distinguish the real Googlebot from the fake.
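
The log-odds scoring described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the state space, the smoothing, or the threshold, so the state names (`page`, `image`, `script`), the add-one smoothing, the toy sessions, and the threshold of 0.0 are all assumptions made for illustration. The sketch also ignores the initial-state distribution and scores transitions only.

```python
import math

# Hypothetical resource categories used as Markov states (an assumption;
# the abstract does not say how web resources are grouped).
STATES = ["page", "image", "script"]


def train_markov(sessions, states, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = {s: {t: alpha for t in states} for s in states}
    for session in sessions:
        for prev, cur in zip(session, session[1:]):
            counts[prev][cur] += 1
    model = {}
    for s in states:
        total = sum(counts[s].values())
        model[s] = {t: counts[s][t] / total for t in states}
    return model


def log_odds(session, real_model, fake_model):
    """Sum of log(P_real / P_fake) over the transitions of one session."""
    score = 0.0
    for prev, cur in zip(session, session[1:]):
        score += math.log(real_model[prev][cur] / fake_model[prev][cur])
    return score


# Toy, made-up training sessions (not data from the paper).
real_sessions = [
    ["page", "page", "script", "page", "page"],
    ["page", "script", "page", "page"],
]
fake_sessions = [
    ["page", "image", "image", "image"],
    ["image", "image", "page", "image"],
]

real_model = train_markov(real_sessions, STATES)
fake_model = train_markov(fake_sessions, STATES)

THRESHOLD = 0.0  # assumed; the paper determines its threshold experimentally


def looks_real(session):
    """Classify a session as real when its log-odds score exceeds the threshold."""
    return log_odds(session, real_model, fake_model) > THRESHOLD
```

A session is scored by calling `log_odds(session, real_model, fake_model)`; higher scores favour the real-Googlebot profile, matching the direction stated in the abstract. In practice the sessions would be extracted from web access logs and the threshold tuned on labelled data.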

Bibliographic details

Source: 4th International Moratuwa Engineering Research Conference, 2018, pp. 13-18 (6 pages)
Conference location: Moratuwa (LK)
Author: Nilani Algiryage
Author affiliation: Department of Industrial Management, University of Kelaniya, Sri Lanka
Original format: PDF
Language: English (eng)
Keywords: Crawlers; Markov processes; Google; Web sites; Robots; Web servers; Search engines
