Conference: IEEE European Symposium on Security and Privacy Workshops
A tight scrape: methodological approaches to cybercrime research data collection in adversarial environments

Abstract

We outline in this article a study of ‘adversarial scraping’ for academic research, which involves the collection of data from websites that implement defences against traditional web scraping tools. Although this is primarily a research methods article, it also constitutes a valuable systematic accounting of the different defensive techniques used by the administrators of illicit online services. Some of these administrators intentionally implement functionality which attempts to prevent web scrapers from gathering data from their site, and some will unintentionally design their sites in ways that make data gathering harder. This is of particular importance for criminological research, where websites such as cryptomarkets and underground forums are publicly available (and hence there is an ethical case for data collection), but the illicit activity involved means that the administrators of these services limit scraping. We classify different anti-crawling techniques taken by websites and outline our developed countermeasures. Based on this, we evaluate which of these methods do and do not succeed at preventing data gathering from a website, as well as those which impact the scraper but do not necessarily prevent the data from being obtained. We find that there are some defences that, if used together, might thwart scraping. There are also a series of defences that are successful at slowing down scrapers, making historical scraping more difficult. On the other hand, we show that many defences are easy to work around and do not impact scraping.
