Published in: Journal of Computers

Exploiting Location-aware Mechanism for Distributed Web Crawling over DHTs



Abstract

Inspired by the concept of Internet computing, a DHT-based distributed Web crawling model is proposed to address the bottlenecks of traditional Web crawling systems. Based on this system model, we propose optimizations that reduce the download time of Web crawling tasks and thereby increase the efficiency of the system. The download time is improved by shortening the crawler-crawlee network distance. By utilizing the mapping mechanism of a Content Addressable Network (CAN) over a Network Coordinate System (NC), the issue can be mapped onto the problem of minimizing the distances between peers and resources on the DHT overlay. This paper focuses on reducing such distances, seeking to provide an improved location-aware infrastructure for distributed Web crawling. A new DHT-based distributed Web crawling model is proposed first. Then, under this model, a new method based on CAN's splitting schemes is proposed, which shows a significant decrease in crawler-crawlee distance compared with existing schemes. In addition, the issue of load balancing is solved by combining the new method with the existing ones.
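
The objective named in the abstract, minimizing the distance between peers (crawlers) and resources (crawlees) in a coordinate space, can be illustrated with a small sketch. This is not the paper's splitting scheme, which the abstract does not detail. It assumes a hypothetical 2-D network coordinate space with Euclidean distance as a stand-in for network latency, and all names (`dist`, `nearest_peer`, `mean_distance`) are illustrative. It compares a location-aware assignment (each resource goes to its nearest peer) with a location-blind one (each resource goes to a random peer, roughly what plain hashing would give).

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 2-D points (assumed proxy for latency)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_peer(resource, peers):
    """Return the peer whose coordinate is closest to the resource."""
    return min(peers, key=lambda p: dist(p, resource))


def mean_distance(assignment):
    """Average crawler-crawlee distance over (resource, peer) pairs."""
    return sum(dist(p, r) for r, p in assignment) / len(assignment)


# Hypothetical data: random coordinates in the unit square.
rng = random.Random(0)
peers = [(rng.random(), rng.random()) for _ in range(16)]
resources = [(rng.random(), rng.random()) for _ in range(200)]

# Location-aware: each resource is crawled by its nearest peer.
location_aware = [(r, nearest_peer(r, peers)) for r in resources]
# Location-blind: each resource is crawled by an arbitrary peer.
location_blind = [(r, rng.choice(peers)) for r in resources]

print("location-aware mean distance:", mean_distance(location_aware))
print("location-blind mean distance:", mean_distance(location_blind))
```

Because the nearest peer is, for every resource, at most as far as any other peer, the location-aware mean can never exceed the location-blind one. A real CAN assigns resources by zone ownership rather than by a global nearest-neighbour search, so this sketch only shows the quantity being minimized, not how the overlay achieves it.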
