Topic-focused web crawlers conventionally use a centralized architecture, which places heavy performance demands on a single server and scales poorly. This paper proposes a distributed topic-crawler architecture based on Hadoop: the crawler is deployed across the machines of a distributed cluster, and the MapReduce programming model is used to fetch and analyze data, so that the machines jointly complete the crawl of a given task. Experiments show that, with the distributed architecture, dynamically adjusting the number of nodes in the cluster markedly improves the crawler's fetching performance.
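The division of labor described above can be sketched with a toy map/reduce pair: each "map" task stands in for a cluster node that fetches a page, scores its topic relevance, and emits candidate outlinks, while the "reduce" step merges per-node results into a shared crawl frontier. This is a minimal illustrative sketch, not the paper's implementation; the page corpus, topic terms, and function names are all assumptions, and real page downloads are replaced by an in-memory dictionary.

```python
from collections import defaultdict

# Toy stand-in for fetched pages: url -> (page text, outlinks).
# A real deployment would download these over HTTP on each cluster node.
PAGES = {
    "http://a.example/0": ("hadoop mapreduce cluster", ["http://a.example/1"]),
    "http://a.example/1": ("cooking recipes", ["http://a.example/2"]),
    "http://b.example/0": ("distributed hadoop crawler", ["http://a.example/0"]),
}

# Hypothetical topic vocabulary used to score relevance.
TOPIC_TERMS = {"hadoop", "mapreduce", "crawler", "distributed"}

def map_fetch(url):
    """Map phase: 'fetch' one page, score it against the topic,
    and emit (outlink, relevance) pairs for on-topic pages."""
    text, outlinks = PAGES[url]
    relevance = len(set(text.split()) & TOPIC_TERMS) / len(TOPIC_TERMS)
    if relevance == 0:
        return []  # off-topic page: do not expand its links
    return [(link, relevance) for link in outlinks]

def reduce_frontier(mapped):
    """Reduce phase: merge results from all nodes into one crawl
    frontier, keeping the best relevance score seen per URL."""
    frontier = defaultdict(float)
    for pairs in mapped:
        for url, score in pairs:
            frontier[url] = max(frontier[url], score)
    return dict(frontier)

seeds = ["http://a.example/0", "http://a.example/1", "http://b.example/0"]
frontier = reduce_frontier(map_fetch(u) for u in seeds)
```

Adding nodes simply means partitioning the seed list across more map tasks, which is why the abstract can tune throughput by adjusting the cluster's node count.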