Web Crawling

Christopher Olston; Marc Najork

Abstract

This is a survey of the science and practice of web crawling. While at first glance web crawling may appear to be merely an application of breadth-first-search, the truth is that there are many challenges ranging from systems concerns such as managing very large data structures to theoretical questions such as how often to revisit evolving content sources. This survey outlines the fundamental challenges and describes the state-of-the-art models and solutions. It also highlights avenues for future work.

Bibliographic Details

Source
Foundations and Trends in Information Retrieval | 2010, Issue 3 | pp. 0-74 | 75 pages
Authors
Christopher Olston; Marc Najork
Affiliations

Yahoo! Research, 701 First Avenue, Sunnyvale, CA, 94089, USA;

Microsoft Research, 1065 La Avenida, Mountain View, CA, 94043, USA;

Original format: PDF
Language: English

Similar Literature

1. GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources [J]. Chih-Yuan Huang, Hao Chang. ISPRS International Journal of Geo-Information. 2016, Issue 8
2. Optimal Web Page Download Scheduling Policies for Green Web Crawling [J]. Vassiliki Hatzi, B. Barla Cambazoglu, Iordanis Koutsopoulos. IEEE Journal on Selected Areas in Communications. 2016, Issue 5
3. ArabicWeb16: A New Crawl for Today’s Arabic Web [J]. Reem Suwaileh, Mucahid Kutlu, Nihal Fathima. ACM SIGIR FORUM. 2016, Issue JULa17a21CD
4. Board Forum Crawling: A Web Crawling Method for Web Forum [C]. Yan Guo, Kui Li, Kai Zhang. IEEE/WIC/ACM International Conference on Intelligent Agent Technology. 2006
5. Crawling the Web: Discovery and maintenance of large-scale Web data. [D]. Cho, Junghoo. 2002
6. An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling [O]. R. Suganya Devi, D. Manjula, R. K. Siddharth. 2015
7. Board Forum Crawling: A Web Crawling Method for Web Forum [O]. Yan Guo, Kui Li, Kai Zhang. 2006
