
A review on techniques for optimizing web crawler results


Abstract

Nowadays the Internet is widely used to satisfy information needs. With the exponential growth of the web, searching for useful information has become more difficult. A web crawler extracts both relevant and irrelevant links from the web, and various algorithms and techniques are used to deal with the irrelevant ones. Discovering information with a web crawler raises certain issues: different URLs may carry similar text, which increases the time complexity of the search; crawler resources are wasted fetching duplicate pages; and more storage is required to hold these pages. These are some of the roadblocks to getting optimal results from a crawler. This paper provides an in-depth study of existing information retrieval (IR) techniques that would help researchers retrieve optimal result links and information.
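The duplicate-page problem mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `content_fingerprint` and `group_by_content` and the sample URLs are hypothetical. It groups URLs whose text is identical after whitespace and case normalization, so a crawler could store one copy per group. Pages that are only similar rather than identical would need a near-duplicate method (for example shingling or SimHash), which this sketch does not attempt.

```python
import hashlib
import re


def content_fingerprint(text):
    """Return a SHA-256 digest of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_by_content(pages):
    """Map each fingerprint to the list of URLs that share it (insertion order kept)."""
    groups = {}
    for url, text in pages.items():
        groups.setdefault(content_fingerprint(text), []).append(url)
    return groups


# Hypothetical crawl results: two URLs differ only in a session parameter.
pages = {
    "http://example.com/a?session=1": "Web  crawler basics",
    "http://example.com/a?session=2": "web crawler basics ",
    "http://example.com/b": "Information retrieval techniques",
}

groups = group_by_content(pages)
# Groups with more than one URL are duplicates; only the first URL needs storing.
duplicates = [urls for urls in groups.values() if len(urls) > 1]
```

Hashing normalized text keeps the check at one digest lookup per page, which is why it is a common first filter before any more expensive similarity comparison.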
