International Conference on Cyber and IT Service Management

Crawling and cluster hidden web using crawler framework and fuzzy-KNN

Abstract

Today almost everyone uses the internet for daily activities, whether social, academic, work, or business. However, few of us are aware that we generally access only a small part of the internet. The internet, or World Wide Web, is divided into several levels, such as the surface web, the deep web, and the dark web. Accessing the deep or dark web is dangerous. This research examines both web content and deep web content. For faster and safer searching, it uses a crawler framework. The crawling process collects various kinds of data, which are stored in a database. The stored data are then classified to determine the level of each website, using the fuzzy-KNN method on the crawler framework's results in the database. The crawler framework generates data such as URL addresses and page information. These crawled data are compared with predefined sample data, and fuzzy-KNN assigns each website a web level based on the values of the words specified in the sample data. Tests on several data sets found 20% surface web, 7.5% Bergie web, 20% deep web, 22.5% Charter web, and 30% dark web. Because the research used only a limited amount of test data, more data should be added to obtain better results. Better crawler frameworks could speed up crawling, especially at certain web levels, because not every crawler framework works at every web level; the Tor browser can be used, but the crawler framework sometimes fails to work with it.
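The abstract names fuzzy-KNN as the classifier but gives no features, distance measure, or parameters. As a minimal sketch, assuming each crawled page is represented by counts of predefined keywords and compared with labeled sample pages by Euclidean distance, the common fuzzy k-NN rule with crisp sample labels can be written as follows. The vocabulary `VOCAB`, the helpers `features` and `fuzzy_knn`, the values `k=3` and `m=2.0`, and the sample pages are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical keyword vocabulary: the abstract does not list the words the
# authors used, so these are placeholders for illustration only.
VOCAB = ["login", "onion", "market", "forum", "wiki"]


def features(text):
    """Map a crawled page's text to a vector of keyword counts over VOCAB."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in VOCAB]


def fuzzy_knn(x, samples, k=3, m=2.0, eps=1e-9):
    """Fuzzy k-NN with crisp training labels.

    Membership of x in class c is the inverse-distance-weighted share of the
    k nearest samples labeled c:
        u_c(x) = sum_{j in kNN, label_j = c} w_j / sum_{j in kNN} w_j
        w_j    = 1 / ||x - x_j|| ** (2 / (m - 1))
    `samples` is a list of (feature_vector, label) pairs; m > 1 is the fuzzifier.
    """
    # Keep the k samples closest to x (Euclidean distance).
    nearest = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k]
    memberships = {label: 0.0 for _, label in samples}
    total = 0.0
    for vec, label in nearest:
        # eps keeps the weight finite when x coincides with a sample.
        w = 1.0 / (math.dist(x, vec) ** (2.0 / (m - 1.0)) + eps)
        memberships[label] += w
        total += w
    # Normalize so memberships over all web levels sum to 1.
    return {label: u / total for label, u in memberships.items()}


# Hypothetical labeled sample pages, one per web level.
samples = [
    (features("wiki forum wiki"), "surface"),
    (features("login forum market"), "deep"),
    (features("onion market onion"), "dark"),
]

u = fuzzy_knn(features("onion market onion login"), samples, k=3)
level = max(u, key=u.get)  # crisp decision: level with the highest membership
print(level, round(u[level], 3))
```

Unlike plain k-NN voting, the result is a membership degree for every web level, so a page lying between two levels keeps partial membership in both; taking the highest membership yields a single web level per page.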