Crawling and cluster hidden web using crawler framework and fuzzy-KNN

Abstract

Today almost everyone uses the internet for daily activities, whether social, academic, work or business. But few of us are aware that we generally access only a small part of the whole internet. The internet, or World Wide Web, is divided into several levels, such as the surface web, the deep web and the dark web. Accessing the deep or dark web can be dangerous. This research examines web content and deep content. For a faster and safer search, it uses a crawler framework. The search process yields various kinds of data, which are stored in a database. The stored data are then classified to determine the level of each website. Classification is done with the fuzzy-KNN method, which classifies the crawler framework's results held in the database. The crawler framework generates data in the form of URL addresses, page info and other fields. The crawled data are compared with predefined sample data, and fuzzy-KNN assigns a web level based on the values of the words specified in the sample data. On the test data examined, 20% of sites were classified as surface web, 7.5% as bergie web, 20% as deep web, 22.5% as charter web and 30% as dark web. The research covers only a limited set of test data, so more data is needed to get better results. Better crawler frameworks can speed up crawling, especially at certain web levels, because not all crawler frameworks work at every web level: the Tor browser can be used, but the crawler framework sometimes does not work.
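The classification step described in the abstract can be illustrated with a minimal fuzzy-KNN sketch. This is not the authors' implementation: the abstract does not give their features, distance measure or parameters. The sketch assumes the common distance-weighted formulation of fuzzy-KNN with crisp training labels, Euclidean distance, and a fuzzifier m = 2. The feature vectors (counts of three unnamed keywords per page), the sample data and the function name `fuzzy_knn` are all hypothetical.

```python
import math
from collections import defaultdict

def fuzzy_knn(samples, query, k=3, m=2.0):
    """Return a dict mapping each label to its fuzzy membership for `query`.

    samples: list of (feature_vector, label) pairs (hypothetical sample data).
    m: fuzzifier (> 1); m = 2 gives inverse squared-distance weighting.
    """
    # Keep the k labelled samples closest to the query (Euclidean distance).
    neighbours = sorted(
        (math.dist(vec, query), label) for vec, label in samples
    )[:k]
    exponent = 2.0 / (m - 1.0)
    eps = 1e-9  # avoids division by zero when a sample equals the query
    weights = [(1.0 / (d ** exponent + eps), label) for d, label in neighbours]
    total = sum(w for w, _ in weights)
    # Each neighbour votes for its own label, weighted by inverse distance.
    memberships = defaultdict(float)
    for w, label in weights:
        memberships[label] += w / total
    return dict(memberships)

# Hypothetical sample data: keyword counts per page, with an assumed web level.
samples = [
    ([0, 0, 0], "surface web"),
    ([1, 0, 0], "surface web"),
    ([3, 1, 0], "deep web"),
    ([4, 1, 1], "deep web"),
    ([2, 5, 4], "dark web"),
    ([1, 6, 5], "dark web"),
]
query = [2, 5, 5]  # keyword counts for one crawled page (made up)

memberships = fuzzy_knn(samples, query, k=3)
best_level = max(memberships, key=memberships.get)
print(best_level, memberships)
```

Unlike plain KNN, the result is a membership degree per level rather than a single hard label, so a page near the boundary between two levels is visible as such; the highest membership can still be taken as the assigned level.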


Bibliographic details

Source: International Conference on Cyber and IT Service Management, 2017, pp. 1-7 (7 pages)
Authors: I Gede Surya Rahayuda; Ni Putu Linda Santiari
Original format: PDF
Keywords: Crawlers; Browsers; Search engines; Databases; Internet; Web sites; Weapons


Similar literature

1. GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources [J]. Chih-Yuan Huang, Hao Chang. ISPRS International Journal of Geo-Information, 2016, no. 8.
2. A Framework for Incremental Hidden Web Crawler [J]. Rosy Madaan, Ashutosh Dixit, A.K. Sharma. International Journal on Computer Science and Engineering, 2010, no. 3.
3. Highly Efficient Architecture for Scalable Focused Crawling Using Incremental Parallel Web Crawler [J]. P. Jaganathan, T. Karthikeyan. Journal of computer sciences, 2015, no. 1.
4. Crawling and cluster hidden web using crawler framework and fuzzy-KNN [C]. I Gede Surya Rahayuda, Ni Putu Linda Santiari. International Conference on Cyber and IT Service Management, 2017.
5. Crawling and searching the hidden Web [D]. Ntoulas, Alexandros. 2006.
6. A user-oriented web crawler for selectively acquiring online content in e-health research [O]. Songhua Xu, Hong-Jun Yoon, Georgia Tourassi.
7. GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources [O]. Chih-Yuan Huang, Hao Chang. 2016.
