Apparatus and method for collecting and analysing HTML5 documents based on distributed parallel processing

Abstract

An apparatus and method for collecting and analysing HTML5 documents based on distributed parallel processing are provided. The apparatus includes: an injector module that stores root URL information in a first database; a generator module that receives the root URL information from the first database, generates a list of URLs to be collected, and stores this collection target URL list in a second database; a fetcher module that receives the collection target URL list from the second database and extracts content from the web pages corresponding to that list; a parser module that stores parsing result information in the second database; and a vulnerability analysis module that receives the parsing result information and analyses vulnerabilities in the HTML code included in the content only when the document type of the web page is HTML5. The vulnerability analysis module divides the content into a plurality of sub-contents, extracts keywords and attributes from each sub-content, and calculates the frequency of those keywords and attributes as the basis for its analysis.
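The analysis step described in the abstract (check that the page is HTML5, split the content into sub-contents, then count keywords and attributes) can be illustrated with a minimal Python sketch. This is not the patented implementation. Several choices here are assumptions made only for illustration: "keywords" are taken to be HTML tag names, sub-contents are formed by grouping lines, HTML5 is detected by the short `<!DOCTYPE html>` declaration, and all function names (`is_html5`, `split_sub_contents`, `count_chunk`, `analyze`) are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: a page counts as HTML5 if it opens with the short doctype.
HTML5_DOCTYPE = re.compile(r"^\s*<!doctype\s+html\s*>", re.IGNORECASE)


def is_html5(content):
    """Return True when the content starts with <!DOCTYPE html>."""
    return HTML5_DOCTYPE.match(content) is not None


class _TagCounter(HTMLParser):
    """Counts tag names (our stand-in for 'keywords') and attribute names."""

    def __init__(self):
        super().__init__()
        self.keywords = Counter()
        self.attributes = Counter()

    def handle_starttag(self, tag, attrs):
        self.keywords[tag] += 1
        for name, _value in attrs:
            self.attributes[name] += 1


def split_sub_contents(content, lines_per_chunk=2):
    """Assumption: sub-contents are groups of whole lines, so that a tag
    written on one line is never cut in half."""
    lines = content.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]


def count_chunk(chunk):
    """Count keywords and attributes in a single sub-content."""
    parser = _TagCounter()
    parser.feed(chunk)
    parser.close()
    return parser.keywords, parser.attributes


def analyze(content):
    """Return (keyword_counts, attribute_counts), or None for non-HTML5 pages."""
    if not is_html5(content):
        return None
    keywords, attributes = Counter(), Counter()
    for chunk in split_sub_contents(content):
        chunk_keywords, chunk_attributes = count_chunk(chunk)
        keywords += chunk_keywords
        attributes += chunk_attributes
    return keywords, attributes
```

Because each sub-content is counted independently and the counters are merged afterwards, `count_chunk` could in principle be mapped over worker processes or cluster nodes, which is one plausible reading of the "distributed parallel processing" in the title. The sketch runs the chunks sequentially to stay self-contained.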
