Apparatus and method for collecting and analysing HTML5 documents based on distributed parallel processing
Abstract
An apparatus and method for collecting and analyzing HTML5 documents based on distributed parallel processing are provided. The apparatus includes: an injector module that stores root URL information in a first database; a generator module that receives the root URL information from the first database, generates a list of URLs to be collected, and stores this collection target URL list in a second database; a fetcher module that receives the collection target URL list from the second database and extracts content from the web pages corresponding to that list; a parser module that parses the content and stores the parsing result information in the second database; and a vulnerability analysis module that receives the parsing result information and analyzes vulnerabilities in the HTML code included in the content only when the document type of the web page is HTML5. The vulnerability analysis module divides the content into a plurality of sub-contents, extracts keywords and attributes from each sub-content, calculates the frequency of those keywords and attributes, and uses the result in its analysis.
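The frequency-counting step of the vulnerability analysis module can be illustrated with a minimal Python sketch. It is not the patented implementation. It rests on several assumptions that the abstract does not state: a page is treated as HTML5 when it begins with `<!DOCTYPE html>`, "keywords" are taken to be tag names, sub-contents are fixed-size groups of lines, and all names (`TagAttrCounter`, `is_html5`, `split_sub_contents`, `analyze`) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagAttrCounter(HTMLParser):
    """Counts tag names ("keywords", by assumption) and attribute names."""

    def __init__(self):
        super().__init__()
        self.keywords = Counter()
        self.attributes = Counter()

    def handle_starttag(self, tag, attrs):
        self.keywords[tag] += 1
        for name, _value in attrs:
            self.attributes[name] += 1


def is_html5(content: str) -> bool:
    # Assumption: HTML5 is detected from the <!DOCTYPE html> declaration.
    return content.lstrip().lower().startswith("<!doctype html>")


def split_sub_contents(content: str, size: int = 3) -> list[str]:
    # Assumption: a sub-content is a group of `size` consecutive lines.
    lines = content.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def analyze(content: str):
    """Return (keyword counts, attribute counts), or None for non-HTML5 pages."""
    if not is_html5(content):
        return None
    keywords, attributes = Counter(), Counter()
    for sub in split_sub_contents(content):
        counter = TagAttrCounter()
        counter.feed(sub)
        counter.close()
        keywords.update(counter.keywords)
        attributes.update(counter.attributes)
    return keywords, attributes


SAMPLE = """<!DOCTYPE html>
<html>
<body>
<video src="a.mp4" autoplay></video>
<input autofocus onfocus="alert(1)">
<video src="b.mp4"></video>
</body>
</html>"""

if __name__ == "__main__":
    keywords, attributes = analyze(SAMPLE)
    print(keywords.most_common())
    print(attributes.most_common())
```

In a distributed setting, each sub-content could be counted on a separate worker and the resulting `Counter` objects merged afterwards, since the counts are simply added together. How the frequencies are turned into a vulnerability verdict is not specified in the abstract and is left out of the sketch.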