Unlike ordinary web information search, an Internet public opinion search engine must go deep inside sites and pages to collect and extract valid data. This requirement raises many new research topics and methods for the information and intelligence community. After analyzing the two approaches to web information extraction (template-based and page-analysis-based), as well as extraction methods based on natural language processing, wrapper induction, and ontologies, this paper adopts wrapper induction and uses an expert mode in the rule generation module to design a news extraction method based on sample learning. Extraction rules are formulated and revised by manually analyzing the web page source code, and information is then extracted automatically according to those rules. The method improves the precision and quality of public opinion search engines.
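The rule-driven workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each extraction rule is a hand-written regular expression with one capture group per field, and the names `SITE_RULES`, `extract`, and `failing_fields`, along with the sample HTML layout, are hypothetical. `failing_fields` stands in for the revision step, in which an expert checks the rules against sample pages and fixes those that fail.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical rule set for one news site. Each field maps to a regex whose
# single capture group was chosen by hand after reading the page source
# (the "expert mode" step; the patterns themselves are assumptions).
SITE_RULES: Dict[str, str] = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "date": r'<span class="date">(.*?)</span>',
    "body": r'<div id="content">(.*?)</div>',
}

TAG_RE = re.compile(r"<[^>]+>")


def extract(html: str, rules: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply each field's rule to one page; return cleaned text or None if no match."""
    record: Dict[str, Optional[str]] = {}
    for field, pattern in rules.items():
        match = re.search(pattern, html, re.S | re.I)
        if match is None:
            record[field] = None
        else:
            # Drop nested tags and surrounding whitespace from the captured text.
            record[field] = TAG_RE.sub("", match.group(1)).strip()
    return record


def failing_fields(samples: List[str], rules: Dict[str, str]) -> Set[str]:
    """Report fields whose rule yields nothing on any sample page, so they can be revised."""
    failed: Set[str] = set()
    for html in samples:
        for field, value in extract(html, rules).items():
            if not value:
                failed.add(field)
    return failed


if __name__ == "__main__":
    page = (
        '<html><body><h1 class="t">Flood warning issued</h1>'
        '<span class="date">2010-06-01</span>'
        '<div id="content"><p>Heavy rain is expected.</p></div>'
        "</body></html>"
    )
    print(extract(page, SITE_RULES))
    print(failing_fields([page, "<h1>Only a title</h1>"], SITE_RULES))
```

Regular expressions keep the sketch dependency-free. A production wrapper would more likely express rules as DOM paths (for example XPath) because they are less brittle when markup changes.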