
METHOD AND APPARATUS FOR COLLECTING AND ANALYZING TEXT DATA FOR ANALYZING ASSOCIATION RULES OF TEXT DATA


Abstract

The present invention relates to a method for collecting and analyzing text data, comprising the following steps: receiving an input of a keyword and period information; acquiring, on the basis of the keyword and the period information, information on an article containing the keyword from the web; crawling the web page containing the article on the basis of the article information; collecting the text data of the article included in the crawled web page; pre-processing the collected text data on the basis of preset dictionary-defined words; forming, from the pre-processed text data of the article, a base data set to be used for data analysis; and analyzing the text data on the basis of the base data set, including frequency analysis of the words used in the article and association rule analysis between the words in the article.
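The abstract names the analysis steps but not the algorithms behind them. The Python sketch below illustrates the pre-processing, base-data-set, frequency-analysis and association-rule steps under several assumptions:

- The crawling and collection steps have already produced plain English article text.
- Pre-processing means keeping only dictionary-defined words.
- The association rule analysis is simplified to single-word rules {x} → {y}, scored by support, confidence and lift. This is not a full Apriori search over larger itemsets.
- `DICTIONARY`, `preprocess`, `build_base_dataset`, `word_frequencies`, `pairwise_rules` and the threshold values are hypothetical names and values, not taken from the patent.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical preset dictionary: the abstract does not say what the
# dictionary contains, so these terms are purely illustrative.
DICTIONARY = {"battery", "electric", "vehicle", "charging", "price", "subsidy"}


def preprocess(text, dictionary):
    """Lowercase the text, split it into ASCII-letter tokens, and keep only
    tokens that appear in the preset dictionary (assumed filtering rule)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in dictionary]


def build_base_dataset(articles, dictionary):
    """Base data set: one transaction (set of distinct dictionary words) per article."""
    return [set(preprocess(a, dictionary)) for a in articles]


def word_frequencies(articles, dictionary):
    """Frequency analysis: total occurrences of each dictionary word."""
    counts = Counter()
    for a in articles:
        counts.update(preprocess(a, dictionary))
    return counts


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.5):
    """Association rules {x} -> {y} over word pairs.

    Returns (x, y, support, confidence, lift) tuples, where
    support = P(x and y), confidence = P(y | x), lift = confidence / P(y),
    with probabilities estimated over articles (transactions).
    """
    n = len(transactions)
    item_count = Counter(w for t in transactions for w in t)
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue  # pair too rare to report
        for x, y in ((a, b), (b, a)):  # test both rule directions
            confidence = c / item_count[x]
            if confidence >= min_confidence:
                lift = confidence / (item_count[y] / n)
                rules.append((x, y, support, confidence, lift))
    # Strongest rules first: by confidence, then lift, then alphabetically.
    rules.sort(key=lambda r: (-r[3], -r[4], r[0], r[1]))
    return rules


if __name__ == "__main__":
    # Stand-in for article text already collected by the crawling step.
    articles = [
        "Electric vehicle battery prices fall as charging networks grow.",
        "New subsidy announced for electric vehicle buyers.",
        "Battery charging speed improves in the latest electric vehicle.",
        "Fuel price rises again.",
    ]
    print(word_frequencies(articles, DICTIONARY).most_common())
    for x, y, s, conf, lift in pairwise_rules(build_base_dataset(articles, DICTIONARY)):
        print(f"{x} -> {y}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Each article is treated as one transaction. As a result, rule support counts articles rather than word occurrences, so a word can rank high in the frequency analysis yet appear in few rules. Text in other languages would need a proper tokenizer or morphological analyzer in place of the ASCII regex.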
