Journal of Residuals Science & Technology
Research and Implementation of Network Public Opinion Analysis System Based on Hadoop Platform

Abstract

Network data analysis can reveal users' behavioral characteristics, such as their preferred network terminals, network services, and web browsers. This makes it possible to optimize network traffic in a targeted way, improve the user experience, increase user activity and retention, and bring more profit to the enterprise. The Internet has become one of the most influential and promising mainstream media in the field of information dissemination, and it is gradually becoming one of the main carriers of public opinion. Under these conditions, negative public opinion spreads easily among Internet users and can have a great impact on the harmonious development of society. It is therefore necessary to analyze and process network data with modern natural language processing and data mining techniques, which is of great significance for providing the relevant government departments with timely and accurate information on network public opinion. The system exploits the data processing advantages of the Hadoop cloud platform and is divided into five functional modules: data acquisition, data preprocessing, data clustering, public opinion analysis, and results display. Together these modules implement the functions of a network public opinion analysis system.
The data acquisition module chooses its collection technique according to the characteristics of each data source: news websites are crawled with Nutch, and micro-blog websites are collected through the API interfaces that the sites themselves provide. The data preprocessing module uses FudanNLP for Chinese word segmentation, builds a stop list to filter out adverbs, prepositions, and auxiliary words that carry no real meaning, and represents each text in a vector space model weighted by the TF-IDF algorithm. The data clustering module takes into account characteristics of the Chinese language such as synonymy and polysemy, and proposes a clustering algorithm that combines K-means, Canopy, and semantic similarity to improve clustering accuracy, which gives the system the ability to discover network public opinion. The public opinion analysis module handles sensitive topic detection, hot spot detection, content analysis, and other public opinion indicators. The results display module presents network public opinion information through the Web. This paper presents the whole process of system development and, based on the characteristics of network public opinion, studies how to implement a network public opinion analysis system. It describes the research background and significance of the subject, the state of research at home and abroad, the research goals, and the structure of the dissertation. It also introduces the advantages of the Hadoop platform for data processing, along with the text collection techniques, vector space model, text clustering algorithm, and public opinion analysis functions.
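
The preprocessing and clustering steps described in the abstract can be illustrated with a small single-machine sketch. It is not the paper's implementation: the abstract gives no formulas, thresholds, or code, so the function names (`tfidf_vectors`, `canopy_centers`, `kmeans`), the cosine distance, the threshold value 0.8, and the toy English tokens are all assumptions made for illustration. The sketch builds TF-IDF vectors, uses a simplified Canopy pass (only the tight threshold, since canopies are used here just to pick seeds) to choose the number and position of initial centroids, and then runs K-means from those seeds. The semantic-similarity component and the Hadoop distribution are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors (1.0 if either is zero)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def canopy_centers(vectors, t2):
    """Simplified Canopy pass: pick a center, drop every point within t2."""
    remaining = list(range(len(vectors)))
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(c)
        remaining = [i for i in remaining
                     if cosine_distance(vectors[c], vectors[i]) > t2]
    return centers


def mean_vector(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for v in members:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, centroids, iterations=10):
    """Plain K-means with cosine distance, started from the given centroids."""
    centroids = [dict(c) for c in centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(len(centroids)),
                      key=lambda k: cosine_distance(v, centroids[k]))
                  for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels


# Toy, already-tokenized documents (hypothetical data, stop words removed).
docs = [
    ["flood", "rescue", "city", "rain"],
    ["rain", "flood", "warning", "city"],
    ["stock", "market", "price", "fall"],
    ["market", "stock", "investor", "price"],
]
vectors = tfidf_vectors(docs)
seeds = canopy_centers(vectors, t2=0.8)        # assumed threshold
labels = kmeans(vectors, [vectors[i] for i in seeds])
print(seeds, labels)
```

Seeding K-means from canopies removes the need to guess the number of clusters in advance, which is one common reason to pair the two algorithms.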