Journal: Journal of Computational Methods in Sciences and Engineering

Hot topic identification from micro-blog based on improved Single-pass algorithm

Abstract

Identifying hot topics in micro-blogs is important for the detection and control of public opinion. When the Single-pass algorithm is used to cluster hot topics in Chinese micro-blogs, Chinese word segmentation is a necessary preprocessing step, but it inevitably introduces segmentation errors. These errors lower the clustering precision of topic identification. To solve this problem, this paper proposes an improved algorithm based on Single-pass that combines CS (Cosine Similarity) and LCS (Longest Common Subsequence) to calculate the similarity between Chinese words. Experiments on hot topic identification with three different micro-blog data sets show that the improved algorithm achieves both higher recall and higher precision than the original algorithm. The proposed algorithm is feasible and effective.
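
The abstract does not say how CS and LCS are combined, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes each post is a pair of (segmented tokens, raw text), mixes token-level cosine similarity with a character-level LCS ratio through a hypothetical weight `alpha`, and assigns each post to the best existing cluster when the score reaches a hypothetical `threshold`. The function names and default values are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words token lists."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(text_a, text_b):
    """LCS length normalised by the longer string (an assumed normalisation)."""
    longest = max(len(text_a), len(text_b))
    return lcs_length(text_a, text_b) / longest if longest else 0.0


def combined_similarity(post_a, post_b, alpha=0.5):
    """Assumed linear mix: alpha * cosine + (1 - alpha) * LCS ratio."""
    tokens_a, text_a = post_a
    tokens_b, text_b = post_b
    return (alpha * cosine_similarity(tokens_a, tokens_b)
            + (1 - alpha) * lcs_similarity(text_a, text_b))


def single_pass(posts, threshold=0.5, alpha=0.5):
    """Single-pass clustering: each post is seen once, in arrival order."""
    clusters = []
    for post in posts:
        best_idx, best_sim = None, 0.0
        for idx, cluster in enumerate(clusters):
            # Score a cluster by its most similar member (an assumed linkage).
            sim = max(combined_similarity(post, member, alpha) for member in cluster)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= threshold:
            clusters[best_idx].append(post)
        else:
            clusters.append([post])  # no cluster is close enough: start a new topic
    return clusters
```

Because the LCS term works on raw characters, two posts can still score as similar when the segmenter splits the same phrase differently, which is the kind of error the abstract describes. Calling `single_pass(posts)` on a list of `(tokens, text)` pairs returns a list of clusters, each a list of posts.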