
Microblog Hotspot Discovery Method Based on Improved K-Means Algorithm

Abstract

The K-means algorithm is one of the most frequently used clustering algorithms in hot topic discovery. However, it has shortcomings: the number of clusters K must be chosen in advance, and the algorithm easily falls into a local optimum. As a result, its clustering accuracy is limited, which directly affects the quality of hotspot discovery. This paper proposes an improved K-means algorithm for fast clustering of microblog texts. Single-pass clustering based on the high-frequency words of the microblog texts and the similarities between texts is performed first to obtain the number of clusters K and the initial cluster centers. This addresses the K-means algorithm's excessive sensitivity to the value of K and to the initial centers. Experimental comparison and analysis show that the improved algorithm makes up for these shortcomings of K-means and effectively improves both the efficiency and the accuracy of clustering. When the algorithm is applied to a hot topic discovery model, experiments verify the effectiveness of the resulting hotspot discovery model, which achieves high accuracy.
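The two-stage idea in the abstract (a single-pass clustering to fix K and the initial centers, then K-means seeded with them) can be sketched as below. This is a minimal illustration, not the authors' method: the abstract does not say how high-frequency words and similarity are combined, so the sketch *assumes* each text is a term-frequency vector over the corpus's `top_n` most frequent words, that similarity is cosine similarity, and that a fixed `threshold` decides whether a text opens a new cluster. All function names and parameters (`build_vocab`, `single_pass`, `kmeans`, `top_n`, `threshold`) are hypothetical.

```python
import math
from collections import Counter


def build_vocab(docs, top_n):
    """Assumed step: keep the top_n most frequent words in the corpus."""
    counts = Counter(word for doc in docs for word in doc)
    return [word for word, _ in counts.most_common(top_n)]


def vectorize(doc, vocab):
    """Term-frequency vector over the high-frequency vocabulary, L2-normalised."""
    tf = Counter(doc)
    vec = [float(tf[word]) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def single_pass(vectors, threshold):
    """One pass over the data: join the most similar existing cluster if its
    similarity is at least `threshold`, otherwise open a new cluster.
    Returns the centroids; their count is used as K."""
    members, centroids = [], []
    for i, v in enumerate(vectors):
        best, best_sim = -1, -1.0
        for c, centre in enumerate(centroids):
            sim = cosine(v, centre)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            members[best].append(i)
            centroids[best] = mean([vectors[j] for j in members[best]])
        else:
            members.append([i])
            centroids.append(list(v))
    return centroids


def nearest(v, centroids):
    """Index of the centroid most cosine-similar to v (first wins on ties)."""
    return max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))


def kmeans(vectors, initial_centroids, max_iter=100):
    """K-means seeded with the given centroids (not random ones); each point
    goes to its most similar centroid, and centroids become member means."""
    centroids = [list(c) for c in initial_centroids]  # do not mutate input
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(len(centroids)):
            assigned = [vectors[i] for i, lab in enumerate(labels) if lab == c]
            if assigned:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(assigned)
    return labels, centroids


# Toy, already-tokenised "microblog posts" (illustrative data only).
docs = [
    ["earthquake", "rescue", "city"],
    ["earthquake", "rescue", "team"],
    ["football", "final", "goal"],
    ["football", "goal", "fans"],
]
vocab = build_vocab(docs, top_n=8)
vecs = [vectorize(d, vocab) for d in docs]
centres = single_pass(vecs, threshold=0.3)  # K = len(centres)
labels, _ = kmeans(vecs, centres)
print(len(centres), labels)  # prints: 2 [0, 0, 1, 1]
```

In this sketch K is never supplied by hand: it falls out of the single pass and is governed by `threshold` (a higher threshold yields more, tighter clusters). Seeding K-means with the single-pass centroids instead of random points is what removes the dependence on a random initial center; how the paper actually tunes the threshold is not stated in the abstract.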
