Journal: 《计算机应用与软件》 (Computer Applications and Software)

Detection and Tracking of Hot Topics in Online News Based on an Improved TF*PDF Algorithm

Abstract

This paper studies the text of online news. An online news article consists of a title and a body, and based on this structure we propose a weighted term-frequency method that raises the weight of feature terms likely to become hot topics. News reports are then clustered with the Single-Pass clustering algorithm to obtain a list of topics. Building on the idea of TF*PDF, we introduce a topic weight and propose a new method for computing topic heat, and we use a "topic index" to describe how a topic develops over time. Experiments show that the new heat calculation detects hot topics better than the original one, and that the topic trends it produces agree with actual developments.
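The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so the title boost `TITLE_WEIGHT`, the cosine similarity measure, the clustering `threshold`, and all function names here are assumptions chosen for illustration. The `tf_pdf` function follows the original TF*PDF weighting (normalized term frequency per channel multiplied by the exponential of the proportion of documents containing the term), not the improved heat measure of this paper. The topic weight and the "topic index" are not defined in the abstract, so the sketch does not attempt them.

```python
import math
from collections import Counter

TITLE_WEIGHT = 3.0  # assumed boost for title terms; the abstract gives no value


def weighted_tf(title_tokens, body_tokens, title_weight=TITLE_WEIGHT):
    """Term frequencies where each title occurrence counts title_weight times."""
    tf = Counter()
    for tok in body_tokens:
        tf[tok] += 1.0
    for tok in title_tokens:
        tf[tok] += title_weight
    return tf


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (assumed metric)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.3):
    """Single-Pass clustering: join the most similar cluster or open a new one.

    Returns one cluster id per input vector, in arrival order.
    """
    centroids = []  # summed term weights per cluster (cosine ignores scale)
    labels = []
    for vec in vectors:
        best_id, best_sim = -1, 0.0
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id >= 0 and best_sim >= threshold:
            centroids[best_id].update(vec)  # Counter.update adds the weights
            labels.append(best_id)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


def tf_pdf(channels):
    """Original TF*PDF term weights.

    channels: list of channels; each channel is a list of tokenized documents.
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue
        freq = Counter(tok for doc in docs for tok in doc)
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for tok, f in freq.items():
            weights[tok] += (f / norm) * math.exp(doc_freq[tok] / len(docs))
    return weights


# Hypothetical toy data: (title tokens, body tokens) per news report.
news = [
    (["earthquake", "relief"], ["earthquake", "rescue", "teams", "arrive"]),
    (["earthquake", "aid"], ["relief", "supplies", "earthquake", "zone"]),
    (["football", "final"], ["team", "wins", "football", "cup"]),
]
vectors = [weighted_tf(title, body) for title, body in news]
labels = single_pass(vectors)
print(labels)
```

With this toy data the two earthquake reports share heavily weighted title terms and fall into one cluster, while the football report opens a second one. Because Single-Pass processes reports in arrival order, the resulting clusters depend on both that order and the chosen threshold.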
