Journal of Inner Mongolia University of Science and Technology

Research on Keyword Extraction and Summarization Techniques for Official Microblogs

     

Abstract

Official Microblogs are certified Microblog accounts that generally belong to an organization. Their data are highly reliable, clearly labeled, and socially influential, so summarizing an organization's events over time can greatly improve reading efficiency. However, official Microblogs also carry a good deal of information unrelated to the organization, which makes event extraction and summarization challenging. Taking the characteristics of official Microblog data into account, this paper proposes an official Microblog event summarization model based on corpus weighting and label recognition. Drawing on a corpus related to official Microblogs, it also presents Corpus Weighted Ranking (CWR), a keyword calculation method that provides the basis for computing post similarity and for summarizing events. Experiments show that, compared with TF-IDF and TextRank, CWR achieves better keyword extraction precision (P), recall (R), and F-measure, and that selecting the highest-weighted sentences to form event summaries gives good results.
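The abstract does not give the CWR formula, so the following is not CWR. It is only a minimal sketch of the generic pieces the abstract names: a TF-IDF keyword baseline, precision/recall/F-measure scoring against gold keywords, and picking the highest-weighted sentences as a summary. It assumes posts are already tokenized (e.g. Chinese word segmentation done upstream). The function names, the smoothed IDF variant, and the keyword-overlap sentence weight are illustrative assumptions, not the paper's definitions.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, k=5):
    """Rank the terms of one tokenized post by TF-IDF against a tokenized corpus.

    Baseline only (not CWR); uses a smoothed IDF as an assumption.
    """
    n_docs = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # document frequency: count each term once per post
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    scores = {}
    for term, count in tf.items():
        idf = math.log((n_docs + 1) / (df[term] + 1)) + 1  # smoothed IDF
        scores[term] = (count / total) * idf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def precision_recall_f1(predicted, gold):
    """Keyword-level precision P, recall R and F-measure against a gold keyword set."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    p = hits / len(pred) if pred else 0.0
    r = hits / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def summarize(sentences, keywords, n=1):
    """Pick the n tokenized sentences with the most keyword hits, kept in original order.

    Hypothetical sentence weight: number of tokens that are keywords.
    """
    kw = set(keywords)
    weights = [sum(1 for t in s if t in kw) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: (-weights[i], i))[:n]
    return [sentences[i] for i in sorted(top)]
```

For example, with `predicted = ["a", "b", "c"]` and `gold = ["a", "b", "d", "e"]`, two keywords match, giving P = 2/3, R = 1/2 and F = 4/7. A corpus-weighted method such as CWR would replace the `tfidf_keywords` scoring step while the evaluation and sentence-selection steps stay the same.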
