Journal: 《电子学报》 (Acta Electronica Sinica)

AC-Trie-Based Hot Phrase Mining from Online Social Network Text Streams

         

Abstract

Hot phrases in online social network text streams can reflect the hot topics and sudden events hidden in those streams. This paper proposes a hot phrase mining technique that requires no word segmentation and supports multiple hotness measures. We first sample a typical period of the text stream to collect candidate phrases and build an AC-Trie from them. Using this AC-Trie, we scan the subsequent stream in a single pass and record each candidate phrase's historical occurrence frequency on the corresponding Trie node, which supports a range of hotness measures based on historical frequency. Because hot phrases evolve, the AC-Trie must be rebuilt from new samples of the stream. To discover new hot phrases promptly while keeping the number of rebuilds low, we estimate the occurrence frequency of missed phrases at the Trie nodes and use it to decide dynamically when to update the candidate phrases. Experiments on a Sina Weibo dataset show that the approach is effective (89% precision) and efficient (time and space overhead is only 2% of the baseline algorithm's).
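The single-pass counting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "AC-Trie" means an Aho-Corasick automaton built on a character-level trie, and the names `ACTrie`, `scan`, `history`, and `hotness` are hypothetical. The hotness function is a placeholder burst ratio chosen for the example; the paper's actual measures and its rebuild-timing estimate are not specified in the abstract and are not modeled here.

```python
from collections import defaultdict, deque


class ACTrie:
    """Aho-Corasick automaton over characters with per-window phrase counts."""

    def __init__(self, phrases):
        self.children = [{}]   # node id -> {char: child node id}
        self.fail = [0]        # node id -> failure link
        self.output = [[]]     # node id -> candidate phrases ending here
        # phrase -> window index -> occurrence count (the "historical frequency")
        self.history = defaultdict(lambda: defaultdict(int))
        for phrase in phrases:
            self._insert(phrase)
        self._build_failure_links()

    def _insert(self, phrase):
        node = 0
        for ch in phrase:
            if ch not in self.children[node]:
                self.children.append({})
                self.fail.append(0)
                self.output.append([])
                self.children[node][ch] = len(self.children) - 1
            node = self.children[node][ch]
        if phrase not in self.output[node]:
            self.output[node].append(phrase)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 nodes keep the root as failure link.
        queue = deque(self.children[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.children[node].items():
                f = self.fail[node]
                while f and ch not in self.children[f]:
                    f = self.fail[f]
                self.fail[child] = self.children[f].get(ch, 0)
                # Inherit phrases that end at the failure target (proper suffixes).
                self.output[child] = self.output[child] + self.output[self.fail[child]]
                queue.append(child)

    def scan(self, text, window):
        """Scan raw text once, with no word segmentation, and update counts."""
        node = 0
        for ch in text:
            while node and ch not in self.children[node]:
                node = self.fail[node]
            node = self.children[node].get(ch, 0)
            for phrase in self.output[node]:
                self.history[phrase][window] += 1

    def hotness(self, phrase, window):
        """Placeholder measure (an assumption): current count / (1 + previous count)."""
        counts = self.history.get(phrase, {})
        return counts.get(window, 0) / (1 + counts.get(window - 1, 0))


# Usage: candidate phrases stand in for those sampled from a typical period.
trie = ACTrie(["地震", "世界杯", "界杯"])
trie.scan("世界杯开幕", 0)
trie.scan("世界杯决赛,世界杯", 1)
print(dict(trie.history["世界杯"]), trie.hotness("世界杯", 1))
```

Because the counts are stored per window rather than as a single score, any measure that depends only on historical frequencies can be computed afterwards without rescanning the stream.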
