首页> 中文期刊> 《西北工业大学学报》 >一种基于LDA主题模型的话题发现方法

一种基于LDA主题模型的话题发现方法

         

摘要

Topic Detection is one of the most important techniques in hot topic extraction and evolution tracking. Due to the high dimensionality problem which hinders processing efficiency and topics mal⁃distribution problem which makes topics unclear, it is difficult to detect topics from a large number of short texts in social network. To address these challenges, we proposed a new LDA ( Latent Dirichlet Allocation) model based topic detection meth⁃od called CBOW⁃LDA topic modeling method. It utilizes a CBOW( Continuous Bag⁃of⁃Word) method to cluster the words, which generate word vectors and clustering by vectors similarity. This method decreases the dimensions of LDA output, and makes topic more clearly. Through the analysis of topic perplexity in the real⁃world dataset, it is obvious that topics detected by our method has a lower perplexity, comparing with word frequency weighing based vectors. In a condition of same number of topic words, perplexity is reduced by about 3%.%话题发现是提取热点话题并掌握其演化规律的关键技术之一。针对社交网络中海量短文本信息具有高维性导致主题模型难以处理以及主题分布不均导致主题不明确的问题,提出一种基于LDA( latent dirichlet allocation )主题模型的 CBOW⁃LDA 主题建模方法,通过引入基于 CBOW (continuous bag⁃of⁃word)模型的词向量化方法对目标语料进行相似词的聚类,能够有效降低LDA模型输入文本的维度,并且使主题更明确。通过在真实数据集上计算分析,与现有基于词频权重的词向量化LDA方法相比,在相同主题词数情况下困惑度可降低约3%。

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号