Journal: Progress in Artificial Intelligence

GDTM: Graph-based Dynamic Topic Models


Abstract

Dynamic Topic Modeling (DTM) is the ultimate solution for extracting topics from short texts generated in Online Social Networks (OSNs) such as Twitter. It must be scalable and must account for the sparsity and dynamicity of short texts. Current solutions combine probabilistic mixture models, such as the Dirichlet Multinomial or the Pitman-Yor Process, with approximate inference approaches, such as Gibbs Sampling and Stochastic Variational Inference, to account for the dynamicity and scalability of DTM, respectively. However, these methods rely on weak probabilistic language models that do not account for sparsity in short texts. In addition, their inference is based on iterative optimization, which scales poorly for DTM. We present GDTM, a single-pass graph-based DTM algorithm, to solve the problem. GDTM combines a context-rich, incremental feature representation method with graph partitioning to address scalability and dynamicity, and uses a rich language model to account for sparsity. We run multiple experiments over a large-scale Twitter dataset to analyze the accuracy and scalability of GDTM and compare the results with four state-of-the-art models. As a result, GDTM outperforms the best model by 11% on accuracy and runs an order of magnitude faster, while producing four times better topic quality on standard evaluation metrics.
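
To make the phrase "single-pass graph-based" more concrete, the following is a minimal toy sketch and not the GDTM algorithm itself, whose feature representation, language model, and partitioning rules are not given in this abstract. It assumes that each short text contributes word co-occurrence edges to a graph, and that each incoming text is assigned once, without later re-optimization, to the partition (topic) whose edges it overlaps most. The class name `SinglePassTopics`, the helper `cooccurrence_edges`, and the `threshold` parameter are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(tokens):
    """Return the unordered word pairs (graph edges) found in one short text."""
    return {tuple(sorted(pair)) for pair in combinations(set(tokens), 2)}


class SinglePassTopics:
    """Toy streaming partitioner: each document is seen once and never revisited.

    This is an illustrative assumption, not the method from the paper.
    """

    def __init__(self, threshold=0.1):
        self.threshold = threshold  # hypothetical minimum overlap to join a topic
        self.partitions = []        # one Counter of edge weights per topic

    def assign(self, tokens):
        edges = cooccurrence_edges(tokens)
        best_id, best_score = None, 0.0
        for topic_id, weights in enumerate(self.partitions):
            # Fraction of this document's edges already present in the topic.
            shared = sum(1 for edge in edges if edge in weights)
            score = shared / len(edges) if edges else 0.0
            if score > best_score:
                best_id, best_score = topic_id, score
        if best_id is None or best_score < self.threshold:
            # No sufficiently similar topic: open a new partition.
            self.partitions.append(Counter())
            best_id = len(self.partitions) - 1
        # Incremental update: add the document's edges to the chosen topic.
        self.partitions[best_id].update(edges)
        return best_id


model = SinglePassTopics(threshold=0.1)
stream = [
    "goal scored in the football match".split(),
    "football match ends with late goal".split(),
    "new phone battery lasts longer".split(),
    "phone battery review".split(),
]
labels = [model.assign(doc) for doc in stream]
print(labels)  # [0, 0, 1, 1]
```

Because each document triggers only one scan over the existing partitions and one incremental update, the cost per document does not depend on re-running inference over earlier data, which is the property the abstract contrasts with iterative approaches such as Gibbs Sampling.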
