Enhancing Short Text Topic Modeling with FastText Embeddings

Abstract

Online social media has grown rapidly over the past few years, producing a wide variety of short texts. Understanding the topic patterns of these short texts is important. Because of data sparsity, traditional topic models are poorly suited to short text topic analysis. In this paper, we propose a novel topic model, referred to as the FastText-based Sentence-LDA (FSL) model, which extends the Sentence-LDA topic model for short texts. First, we use the FastText model to train a word embedding replacement model, which alleviates the lack of word co-occurrence information in short texts. Second, we propose a new latent feature topic model that integrates latent feature word embeddings into Sentence-LDA. Experimental results demonstrate that the new model significantly improves topic coherence by using information from external corpora.
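The abstract does not spell out how the word embedding replacement step works. One plausible reading, offered here only as a minimal sketch, is that words too rare to carry co-occurrence signal are swapped for their nearest frequent neighbour in embedding space. The function name `replace_rare_words`, the thresholds `min_count` and `min_sim`, and the toy two-dimensional vectors are all hypothetical; in practice the vectors would come from a FastText model trained on an external corpus.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def replace_rare_words(docs, vectors, min_count=2, min_sim=0.8):
    """Swap rare words for their most similar frequent word.

    Assumption: this is one possible interpretation of "embedding
    replacement", not the procedure from the paper.
    """
    counts = Counter(w for doc in docs for w in doc)
    # Candidate replacements: frequent words that have an embedding.
    frequent = [w for w, c in counts.items() if c >= min_count and w in vectors]
    result = []
    for doc in docs:
        new_doc = []
        for w in doc:
            if counts[w] < min_count and w in vectors and frequent:
                best = max(frequent, key=lambda f: cosine(vectors[w], vectors[f]))
                # Replace only when the neighbour is close enough.
                if cosine(vectors[w], vectors[best]) >= min_sim:
                    w = best
            new_doc.append(w)
        result.append(new_doc)
    return result


# Toy vectors invented for illustration (stand-ins for FastText output).
toy_vectors = {
    "phone": (1.0, 0.1),
    "smartphone": (0.9, 0.2),
    "soccer": (0.1, 1.0),
    "football": (0.2, 0.9),
}
toy_docs = [
    ["phone", "battery"],
    ["phone", "soccer"],
    ["smartphone", "soccer"],
    ["football", "match"],
]
replaced = replace_rare_words(toy_docs, toy_vectors)
```

In this toy corpus, "smartphone" and "football" each occur once and are mapped to "phone" and "soccer", so the third document now shares both of its words with the second. Words without a vector ("battery", "match") are left unchanged.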
