
Robust Word-Network Topic Model for Short Texts



Abstract

With the rapid development of online social media, short text has become the prevalent format for information on the Internet. Due to the severe data sparsity issue, accurately discovering the knowledge behind these short texts remains a critical challenge. Since regular topic models, such as Latent Dirichlet Allocation (LDA), cannot perform well on short texts, much effort has been put into building different types of probabilistic topic models for short texts. Inducing topics from the dense word-word space instead of the sparse document-word space has emerged as a solution to the data sparsity issue; a representative example is the Word Network Topic Model (WNTM). However, the word-word space building procedure of WNTM often introduces much irrelevant information. In light of this, we propose the Robust WNTM (RWNTM), which can filter out unrelated information during sampling. The experimental results demonstrate that our method learns more coherent topics and is more accurate in text classification compared with WNTM and other state-of-the-art methods.
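To illustrate the word-word space the abstract contrasts with the document-word space, the following is a minimal sketch of building a word co-occurrence network from tokenized short texts with a sliding window. The function name, window size, and toy corpus are illustrative assumptions, not the paper's implementation; WNTM-style models then treat each word's neighbor list as a pseudo-document and run topic inference over that denser representation.

```python
from collections import defaultdict

def build_word_network(docs, window=2):
    """Build a word co-occurrence network from tokenized short texts.

    For each word, collect the words that co-occur with it inside a
    sliding window. In WNTM-style models, each word's neighbor list
    serves as a pseudo-document, so topics are inferred from the dense
    word-word space rather than the sparse document-word space.
    """
    network = defaultdict(list)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the center word itself
                    network[word].append(tokens[j])
    return dict(network)

docs = [["short", "text", "topic"], ["topic", "model", "short"]]
net = build_word_network(docs, window=1)
```

Because the pseudo-documents aggregate contexts across the whole corpus, even words that appear in very short documents accumulate enough co-occurrence evidence for inference; the trade-off, as the abstract notes, is that wide windows can pull in irrelevant neighbors, which is the noise RWNTM filters during sampling.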
