
Robust Word-Network Topic Model for Short Texts



Abstract

With the rapid development of online social media, short text has become the prevalent format for information on the Internet. Due to the severe data sparsity issue, accurately discovering the knowledge behind these short texts remains a critical challenge. Since regular topic models, such as Latent Dirichlet Allocation (LDA), cannot perform well on short texts, much effort has been put into building different types of probabilistic topic models for short texts. Inducing topics from the dense word-word space instead of the sparse document-word space has emerged as a solution to the data sparsity issue; a representative example is the Word Network Topic Model (WNTM). However, the word-word space building procedure of WNTM often introduces much irrelevant information. In light of this, we propose the Robust WNTM (RWNTM), which can filter out unrelated information during sampling. The experimental results demonstrate that our method learns more coherent topics and is more accurate in text classification compared with WNTM and other state-of-the-art methods.
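To illustrate the word-word space the abstract contrasts with the document-word space, the following is a minimal sketch of building a word co-occurrence network from tokenized short texts with a sliding window. The function name, window size, and toy corpus are illustrative assumptions, not the paper's implementation; WNTM-style models then treat each word's neighbor list as a pseudo-document and run topic inference over that denser representation.

```python
from collections import defaultdict

def build_word_network(docs, window=2):
    """Build a word co-occurrence network from tokenized short texts.

    For each word, collect the words that co-occur with it inside a
    sliding window. In WNTM-style models, each word's neighbor list
    serves as a pseudo-document, so topics are inferred from the dense
    word-word space rather than the sparse document-word space.
    """
    network = defaultdict(list)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the center word itself
                    network[word].append(tokens[j])
    return dict(network)

docs = [["short", "text", "topic"], ["topic", "model", "short"]]
net = build_word_network(docs, window=1)
```

Because the pseudo-documents aggregate contexts across the whole corpus, even words that appear in very short documents accumulate enough co-occurrence evidence for inference; the trade-off, as the abstract notes, is that wide windows can pull in irrelevant neighbors, which is the noise RWNTM filters during sampling.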
