Venue: Pacific-Asia Conference on Knowledge Discovery and Data Mining

TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement


Abstract

Short texts have been prevalent on websites and emerging social media for several years, which makes identifying intelligible topics in online data sources a critical task. However, existing topic models for short texts cannot analyze the internal components of the learned topics, even though such analysis is important for improving topic coherence and interpretability. In this paper, we propose TSSE-DMM, a novel topic model for short texts. It improves the coherence and interpretability of topics through topic subdivision, and it alleviates text sparsity through a semantic enhancement strategy. First, we subdivide each topic into four detailed aspects to obtain distinct, interpretable components of each topic: location, people & organization, core words, and background words. Then, we combine the Generalized Pólya Urn model with joint word embeddings to address data sparsity. Extensive experiments on three real-world text collections in two languages show that our model produces better topic representations than the baseline methods. Moreover, our method has been adopted by the public service hotline platform of Jiangsu Province, China.
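The abstract does not say how the Generalized Pólya Urn (GPU) model and the word embeddings interact. One common scheme, from earlier GPU-based short-text models such as GPU-DMM, handles a word being assigned to a topic by also adding a fraction of a count to the words whose embeddings are close to it. This spreads evidence across semantically related words, which can help when documents are very short. The sketch below assumes that scheme. The names `build_promotion_matrix`, `gpu_update`, `threshold` and `mu` are hypothetical and are not taken from TSSE-DMM. The sketch also leaves out the four-aspect topic subdivision and the Gibbs sampler itself.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def build_promotion_matrix(embeddings, threshold=0.5, mu=0.3):
    """Hypothetical GPU promotion matrix built from word embeddings.

    A[w][w] = 1.0 (the sampled word itself) and A[w][v] = mu for every other
    word v whose cosine similarity with w exceeds `threshold`.
    """
    words = list(embeddings)
    A = {w: {w: 1.0} for w in words}
    for i, w in enumerate(words):
        for v in words[i + 1:]:
            if cosine(embeddings[w], embeddings[v]) > threshold:
                A[w][v] = mu
                A[v][w] = mu
    return A


def gpu_update(topic_word_counts, topic, word, A, sign=1):
    """Add (sign=1) or remove (sign=-1) one occurrence of `word` in `topic`.

    Under the urn scheme, each related word v is also promoted by A[word][v].
    """
    for v, weight in A[word].items():
        topic_word_counts[topic][v] += sign * weight


if __name__ == "__main__":
    # Toy 2-d embeddings (made up for illustration only).
    emb = {"hotline": [1.0, 0.0], "phone": [0.9, 0.1], "river": [0.0, 1.0]}
    A = build_promotion_matrix(emb, threshold=0.5, mu=0.3)
    counts = defaultdict(lambda: defaultdict(float))
    gpu_update(counts, 0, "hotline", A)
    print(dict(counts[0]))  # "phone" receives a partial count; "river" receives none
```

In a Gibbs sampler, `gpu_update(..., sign=-1)` would be called before resampling a document's topic and `sign=1` after it. Because of this, the topic-word counts become real-valued rather than integer counts.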

