Topic modeling of Chinese language beyond a bag-of-words

Zengchang Qin; Yonghui Cong; Tao Wan

Abstract

The topic model is one of the best-known hierarchical Bayesian models for language modeling and document analysis. It has achieved great success in text classification, in which a text is represented as a bag of its words, disregarding grammar and even word order; this is referred to as the bag-of-words assumption. In this paper, we investigate topic modeling of the Chinese language, which has a different morphology from alphabetic Western languages such as English. Chinese characters, rather than Chinese words, are the basic structural units of Chinese. Previous empirical studies have shown that the character-based topic model performs better than the word-based topic model. In this research, we propose the character-word topic model (CWTM), which takes the character-word relation into account in topic modeling. Two types of experiments are designed to test the performance of the newly proposed model: topic extraction and text classification. Through empirical studies, we demonstrate the superiority of the newly proposed model over both word-based and character-based topic models.
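
To make the contrast between word-based and character-based representations concrete, the following is a minimal Python sketch. It is not the CWTM itself, whose details the abstract does not give; it only shows the three kinds of counts such models could start from. The function names, the example sentence, and the assumption that the text has already been segmented into whitespace-separated words are all illustrative and not taken from the paper.

```python
from collections import Counter


def word_bag(segmented_text: str) -> Counter:
    """Bag of words: count whitespace-separated tokens, ignoring their order."""
    return Counter(segmented_text.split())


def char_bag(segmented_text: str) -> Counter:
    """Bag of characters: count every non-space character, ignoring order."""
    return Counter(ch for ch in segmented_text if not ch.isspace())


def char_word_pairs(segmented_text: str) -> Counter:
    """Count (character, word) pairs, recording which word each character sits in."""
    return Counter((ch, word) for word in segmented_text.split() for ch in word)


# Hypothetical pre-segmented input (assumption: segmentation was done upstream).
# Rough gloss: "topic / model / analysis / Chinese / text".
doc = "主题 模型 分析 中文 文本"

words = word_bag(doc)
chars = char_bag(doc)
pairs = char_word_pairs(doc)
```

In this example the character 文 occurs in two different words (中文 and 文本), so the character bag counts it twice while the word bag treats the two words as unrelated tokens. The pair counts keep both views at once, which is the kind of character-word relation the abstract refers to.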

Bibliographic details

Source
Computer Speech and Language | 2016, Issue 11 | pp. 60-78 | 19 pages
Authors
Zengchang Qin; Yonghui Cong; Tao Wan
Author affiliations

Intelligent Computing and Machine Learning Lab, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China;

Intelligent Computing and Machine Learning Lab, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China;

School of Biological Science and Medical Engineering, Beihang University, Beijing, China;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language of text: English
Chinese Library Classification:
Keywords
Topic models; Chinese language modeling; Text classification; Language model; Character-word topic model; Latent Dirichlet allocation;
