
Online Multilingual Topic Models with Multi-Level Hyperpriors



Abstract

For topic models, such as LDA, that use a bag-of-words assumption, it becomes especially important to break the corpus into appropriately-sized "documents". Since the models are estimated solely from the term cooccurrences, extensive documents such as books or long journal articles lead to diffuse statistics, and short documents such as forum posts or product reviews can lead to sparsity. This paper describes practical inference procedures for hierarchical models that smooth topic estimates for smaller sections with hyperpriors over larger documents. Importantly for large collections, these online variational Bayes inference methods perform a single pass over a corpus and achieve better perplexity than "flat" topic models on monolingual and multilingual data. Furthermore, on the task of detecting document translation pairs in large multilingual collections, polylingual topic models (PLTM) with multi-level hyperpriors (mlhPLTM) achieve significantly better performance than existing online PLTM models while retaining computational efficiency.
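The single-pass inference the abstract refers to builds on online (stochastic) variational Bayes for LDA, in which each document yields a noisy estimate of the topic-word parameters that is blended into the global state with a decaying learning rate. The sketch below shows only that flat online update, not the paper's multi-level hyperprior smoothing; the function names, hyperparameter defaults, and the stdlib-only digamma approximation are all illustrative assumptions, not the authors' implementation.

```python
import math
import random

def digamma(x):
    # Stdlib-only asymptotic approximation of the digamma function
    # (an assumption for self-containment; scipy.special.digamma is
    # the usual choice). Valid for x > 0.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f/252))

def online_lda(docs, vocab_size, num_topics=2, alpha=0.1, eta=0.01,
               tau0=1.0, kappa=0.7, seed=0):
    """Single-pass online variational Bayes for *flat* LDA.

    The mlhPLTM extension described in the abstract would add a
    document-level hyperprior above the per-section `gamma` so that
    short sections borrow strength from their parent document;
    that extra level is omitted here.
    """
    rng = random.Random(seed)
    D, K, V = len(docs), num_topics, vocab_size
    # lam[k][w]: variational parameters of the topic-word Dirichlets.
    lam = [[eta + rng.random() for _ in range(V)] for _ in range(K)]
    for t, doc in enumerate(docs):            # one pass over the corpus
        gamma = [alpha + len(doc) / K] * K    # per-doc topic proportions
        counts = [[0.0] * V for _ in range(K)]
        for _ in range(20):                   # local E-step iterations
            elog_theta = [digamma(g) for g in gamma]
            gamma_new = [alpha] * K
            counts = [[0.0] * V for _ in range(K)]
            for w in doc:
                # phi_{wk} ∝ exp(E[log theta_k] + E[log beta_{kw}])
                logs = [elog_theta[k] + digamma(lam[k][w])
                        - digamma(sum(lam[k])) for k in range(K)]
                m = max(logs)
                phi = [math.exp(x - m) for x in logs]
                s = sum(phi)
                for k in range(K):
                    gamma_new[k] += phi[k] / s
                    counts[k][w] += phi[k] / s
            gamma = gamma_new
        # Global M-step: blend this document's noisy lambda estimate
        # into the running state with a decaying learning rate.
        rho = (tau0 + t) ** (-kappa)
        for k in range(K):
            for w in range(V):
                lam[k][w] = ((1 - rho) * lam[k][w]
                             + rho * (eta + D * counts[k][w]))
    return lam
```

A toy call such as `online_lda([[0, 0, 1], [2, 3, 3], [0, 1, 1], [2, 2, 3]], vocab_size=4)` returns a `K x V` list of positive Dirichlet parameters after exactly one sweep, which is what makes the approach attractive for the large collections the abstract targets.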

