Information Processing & Management

Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications


Abstract

Probabilistic topic models are unsupervised generative models that describe document content as a two-step generation process: documents are observed as mixtures of latent concepts or topics, while topics are probability distributions over vocabulary words. Recently, significant research effort has been invested in transferring the probabilistic topic modeling concept from monolingual to multilingual settings, and novel topic models have been designed to work with parallel and comparable texts. We define multilingual probabilistic topic modeling (MuPTM) and present the first full overview of the current research, methodology, advantages, and limitations in MuPTM. As a representative example, we choose a natural extension of the omnipresent LDA model to multilingual settings, called bilingual LDA (BiLDA). We provide a thorough overview of this representative multilingual model, from its high-level modeling assumptions down to its mathematical foundations. We demonstrate how to use the data representations, that is, the output sets of (i) per-topic word distributions and (ii) per-document topic distributions coming from a multilingual probabilistic topic model, in various real-life cross-lingual tasks involving different languages, without any external language-pair-dependent translation resource: (1) cross-lingual event-centered news clustering, (2) cross-lingual document classification, (3) cross-lingual semantic similarity, and (4) cross-lingual information retrieval. We also briefly review several other applications from the relevant literature, and introduce and illustrate two related modeling concepts: topic smoothing and topic pruning. In summary, this article encompasses the current research in multilingual probabilistic topic modeling. By presenting a series of potential applications, we reveal the importance of the language-independent and language-pair-independent data representations obtained by means of MuPTM.
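To illustrate how per-document topic distributions enable cross-lingual comparison without any translation resource, the sketch below scores document pairs by the cosine similarity of their topic proportions over a shared cross-lingual topic space. The theta vectors are hypothetical, made up for illustration rather than taken from a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical per-document topic distributions (theta) over K = 4 shared
# cross-lingual topics, as a multilingual topic model might output for an
# English document, a Dutch document on the same event, and an unrelated one.
theta_en = [0.70, 0.10, 0.15, 0.05]
theta_nl = [0.65, 0.05, 0.20, 0.10]
theta_other = [0.05, 0.80, 0.05, 0.10]

print(cosine(theta_en, theta_nl))     # high: the same latent topics dominate
print(cosine(theta_en, theta_other))  # low: different dominant topics
```

Because both documents live in the same language-independent topic space, the comparison needs no dictionary or parallel data at query time; this is the basic mechanism behind the clustering, classification, and retrieval tasks listed above.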
We provide clear directions for future research in the field by giving a systematic overview of how to link and transfer aspect knowledge across corpora written in different languages via the shared space of latent cross-lingual topics, that is, how to effectively employ the learned per-topic word distributions and per-document topic distributions of any multilingual probabilistic topic model in various cross-lingual applications.
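The learned per-topic word distributions can be employed in the same spirit for cross-lingual semantic similarity: each word is represented by its normalized topic signature, and words are matched across languages by comparing those signatures. The sketch below uses the Bhattacharyya coefficient as the distributional similarity; the phi values and the word pairs are entirely hypothetical, chosen only to show the mechanics.

```python
import math

# Hypothetical per-topic word distributions (phi) over K = 3 shared
# cross-lingual topics; entries are illustrative P(word | topic) values,
# not the output of a trained model.
phi_en = {"government": [0.08, 0.01, 0.01], "match": [0.01, 0.07, 0.01]}
phi_nl = {"regering": [0.09, 0.01, 0.01], "wedstrijd": [0.01, 0.08, 0.02]}

def topic_signature(col):
    """Normalize a word's column of phi into P(topic | word)."""
    s = sum(col)
    return [p / s for p in col]

def bhattacharyya(p, q):
    """Similarity of two discrete distributions (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def nearest(word, phi_src, phi_tgt):
    """Target-language word whose topic signature is closest to `word`'s."""
    p = topic_signature(phi_src[word])
    return max(phi_tgt,
               key=lambda w: bhattacharyya(p, topic_signature(phi_tgt[w])))

print(nearest("government", phi_en, phi_nl))  # regering
print(nearest("match", phi_en, phi_nl))       # wedstrijd
```

Since the topic space is shared, the source-language word and its target-language candidates are directly comparable, which is the language-pair-independent property the abstract emphasizes.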
