10th Western Pacific Acoustics Conference

Very Fast Computation of PLSA by Parallel Training and Quick Unigram Rescaling Calculation



Abstract

Probabilistic Latent Semantic Analysis (PLSA) is a powerful statistical language model that calculates word-occurrence probabilities conditioned on a document, and it is a good tool for adapting a language model to a specific domain using a global-context constraint. However, applying PLSA to practical speech recognition is computationally expensive, both in training and in decoding. The training process is very expensive in terms of both time and memory consumption. To use PLSA during decoding, it must be combined with an n-gram model through the unigram rescaling technique; the combined probability requires normalization, which slows down probability calculation. In this paper, we propose methods that expedite both the training and the application of PLSA. To make PLSA training faster, we exploit parallel computation in the training process. In addition, we propose an algorithm that calculates unigram rescaling very quickly without any approximation. With these methods, it becomes realistic to train PLSA with tens of thousands of topics on a very large corpus, and to incorporate a unigram-rescaled language model into the decoder.
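To make the normalization cost concrete, the following is a minimal sketch (not the paper's algorithm) of plain unigram rescaling on a toy vocabulary: the n-gram probability P(w|h) is multiplied by the ratio P_plsa(w|d)/P_uni(w), and the result is renormalized over the whole vocabulary. The function name and toy numbers are illustrative; the per-word sum for the normalizer Z(h,d) is exactly the expensive step the abstract refers to.

```python
def unigram_rescale(ngram, plsa, unigram):
    """Combine an n-gram model and a PLSA model by unigram rescaling.

    P(w | h, d) = P(w | h) * (P_plsa(w | d) / P_uni(w)) / Z(h, d),
    where Z(h, d) normalizes over the vocabulary.
    """
    scaled = {w: ngram[w] * plsa[w] / unigram[w] for w in ngram}
    z = sum(scaled.values())  # Z(h, d): one pass over the full vocabulary
    return {w: p / z for w, p in scaled.items()}

# Toy three-word vocabulary (illustrative numbers only).
ngram   = {"a": 0.5, "b": 0.3, "c": 0.2}  # P(w | h)
plsa    = {"a": 0.6, "b": 0.2, "c": 0.2}  # P_plsa(w | d)
unigram = {"a": 0.4, "b": 0.3, "c": 0.3}  # P_uni(w)

p = unigram_rescale(ngram, plsa, unigram)
assert abs(sum(p.values()) - 1.0) < 1e-9  # properly normalized
```

Because Z(h, d) depends on both the n-gram history and the document, a naive decoder recomputes this vocabulary-wide sum for every history, which is what motivates the fast exact calculation proposed in the paper.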

Bibliographic details

  • Source
  • Venue: Beijing (CN)
  • Author affiliations

    Graduate School of Science and Engineering, Yamagata University, Yonezawa 992-8510, Japan; Graduate School of Engineering, Tohoku University, Sendai 980-8510, Japan

    Graduate School of Science and Engineering, Yamagata University, Yonezawa 992-8510, Japan

    Graduate School of Engineering, Tohoku University, Sendai 980-8510, Japan

    Graduate School of Engineering, Tohoku University, Sendai 980-8510, Japan

  • Conference organizer
  • Format: PDF
  • Language: English
  • Classification: Acoustics
  • Keywords

