Unsupervised training of acoustic models for large vocabulary continuous speech recognition

Wessel F.; Ney H.



Abstract

For large vocabulary continuous speech recognition systems, the amount of acoustic training data is of crucial importance. In the past, large amounts of speech were therefore recorded from various sources and had to be transcribed manually. It is thus desirable to train a recognizer with as little manually transcribed acoustic data as possible. Since untranscribed speech is now available in various forms, this paper studies the unsupervised training of a speech recognizer on automatically recognized transcriptions. A low-cost recognizer trained with between one and six hours of manually transcribed speech is used to recognize 72 hours of untranscribed acoustic data. These transcriptions are then used in combination with a confidence measure to train an improved recognizer. The effect of the confidence measure, which is used to detect possible recognition errors, is studied systematically. Finally, the unsupervised training is applied iteratively. Starting with only one hour of transcribed acoustic data, a recognition system is trained fully automatically. With this iterative training procedure, the word error rates are reduced from 71.3% to 38.3% on the Broadcast News'96 evaluation test set and from 65.6% to 29.3% on the Broadcast News'98 evaluation test set. In comparison with an optimized system trained with the manually generated transcriptions of the complete 72-hour training corpus, the word error rates increase by 14.3% relative and 18.6% relative, respectively.
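
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Word`, `filter_by_confidence`, `iterative_unsupervised_training`, and `relative_change` are hypothetical, the recognizer and trainer are passed in as placeholder callables, and selecting words with a fixed confidence threshold is an assumption about how the confidence measure might be applied.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class Word:
    """One recognized word with its confidence (assumed to lie in [0, 1])."""
    text: str
    confidence: float


def filter_by_confidence(utterances: Sequence[Sequence[Word]],
                         threshold: float) -> List[List[Word]]:
    """Keep only the words whose confidence reaches the threshold."""
    return [[w for w in utt if w.confidence >= threshold] for utt in utterances]


def iterative_unsupervised_training(
    seed_model: Any,
    untranscribed: Sequence[Any],
    recognize: Callable[[Any, Any], List[Word]],
    retrain: Callable[[Any, List[List[Word]]], Any],
    threshold: float,
    iterations: int,
) -> Any:
    """Alternate between recognizing the untranscribed data and retraining.

    `recognize` and `retrain` are placeholders (assumptions) standing in for
    a real decoder and a real acoustic model trainer.
    """
    model = seed_model
    for _ in range(iterations):
        # Transcribe the untranscribed audio with the current model.
        hypotheses = [recognize(model, audio) for audio in untranscribed]
        # Discard words that the confidence measure flags as likely errors.
        selected = filter_by_confidence(hypotheses, threshold)
        # Train the next model on the automatically selected transcriptions.
        model = retrain(model, selected)
    return model


def relative_change(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Word error rates quoted in the abstract (Broadcast News'96 and '98).
    print(round(relative_change(71.3, 38.3), 1))
    print(round(relative_change(65.6, 29.3), 1))
```

Applied to the figures in the abstract, `relative_change` gives a relative word error rate reduction of roughly 46% on the Broadcast News'96 set and roughly 55% on the Broadcast News'98 set.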


Bibliographic details

Source
IEEE Transactions on Speech and Audio Processing | 2005, No. 1 | pp. 23-31 | 9 pages
Authors
Wessel F.; Ney H.
Original format: PDF
Language: English
Chinese Library Classification: Electroacoustic technology and speech signal processing
Keywords
acoustic signal processing; error statistics; speech recognition; acoustic model; acoustic training data; large vocabulary continuous speech recognition; speech recognizer; unsupervised training; untranscribed speech; word error rate


Similar literature


1. Acoustic Models of the Elderly for Large-Vocabulary Continuous Speech Recognition [J]. Akira Baba, Shinichi Yoshizawa, Miichi Yamada. Electronics and Communications in Japan, Part 2: Electronics. 2004, No. 7
2. Automatic determination of acoustic model topology using variational Bayesian estimation and clustering for large vocabulary continuous speech recognition [J]. Watanabe S., Sako A., Nakamura A. IEEE Transactions on Audio, Speech and Language Processing. 2006, No. 3
3. Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech [J]. Krerksak Likitsupin, Proadpran Punyabukkana, Chai Wutiwiwatchai. Engineering Journal. 2016, No. 2
4. Unsupervised training of acoustic models for large vocabulary continuous speech recognition [C]. Frank Wessel, Hermann Ney. IEEE Workshop on Automatic Speech Recognition and Understanding. 2001
5. Modeling lexical tones for Mandarin large vocabulary continuous speech recognition [D]. Lei, Xin. 2006
6. Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition [O]. Edvin Pakoci, Branislav Popović, Darko Pekar. 2019
7. Discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition [O]. Hung-an Chang, James R. Glass. 2010
