Combining Spectral Representations for Large-Vocabulary Continuous Speech Recognition

Garau G.; Renals S.

Abstract

In this paper, we investigate the combination of complementary acoustic feature streams in large-vocabulary continuous speech recognition (LVCSR). We have explored the use of acoustic features obtained using a pitch-synchronous analysis, STRAIGHT, in combination with conventional features such as Mel-frequency cepstral coefficients (MFCCs). Pitch-synchronous acoustic features are of particular interest when used with vocal tract length normalization (VTLN), which is known to be affected by the fundamental frequency. We have combined these spectral representations directly at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA) and at the system level using ROVER. We evaluated this approach on three LVCSR tasks: dictated newspaper text (WSJCAM0), conversational telephone speech (CTS), and multiparty meeting transcription. The CTS and meeting transcription experiments were both evaluated using standard NIST test sets and evaluation protocols. Our results indicate that combining conventional and pitch-synchronous acoustic feature sets using HLDA leads to a consistent, significant decrease in word error rate across all three tasks. Combining at the system level using ROVER resulted in a further significant decrease in word error rate.
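To make the system-level combination more concrete, here is a minimal Python sketch of the voting stage of ROVER. It is an illustration, not the authors' setup. It assumes the outputs of the individual recognizers have already been aligned into a word transition network, represented as one slot per position, with `None` marking a slot where a system produced no word. Real ROVER builds that network by iterative dynamic-programming alignment, which is omitted here. Each candidate word w in a slot is scored as alpha * N(w)/N_s + (1 - alpha) * C(w), following the frequency-plus-confidence scoring commonly described for ROVER. The function name `rover_vote`, the use of average confidence for C(w), and the `null_conf` parameter are hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def rover_vote(
    aligned_hyps: Sequence[Sequence[Optional[str]]],
    confidences: Optional[Sequence[Sequence[float]]] = None,
    alpha: float = 1.0,
    null_conf: float = 0.0,
) -> List[str]:
    """Choose one word per aligned slot by ROVER-style voting (sketch).

    Each candidate w in slot i is scored as
        alpha * N(w, i) / N_s + (1 - alpha) * C(w, i)
    where N(w, i) is the number of systems proposing w, N_s is the number of
    systems, and C(w, i) is the average confidence of those proposals
    (null_conf stands in for the confidence of the empty word None).
    """
    n_sys = len(aligned_hyps)
    if n_sys == 0:
        return []
    n_slots = len(aligned_hyps[0])
    if any(len(h) != n_slots for h in aligned_hyps):
        raise ValueError("all hypotheses must be aligned to the same number of slots")
    if confidences is None and alpha != 1.0:
        raise ValueError("confidence scores are needed when alpha < 1")

    result: List[str] = []
    for i in range(n_slots):
        # Count votes and accumulate confidences for every word proposed in slot i.
        votes: Dict[Optional[str], int] = defaultdict(int)
        conf_sum: Dict[Optional[str], float] = defaultdict(float)
        for s in range(n_sys):
            word = aligned_hyps[s][i]
            votes[word] += 1
            if confidences is not None:
                conf_sum[word] += null_conf if word is None else confidences[s][i]

        def score(word: Optional[str]) -> float:
            freq = votes[word] / n_sys
            avg_conf = conf_sum[word] / votes[word] if confidences is not None else 0.0
            return alpha * freq + (1.0 - alpha) * avg_conf

        # max() keeps the first candidate seen on ties (dicts preserve insertion order).
        best = max(votes, key=score)
        if best is not None:  # a winning empty slot means "output no word here"
            result.append(best)
    return result


hyps = [
    ["the", "cat", "sat", None],
    ["the", "cat", "sad", "down"],
    ["a", "cat", "sat", "down"],
]
print(rover_vote(hyps))  # ['the', 'cat', 'sat', 'down']
```

With `alpha=1.0` this reduces to majority voting across systems. Lowering `alpha` lets per-word confidence scores outweigh the raw vote count, which is how a single confident system can overrule a weak majority.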

Bibliographic Information

Source
IEEE Transactions on Audio, Speech, and Language Processing, 2008, no. 3, pp. 508-518 (11 pages)
Authors
Garau G.; Renals S.
Indexing Information
Original format: PDF
Language: English
Chinese Library Classification: Automation technology; computer technology
Keywords
Feature combination; ROVER; STRAIGHT; heteroscedastic linear discriminant analysis (HLDA); large-vocabulary continuous speech recognition (LVCSR); pitch-synchronous; vocal tract length normalization (VTLN)

Similar Literature

1. Reducing latency for language identification based on large-vocabulary continuous speech recognition [J]. Takuma Okamoto, Atsuo Hiroe, Hisashi Kawai. Acoustical Science and Technology, 2017, no. 1.
2. Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances [J]. Saon G., Chien J.-T. IEEE Signal Processing Magazine, 2012, no. 6.
3. Advances in Missing Feature Techniques for Robust Large-Vocabulary Continuous Speech Recognition [J]. Van Segbroeck M., Van Hamme H. IEEE Transactions on Audio, Speech, and Language Processing, 2011, no. 1.
4. Rapid Nonlinear Speaker Adaptation for Large-Vocabulary Continuous Speech Recognition [C]. Zoi Roupakia, Anton Ragni, Mark Gales. Annual Conference of the International Speech Communication Association, 2012.
5. Balancing model resolution and generalizability in large-vocabulary continuous speech recognition [D]. Luo, Xiaoqiang. 1999.
6. Neural speech recognition: Continuous phoneme decoding using spatiotemporal representations of human cortical activity [O]. David A. Moses, Nima Mesgarani, Matthew K. Leonard.
7. Large-vocabulary speaker-independent continuous speech recognition with semi-continuous hidden Markov models [O]. X. D. Huang, H. W. Hon, K. F. Lee. 1989.
8. Articulatory Trajectories for Large-Vocabulary Speech Recognition [R]. A. Stolcke, C. Richey, H. Nam, J. Yuan, M. Liberman, V. Mitra, W. Wang. 2013.