
Unsupervised Adaptation of Categorical Prosody Models for Prosody Labeling and Speech Recognition

Abstract

Automatic speech recognition (ASR) systems rely almost exclusively on short-term, segment-level features (MFCCs) and ignore the higher-level suprasegmental cues that are characteristic of human speech. Recent experiments have shown that categorical representations of prosody, such as those based on the Tones and Break Indices (ToBI) annotation standard, can be used to enhance speech recognizers. Categorical prosody models, however, are severely limited in scope and coverage because few large corpora are annotated with the relevant prosodic symbols (such as pitch accent, word prominence, and boundary tone labels). In this paper, we first present an architecture for augmenting a standard ASR with symbolic prosody. We then discuss two novel unsupervised adaptation techniques for improving, respectively, the quality of the linguistic and acoustic components of our categorical prosody models. Finally, we implement the augmented ASR by enriching ASR lattices with the adapted categorical prosody models. Our experiments show that the proposed unsupervised adaptation techniques significantly improve the quality of the prosody models: relative to the seed models, the adapted prosodic language and acoustic models reduce the binary pitch accent (presence versus absence) classification error rate by 13.8% and 4.3%, respectively, on the Boston University Radio News Corpus, while the prosody-enriched ASR exhibits a 3.1% relative reduction in word error rate (WER) over the baseline system.
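The figures in the abstract are relative reductions, that is, fractions of the starting error rate rather than absolute percentage-point drops. The short sketch below shows the arithmetic only. The function name `relative_reduction` and the baseline value of 20% WER are illustrative assumptions; the abstract does not state the absolute error rates.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the error-rate reduction as a fraction of the baseline rate."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical baseline WER (NOT taken from the paper), used only to
# illustrate what a 3.1% relative reduction means in absolute terms.
baseline_wer = 20.0                          # percent, assumed
improved_wer = baseline_wer * (1 - 0.031)    # about 19.38 percent

print(f"absolute drop: {baseline_wer - improved_wer:.2f} points")
print(f"relative drop: {relative_reduction(baseline_wer, improved_wer):.3f}")
```

Under this assumed baseline, a 3.1% relative reduction corresponds to an absolute drop of well under one percentage point, which is why the two kinds of figure should not be confused.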


Bibliographic details

Authors: Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan
Volume (issue): 17(1)
Pages: 138–149
Total pages: 28
Original format: PDF

Published version

Ananthakrishnan S., Narayanan S. Unsupervised Adaptation of Categorical Prosody Models for Prosody Labeling and Speech Recognition. IEEE Transactions on Audio, Speech and Language Processing, 2009, no. 1.
