
Unsupervised Adaptation of Categorical Prosody Models for Prosody Labeling and Speech Recognition

Abstract

Automatic speech recognition (ASR) systems rely almost exclusively on short-term, segment-level features such as mel-frequency cepstral coefficients (MFCCs), and ignore the higher-level suprasegmental cues that are characteristic of human speech. Recent experiments have shown that categorical representations of prosody, such as those based on the Tones and Break Indices (ToBI) annotation standard, can be used to enhance speech recognizers. Categorical prosody models are nevertheless severely limited in scope and coverage, because few large corpora are annotated with the relevant prosodic symbols (such as pitch accent, word prominence, and boundary tone labels). In this paper, we first present an architecture for augmenting a standard ASR system with symbolic prosody. We then discuss two novel unsupervised adaptation techniques that improve, respectively, the quality of the linguistic and acoustic components of our categorical prosody models. Finally, we implement the augmented ASR system by enriching ASR lattices with the adapted categorical prosody models. Our experiments show that the proposed unsupervised adaptation techniques significantly improve the quality of the prosody models: on the Boston University Radio News Corpus, the adapted prosodic language and acoustic models reduce the binary pitch accent (presence versus absence) classification error rate by 13.8% and 4.3%, respectively, relative to the seed models, while the prosody-enriched ASR system exhibits a 3.1% relative reduction in word error rate (WER) over the baseline system.
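The error-rate figures above are relative reductions, not absolute percentage-point differences. The following minimal Python sketch illustrates the arithmetic only. The function names are hypothetical, and the baseline WER of 20.0% is an assumed value chosen for illustration; the abstract does not state the absolute baseline error rates.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction, in percent, from baseline to improved."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


def apply_relative_reduction(baseline: float, reduction_pct: float) -> float:
    """Return the error rate left after a given relative reduction (in percent)."""
    return baseline * (1.0 - reduction_pct / 100.0)


# ASSUMPTION: a baseline WER of 20.0% is hypothetical, not a figure from the paper.
hypothetical_baseline_wer = 20.0

# A 3.1% relative reduction scales the baseline by (1 - 0.031).
improved_wer = apply_relative_reduction(hypothetical_baseline_wer, 3.1)
print(round(improved_wer, 2))
```

Under this assumed baseline, a 3.1% relative reduction corresponds to well under one absolute percentage point, which is why relative and absolute figures should not be compared directly.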
