Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis

Abstract

Speech generated by hidden Markov model (HMM) based speech synthesis methods always sounds monotonous compared with natural recordings. An important reason is that the predicted F0 trajectories are over-smoothed. This over-smoothing arises from the use of frame-level F0 features and from the averaging effect of Gaussian acoustic modeling in the conventional F0 modeling approach. In this paper, we propose a post-filtering method to improve the F0 prediction of HMM-based Mandarin speech synthesis. Syllable-level F0 features, e.g., length-normalized logF0 vectors or quantitative target approximation (qTA) parameters, are extracted from the F0 trajectories predicted by the conventional approach. These features are then mapped towards natural ones by a transformation based on Gaussian bidirectional associative memories (GBAM). Our subjective experiments indicate that the GBAM-based F0 post-filtering method, using either logF0 vectors or qTA parameters, can significantly improve the naturalness of synthetic speech. Post-filtering with raw logF0 vectors achieves better performance than post-filtering with derived qTA parameters.
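
The abstract does not say how the length-normalized logF0 vectors are computed, so the following is only a minimal sketch of one plausible reading: take the voiced frames of a syllable, convert them to log scale, and resample them to a fixed number of points by linear interpolation. The function name `length_normalized_logf0`, the default of 10 points, the use of 0 to mark unvoiced frames, and the choice to simply skip unvoiced frames are all assumptions made for illustration, not details from the paper.

```python
import math


def length_normalized_logf0(f0_hz, num_points=10):
    """Resample one syllable's voiced F0 contour to a fixed-length logF0 vector.

    Assumptions (not from the paper): unvoiced frames are marked with 0 and
    are skipped, and resampling uses plain linear interpolation.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")

    # Keep voiced frames only and move to the log domain.
    log_f0 = [math.log(f) for f in f0_hz if f > 0]
    if not log_f0:
        raise ValueError("syllable has no voiced frames")
    if len(log_f0) == 1:
        # A single voiced frame gives a flat contour.
        return log_f0 * num_points

    # Map num_points evenly spaced positions onto the original frame index axis.
    step = (len(log_f0) - 1) / (num_points - 1)
    vector = []
    for i in range(num_points):
        pos = i * step
        lo = min(int(pos), len(log_f0) - 2)  # left neighbour, clamped at the end
        frac = pos - lo
        vector.append((1 - frac) * log_f0[lo] + frac * log_f0[lo + 1])
    return vector


# Usage: a short rising contour with one unvoiced frame in the middle.
syllable_f0 = [100.0, 0.0, 200.0]
features = length_normalized_logf0(syllable_f0, num_points=3)
```

Every syllable then yields a vector of the same dimension regardless of its duration, which is what lets a single transformation, such as the GBAM mapping described in the abstract, operate on predicted and natural contours alike.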

Bibliographic record

Source
International Symposium on Chinese Spoken Language Processing | 2014 | pp. 275-279 | 5 pages
Authors
Gao Li; Ling Zhen-Hua; Chen Ling-Hui; Dai Li-Rong
Original format: PDF
Keywords
Feature extraction; Hidden Markov models; Speech; Speech synthesis; Training; Trajectory; Vectors; bidirectional associative memory; hidden Markov model; speech synthesis; target approximation
