Conference: International Symposium on Chinese Spoken Language Processing

Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis

Abstract

Speech generated by hidden Markov model (HMM) based speech synthesis always sounds monotonous compared with natural recordings. An important reason is that the predicted F0 trajectories are over-smoothed, which arises from the use of frame-level F0 features and the averaging effect of Gaussian acoustic modeling in the conventional F0 modeling approach. In this paper, we propose a post-filtering method to improve F0 prediction in HMM-based Mandarin speech synthesis. Syllable-level F0 features, e.g., length-normalized logF0 vectors or quantitative target approximation (qTA) parameters, are extracted from the F0 trajectories predicted by the conventional approach. These features are then mapped towards natural ones by a transformation based on Gaussian bidirectional associative memory (GBAM). Our subjective experiments indicate that GBAM-based F0 post-filtering using either logF0 vectors or qTA parameters can significantly improve the naturalness of synthetic speech, and that post-filtering raw logF0 vectors achieves better performance than post-filtering the derived qTA parameters.
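As an illustration of one of the syllable-level features named in the abstract, the following minimal sketch turns the F0 contour of a single syllable into a length-normalized logF0 vector. The abstract does not say how the normalization is performed, so the linear interpolation, the default of 10 points, and the function name `length_normalized_logf0` are all assumptions made for this example, not the authors' procedure. The GBAM transformation itself is not shown.

```python
import numpy as np


def length_normalized_logf0(f0_hz, num_points=10):
    """Resample one syllable's voiced F0 contour (in Hz) to a fixed-length logF0 vector.

    Hypothetical helper: linear interpolation and num_points=10 are
    assumptions for illustration, not values taken from the paper.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    if f0_hz.size == 0 or np.any(f0_hz <= 0):
        raise ValueError("expected a non-empty contour of positive (voiced) F0 values")

    log_f0 = np.log(f0_hz)

    # Place the original frames and the target points on a shared [0, 1] time axis,
    # so syllables of any duration map to vectors of the same length.
    src = np.linspace(0.0, 1.0, num=log_f0.size)
    dst = np.linspace(0.0, 1.0, num=num_points)
    return np.interp(dst, src, log_f0)


# Example: a six-frame syllable contour (made-up values) becomes a 10-point vector.
syllable_f0 = [200.0, 210.0, 225.0, 230.0, 220.0, 205.0]
vec = length_normalized_logf0(syllable_f0, num_points=10)
print(vec.shape)
```

Fixing the vector length is what lets syllables of different durations share one feature space, which a syllable-level mapping between predicted and natural features requires.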
