Published in: IEEE International Conference on Acoustics, Speech and Signal Processing

Prosodic Clustering for Phoneme-Level Prosody Control in End-to-End Speech Synthesis

Abstract

This paper presents a method for controlling prosody at the phoneme level in an autoregressive, attention-based text-to-speech system. Instead of learning latent prosodic features with a variational framework, as is commonly done, we extract phoneme-level F0 and duration features directly from the speech data in the training set. Each prosodic feature is discretized using unsupervised clustering, producing a sequence of prosodic labels for each utterance. This sequence is used in parallel with the phoneme sequence to condition the decoder through a prosodic encoder and a corresponding attention module. Experimental results show that the proposed method retains the high quality of the generated speech while allowing phoneme-level control of F0 and duration. By replacing the F0 cluster centroids with musical notes, the model can also provide control over the note and octave within the speaker's range.
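
The discretization step described in the abstract can be illustrated with a small sketch. The abstract says only "unsupervised clustering", so the use of k-means here is an assumption, as are the toy F0 values, the number of clusters, and the helper names (kmeans_1d, assign_labels, nearest_note), none of which come from the paper. The note mapping assumes twelve-tone equal temperament with A4 = 440 Hz.

```python
import math

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars; returns centroids in ascending order."""
    ordered = sorted(values)
    # Deterministic initialisation at evenly spaced quantiles (an assumption).
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda j: abs(v - centroids[j]))
            buckets[idx].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def assign_labels(values, centroids):
    """Map each value to the index of its nearest centroid (its prosodic label)."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values]

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(f0_hz):
    """Return (name, frequency in Hz) of the equal-tempered note closest to f0_hz."""
    midi = round(69 + 12 * math.log2(f0_hz / 440.0))
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, 440.0 * 2 ** ((midi - 69) / 12)

# Toy per-phoneme mean F0 values in Hz (made up for illustration).
f0_per_phoneme = [110.0, 112.0, 108.0, 150.0, 155.0, 148.0, 200.0, 205.0, 198.0]

centroids = kmeans_1d(f0_per_phoneme, k=3)
labels = assign_labels(f0_per_phoneme, centroids)

# Swapping each centroid for a nearby musical note, as the abstract suggests.
note_centroids = [nearest_note(c) for c in centroids]
```

Here labels is the kind of discrete prosodic sequence that would run in parallel with the phoneme sequence; the same procedure could be applied to per-phoneme durations. How the paper normalizes F0, chooses the number of clusters, or snaps centroids to notes is not stated in the abstract, so those details are left open.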