Emphatic Speech Synthesis and Control Based on Characteristic Transferring in End-to-End Speech Synthesis

Abstract

End-to-end text-to-speech (E2E TTS) synthesis has achieved great success. This work investigates emphatic speech synthesis and control mechanisms in the E2E framework and proposes an E2E-based method for transferring emphasis characteristics between speakers. Characteristic differences between emphatic and neutral speech are learned from a small-scale corpus of parallel neutral and emphatic utterances recorded by one speaker, and are then transferred to another speaker so that emphatic speech can be generated in the latter speaker's voice. An emphasis embedding is injected into the encoder of the extended E2E TTS model to capture these differences, while the decoder and attention module decode them into synthetic neutral or emphatic speech. Speaker codes linked to the decoder and attention module give the E2E model the ability to transfer characteristics between speakers. To control emphatic strength, an encoder memory manipulation mechanism is proposed. Experimental results indicate the effectiveness of the proposed model.
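The abstract does not say how the encoder memory is manipulated to control emphatic strength. As a rough illustration only, the sketch below assumes one plausible reading: the encoder outputs for the neutral and emphatic renderings of the same text are blended linearly, with a scalar setting how far to move from neutral toward emphatic. The function name `scale_emphasis`, the `strength` parameter, and the list-of-frames memory layout are all hypothetical and are not taken from the paper.

```python
from typing import List

Frame = List[float]    # one encoder output vector (hypothetical layout)
Memory = List[Frame]   # encoder memory: one frame per input token


def scale_emphasis(neutral: Memory, emphatic: Memory, strength: float) -> Memory:
    """Blend two encoder memories frame by frame.

    Assumed scheme, not the paper's confirmed method:
    strength = 0.0 returns the neutral memory, 1.0 returns the emphatic
    memory, and values above 1.0 extrapolate past the emphatic memory.
    """
    if len(neutral) != len(emphatic):
        raise ValueError("memories must have the same number of frames")
    blended: Memory = []
    for n_frame, e_frame in zip(neutral, emphatic):
        if len(n_frame) != len(e_frame):
            raise ValueError("frames must have the same dimension")
        # Move each coordinate from neutral toward emphatic by `strength`.
        blended.append([n + strength * (e - n) for n, e in zip(n_frame, e_frame)])
    return blended
```

Under this assumption, the decoder would attend over the blended memory instead of the original one, so a single scalar could be swept at synthesis time without retraining the model.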

Bibliographic record

Source
2018 First Asian Conference on Affective Computing and Intelligent Interaction | 2018 | pp. 1-6 | 6 pages
Conference location: Beijing (CN)
Authors
Mu Wang; Zhiyong Wu; Xixin Wu; Helen Meng; Shiyin Kang; Jia Jia; Lianhong Cai;
Author affiliations

Tsinghua University, Shenzhen, China;

Tsinghua University, The Chinese University of Hong Kong, Shenzhen, China;

The Chinese University of Hong Kong, Hong Kong, China;

The Chinese University of Hong Kong, Tsinghua University, Hong Kong, China;

Tencent AI Lab, Shenzhen, China;

Tsinghua University, Beijing, China;

Tsinghua University, Beijing, China;

Conference organizer
Original format: PDF
Language: eng
Chinese Library Classification (CLC)
Keywords
Speech synthesis; Decoding; Hidden Markov models; Training; Spectrogram; Speech coding; Acoustics;

Similar literature

1. Generating emphatic speech with hidden Markov model for expressive speech synthesis [J]. Wu Zhiyong, Ning Yishuang, Zang Xiao. Multimedia Tools and Applications, 2015, No. 22
2. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis [J]. Yuxuan Wang, Daisy Stanton, Yu Zhang. JMLR: Workshop and Conference Proceedings, 2018, No. 1
3. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron [J]. RJ Skerry-Ryan, Eric Battenberg, Ying Xiao. JMLR: Workshop and Conference Proceedings, 2018, No. 2010
4. Emphatic Speech Synthesis and Control Based on Characteristic Transferring in End-to-End Speech Synthesis [C]. Mu Wang, Zhiyong Wu, Xixin Wu. Asian Conference on Affective Computing and Intelligent Interaction, 2018
5. Inverse solution of speech production based on perturbation theory and its application to articulatory speech synthesis [D]. Yu, Zhenli. 1998
6. Surface Electromyographic Control of a Novel Phonemic Interface for Speech Synthesis [O]. Meredith J. Cler, Alfonso Nieto-Castañón, Frank H. Guenther
7. Controllable Emotion Transfer For End-to-End Speech Synthesis [O]. Tao Li, Shan Yang, Liumeng Xue. 2021
