電子情報通信学会技術研究報告. 音声. Speech (IEICE Technical Report, Speech)

Multi-speaker speech synthesis and speaker adaptation based on deep bidirectional long short-term memory recurrent neural network



Abstract

In this paper, a multi-speaker synthesis model based on a deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN) is proposed to improve synthesis quality for a target speaker whose corpus is limited. The model consists of a speaker-independent network (SIN) and a speaker-dependent network (SDN): the SIN is trained jointly on data from multiple speakers, and an SDN is designed for each target speaker. In particular, a gender code together with either a speaker code or an i-vector is supplied as augmented input to help the SIN better distinguish among target speakers. Experimental results show that, with a fairly small database for each speaker, the proposed model improves synthesis performance compared with DNN-based multi-speaker TTS and conventional DBLSTM-RNN based TTS. In addition, the multi-speaker model can be used for speaker adaptation, and it is shown experimentally to produce good-quality speech for a new speaker in terms of naturalness and speaker identity.
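The input augmentation and the split into shared and per-speaker networks can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the encoding of the codes, the feature dimensions, or how the SIN and SDN are connected, so the one-hot codes, the sizes, the function names (`one_hot`, `augment_frames`, `synthesize`), and the ordering in which the shared network feeds the speaker-specific one are all assumptions made for illustration. The two networks are replaced by trivial stand-in functions rather than real DBLSTM layers.

```python
from typing import Callable, Dict, List, Sequence

Frames = List[List[float]]


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


def augment_frames(
    linguistic_frames: Sequence[Sequence[float]],
    gender_id: int,
    speaker_id: int,
    num_genders: int = 2,   # assumed size of the gender code
    num_speakers: int = 4,  # assumed number of training speakers
) -> Frames:
    """Append the same gender and speaker codes to every input frame."""
    codes = one_hot(gender_id, num_genders) + one_hot(speaker_id, num_speakers)
    return [list(frame) + codes for frame in linguistic_frames]


def synthesize(
    frames: Frames,
    speaker_id: int,
    shared_net: Callable[[Frames], Frames],
    speaker_nets: Dict[int, Callable[[Frames], List[float]]],
) -> List[float]:
    """Run the shared network, then the network of the chosen speaker.

    Assumption: the shared (SIN-like) part feeds the speaker-specific
    (SDN-like) part; the abstract does not state this ordering.
    """
    hidden = shared_net(frames)
    return speaker_nets[speaker_id](hidden)


# Hypothetical utterance: 3 frames of 5-dimensional linguistic features.
frames = [[0.1 * t] * 5 for t in range(3)]
augmented = augment_frames(frames, gender_id=1, speaker_id=2)

# Stand-ins for trained networks, used only to show the data flow.
shared = lambda fs: [[2.0 * x for x in f] for f in fs]
per_speaker = {2: lambda hs: [sum(h) for h in hs]}

outputs = synthesize(augmented, 2, shared, per_speaker)
print(len(augmented), len(augmented[0]), len(outputs))
```

In this sketch each frame grows from 5 to 11 dimensions (5 linguistic features, 2 for gender, 4 for speaker). Replacing the one-hot speaker code with an i-vector would only change the vector that is concatenated to each frame.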
