Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings



Abstract

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers. We investigate multi-speaker modeling for end-to-end text-to-speech synthesis and study the effects of different types of state-of-the-art neural speaker embeddings on speaker similarity for unseen speakers. Learnable dictionary encoding-based speaker embeddings with angular softmax loss can improve equal error rates over x-vectors in a speaker verification task; these embeddings also improve speaker similarity and naturalness for unseen speakers when used for zero-shot adaptation to new speakers in end-to-end speech synthesis.
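
The two measurements the abstract relies on, speaker similarity between embeddings and equal error rate (EER) in speaker verification, can be illustrated with a small sketch. This is not the authors' evaluation code: the function names `cosine_similarity` and `equal_error_rate`, the use of cosine scoring, and the toy scores are all assumptions made for illustration, and the EER is approximated by a simple threshold sweep rather than an interpolated ROC curve.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    target_scores: similarity scores for same-speaker trials.
    nontarget_scores: similarity scores for different-speaker trials.
    Returns the mean of the false acceptance and false rejection rates at the
    threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a same-speaker trial scores below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scores at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical, not taken from the paper).
separable = equal_error_rate([0.9, 0.8], [0.1, 0.2])
overlapping = equal_error_rate([0.9, 0.4], [0.5, 0.1])
```

A lower EER means the embedding separates same-speaker and different-speaker trials more cleanly. The abstract reports that the learnable dictionary encoding-based embeddings improve EER relative to x-vectors in this sense.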
