Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings

Abstract

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers. We investigate multi-speaker modeling for end-to-end text-to-speech synthesis and study the effects of different types of state-of-the-art neural speaker embeddings on speaker similarity for unseen speakers. Learnable dictionary encoding-based speaker embeddings with angular softmax loss can improve equal error rates over x-vectors in a speaker verification task; these embeddings also improve speaker similarity and naturalness for unseen speakers when used for zero-shot adaptation to new speakers in end-to-end speech synthesis.

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2020 | pp. 6184-6188 | 5 pages
Authors
Erica Cooper; Cheng-I Lai; Yusuke Yasuda; Fuming Fang; Xin Wang; Nanxin Chen; Junichi Yamagishi
Original format: PDF
Keywords
Speech synthesis; speaker adaptation; speaker embeddings; transfer learning; speaker verification

Similar literature

1. State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations [J]. Jesus Villalba, Nanxin Chen, David Snyder. Computer Speech and Language. 2020
2. An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis [J]. Beáta Lőrincz, Adriana Stan, Mircea Giurgiu. Procedia Computer Science. 2021
3. Multi-speaker speech synthesis and speaker adaptation based on deep bidirectional long short-term memory recurrent neural network [J]. Yi ZHAO, Nobuaki MINEMATSU, Daisuke SAITO. 電子情報通信学会技術研究報告. 音声. Speech. 2015, No. 346
4. Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings [C]. Erica Cooper, Cheng-I Lai, Yusuke Yasuda. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020
5. The Online Adjustment of Speaker-Specific Phonetic Beliefs in Multi-Speaker Speech Perception [D]. Lai, Wei. 2021
6. Neural decoding of attentional selection in multi-speaker environments without access to clean sources [O]. James O’Sullivan, Zhuo Chen, Jose Herrero
7. Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora [O]. Hieu-Thi Luong, Xin Wang, Junichi Yamagishi. 2019
8. A Limited-Vocabulary, Multi-Speaker Automatic Isolated Word Recognition System [R]. Paul, J. E. 1969
