MASS: Multi-task anthropomorphic speech synthesis framework

Jinyin Chen; Linhui Ye; Zhaoyan Ming

Abstract

Text-to-speech (TTS) synthesis plays an important role in human-computer interaction. Most current TTS technologies focus on the naturalness of speech, that is, on making synthesized speech sound human. However, the key tasks of expressing emotion and speaker identity are ignored, which limits the application scenarios of TTS synthesis. To make synthesized speech more realistic and to broaden its application scenarios, we propose a multi-task anthropomorphic speech synthesis framework (MASS), which synthesizes speech from text with a specified emotion and speaker identity. The MASS framework consists of a base TTS module and two novel voice conversion modules: an emotional voice conversion module and a speaker voice conversion module. We propose a deep emotion voice conversion model (DEVC) and a deep speaker voice conversion model (DSVC), both based on convolution residual networks, which address the problem of feature loss during voice conversion. Training does not require parallel datasets, and the models are capable of many-to-many voice conversion. In emotional voice conversion, speaker voice conversion, and multi-task speech synthesis experiments, the results show that DEVC and DSVC convert speech effectively. Quantitative and qualitative evaluations of the multi-task speech synthesis experiments show that MASS can effectively synthesize speech with the specified text, emotion, and speaker identity.
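
The abstract attributes the reduction of feature loss to convolution residual networks but gives no architectural details. As a minimal sketch of the general idea only, and not the DEVC or DSVC architecture, the block below shows a 1-D convolutional residual block in plain Python: the input is added back to the output of the convolutional path, so the input features still reach the output even when that path contributes nothing. The function names, kernel sizes, and toy feature values are all illustrative assumptions.

```python
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(x))
    ]


def relu(x: Sequence[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(
    x: Sequence[float],
    kernel1: Sequence[float],
    kernel2: Sequence[float],
) -> List[float]:
    """Compute y = x + conv(relu(conv(x))).

    The skip connection (the leading `x +`) carries the input features
    to the output regardless of what the convolutional path learns.
    """
    h = relu(conv1d_same(x, kernel1))
    h = conv1d_same(h, kernel2)
    return [xi + hi for xi, hi in zip(x, h)]


if __name__ == "__main__":
    # Hypothetical toy feature vector standing in for one speech feature track.
    frame = [0.5, -1.0, 2.0, 0.0]
    zero_kernel = [0.0, 0.0, 0.0]
    print(residual_block(frame, zero_kernel, zero_kernel))
```

With all-zero kernels the block reduces to the identity mapping, which is the property that lets stacked residual layers pass input features through unchanged.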

Bibliographic record

Source
Computer speech and language | 2021, Issue 11 | 101243.1-101243.19 | 19 pages
Authors
Jinyin Chen; Linhui Ye; Zhaoyan Ming
Author affiliations

Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China; College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

Institute of Computing Innovation, Zhejiang University, Hangzhou 310027, China

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Text-to-speech; Emotional voice conversion; Speaker voice conversion; Convolution residual network;

Similar literature

1. Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning [J]. Zhengqi Wen, Kehuang Li, Zhen Huang, Journal of signal processing systems for signal, image, and video technology. 2018, Issue 7
2. PFHTS-IDSS: A Hybrid HTS-based Framework for Indonesian Speech Synthesis via Phoneme and Full-context Lab [J]. Lei Zhenfeng, Zhai Junjun, Chen Juntao, International Journal of Pattern Recognition and Artificial Intelligence. 2021, Issue 4
3. Speech Enhancement Based on Analysis-Synthesis Framework with Improved Parameter Domain Enhancement [J]. Liu Bin, Tao Jianhua, Wen Zhengqi, Journal of signal processing systems for signal, image, and video technology. 2016, Issue 2
4. Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework [C]. Shan Yang, Lei Xie, Xiao Chen, 2017 IEEE Automatic Speech Recognition and Understanding Workshop. 2017
5. A Framework for Mass-Market Inductive Program Synthesis [D]. Polozov, Oleksandr. 2017
6. Automation Inner Speech as an Anthropomorphic Feature Affecting Human Trust: Current Issues and Future Directions [O]. Alessandro Geraci, Antonella D'Amico, Arianna Pipitone, 2021
7. MASS: Multi-task anthropomorphic speech synthesis framework [O]. Jinyin Chen, Linhui Ye, Zhaoyan Ming, 2021
