Published in: IEEE Joint International Information Technology and Artificial Intelligence Conference

Voice Conversion from Tibetan Amdo Dialect to Tibetan U-tsang Dialect Based on Generative Adversarial Networks



Abstract

This paper proposes a Voice Conversion (VC) method from the Tibetan Amdo dialect to the Tibetan U-tsang dialect based on Generative Adversarial Networks (GANs). An inherent problem of the traditional VC framework is that the acoustic feature vectors output by the conversion model are over-smoothed, which degrades the quality of the converted speech. The cause lies in the training phase of the acoustic model: a specific probability model is used to describe the data distribution, so outputs close to the average of the data are treated as optimal. Such over-smoothing of the acoustic parameters arises whenever the analytical form of the model distribution is designed by hand. To overcome this problem, the VC framework proposed in this paper uses GANs as the modeling network of the acoustic model: a generator learns the data distribution directly, while a discriminator guides the generator's training so that the distribution of generated samples approaches that of the target speaker's data, thereby alleviating the over-smoothing of the converted speech spectrum. Experimental results show that the proposed method outperforms VC based on Deep Neural Networks (DNNs) in both the sound quality and the speaker similarity of the converted speech.
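The over-smoothing effect described above can be illustrated with a minimal toy sketch (hypothetical data, not from the paper): when the target speaker's acoustic feature for a frame has several plausible realizations, a model trained with a mean-squared-error criterion collapses to their average, a value the target speaker never actually produces, while even a crude discriminator easily tells this collapsed output apart from real frames, which is why an adversarial loss pushes the generator back toward the real distribution.

```python
import random
import statistics

# Toy illustration (hypothetical data): for one source frame, the target
# speaker's feature has two plausible realizations -- a bimodal distribution.
random.seed(0)
targets = [random.gauss(-2.0, 0.3) if random.random() < 0.5 else random.gauss(2.0, 0.3)
           for _ in range(1000)]

# An MSE-trained conversion model can do no better than predicting the mean:
# a value between the two modes, i.e. the "over-smoothed" output the
# abstract describes.
mse_output = statistics.fmean(targets)
print(f"target variance : {statistics.pvariance(targets):.2f}")  # far from zero
print(f"MSE-optimal out : {mse_output:.2f}")                     # between the modes

# A discriminator exposes the collapse: even the crude test "is the sample
# far from the collapsed output?" separates real target frames from the
# converted one, so an adversarial loss penalizes this over-smoothed solution.
def looks_real(y, fake=mse_output, margin=1.0):
    return abs(y - fake) > margin

real_acc = sum(looks_real(y) for y in targets) / len(targets)
fake_detected = not looks_real(mse_output)
print(f"discriminator accuracy on real frames: {real_acc:.2f}")
print(f"collapsed output flagged as fake: {fake_detected}")
```

In the paper's framework the discriminator plays exactly this role during training: rather than fitting a hand-designed distribution, the generator is updated until its samples are no longer separable from the target speaker's.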
