Published in: IEEE Joint International Information Technology and Artificial Intelligence Conference

Voice Conversion from Tibetan Amdo Dialect to Tibetan U-tsang Dialect Based on Generative Adversarial Networks



Abstract

This paper proposes a Voice Conversion (VC) method from the Tibetan Amdo dialect to the Tibetan U-tsang dialect based on Generative Adversarial Networks (GANs). An inherent problem of the traditional VC framework is that the acoustic feature vectors output by the conversion model are over-smoothed, which degrades the quality of the converted speech. The cause lies in the training phase of the acoustic model: a specific probability model is used to describe the data distribution, so outputs close to the average of the data are treated as optimal. Such over-smoothing of the acoustic parameters arises whenever the analytical form of the model distribution is designed by hand. To overcome this problem, the VC framework proposed in this paper uses GANs as the modeling network of the acoustic model: a generator learns the data distribution directly, while a discriminator guides the generator's training so that the distribution of generated samples approaches that of the target speaker's data, thereby alleviating the over-smoothing of the converted speech spectrum. Experimental results show that the proposed method outperforms VC based on Deep Neural Networks (DNNs) in both the sound quality and the speaker similarity of the converted speech.
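The over-smoothing effect described above can be illustrated with a minimal toy sketch (hypothetical data, not from the paper): when the target speaker's acoustic feature for a frame has several plausible realizations, a model trained with a mean-squared-error criterion collapses to their average, a value the target speaker never actually produces, while even a crude discriminator easily tells this collapsed output apart from real frames, which is why an adversarial loss pushes the generator back toward the real distribution.

```python
import random
import statistics

# Toy illustration (hypothetical data): for one source frame, the target
# speaker's feature has two plausible realizations -- a bimodal distribution.
random.seed(0)
targets = [random.gauss(-2.0, 0.3) if random.random() < 0.5 else random.gauss(2.0, 0.3)
           for _ in range(1000)]

# An MSE-trained conversion model can do no better than predicting the mean:
# a value between the two modes, i.e. the "over-smoothed" output the
# abstract describes.
mse_output = statistics.fmean(targets)
print(f"target variance : {statistics.pvariance(targets):.2f}")  # far from zero
print(f"MSE-optimal out : {mse_output:.2f}")                     # between the modes

# A discriminator exposes the collapse: even the crude test "is the sample
# far from the collapsed output?" separates real target frames from the
# converted one, so an adversarial loss penalizes this over-smoothed solution.
def looks_real(y, fake=mse_output, margin=1.0):
    return abs(y - fake) > margin

real_acc = sum(looks_real(y) for y in targets) / len(targets)
fake_detected = not looks_real(mse_output)
print(f"discriminator accuracy on real frames: {real_acc:.2f}")
print(f"collapsed output flagged as fake: {fake_detected}")
```

In the paper's framework the discriminator plays exactly this role during training: rather than fitting a hand-designed distribution, the generator is updated until its samples are no longer separable from the target speaker's.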
