A continuous vocoder for statistical parametric speech synthesis and its evaluation using an audio-visual phonetically annotated Arabic corpus

Abstract

In this paper, we extend a novel continuous residual-based vocoder for statistical parametric speech synthesis, pursuing two objectives. First, because modern vocoders (e.g. STRAIGHT) often model the noise component inaccurately, we propose a new technique for modelling unvoiced sounds: a time-domain envelope is added to the unvoiced segments to avoid any residual buzziness. Four time-domain envelopes (Amplitude, Hilbert, Triangular and True) are investigated, enhanced, and then applied to the noise component of the excitation in our continuous vocoder, i.e. a vocoder in which all parameters are continuous. Second, with the future aim of producing high-quality Arabic speech synthesis, we apply this vocoder to a Modern Standard Arabic audio-visual corpus that is annotated both phonetically and visually and is dedicated to emotional speech processing studies. In an objective experiment we investigated the Phase Distortion Deviation, and in a MUSHRA-type subjective listening test we compared natural and vocoded speech samples. Both experiments show that the proposed noise modelling gives satisfactory results in terms of naturalness and intelligibility, while outperforming STRAIGHT and other earlier residual-based approaches.
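
The idea of shaping the noise excitation with a time-domain envelope can be illustrated with a minimal sketch. It assumes that an amplitude envelope is obtained by smoothing the rectified signal with a moving average, and that the envelope is then multiplied, sample by sample, with white noise. The function names, the window length and the synthetic frame are hypothetical choices made for illustration; they are not taken from the paper, and the paper's enhanced envelopes are not reproduced here.

```python
import random


def amplitude_envelope(frame, window=5):
    """Moving average of the rectified frame (an assumed, simple amplitude envelope)."""
    half = window // 2
    rectified = [abs(s) for s in frame]
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


def shape_noise(envelope, seed=0):
    """Multiply uniform white noise in [-1, 1] by the envelope, sample by sample."""
    rng = random.Random(seed)
    return [e * rng.uniform(-1.0, 1.0) for e in envelope]


# Hypothetical unvoiced frame: an alternating burst that decays over 32 samples.
frame = [((-1) ** n) * 0.9 ** n for n in range(32)]
env = amplitude_envelope(frame)
noise = shape_noise(env)
```

In this sketch the shaped noise follows the energy contour of the reference frame instead of having a flat amplitude, which is the intuition behind using an envelope to reduce buzziness in unvoiced segments.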

Publication details

  • Source
    Computer Speech and Language | 2020, Issue 3 | 101025.1-101025.15 | 15 pages
  • Author affiliations

    Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary;

    Phonetics and Linguistics Department, Alexandria University, Egypt;

    Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary, and MTA-ELTE Lendület Lingual Articulation Research Group, Budapest, Hungary;

    Faculty of Computers and Information, Cairo University, Egypt;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Speech synthesis; Continuous vocoder; Envelope; Arabic;
