A continuous vocoder for statistical parametric speech synthesis and its evaluation using an audio-visual phonetically annotated Arabic corpus

Mohammed Salah Al-Radhi; Omnia Abdo; Tamas Gabor Csapo; Sherif Abdou; Geza Nemeth; Mervat Fashal

Abstract

In this paper, we present an extension of a novel continuous residual-based vocoder for statistical parametric speech synthesis, addressing two objectives. First, because the noise component is often not accurately modelled in modern vocoders (e.g. STRAIGHT), we propose a new technique for modelling unvoiced sounds: a time-domain envelope is added to the unvoiced segments to avoid residual buzziness. Four time-domain envelopes (Amplitude, Hilbert, Triangular and True) are investigated, enhanced, and then applied to the noise component of the excitation in our continuous vocoder, i.e. a vocoder in which all parameters are continuous. Second, with the future aim of producing high-quality Arabic speech synthesis, we apply this vocoder to a Modern Standard Arabic audio-visual corpus that is annotated both phonetically and visually and is dedicated to emotional speech processing studies. In an objective experiment, we investigated the Phase Distortion Deviation, and a MUSHRA-type subjective listening test was conducted to compare natural and vocoded speech samples. Both experiments show that the proposed noise modelling gives satisfactory results in terms of naturalness and intelligibility, while outperforming STRAIGHT and other earlier residual-based approaches.
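As a rough illustration of what shaping a noise component with a time-domain envelope can look like, the sketch below computes a simple amplitude envelope and uses it to modulate white noise. It is not the authors' implementation: the moving-average definition of the envelope, the function names `amplitude_envelope` and `shape_noise`, the `half_width` parameter, and the toy input frame are all assumptions made for this example.

```python
import math
import random


def amplitude_envelope(frame, half_width=4):
    """Moving average of |frame| over up to 2*half_width + 1 samples.

    Assumed definition for illustration; the window is truncated at
    the frame edges rather than zero-padded.
    """
    n = len(frame)
    env = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = frame[lo:hi]
        env.append(sum(abs(s) for s in window) / len(window))
    return env


def shape_noise(envelope, seed=0):
    """Modulate zero-mean white Gaussian noise with a time-domain envelope."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [e * rng.gauss(0.0, 1.0) for e in envelope]


# Toy frame (hypothetical data): a decaying 200 Hz sinusoid at 16 kHz.
fs = 16000
frame = [
    math.exp(-i / 100.0) * math.sin(2 * math.pi * 200 * i / fs)
    for i in range(400)
]

env = amplitude_envelope(frame)
noise = shape_noise(env)
```

Here the shaped noise follows the energy contour of the input frame instead of having a flat amplitude. The Hilbert, Triangular and True envelopes named in the abstract would replace `amplitude_envelope` with a different envelope estimator; they are not sketched here.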

Bibliographic details

Source
Computer Speech and Language | 2020, Issue 3 | 101025.1-101025.15 | 15 pages
Authors
Mohammed Salah Al-Radhi; Omnia Abdo; Tamas Gabor Csapo; Sherif Abdou; Geza Nemeth; Mervat Fashal
Affiliations

Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary;

Phonetics and Linguistics Department, Alexandria University, Egypt;

Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary; MTA-ELTE Lendület Lingual Articulation Research Group, Budapest, Hungary;

Faculty of Computers and Information, Cairo University, Egypt;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Speech synthesis; Continuous vocoder; Envelope; Arabic;

