On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis

Ranniery Maia; Masami Akamine

Abstract

This paper presents a study on the importance of short-term speech parameterizations for expressive statistical parametric speech synthesis. Assuming a source-filter model of speech production, the analysis covers spectral parameters, defined here as features that represent a minimum-phase synthesis filter, and excitation parameters, defined as features used to construct the signal that is fed to the minimum-phase synthesis filter to generate speech. In the first part, different spectral and excitation parameters applicable to statistical parametric synthesis are tested to determine which are the most emotion-dependent. The analysis uses two methods proposed to measure the relative emotion dependency of each feature: one based on K-means clustering, and another based on Gaussian mixture modeling for emotion identification. Two commonly used parameterizations of the short-term spectral envelope are considered: the Mel cepstrum and Mel line spectral pairs. As excitation parameters, the anti-causal cepstrum, the time-smoothed group delay, and band-aperiodicity coefficients are considered. According to the analysis, the line spectral pairs are the most emotion-dependent parameters. Among the excitation features, the band-aperiodicity coefficients show the highest correlation with the speaker's emotion. The parameters found to be most emotion-dependent were then selected to train an expressive statistical parametric synthesizer using a speaker and language factorization framework. Subjective test results indicate that the spectral parameters considered have a bigger impact on the emotion of the synthesized speech than the excitation parameters.
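The abstract names a K-means-based measure of emotion dependency but does not give its formula. The sketch below is one plausible reading under stated assumptions, not the authors' method: frames of a single feature type (for example Mel-cepstral or band-aperiodicity vectors) are clustered into as many clusters as there are emotions, and cluster purity, an alignment score chosen here as an assumption, measures how well the clusters line up with the emotion labels. A feature whose clusters track emotion scores near 1; a feature that ignores emotion scores near chance. The function names `kmeans`, `cluster_purity`, and `emotion_dependency` are hypothetical, and the demo data is synthetic.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per row (frame) of x."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct frames chosen at random.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its frames (keep it if the cluster is empty).
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def cluster_purity(clusters, emotions):
    """Fraction of frames whose emotion matches the majority emotion of their cluster."""
    hits = 0
    for c in np.unique(clusters):
        _, counts = np.unique(emotions[clusters == c], return_counts=True)
        hits += counts.max()
    return hits / len(emotions)


def emotion_dependency(features, emotions, seed=0):
    """Hypothetical score: purity of a K-means partition with K = number of emotions."""
    k = len(np.unique(emotions))
    return cluster_purity(kmeans(features, k, seed=seed), emotions)


if __name__ == "__main__":
    # Synthetic frames (not data from the paper): two emotions, 200 frames each.
    rng = np.random.default_rng(1)
    emotions = np.repeat([0, 1], 200)
    # Feature A shifts with emotion; feature B is pure noise.
    feat_a = rng.normal(size=(400, 4)) + 10.0 * emotions[:, None]
    feat_b = rng.normal(size=(400, 4))
    for name, feat in [("feature A", feat_a), ("feature B", feat_b)]:
        print(name, round(float(emotion_dependency(feat, emotions)), 3))
```

Ranking several feature types by such a score would give a relative ordering of their emotion dependency. Purity is used here only for simplicity; other cluster-label agreement measures, such as normalized mutual information, would fit the same reading.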


Bibliographic details

Source: Computer Speech and Language, 2014, No. 5, pp. 1209-1232 (24 pages)
Authors: Ranniery Maia; Masami Akamine
Affiliations:

Cambridge Research Laboratory, Toshiba Research Europe Limited, 208 Cambridge Science Park, Milton Road, Cambridge CB4 0GZ, UK;

Corporate Research and Development Center, Toshiba Corporation 1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Full-text language: English
Keywords: Speech synthesis; Statistical parametric speech synthesis; Expressive speech synthesis; Speech parameterization

Similar literature

1. Excitation modelling using epoch features for statistical parametric speech synthesis [J]. M. Kiran Reddy, K. Sreenivasa Rao. Computer Speech and Language, 2020.

2. GlotNet: A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis [J]. Lauri Juvela, Bajibabu Bollepalli, Vassilis Tsiaras, et al. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, No. 6.

3. High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network [C]. Lauri Juvela, Bajibabu Bollepalli, Manu Airaksinen, et al. IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.

4. Statistical Parametric Speech Synthesis using Deep Learning Architectures [D]. Shiyin Kang, 2016.

5. Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis [O]. Marvin Coto-Jiménez, 2021.

6. A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis [O]. Shinji Takaki, Junichi Yamagishi, 2016.
