Journal: Computer Speech and Language

On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis

Abstract

This paper presents a study on the importance of short-term speech parameterizations for expressive statistical parametric synthesis. Assuming a source-filter model of speech production, the analysis is conducted over spectral parameters and some excitation parameters. Spectral parameters are here defined as features that represent a minimum-phase synthesis filter. Excitation parameters are features used to construct a signal that is fed to the minimum-phase synthesis filter to generate speech. In the first part, different spectral and excitation parameters that are applicable to statistical parametric synthesis are tested to determine which ones are the most emotion-dependent. The analysis uses two methods proposed to measure the relative emotion dependency of each feature: one based on K-means clustering, and another based on Gaussian mixture modeling for emotion identification. Two commonly used parameterizations of the short-term speech spectral envelope, the Mel cepstrum and the Mel line spectral pairs, are utilized. As excitation parameters, the anti-causal cepstrum, the time-smoothed group delay, and band-aperiodicity coefficients are considered. According to the analysis, the line spectral pairs are the most emotion-dependent parameters. Among the excitation features, the band-aperiodicity coefficients show the highest correlation with the speaker's emotion. The parameters found to be most emotion-dependent in this analysis were selected to train an expressive statistical parametric synthesizer using a speaker and language factorization framework. Subjective test results indicate that the considered spectral parameters have a greater impact on the emotion of the synthesized speech than the excitation parameters.
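The abstract does not spell out how the K-means-based emotion-dependency measure is computed. As a rough illustration only, the Python sketch below assumes one possible reading:

1. Pool the frames of a single feature type (for example Mel-cepstral or line-spectral-pair vectors) across emotions.
2. Cluster the pooled frames with K-means.
3. Score how well the clusters line up with the emotion labels, using chance-corrected cluster purity.

The purity score, the choice of `k`, the function names (`emotion_dependency`, `synthetic_frames`, `demo`) and the synthetic Gaussian data are all hypothetical. They are not the authors' procedure.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Frame = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(frames: List[Frame], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's K-means; returns the cluster index assigned to each frame."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: every frame joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(f, centroids[j])) for f in frames]
        if new_labels == labels:
            break  # converged: no frame changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member frames.
        for j in range(k):
            members = [f for f, c in zip(frames, labels) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    assert labels is not None, "iters must be at least 1"
    return labels


def emotion_dependency(frames: List[Frame], emotions: List[str], k: int, seed: int = 0) -> float:
    """Hypothetical K-means score: chance-corrected cluster purity.

    0.0 means the clusters say no more about the emotion than always guessing
    the majority emotion; 1.0 means every cluster holds a single emotion.
    Needs at least two distinct emotions (otherwise chance == 1).
    """
    labels = kmeans(frames, k, seed=seed)
    hits = 0
    for j in set(labels):
        counts = Counter(e for e, c in zip(emotions, labels) if c == j)
        hits += counts.most_common(1)[0][1]  # frames of the cluster's dominant emotion
    purity = hits / len(frames)
    chance = Counter(emotions).most_common(1)[0][1] / len(emotions)
    return (purity - chance) / (1.0 - chance)


def synthetic_frames(rng: random.Random, centre: Sequence[float], n: int, spread: float) -> List[Frame]:
    """Draw n Gaussian frames around `centre` (a stand-in for real speech features)."""
    return [[rng.gauss(c, spread) for c in centre] for _ in range(n)]


def demo(seed: int = 42) -> Tuple[float, float]:
    """Score two hypothetical features on synthetic two-emotion data."""
    rng = random.Random(seed)
    emotions = ["neutral"] * 100 + ["angry"] * 100
    # Feature A (hypothetical): the two emotions occupy separate regions.
    feat_a = synthetic_frames(rng, [0.0, 0.0], 100, 0.5) + synthetic_frames(rng, [3.0, 3.0], 100, 0.5)
    # Feature B (hypothetical): both emotions share one distribution.
    feat_b = synthetic_frames(rng, [0.0, 0.0], 200, 0.5)
    return emotion_dependency(feat_a, emotions, k=2), emotion_dependency(feat_b, emotions, k=2)


if __name__ == "__main__":
    score_a, score_b = demo()
    print(f"feature A: {score_a:.2f}  feature B: {score_b:.2f}")
```

On this toy data, feature A scores close to 1 because its frames separate by emotion. Feature B scores close to 0 because its distribution is shared across emotions. The GMM-based identification measure mentioned in the abstract would instead train one mixture model per emotion and use identification accuracy on held-out data; it is not sketched here.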
