
Robust processing techniques for voice conversion


Abstract

Differences in speaker characteristics, recording conditions, and signal processing algorithms affect output quality in voice conversion systems. This study focuses on formulating robust techniques for a codebook mapping based voice conversion algorithm. Three different methods are used to improve voice conversion performance: confidence measures, pre-emphasis, and spectral equalization. Analysis is performed for each method and the implementation details are discussed. The first method employs confidence measures in the training stage to eliminate problematic pairs of source and target speech units that might result from possible misalignments, speaking style differences or pronunciation variations. Four confidence measures are developed based on the spectral distance, fundamental frequency (f0) distance, energy distance, and duration distance between the source and target speech units. The second method focuses on the importance of pre-emphasis in line-spectral frequency (LSF) based vocal tract modeling and transformation. The last method, spectral equalization, is aimed at reducing the differences in the source and target long-term spectra when the source and target recording conditions are significantly different. The voice conversion algorithm that employs the proposed techniques is compared with the baseline voice conversion algorithm with objective tests as well as three subjective listening tests. First, similarity to the target voice is evaluated in a subjective listening test and it is shown that the proposed algorithm improves similarity to the target voice by 23.0%. An ABX test is performed and the proposed algorithm is preferred over the baseline algorithm by 76.4%. In the third test, the two algorithms are compared in terms of the subjective quality of the voice conversion output. The proposed algorithm improves the subjective output quality by 46.8% in terms of mean opinion score (MOS).
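Two of the ideas named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption for illustration: `pre_emphasize` uses the standard first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1] with a conventional default of 0.97, and `filter_pairs` applies simple per-measure thresholds to the four distances (spectral, f0, energy, duration). The `UnitPair` container, the function names, and the thresholding rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


def pre_emphasize(samples, alpha=0.97):
    """Standard first-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a common textbook default, not a value from the paper.
    """
    if not samples:
        return []
    out = [samples[0]]  # first sample has no predecessor, so it passes through
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


@dataclass
class UnitPair:
    """Hypothetical container for one aligned source/target speech unit pair."""
    spectral_dist: float
    f0_dist: float
    energy_dist: float
    duration_dist: float


def filter_pairs(pairs, max_spectral, max_f0, max_energy, max_duration):
    """Keep only pairs whose four distances all fall within the given limits.

    Plain thresholding is an assumed stand-in for the paper's confidence
    measures, whose exact definitions are not given in the abstract.
    """
    return [
        p for p in pairs
        if p.spectral_dist <= max_spectral
        and p.f0_dist <= max_f0
        and p.energy_dist <= max_energy
        and p.duration_dist <= max_duration
    ]
```

In a training pipeline of this shape, the filtering step would run after alignment and before codebook estimation, so that misaligned or mismatched unit pairs never contribute to the mapping.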

Bibliographic record

  • Source
    Computer speech and language | 2006, Issue 4 | pp. 441-467 | 27 pages
  • Authors

    Oytun Turk; Levent M. Arslan;

  • Author affiliations

    Electrical and Electronics Engineering Department, Bogazici University, Bebek, Istanbul, Turkey;

    Electrical and Electronics Engineering Department, Bogazici University, Bebek, Istanbul, Turkey;

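  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Computing technology, computer technology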