Robust processing techniques for voice conversion

Oytun Turk; Levent M. Arslan

Abstract

Differences in speaker characteristics, recording conditions, and signal processing algorithms affect output quality in voice conversion systems. This study focuses on formulating robust techniques for a codebook-mapping-based voice conversion algorithm. Three methods are used to improve voice conversion performance: confidence measures, pre-emphasis, and spectral equalization. Each method is analyzed and its implementation details are discussed. The first method employs confidence measures in the training stage to eliminate problematic pairs of source and target speech units that might result from misalignments, speaking style differences, or pronunciation variations. Four confidence measures are developed, based on the spectral distance, fundamental frequency (f0) distance, energy distance, and duration distance between the source and target speech units. The second method focuses on the importance of pre-emphasis in line-spectral frequency (LSF) based vocal tract modeling and transformation. The last method, spectral equalization, aims to reduce the differences between the source and target long-term spectra when the source and target recording conditions are significantly different. The voice conversion algorithm that employs the proposed techniques is compared with the baseline voice conversion algorithm using objective tests and three subjective listening tests. In the first test, similarity to the target voice is evaluated, and the proposed algorithm is shown to improve similarity to the target voice by 23.0%. In the second, an ABX test, the proposed algorithm is preferred over the baseline algorithm by 76.4%. In the third test, the two algorithms are compared in terms of the subjective quality of the voice conversion output; the proposed algorithm improves the subjective output quality by 46.8% in terms of mean opinion score (MOS).
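
As a rough illustration of the pre-emphasis step named in the abstract, the sketch below applies the common first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1] and its inverse. The abstract does not state which filter or coefficient the authors use, so the filter form, the default alpha = 0.97, and the function names `pre_emphasize` and `de_emphasize` are assumptions made for illustration only.

```python
def pre_emphasize(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a commonly used value and an assumption here;
    the abstract does not specify a coefficient. x[-1] is taken as 0.
    """
    if not samples:
        return []
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


def de_emphasize(samples, alpha=0.97):
    """Inverse filter: x[n] = y[n] + alpha * x[n-1], with x[-1] = 0."""
    out = []
    prev = 0.0
    for y in samples:
        prev = y + alpha * prev
        out.append(prev)
    return out
```

Pre-emphasis boosts high frequencies relative to low ones before spectral analysis; running `de_emphasize` on the processed signal with the same `alpha` restores the original spectral tilt.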

Bibliographic details

Source
Computer Speech and Language, 2006, No. 4, pp. 441-467 (27 pages)
Authors
Oytun Turk; Levent M. Arslan
Affiliation
Electrical and Electronics Engineering Department, Bogazici University, Bebek, Istanbul, Turkey (both authors)
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. Evaluation of Expressive Speech Synthesis With Voice Conversion and Copy Resynthesis Techniques [J]. Turk O., Schroder M. Audio, Speech, and Language Processing, IEEE Transactions on. 2010, No. 5
2. Statistical sequence-to-frame mapping techniques for voice conversion [J]. Yu QIAO, Daisuke SAITO, Nobuaki MINEMATSU. 電子情報通信学会技術研究報告. 2010, No. 373
3. Statistical sequence-to-frame mapping techniques for voice conversion [J]. Yu QIAO, Daisuke SAITO, Nobuaki MINEMATSU. 電子情報通信学会技術研究報告. 2010, No. 374
4. MODEL-MAPPING BASED VOICE CONVERSION SYSTEM: A Novel Approach to Improve Voice Similarity and Naturalness using Model-based Speech Synthesis Techniques [C]. Baojie Li, Dalei Wu, Hui Jiang. International Conference on Bio-inspired Systems and Signal Processing. 2010
5. Robust voice mining techniques for telephone conversations. [D]. Manocha, Sandeep. 2006
6. Processing of Voiced and Unvoiced Acoustic Stimuli in Musicians [O]. Cyrill Guy Martin Ott, Nicolas Langer, Mathias S. Oechslin. 2011
7. Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement [O]. Tomoki Toda, Mikihiro Nakagiri, Kiyohiro Shikano. 2013
8. Information Processing Techniques Program. Volume II. Wideband Integrated Voice/Data Technology. [R]. Gold, B. 1978
