Statistical Text-to-Speech Synthesis Based on Segment-Wise Representation With a Norm Constraint

Tiomkin S.; Malah D.; Shechtman S.

Abstract

In statistical HMM-based text-to-speech systems (STTS), speech feature dynamics are modeled by first- and second-order feature frame differences, which typically do not satisfactorily represent the frame-to-frame feature dynamics present in natural speech. The reduced dynamics result in over-smoothing of speech features, which often makes synthesized speech sound muffled. In this correspondence, we propose a method to enhance a baseline STTS system by introducing a segment-wise model representation with a norm constraint. The segment-wise representation provides additional degrees of freedom in speech feature determination. We exploit these degrees of freedom to increase the speech feature vector norm so that it matches a norm constraint. As a result, statistically generated speech features are less over-smoothed, resulting in more natural-sounding speech, as judged by listening tests.
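The over-smoothing effect described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the delta windows, the moving-average smoother standing in for statistical parameter generation, and the plain rescaling in `rescale_to_norm` are all assumptions chosen for illustration, and every function name is hypothetical. It only shows that a smoothed feature track has a smaller norm (and weaker frame differences) than the original, and that a segment can be rescaled to meet a target norm.

```python
import math

def norm(values):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))

def deltas(track):
    """First-order frame differences, assumed window (-0.5, 0, 0.5)."""
    p = [track[0]] + list(track) + [track[-1]]  # repeat edge frames
    return [0.5 * (p[i + 2] - p[i]) for i in range(len(track))]

def delta_deltas(track):
    """Second-order frame differences, assumed window (1, -2, 1)."""
    p = [track[0]] + list(track) + [track[-1]]
    return [p[i] - 2.0 * p[i + 1] + p[i + 2] for i in range(len(track))]

def smooth(track):
    """3-frame moving average, a stand-in for an over-smoothed trajectory."""
    p = [track[0]] + list(track) + [track[-1]]
    return [(p[i] + p[i + 1] + p[i + 2]) / 3.0 for i in range(len(track))]

def rescale_to_norm(segment, target_norm):
    """Scale a segment so its norm equals target_norm (assumed simple rule)."""
    current = norm(segment)
    if current == 0.0:
        return list(segment)  # an all-zero segment cannot be rescaled
    return [v * target_norm / current for v in segment]

# Toy one-dimensional feature track for a single segment (made-up values).
natural = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
smoothed = smooth(natural)
enhanced = rescale_to_norm(smoothed, norm(natural))

print("norm natural :", norm(natural))
print("norm smoothed:", norm(smoothed))
print("norm enhanced:", norm(enhanced))
print("delta norm natural / smoothed:",
      norm(deltas(natural)), norm(deltas(smoothed)))
print("delta-delta norm natural / smoothed:",
      norm(delta_deltas(natural)), norm(delta_deltas(smoothed)))
```

In the paper the norm constraint is applied within a segment-wise model representation rather than by uniform scaling, so the sketch should be read only as intuition for why a norm target counteracts over-smoothing.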

Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE Transactions on, 2010, No. 5, pp. 1077-1082 (6 pages)
Authors: Tiomkin S.; Malah D.; Shechtman S.
Author affiliation: Department of Electrical Engineering, Technion-I.I.T, Israel Institute of Technology, Haifa, Israel
Original format: PDF
Language: English
Keywords: Segment-wise model representation; speech feature dynamics; statistical TTS; text-to-speech (TTS) synthesis

