Journal: International Journal of Computer Processing of Oriental Languages

An Approach to Proper Speech Segmentation for Quality Improvement in Concatenative Text-To-Speech System for Indian Languages

Abstract

Most Indian-language Text-To-Speech (TTS) synthesis systems designed to date are based on the concatenation of acoustic units. The main challenge is selecting proper units and concatenating them smoothly. Because of the limitations of current automated techniques based on the Hidden Markov Model (HMM) and Dynamic Time Warping (DTW), manual verification and labeling are often essential. This paper proposes automatic placement of phoneme boundaries in a speech waveform using an explicit statistical model of the phoneme boundary. We use the Harmonic plus Noise Model (HNM) in the first step, then refine the boundary placement by searching a region near the estimated boundary for the best match with a predefined boundary model technique such as ESNOLA. This technique is applied for effective concatenation, which results in smooth output. Studies show that HNM is capable of synthesizing all vowels and diphones with good quality, which can remarkably reduce the size of the database. Furthermore, pitch-synchronous analysis is performed and the Glottal Closure Instants (GCIs) are accurately calculated. The quality of the synthesized speech improves if these units are obtained from the glottal signal rather than from processing the signal. A VCV (vowel-consonant-vowel) database has to be developed for all Indian languages, as we have done in our case study for Oriya, one of the official languages of the Republic of India.
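
The refinement step described in the abstract (searching near an initial estimate for the best match with a boundary model) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the features, the boundary model, or the matching score, so the sketch assumes a one-dimensional per-frame feature sequence, a short fixed template standing in for the boundary model, and a sum-of-squared-differences score. The names `refine_boundary`, `features`, `template`, `estimate`, and `radius` are hypothetical.

```python
def refine_boundary(features, template, estimate, radius):
    """Return the frame index near `estimate` where `template` matches best.

    Assumption: the boundary model is a short template centred on the
    boundary, and the match score is the sum of squared differences
    (lower is better). If no candidate fits, `estimate` is returned.
    """
    half = len(template) // 2
    # Keep the template window fully inside the feature sequence.
    lo = max(half, estimate - radius)
    hi = min(len(features) - (len(template) - half), estimate + radius)

    best_index, best_score = estimate, float("inf")
    for candidate in range(lo, hi + 1):
        start = candidate - half
        window = features[start:start + len(template)]
        score = sum((w - t) ** 2 for w, t in zip(window, template))
        if score < best_score:
            best_index, best_score = candidate, score
    return best_index


# Toy data: a low-energy segment followed by a high-energy segment,
# with the true boundary at frame 10.
features = [0.1] * 10 + [0.9] * 10
template = [0.1, 0.1, 0.9, 0.9]  # hypothetical low-to-high boundary model

# The first-pass estimate (frame 8) is off by two frames.
refined = refine_boundary(features, template, estimate=8, radius=4)
print(refined)
```

Restricting the search to a small radius around the first-pass estimate keeps the refinement cheap and prevents it from jumping to a similar-looking transition elsewhere in the utterance.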
