An Approach to Proper Speech Segmentation for Quality Improvement in Concatenative Text-To-Speech System for Indian Languages

SANGHAMITRA MOHANTY; SUMAN BHATTACHARYA; SUMIT BOSE; SABYASACHI SWAIN

Abstract

Most Indian-language Text-To-Speech (TTS) synthesis systems designed to date are based on the concatenation of acoustic units. The prime challenge is the selection of proper units and their smooth concatenation. Owing to the limitations of current automated techniques based on the Hidden Markov Model (HMM) and Dynamic Time Warping (DTW), manual verification and labeling are often essential. This paper proposes automatic placement of phoneme boundaries in a speech waveform using an explicit statistical model of the phoneme boundary. We apply the Harmonic plus Noise Model (HNM) in the first step and then refine the boundary placement by searching a region near the estimated boundary for the best match with a predefined boundary model, using a technique such as ESNOLA. This technique is applied for effective concatenation, which results in smooth output. Studies show that HNM is capable of synthesizing all vowels and diphones with good quality, which can remarkably reduce the size of the database. Further, the pitch-synchronous analysis and the Glottal Closure Instants (GCI) are computed accurately. The quality of the synthesized speech improves if these units are obtained from the glottal signal rather than from processing the speech signal. The database has to be developed for vowel-consonant-vowel (VCV) units for all Indian languages, as we have done in our case study for Oriya, one of the official languages of the Republic of India.
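
The boundary-refinement step described in the abstract (search a region near an estimated boundary for the best match with a predefined boundary model) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the statistical boundary model or the ESNOLA procedure, so the sketch assumes a fixed template of sample values and a sum-of-squared-differences cost. The names `refine_boundary`, `template`, and `search_radius` are hypothetical.

```python
def refine_boundary(signal, estimate, template, search_radius):
    """Return the index near `estimate` whose surrounding window best
    matches `template` (assumed cost: sum of squared differences)."""
    half = len(template) // 2
    best_index, best_cost = estimate, float("inf")
    # Restrict candidates to the search region and to positions where
    # the whole template window fits inside the signal.
    lo = max(half, estimate - search_radius)
    hi = min(len(signal) - (len(template) - half), estimate + search_radius)
    for centre in range(lo, hi + 1):
        start = centre - half
        window = signal[start:start + len(template)]
        cost = sum((w - t) ** 2 for w, t in zip(window, template))
        if cost < best_cost:
            best_index, best_cost = centre, cost
    return best_index


# Toy data (assumed): a step from 0.0 to 1.0 at index 10 stands in for
# a phoneme boundary, and the template is an idealised step.
signal = [0.0] * 10 + [1.0] * 10
template = [0.0, 0.0, 1.0, 1.0]
refined = refine_boundary(signal, estimate=7, template=template, search_radius=5)
print(refined)
```

In a real system the comparison would likely be made on acoustic features rather than raw samples, and the rough estimate would come from an earlier stage such as the HNM analysis mentioned above.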

Bibliographic details

Source
International Journal of Computer Processing of Oriental Languages, 2005, No. 1, pp. 41-51 (11 pages)
Authors
SANGHAMITRA MOHANTY; SUMAN BHATTACHARYA; SUMIT BOSE; SABYASACHI SWAIN
Author affiliation
RC-ILTS-Oriya, P.G. Department of Computer Science, Utkal University, Bhubaneswar, Orissa, India-751004
Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Computing technology, computer technology
Keywords
HMM; DTW; HNM; GCI; ESNOLA; phase space
