Modelling the temporal structure of newsreaders' speech on neural networks for Estonian text-to-speech synthesis

Abstract

Generating natural-sounding synthetic speech from text requires perfect control over the temporal structure of the speech flow. The present paper describes an attempt to replace the rule-based duration model hitherto used in Estonian text-to-speech synthesis with neural networks (NN). To this end, the fluent speech of radio announcers and newsreaders was analysed and its temporal structure was modelled with neural networks. Analysis of pauses in a large body of speech material showed that when a text is read at a normal speech rate, the pauses can be classified well enough for the results to be used in speech synthesis. For sound durations, certain phone-context characteristics together with certain syllable-level features proved to be the relevant inputs to the NN. For models of pause duration and position, however, the dominant features were variables characterizing text structure (punctuation marks and conjunctions).
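
The abstract reports that pause positions and durations were best predicted from text-structure variables (punctuation marks and conjunctions). As an illustration only, here is one way such word-boundary features might be encoded as neural-network input. The conjunction list, the punctuation inventory, the one-hot encoding and the names CONJUNCTIONS, PUNCTUATION, BoundaryFeatures and boundary_features are all hypothetical; this is a sketch under those assumptions, not the authors' feature set.

```python
import re
from dataclasses import dataclass

# Illustrative list of common Estonian conjunctions (assumption: the
# abstract does not say which conjunctions the authors used).
CONJUNCTIONS = {"ja", "ning", "ega", "või", "aga", "kuid", "et", "kui", "sest"}

# Hypothetical punctuation inventory, one one-hot slot per mark.
PUNCTUATION = [".", ",", ";", ":", "?", "!"]


@dataclass
class BoundaryFeatures:
    word: str
    punct: str          # punctuation mark right after the word, "" if none
    next_is_conj: bool  # True if the following word is a conjunction

    def to_vector(self) -> list:
        """Encode as one-hot punctuation slots plus a conjunction flag."""
        onehot = [int(self.punct == p) for p in PUNCTUATION]
        return onehot + [int(self.next_is_conj)]


def boundary_features(text: str) -> list:
    """Return one feature record per word boundary in `text`."""
    # Each match is (word, optional punctuation mark directly after it);
    # \w is Unicode-aware, so Estonian letters such as õ, ä, ö, ü match.
    tokens = re.findall(r"(\w+)([.,;:?!]?)", text)
    feats = []
    for i, (word, punct) in enumerate(tokens):
        nxt = tokens[i + 1][0].lower() if i + 1 < len(tokens) else ""
        feats.append(BoundaryFeatures(word, punct, nxt in CONJUNCTIONS))
    return feats


if __name__ == "__main__":
    for f in boundary_features("Ilm on ilus, aga tuuline."):
        print(f.word, f.to_vector())
```

For "Ilm on ilus, aga tuuline." the boundary after "ilus" gets [0, 1, 0, 0, 0, 0, 1]: the comma slot is set, and so is the conjunction flag because "aga" follows. In a full system, vectors like these would presumably be combined with other inputs and passed to a network that predicts whether a pause occurs and how long it is; that pipeline is likewise an assumption.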

Bibliographic record

Source: International Conference on Speech and Computer, 2006, 4 pages
Authors: Mark Fishel; Meelis Mihkla
Original format: PDF
Chinese Library Classification: TN911-53

Similar documents

1. Prosody modeling for syllable based text-to-speech synthesis using feedforward neural networks [J]. V. Ramu Reddy, K. Sreenivasa Rao. Neurocomputing, 2016, issue JANa1.

2. Two-stage intonation modeling using feedforward neural networks for syllable based text-to-speech synthesis [J]. V. Ramu Reddy, K. Sreenivasa Rao. Computer Speech and Language, 2013, no. 5.

3. F0 Contour Modeling for Arabic Text-to-Speech Synthesis Using Fujisaki Parameters and Neural Networks [J]. Fatouma Boukadida, Noureddine Ellouze, Zied Mnasri. Signal Processing: An International Journal, 2011, no. 6.

4. Modelling the temporal structure of newsreaders' speech on neural networks for Estonian text-to-speech synthesis [C]. Mark Fishel, Meelis Mihkla. International Conference on Speech and Computer, 2006.

5. Modeling temporal coordination in speech production using an artificial central pattern generator neural network [D]. Erin Christine Rusaw, 2013.

6. Human EEG and Recurrent Neural Networks Exhibit Common Temporal Dynamics During Speech Recognition [O]. Saeedeh Hashemnia, Lukas Grasse, Shweta Soni, 2021.

7. Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks [O]. Cassia Valentini Botinhao, Xin Wang, Shinji Takaki, 2016.

8. Text-To-Speech Phrasing Enhancement System Using Neural Networks [R]. L. F. Julig, 1995.
