Conference: International Conference on Speech and Computer

Modelling the temporal structure of newsreaders' speech on neural networks for Estonian text-to-speech synthesis



Abstract

Generation of natural-sounding synthetic speech from a text requires perfect control over the temporal structure of speech flow. The present paper describes an attempt to replace the rule-based durational model, hitherto used in Estonian text-to-speech synthesis, by neural networks (NN). For this aim, fluent speech of radio announcers and newsreaders was analysed and its temporal structure was modelled on neural networks. Analysis of pauses in extended material revealed that if a text is read out with a normal speech rate, it is quite possible to classify the pauses made, so that the results can be used in speech synthesis. For sound durations, certain characteristics of phone context as well as certain syllable-level features were found to be the relevant input for an NN algorithm. For models of pause durations and positions, however, the prevalent features were variables characterizing text structure (punctuation marks and conjunctions).
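The abstract names punctuation marks and conjunctions as the prevalent input variables for the pause models. As a rough illustration only, the sketch below shows one way such text-structure variables could be turned into a numeric input vector for a neural network. The conjunction list, the punctuation set, and the function names `boundary_features` and `encode` are assumptions made for this example; the abstract does not specify the paper's actual feature inventory or encoding.

```python
# Illustrative sketch, not the paper's method: the feature set below is assumed.
# A small, hypothetical sample of Estonian conjunctions.
CONJUNCTIONS = {"ja", "ning", "või", "aga", "kuid", "et", "sest"}
# Punctuation marks treated as candidate pause cues (assumed set).
PUNCTUATION = [",", ".", ";", ":", "?", "!"]


def boundary_features(text):
    """Describe every boundary between two consecutive words in `text`."""
    words = text.split()
    strip_chars = "".join(PUNCTUATION)
    rows = []
    for current, following in zip(words, words[1:]):
        # Punctuation mark attached to the end of the current word, if any.
        punct = current[-1] if current[-1] in PUNCTUATION else ""
        next_word = following.strip(strip_chars).lower()
        rows.append({
            "word": current.rstrip(strip_chars),
            "punct": punct,
            "next_is_conjunction": next_word in CONJUNCTIONS,
        })
    return rows


def encode(row):
    """One-hot encode the punctuation mark and append the conjunction flag."""
    one_hot = [1.0 if row["punct"] == p else 0.0 for p in PUNCTUATION]
    return one_hot + [1.0 if row["next_is_conjunction"] else 0.0]


if __name__ == "__main__":
    for row in boundary_features("Tere, maailm ja sina."):
        print(row["word"], encode(row))
```

Each vector describes one word boundary. A pause model of the kind the abstract describes would take such vectors as input and predict whether a pause occurs there and how long it lasts; the network itself is left out because the abstract gives no details of its architecture.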
