EXPRESSIVE TEXT-TO-SPEECH UTILIZING CONTEXTUAL WORD-LEVEL STYLE TOKENS

Abstract

The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate expressive audio for input texts based on a word-level analysis of the input text. For example, the disclosed systems can utilize a multi-channel neural network to generate a character-level feature vector and a word-level feature vector based on a plurality of characters of an input text and a plurality of words of the input text, respectively. In some embodiments, the disclosed systems utilize the neural network to generate the word-level feature vector based on contextual word-level style tokens that correspond to style features associated with the input text. Based on the character-level and word-level feature vectors, the disclosed systems can generate a context-based speech map. The disclosed systems can utilize the context-based speech map to generate expressive audio for the input text.
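The data flow described in the abstract can be illustrated with a toy sketch. The abstract does not specify the network architecture, so everything below is an assumption made for illustration: random lookup tables stand in for the learned character and word channels, a softmax over a small bank of hypothetical style tokens stands in for the contextual word-level style tokens, averaging over neighboring words stands in for context, and concatenation stands in for whatever combination the disclosed systems use to build the context-based speech map. The names `build_speech_map`, `DIM`, and `NUM_STYLE_TOKENS` are hypothetical.

```python
import math
import random

DIM = 4               # hypothetical feature size per channel
NUM_STYLE_TOKENS = 3  # hypothetical size of the style-token bank


def _vec(rng):
    """Random vector standing in for a learned embedding."""
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def _softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_speech_map(text, seed=0):
    """Return (speech_map, style_weights) for a whitespace-tokenized text.

    speech_map has one row per non-space character: the character-level
    vector concatenated with the word-level vector of the enclosing word.
    """
    rng = random.Random(seed)
    words = text.split()
    style_tokens = [_vec(rng) for _ in range(NUM_STYLE_TOKENS)]
    word_table, char_table = {}, {}

    # Word channel, step 1: one stand-in embedding per distinct word.
    for word in words:
        word_table.setdefault(word.lower(), _vec(rng))
    embedded = [word_table[w.lower()] for w in words]

    speech_map, style_weights = [], []
    for i, word in enumerate(words):
        # Assumed notion of context: average over the word and its neighbors.
        window = embedded[max(0, i - 1): i + 2]
        query = [sum(v[d] for v in window) / len(window) for d in range(DIM)]

        # Word channel, step 2: attend over the style tokens.
        weights = _softmax([_dot(query, tok) for tok in style_tokens])
        word_vec = [
            sum(w * tok[d] for w, tok in zip(weights, style_tokens))
            for d in range(DIM)
        ]
        style_weights.append(weights)

        # Character channel: one stand-in embedding per distinct character.
        for ch in word:
            char_vec = char_table.setdefault(ch, _vec(rng))
            speech_map.append(char_vec + word_vec)  # list concatenation

    return speech_map, style_weights
```

Calling `build_speech_map("Hello world")` yields ten rows (one per non-space character), each of length `2 * DIM`, and two sets of style-token weights that each sum to 1. In a real system the speech map would then be passed to a decoder that produces audio; that stage is omitted here.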