EXPRESSIVE TEXT-TO-SPEECH UTILIZING CONTEXTUAL WORD-LEVEL STYLE TOKENS


Abstract

The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate expressive audio for input texts based on a word-level analysis of the input text. For example, the disclosed systems can utilize a multi-channel neural network to generate a character-level feature vector and a word-level feature vector based on a plurality of characters of an input text and a plurality of words of the input text, respectively. In some embodiments, the disclosed systems utilize the neural network to generate the word-level feature vector based on contextual word-level style tokens that correspond to style features associated with the input text. Based on the character-level and word-level feature vectors, the disclosed systems can generate a context-based speech map. The disclosed systems can utilize the context-based speech map to generate expressive audio for the input text.
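The data flow described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the method claimed in the patent: the function names (`embed`, `word_channel`, `context_based_speech_map`), the pseudo-random vectors standing in for learned embeddings, the one-word context window, the softmax attention over a shared bank of style tokens, and the choice to combine the two channels by concatenating each character's vector with the vector of the word that contains it.

```python
import math
import random


def embed(key, dim, salt):
    """Deterministic pseudo-random vector standing in for a learned embedding."""
    rng = random.Random(f"{salt}:{key}")
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_channel(words, style_tokens, dim):
    """Word-level channel: each word attends over a shared bank of style tokens."""
    vectors = []
    for i in range(len(words)):
        # Assumed notion of "context": average the word with its immediate neighbours.
        window = words[max(0, i - 1): i + 2]
        embs = [embed(w.lower(), dim, "word") for w in window]
        query = [sum(col) / len(embs) for col in zip(*embs)]
        # Dot-product attention scores against every style token.
        scores = [sum(q * t for q, t in zip(query, token)) for token in style_tokens]
        weights = softmax(scores)
        # The word-level feature vector is the attention-weighted mix of tokens.
        vectors.append(
            [sum(w * token[d] for w, token in zip(weights, style_tokens)) for d in range(dim)]
        )
    return vectors


def context_based_speech_map(text, num_tokens=4, char_dim=3, word_dim=2):
    """Return one row per character: character features + features of its word."""
    style_tokens = [embed(i, word_dim, "style") for i in range(num_tokens)]
    word_vecs = word_channel(text.split(), style_tokens, word_dim)

    rows = []
    word_idx = 0
    in_word = False
    for ch in text:
        char_vec = embed(ch, char_dim, "char")  # character-level channel
        if ch.isspace():
            if in_word:
                word_idx += 1  # the word that just ended is finished
                in_word = False
            rows.append(char_vec + [0.0] * word_dim)  # whitespace belongs to no word
        else:
            in_word = True
            rows.append(char_vec + word_vecs[word_idx])
    return rows


if __name__ == "__main__":
    speech_map = context_based_speech_map("Hello brave world")
    print(len(speech_map), len(speech_map[0]))
```

In a real system the resulting map would be passed to a decoder and vocoder to produce audio; that stage is omitted here because the abstract does not describe it.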


Bibliographic Data

Publication No.: US2022028367A1

Publication date: 2022-01-27

Original format: PDF
Applicant/Assignee: ADOBE INC.

Application No.: US202016934836
Inventors: SUMIT SHEKHAR; GAUTAM CHOUDHARY; ABHILASHA SANCHETI; SHUBHANSHU AGARWAL; E SANTHOSH KUMAR; RAHUL SAXENA

Filing date: 2020-07-21
Classification: G10L13/047; G10L25/30
Country: US
Date added to database: 2022-08-24 23:33:00
