METHOD FOR GENERATING SPEAKER-ADAPTED SPEECH SYNTHESIZER MODEL WITH A FEW SAMPLES USING A FINE-TUNING BASED ON DEEP CONVOLUTIONAL NEURAL NETWORK AI
Abstract
The present invention relates to artificial intelligence for speech synthesis and, in particular, to a method for generating a speaker-adapted speech synthesis model from a small number of samples through fine-tuning based on deep convolutional neural network artificial intelligence. The method comprises: converting input text into numbers representing the text information using a text encoder (character embedding); converting a target speaker's voice file into a speaker embedding using a speaker encoder; converting the text embedding and the speaker embedding into a context vector through personalized attention, using linguistic knowledge such as phonemes and phonological knowledge; transforming the context vector into a predicted mel-spectrogram using an audio decoder; and generating a waveform voice file from the predicted mel-spectrogram and the sampling rate (SR) using a vocoder. With the technique provided by the present invention for generating a speaker-adapted speech synthesis model from a small number of samples, the amount of data required to build such a model is greatly reduced, from about 5 hours to about 10 minutes of speech. This saves the time and cost of creating a speech synthesis system.
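The five steps of the method can be illustrated with a toy, dependency-free Python sketch. It is not the patented implementation. The abstract does not specify layer types, sizes, or the form of the personalized attention, so several substitutions are made here. Every dimension, weight, and function name below (`text_encoder`, `speaker_encoder`, `attend`, `decoder`, `vocoder`) is hypothetical. The weights are random stand-ins for trained parameters. Plain dot-product attention stands in for the personalized attention. The vocoder is a simple additive sine synthesizer, not a neural vocoder. The sketch only shows how data flows from characters to a waveform.

```python
import math
import random

# Hypothetical toy sizes, chosen only so the data flow is easy to follow.
EMB_DIM = 8     # character and speaker embedding size
N_MELS = 4      # mel bins per frame (real systems typically use many more)
HOP = 32        # waveform samples generated per mel frame
SR = 16000      # sampling rate passed to the vocoder

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# 1. Text encoder: characters -> ids -> embedding vectors.
VOCAB = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
CHAR_TABLE = rand_matrix(len(VOCAB), EMB_DIM)

def text_encoder(text):
    return [CHAR_TABLE[VOCAB[ch]] for ch in text.lower() if ch in VOCAB]

# 2. Speaker encoder: average per-frame features of the target voice into a
#    single fixed-size speaker embedding (a d-vector-style stand-in).
def speaker_encoder(voice_frames):
    n = len(voice_frames)
    return [sum(f[d] for f in voice_frames) / n for d in range(EMB_DIM)]

# 3. Attention (plain dot-product, assumed): scores -> softmax -> context vector.
W_QUERY = rand_matrix(EMB_DIM, EMB_DIM)

def attend(state, text_emb):
    query = matvec(W_QUERY, state)
    scores = [sum(q * k for q, k in zip(query, key)) for key in text_emb]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * key[d] for w, key in zip(weights, text_emb)) for d in range(EMB_DIM)]

# 4. Audio decoder: one context vector per step, concatenated with the speaker
#    embedding and projected to a non-negative mel frame via softplus.
W_OUT = rand_matrix(N_MELS, 2 * EMB_DIM)

def decoder(text_emb, spk_emb, n_frames):
    state, mel = spk_emb, []
    for _ in range(n_frames):
        context = attend(state, text_emb)
        logits = matvec(W_OUT, context + spk_emb)  # list concatenation
        mel.append([math.log1p(math.exp(x)) for x in logits])
        state = context
    return mel

# 5. Vocoder: a toy additive synthesizer (NOT a neural vocoder) that renders
#    each mel frame as HOP samples of fixed-frequency sines at rate sr.
BIN_FREQS = [200.0 * (k + 1) for k in range(N_MELS)]

def vocoder(mel, sr=SR):
    wave = []
    for t, frame in enumerate(mel):
        for i in range(HOP):
            n = t * HOP + i
            wave.append(sum(a * math.sin(2 * math.pi * f * n / sr)
                            for a, f in zip(frame, BIN_FREQS)))
    return wave

text_emb = text_encoder("hello world")
spk_emb = speaker_encoder(rand_matrix(20, EMB_DIM))  # 20 fake target-voice frames
mel = decoder(text_emb, spk_emb, n_frames=len(text_emb))
wave = vocoder(mel)
print(len(text_emb), len(mel), len(mel[0]), len(wave))
```

Running it prints the number of characters, mel frames, mel bins per frame, and waveform samples (`11 11 4 352` for the input shown). Tying the number of frames to the number of characters is an arbitrary simplification. Real decoders predict many frames per character and learn when to stop.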
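The title attributes the small data requirement to fine-tuning. The abstract does not say which parts of the model are updated, so the sketch below assumes a common pattern. It starts from pretrained weights, keeps one part frozen (here a hypothetical `encoder_w`), and runs gradient descent only on a small trainable head using a few samples from the new speaker. The model is a toy linear one with random "pretrained" weights, not a speech network, and the names are illustrative. For scale, the reported reduction from about 5 hours (300 minutes) to about 10 minutes of data is roughly 30-fold.

```python
import random

rng = random.Random(1)
DIM = 3

def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

# "Pretrained" weights (random stand-ins for a trained multi-speaker model).
encoder_w = [rand_vec(DIM) for _ in range(DIM)]  # kept frozen
pretrained_head = rand_vec(DIM)                  # starting point for fine-tuning

def features(x):
    """Frozen encoder: its weights are never updated during fine-tuning."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in encoder_w]

def predict(head, x):
    return sum(w * h for w, h in zip(head, features(x)))

def mse(head, data):
    return sum((predict(head, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(head, data, lr=0.02, epochs=500):
    """Full-batch gradient descent on the trainable head only, starting from `head`."""
    head = list(head)  # copy so the pretrained weights stay untouched
    for _ in range(epochs):
        grad = [0.0] * DIM
        for x, y in data:
            h = features(x)
            err = sum(w * hj for w, hj in zip(head, h)) - y
            for j in range(DIM):
                grad[j] += 2.0 * err * h[j] / len(data)
        head = [w - lr * g for w, g in zip(head, grad)]
    return head

# A handful of samples from a new "target speaker" whose ideal head differs.
target_head = rand_vec(DIM)
few_samples = []
for _ in range(10):
    x = rand_vec(DIM)
    few_samples.append((x, sum(w * h for w, h in zip(target_head, features(x)))))

adapted_head = fine_tune(pretrained_head, few_samples)
print(f"before: {mse(pretrained_head, few_samples):.4f}  "
      f"after: {mse(adapted_head, few_samples):.4f}")
```

Only the head's three parameters change, and the frozen encoder is shared with the pretrained model. As a result, the few target samples need to pin down only a small number of parameters, which is the intuition behind adapting a pretrained synthesizer instead of training one from scratch.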