The computer system 1 includes a speaker information estimation unit 130 that estimates speaker information of an unknown speaker from acoustic features of the unknown speaker, without requiring text input as teacher data. The speaker information of the unknown speaker includes a speaker code that represents, as a probability, the degree of similarity between the distribution of the acoustic features of the unknown speaker and the distribution of the acoustic features of each of a plurality of known speakers. The computer system 1 further includes a synthesized acoustic feature generation unit 220 that uses a multi-speaker acoustic model (DNN) 230 to generate synthesized acoustic features of the unknown speaker based on language features of input text and the speaker information of the unknown speaker, and a synthesized speech generation unit 240 that generates synthesized speech of the unknown speaker based on the synthesized acoustic features of the unknown speaker.
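
The speaker code described above can be illustrated with a minimal sketch. The abstract does not state how the similarity probabilities are computed, so everything below is an assumption made for illustration: each known speaker is modeled by a single diagonal-covariance Gaussian over acoustic feature frames, the unknown speaker's frames are scored by average log-likelihood, and a softmax turns the scores into probabilities. The names `diag_gaussian_loglik`, `speaker_code`, `known`, and `unknown_frames` are hypothetical and do not come from the source.

```python
import math


def diag_gaussian_loglik(frame, mean, var):
    """Log-density of one feature frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def speaker_code(frames, known_speakers):
    """Return one similarity probability per known speaker (values sum to 1).

    Assumption: each known speaker is summarized by a (mean, variance) pair.
    No text or transcription of the unknown speaker's speech is used.
    """
    # Average per-frame log-likelihood under each known speaker's model.
    scores = [
        sum(diag_gaussian_loglik(f, mean, var) for f in frames) / len(frames)
        for mean, var in known_speakers
    ]
    # Softmax, subtracting the maximum score for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy data: two known speakers with 2-dimensional features.
known = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]
unknown_frames = [[0.2, -0.1], [0.1, 0.3], [-0.2, 0.0]]
code = speaker_code(unknown_frames, known)
```

In this toy setup the unknown speaker's frames lie close to the first known speaker, so the first entry of `code` is the larger one. In a system like the one described, such a vector would be supplied together with the language features of the text as input to the multi-speaker acoustic model; how that input is assembled is not specified in the abstract.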