Natural-sounding speech synthesizers require a model that quantitatively describes prosody. Fujisaki's model [1] has shown considerable accuracy for many languages [4] [6]. We propose a method for estimating the parameters of Fujisaki's model, i.e. an inversion method, based on the relative extremes of the pitch contour and a gradient-based refinement procedure. Preliminary results show excellent performance of the proposed method in matching pitch contours. Preliminary synthesis results using the estimated parameters are also encouraging.
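
For orientation, the forward model that such an inversion has to match can be sketched as follows. This is a minimal sketch of the commonly cited formulation of Fujisaki's model (log F0 as a baseline plus phrase and accent components), not the estimation procedure proposed here. The function names, the default values of `alpha`, `beta` and `gamma`, and the helper `relative_extrema` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def phrase_response(t, alpha=2.0):
    """Phrase component impulse response: alpha^2 * t * exp(-alpha*t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)  # clamp so the exponential is never evaluated for t < 0
    return np.where(t >= 0.0, alpha**2 * tp * np.exp(-alpha * tp), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent component step response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)
    step = 1.0 - (1.0 + beta * tp) * np.exp(-beta * tp)
    return np.where(t >= 0.0, np.minimum(step, gamma), 0.0)

def fujisaki_log_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Log-F0 contour for baseline fb (Hz), phrase commands (Ap, T0)
    and accent commands (Aa, T1, T2). Parameter defaults are assumed, not fitted."""
    t = np.asarray(t, dtype=float)
    log_f0 = np.full_like(t, np.log(fb))
    for ap, t0 in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for aa, t1, t2 in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return log_f0

def relative_extrema(y):
    """Indices of strict local maxima and minima of a sampled contour
    (an illustrative way to locate the relative extremes of a pitch contour)."""
    d = np.diff(np.asarray(y, dtype=float))
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

# Synthetic example: one phrase command and one accent command (made-up values).
t = np.linspace(0.0, 3.0, 301)
contour = fujisaki_log_f0(t, fb=100.0, phrases=[(0.5, 0.0)], accents=[(0.4, 1.0, 1.5)])
maxima, minima = relative_extrema(contour)
```

In a setup like this, the extrema of a measured contour could seed initial guesses for command timings and amplitudes, which a gradient procedure would then refine by reducing the error between the measured contour and `fujisaki_log_f0`. How the proposed method actually does this is not specified in the abstract.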