This paper studies how different methods of coding the STRAIGHT aperiodicity coefficients, and different models of the vocal tract, affect the quality of synthetic speech generated using HMMs. Three coding schemes were implemented in the HTS synthesis system: the classic coding of the mean value in five frequency sub-bands, Mel-cepstral coefficients, and a simple unit selection method. The effect of removing the energy and spectral tilt from the speech spectrum and modeling them independently of the vocal tract was also studied. Five systems were trained on the ARCTIC_SLT database to test the proposed methods, and the synthetic voices were evaluated in three subjective listening tests.