This paper presents experiments on feature and score normalization for text-independent speaker verification on cellular data. The speaker verification system is based on cepstral features and Gaussian mixture models with 1024 components. The following methods, which have been proposed for feature and score normalization, are reviewed and evaluated on cellular data: cepstral mean subtraction (CMS), variance normalization, feature warping, T-norm, Z-norm and the cohort method. We found that the combination of feature warping and T-norm gives the best results on the NIST 2002 test data (for the one-speaker detection task). Compared to a baseline system that uses both CMS and variance normalization and achieves a minimum decision cost function (DCF) of 0.410, feature warping and T-norm bring relative reductions of 8% and 12% respectively, whereas the combination of both techniques yields a 22% relative reduction, reaching a DCF of 0.320. This result approaches the state-of-the-art performance level obtained for speaker verification with land-line telephone speech.
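As a rough illustration of the normalizations named above, the following minimal sketch shows per-utterance CMS with variance normalization and the shift-and-scale form shared by T-norm and Z-norm. It is not the system evaluated in the paper: the function names (`cmvn`, `normalize_score`, `relative_reduction`), the use of the population standard deviation, and the guard against zero variance are all assumptions made for this example. Feature warping and the cohort method are not covered.

```python
from statistics import mean, pstdev


def cmvn(frames):
    """Cepstral mean subtraction plus variance normalization.

    frames: list of equal-length cepstral vectors from one utterance.
    Each coefficient is shifted to zero mean and scaled to unit variance
    over the utterance (assumed here; a sliding window is also possible).
    """
    columns = list(zip(*frames))
    mus = [mean(c) for c in columns]
    # Assumption: fall back to 1.0 when a coefficient is constant.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(f, mus, sigmas)] for f in frames]


def normalize_score(raw_score, reference_scores):
    """Shift and scale a raw verification score by reference statistics.

    T-norm: reference_scores are the scores of the test utterance
            against a set of impostor (cohort) models.
    Z-norm: reference_scores are the scores of impostor utterances
            against the target speaker model, computed offline.
    """
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores) or 1.0
    return (raw_score - mu) / sigma


def relative_reduction(baseline, improved):
    """Relative reduction of a cost such as the minimum DCF."""
    return (baseline - improved) / baseline
```

With the figures quoted above, `relative_reduction(0.410, 0.320)` is about 0.22, which matches the reported 22% relative reduction for the combined system.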