Robust voice activity detection (VAD) is a prerequisite for many speech-based applications such as speech recognition. We investigate two VAD techniques, one using time-domain and one using frequency-domain characteristics of the speech signal. In the time domain, the temporal behavior of the autocorrelation lag discriminates speech from nonspeech regions. In the frequency domain, the peak value of the magnitude spectrum in different sub-bands is used for VAD. The performance of the proposed methods is evaluated on the TIMIT database with noises from the NOISEX-92 database at various signal-to-noise ratio (SNR) levels. The experimental results show that VAD based on the autocorrelation lag performs consistently better than the method based on the maximum peak value of the autocorrelation function. However, it performs worse than our second approach and AMR-VAD2. Our second approach, VAD based on the maximum spectral amplitude in sub-bands, outperforms AMR-VAD2 and the Sohn VAD under some noise conditions. Moreover, we show that a threshold independent of the noise type and level can be selected for the proposed method.
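As a rough illustration of the two per-frame features named above, the following Python sketch computes the lag of the autocorrelation peak and the peak spectral magnitude in each sub-band. It is not the authors' implementation: the function names, the Hamming window, the equal-width band split, and the lag search range are all assumptions made for the example, and the decision rule (how the features are tracked over time and thresholded) is deliberately left out because the abstract does not specify it.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Assumes len(x) >= frame_len. Frame length and hop are illustrative
    parameters, not values taken from the paper.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def autocorr_peak_lag(frame, min_lag, max_lag):
    """Lag (in samples) of the largest autocorrelation value in [min_lag, max_lag].

    The search range is an assumption; it would normally cover plausible
    pitch periods.
    """
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation sequence.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))


def subband_peaks(frame, n_bands=4):
    """Peak magnitude-spectrum value in each of n_bands near-equal-width sub-bands.

    The Hamming window and the equal-width split are assumptions.
    """
    mag = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    return np.array([band.max() for band in np.array_split(mag, n_bands)])
```

A typical use would call `frame_signal` once and then apply either feature function to each row. For voiced speech the autocorrelation peak lag tends to stay stable across neighbouring frames, whereas for noise it tends to fluctuate, which is one plausible reading of the "temporal characteristic" mentioned above.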