In this paper, we conduct a comparative study of several confidence measures (CMs) for large-vocabulary speech recognition. First, we propose a novel high-level CM based on inter-word mutual information (MI). Second, we experimentally investigate several popular low-level CMs, such as word posterior probabilities, N-best counting, and likelihood ratio testing (LRT). Finally, we study a simple linear interpolation strategy that combines the best low-level CMs with the best high-level CMs. All of these CMs are examined on two large-vocabulary ASR tasks, namely the Switchboard task and a Mandarin dictation task, to verify recognition errors made by the baseline recognition systems. Experimental results show that: 1) the proposed MI-based CMs greatly surpass other existing high-level CMs based on the LSA technique; 2) among all low-level CMs, word posterior probabilities give the best verification performance; and 3) combining word posterior probabilities with the MI-based CMs reduces the equal error rate from 24.4% to 23.9% on the Switchboard task and from 17.5% to 16.2% on the Mandarin dictation task.
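The linear interpolation strategy mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the weight `alpha`, the decision threshold, and the function names are assumptions introduced here for clarity.

```python
def combine_cm(posterior_score: float, mi_score: float, alpha: float = 0.7) -> float:
    """Linearly interpolate a low-level CM (e.g. a word posterior probability)
    with a high-level CM (e.g. an inter-word MI-based score).

    `alpha` is a hypothetical interpolation weight; in practice it would be
    tuned on held-out data to minimize the equal error rate."""
    return alpha * posterior_score + (1.0 - alpha) * mi_score


def verify_word(posterior_score: float, mi_score: float,
                threshold: float = 0.5, alpha: float = 0.7) -> bool:
    """Accept a hypothesized word if its combined confidence exceeds a
    threshold; otherwise flag it as a likely recognition error."""
    return combine_cm(posterior_score, mi_score, alpha) >= threshold
```

Both component scores are assumed to be normalized to a comparable range before interpolation; without such normalization a single weight cannot balance the two measures meaningfully.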