We describe several language and pronunciation modeling techniques that were applied to the 1996 Hub 4 Broadcast News transcription task. These include topic adaptation, the use of remote corpora, vocabulary size optimization, n-gram cutoff optimization, modeling of spontaneous speech, handling of unknown linguistic boundaries, higher-order n-grams, weight optimization in rescoring, and lexical modeling of phrases and acronyms.