HYBRID PHONEME, DIPHONE, MORPHEME, AND WORD-LEVEL DEEP NEURAL NETWORKS
Abstract
An approach is described that uses hybrid frame-, phone-, diphone-, morpheme-, and word-level Deep Neural Networks (DNNs) in model training and recognition. The approach can be used in many applications. It builds on a regular automatic speech recognition (ASR) system, which can be based on Gaussian Mixture Models (GMMs) or DNNs. In training, a regular ASR model is trained first. All of the training data (in the form of features) are then aligned with the transcripts at the phoneme and word levels, with timing information. Based on the alignment timing, new features are formed for phonemes, diphones, morphemes, and units up to words; feature normalization can be applied to these new features. In recognition, a first-pass regular speech recognition is performed and a result lattice is produced, which carries timing information for each word. For each word, a feature is extracted and sent to the word-level DNN for scoring. If the word is not in the word-level DNN vocabulary, a forced alignment is performed to obtain the timing information for each phoneme, and features from the phonemes, diphones, and morphemes are sent to the corresponding DNNs for scoring. These scores are combined to form the word-level score. In this way, the lattice is rescored and a new recognition result is produced.
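The rescoring step with its back-off from the word-level DNN to the sub-word DNNs can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as claimed: the names `Arc`, `segment_feature`, `rescore_arc`, and the `align` callback are hypothetical, the DNNs are represented as plain scoring callables, and the abstract does not specify how segment features are built or how scores are combined. Mean pooling, summing log scores within a level, averaging across levels, and linear interpolation with the first-pass score are all choices made here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set, Tuple

Frames = Sequence[Sequence[float]]             # one feature vector per frame
Scorer = Callable[[List[float], str], float]   # (segment feature, unit label) -> log score
Segment = Tuple[str, int, int]                 # (unit label, start frame, end frame)


@dataclass
class Arc:
    """One lattice arc: a word hypothesis with its frame-level timing."""
    word: str
    start: int          # first frame (inclusive)
    end: int            # last frame (exclusive)
    first_pass: float   # log score from the first recognition pass


def segment_feature(frames: Frames, start: int, end: int) -> List[float]:
    """Collapse a variable-length span into one vector by mean pooling (an assumption)."""
    if end <= start:
        raise ValueError("empty segment")
    span = frames[start:end]
    dim = len(span[0])
    return [sum(f[d] for f in span) / len(span) for d in range(dim)]


def rescore_arc(
    arc: Arc,
    frames: Frames,
    word_vocab: Set[str],
    word_dnn: Scorer,
    subword_dnns: Dict[str, Scorer],
    align: Callable[[Arc], Dict[str, List[Segment]]],
    weight: float = 0.5,
) -> float:
    """Return a new score for one arc, backing off to sub-word DNNs for unknown words."""
    if arc.word in word_vocab:
        # Word is covered: score the whole word segment with the word-level DNN.
        dnn_score = word_dnn(segment_feature(frames, arc.start, arc.end), arc.word)
    else:
        # Word is not covered: forced alignment supplies sub-word timing per level
        # (e.g. "phoneme", "diphone", "morpheme"); each level is scored by its own DNN.
        segments_by_level = align(arc)
        level_scores = []
        for level, segments in segments_by_level.items():
            scorer = subword_dnns[level]
            level_scores.append(
                sum(scorer(segment_feature(frames, s, e), unit) for unit, s, e in segments)
            )
        # Combine the levels into one word-level score (simple average, an assumption).
        dnn_score = sum(level_scores) / len(level_scores)
    # Interpolate with the first-pass score (an assumption).
    return (1.0 - weight) * arc.first_pass + weight * dnn_score
```

Applying `rescore_arc` to every arc in the lattice and then re-running the best-path search over the updated scores would yield the new recognition result. In a real system, `align` would wrap the recognizer's forced-alignment routine and each scorer would wrap a trained network.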