Topic Mining based on Word Posterior Probability in Spoken Document

Lei Zhang; Guo-xing Chen; Xue-zhi Xiang; Jing-xin Chang

Abstract

A speech recognition system can represent its output in three ways: one-best, N-best, and lattice. Because a lattice keeps multiple paths, which reduces the impact of recognition errors, it is widely used today. However, a lattice contains a large amount of redundancy, which increases the complexity of the algorithms that operate on it. In addition, decoding follows the maximum a posteriori (MAP) criterion, which only guarantees that the posterior probability of the whole sentence is maximal. Since MAP decoding does not imply the highest syllable recognition rate, a confusion network is introduced into the topic mining system. Clustering during confusion network construction follows the minimum word error criterion, which suits topic mining because the word is the smallest meaningful unit in Chinese and word information matters most in topic mining. This paper proposes a simplified confusion network generation algorithm to handle some problems caused by insertion errors during recognition. Based on the confusion network, a word list extraction approach is then proposed, in which a dictionary is used to judge whether consecutive arcs in the confusion sets form a word. At this stage, erroneous words produced by recognition errors can be corrected to some extent. After the competition step of word list extraction on the confusion network, a final word list with posterior probabilities is obtained. These posterior probabilities can further be incorporated into the topic mining system. Singular value decomposition (SVD) and non-negative matrix factorization (NMF) are used to decompose the term-document matrix built from the confusion network word list. The experiments show that the proposed confusion-network-based approach performs better than the one-best and N-best approaches. In addition, the modified weight, which incorporates the posterior probability into the term-document matrix, further improves system performance.
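
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names `extract_words` and `term_document_matrix`, the pinyin toy tokens, and the dictionary are hypothetical, and two choices are assumptions made only for illustration, namely that a word's posterior is the product of its arc posteriors and that the modified weight is the sum of posteriors per document. The competition step is omitted because the abstract does not specify it.

```python
from itertools import product


def extract_words(confusion_network, dictionary, max_len=2):
    """Collect (word, posterior) candidates from a confusion network.

    confusion_network: list of confusion sets, each a list of (token, posterior) arcs.
    A candidate is kept when consecutive arcs, one per set, spell a dictionary word.
    Assumption: the word posterior is the product of the arc posteriors.
    """
    candidates = []
    n = len(confusion_network)
    for start in range(n):
        for length in range(1, max_len + 1):
            if start + length > n:
                break
            window = confusion_network[start:start + length]
            # Try every combination that takes one arc from each confusion set.
            for arcs in product(*window):
                word = "".join(token for token, _ in arcs)
                if word in dictionary:
                    score = 1.0
                    for _, posterior in arcs:
                        score *= posterior
                    candidates.append((word, score))
    return candidates


def term_document_matrix(docs_words, vocabulary):
    """Build a term-document matrix (rows: terms, columns: documents).

    Assumption: each entry is the sum of the posteriors of that term's
    candidates in the document, instead of a raw count.
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    matrix = [[0.0] * len(docs_words) for _ in vocabulary]
    for j, words in enumerate(docs_words):
        for word, posterior in words:
            matrix[index[word]][j] += posterior
    return matrix


# Hypothetical toy data: two spoken documents as confusion networks.
dictionary = {"yuyin", "shibie"}
cn_a = [
    [("yu", 0.75), ("yi", 0.25)],
    [("yin", 0.5), ("ying", 0.5)],
    [("shi", 0.75), ("si", 0.25)],
    [("bie", 0.5), ("pie", 0.5)],
]
cn_b = [
    [("shi", 0.5), ("si", 0.5)],
    [("bie", 1.0)],
]

words_a = extract_words(cn_a, dictionary)
words_b = extract_words(cn_b, dictionary)
vocabulary = sorted(dictionary)
matrix = term_document_matrix([words_a, words_b], vocabulary)
print(vocabulary)
print(matrix)
```

The resulting matrix is the kind of input that SVD or NMF would then decompose; replacing each posterior with 1.0 would give the plain count-based matrix for comparison.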

Bibliographic details

Source: Journal of Computers, 2011, Issue 11, 8 pages
Authors: Lei Zhang; Guo-xing Chen; Xue-zhi Xiang; Jing-xin Chang
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: topic mining; spoken document; posterior probability; confusion network; modified weight
