Proceedings of the speech recognition workshop

Hub4 Language Modeling Using Domain Interpolation and Data Clustering


Abstract

In SRI's language modeling experiments for the Hub4 domain, three basic approaches were pursued: interpolating multiple models estimated from Hub4 and non-Hub4 training data, adapting the language model (LM) to the focus conditions, and adapting the LM to different topic types.

In the first approach, we built separate LMs for the closely transcribed Hub4 material (acoustic training transcripts) and the loosely transcribed Hub4 material (LM training data), as well as the North-American Business News (NABN) and Switchboard training data, projected onto the Hub4 vocabulary. By interpolating the probabilities obtained from these models, we obtained a 20% reduction in perplexity and a 1.8% reduction in word error rate, compared to a baseline Hub4-only language model.

Two adaptation approaches are also described: adapting language models to the speech styles correlated with different focus conditions, and building cluster-specific LM mixtures. These two approaches give some reduction in perplexity, but no significant reduction in word error.

Finally, we identify the problems and future directions of our work.
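The interpolation described above combines component models as a weighted sum of their conditional probabilities, P(w | h) = Σᵢ λᵢ Pᵢ(w | h). A minimal sketch of this combination step, assuming each component LM is represented as a simple dict from (history, word) pairs to probabilities (the model names and weights below are illustrative, not the paper's actual estimates):

```python
def interpolate(models, weights, history, word):
    """Return the linearly interpolated probability sum_i w_i * P_i(word | history).

    Each model is a dict mapping (history, word) -> probability; an unseen
    pair contributes 0.0 (a real LM would back off instead).
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "interpolation weights must sum to 1"
    return sum(w * m.get((history, word), 0.0) for m, w in zip(models, weights))


# Two toy component models, standing in for e.g. a Hub4 LM and an NABN LM,
# both projected onto a shared vocabulary.
hub4_lm = {(("the",), "news"): 0.2}
nabn_lm = {(("the",), "news"): 0.4}

p = interpolate([hub4_lm, nabn_lm], [0.7, 0.3], ("the",), "news")
# 0.7 * 0.2 + 0.3 * 0.4 = 0.26
```

In practice the weights λᵢ would be tuned to minimize perplexity on held-out Hub4 data (e.g. via EM), rather than fixed by hand as here.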
