Genericity and portability for task-independent speech recognition

Fabrice Lefevre; Jean-Luc Gauvain; Lori Lamel

Abstract

As core speech recognition technology improves, opening up a wider range of applications, genericity and portability are becoming important issues. Most of today's recognition systems are still tuned to a particular task, and porting a system to a new task (or language) requires a substantial investment of time and money, as well as human expertise. This paper addresses issues in speech recognizer portability and in the development of generic core speech recognition technology. First, the genericity of wide domain models is assessed by evaluating their performance on several tasks of varied complexity. Then, techniques aimed at enhancing the genericity of these wide domain models are investigated. Multi-source acoustic training is shown to reduce the performance gap between task-independent and task-dependent acoustic models, and for some tasks to outperform task-dependent acoustic models. Transparent methods for porting generic models to a specific task are also explored. Transparent unsupervised acoustic model adaptation is contrasted with supervised adaptation, and incremental unsupervised adaptation of both the acoustic and linguistic models is investigated. Experimental results on a dialog task show that, with the proposed scheme, a transparently adapted generic system can perform nearly as well (about a 1% absolute gap in word error rate) as a task-specific system trained on several tens of hours of manually transcribed data.
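The abstract reports its main result as an absolute gap in word error rate (WER). As a reminder of what that metric measures, the following is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words) and of the difference between an absolute and a relative gap. It is not the authors' scoring tool; the function names and the example sentences are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_gap(wer_a: float, wer_b: float) -> float:
    """Difference between two WERs, in the same units as the WERs themselves."""
    return abs(wer_a - wer_b)


def relative_gap(wer_system: float, wer_baseline: float) -> float:
    """Difference between two WERs as a fraction of the baseline WER."""
    return (wer_system - wer_baseline) / wer_baseline


if __name__ == "__main__":
    # Hypothetical sentences, not data from the paper.
    wer = word_error_rate("show me flights to paris", "show me flight to paris")
    print(f"WER: {wer:.2f}")
```

Under this definition, a 1% absolute gap means the two systems' WERs differ by one percentage point regardless of their level, whereas the corresponding relative gap depends on the baseline WER.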

Bibliographic record

Source
Computer speech and language | 2005, No. 3 | pp. 345-363 | 19 pages
Authors
Fabrice Lefevre; Jean-Luc Gauvain; Lori Lamel
Affiliation
Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France (all three authors)
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language of text: English
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. USING FUZZY SETS TO MODEL PARALINGUISTIC CONTENT IN SPEECH AS A GENERIC SOLUTION FOR CURRENT PROBLEMS IN SPEECH RECOGNITION AND SPEECH SYNTHESIS [J]. SACHIN LAKRA, T. V. PRASAD, G. RAMAKRISHNA. Journal of Theoretical and Applied Information Technology, 2015, No. 3.
2. SPEECH RECOGNITION USING GENERIC SHORTEST DISTANCE ALGORITHM [J]. Mahak Dureja, Sumanlata Gautam. Indian Journal of Computer Science and Engineering, 2016, No. 3.
3. A Generic and Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator Using Logic on Memory [J]. Bapat O.A., Franzon P.D., Fastow R.M. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 2014, No. 12.
4. Improving Genericity for Task-Independent Speech Recognition [C]. Fabrice Lefevre, Jean-Luc Gauvain, Lori Lamel. European conference on speech communication and technology, 2001.
5. A Generic, Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator [D]. Bapat, Ojas Ashok. 2013.
6. Recognition of time-compressed speech does not predict recognition of natural fast-rate speech by older listeners [O]. Sandra Gordon-Salant, Danielle J. Zion, Carol Espy-Wilson.
7. Low-cost portable text recognition and speech synthesis with generic software, l [O]. Lahti Lauri, Kurhila Jaakko. 2007.
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment [R]. Hansen, J. H. 2015.
