
Genericity and portability for task-independent speech recognition



Abstract

As core speech recognition technology improves, opening up a wider range of applications, genericity and portability are becoming important issues. Most of today's recognition systems are still tuned to a particular task, and porting a system to a new task (or language) requires a substantial investment of time and money, as well as human expertise. This paper addresses issues in speech recognizer portability and in the development of generic core speech recognition technology. First, the genericity of wide domain models is assessed by evaluating their performance on several tasks of varied complexity. Then, techniques aimed at enhancing the genericity of these wide domain models are investigated. Multi-source acoustic training is shown to reduce the performance gap between task-independent and task-dependent acoustic models, and for some tasks to outperform task-dependent acoustic models. Transparent methods for porting generic models to a specific task are also explored. Transparent unsupervised acoustic model adaptation is contrasted with supervised adaptation, and incremental unsupervised adaptation of both the acoustic and linguistic models is investigated. Experimental results on a dialog task show that with the proposed scheme, a transparently adapted generic system can perform nearly as well (about a 1% absolute gap in word error rate) as a task-specific system trained on several tens of hours of manually transcribed data.
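
The abstract reports its headline result as an absolute gap in word error rate (WER) but does not spell out how WER is computed. The sketch below uses the conventional definition (word-level edit distance divided by the number of reference words) and is not the authors' scoring tool. The function names `word_error_rate` and `absolute_gap` are hypothetical, and the figures in the usage note are invented for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (one-row Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_gap(wer_a: float, wer_b: float) -> float:
    """Absolute (not relative) difference between two WER values."""
    return abs(wer_a - wer_b)
```

Under this definition, two systems with hypothetical WERs of 0.21 and 0.20 differ by 0.01 absolute, that is, one percentage point, which is the sense in which the abstract describes an "about 1% absolute gap".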

Bibliographic record

  • Source
    Computer Speech and Language | 2005, Issue 3 | pp. 345-363 | 19 pages
  • Author affiliations

    Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France;

    Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France;

    Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France;

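  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA);
  • Original format: PDF
  • Language of full text: English
  • Chinese Library Classification: Computing technology, computer technology;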