
Identification of related languages from spoken data: Moving from off-line to on-line scenario



Abstract

The accelerating flow of information we encounter around the world today makes many companies deploy speech recognition systems that, to an ever-growing extent, process data on-line rather than off-line. These systems, e.g., for real-time 24/7 broadcast transcription, often work with input-stream data containing utterances in more than one language. This multilingual data can correctly be transcribed in real-time only if the language used is identified with just a small latency for each input frame. For this purpose, a novel approach to online spoken language identification is proposed in this work. Its development is documented within a series of consecutive experiments starting in the off-line mode for 11 Slavic languages, going through artificially prepared multilingual data for the on-line scenario, and ending with real bilingual TV programs containing utterances in mutually similar Czech and Slovak. The resulting scheme that we propose operates frame-by-frame; it takes in a multilingual stream of speech frames and outputs a stream of the corresponding language labels. It utilizes a weighted finite-state transducer as a decoder, which smooths the output from a language classifier fed by multilingual and augmented bottleneck features. An essential factor from the accuracy point of view is that these features, as well as the classifier itself, are based on deep neural network architectures that allow the modeling of long-term time dependencies. The obtained results show that our scheme allows us to determine the language spoken in real-world bilingual TV shows with an average latency of around 2.5 seconds and with an increase in word error rate by a mere 2.9% over the reference 18.1% value yielded by using manually prepared language labels.
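
The smoothing step described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' weighted finite-state transducer decoder. It is a plain Viterbi pass over per-frame language log-probabilities with a fixed penalty for switching labels, which is one common way to suppress single-frame label flips. The function name `smooth_labels`, the `switch_penalty` value, and the toy posteriors are assumptions made for illustration. The sketch also processes a whole sequence at once, so it does not model the on-line latency discussed in the paper.

```python
import math


def smooth_labels(frame_log_probs, switch_penalty=3.0):
    """Viterbi smoothing of per-frame language log-probabilities.

    frame_log_probs: list of dicts mapping language label -> log-probability.
    switch_penalty: hypothetical cost paid whenever the label changes
                    between consecutive frames (not a value from the paper).
    """
    if not frame_log_probs:
        return []
    labels = list(frame_log_probs[0])

    def cost(prev, cur):
        # Staying on the same label is free; switching is penalised.
        return 0.0 if prev == cur else switch_penalty

    # best[l] = score of the best label path that ends in label l
    best = {l: frame_log_probs[0][l] for l in labels}
    backpointers = []
    for frame in frame_log_probs[1:]:
        new_best, ptr = {}, {}
        for l in labels:
            prev = max(labels, key=lambda p: best[p] - cost(p, l))
            new_best[l] = best[prev] - cost(prev, l) + frame[l]
            ptr[l] = prev
        best = new_best
        backpointers.append(ptr)

    # Trace the best path back from the final frame.
    last = max(labels, key=lambda l: best[l])
    path = [last]
    for ptr in reversed(backpointers):
        last = ptr[last]
        path.append(last)
    path.reverse()
    return path


# Toy input: mostly Czech ("cs") with one noisy frame, then Slovak ("sk").
cs = {"cs": math.log(0.8), "sk": math.log(0.2)}
sk = {"cs": math.log(0.2), "sk": math.log(0.8)}
frames = [cs] * 3 + [sk] + [cs] * 3 + [sk] * 5
print(smooth_labels(frames))
```

With these toy values, the isolated `sk` frame is too short to pay for two label switches, while the final run of `sk` frames is long enough to justify one. A larger `switch_penalty` gives more stable labels but slower reaction to a genuine language change.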

Bibliographic record

  • Source
    Computer Speech and Language | 2021, issue 7 | 101180.1-101180.19 | 19 pages
  • Author affiliations

    Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, Studentska 2, Liberec 461 17, Czech Republic;

    Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, Studentska 2, Liberec 461 17, Czech Republic;

    Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, Studentska 2, Liberec 461 17, Czech Republic;

    Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, Studentska 2, Liberec 461 17, Czech Republic;

    Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, Studentska 2, Liberec 461 17, Czech Republic;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Text language: English
  • Keywords

    Spoken language identification; Deep neural networks; Weighted finite-state transducers; On-line processing; Slavic languages;

