Scientific Reports (U.S. National Institutes of Health literature)

Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

Abstract

Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
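The idea of using the instantaneous phase of the speech envelope as a segmentation reference can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 4 to 8 Hz band, and the FFT brick-wall filter are all assumptions chosen for illustration, and the abstract does not specify any of them. The sketch extracts the envelope from the analytic signal, band-limits it, and places a segment boundary wherever the wrapped phase of the band-limited envelope starts a new cycle.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (negative frequencies zeroed, positive doubled)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def band_limit(x, fs, low_hz, high_hz):
    """Crude FFT brick-wall band-pass (an assumption, not the paper's filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def phase_segment_boundaries(signal, fs, low_hz=4.0, high_hz=8.0):
    """Return sample indices where the envelope phase wraps, plus the phase itself.

    The default band (4 to 8 Hz) is a hypothetical choice for this sketch.
    """
    # Envelope: magnitude of the analytic signal of the waveform.
    envelope = np.abs(analytic_signal(signal))
    # Keep only slow envelope fluctuations in the chosen band.
    slow = band_limit(envelope, fs, low_hz, high_hz)
    # Instantaneous phase of the band-limited envelope, wrapped to (-pi, pi].
    phase = np.angle(analytic_signal(slow))
    # A new cycle starts where the wrapped phase jumps from near +pi to near -pi.
    boundaries = np.where(np.diff(phase) < -np.pi)[0] + 1
    return boundaries, phase
```

Calling `phase_segment_boundaries(waveform, fs)` on a one-dimensional NumPy array yields variable-length segments between consecutive boundaries, in contrast to the fixed frame size the abstract describes as conventional. Repeating the call with other bands would give boundaries at other timescales.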
