Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

Abstract

Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
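
The idea of using the instantaneous phase of the speech envelope as a temporal reference can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`analytic_signal`, `bandpass_fft`, `phase_segments`), the 4-10 Hz modulation band, and the ideal FFT band-pass filter are all assumptions chosen for illustration, and the sketch handles a single timescale rather than the multi-timescale scheme the abstract describes. It extracts the amplitude envelope, isolates one slow modulation band, and places a segment boundary wherever the envelope phase completes a cycle.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (the standard Hilbert-transform construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def bandpass_fft(x, fs, low_hz, high_hz):
    """Ideal zero-phase band-pass: zero every FFT bin outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def phase_segments(signal, fs, band=(4.0, 10.0)):
    """Return (start, end) sample indices, one segment per envelope phase cycle.

    The band is an assumed, illustrative modulation range, not a value
    taken from the paper.
    """
    envelope = np.abs(analytic_signal(signal))       # broadband amplitude envelope
    slow = bandpass_fft(envelope, fs, *band)          # keep one modulation timescale
    phase = np.angle(analytic_signal(slow))           # instantaneous phase in (-pi, pi]
    # A cycle ends where the phase wraps from +pi back to -pi.
    wraps = np.flatnonzero(np.diff(phase) < -np.pi) + 1
    bounds = np.concatenate(([0], wraps, [len(signal)]))
    return list(zip(bounds[:-1], bounds[1:]))


# Synthetic stand-in for speech: a 1 kHz carrier modulated at 5 Hz.
fs = 8000
t = np.arange(2 * fs) / fs
x = (1.0 + 0.8 * np.cos(2 * np.pi * 5 * t)) * np.cos(2 * np.pi * 1000 * t)
segments = phase_segments(x, fs)
```

Unlike fixed-size framing, the segment lengths here follow the rhythm of the envelope, so a slower or faster modulation rate yields longer or shorter segments without changing any parameter. Real speech would need a more careful envelope extraction and filter design than this sketch uses.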

Bibliographic details

Journal: Scientific Reports
Authors: Byeongwook Lee; Kwang-Hyun Cho
Year: 2016
Volume: 6
Article number: 37647
Total pages: 12
Original format: PDF

Similar documents

1. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference [J]. Byeongwook Lee, Kwang-Hyun Cho. Scientific Reports, 2016, No. 1.
2. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [J]. Xugang Lu, Masashi Unoki, Masato Akagi. Acoustical Science and Technology, 2008, No. 6.
3. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [J]. Masashi Unoki, Masato Akagi, Xugang Lu. Acoustical Science and Technology, 2008, No. 6.
4. A cross-channel modeling approach for automatic segmentation of conversational telephone speech automatic speech recognition applications [C]. Daben Liu, Kubala, F. 2003.
5. An automatic speech recognition oriented study on segmentation, low dimensional feature extraction, and temporal trajectory information capture [D]. Zhu, Yonggang. 2002.
6. Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise [O]. Guangxin Hu, Sarah C. Determan, Yue Dong. 2020.
7. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [O]. Lu, Xugang, Unoki, Masashi, Akagi, Masato. 2008.
