A Sequential Contrastive Learning Framework for Robust Dysarthric Speech Recognition

Abstract

Dysarthria is a manifestation of disruption in the neuromuscular physiology, resulting in uneven, slow, slurred, harsh, or quiet speech. Despite the remarkable progress of automatic speech recognition (ASR), developing stable ASR for dysarthric individuals remains highly challenging because of high intra- and inter-speaker variation and data deficiency. In this paper, we propose a contrastive learning framework for robust dysarthric speech recognition (DSR) that captures dysarthric speech variability. Several speech data augmentation strategies are explored to form the two branches of the framework, which also alleviates the scarcity of dysarthric data. We also develop an efficient projection head that acts on a sequence of learned hidden representations to define the contrastive loss. Experimental results on DSR demonstrate that the model is better than or comparable to the supervised baseline.
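
The abstract describes two augmented branches whose sequence-level representations are compared by a contrastive loss, but it does not give the loss or the projection head. As a rough illustration only, the sketch below assumes mean pooling over frames and the generic NT-Xent loss. The names `mean_pool` and `nt_xent` and the temperature of 0.5 are hypothetical choices for this example, not the authors' design.

```python
import math

def mean_pool(seq):
    """Average a sequence of frame vectors into one utterance vector.
    Assumption: stands in for the paper's unspecified projection head."""
    dim = len(seq[0])
    return [sum(frame[d] for frame in seq) / len(seq) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(views_a, views_b, temperature=0.5):
    """Generic NT-Xent loss (an assumption, not the paper's stated loss).

    views_a[i] and views_b[i] are pooled vectors of the same utterance
    under two different augmentations (a positive pair); every other
    vector in the batch serves as a negative.
    """
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of utterance i
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # -log( exp(pos) / sum over all other samples of exp(sim) )
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)

# Toy batch: two utterances, each pooled from two-frame sequences.
branch_a = [mean_pool([[1.0, 0.0], [0.8, 0.2]]),
            mean_pool([[0.0, 1.0], [0.2, 0.8]])]
branch_b = [mean_pool([[0.9, 0.1], [0.7, 0.1]]),
            mean_pool([[0.1, 0.9], [0.1, 0.7]])]
loss = nt_xent(branch_a, branch_b)
```

The loss falls when the two views of the same utterance are closer to each other than to the other utterances in the batch, which is the behaviour a contrastive framework relies on. A real system would compute the views with a neural encoder and learn the projection rather than use fixed pooling.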

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2021 | pp. 7303-7307 | 5 pages
Authors
Lidan Wu; Daoming Zong; Shiliang Sun; Jing Zhao
Original format: PDF
Keywords
Neuromuscular; Conferences; Speech recognition; Signal processing; Physiology; Acoustics; Speech processing

Similar literature

1. Generative Model-Driven Feature Learning for Dysarthric Speech Recognition [J]. Rajeswari N., Chandrakala S. Biocybernetics and Biomedical Engineering, 2016, No. 4
2. An Information Fusion Framework with Multi-Channel Feature Concatenation and Multi-Perspective System Combination for the Deep-Learning-Based Robust Recognition of Microphone Array Speech [J]. Tu Yanhui, Du Jun, Wang Qing. Computer Speech and Language, 2017
3. An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition [J]. Bo Wu, Kehuang Li, Fengpei Ge. IEEE Journal of Selected Topics in Signal Processing, 2017, No. 8
4. Phonetic Analysis of Dysarthric Speech Tempo and Applications to Robust Personalised Dysarthric Speech Recognition [C]. Feifei Xiong, Jon Barker, Heidi Christensen. IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
5. A Practical and Efficient Multistream Framework for Noise Robust Speech Recognition [D]. Mallidi, Sri Harish. 2018
6. A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition [O]. Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
7. Adaptive Speech Recognition Framework for Dysarthric Patients [O]. Gabriella Simon-Nagy, Annamária R. Várkonyi-Kóczy. 2016
