Journal: Sensors (Basel, Switzerland)

A Real-Time Speech Separation Method Based on Camera and Microphone Array Sensors Fusion Approach


Abstract

In the context of human assistance, identifying and enhancing non-stationary target speech in various noise environments, such as a cocktail party, is an important issue for real-time speech separation. Previous studies mostly used microphone signal processing to perform target speech separation and analysis, for example feature recognition based on a large amount of training data and supervised machine learning. Such methods are suitable for stationary noise suppression, but they are relatively limited for non-stationary noise and have difficulty meeting real-time processing requirements. In this study, we propose a real-time speech separation method that combines an optical camera and a microphone array. The method is divided into two stages. Stage 1 uses computer vision with the camera to detect and identify targets of interest and to estimate the source angles and distance. Stage 2 uses beamforming with the microphone array to enhance and separate the target speech. An asynchronous update function integrates the beamforming control and the speech processing to reduce the effect of processing delay. The experimental results show that the noise reduction in stationary and non-stationary noise environments was 6.1 dB and 5.2 dB, respectively. The response time of speech processing was less than 10 ms, which meets the requirements of a real-time system. The proposed method has high potential for application in auxiliary listening systems or in machine language processing applications such as intelligent personal assistants.
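The two-stage idea can be illustrated with a minimal sketch. The abstract does not state which camera model or beamformer the authors used, so everything below is an assumption made for illustration: a pinhole camera, a linear microphone array on the x axis, a far-field source, and a plain delay-and-sum beamformer with integer-sample delays. The function names, the speed of sound, and all parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def pixel_to_azimuth(x_pixel, image_width, horizontal_fov_deg):
    """Stage 1 (assumed pinhole model): horizontal pixel -> azimuth in degrees.

    0 degrees is the optical axis; positive angles lie to the right of centre.
    """
    half_width = image_width / 2.0
    # Focal length expressed in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((x_pixel - half_width) / focal_px))


def steering_delays(azimuth_deg, mic_positions_m, sample_rate):
    """Integer delays (in samples) that align a far-field wavefront.

    mic_positions_m are x coordinates of a linear array; azimuth is measured
    from broadside, so a source at positive azimuth reaches larger x first.
    """
    sin_az = math.sin(math.radians(azimuth_deg))
    arrival = [-x * sin_az / SPEED_OF_SOUND for x in mic_positions_m]
    latest = max(arrival)
    # Delay every channel until it lines up with the latest arrival.
    return [round((latest - t) * sample_rate) for t in arrival]


def delay_and_sum(channels, delays):
    """Stage 2 (assumed beamformer): shift each channel and average."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += signal[i - d]
    return [v / len(channels) for v in out]
```

In a pipeline of this shape, the camera stage would call `pixel_to_azimuth` on the detected target, and the audio stage would recompute `steering_delays` only when a new angle arrives. That is one plausible reading of the asynchronous update described in the abstract, not a confirmed detail of the authors' system.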
