Journal: Computer Speech and Language

Ensemble of deep neural networks using acoustic environment classification for statistical model-based voice activity detection



Abstract

In this paper, we investigate an ensemble of deep neural networks (DNNs) that uses an acoustic environment classification (AEC) technique for statistical model-based voice activity detection (VAD). In statistical model-based VAD, the traditional decision rule is based on the geometric mean of the likelihood ratios or on a support vector machine (SVM), each of which is a shallow model with zero or one hidden layer. Since shallow models cannot take advantage of the diversity of the spatial distribution of the features, in the training step we build multiple DNNs according to the different noise types, using the parameters of the statistical model-based VAD algorithm. In addition, a separate DNN is designed for the AEC algorithm in order to choose the best DNN for each noise. In the online noise-aware VAD step, the AEC is first performed on a frame-by-frame basis using the separate DNN, which yields the a posteriori probabilities that identify the noise. Once these probabilities are obtained for each noise, this environmental knowledge is used to combine the speech presence probabilities derived from the ensemble of DNNs trained for the individual noises. Our approach to VAD was evaluated in terms of objective measures and showed significant improvement over the conventional algorithm.
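
The online combination step described in the abstract can be illustrated with a minimal Python sketch. It rests on an assumption: the abstract does not state the exact combination rule, so a weighted sum is used here, with the AEC posteriors as weights on the per-noise speech presence probabilities. The function names, the 0.5 threshold, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def combine_speech_probabilities(
    noise_posteriors: Sequence[float],
    speech_probs: Sequence[float],
) -> float:
    """Weight each noise-specific speech probability by its AEC posterior.

    Assumption: the combination is a posterior-weighted sum. The abstract
    only says the probabilities are combined, not how.
    """
    if len(noise_posteriors) != len(speech_probs):
        raise ValueError("need one speech probability per noise type")
    total = sum(noise_posteriors)
    if total <= 0.0:
        raise ValueError("noise posteriors must have a positive sum")
    # Renormalize so the weights sum to 1 even if the inputs are slightly off.
    weights = [p / total for p in noise_posteriors]
    return sum(w * s for w, s in zip(weights, speech_probs))


def detect_speech(
    noise_posteriors: Sequence[float],
    speech_probs: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> bool:
    """Frame-level decision: speech if the combined probability is high enough."""
    combined = combine_speech_probabilities(noise_posteriors, speech_probs)
    return combined >= threshold


# Hypothetical values for one frame and three noise types.
aec_posteriors = [0.7, 0.2, 0.1]  # output of the separate AEC DNN
dnn_outputs = [0.9, 0.4, 0.2]     # speech probability from each noise-specific DNN

combined = combine_speech_probabilities(aec_posteriors, dnn_outputs)
is_speech = detect_speech(aec_posteriors, dnn_outputs)
print(round(combined, 3), is_speech)
```

Under this assumed rule, a frame that the AEC network attributes mostly to one noise type is decided mainly by the DNN trained on that noise, while an ambiguous frame blends the outputs of several DNNs.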
