Conference: European Conference on Speech Communication and Technology, v.3; 2001-09-03 to 2001-09-07; Aalborg, DK
Enhancing Distributed Speech Recognition with Back-End Speech Reconstruction

Abstract

In this paper, we present a method to enhance the usefulness of a Distributed Speech Recognition (DSR) system by giving it the capability to reconstruct speech at the back-end. Speech reconstruction is achieved using the standard DSR parameters, viz., Mel-Frequency Cepstral Coefficients (MFCC) and log-energy, and some additional parameters, viz., voicing class, pitch period, and (optionally) higher-resolution energy information. From the MFCC parameters and energy information, the spectral magnitudes at the harmonics of the pitch frequency are estimated. Based on the voicing class information, the harmonic phases are appropriately modeled. The harmonic magnitudes and phases are then used to reconstruct speech according to the well-known sinusoidal model for speech synthesis. Transmitting the additional parameters for speech reconstruction increases the DSR bit rate by less than 20%. Evaluations by Mean Opinion Score (MOS) test and Diagnostic Rhyme Test (DRT) show that speech reconstructed in this way is of reasonable quality and quite intelligible.
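The final synthesis step described above, summing sinusoids at the pitch harmonics, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `synthesize_frame`, the 8 kHz sample rate, the 160-sample frame length, and the example magnitudes and phases are all assumptions made for illustration. It also holds the parameters constant within a frame and takes the phases as given inputs, whereas the paper estimates magnitudes from the MFCCs and models phases from the voicing class.

```python
import math


def synthesize_frame(f0_hz, magnitudes, phases, sample_rate=8000, num_samples=160):
    """Sinusoidal-model sketch: s[n] = sum_k A_k * cos(2*pi*k*f0*n/fs + phi_k).

    All parameter names and defaults are illustrative assumptions,
    not values taken from the paper.
    """
    if len(magnitudes) != len(phases):
        raise ValueError("need exactly one phase per harmonic magnitude")
    frame = []
    for n in range(num_samples):
        sample = 0.0
        # Harmonic k sits at k times the pitch frequency, starting with k = 1.
        for k, (amp, phi) in enumerate(zip(magnitudes, phases), start=1):
            if k * f0_hz >= sample_rate / 2:
                break  # ignore harmonics at or above the Nyquist frequency
            sample += amp * math.cos(2.0 * math.pi * k * f0_hz * n / sample_rate + phi)
        frame.append(sample)
    return frame


# Hypothetical voiced frame: 100 Hz pitch, three harmonics, zero phases.
frame = synthesize_frame(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0])
```

With a 100 Hz pitch at 8 kHz the waveform repeats every 80 samples, so `frame[80]` matches `frame[0]` up to rounding. A practical synthesizer would additionally smooth the transitions between frames, for example by interpolating parameters or overlap-adding adjacent frames, which this sketch omits.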
