
Sliding Window-based Speech-to-Lips Conversion with Low Delay


Abstract

The goal of a good speech-to-lips conversion system is to synthesize high-quality, realistic lip movements that are time-synchronized with the input speech. Previously, maximum-likelihood estimation (MLE) of the visual trajectory under a Gaussian Mixture Model (GMM) was successfully proposed and tested for speech-to-lips conversion. It works as a sentence-level batch process that converts acoustic speech signals into a visual lip-movement trajectory. In this paper, we propose a moving-window-based, low-delay speech-to-lips conversion method for real-time communication applications. The new approach is an approximation of the MLE-GMM conversion but can render lip movements on the fly with low latency. Experimental results on the LIPS2009 dataset show that the proposed real-time method can achieve a latency of less than 100 ms while maintaining quality comparable to that of the batch method.
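The abstract only outlines the approach; as a rough illustration, the Python sketch below shows one way a sliding-window, low-delay GMM-based conversion could be organized: each incoming acoustic frame is mapped to a lip-feature estimate via the posterior-weighted conditional mean of a joint acoustic-visual GMM, and frames are emitted after a short look-ahead, smoothed over a moving window (a crude stand-in for the windowed maximum-likelihood trajectory estimation the paper approximates). The class name, parameters (window, delay), and the moving-average smoother are illustrative assumptions, not taken from the paper.

import numpy as np

class SlidingWindowGMMConverter:
    """Sketch of low-delay, sliding-window conversion with a joint acoustic-visual GMM."""

    def __init__(self, weights, means, covs, dx, window=10, delay=3):
        # weights: (K,), means: (K, dx+dy), covs: (K, dx+dy, dx+dy) full covariances
        means = np.asarray(means)
        covs = np.asarray(covs)
        self.w = np.asarray(weights)
        self.mu_x = means[:, :dx]          # acoustic part of each component mean
        self.mu_y = means[:, dx:]          # visual (lips) part of each component mean
        self.S_xx = covs[:, :dx, :dx]      # acoustic covariance blocks
        self.S_yx = covs[:, dx:, :dx]      # visual-acoustic cross-covariance blocks
        self.window = window               # how many recent frames the smoother spans
        self.delay = delay                 # look-ahead frames (the output latency)
        self.buf = []                      # raw per-frame estimates awaiting smoothing

    def _frame_estimate(self, x):
        # Posterior-weighted conditional mean E[y | x] under the joint GMM.
        K = self.w.shape[0]
        logp = np.empty(K)
        cond = np.empty((K, self.mu_y.shape[1]))
        for k in range(K):
            d = x - self.mu_x[k]
            Sxx_inv = np.linalg.inv(self.S_xx[k])
            _, logdet = np.linalg.slogdet(self.S_xx[k])
            logp[k] = np.log(self.w[k]) - 0.5 * (d @ Sxx_inv @ d + logdet)
            cond[k] = self.mu_y[k] + self.S_yx[k] @ Sxx_inv @ d
        post = np.exp(logp - logp.max())
        post /= post.sum()
        return post @ cond

    def push(self, x):
        # Feed one acoustic frame; return a lip frame once `delay` frames of
        # look-ahead are available, otherwise None (still filling the buffer).
        self.buf.append(self._frame_estimate(x))
        t = len(self.buf) - 1 - self.delay   # index of the frame that is now emittable
        if t < 0:
            return None
        lo = max(0, t - self.window + 1)
        # Moving-average smoothing over the window plus the look-ahead frames,
        # standing in for a windowed maximum-likelihood trajectory estimate.
        return np.mean(self.buf[lo:], axis=0)

In a streaming pipeline, push would be called once per acoustic feature frame (e.g. every 10-20 ms of speech), so a look-ahead of a few frames keeps the algorithmic latency in the sub-100 ms regime the paper targets.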

