Canadian Acoustics
FORCED-ALIGNMENT OF THE SUNG ACOUSTIC SIGNAL USING DEEP NEURAL NETS

Abstract

Sung speech shows significant acoustic differences from spoken speech. One challenge in analyzing both spoken and sung speech is identifying the individual speech sounds. Forced-alignment systems such as P2FA [1] and the Montreal Forced Aligner [2] have been designed to accomplish this task for spoken speech; however, there is no such tool for sung speech. Previous work used a combination of hidden Markov models and convolutional neural networks on log-Mel filterbanks to segment phones in sung Mandarin opera [3]. We, in turn, trained a deep neural network to extract phone-level information from a sung acoustic signal. The primary objective was to create a model that can take a WAV file containing a target song as input and automatically produce time-aligned phonemic labels as output. To measure the performance of our model on this task, we primarily measured the accuracy of identifying the correct phone label at a given time-step. We also compared the accuracy of our model to that of state-of-the-art systems trained on spoken speech performing the same task on sung speech.
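Editorial note: the evaluation described above amounts to frame-level phone classification over log-Mel filterbank features. Below is a minimal sketch of that measurement, not taken from the paper; the feature settings (16 kHz audio, 40 Mel bands, 10 ms hop) and the names `model` and `ref_phones` are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's code): log-Mel features from a WAV file
# and frame-level phone accuracy against a reference alignment.
import numpy as np
import librosa

def log_mel_frames(wav_path, sr=16000, n_mels=40, hop_length=160):
    """Return log-Mel filterbank features, one row per ~10 ms frame (assumed settings)."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop_length)
    return librosa.power_to_db(mel).T            # shape: (n_frames, n_mels)

def frame_accuracy(predicted_phones, reference_phones):
    """Fraction of frames whose predicted phone label matches the reference alignment."""
    n = min(len(predicted_phones), len(reference_phones))
    return float(np.mean(np.asarray(predicted_phones[:n]) == np.asarray(reference_phones[:n])))

# Usage with a hypothetical trained frame classifier:
# feats = log_mel_frames("target_song.wav")
# predicted = model.predict(feats)               # one phone label per frame
# print(frame_accuracy(predicted, ref_phones))   # accuracy at each time-step
```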
