IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Automatic Singing Transcription Based on Encoder-decoder Recurrent Neural Networks with a Weakly-supervised Attention Mechanism

Abstract

This paper describes a neural singing transcription method that estimates a sequence of musical notes directly from the audio signal of a singing voice in an end-to-end manner, without time-aligned training data. A conventional approach to singing transcription is to perform vocal F0 estimation followed by musical note estimation. The performance of this approach, however, is severely limited because F0 estimation errors propagate to the note estimation step and rich acoustic information cannot be used. In addition, splitting continuous singing-voice signals into segments corresponding to musical notes to produce precise time-aligned transcriptions is difficult and time-consuming. To solve these problems, we use an encoder-decoder model with an attention mechanism that can automatically learn an input-output alignment and mapping, even from non-aligned training data. The main challenge of our study is to estimate temporal categories (note values) in addition to instantaneous categories (pitches). We thus propose a novel loss function on the attention weights of time-aligned notes for semi-supervised alignment training. By gradually reducing the weight of this loss function, a better input-output alignment can be learned much more quickly. We show that our method performs well on isolated singing voices in popular music.
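The abstract does not give the exact form of the loss on the attention weights, but its two ingredients can be sketched: a cross-entropy between each note's attention distribution and a target distribution derived from time-aligned annotations, plus a weight that is gradually reduced over training. The following is a minimal illustrative sketch under those assumptions; all function and variable names are hypothetical, not taken from the paper.

```python
import numpy as np

def alignment_loss(attn, note_mask, eps=1e-8):
    """Hypothetical alignment loss: cross-entropy between each note's
    attention distribution over input frames and a target distribution
    built from a binary time-aligned mask (1 where the note sounds).

    attn:      (num_notes, num_frames), each row sums to 1
    note_mask: (num_notes, num_frames), binary alignment mask
    """
    # Turn the binary mask into a per-note probability distribution.
    target = note_mask / (note_mask.sum(axis=1, keepdims=True) + eps)
    # Mean negative log-likelihood of the annotated frames.
    return float(-(target * np.log(attn + eps)).sum() / attn.shape[0])

def annealed_weight(epoch, init=1.0, decay=0.9):
    """Exponentially shrink the alignment-loss weight so the model
    gradually comes to rely on its own learned alignment."""
    return init * decay ** epoch

# Demo: attention that agrees with the annotation incurs a lower loss
# than attention concentrated on the wrong frames.
mask = np.array([[1, 1, 0, 0],
                 [0, 0, 1, 1]], dtype=float)
good = np.array([[0.45, 0.45, 0.05, 0.05],
                 [0.05, 0.05, 0.45, 0.45]])
bad = good[:, ::-1].copy()  # attention flipped onto the wrong frames
loss_good = alignment_loss(good, mask)
loss_bad = alignment_loss(bad, mask)
```

In a full training loop, `annealed_weight(epoch) * alignment_loss(...)` would be added to the main transcription loss, so the alignment supervision dominates early training and fades out later.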
