Journal of Intelligent Information Systems

Singer identification based on computational auditory scene analysis and missing feature methods


Abstract

A major challenge in identifying singers from monaural popular music recordings is removing or alleviating the influence of the accompaniment. Our system works in two stages. In the first stage, we use computational auditory scene analysis (CASA) to segregate the singing-voice units from the mixture signal. First, the pitch of the singing voice is estimated and used to extract pitch-based features for each time-frequency (T-F) unit of the acoustic vector. These features are then used to estimate a binary T-F mask, in which 1 indicates that the corresponding T-F unit is dominated by the singing voice and 0 indicates that it is not. Units dominated by the singing voice are considered reliable; the remaining units are treated as unreliable, or missing, so the acoustic vector is incomplete. In the second stage, two missing-feature methods, reconstruction of the acoustic vector and marginalization, are used to identify the singer from these incomplete acoustic vectors. In the reconstruction method, the complete acoustic vector is first reconstructed and then converted into Gammatone frequency cepstral coefficients (GFCCs), which are used to identify the singer. In the marginalization method, the probability that the voice belongs to a given singer is computed from the reliable components only. We find that the reconstruction method outperforms the marginalization method, and that both methods perform significantly better than the system they are compared against, especially at signal-to-accompaniment ratios (SARs) of 0 dB and -3 dB.
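To make the marginalization idea concrete, here is a minimal sketch that scores frames against per-singer models using only the reliable T-F units. It rests on assumptions the abstract does not state: each frame is a vector of per-channel values (for example, log energies in gammatone channels), each singer is modeled by a diagonal-covariance Gaussian mixture model (GMM), and the binary mask is taken as already produced by the CASA stage. It uses plain (unbounded) marginalization, in which each unreliable dimension is integrated over the whole real line and so simply drops out of each component's product; the paper may use a different variant, such as bounded marginalization. All function names and the toy models are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model format. One mixture component is
# (weight, per-dimension means, per-dimension variances).
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def log_gauss_1d(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def marginal_log_likelihood(frame: Sequence[float], mask: Sequence[int], gmm: GMM) -> float:
    """Log p(reliable part of frame | GMM), assuming diagonal covariances.

    Integrating an unreliable dimension over the whole real line contributes
    a factor of 1, so only dimensions with mask == 1 (voice-dominated units)
    enter each component's product of 1-D Gaussians.
    """
    comp_logs = []
    for weight, means, variances in gmm:
        lp = math.log(weight)
        for x, reliable, mu, var in zip(frame, mask, means, variances):
            if reliable:
                lp += log_gauss_1d(x, mu, var)
        comp_logs.append(lp)
    # Numerically stable log-sum-exp over the mixture components.
    top = max(comp_logs)
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))


def identify_singer(
    frames: Sequence[Sequence[float]],
    masks: Sequence[Sequence[int]],
    models: Dict[str, GMM],
) -> str:
    """Pick the singer whose model gives the highest total marginal
    log-likelihood, treating frames as independent."""
    scores = {
        name: sum(marginal_log_likelihood(f, m, gmm) for f, m in zip(frames, masks))
        for name, gmm in models.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy two-channel, single-component models. The second value of the
    # frame stands in for an accompaniment-dominated (unreliable) unit.
    models = {
        "A": [(1.0, [0.0, 0.0], [1.0, 1.0])],
        "B": [(1.0, [5.0, 5.0], [1.0, 1.0])],
    }
    frames = [[0.1, 9.0]]
    print(identify_singer(frames, [[1, 0]], models))  # A: only the reliable channel is scored
    print(identify_singer(frames, [[1, 1]], models))  # B: the unreliable value skews the decision
```

In the toy run, masking out the accompaniment-dominated channel lets the reliable channel decide in favor of singer A, whereas scoring every channel lets the corrupted value pull the decision toward singer B. This illustrates why discarding unreliable units, rather than scoring them as if they were clean, can help identification when the accompaniment is strong.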