Synthetic speech detection using fundamental frequency variation and spectral features
Journal: Computer Speech and Language

Highlights

  • Synthetic speech detection is proposed using score fusion of CQCC, APGDF and fundamental frequency variation (FFV) features.
  • Best spoofing detection performance on the ASVspoof 2015 evaluation dataset, with an overall EER of 0.05%.
  • State-of-the-art performance for ASV integrated with a countermeasure framework.
  • Superior performance in the generalization ability assessment.

Abstract

Recent work on the vulnerability of automatic speaker verification (ASV) systems confirms that malicious spoofing attacks using synthetic speech can significantly increase the false acceptance rate. Reliable detection of synthetic speech is key to developing countermeasures against synthetic-speech-based spoofing attacks. This paper addresses that goal by focusing on three major types of artifacts, related to magnitude, phase and pitch variation, that are introduced during the generation of synthetic speech. We propose a new approach that detects synthetic speech using score-level fusion of three front-end features: constant Q cepstral coefficients (CQCCs), the all-pole group delay function (APGDF) and fundamental frequency variation (FFV). CQCC and APGDF have each been used previously for spoofing detection, where they yielded the best performance among magnitude-spectrum and phase-spectrum features, respectively. The FFV feature, introduced in this paper to capture pitch variation at the frame level, provides information complementary to CQCC and APGDF. Experimental results show that the proposed approach produces the best stand-alone spoofing detection performance on the ASVspoof 2015 evaluation dataset using a Gaussian mixture model (GMM) based classifier. The proposed method obtains an overall equal error rate (EER) of 0.05%, a relative improvement of 76.19% over the next best reported result.

In addition to outperforming all existing baseline features for both known and unknown attacks, the proposed feature combination yields superior performance for an ASV system (GMM with universal background model/i-vector) integrated with a countermeasure framework. Further, the proposed method shows relatively better generalization ability when either one or both of copy-synthesized data and limited spoofing data are available a priori in the training pool.
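
The two quantities the abstract relies on, score-level fusion and the equal error rate, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say how the fusion weights are chosen, so equal weights are used; the function names are hypothetical; and the per-trial scores are invented toy values, not results from the paper. The 0.21% baseline in the last line is not stated in the abstract either; it is only the value implied by the reported 0.05% EER and 76.19% relative improvement.

```python
from typing import List, Sequence


def fuse_scores(system_scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-trial scores from several countermeasure systems."""
    if len(system_scores) != len(weights):
        raise ValueError("need one weight per system")
    n_trials = len(system_scores[0])
    if any(len(s) != n_trials for s in system_scores):
        raise ValueError("all systems must score the same trials")
    return [sum(w * s[i] for w, s in zip(weights, system_scores))
            for i in range(n_trials)]


def equal_error_rate(genuine: Sequence[float], spoof: Sequence[float]) -> float:
    """Approximate EER, assuming a higher score means 'more likely genuine'.

    Every observed score is tried as a threshold; the result is the mean of
    the false acceptance and false rejection rates where they are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(spoof)):
        frr = sum(g < t for g in genuine) / len(genuine)  # genuine rejected
        far = sum(s >= t for s in spoof) / len(spoof)     # spoof accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction in percent."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Toy scores (invented): trials 0-3 are genuine speech, trials 4-7 are spoofed.
cqcc = [2.0, 1.5, -0.5, 1.0, -1.0, -2.0, 0.5, -1.5]
apgdf = [1.0, -0.2, 1.2, 0.8, -0.8, 0.3, -1.5, -1.0]
ffv = [0.5, 0.9, 0.7, -0.3, 0.4, -0.9, -0.6, -0.2]

# Equal weights are an assumption, not the paper's fusion rule.
fused = fuse_scores([cqcc, apgdf, ffv], [1 / 3, 1 / 3, 1 / 3])

cqcc_eer = equal_error_rate(cqcc[:4], cqcc[4:])     # 0.25 on this toy data
fused_eer = equal_error_rate(fused[:4], fused[4:])  # 0.0 on this toy data
implied_gain = relative_improvement(0.21, 0.05)     # about 76.19

print(cqcc_eer, fused_eer, round(implied_gain, 2))
```

On the toy data, each system misranks at least one trial on its own, while the fused score separates the two classes, which is the sense in which complementary features help at score level.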
