Synthetic speech detection using fundamental frequency variation and spectral features

Monisankha Pal; Dipjyoti Paul; Goutam Saha

Highlights

• Proposes synthetic speech detection based on score-level fusion of CQCC, APGDF and fundamental frequency variation (FFV) features.

• Achieves the best spoofing detection performance on the ASVspoof 2015 evaluation dataset, with an overall EER of 0.05%.

• Delivers state-of-the-art performance for ASV integrated with a countermeasure framework.

• Shows superior performance in the generalization ability assessment.
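
The equal error rate (EER) quoted in the highlights is the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch, assuming higher scores mean "more likely natural speech" and using a simple threshold sweep (the function name and the toy scores are illustrative, not taken from the paper or from the ASVspoof tooling), it could be approximated like this:

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: a trial is accepted as natural speech when its
    score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap = None
    best_eer = None
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


# Toy scores, invented purely for illustration.
genuine = [2.0, 3.0, 4.0, 5.0]
spoof = [0.0, 1.0, 2.5, 1.5]
eer = equal_error_rate(genuine, spoof)
```

With real evaluation data the score lists would hold one detection score per trial; evaluation toolkits typically interpolate between thresholds rather than pick the closest observed one, so this sketch is only a coarse approximation.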

Abstract

Recent work on the vulnerability of automatic speaker verification (ASV) systems confirms that malicious spoofing attacks using synthetic speech can provoke a significant increase in the false acceptance rate. Reliable detection of synthetic speech is key to developing countermeasures against synthetic-speech-based spoofing attacks. In this paper, we address this by focusing on three major types of artifacts, related to magnitude, phase and pitch variation, that are introduced during the generation of synthetic speech. We propose a new approach to detecting synthetic speech using score-level fusion of three front-end features, namely constant Q cepstral coefficients (CQCCs), the all-pole group delay function (APGDF) and fundamental frequency variation (FFV). CQCC and APGDF were each used earlier for the spoofing detection task and yielded the best performance among magnitude-spectrum and phase-spectrum features, respectively. The novel FFV feature, introduced in this paper to extract pitch variation at the frame level, provides information complementary to CQCC and APGDF. Experimental results show that the proposed approach produces the best stand-alone spoofing detection performance with a Gaussian mixture model (GMM) based classifier on the ASVspoof 2015 evaluation dataset. The proposed method obtains an overall equal error rate (EER) of 0.05%, a relative improvement of 76.19% over the next-best reported result. In addition to outperforming all existing baseline features for both known and unknown attacks, the proposed feature combination yields superior performance for an ASV system (GMM with universal background model/i-vector) integrated with the countermeasure framework. Further, the proposed method shows relatively better generalization ability when copy-synthesized data, limited spoofing data, or both are available a priori in the training pool.
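
The abstract describes a GMM-based classifier applied per feature, followed by score-level fusion. The sketch below illustrates that general idea under stated assumptions: each feature stream is scored with a log-likelihood ratio between a "natural" and a "synthetic" diagonal-covariance GMM, and the per-feature scores are combined by a weighted sum. The fusion rule, the weights, the model parameters and all function names are hypothetical; the abstract does not specify them.

```python
import math


def gmm_frame_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            # Log of a univariate Gaussian density, summed over dimensions.
            log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def llr_score(frames, natural_gmm, synthetic_gmm):
    """Average per-frame log-likelihood ratio (natural vs. synthetic)."""
    total = 0.0
    for frame in frames:
        total += gmm_frame_loglik(frame, *natural_gmm)
        total -= gmm_frame_loglik(frame, *synthetic_gmm)
    return total / len(frames)


def fuse_scores(scores, fusion_weights):
    """Score-level fusion as a weighted sum (an assumed fusion rule)."""
    return sum(fusion_weights[name] * scores[name] for name in scores)


# Toy single-component, one-dimensional models: (weights, means, variances).
natural_gmm = ([1.0], [[0.0]], [[1.0]])
synthetic_gmm = ([1.0], [[2.0]], [[1.0]])
toy_llr = llr_score([[0.0]], natural_gmm, synthetic_gmm)

# Hypothetical per-feature scores and fusion weights for one utterance.
scores = {"cqcc": 1.0, "apgdf": 2.0, "ffv": -1.0}
fusion_weights = {"cqcc": 0.5, "apgdf": 0.3, "ffv": 0.2}
fused = fuse_scores(scores, fusion_weights)
```

A positive fused score would favour the "natural speech" hypothesis under these assumptions. On the reported numbers: a relative improvement of 76.19% at an EER of 0.05% is arithmetically consistent with a previous best EER of about 0.21%, since (0.21 − 0.05) / 0.21 ≈ 0.7619; that 0.21% figure is inferred from the arithmetic and is not stated in the abstract.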

Bibliographic record

Source
Computer Speech and Language | 2018, Issue 3 | pp. 31-50 | 20 pages
Authors
Monisankha Pal; Dipjyoti Paul; Goutam Saha
Author affiliations

Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur (all three authors)

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
All-pole group delay function (APGDF); Anti-spoofing; Constant Q cepstral coefficient (CQCC); Fundamental frequency variation (FFV); Score-level fusion; Spoofing attack
