Restricted Boltzmann machines for vector representation of speech in speaker recognition

Omid Ghahabi; Javier Hernando

Abstract

Over the last few years, i-vectors have been the state-of-the-art technique in speaker recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors, but the DL techniques in use are computationally expensive and need phonetically labeled background data. The aim of this work is to develop an efficient alternative vector representation of speech that keeps the computational cost as low as possible and avoids phonetic labels, which are not always accessible. The proposed vectors are based on both Gaussian Mixture Models (GMM) and Restricted Boltzmann Machines (RBM) and are referred to as GMM-RBM vectors. The role of the RBM is to learn the total speaker and session variability among background GMM supervectors. This RBM, referred to as the Universal RBM (URBM), is then used to transform unseen supervectors into the proposed low-dimensional vectors. The use of different activation functions for training the URBM and of different transformation functions for extracting the proposed vectors is investigated. Finally, a variant of Rectified Linear Units (ReLU), referred to as variable ReLU (VReLU), is proposed. Experiments on core test condition 5 of NIST SRE 2010 show that results comparable to conventional i-vectors are achieved with a clearly lower computational load in the vector extraction process.
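
The pipeline described in the abstract (background supervectors train one RBM, whose hidden layer then maps unseen supervectors to low-dimensional vectors) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the authors' method: synthetic random data stands in for GMM supervectors, the RBM uses unit-variance Gaussian visible units with sigmoid hidden units and one-step contrastive divergence (CD-1), the dimensions and learning rate are arbitrary, and the linear projection in `extract_vector` is only one possible transformation function. VReLU is not shown because the abstract does not define it.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(X, n_hidden, epochs=20, lr=0.01, batch_size=32, seed=0):
    """Train an RBM with CD-1 (assumed setup: unit-variance Gaussian
    visible units, binary hidden units with sigmoid activation)."""
    rng = np.random.default_rng(seed)
    n_visible = X.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            v0 = X[order[start:start + batch_size]]
            # Positive phase: hidden probabilities and a binary sample.
            h0_prob = sigmoid(v0 @ W + b_hid)
            h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
            # Negative phase: mean reconstruction of the Gaussian visibles.
            v1 = h0 @ W.T + b_vis
            h1_prob = sigmoid(v1 @ W + b_hid)
            n = len(v0)
            W += lr * (v0.T @ h0_prob - v1.T @ h1_prob) / n
            b_vis += lr * (v0 - v1).mean(axis=0)
            b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid


def extract_vector(supervector, W, b_hid):
    """Map a supervector to a low-dimensional vector.
    Assumption: a plain linear projection onto the hidden layer."""
    return supervector @ W + b_hid


# Synthetic stand-ins for background GMM supervectors (hypothetical sizes).
rng = np.random.default_rng(1)
background = rng.standard_normal((200, 64))
mean, std = background.mean(axis=0), background.std(axis=0)

# Train the "universal" RBM on normalized background data.
W, b_vis, b_hid = train_rbm((background - mean) / std, n_hidden=8)

# Transform an unseen supervector with the trained RBM.
unseen = rng.standard_normal(64)
gmm_rbm_vector = extract_vector((unseen - mean) / std, W, b_hid)
print(gmm_rbm_vector.shape)
```

In this sketch, extraction is a single matrix-vector product, which is in line with the low extraction cost the abstract reports, although the actual cost comparison depends on details the abstract does not give.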

Bibliographic details

Source
Computer Speech and Language | 2018, Issue 1 | pp. 16-29 | 14 pages
Authors
Omid Ghahabi; Javier Hernando
Author affiliations

TALP Research Center, Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain;

TALP Research Center, Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Restricted Boltzmann machine; Deep learning; Variable rectified linear unit; Speaker recognition; GMM-RBM vector; i-vector;


Similar literature

1. Auditory feature representation using convolutional restricted Boltzmann machine and Teager energy operator for speech recognition [J]. Sailor Hardik B., Patil Hemant A. The Journal of the Acoustical Society of America, 2017, No. 6.
2. Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows [J]. Umair Khan, Pooyan Safari, Javier Hernando. Applied Sciences, 2019, No. 13.
3. Speaker Recognition System for Limited Speech Data Using High-Level Speaker Specific Features and Support Vector Machines [J]. Satyanand Singh, Assaf Mansour H., Nitin Agarwal. International Journal of Applied Engineering Research, 2017, No. 19aPta1.
4. Restricted Boltzmann Machine supervectors for speaker recognition [C]. Ghahabi Omid, Hernando Javier. IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
5. Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition [D]. Guo, Jinxi. 2019.
6. Attention-Based Recurrent Temporal Restricted Boltzmann Machine for Radar High Resolution Range Profile Sequence Recognition [O]. Yifan Zhang, Xunzhang Gao, Xuan Peng. 2018.
7. Restricted Boltzmann machines for vector representation of speech in speaker recognition [O]. Ghahabi, Omid, Hernando Pericás, Francisco Javier. 2018.
