TEXT INDEPENDENT SPEAKER-VERIFICATION ON A MEDIA OPERATING SYSTEM USING DEEP LEARNING ON RAW WAVEFORMS

Abstract

An artificial neural network architecture is provided for processing raw audio waveforms to create speaker representations that are used for text-independent speaker verification and recognition. The artificial neural network architecture includes a strided convolution layer, first and second sequentially connected residual blocks, a transformer layer, and a final fully connected (FC) layer. The strided convolution layer is configured to receive raw audio waveforms from a speaker. The first and second residual blocks each include multiple convolutional and max pooling layers. The transformer layer is configured to aggregate frame-level embeddings into an utterance-level embedding. The output of the FC layer is a speaker representation for the speaker whose raw audio waveforms were input to the strided convolution layer.
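The data flow described in the abstract can be illustrated by tracing how the sequence length shrinks from raw samples to a single utterance-level vector. The sketch below is only an illustration under stated assumptions: the abstract gives no kernel sizes, strides, or layer counts, so every default value here (a kernel and stride of 3, three max pooling layers per residual block, length-preserving convolutions inside each block) is hypothetical, as are the function names `out_len` and `trace_frames`. It computes shapes only and does not implement the network itself.

```python
def out_len(n, kernel, stride, padding=0):
    """Output length of a 1-D convolution or pooling layer (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_frames(num_samples, conv_kernel=3, conv_stride=3,
                 pool_size=3, pools_per_block=3, num_blocks=2):
    """Return (stage, sequence length) pairs for the pipeline in the abstract.

    All default sizes are hypothetical; the abstract does not specify them.
    """
    stages = []

    # Strided convolution layer: consumes the raw waveform samples directly.
    frames = out_len(num_samples, conv_kernel, conv_stride)
    stages.append(("strided_conv", frames))

    # Two sequentially connected residual blocks.
    for block in range(1, num_blocks + 1):
        for _ in range(pools_per_block):
            # Assumption: convolutions inside a block preserve length
            # ("same" padding), so only max pooling shortens the sequence.
            frames = out_len(frames, pool_size, pool_size)
        stages.append((f"residual_block_{block}", frames))

    # Transformer layer: aggregates all frame-level embeddings into one
    # utterance-level embedding, so the sequence length collapses to 1.
    stages.append(("transformer_layer", 1))

    # Final FC layer: maps the utterance-level embedding to the
    # speaker representation (still a single vector).
    stages.append(("fc_layer", 1))
    return stages


for stage, length in trace_frames(59049):
    print(f"{stage:18s} {length}")
```

With these assumed sizes, a waveform of 59049 samples yields 19683 frames after the strided convolution, 729 after the first residual block, and 27 after the second, which the transformer layer then reduces to one utterance-level embedding. Different kernel, stride, or pooling choices would change these numbers but not the overall shape of the pipeline.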

Bibliographic data

Publication number: WO2021133714A1
Publication date: 2021-07-01
Original format: PDF
Applicant: ALPHONSO INC.
Application number: WO2020US66337
Inventors: MUHAMED AASHIQ; GHOSE SUSMITA
Filing date: 2020-12-21
Classification: G10L17/18; G06N3/04
Country: US
Date indexed: 2022-08-24 19:52:18
