USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK FOR SPEAKER DIARIZATION SEGMENTATION

Abstract

Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
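
The relationship between labels, change points, and segments described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the LSTM RNN is not modeled, and the per-frame label list `frames` is a hand-written stand-in for whatever the network would emit. The label strings and the function names `segment`, `change_points`, and `speech_segments` are hypothetical, chosen only for this illustration.

```python
from itertools import groupby

# Hypothetical label set mirroring the abstract: two speakers plus silence.
LABELS = ("speaker_1", "speaker_2", "silence")


def segment(frame_labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def change_points(segments):
    """Frame indices at which the label switches to a different one."""
    return [start for _, start, _ in segments[1:]]


def speech_segments(segments):
    """Keep only the segments attributed to a speaker (candidates for speech recognition)."""
    return [seg for seg in segments if seg[0] != "silence"]


# Stand-in for per-frame network output (assumed, not taken from the patent).
frames = (
    ["silence"] * 2
    + ["speaker_1"] * 3
    + ["speaker_2"] * 2
    + ["silence"]
    + ["speaker_1"] * 2
)

segs = segment(frames)
points = change_points(segs)
speech = speech_segments(segs)

for label, start, end in segs:
    print(f"{label}: frames {start}..{end - 1}")
print("change points:", points)
```

Because adjacent identical labels are merged into one run, every change point in this sketch is by construction a transition between two different labels, which matches the definition given in the abstract. Only the segments returned by `speech_segments` would be passed on to a speech recognizer.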

Bibliographic data

Publication number: US2018166066A1

Patent type:
Publication date: 2018-06-14

Original format: PDF
Applicant/Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Application number: US201615379010
Inventors: SAMUEL THOMAS; MICHAEL PICHENY; DIMITRIOS B. DIMITRIADIS; GEORGE SAON; DAVID C. HAWS

Filing date: 2016-12-14
Classification: G10L15/04; G10L15/16; G10L25/81; G10L15/06
Country: US
Added to database: 2022-08-21 13:01:57
