USING LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK FOR SPEAKER DIARIZATION SEGMENTATION

Abstract

Speaker diarization is performed on audio data that includes speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points that divide the audio data into segments. The speaker diarization further includes assigning to each segment, using the LSTM RNN, a label selected from a group of labels. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
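The relationship between labels, change points, and segments can be illustrated with a minimal Python sketch. It does not implement the LSTM RNN itself; it assumes, purely for illustration, that the network has already produced one label per audio frame, which the abstract does not state. The label names and the functions `segment`, `change_points`, and `speech_segments` are hypothetical.

```python
from itertools import groupby

# Hypothetical label names; the abstract only says the group of labels
# covers the first speaker, the second speaker, and silence.
SPEAKER_1, SPEAKER_2, SILENCE = "speaker_1", "speaker_2", "silence"


def segment(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    Assumption: one label per frame. `end` is exclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def change_points(segments):
    """Frame indices at which the label switches to a different one."""
    return [start for _, start, _ in segments[1:]]


def speech_segments(segments):
    """Keep only the segments labeled with a speaker, dropping silence."""
    return [s for s in segments if s[0] != SILENCE]


# Toy input standing in for the network's per-frame output.
frames = [SILENCE, SILENCE, SPEAKER_1, SPEAKER_1, SPEAKER_1,
          SILENCE, SPEAKER_2, SPEAKER_2]

segs = segment(frames)
points = change_points(segs)
speech = speech_segments(segs)
```

In this sketch, only the segments returned by `speech_segments` would be passed on to a speech recognizer, mirroring the final sentence of the abstract.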
