Computer Speech and Language

Unsupervised speech representation learning for behavior modeling using triplet enhanced contextualized networks


Abstract

Speech encodes a wealth of information related to human behavior and has been used in a variety of automated behavior recognition tasks. However, extracting behavioral information from speech remains challenging, in part because of inadequate training data resources stemming from the often low occurrence frequencies of specific behavioral patterns. Moreover, supervised behavioral modeling typically relies on domain-specific construct definitions and corresponding manually-annotated data, which makes generalization across domains challenging. In this paper, we exploit the stationary properties of human behavior within an interaction and present a representation learning method that captures behavioral information from speech in an unsupervised way. We hypothesize that nearby segments of speech share the same behavioral context and hence map onto similar underlying behavioral representations. We present an encoder-decoder based Deep Contextualized Network (DCN) as well as a Triplet-Enhanced DCN (TE-DCN) framework that capture the behavioral context and derive a manifold representation in which speech frames with similar behaviors lie closer together, while frames of different behaviors remain farther apart. The models are trained on movie audio data and validated on diverse domains, including a couples therapy corpus and other publicly collected data (e.g., stand-up comedy). With encouraging results, our proposed framework demonstrates the feasibility of unsupervised learning for cross-domain behavioral modeling.
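
To make the triplet idea concrete, the following is a minimal PyTorch sketch of that component only, not the authors' released code or exact architecture: nearby speech segments are treated as an anchor/positive pair (assumed to share a behavioral context), a temporally distant segment serves as the negative, and a margin loss pulls same-context embeddings together. All class names, layer sizes, and hyperparameters here are illustrative assumptions.

```python
# Hypothetical sketch of a triplet objective over speech-segment embeddings.
import torch
import torch.nn as nn

class SegmentEncoder(nn.Module):
    """Toy encoder mapping a sequence of acoustic frames to one embedding."""
    def __init__(self, feat_dim=40, hidden=128, emb_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, x):                 # x: (batch, frames, feat_dim)
        _, h = self.rnn(x)                # h: (1, batch, hidden)
        return self.proj(h.squeeze(0))    # (batch, emb_dim)

encoder = SegmentEncoder()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Dummy batch: anchor and positive stand in for adjacent segments from the same
# interaction; negative stands in for a temporally distant segment.
anchor   = torch.randn(8, 100, 40)
positive = torch.randn(8, 100, 40)
negative = torch.randn(8, 100, 40)

loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
```

In the paper's formulation this triplet term augments an encoder-decoder (DCN) reconstruction objective to yield the TE-DCN; the sketch above omits the decoder and shows only how a margin-based loss would enforce that frames with similar behavioral context stay closer than frames with different context.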
