Towards social virtual listeners: Computational models of human nonverbal behaviors.

Abstract

Human nonverbal communication is a highly interactive process in which the participants dynamically send and respond to nonverbal signals. These signals play a significant role in determining the nature of a social exchange. Although humans can naturally recognize, interpret, and produce these nonverbal signals in social contexts, computers are not equipped with such abilities. Therefore, creating computational models capable of holding fluid interactions with human participants has become an important topic for many research fields, including human-computer interaction, robotics, artificial intelligence, and cognitive science. Central to the problem of modeling social behaviors is the challenge of understanding the dynamics of listener backchannel feedback (i.e., the nods and paraverbals such as "uh-huh" and "mm-hmm" that listeners produce as someone is speaking). In this thesis, I present a framework for modeling the visual backchannels of a listener during a dyadic conversation. I address the four major challenges involved in modeling nonverbal human behaviors, and listener backchannels in particular:

(1) High Dimensionality: Human communication is a complicated phenomenon that involves many behaviors (i.e., dimensions) such as smiling, nodding, hand movement, and voice pitch. A better understanding and analysis of social behaviors can be obtained by discovering the subset of features relevant to a specific social signal (e.g., backchannel feedback). In this thesis, I present a new feature ranking scheme that exploits the sparsity of probabilistic models when trained on human behavior problems. This technique gives researchers a new tool to analyze individual differences in social nonverbal communication. Furthermore, I present a feature selection approach that first identifies the important behaviors for each individual, called self-features, before building a consensus.

(2) Multimodal Processing: This high-dimensional data comes from different communicative channels (modalities) that contain complementary information essential to interpreting and understanding human behaviors. Effective and efficient fusion of these modalities is therefore a challenging task. If integrated carefully, different modalities have the potential to provide complementary information that improves model performance. In this thesis, I introduce a new model called the Latent Mixture of Discriminative Experts (LMDE), which can automatically learn the temporal relationships between different modalities. Since I train separate experts for each modality, LMDE is capable of improving prediction performance even with a limited amount of data.

(3) Visual Influence: Human communication is dynamic in the sense that people affect each other's nonverbal behaviors (e.g., gesture mirroring). Therefore, when predicting the nonverbal behaviors of a person of interest, the visual gestures of the second interlocutor should also be taken into account. In this thesis, I propose a context-based prediction framework that models the visual influence of an interlocutor in a dyadic conversation, even when the visual modality from the second interlocutor is absent.

(4) Variability in Human Behaviors: It is known that age, gender, and culture affect people's social behaviors, so there are differences in the way people display and interpret nonverbal behaviors. A good model of human nonverbal behaviors should take these differences into account.

Furthermore, gathering labeled data sets is time-consuming and often expensive in many real-life scenarios. In this thesis, I use the "wisdom of crowds", which enables the parallel acquisition of opinions from multiple annotators/labelers. I propose a new approach for modeling the wisdom of crowds, called wisdom-LMDE, which is able to learn the variations and commonalities among different crowd members (i.e., labelers).
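
The feature ranking scheme is only described at a high level in the abstract. As an illustration of the general idea, the following minimal sketch ranks behaviors by the weight magnitudes that survive an L1 (sparsity-inducing) penalty, using scikit-learn's logistic regression as a stand-in probabilistic model; the feature names and synthetic data are assumptions made for the example, not the formulation used in the dissertation.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical frame-level features from a dyadic interaction (names are illustrative).
feature_names = ["speaker_pause", "pitch_slope", "gaze_at_listener", "smile", "head_velocity"]
X = rng.normal(size=(500, len(feature_names)))
# Synthetic backchannel labels driven by a small subset of the features.
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# An L1 penalty drives the weights of irrelevant behaviors toward zero,
# so the surviving weight magnitudes induce a feature ranking.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
for name, w in sorted(zip(feature_names, np.abs(model.coef_[0])), key=lambda t: -t[1]):
    print(f"{name:18s} |w| = {w:.3f}")

Behaviors whose weights are driven to zero drop out of the ranking; fitting the same kind of model per participant would yield a per-person ranking in the spirit of the self-features mentioned above.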
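Likewise, the Latent Mixture of Discriminative Experts is only named in the abstract. The sketch below shows the broader two-stage mixture-of-experts idea it builds on, with one independently trained expert per modality and a learned combiner over their outputs; the latent-variable sequence layer of the actual LMDE is omitted, and the modality names, shapes, and data are invented for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_frames = 400

# Hypothetical per-modality feature streams for the same conversation frames.
modalities = {
    "prosody": rng.normal(size=(n_frames, 4)),
    "visual":  rng.normal(size=(n_frames, 6)),
    "lexical": rng.normal(size=(n_frames, 3)),
}
# Synthetic backchannel labels depending on two of the modalities.
y = (modalities["prosody"][:, 0] + modalities["visual"][:, 1] > 0).astype(int)

# Stage 1: train one discriminative expert per modality, each on its own features.
experts = {name: LogisticRegression(max_iter=1000).fit(X, y) for name, X in modalities.items()}

# Stage 2: fuse the experts' frame-level probabilities with a combiner
# that learns how much to trust each modality.
expert_probs = np.column_stack(
    [experts[name].predict_proba(modalities[name])[:, 1] for name in modalities]
)
combiner = LogisticRegression(max_iter=1000).fit(expert_probs, y)
print("fused training accuracy:", combiner.score(expert_probs, y))

Because each expert sees only its own modality, every individual model stays small, which is one reason such a scheme can remain usable when training data are limited.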

Bibliographic details

  • Author: Ozkan, Derya
  • Author affiliation: University of Southern California
  • Degree grantor: University of Southern California
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 129 p.
  • Format: PDF
  • Language: eng
