International Conference on Speech and Computer

Gaze, Prosody and Semantics: Relevance of Various Multimodal Signals to Addressee Detection in Human-Human-Computer Conversations



Abstract

The present research focuses on multimodal addressee detection in human-human-computer conversations. A modern spoken dialogue system operating under realistic conditions, which may include multiparty interaction (several people solving a cooperative task by addressing the system while talking to each other), is expected to distinguish machine-addressed from human-addressed utterances. Machine-addressed queries should be responded to directly, while human-addressed utterances should be either ignored or processed implicitly. We propose a multimodal system that performs visual, acoustic-prosodic, and textual analysis of users' utterances. Applying this system, we outperform the existing baseline for the Smart Video Corpus. We also investigated how the individual models perform on speech categories of varying spontaneity and found that the acoustic model has difficulty classifying constrained speech, the textual model performs worse on spontaneous speech, and the performance of the visual model drops significantly for both read and spontaneous human-addressed speech owing to the ambiguous behaviour of users.
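
The abstract describes combining gaze-based (visual), acoustic-prosodic, and textual evidence into a single machine- versus human-addressed decision. The sketch below illustrates one simple way such evidence could be fused at the score level; the ModalityScores fields, weights, and threshold are illustrative assumptions only and do not reproduce the authors' models or the Smart Video Corpus pipeline.

    # Hypothetical late-fusion sketch (not the authors' implementation): combine
    # per-modality probabilities that an utterance is machine-addressed.
    from dataclasses import dataclass

    @dataclass
    class ModalityScores:
        visual: float             # e.g. from gaze-on-screen estimation
        acoustic_prosodic: float  # e.g. from pitch/energy/tempo features
        textual: float            # e.g. from semantics of the ASR transcript

    def classify_addressee(s: ModalityScores,
                           weights=(0.4, 0.3, 0.3),
                           threshold=0.5) -> str:
        """Weighted score-level fusion; weights and threshold are illustrative."""
        fused = (weights[0] * s.visual
                 + weights[1] * s.acoustic_prosodic
                 + weights[2] * s.textual)
        return "machine-addressed" if fused >= threshold else "human-addressed"

    # User looks at the screen and issues a command-like query.
    print(classify_addressee(ModalityScores(0.9, 0.7, 0.8)))  # machine-addressed
    # Users chat with each other while glancing away from the screen.
    print(classify_addressee(ModalityScores(0.2, 0.4, 0.3)))  # human-addressed

Score-level (late) fusion is shown here only because it keeps the three modality classifiers independent, which fits the per-modality performance analysis reported in the abstract; the fusion strategy actually used by the authors is not specified in this excerpt.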
