Deep-Learning-Based Multimodal Emotion Classification for Music Videos


Abstract

Music videos contain a great deal of visual and acoustic information. Each information source within a music video influences the emotions conveyed through the audio and video, suggesting that only a multimodal approach is capable of achieving efficient affective computing. This paper presents an affective computing system that relies on music, video, and facial expression cues, making it useful for emotional analysis. We applied audio–video information exchange and boosting methods to regularize the training process, and reduced the computational cost by using a separable convolution strategy. In sum, our empirical findings are as follows: (1) multimodal representations efficiently capture all acoustic and visual emotional cues included in each music video; (2) the computational cost of each neural network is significantly reduced by factorizing the standard 2D/3D convolution into separate channels and spatiotemporal interactions; and (3) information-sharing methods incorporated into multimodal representations help guide individual information flow and boost overall performance. We tested our findings across several unimodal and multimodal networks against various evaluation metrics and visual analyzers. Our best classifier attained 74% accuracy, an F1-score of 0.73, and an area under the curve (AUC) score of 0.926.
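
The cost reduction from factorizing a convolution can be illustrated with a simple weight count. The sketch below is not the authors' architecture, which the abstract does not specify. It assumes a generic depthwise-separable factorization (one spatial or spatiotemporal filter per input channel, followed by a 1x1 channel-mixing convolution), and the function names and the channel sizes 64 and 128 are hypothetical values chosen only for illustration.

```python
import math


def standard_conv_params(c_in, c_out, kernel):
    """Weight count of a standard convolution (bias terms omitted)."""
    return c_in * c_out * math.prod(kernel)


def separable_conv_params(c_in, c_out, kernel):
    """Weight count of a depthwise convolution followed by a pointwise one.

    Assumption: a generic depthwise-separable layout, not necessarily the
    exact factorization used in the paper.
    """
    depthwise = c_in * math.prod(kernel)  # one filter per input channel
    pointwise = c_in * c_out              # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes: 64 input channels, 128 output channels.
    for name, kernel in [("2D 3x3", (3, 3)), ("3D 3x3x3", (3, 3, 3))]:
        std = standard_conv_params(64, 128, kernel)
        sep = separable_conv_params(64, 128, kernel)
        print(f"{name}: standard={std} separable={sep} ratio={std / sep:.1f}x")
```

Under these assumptions the saving grows with kernel volume, which is why the factorization matters more for 3D (video) kernels than for 2D ones.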

Bibliographic details

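Journal: Sensors (Basel, Switzerland)
Authors: Yagya Raj Pandeya; Bhuwan Bhattarai; Joonwhoan Lee
Year (volume), issue: 2021 (21), 14
Year: 2021
Page: 4927
Total pages: 22
Original format: PDF
Keywords: channel and filter separable convolution; end-to-end emotion classification; unimodal and multimodal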

Similar literature

1. Deep learning-based late fusion of multimodal information for emotion classification of music video [J]. Yagya Raj Pandeya, Joonwhoan Lee. Multimedia Tools and Applications, 2021, No. 2.
2. Classification of emotions induced by music videos and correlation with participants' rating [J]. Syed Naser Daimi, Goutam Saha. Expert Systems with Applications, 2014, No. 13.
3. A Multimodal Music Emotion Classification Method Based on Multifeature Combined Network Classifier [J]. Changfeng Chen, Qiang Li. Mathematical Problems in Engineering: Theory, Methods and Applications, 2020, No. 1.
4. Multi-label Emotion Classification in Music Videos Using Ensembles of Audio and Video Features [C]. Bruno Kostiuk, Yandre M. G. Costa, Alceu S. Britto. IEEE International Conference on Tools with Artificial Intelligence, 2019.
5. Multimodal Sensing and Data Processing for Speaker and Emotion Recognition Using Deep Learning Models with Audio, Video and Biomedical Sensors [D]. Farnaz Abtahi. 2018.
6. Women’s Empowerment Agency and Self-Determination in Afrobeats Music Videos: A Multimodal Critical Discourse Analysis [O]. Simphiwe Emmanuel Rens. 2021.
7. Deep learning-based late fusion of multimodal information for emotion classification of music video [O]. Yagya Raj Pandeya, Joonwhoan Lee. 2020.
