IEEE International Conference on Multimedia and Expo

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Abstract

This paper presents a new framework for action recognition with multi-modal data. A skeleton-indexed feature learning procedure is developed to further exploit the detailed local features from RGB and optical flow videos. In particular, the proposed framework is built on a deep Convolutional Network (ConvNet) and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). A skeleton-indexed transform layer is designed to automatically extract visual features around key joints, and a part-aggregated pooling is developed to uniformly regulate the visual features from different body parts and actors. In addition, several fusion schemes are explored to take advantage of the multi-modal data. The proposed deep architecture is end-to-end trainable and can better incorporate different modalities to learn effective feature representations. Quantitative experimental results on two datasets, the NTU RGB+D dataset and the MSR dataset, demonstrate the excellent performance of our scheme over other state-of-the-art methods. To our knowledge, the performance obtained by the proposed framework is currently the best on the challenging NTU RGB+D dataset.
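The skeleton-indexed transform and part-aggregated pooling can be pictured as follows: per-frame ConvNet feature maps are sampled at the 2D locations of the skeleton joints, the sampled features are pooled within predefined body-part groups, and the resulting per-frame vectors are fed to an LSTM for temporal modeling. The PyTorch-style sketch below illustrates this flow under assumptions not stated in the abstract (a bilinear grid_sample lookup, max-pooling within each part, an illustrative joint-to-part grouping, and a single actor); it is not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical grouping of joints into body parts (indices are illustrative,
# not the paper's actual skeleton layout).
PART_GROUPS = {
    "torso":     [0, 1, 2, 3],
    "left_arm":  [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg":  [10, 11, 12],
    "right_leg": [13, 14, 15],
}

class SkeletonIndexedPooling(nn.Module):
    """Sample ConvNet features at joint locations and pool them per body part.

    A sketch of the "skeleton-indexed transform + part-aggregated pooling"
    idea described in the abstract; the exact operations in the paper may differ.
    """

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim * len(PART_GROUPS), hidden_dim,
                            batch_first=True)

    def forward(self, feat_maps, joints):
        # feat_maps: (B, T, C, H, W) per-frame ConvNet feature maps
        # joints:    (B, T, J, 2) joint coordinates normalized to [-1, 1]
        B, T, C, H, W = feat_maps.shape
        x = feat_maps.view(B * T, C, H, W)
        grid = joints.view(B * T, -1, 1, 2)                    # (B*T, J, 1, 2)
        # Bilinearly sample one C-dim feature vector per joint.
        sampled = F.grid_sample(x, grid, align_corners=False)  # (B*T, C, J, 1)
        sampled = sampled.squeeze(-1).permute(0, 2, 1)         # (B*T, J, C)
        # Part-aggregated pooling: max-pool joint features within each part.
        parts = [sampled[:, idx, :].max(dim=1).values
                 for idx in PART_GROUPS.values()]
        frame_feat = torch.cat(parts, dim=-1).view(B, T, -1)   # (B, T, P*C)
        # Temporal modeling over the frame sequence with an LSTM.
        _, (h_n, _) = self.lstm(frame_feat)
        return h_n[-1]                                         # (B, hidden_dim)

# Illustrative usage with dummy tensors.
model = SkeletonIndexedPooling(feat_dim=256, hidden_dim=512)
feats = torch.randn(2, 8, 256, 14, 14)    # 2 clips, 8 frames, 14x14 feature maps
joints = torch.rand(2, 8, 16, 2) * 2 - 1  # 16 joints per frame, coords in [-1, 1]
clip_embedding = model(feats, joints)     # (2, 512)
```

Multi-modal fusion (e.g., applying the same joint-indexed sampling to both the RGB and optical-flow streams and combining the resulting clip embeddings) would sit on top of such a module; the abstract leaves the exact fusion schemes open.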
