IEEE International Conference on Multimedia and Expo

Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Abstract

This paper presents a new framework for action recognition with multi-modal data. A skeleton-indexed feature learning procedure is developed to further exploit the detailed local features from RGB and optical flow videos. In particular, the proposed framework is built on a deep Convolutional Network (ConvNet) and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). A skeleton-indexed transform layer is designed to automatically extract visual features around key joints, and a part-aggregated pooling is developed to uniformly regulate the visual features from different body parts and actors. In addition, several fusion schemes are explored to take advantage of the multi-modal data. The proposed deep architecture is end-to-end trainable and can better incorporate different modalities to learn effective feature representations. Quantitative experimental results on two datasets, the NTU RGB+D dataset and the MSR dataset, demonstrate the excellent performance of our scheme over other state-of-the-art methods. To our knowledge, the performance obtained by the proposed framework is currently the best on the challenging NTU RGB+D dataset.
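The skeleton-indexed transform and part-aggregated pooling can be pictured as follows: per-frame ConvNet feature maps are sampled at the 2D locations of the skeleton joints, the sampled features are pooled within predefined body-part groups, and the resulting per-frame vectors are fed to an LSTM for temporal modeling. The PyTorch-style sketch below illustrates this flow under assumptions not stated in the abstract (a bilinear grid_sample lookup, max-pooling within each part, an illustrative joint-to-part grouping, and a single actor); it is not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical grouping of joints into body parts (indices are illustrative,
# not the paper's actual skeleton layout).
PART_GROUPS = {
    "torso":     [0, 1, 2, 3],
    "left_arm":  [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg":  [10, 11, 12],
    "right_leg": [13, 14, 15],
}

class SkeletonIndexedPooling(nn.Module):
    """Sample ConvNet features at joint locations and pool them per body part.

    A sketch of the "skeleton-indexed transform + part-aggregated pooling"
    idea described in the abstract; the exact operations in the paper may differ.
    """

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim * len(PART_GROUPS), hidden_dim,
                            batch_first=True)

    def forward(self, feat_maps, joints):
        # feat_maps: (B, T, C, H, W) per-frame ConvNet feature maps
        # joints:    (B, T, J, 2) joint coordinates normalized to [-1, 1]
        B, T, C, H, W = feat_maps.shape
        x = feat_maps.view(B * T, C, H, W)
        grid = joints.view(B * T, -1, 1, 2)                    # (B*T, J, 1, 2)
        # Bilinearly sample one C-dim feature vector per joint.
        sampled = F.grid_sample(x, grid, align_corners=False)  # (B*T, C, J, 1)
        sampled = sampled.squeeze(-1).permute(0, 2, 1)         # (B*T, J, C)
        # Part-aggregated pooling: max-pool joint features within each part.
        parts = [sampled[:, idx, :].max(dim=1).values
                 for idx in PART_GROUPS.values()]
        frame_feat = torch.cat(parts, dim=-1).view(B, T, -1)   # (B, T, P*C)
        # Temporal modeling over the frame sequence with an LSTM.
        _, (h_n, _) = self.lstm(frame_feat)
        return h_n[-1]                                         # (B, hidden_dim)

# Illustrative usage with dummy tensors.
model = SkeletonIndexedPooling(feat_dim=256, hidden_dim=512)
feats = torch.randn(2, 8, 256, 14, 14)    # 2 clips, 8 frames, 14x14 feature maps
joints = torch.rand(2, 8, 16, 2) * 2 - 1  # 16 joints per frame, coords in [-1, 1]
clip_embedding = model(feats, joints)     # (2, 512)
```

Multi-modal fusion (e.g., applying the same joint-indexed sampling to both the RGB and optical-flow streams and combining the resulting clip embeddings) would sit on top of such a module; the abstract leaves the exact fusion schemes open.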
