IEEE Transactions on Neural Networks and Learning Systems

Global and Local Knowledge-Aware Attention Network for Action Recognition



Abstract

Convolutional neural networks (CNNs) have proven effective for learning spatiotemporal representations for action recognition in videos. However, most traditional action recognition algorithms do not employ an attention mechanism to focus on the parts of video frames that are relevant to the action. In this article, we propose a novel global and local knowledge-aware attention network to address this challenge for action recognition. The proposed network incorporates two types of attention mechanism, statistic-based attention (SA) and learning-based attention (LA), to attach higher importance to the crucial elements in each video frame. Since global pooling (GP) models capture global information while attention models focus on significant details, our network adopts a three-stream architecture, comprising two attention streams and a GP stream, to make full use of their implicit complementary advantages. Each attention stream employs a fusion layer to combine global and local information and produce composite features. Furthermore, global-attention (GA) regularization is proposed to guide the two attention streams to better model the dynamics of composite features with reference to the global information. Fusion at the softmax layer is adopted to better exploit the implicit complementary advantages among the SA, LA, and GP streams and to obtain the final comprehensive predictions. The proposed network is trained in an end-to-end fashion and learns efficient video-level features both spatially and temporally. Extensive experiments are conducted on three challenging benchmarks, Kinetics, HMDB51, and UCF101, and the results demonstrate that the proposed network outperforms most state-of-the-art methods.
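To make the architecture described above concrete, the following is a minimal NumPy sketch of the three-stream idea: one stream pools frame features globally (GP), one weights frames by a simple statistic (standing in for SA), and one weights frames with a learned projection (standing in for LA); the streams' class probabilities are then averaged at the softmax level. All dimensions, the choice of statistic, and the random weights are illustrative placeholders, not the authors' actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical frame-level features: T frames, D-dim each, C classes
rng = np.random.default_rng(0)
T, D, C = 8, 16, 5
frames = rng.normal(size=(T, D))

# Global pooling (GP) stream: plain average over frames
gp_feat = frames.mean(axis=0)

# Statistic-based attention (SA) stand-in: weight frames by the L2 norm
# of each frame feature (an illustrative statistic, not the paper's)
sa_w = softmax(np.linalg.norm(frames, axis=1))
sa_feat = sa_w @ frames

# Learning-based attention (LA) stand-in: weights from a projection
# vector (a random vector here substitutes for trained parameters)
la_proj = rng.normal(size=D)
la_w = softmax(frames @ la_proj)
la_feat = la_w @ frames

# Each attention stream fuses local (attended) and global information
sa_fused = np.concatenate([sa_feat, gp_feat])
la_fused = np.concatenate([la_feat, gp_feat])

# Per-stream linear classifiers (random weights as placeholders)
w_sa = rng.normal(size=(2 * D, C))
w_la = rng.normal(size=(2 * D, C))
w_gp = rng.normal(size=(D, C))

# Softmax-level fusion: average the three streams' class probabilities
probs = (softmax(sa_fused @ w_sa) + softmax(la_fused @ w_la)
         + softmax(gp_feat @ w_gp)) / 3.0
```

The fused `probs` vector is a valid distribution over classes, illustrating why softmax-level fusion lets each stream contribute its own confident predictions without any extra normalization step.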
