IEEE Transactions on Multimedia

Where-and-When to Look: Deep Siamese Attention Networks for Video-Based Person Re-Identification


Abstract

Video-based person re-identification (re-id) is a central application in surveillance systems and a significant security concern. Matching persons across disjoint camera views from their video fragments is inherently challenging due to large visual variations and uncontrolled frame rates. Two steps are crucial to person re-id: discriminative feature learning and metric learning. However, existing approaches consider the two steps independently and do not make full use of the temporal and spatial information in the videos. In this paper, we propose a Siamese attention architecture that jointly learns spatiotemporal video representations and their similarity metrics. The network extracts local convolutional features from regions of each frame and enhances their discriminative capability by focusing on distinct regions when measuring the similarity with another pedestrian video. The attention mechanism is embedded into spatial gated recurrent units to selectively propagate relevant features and memorize their spatial dependencies through the network. The model essentially learns which parts (where) from which frames (when) are relevant and distinctive for matching persons, and attaches higher importance to them. The proposed Siamese model is end-to-end trainable, jointly learning comparable hidden representations for paired pedestrian videos and their similarity value. Extensive experiments on three benchmark datasets demonstrate the effectiveness of each component of the proposed deep network, which outperforms state-of-the-art methods.
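The core "where-and-when" idea — pooling each video's frame features with attention conditioned on the other video before comparing them — can be illustrated with a minimal, dependency-free sketch. All function names, the dot-product scoring, and the mean-feature query are illustrative assumptions for exposition; the paper's actual model uses convolutional features and spatial gated recurrent units, which this sketch does not reproduce.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_feature(frames):
    """Simple summary of a video: the mean of its frame features."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]

def attend(frames, query):
    """Temporal attention ('when to look'): weight each frame by its
    affinity to the other video's summary feature, then pool."""
    weights = softmax([dot(f, query) for f in frames])
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]

def siamese_distance(video_a, video_b):
    """Cross-attended pooled features compared with Euclidean distance.
    Each video is pooled conditioned on the other, mirroring the Siamese
    setup in which attention depends on the pair being compared."""
    qa, qb = mean_feature(video_a), mean_feature(video_b)
    fa = attend(video_a, qb)  # pool A, attending with respect to B
    fb = attend(video_b, qa)  # pool B, attending with respect to A
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Because the pooling of one video is conditioned on the other, the same video yields different pooled representations in different comparisons — the essence of pair-dependent attention, as opposed to pooling each video once in isolation.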
