...
IEEE Transactions on Multimedia

HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval

Abstract

With the emerging demand for large-scale video analysis, MPEG initiated the Compact Descriptors for Video Analysis (CDVA) standardization in 2014. Beyond the handcrafted descriptors adopted by the current MPEG-CDVA reference model, we study the problem of deep-learned global descriptors for video matching, localization, and retrieval. First, inspired by a recent invariance theory, we propose a nested invariance pooling (NIP) method to derive compact deep global descriptors from convolutional neural networks (CNNs) by progressively encoding translation, scale, and rotation invariances into the pooled descriptors. Second, our empirical studies show that a sequence of well-designed pooling moments (e.g., max or average) can drastically impact video matching performance, which motivates us to design hybrid pooling operations via NIP (HNIP). HNIP further improves the discriminability of deep global descriptors. Third, we analyze the technical merits and performance improvements of combining deep and handcrafted descriptors to better investigate their complementary effects. We evaluate the effectiveness of HNIP within the well-established MPEG-CDVA evaluation framework. Extensive experiments demonstrate that HNIP outperforms state-of-the-art deep and canonical handcrafted descriptors with significant mAP gains of 5.5% and 4.7%, respectively. In particular, the combination of HNIP-based CNN descriptors and handcrafted global descriptors significantly boosts the performance of the CDVA core techniques at a comparable descriptor size.
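
The nested pooling idea described in the abstract can be illustrated with a short, hypothetical Python sketch. The code below assumes per-frame CNN feature maps of shape (C, H, W), a user-supplied cnn_forward function, and particular rotation angles, region scales, and pooling moments; these choices, along with the use of scipy.ndimage.rotate, are illustrative assumptions only and do not reproduce the paper's configuration or the MPEG-CDVA reference model.

import numpy as np
from scipy.ndimage import rotate

def pool(x, moment):
    # Pool a stack of vectors with shape (N, C) into a single (C,) vector.
    if moment == "max":
        return x.max(axis=0)
    if moment == "avg":
        return x.mean(axis=0)
    if moment == "sqrt-avg":  # generalized mean with exponent 1/2
        return np.sqrt(np.maximum(x, 0)).mean(axis=0) ** 2
    raise ValueError(f"unknown pooling moment: {moment}")

def nip_descriptor(frame, cnn_forward,
                   moments=("avg", "sqrt-avg", "max"),
                   angles=(0, 90, 180, 270),
                   scales=(1.0, 0.75, 0.5)):
    # Progressively encode translation, scale, and rotation invariance by
    # nesting pooling operations, one moment per transformation group.
    rotation_pool = []
    for angle in angles:                       # outermost group: rotations of the frame
        rotated = rotate(frame, angle, reshape=False)
        fmap = cnn_forward(rotated)            # assumed to return a (C, H, W) array
        C, H, W = fmap.shape
        scale_pool = []
        for s in scales:                       # middle group: nested centered regions
            h, w = max(1, int(H * s)), max(1, int(W * s))
            top, left = (H - h) // 2, (W - w) // 2
            region = fmap[:, top:top + h, left:left + w]
            spatial = region.reshape(C, -1).T  # innermost group: spatial positions
            scale_pool.append(pool(spatial, moments[0]))
        rotation_pool.append(pool(np.stack(scale_pool), moments[1]))
    desc = pool(np.stack(rotation_pool), moments[2])
    return desc / (np.linalg.norm(desc) + 1e-12)  # L2-normalized global descriptor

Under these assumptions, matching two video segments would reduce to comparing their frame-level descriptors (for example, by cosine similarity) and aggregating the scores over time; the choice of moments per nesting level is exactly the "hybrid" aspect that HNIP tunes.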
