...
IEEE Transactions on Multimedia

HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval

Abstract

With the emerging demand for large-scale video analysis, MPEG initiated the Compact Descriptors for Video Analysis (CDVA) standardization in 2014. Beyond the handcrafted descriptors adopted by the current MPEG-CDVA reference model, we study the problem of deep-learned global descriptors for video matching, localization, and retrieval. First, inspired by a recent invariance theory, we propose a nested invariance pooling (NIP) method to derive compact deep global descriptors from convolutional neural networks (CNNs) by progressively encoding translation, scale, and rotation invariances into the pooled descriptors. Second, our empirical studies show that a sequence of well-designed pooling moments (e.g., max or average) can drastically impact video matching performance, which motivates us to design hybrid pooling operations via NIP (HNIP). HNIP further improves the discriminability of deep global descriptors. Third, we analyze the technical merits and performance improvements of combining deep and handcrafted descriptors to better investigate their complementary effects. We evaluate the effectiveness of HNIP within the well-established MPEG-CDVA evaluation framework. Extensive experiments demonstrate that HNIP outperforms state-of-the-art deep and canonical handcrafted descriptors with significant mAP gains of 5.5% and 4.7%, respectively. In particular, the combination of HNIP-based CNN descriptors and handcrafted global descriptors significantly boosts the performance of the CDVA core techniques at a comparable descriptor size.
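
The nested pooling idea described in the abstract can be illustrated with a short, hypothetical Python sketch. The code below assumes per-frame CNN feature maps of shape (C, H, W), a user-supplied cnn_forward function, and particular rotation angles, region scales, and pooling moments; these choices, along with the use of scipy.ndimage.rotate, are illustrative assumptions only and do not reproduce the paper's configuration or the MPEG-CDVA reference model.

import numpy as np
from scipy.ndimage import rotate

def pool(x, moment):
    # Pool a stack of vectors with shape (N, C) into a single (C,) vector.
    if moment == "max":
        return x.max(axis=0)
    if moment == "avg":
        return x.mean(axis=0)
    if moment == "sqrt-avg":  # generalized mean with exponent 1/2
        return np.sqrt(np.maximum(x, 0)).mean(axis=0) ** 2
    raise ValueError(f"unknown pooling moment: {moment}")

def nip_descriptor(frame, cnn_forward,
                   moments=("avg", "sqrt-avg", "max"),
                   angles=(0, 90, 180, 270),
                   scales=(1.0, 0.75, 0.5)):
    # Progressively encode translation, scale, and rotation invariance by
    # nesting pooling operations, one moment per transformation group.
    rotation_pool = []
    for angle in angles:                       # outermost group: rotations of the frame
        rotated = rotate(frame, angle, reshape=False)
        fmap = cnn_forward(rotated)            # assumed to return a (C, H, W) array
        C, H, W = fmap.shape
        scale_pool = []
        for s in scales:                       # middle group: nested centered regions
            h, w = max(1, int(H * s)), max(1, int(W * s))
            top, left = (H - h) // 2, (W - w) // 2
            region = fmap[:, top:top + h, left:left + w]
            spatial = region.reshape(C, -1).T  # innermost group: spatial positions
            scale_pool.append(pool(spatial, moments[0]))
        rotation_pool.append(pool(np.stack(scale_pool), moments[1]))
    desc = pool(np.stack(rotation_pool), moments[2])
    return desc / (np.linalg.norm(desc) + 1e-12)  # L2-normalized global descriptor

Under these assumptions, matching two video segments would reduce to comparing their frame-level descriptors (for example, by cosine similarity) and aggregating the scores over time; the choice of moments per nesting level is exactly the "hybrid" aspect that HNIP tunes.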
