Conference: IEEE International Conference on Image Processing

RECOGNIZING UNSEEN ACTIONS IN A DOMAIN-ADAPTED EMBEDDING SPACE

Abstract

With the sustained growth of multimedia data, Zero-shot Learning (ZSL) techniques have attracted much attention in recent years for their ability to train learning models that can handle "unseen" categories. Existing ZSL algorithms mainly take advantage of an attribute-based semantic space and focus only on static image data. Moreover, most ZSL studies consider only the semantically embedded labels and fail to address the domain shift problem. In this paper, we propose a deep two-output model for video ZSL and action recognition tasks. The model computes both spatial and temporal features from video content through distinct Convolutional Neural Networks (CNNs) and trains a Multi-layer Perceptron (MLP) on the extracted features to map videos to semantic embedding word vectors. We also introduce a domain adaptation strategy named "ConSSEV", which combines the outputs of two distinct output layers of our MLP to improve zero-shot learning results. Our experiments on the UCF101 dataset demonstrate that the proposed model shows greater advantages when combined with more complex video embedding schemes, and that it outperforms state-of-the-art zero-shot learning techniques.
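The abstract does not say how the two MLP outputs are combined or how an embedded video is assigned to an unseen class. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes the two output layers each produce a vector in the word-vector space, blends them with a plain weighted average, and picks the unseen class whose word vector has the highest cosine similarity. The function names, the `weight` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combine_outputs(out_a, out_b, weight=0.5):
    """Blend two predicted embeddings.

    ASSUMPTION: a weighted average stands in for the combination step;
    the abstract does not describe the actual ConSSEV rule.
    """
    return [weight * a + (1.0 - weight) * b for a, b in zip(out_a, out_b)]


def predict_unseen(out_a, out_b, class_vectors, weight=0.5):
    """Return the unseen class whose word vector is closest to the blended embedding."""
    z = combine_outputs(out_a, out_b, weight)
    return max(class_vectors, key=lambda name: cosine(z, class_vectors[name]))


# Toy 3-d "word vectors" for two unseen classes (made-up numbers, not real embeddings).
unseen = {"archery": [1.0, 0.0, 0.0], "bowling": [0.0, 1.0, 0.0]}

# Hypothetical embeddings produced by the two output layers for one video.
out_a = [0.9, 0.2, 0.1]
out_b = [0.6, 0.5, 0.0]

print(predict_unseen(out_a, out_b, unseen))
```

In this sketch the unseen classes never appear in training; they enter only through their word vectors at prediction time, which is the usual zero-shot setup that the abstract's mapping from videos to word vectors implies.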