IEEE Transactions on Multimedia

A Fine Granularity Object-Level Representation for Event Detection and Recounting



Abstract

Multimedia events such as "birthday party" usually involve complex interactions between humans and objects. Unlike actions and sports, these events rarely contain distinctive motion patterns that can be exploited for recognition. To encode the rich set of objects in such events, a common practice is to tag each video frame with object labels, represented as a vector of object-appearance probabilities. These vectors are then pooled across frames to obtain a video-level representation. Current practices suffer from two deficiencies due to the direct use of a deep convolutional neural network (DCNN) and standard feature pooling techniques. First, the use of max-pooling and softmax layers in the DCNN overemphasizes the primary object or scene in a frame, producing a sparse vector that overlooks the existence of secondary or small-sized objects. Second, pooling these sparse vectors with a max or average operator makes the video-level feature unpredictable in modeling the object composition of an event. To address these problems, this paper proposes a new video representation, named Object-VLAD, which treats all objects equally and encodes them into a single vector for multimedia event detection. Furthermore, the vector can be flexibly decoded to identify evidence, such as key objects, to recount why a video is retrieved for an event of interest. Experiments conducted on the MED13 and MED14 datasets verify the merit of Object-VLAD, which consistently outperforms several state-of-the-art methods in both event detection and recounting.
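A minimal sketch of the contrast the abstract draws, assuming hypothetical per-frame object-probability vectors and a generic VLAD-style aggregation. The function names, feature dimensions, and k-means-style codebook below are illustrative assumptions, not the authors' Object-VLAD implementation.

```python
# Sketch only: contrasts standard max/average pooling of per-frame object
# vectors with a generic VLAD-style residual encoding. All names, shapes,
# and the codebook are hypothetical, not the paper's exact method.
import numpy as np

def max_or_avg_pooling(frame_probs, mode="max"):
    """Video-level pooling over per-frame object-probability vectors.

    frame_probs: (T, K) array, one K-dim object-probability vector per frame.
    Peaky softmax vectors make the pooled result dominated by each frame's
    primary object, which is the deficiency the abstract points out.
    """
    return frame_probs.max(axis=0) if mode == "max" else frame_probs.mean(axis=0)

def vlad_encode(frame_feats, centers):
    """VLAD-style aggregation: accumulate residuals of frame features to their
    nearest codeword, so every assigned feature contributes instead of being
    suppressed by a max/average over sparse probability vectors.

    frame_feats: (T, D) per-frame object features.
    centers:     (C, D) codebook (e.g., learned by k-means beforehand).
    Returns a flattened, L2-normalized (C * D,) video-level descriptor.
    """
    # Assign each frame feature to its nearest center.
    dists = ((frame_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (T, C)
    assign = dists.argmin(axis=1)                                           # (T,)

    # Accumulate residuals per center.
    vlad = np.zeros_like(centers)
    for c in range(centers.shape[0]):
        members = frame_feats[assign == c]
        if len(members):
            vlad[c] = (members - centers[c]).sum(axis=0)

    # Intra-normalize each center's residual, then L2-normalize globally.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((30, 128))      # 30 frames, 128-dim object features
    codebook = rng.random((8, 128))     # 8 codewords (illustrative size)
    video_desc = vlad_encode(frames, codebook)
    print(video_desc.shape)             # (1024,)
```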
