IEEE Transactions on Multimedia

A Fine Granularity Object-Level Representation for Event Detection and Recounting



Abstract

Multimedia events such as "birthday party" usually involve complex interactions between humans and objects. Unlike actions and sports, these events rarely contain distinctive motion patterns that can be exploited for recognition. To encode the rich set of objects in such events, a common practice is to tag each video frame with object labels, represented as a vector of object-appearance probabilities. These vectors are then pooled across frames to obtain a video-level representation. Current practices suffer from two deficiencies due to the direct use of a deep convolutional neural network (DCNN) and standard feature pooling techniques. First, the use of max-pooling and softmax layers in the DCNN overemphasizes the primary object or scene in a frame, producing a sparse vector that overlooks the existence of secondary or small-sized objects. Second, pooling these sparse vectors with a max or average operator makes the video-level feature unpredictable in modeling the object composition of an event. To address these problems, this paper proposes a new video representation, named Object-VLAD, which treats all objects equally and encodes them into a single vector for multimedia event detection. Furthermore, the vector can be flexibly decoded to identify evidence, such as key objects, to recount why a video is retrieved for an event of interest. Experiments conducted on the MED13 and MED14 datasets verify the merit of Object-VLAD, which consistently outperforms several state-of-the-art methods in both event detection and recounting.
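A minimal sketch of the contrast the abstract draws, assuming hypothetical per-frame object-probability vectors and a generic VLAD-style aggregation. The function names, feature dimensions, and k-means-style codebook below are illustrative assumptions, not the authors' Object-VLAD implementation.

```python
# Sketch only: contrasts standard max/average pooling of per-frame object
# vectors with a generic VLAD-style residual encoding. All names, shapes,
# and the codebook are hypothetical, not the paper's exact method.
import numpy as np

def max_or_avg_pooling(frame_probs, mode="max"):
    """Video-level pooling over per-frame object-probability vectors.

    frame_probs: (T, K) array, one K-dim object-probability vector per frame.
    Peaky softmax vectors make the pooled result dominated by each frame's
    primary object, which is the deficiency the abstract points out.
    """
    return frame_probs.max(axis=0) if mode == "max" else frame_probs.mean(axis=0)

def vlad_encode(frame_feats, centers):
    """VLAD-style aggregation: accumulate residuals of frame features to their
    nearest codeword, so every assigned feature contributes instead of being
    suppressed by a max/average over sparse probability vectors.

    frame_feats: (T, D) per-frame object features.
    centers:     (C, D) codebook (e.g., learned by k-means beforehand).
    Returns a flattened, L2-normalized (C * D,) video-level descriptor.
    """
    # Assign each frame feature to its nearest center.
    dists = ((frame_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (T, C)
    assign = dists.argmin(axis=1)                                           # (T,)

    # Accumulate residuals per center.
    vlad = np.zeros_like(centers)
    for c in range(centers.shape[0]):
        members = frame_feats[assign == c]
        if len(members):
            vlad[c] = (members - centers[c]).sum(axis=0)

    # Intra-normalize each center's residual, then L2-normalize globally.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((30, 128))      # 30 frames, 128-dim object features
    codebook = rng.random((8, 128))     # 8 codewords (illustrative size)
    video_desc = vlad_encode(frames, codebook)
    print(video_desc.shape)             # (1024,)
```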
