International Conference on Advanced Communication Technology

Unsupervised Object of Interest Discovery in Multi-view Video Sequence


Abstract

This paper presents a novel algorithm for the unsupervised discovery of objects of interest in multi-view video sequences. We first classify a multi-view video sequence by its degree of movement. For a sequence with movement, we group video frames along and across views into a group of pictures (GOP). Key points, i.e. feature vectors representing the textures present in the frames of a GOP, are extracted using the Scale-Invariant Feature Transform (SIFT) and clustered using the K-means algorithm. Each key point is assigned a visual word according to its cluster. Patches, which represent small textured areas, are generated using the Maximally Stable Extremal Regions (MSER) operator. One patch can contain more than one key point and therefore more than one visual word, so a patch can be represented by different visual words to different degrees. A motion detection algorithm determines the regions of movement in the video frames; patches in these regions are more likely to be parts of the object of interest. Using the developed spatial and appearance models together with motion detection, we compute the likelihood that each patch belongs to the object of interest. The patches with high likelihoods are clustered and marked as the object of interest. When there is no movement or no significant movement, we assume that human subjects are the most important objects in the sequence, and a face detection algorithm is used to locate the object of interest. When the sequence contains no human subjects, the frequencies of the visual words occurring in it are used to identify the object of interest. This is possible because the patches that are parts of the objects of interest can be derived from the visual words.
Experimental results on various types of multi-view video sequences show that the proposed algorithm correctly discovers the objects of interest more than 80 percent of the time on average.
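
The visual-word steps and the three-way choice described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy 2-D descriptors, the fixed centroids and the `motion_threshold` value are all assumptions made for illustration; in the pipeline described above, descriptors would come from SIFT, centroids from K-means and patches from MSER.

```python
from collections import Counter
from math import dist


def assign_visual_words(descriptors, centroids):
    """Give each descriptor the index of its nearest centroid (its visual word)."""
    return [
        min(range(len(centroids)), key=lambda k: dist(d, centroids[k]))
        for d in descriptors
    ]


def patch_word_weights(patch_keypoint_ids, words):
    """Return the fraction of a patch's key points that map to each visual word.

    This is one simple way to express "different visual words to different
    degrees"; the abstract does not specify the actual weighting.
    """
    counts = Counter(words[i] for i in patch_keypoint_ids)
    total = sum(counts.values())
    if total == 0:
        return {}  # a patch with no key points has no visual words
    return {word: count / total for word, count in counts.items()}


def choose_strategy(motion_ratio, num_faces, motion_threshold=0.05):
    """Pick the discovery branch; motion_threshold is a hypothetical cut-off."""
    if motion_ratio >= motion_threshold:
        return "motion"          # significant movement: use motion regions
    if num_faces > 0:
        return "face"            # little movement, people present: use faces
    return "word_frequency"      # otherwise: use visual-word frequencies


# Toy data: four 2-D "descriptors" and two cluster centres.
descriptors = [(0.0, 0.1), (0.9, 1.0), (0.1, 0.0), (1.0, 0.9)]
centroids = [(0.0, 0.0), (1.0, 1.0)]
words = assign_visual_words(descriptors, centroids)

# A patch that contains key points 0, 1 and 2.
weights = patch_word_weights([0, 1, 2], words)
```

Here the patch holds two key points of word 0 and one of word 1, so it is represented by both words with unequal weights. How these weights enter the spatial and appearance models is not stated in the abstract and is left out of the sketch.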
