Conference: IEEE Conference on Computer Vision and Pattern Recognition

What if we do not have multiple videos of the same action? - Video Action Localization Using Web Images

Abstract

This paper tackles the problem of spatio-temporal action localization in a video, without assuming the availability of multiple videos or any prior annotations. The action is localized using images downloaded from the internet by querying the action name. Given the web images, we first dampen image noise using a random walk and avoid distracting backgrounds within images using image action proposals. Then, given a video, we generate multiple spatio-temporal action proposals. We suppress proposals generated by camera motion and background by exploiting optical flow gradients within the proposals. To obtain the most action-representative proposals, we reconstruct the action proposals in the video by leveraging the action proposals in the images. Moreover, we preserve the temporal smoothness of the video by reconstructing all proposal bounding boxes jointly, using constraints that push the coefficients for each bounding box toward a common consensus and thus enforce coefficient similarity across multiple frames. We solve this optimization problem using a variant of the two-metric projection algorithm. Finally, the video proposal that has the lowest reconstruction cost and is motion salient is used to localize the action. Our method is applicable not only to trimmed videos but also to action localization in untrimmed videos, which is a very challenging problem. We present extensive experiments on trimmed as well as untrimmed datasets to validate the effectiveness of the proposed approach.
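
The reconstruction and selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption: the objective (squared reconstruction error plus a penalty pulling each frame's coefficients toward their mean), the non-negativity constraint, the parameter names `lam` and `min_saliency`, and the helper names `reconstruct_proposal` and `pick_proposal` are all hypothetical. For simplicity it uses plain projected gradient descent in place of the two-metric projection variant named in the abstract.

```python
import numpy as np

def reconstruct_proposal(X, D, lam=1.0, n_iter=200):
    """Reconstruct one video proposal from image proposal features.

    X: (d, T) features of the proposal's T per-frame bounding boxes.
    D: (d, m) features of m image action proposals (the dictionary).
    Assumed objective (not taken from the paper):
        0.5 * ||X - D C||_F^2 + 0.5 * lam * ||C - mean_t(C)||_F^2,  C >= 0
    """
    m, T = D.shape[1], X.shape[1]
    C = np.zeros((m, T))
    # Step size 1/L, where L bounds the gradient's Lipschitz constant.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + lam)
    for _ in range(n_iter):
        consensus = C.mean(axis=1, keepdims=True)   # shared coefficient vector
        grad = D.T @ (D @ C - X) + lam * (C - consensus)
        C = np.maximum(C - step * grad, 0.0)        # project onto C >= 0
    consensus = C.mean(axis=1, keepdims=True)
    cost = 0.5 * np.sum((X - D @ C) ** 2) + 0.5 * lam * np.sum((C - consensus) ** 2)
    return C, cost

def pick_proposal(proposals, D, motion_saliency, min_saliency=0.5, lam=1.0):
    """Return the index of the motion-salient proposal with the lowest cost.

    The saliency threshold is a hypothetical stand-in for the paper's
    motion-saliency criterion. Returns None if no proposal passes it.
    """
    best, best_cost = None, np.inf
    for i, (X, s) in enumerate(zip(proposals, motion_saliency)):
        if s < min_saliency:
            continue                                # treated as camera/background
        _, cost = reconstruct_proposal(X, D, lam)
        if cost < best_cost:
            best, best_cost = i, cost
    return best
```

In this sketch the consensus term is what ties the frames together: a proposal whose boxes are all explained by the same few image proposals pays little penalty, while one that needs a different combination in every frame is charged for the disagreement.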
