Multimedia Tools and Applications

Exploiting weak mask representation with convolutional neural networks for accurate object tracking

Abstract

Recent years have witnessed the popularity of Convolutional Neural Networks (CNNs) in a variety of computer vision tasks, including video object tracking. Existing CNN-based tracking methods output either a scalar score or a confidence map, which makes it infeasible to estimate the object's scale and rotation angle accurately. Specifically, as with other traditional methods, they assume the target's aspect ratio and rotation angle are fixed. To address this limitation, we propose to use a binary mask as the CNN's output for tracking. To this end, we adapt a semantic segmentation model by online fine-tuning on augmented samples from the initial frame, so that it can segment the target in subsequent frames. When generating training samples, we employ a Crop-and-Paste method to better exploit context information, add a random offset to the lightness component to mimic illumination changes, and apply Gaussian filtering to mimic blur. During tracking, because of the CNN's limited receptive field size and spatial resolution, the network may fail to identify the target if the estimated bounding box is considerably off. We therefore propose a bounding box approximation method that exploits temporal consistency. Excluding the initial training cost, our tracker runs at 41 FPS on a single GeForce 1080Ti GPU. Evaluated on the OTB-2015, VOT-2016 and TempleColor benchmarks, it achieves results comparable to non-real-time top trackers and state-of-the-art performance among real-time ones.
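
As a rough illustration of the online adaptation step, the sketch below fine-tunes a generic off-the-shelf segmentation network (torchvision's DeepLabV3, chosen only for concreteness) on augmented first-frame samples with a binary-mask loss. The backbone, loss, learning rate, and number of steps are assumptions; the paper's actual model and schedule may differ.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

def online_finetune(frames, masks, steps=200, lr=1e-4, device="cuda"):
    """Fine-tune a segmentation model on augmented samples from the first frame.

    Illustrative sketch only; the paper's backbone and schedule may differ.
    frames: (N, 3, H, W) float tensor of augmented first-frame crops
    masks:  (N, 1, H, W) float tensor of binary target masks
    """
    model = deeplabv3_resnet50(num_classes=1).to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()
    frames, masks = frames.to(device), masks.to(device)

    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(frames)["out"]   # (N, 1, H, W) raw mask scores
        loss = criterion(logits, masks)
        loss.backward()
        optimizer.step()
    return model.eval()
```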
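
The training-sample generation described above (Crop-and-Paste for context, a random lightness offset for illumination change, and Gaussian filtering for blur) could look roughly like the following OpenCV/NumPy sketch. The helper name, padding scheme, and parameter ranges are illustrative guesses rather than the authors' exact settings.

```python
import cv2
import numpy as np

def augment_sample(frame, target_box, background, rng=None):
    """Generate one augmented training image (hypothetical helper).

    Assumes the background image is at least as large as the cropped patch.
    target_box is (x, y, w, h) in pixels.
    """
    if rng is None:
        rng = np.random.default_rng()
    x, y, w, h = target_box

    # Crop the target together with surrounding context (pad by half the box size).
    px, py = w // 2, h // 2
    x0, y0 = max(x - px, 0), max(y - py, 0)
    x1, y1 = min(x + w + px, frame.shape[1]), min(y + h + py, frame.shape[0])
    patch = frame[y0:y1, x0:x1].copy()

    # "Crop and Paste": place the patch at a random location on another background.
    canvas = background.copy()
    ph, pw = patch.shape[:2]
    ox = rng.integers(0, canvas.shape[1] - pw + 1)
    oy = rng.integers(0, canvas.shape[0] - ph + 1)
    canvas[oy:oy + ph, ox:ox + pw] = patch

    # Illumination change: add a random offset to the L (lightness) channel.
    lab = cv2.cvtColor(canvas, cv2.COLOR_BGR2LAB).astype(np.int16)
    lab[:, :, 0] = np.clip(lab[:, :, 0] + rng.integers(-30, 31), 0, 255)
    canvas = cv2.cvtColor(lab.astype(np.uint8), cv2.COLOR_LAB2BGR)

    # Blur mimicked with a Gaussian filter of random strength.
    sigma = float(rng.uniform(0.0, 2.0))
    if sigma > 0.1:
        canvas = cv2.GaussianBlur(canvas, (0, 0), sigma)
    return canvas
```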
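
Finally, one plausible reading of the bounding box approximation is to fit a rotated rectangle to the predicted mask and blend it with the previous frame's box whenever the mask looks unreliable, so that a single bad segmentation cannot move the box far. The thresholds and blending rule below are assumptions, not the paper's exact temporal-consistency method.

```python
import cv2

def estimate_box(mask, prev_box, min_area=64, momentum=0.7):
    """Fit a rotated box to a binary mask with a temporal-consistency fallback.

    Illustrative only; assumes OpenCV >= 4 (findContours returns two values).
    mask:     HxW uint8 binary mask from the segmentation network
    prev_box: ((cx, cy), (w, h), angle) estimated in the previous frame
    """
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return prev_box  # nothing segmented: keep the previous estimate
    largest = max(contours, key=cv2.contourArea)
    if cv2.contourArea(largest) < min_area:
        return prev_box  # mask too small to be trusted

    (cx, cy), (w, h), angle = cv2.minAreaRect(largest)
    # Blend the new estimate with the previous box so one noisy mask
    # cannot move or resize the box abruptly.
    (pcx, pcy), (pw, ph), pangle = prev_box
    blend = lambda new, old: momentum * new + (1.0 - momentum) * old
    return ((blend(cx, pcx), blend(cy, pcy)),
            (blend(w, pw), blend(h, ph)),
            blend(angle, pangle))
```

Note that blending the angle naively, as done here, ignores wrap-around near the rectangle's symmetry; a real implementation would handle that case explicitly.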
