WSLLN: Weakly Supervised Natural Language Localization Networks

Abstract

We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries. To learn the correspondence between visual segments and texts, most previous methods require the temporal coordinates (start and end times) of events for training, which leads to high annotation costs. WSLLN relieves the annotation burden by training on only video-sentence pairs, without access to the temporal locations of events. With a simple end-to-end structure, WSLLN measures segment-text consistency and conducts segment selection (conditioned on the text) simultaneously. Results from both are merged and optimized as a video-sentence matching problem. Experiments on ActivityNet Captions and DiDeMo demonstrate that WSLLN achieves state-of-the-art performance.
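The two-branch design described in the abstract can be sketched as follows. This is a minimal illustration based only on the abstract's description, not the paper's actual implementation: the fusion operator, weight shapes, and the way the branches are merged into a video-level matching score are all assumptions, and the function name `wslln_scores` is hypothetical.

```python
import numpy as np

def wslln_scores(seg_feats, text_feat, w_align, w_select):
    """Sketch of WSLLN's two-branch scoring (hypothetical fusion and
    weights, inferred from the abstract alone).

    seg_feats: (n_segments, d) visual segment features
    text_feat: (d,) sentence feature
    w_align, w_select: (d,) branch projection weights (assumed shapes)
    """
    # Fuse each segment with the sentence (elementwise product is an
    # assumption; the paper may use a different fusion).
    fused = seg_feats * text_feat

    # Alignment branch: per-segment segment-text consistency score.
    align = fused @ w_align                      # (n_segments,)

    # Detection branch: select segments conditioned on the text;
    # a softmax over segments makes them compete for the query.
    logits = fused @ w_select
    select = np.exp(logits - logits.max())
    select /= select.sum()                       # (n_segments,)

    # Merge the two branches into final per-segment scores.
    seg_scores = align * select                  # (n_segments,)

    # Aggregate to a video-level score, which can be optimized as a
    # video-sentence matching objective using only video-sentence
    # pairs (no temporal labels needed).
    video_score = seg_scores.sum()
    return seg_scores, video_score
```

At inference, the segment with the highest merged score would be taken as the localized event; at training time only `video_score` needs supervision, which is what makes the setup weakly supervised.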
