Computer Speech and Language

Generating unambiguous and diverse referring expressions



Abstract

Neural Referring Expression Generation (REG) models have shown promising results in generating expressions that uniquely describe visual objects. However, current REG models still lack the ability to produce diverse and unambiguous referring expressions (REs). To address the lack of diversity, we propose generating a set of diverse REs rather than one-shot REs. To reduce the ambiguity of referring expressions, we directly optimise non-differentiable test metrics using reinforcement learning (RL), and we show that our approaches achieve better results under multiple different settings. Specifically, we first present a novel RL approach to REG training which, instead of drawing one sample per input, averages over multiple samples to normalise the reward during RL training. Secondly, we present an innovative REG model that utilises an object attention mechanism which explicitly incorporates information about the target object and is optimised using our proposed RL approach. Thirdly, we propose a novel transformer model, optimised with RL, that exploits different levels of visual information. Our human evaluation demonstrates the effectiveness of this model: we improve the state-of-the-art results on RefCOCO testA and testB in terms of task success from 76.95% to 81.66% and from 78.10% to 83.33% respectively, while on RefCOCO+ testA we show an improvement from 58.85% to 83.33%. Finally, we present a thorough comparison of diverse decoding strategies (sampling- and maximisation-based) and how they control the trade-off between quality and diversity.
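The multi-sample reward normalisation mentioned in the abstract can be illustrated with a minimal REINFORCE-style sketch. This is not the authors' implementation: the function names and the use of plain Python lists are illustrative assumptions, but the idea — drawing several samples per input and using their mean reward as a baseline — follows the description above.

```python
def multi_sample_advantages(rewards):
    """Given the rewards of several sampled referring expressions for the
    same input, subtract the sample mean as a baseline, so the policy
    gradient pushes up samples that beat the average and pushes down
    samples that fall below it."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

def reinforce_loss(log_probs, rewards):
    """REINFORCE surrogate loss with the multi-sample baseline:
    -(reward - mean_reward) * log p(sample), averaged over the samples.
    `log_probs[i]` is the summed log-probability of the i-th sampled RE,
    `rewards[i]` its task metric (e.g. task success)."""
    advantages = multi_sample_advantages(rewards)
    return -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(rewards)
```

With a single sample per input the baseline equals the reward and the gradient vanishes; averaging over multiple samples is what makes the normalisation informative.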
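The quality/diversity trade-off between sampling- and maximisation-based decoding can be sketched with a single decoding step over token logits. The helpers below are hypothetical (not from the paper): greedy search always picks the argmax, while temperature sampling interpolates between near-greedy behaviour (low temperature) and high-diversity sampling (high temperature).

```python
import math
import random

def greedy_step(logits):
    """Maximisation-based decoding: deterministically pick the
    highest-scoring token (high quality, no diversity)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_step(logits, temperature=1.0, rng=random):
    """Sampling-based decoding: draw a token from the softmax of the
    logits divided by `temperature`. Low temperature concentrates mass
    on the argmax (approaching greedy); high temperature flattens the
    distribution, trading quality for diversity."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cumulative = 0.0
    for i, e in enumerate(exps):
        cumulative += e / total
        if r <= cumulative:
            return i
    return len(logits) - 1                # guard against rounding
```

In a REG setting, repeatedly calling `sample_step` yields the set of diverse referring expressions, while `greedy_step` (or beam search) yields the single maximisation-based output the comparison in the abstract refers to.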
