Computer Speech and Language

Generating unambiguous and diverse referring expressions



Abstract

Neural Referring Expression Generation (REG) models have shown promising results in generating expressions that uniquely describe visual objects. However, current REG models still lack the ability to produce diverse and unambiguous referring expressions (REs). To address the lack of diversity, we propose generating a set of diverse REs rather than one-shot REs. To reduce the ambiguity of referring expressions, we directly optimise non-differentiable test metrics using reinforcement learning (RL), and we show that our approaches achieve better results under multiple different settings. Specifically, we first present a novel RL approach to REG training which, instead of drawing one sample per input, averages over multiple samples to normalise the reward during RL training. Secondly, we present an innovative REG model that utilises an object attention mechanism which explicitly incorporates information about the target object and is optimised using our proposed RL approach. Thirdly, we propose a novel transformer model, optimised with RL, that exploits different levels of visual information. Our human evaluation demonstrates the effectiveness of this model: we improve the state-of-the-art results on RefCOCO testA and testB in terms of task success from 76.95% to 81.66% and from 78.10% to 83.33% respectively, while on RefCOCO+ testA we show an improvement from 58.85% to 83.33%. Finally, we present a thorough comparison of diverse decoding strategies (sampling- and maximisation-based) and how they control the trade-off between quality and diversity.
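The multi-sample reward normalisation mentioned in the abstract can be illustrated with a minimal REINFORCE-style sketch. This is not the authors' implementation: the function names and the use of plain Python lists are illustrative assumptions, but the idea — drawing several samples per input and using their mean reward as a baseline — follows the description above.

```python
def multi_sample_advantages(rewards):
    """Given the rewards of several sampled referring expressions for the
    same input, subtract the sample mean as a baseline, so the policy
    gradient pushes up samples that beat the average and pushes down
    samples that fall below it."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

def reinforce_loss(log_probs, rewards):
    """REINFORCE surrogate loss with the multi-sample baseline:
    -(reward - mean_reward) * log p(sample), averaged over the samples.
    `log_probs[i]` is the summed log-probability of the i-th sampled RE,
    `rewards[i]` its task metric (e.g. task success)."""
    advantages = multi_sample_advantages(rewards)
    return -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(rewards)
```

With a single sample per input the baseline equals the reward and the gradient vanishes; averaging over multiple samples is what makes the normalisation informative.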
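The quality/diversity trade-off between sampling- and maximisation-based decoding can be sketched with a single decoding step over token logits. The helpers below are hypothetical (not from the paper): greedy search always picks the argmax, while temperature sampling interpolates between near-greedy behaviour (low temperature) and high-diversity sampling (high temperature).

```python
import math
import random

def greedy_step(logits):
    """Maximisation-based decoding: deterministically pick the
    highest-scoring token (high quality, no diversity)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_step(logits, temperature=1.0, rng=random):
    """Sampling-based decoding: draw a token from the softmax of the
    logits divided by `temperature`. Low temperature concentrates mass
    on the argmax (approaching greedy); high temperature flattens the
    distribution, trading quality for diversity."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cumulative = 0.0
    for i, e in enumerate(exps):
        cumulative += e / total
        if r <= cumulative:
            return i
    return len(logits) - 1                # guard against rounding
```

In a REG setting, repeatedly calling `sample_step` yields the set of diverse referring expressions, while `greedy_step` (or beam search) yields the single maximisation-based output the comparison in the abstract refers to.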
