
The synergy of double attention: Combine sentence-level and word-level attention for image captioning


Abstract

Existing attention models for image captioning typically extract only word-level attention information: the attention mechanism extracts local attention information from the image to generate the current word, and lacks accurate guidance from global image information. In this paper, we first propose an image captioning approach based on self-attention. Sentence-level attention information is extracted from the image through a self-attention mechanism to represent the global image information needed to generate sentences. Furthermore, we propose a double attention model that combines the sentence-level attention model with the word-level attention model to generate more accurate captions. We apply supervision and optimization at the intermediate stage of the model to resolve information interference. In addition, we perform two-stage training with reinforcement learning to optimize the evaluation metric of the model. Finally, we evaluated our model on three standard datasets: Flickr8k, Flickr30k, and MSCOCO. Experimental results show that our double attention model generates more accurate and richer captions, and outperforms many state-of-the-art image captioning approaches on various evaluation metrics.
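The abstract's core idea can be illustrated with a minimal sketch: a sentence-level branch uses self-attention over image region features to build one global guidance vector, while a word-level branch lets the decoder's hidden state attend to regions for the current word, and the two contexts are fused. This is an illustrative toy (function names, dimensions, and the mean-pooling/concatenation choices are my assumptions, not the paper's exact architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy image region features: 4 regions, each an 8-d vector.
rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 8))

def sentence_level(regions):
    # Self-attention over regions (scaled dot-product), then mean-pool
    # into a single global vector that guides the whole sentence.
    scores = regions @ regions.T / np.sqrt(regions.shape[1])  # (4, 4)
    attended = softmax(scores) @ regions                      # (4, 8)
    return attended.mean(axis=0)                              # (8,)

def word_level(regions, hidden):
    # The decoder hidden state queries regions to pick local
    # evidence for generating the current word.
    scores = regions @ hidden / np.sqrt(regions.shape[1])     # (4,)
    return softmax(scores) @ regions                          # (8,)

hidden = rng.standard_normal(8)       # stand-in decoder state
g = sentence_level(regions)           # global (sentence-level) context
l = word_level(regions, hidden)       # local (word-level) context
context = np.concatenate([g, l])      # fused context fed to the decoder
print(context.shape)  # (16,)
```

The sketch only shows how the two attention granularities coexist at one decoding step; the paper's full model additionally supervises the intermediate stage and fine-tunes with reinforcement learning.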

Bibliographic Information

  • Source
    Computer Vision and Image Understanding | 2020, No. 12 | pp. 103068.1-103068.10 | 10 pages
  • Author Affiliations

    Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China;

    Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China;

    Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China;

    Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China; College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China;

  • Indexing Information
  • Format: PDF
  • Language: English
  • CLC Classification
  • Keywords

    Image captioning; Sentence-level attention; Word-level attention; Reinforcement learning;


