
The synergy of double attention: Combine sentence-level and word-level attention for image captioning


Abstract

Existing attention models for image captioning typically extract only word-level attention information: the attention mechanism extracts local attention information from the image to generate the current word, and lacks accurate guidance from global image information. In this paper, we first propose an image captioning approach based on self-attention. Sentence-level attention information is extracted from the image through a self-attention mechanism to represent the global image information needed to generate sentences. Furthermore, we propose a double attention model that combines the sentence-level attention model with the word-level attention model to generate more accurate captions. We apply supervision and optimization at the intermediate stage of the model to resolve information interference. In addition, we perform two-stage training with reinforcement learning to optimize the evaluation metric of the model. Finally, we evaluated our model on three standard datasets: Flickr8k, Flickr30k, and MSCOCO. Experimental results show that our double attention model generates more accurate and richer captions, and outperforms many state-of-the-art image captioning approaches on various evaluation metrics.
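The abstract's core idea can be illustrated with a minimal sketch: a sentence-level branch uses self-attention over image region features to build one global guidance vector, while a word-level branch lets the decoder's hidden state attend to regions for the current word, and the two contexts are fused. This is an illustrative toy (function names, dimensions, and the mean-pooling/concatenation choices are my assumptions, not the paper's exact architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy image region features: 4 regions, each an 8-d vector.
rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 8))

def sentence_level(regions):
    # Self-attention over regions (scaled dot-product), then mean-pool
    # into a single global vector that guides the whole sentence.
    scores = regions @ regions.T / np.sqrt(regions.shape[1])  # (4, 4)
    attended = softmax(scores) @ regions                      # (4, 8)
    return attended.mean(axis=0)                              # (8,)

def word_level(regions, hidden):
    # The decoder hidden state queries regions to pick local
    # evidence for generating the current word.
    scores = regions @ hidden / np.sqrt(regions.shape[1])     # (4,)
    return softmax(scores) @ regions                          # (8,)

hidden = rng.standard_normal(8)       # stand-in decoder state
g = sentence_level(regions)           # global (sentence-level) context
l = word_level(regions, hidden)       # local (word-level) context
context = np.concatenate([g, l])      # fused context fed to the decoder
print(context.shape)  # (16,)
```

The sketch only shows how the two attention granularities coexist at one decoding step; the paper's full model additionally supervises the intermediate stage and fine-tunes with reinforcement learning.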

Bibliographic Information

  • Source
    Computer Vision and Image Understanding | 2020, No. 12 | pp. 103068.1-103068.10 | 10 pages
  • Author Affiliations

    Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China;

    Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China;

    Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China;

    Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China; College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China;

  • Indexing Information
  • Format: PDF
  • Language: English
  • CLC Classification
  • Keywords

    Image captioning; Sentence-level attention; Word-level attention; Reinforcement learning;


