
Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings



Abstract

We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illustration. We introduce VerSe, a new dataset that augments existing multimodal datasets (COCO and TUHOI) with sense labels. We propose an unsupervised algorithm based on Lesk which performs visual sense disambiguation using textual, visual, or multimodal embeddings. We find that textual embeddings perform well when gold-standard textual annotations (object labels and image descriptions) are available, while multimodal embeddings perform well on unannotated images. We also verify our findings by using the textual and multimodal embeddings as features in a supervised setting and analyse the performance of the visual sense disambiguation task.
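The core idea of the embedding-based Lesk variant described in the abstract can be sketched as follows: embed the context derived from the image (e.g., its object labels or description, or the image itself), embed each candidate sense's dictionary gloss, and pick the sense whose embedding is most similar to the context embedding. The sketch below is a minimal illustration of that selection step, not the paper's exact method; the sense ids and 3-dimensional vectors are hypothetical placeholders for real (e.g., WordNet-derived) sense inventories and learned embeddings.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two dense embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def disambiguate(context_vec, sense_vecs):
    """Embedding-based Lesk: return the sense id whose gloss/sense
    embedding is most similar to the context embedding."""
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))

# Toy example: two hypothetical senses of the verb "play".
senses = {
    "play.v.01": np.array([1.0, 0.1, 0.0]),  # engage in a game or sport
    "play.v.03": np.array([0.0, 0.2, 1.0]),  # perform on an instrument
}
context = np.array([0.9, 0.2, 0.1])  # embedding of the image's context
print(disambiguate(context, senses))  # -> play.v.01
```

In the unsupervised setting this argmax over sense similarities is the whole decision rule; in the supervised setting described in the abstract, the same embeddings would instead be fed as features to a trained classifier.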

