MULTIMODAL IMAGE CLASSIFIER USING TEXTUAL AND VISUAL EMBEDDINGS

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In one aspect, a method includes, for each image of a plurality of images: processing the image with a textual generator model to obtain a set of phrases that describe the content of the image, wherein each phrase is one or more terms; processing the set of phrases with a textual embedding model to obtain an embedding of predicted text for the image; and processing the image with an image embedding model to obtain an embedding of the image pixels of the image. A multimodal image classifier is then trained on the embeddings of predicted text and the embeddings of image pixels for the images, so that, given an image as input, it outputs labels of an output taxonomy that classify the image.
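
The flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the patented implementation: the abstract does not name concrete models, so `generate_phrases`, `embed_text`, `embed_pixels`, the embedding sizes, the flat-list image format, and the "day"/"night" taxonomy are all hypothetical stand-ins, and a nearest-centroid classifier is swapped in for the unspecified multimodal classifier purely for brevity.

```python
import hashlib

TEXT_DIM = 8   # assumed size of the toy text embedding
PIXEL_DIM = 4  # assumed size of the toy pixel embedding


def generate_phrases(image):
    """Hypothetical stand-in for the textual generator model."""
    mean = sum(image) / len(image)
    return ["bright scene", "daylight"] if mean > 0.5 else ["dark scene", "night"]


def embed_text(phrases):
    """Stand-in textual embedding: hashed bag of terms, normalised to sum to 1."""
    vec = [0.0] * TEXT_DIM
    for phrase in phrases:
        for term in phrase.split():
            digest = hashlib.sha256(term.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % TEXT_DIM] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def embed_pixels(image):
    """Stand-in image embedding: average-pool a flat pixel list into PIXEL_DIM values."""
    step = len(image) // PIXEL_DIM
    return [sum(image[i * step:(i + 1) * step]) / step for i in range(PIXEL_DIM)]


def multimodal_features(image):
    """Concatenate the predicted-text embedding and the pixel embedding."""
    return embed_text(generate_phrases(image)) + embed_pixels(image)


class NearestCentroidClassifier:
    """Toy classifier over the joint features; not the classifier from the source."""

    def fit(self, images, labels):
        # Group the joint feature vectors by label, then average each group.
        grouped = {}
        for image, label in zip(images, labels):
            grouped.setdefault(label, []).append(multimodal_features(image))
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, image):
        # At inference time only the image is supplied; text is predicted from it.
        feats = multimodal_features(image)

        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(feats, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


# Images are flat lists of 16 grayscale values in [0, 1] (an assumption).
train_images = [[0.9] * 16, [0.8] * 16, [0.1] * 16, [0.2] * 16]
train_labels = ["day", "day", "night", "night"]
classifier = NearestCentroidClassifier().fit(train_images, train_labels)
prediction = classifier.predict([0.85] * 16)
print(prediction)
```

The point of the sketch is the data flow: the text embedding is derived from phrases predicted from the image itself, so the trained classifier needs only the image at inference time. In a real system each stand-in would presumably be a learned model.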
