MULTIMODAL IMAGE CLASSIFIER USING TEXTUAL AND VISUAL EMBEDDINGS

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
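
The flow described above can be illustrated with a minimal sketch. Everything named here is hypothetical: `generate_phrases`, `embed_text`, and `embed_image` are toy stand-ins for the textual generator, textual embedding, and image embedding models, and the toy images and labels are synthetic. The abstract does not say how the two embeddings are combined or what kind of classifier is trained, so the concatenation and the softmax regression below are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, IMAGE_DIM, NUM_LABELS, PIXELS = 8, 16, 2, 16

# Fixed random projection used by the toy image embedding model (assumption).
_PROJECTION = rng.normal(size=(PIXELS, IMAGE_DIM))


def generate_phrases(image):
    """Hypothetical textual generator: returns phrases describing the image."""
    # A real model would predict descriptive phrases; this stub looks at brightness.
    return ["bright scene"] if image.mean() > 0.5 else ["dark scene", "night"]


def embed_text(phrases):
    """Hypothetical textual embedding model: normalized hashed bag of terms."""
    vec = np.zeros(TEXT_DIM)
    for phrase in phrases:
        for term in phrase.split():
            vec[sum(term.encode()) % TEXT_DIM] += 1.0
    return vec / max(np.linalg.norm(vec), 1e-9)


def embed_image(image):
    """Hypothetical image embedding model: normalized projection of the pixels."""
    vec = image.reshape(-1) @ _PROJECTION
    return vec / max(np.linalg.norm(vec), 1e-9)


def featurize(image):
    """Embedding of predicted text plus embedding of image pixels (concatenated)."""
    text_embedding = embed_text(generate_phrases(image))
    pixel_embedding = embed_image(image)
    return np.concatenate([text_embedding, pixel_embedding])


def train_classifier(features, labels, num_labels, lr=0.5, steps=300):
    """Softmax regression trained by gradient descent (an assumed classifier)."""
    n, d = features.shape
    weights = np.zeros((d, num_labels))
    one_hot = np.eye(num_labels)[labels]
    for _ in range(steps):
        logits = features @ weights
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        weights -= lr * features.T @ (probs - one_hot) / n
    return weights


def classify(image, weights):
    """Takes only the image as input and returns a label index."""
    return int(np.argmax(featurize(image) @ weights))


# Synthetic toy data: label 0 = dark 4x4 images, label 1 = bright 4x4 images.
dark = rng.uniform(0.0, 0.4, size=(20, 4, 4))
bright = rng.uniform(0.6, 1.0, size=(20, 4, 4))
images = np.concatenate([dark, bright])
labels = np.array([0] * 20 + [1] * 20)

features = np.stack([featurize(img) for img in images])
weights = train_classifier(features, labels, NUM_LABELS)
accuracy = np.mean([classify(img, w) == y
                    for img, w, y in zip(images, [weights] * len(images), labels)])
print(f"training accuracy on toy data: {accuracy:.2f}")
```

Note that `classify` accepts just the image: the predicted text is derived from the image by the generator stub, which mirrors the abstract's statement that the trained classifier labels an image "based on the image as input".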

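Bibliographic data

Publication number: EP3791322A1
Patent type: (not stated)
Publication date: 2021-03-17
Original format: PDF
Applicant/assignee: GOOGLE LLC
Application number: EP20190818391
Inventors: FUXMAN ARIEL; LI ZHEN; SHAH MANAN; VISWANATHAN KRISHNAMURTHY; LU CHUN-TA; TIMOFEEV ALEKSEI; SUN CHEN; JIA CHAO
Filing date: 2019-11-18
Classification: G06K9/62
Country: EP
Database entry time: 2022-08-24 17:45:13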