IEEE/CVF Conference on Computer Vision and Pattern Recognition

MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model



Abstract

Driven by growing concern about diet and health, food computing has attracted enormous attention from both industry and the research community. One of the most popular research topics in this domain is food retrieval, due to its profound influence on health-oriented applications. In this paper, we focus on the task of cross-modal retrieval between food images and cooking recipes. We present the Modality-Consistent Embedding Network (MCEN), which learns modality-invariant representations by projecting images and texts into the same embedding space. To capture the latent alignments between modalities, we incorporate stochastic latent variables that explicitly exploit the interactions between textual and visual features. Importantly, our method learns the cross-modal alignments during training but computes embeddings of the different modalities independently at inference time for the sake of efficiency. Extensive experimental results demonstrate that the proposed MCEN outperforms all existing approaches on the benchmark Recipe1M dataset while requiring less computation.
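The retrieval setup described in the abstract can be sketched as a two-tower model: each modality has its own encoder, both project into one shared embedding space, and retrieval reduces to nearest-neighbor search over cosine similarity. The sketch below is a minimal illustration with random projection weights and a reparameterized latent sample, not the authors' MCEN architecture; all names (`TwoTowerEncoder`, `sample_latent`) and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, eps=1e-8):
    """Unit-normalize rows so dot products equal cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def sample_latent(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps.
    In a model like MCEN, such stochastic latents couple the two
    modalities during training only; inference stays deterministic."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

class TwoTowerEncoder:
    """Projects image and recipe features into one shared embedding space.
    Each tower can run independently, which is what makes inference cheap."""
    def __init__(self, img_dim, txt_dim, emb_dim):
        self.W_img = rng.standard_normal((img_dim, emb_dim)) / np.sqrt(img_dim)
        self.W_txt = rng.standard_normal((txt_dim, emb_dim)) / np.sqrt(txt_dim)

    def encode_image(self, x):
        return l2_normalize(x @ self.W_img)

    def encode_text(self, t):
        return l2_normalize(t @ self.W_txt)

enc = TwoTowerEncoder(img_dim=512, txt_dim=300, emb_dim=128)
imgs = rng.standard_normal((4, 512))   # stand-in dish-image features
txts = rng.standard_normal((4, 300))   # stand-in recipe-text features

# Embeddings are computed independently per modality, as at inference time.
img_emb = enc.encode_image(imgs)
txt_emb = enc.encode_text(txts)

# Cosine-similarity matrix; row i ranks all recipes for image i.
sim = img_emb @ txt_emb.T              # shape (4, 4)
ranking = np.argsort(-sim, axis=1)
```

Because the embeddings are unit-normalized, the matrix product directly yields cosine similarities, and image-to-recipe retrieval is just a row-wise sort of `sim`.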

