Strong and Simple Baselines for Multimodal Utterance Embeddings

Abstract

Human language is a rich multimodal signal consisting of spoken words, facial expressions, body gestures, and vocal intonations. Learning representations for these spoken utterances is a complex research problem due to the presence of multiple heterogeneous sources of information. Recent advances in multimodal learning have followed the general trend of building more complex models that utilize various attention, memory, and recurrent components. In this paper, we propose two simple but strong baselines to learn embeddings of multimodal utterances. The first baseline assumes a conditional factorization of the utterance into unimodal factors. Each unimodal factor is modeled using a simple likelihood function obtained via a linear transformation of the embedding. We show that the optimal embedding can be derived in closed form by taking a weighted average of the unimodal features. To capture richer representations, our second baseline extends the first by factorizing into unimodal, bimodal, and trimodal factors, while retaining simplicity and efficiency during learning and inference. In a set of experiments across two tasks, we show strong performance on both supervised and semi-supervised multimodal prediction, as well as significant (10 times) speedups over neural models during inference. Overall, we believe that our strong baseline models offer new benchmarking options for future research in multimodal learning.
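
The closed-form weighted average mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's exact model: it assumes, purely for illustration, that every unimodal feature vector has already been projected into the embedding space (so the linear transformation is the identity) and that each modality has an isotropic Gaussian likelihood with a known variance. Under those assumptions the maximum-likelihood embedding is a precision-weighted average of the features. The function name `weighted_average_embedding` and the modality names are hypothetical.

```python
import numpy as np


def weighted_average_embedding(features, variances):
    """Maximum-likelihood embedding under an assumed Gaussian model.

    Assumption (not taken from the paper): each feature x of modality k
    is drawn from N(m, variances[k] * I), where m is the utterance
    embedding. Setting the gradient of the log-likelihood to zero gives
    m* = sum_k (1/var_k) * sum_t x_{k,t} / sum_k (T_k / var_k).

    features:  dict mapping modality name -> array of shape (T_k, d)
    variances: dict mapping modality name -> positive float
    """
    numerator = 0.0
    denominator = 0.0
    for name, x in features.items():
        x = np.asarray(x, dtype=float)
        precision = 1.0 / variances[name]
        # Each time step of this modality contributes with weight 1/variance.
        numerator = numerator + precision * x.sum(axis=0)
        denominator += precision * x.shape[0]
    return numerator / denominator


# Toy example with two hypothetical modalities in a 2-dimensional space.
features = {
    "language": np.array([[1.0, 1.0], [3.0, 3.0]]),
    "acoustic": np.array([[10.0, 10.0]]),
}
variances = {"language": 1.0, "acoustic": 4.0}
embedding = weighted_average_embedding(features, variances)
```

Noisier modalities (larger variance) pull the embedding less, which is one plausible reading of why a weighted rather than uniform average appears. No iterative optimization is involved, only sums and one division.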
