IEEE Winter Conference on Applications of Computer Vision

Faces à la Carte: Text-to-Face Generation via Attribute Disentanglement

Abstract

Text-to-Face (TTF) synthesis is a challenging task with great potential for diverse computer vision applications. Compared to Text-to-Image (TTI) synthesis tasks, the textual description of faces can be much more complicated and detailed due to the variety of facial attributes and the parsing of high-dimensional abstract natural language. In this paper, we propose a Text-to-Face model that not only produces images in high resolution (1024×1024) with text-to-image consistency, but also outputs multiple diverse faces to cover a wide range of unspecified facial features in a natural way. By fine-tuning the multi-label classifier and image encoder, our model obtains the adjustment vectors and image embeddings which are used to transform the input noise vector sampled from the normal distribution. Afterwards, the transformed noise vector is fed into a pre-trained high-resolution image generator to produce a set of faces with the desired facial attributes. We refer to our model as TTF-HD. Experimental results show that TTF-HD generates high-quality synthesised faces from free-form text descriptions with state-of-the-art performance.
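The noise-transformation step described in the abstract can be illustrated with a small sketch. This is not the TTF-HD implementation: the latent size, the random orthonormal attribute directions, the +1/-1/0 encoding of the adjustment, and the step size are all assumptions made for illustration, and the pre-trained generator is omitted entirely. The sketch only shows how several noise vectors sampled from a normal distribution could be shifted toward the specified attributes while unspecified attributes keep their sampled variety.

```python
import numpy as np

LATENT_DIM = 512  # assumed latent size; the abstract does not state it


def sample_noise(rng, n, dim=LATENT_DIM):
    """Draw n noise vectors from a standard normal distribution."""
    return rng.standard_normal((n, dim))


def random_orthonormal_directions(rng, k, dim=LATENT_DIM):
    """Hypothetical stand-in for learned attribute directions: k orthonormal rows."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    return q.T  # shape (k, dim)


def adjust_noise(z, directions, adjustment, step=1.0):
    """Shift every noise vector along the requested attribute directions.

    adjustment[i] is +1 (attribute wanted), -1 (not wanted) or 0 (unspecified).
    This encoding is an assumption, not taken from the paper.
    """
    return z + step * (adjustment @ directions)


rng = np.random.default_rng(0)
z = sample_noise(rng, 4)                      # four different starting points
dirs = random_orthonormal_directions(rng, 3)  # three hypothetical attributes
adj = np.array([1.0, -1.0, 0.0])              # third attribute left unspecified
z_adj = adjust_noise(z, dirs, adj, step=2.0)
print(z_adj.shape)  # (4, 512)
```

Because every row of z starts from a different sample, the adjusted vectors agree on the specified attributes but still differ elsewhere, which mirrors the abstract's point about producing multiple diverse faces for one description.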
