Unique Faces Recognition in Videos

Abstract

This paper tackles face recognition in videos using metric learning methods and similarity ranking models. It compares a Siamese network trained with contrastive loss against a triplet network trained with triplet loss, each implemented with three architectures: the Google/Inception architecture, a 3-D convolutional network (C3D), and a 2-D long short-term memory (LSTM) recurrent neural network. The networks are trained on still images and on sequences from videos, and their performance is compared across these architectures. The dataset is the YouTube Faces Database, which was designed for investigating the problem of face recognition in videos. The contribution of this paper is two-fold. First, the experiments establish that 3-D convolutional networks and 2-D LSTMs trained with contrastive loss on image sequences do not outperform the Google/Inception architecture with contrastive loss on still images in top-n rank face retrieval. With triplet loss, however, the 3-D convolutional networks and 2-D LSTMs outperform the Google/Inception architecture in top-n rank face retrieval on the dataset. Second, a support vector machine (SVM) was used in conjunction with the CNNs' learned feature representations for facial identification. The results show that feature representations learned with triplet loss are significantly better for n-shot facial identification than those learned with contrastive loss. The most useful feature representations for facial identification come from the 2-D LSTM with triplet loss. Overall, the experiments show that learning spatio-temporal features from video sequences is beneficial for facial recognition in videos.
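The abstract does not give the exact form of the two losses or of the retrieval step, so the following is only a minimal sketch of one common textbook formulation of each, written in plain Python. The function names, the margin values, and the use of Euclidean distance between embeddings are assumptions for illustration, not details taken from the paper.

```python
import math

def contrastive_loss(a, b, same_identity, margin=1.0):
    """Pairwise contrastive loss on two embeddings (assumed formulation).

    Pairs of the same identity are pulled together; pairs of different
    identities are pushed apart until they are at least `margin` apart.
    """
    d = math.dist(a, b)  # Euclidean distance between the two embeddings
    if same_identity:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on squared Euclidean distances (assumed formulation).

    The anchor should be closer to the positive (same identity) than to
    the negative (different identity) by at least `margin`.
    """
    d_pos = math.dist(anchor, positive) ** 2
    d_neg = math.dist(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)

def top_n(query, gallery, n):
    """Indices of the n gallery embeddings nearest to the query."""
    order = sorted(range(len(gallery)),
                   key=lambda i: math.dist(query, gallery[i]))
    return order[:n]
```

In this sketch a top-n rank retrieval counts as a hit when an embedding of the query's identity appears among the indices returned by `top_n`. The embeddings themselves would come from whichever network is being evaluated.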
