Facial expression recognition plays an increasingly important role in human behavior analysis and human-computer interaction. Facial action units (AUs) coded by the Facial Action Coding System (FACS) provide rich cues for interpreting facial expressions. Much past work on AU analysis used only frontal-view images, but natural images contain a much wider variety of poses. The FG 2017 Facial Expression Recognition and Analysis challenge (FERA 2017) requires participants to estimate AU occurrence and intensity under nine different pose angles. This paper proposes a multi-task deep network addressing the AU intensity estimation sub-challenge of FERA 2017. The network performs the tasks of pose estimation and pose-dependent AU intensity estimation simultaneously, then merges the pose-dependent AU intensity estimates into a single estimate using the estimated pose. The two tasks share transferred bottom layers of a deep convolutional neural network (CNN) pre-trained on ImageNet. Our model outperforms the baseline results and achieves balanced performance across the nine pose angles for most AUs.
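The pose-weighted merging step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's code: the variable names, the number of AUs, and the use of a softmax posterior as merging weights are all assumptions; only the nine pose angles come from the challenge description.

```python
import numpy as np

N_POSES = 9  # nine pose angles in the FERA 2017 challenge
N_AUS = 7    # illustrative number of AUs (assumption)

rng = np.random.default_rng(0)

# The shared CNN bottom layers would feed two heads; here we stand in
# for their outputs with random values.
pose_logits = rng.normal(size=N_POSES)              # pose-estimation head
au_per_pose = rng.uniform(0, 5, (N_POSES, N_AUS))   # per-pose AU intensity head

# Softmax over pose logits gives an estimated pose distribution.
pose_prob = np.exp(pose_logits - pose_logits.max())
pose_prob /= pose_prob.sum()

# Merge: expectation of the pose-dependent estimates under the pose
# posterior yields a single AU intensity estimate per image.
au_merged = pose_prob @ au_per_pose  # shape: (N_AUS,)
```

A hard variant would instead pick the single head indexed by `pose_logits.argmax()`; the soft weighting shown here degrades more gracefully when the pose estimate is uncertain.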