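Multi-task Cascaded Convolutional Neural Networks: Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Yu Qiao's group)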

Abstract—Face detection and alignment in unconstrained environments are challenging due to varied poses, illuminations, and occlusions. Recent studies show that deep learning approaches can achieve impressive performance on these two tasks. In this paper, we propose a deep cascaded multi-task framework that exploits the inherent correlation between detection and alignment to boost their performance. In particular, our framework leverages a cascaded architecture with three stages of carefully designed deep convolutional networks to predict face and landmark locations in a coarse-to-fine manner. In addition, we propose a new online hard sample mining strategy that further improves performance in practice. Our method achieves superior accuracy over state-of-the-art techniques on the challenging FDDB and WIDER FACE benchmarks for face detection and on the AFLW benchmark for face alignment, while maintaining real-time performance.

Index Terms—Face detection, face alignment, cascaded convolutional neural network

Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE. K.-P. Zhang, Z.-F. Li, and Y. Qiao are with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China. Z.-P. Zhang is with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong. This work was funded by the External Cooperation Program of BIC, Chinese Academy of Sciences (172644KYSB20160033, 172644KYSB20150019), Shenzhen Research Program (KQCX2015033117354153, JSGG20150925164740726, CXZZ20150930104115529, CYJ20150925163005055, and JCYJ20160510154736343), Guangdong Research Program (2014B050505017 and 2015B010129013), Natural Science Foundation of Guangdong Province (2014A030313688), and the Key Laboratory of Human Machine Intelligence-Synergy Systems through the Chinese Academy of Sciences.

I. INTRODUCTION

Face detection and alignment are essential to many face applications, such as face recognition and facial expression analysis. However, the large visual variations of faces, such as occlusions, large pose variations, and extreme lighting, pose great challenges for these tasks in real-world applications.

The cascade face detector proposed by Viola and Jones [2] uses Haar-like features and AdaBoost to train cascaded classifiers, achieving good performance with real-time efficiency. However, quite a few works [1, 3, 4] indicate that this kind of detector may degrade significantly in real-world applications with larger visual variations of human faces, even with more advanced features and classifiers. Besides the cascade structure, [5, 6, 7] introduce deformable part models (DPM) for face detection and achieve remarkable performance. However, these models are computationally expensive and usually require expensive annotation in the training stage. Recently, convolutional neural networks (CNNs) have achieved remarkable progress in a variety of computer vision tasks, such as image classification [9] and face recognition [10]. Inspired by the significant successes of deep learning in computer vision, several studies use deep CNNs for face detection. Yang et al. [11] train deep convolutional neural networks for facial attribute recognition to obtain high responses in face regions, which then yield candidate face windows. However, due to its complex CNN structure, this approach is time-consuming in practice. Li et al. [19] use cascaded CNNs for face detection, but their method requires bounding box calibration from face detection at extra computational expense, and it ignores the inherent correlation between facial landmark localization and bounding box regression.

Face alignment also attracts extensive research interest.
Research in this area can be roughly divided into two categories: regression-based methods [12, 13, 16] and template fitting approaches [14, 15, 7]. Recently, Zhang et al. [22] proposed using facial attribute recognition as an auxiliary task to enhance face alignment performance with a deep convolutional neural network.

However, most previous face detection and face alignment methods ignore the inherent correlation between these two tasks. Although several existing works attempt to solve them jointly, these works still have limitations. For example, Chen et al. [18] jointly perform alignment and detection with a random forest using pixel-value-difference features, but these handcrafted features considerably limit its performance. Zhang et al. [20] use a multi-task CNN to improve the accuracy of multi-view face detection, but the detection recall is limited by the initial detection windows produced by a weak face detector.

On the other hand, mining hard samples during training is critical to strengthening the detector. However, traditional hard sample mining is usually performed offline, which significantly increases manual effort. It is therefore desirable to design an online hard sample mining method for face detection that adapts automatically to the current training status.

In this paper, we propose a new framework that integrates these two tasks using unified cascaded CNNs through multi-task learning. The proposed CNNs consist of three stages. In the first stage, a shallow CNN quickly produces candidate windows. In the second stage, a more complex CNN refines these windows by rejecting a large number of non-face windows. Finally, a more powerful CNN refines the result again and outputs the positions of five facial landmarks.
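The coarse-to-fine cascade and the online hard sample mining idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: each stage is a placeholder scoring function standing in for one of the three CNNs, candidate windows are supplied directly instead of being generated from the image, and bounding box regression and landmark output are omitted. The names `Box`, `cascade`, `nms`, `select_hard_samples`, `coarse_stage`, and `fine_stage`, the use of greedy non-maximum suppression between stages, the NMS overlap thresholds, and the 0.7 keep ratio are all hypothetical choices for illustration. Each stage re-scores the windows that survived the previous stage, rejects those below its threshold, and suppresses heavily overlapping windows; hard sample mining keeps only the highest-loss fraction of a mini-batch, so that only those samples would contribute gradients.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Box:
    """Axis-aligned candidate window with a face score (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float = 0.0


# A stage maps a candidate window to a face probability; real stages would be CNNs.
Stage = Callable[[Box], float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two windows."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], iou_thresh: float) -> List[Box]:
    """Greedy non-maximum suppression: keep the best window, drop its heavy overlaps."""
    remaining = sorted(boxes, key=lambda b: b.score, reverse=True)
    kept: List[Box] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) <= iou_thresh]
    return kept


def cascade(candidates: Sequence[Box], stages: Sequence[Stage],
            thresholds: Sequence[float], iou_thresh: float = 0.7) -> List[Box]:
    """Coarse-to-fine filtering: every stage re-scores the survivors of the previous
    stage, rejects windows below its threshold, then applies NMS (assumed here)."""
    if len(stages) != len(thresholds):
        raise ValueError("need exactly one threshold per stage")
    windows = list(candidates)
    for stage, thresh in zip(stages, thresholds):
        rescored = [Box(b.x1, b.y1, b.x2, b.y2, stage(b)) for b in windows]
        windows = nms([b for b in rescored if b.score >= thresh], iou_thresh)
    return windows


def select_hard_samples(losses: Sequence[float], keep_ratio: float = 0.7) -> List[int]:
    """Online hard sample mining: indices of the highest-loss samples in a mini-batch.
    Only these samples would contribute gradients in the backward pass."""
    k = max(1, int(len(losses) * keep_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:k])


def coarse_stage(box: Box) -> float:
    """Hypothetical stand-in for the shallow first-stage CNN."""
    return 0.9 if box.x1 < 60 else 0.2


def fine_stage(box: Box) -> float:
    """Hypothetical stand-in for a stricter, more powerful later-stage CNN."""
    return 0.95 if box.x1 < 20 else 0.4


if __name__ == "__main__":
    candidates = [Box(0, 0, 10, 10), Box(1, 1, 11, 11),
                  Box(50, 50, 60, 60), Box(100, 100, 110, 110)]
    faces = cascade(candidates, [coarse_stage, fine_stage], [0.5, 0.8], iou_thresh=0.5)
    print([(f.x1, f.y1, f.score) for f in faces])  # → [(0, 0, 0.95)]
    print(select_hard_samples([0.1, 2.0, 0.5, 1.5]))  # → [1, 3]
```

In this toy run, the window at (100, 100) is rejected by the cheap first stage, the window at (1, 1) is suppressed because it overlaps the window at the origin (IoU of about 0.68), and the window at (50, 50) is rejected only by the stricter second stage. Putting cheap stages first means the expensive stages see far fewer windows, which is the motivation for the cascade design.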
Thanks to this multi-task learning framework, the performance of the algorithm can be

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, Senior Member, IEEE, and Yu Qiao, Senior Member, IEEE

