FACE RECOGNITION USING CONVOLUTIONAL MACROPIXEL COMPARISON APPROACH

by

Yunke Li

B.Sc. Electronic and Information Engineering, Tianjin University, 2013

THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

April, 2018

© Yunke Li, 2018

Abstract

The Convolutional Neural Network (CNN) is a widely used deep learning framework that has achieved outstanding results in the field of face recognition. The Macropixel Comparison Approach is a shallow mathematical approach that recognizes faces by comparing raw pixel blocks of face images. In this thesis, inspired by ideas from the currently popular deep neural network framework, we introduce two features into the mathematical approach: deep overlap and a weighted filter. The aim is to explore whether ideas from deep learning can benefit mathematical methods, which might extend the scope of face recognition research. Results from our experiments show that the proposed approach achieves markedly better recognition rates than the original macropixel method.

Contents

Abstract
List of Figures
Acknowledgements

1 Introduction
  1.1 Face Recognition
    1.1.1 Face Verification and Face Identification
    1.1.2 System Structure
  1.2 Challenges of Face Recognition
    1.2.1 Facial Movement
    1.2.2 Illumination Variation
    1.2.3 Occlusion
    1.2.4 Others
  1.3 Face Recognition Applications
  1.4 Research Background
  1.5 Research Contributions
  1.6 Outline of the Thesis

2 Literature Review
  2.1 The Face Recognition System
  2.2 Face Detection
  2.3 Preprocessing
  2.4 Feature Extraction
    2.4.1 Eigenfaces and Principal Component Analysis
    2.4.2 Fisherface and Linear Discriminant Analysis
    2.4.3 Independent Component Analysis
    2.4.4 Gabor Wavelets and Elastic Bunch Graph Matching
    2.4.5 Local Binary Pattern
  2.5 Classification
    2.5.1 Support Vector Machine
    2.5.2 Neural Network

3 Previous Work
  3.1 Pairwise Macropixel Comparison
    3.1.1 Definition
    3.1.2 Recognition Process
  3.2 Convolutional Neural Network
    3.2.1 The General Framework
    3.2.2 Improvement of CNN

4 Proposed Algorithms
  4.1 Convolutional Macropixel Comparison Approach
  4.2 Overlapping: A Convolutional Way
  4.3 Weighted Filter Counter
    4.3.1 Training Stage
    4.3.2 Recognition Stage
    4.3.3 Independent Weight Training

5 Experiments
  5.1 Face Dataset
  5.2 Experiment Design
    5.2.1 Experiment for Weighted Filter
    5.2.2 Experiment for Overlapping
    5.2.3 Experiment for Compact System
  5.3 Experiment Result
    5.3.1 Result for Weight Training
    5.3.2 Results for Overlap
    5.3.3 Results for Compact System

6 Conclusion and Discussion
  6.1 Conclusion
  6.2 Future Work

Bibliography

List of Figures

1.1 The difference between identification and verification.
1.2 A general face recognition system flow chart.
2.1 A general face recognition system flow chart with the training process.
2.2 PCA focuses on within-class variation while LDA finds the direction that maximizes between-class variation.
2.3 The process of LBP feature extraction. The decimal value represents the texture information.
2.4 The left pattern, which has 5 transitions, is only assigned to a single label. The right one, with 1 transition (transitions ≤ 2), is a uniform LBP.
3.1 Assuming an image of size 12 × 12, the 2-pixel margin area is for shifting purposes, and we segment 4 macropixels of size 4 × 4 from this image.
3.2 Calculating distances between macropixel M in image A and candidates Ci (i = 1, 2, ..., n) in image B. The corresponding object of M is among the Ci and has the shortest distance.
3.3 LeNet-5 is a typical Convolutional Neural Network framework. C1, C2 and C3 are convolutional layers. P1 and P2 are pooling layers. F1 is a fully-connected layer.
4.1 The whole framework. The system is divided into two stages: a training stage and a recognition stage. The training stage produces the weighted filter, which is then used in recognition. Note that all macropixel comparison operations employ deep overlap.
4.2 For an image of size 10 × 10 (the effective area is 8 × 8 after removing the shifting space) and a macropixel of size 4 × 4, deep overlap extracts 5 × 5 macropixels. In this example, P = Q = 5.
4.3 How to transfer an Error Rate Counter to a Weighted Filter. The number of weight-testing images Otest is 90, and we subtract each value in the counter from 90 to get an intermediate matrix. After picking out the maximum value 55 from the matrix, we divide each element by 55 to produce the weighted filter.
4.4 By comparing training images at1 and at2, the system identifies the first macropixel of at1 as ID1, which is a success, so it adds 0 to the counter. For image at2 the identification fails, so the system adds 1 to the counter. The total value of the counter at the first location is 0 + 1 = 1.
5.1 Face images of 4 sample subjects from the Yale Database. Each subject contains 11 images with different illumination conditions and expressions.
5.2 Face images of 4 sample subjects from the ORL Database. Each subject contains 10 images with different illumination conditions, face directions and expressions.
5.3 Face images of 2 sample subjects from the PIE Database. The figure shows only 80 images per subject. There are 68 subjects in total, and each contains more than 100 images.
5.4 The change of the weighted filter's total value over training repetitions. It drops under 5 after 50 repetitions.
5.5 The red bar is the running time using a CPU, with an average of 17 seconds per comparison operation. The blue bar is the running time using a GPU, with an average of 0.58 seconds per operation.

Acknowledgements

I want to specifically thank my supervisor, Dr. Liang Chen, for his great advice, support, supervision, understanding, patience and help in my study and life. He will always be my example in the future. This thesis could not have been done without him.

I also want to thank the University of Northern British Columbia for its learning and living support. Thanks to Prof. Jernej Polajnar, Prof. Desanka Polajnar and Yan Ling for their support during my TA work. The financial support from Mitacs and the TA program, arranged through Dr. Chen, also helped me a lot.

To all my colleagues at UNBC (Negar, Raj, Farhana, Meng Xi, Tony Zhuang, Olivia Wang and Hongyuan Shi), thank you for the help and support in my research and study. To all the friends I met in Prince George (Boshi, Finch, Huan, Michelle, Jess, Ricki, Danny, Steven and Jarratt), thank you for the company, support and help. To Andrea Li, my girlfriend: my life in Canada would not be this colorful and happy without you.

I am very grateful to my parents. Thank you for your guidance and love. I love you forever.

Chapter 1
Introduction

Face recognition, a daily behavior of human beings, remains quite a challenge and a major problem when approached through computer science. It is associated with many fields, such as computer vision, biometrics, machine learning and image/video processing. By successfully identifying and verifying individuals from digital images or video clips, facial recognition systems can be widely deployed in modern society and play a significant role in specific areas, for example public security, human-computer interaction and video surveillance. Beyond that, researching face recognition, which is a spontaneous biological act, helps us explore the mechanism of the visual system.
Therefore, we focus on this area not only because of its extensive use in modern society, but also because its essence may uncover the secret of how the brain works.

Although face recognition is a classic research subject that has been developed since the 1960s, starting with Woodrow Bledsoe[2], it still attracts a sizeable number of researchers and shows good momentum of development in the machine learning community. New approaches and improvements of existing ones are continuously put forth. The approaches to solving the face recognition problem fall into two main categories: one comprises mathematical methods, and the other simulates the biological neurocognitive process using artificial neural networks. During the field's first several decades (1960s-1980s), technology developed rapidly together with the algorithms, which made it possible for face recognition systems to obtain considerable achievements. Since the 1990s, with the application of PCA (principal component analysis) in the Eigenface work of Kirby and Sirovich[45] as a milestone, mathematical methods for face recognition have emerged in large numbers. On the other hand, although neural networks were strongly connected with face recognition from the very beginning, they could not gain an advantage over classical methods, which are more efficient and intuitive. In the 2010s, however, neural network methods were reborn with the development of deep learning and general-purpose computation. Among deep learning methods, the Convolutional Neural Network (CNN) is considered the symbolic approach for face recognition. In this thesis, we are inspired by the CNN framework and integrate its solution into an original mathematical method.

1.1 Face Recognition

1.1.1 Face Verification and Face Identification

In most cases, face recognition as a biometric procedure refers to both face verification and face identification, as shown in Figure 1.1. The decision of which one should be implemented in a system is made mainly based on its purpose.

Figure 1.1: The difference between identification and verification.

For face verification, we first claim a specified identity and then use the system to confirm whether the face we want to verify matches this identity. It is an "is he/she the person" question. As a result, face verification is a one-to-one feature matching task comparing a query image with a template image of established identity in the database. For face identification, on the other hand, we identify the face among all the identities we know (or report that the identity is not currently known). It is a "do we know this person" question. To identify, a one-to-many feature matching task comparing the query image with the whole database is employed. The system is required to find the most similar face or faces with the highest matching scores, or even to report the identity of the query image as unknown if all the matching scores are lower than a specified threshold. In this thesis, we focus mainly on face verification and plan to extend the method to face identification in the future.

1.1.2 System Structure

For humans, the brain, as a well-connected system, is capable of executing the whole recognition process in an instant, but computers lack such ability. So a pipeline that passes results from one process to the next is constructed to deal with the job step by step. This is also helpful for research, because the algorithm corresponding to each step can differ.
Generally speaking, an integrated face recognition system can be divided into five sections: image/video acquisition, face detection, preprocessing, feature extraction and feature matching. The system flow chart is shown in Figure 1.2. First, the system acquires images and detects the face areas in the images/videos, cutting them out from the background. Most of the time, face images/videos acquired by a camera or other acquisition device are not suitable for direct recognition processing, because conditions such as illumination, angle, face position and gray scale vary. So preprocessing is required to achieve image fixing, alignment and normalization. The next step is extracting features, which contain the most valuable and most stable information for distinguishing one person's face from another, and organizing them. Finally, the features extracted from the input face are matched against those of the enrolled faces. As the identities of the enrolled faces are known, the system either identifies the input face as the closest matching face or declares it unknown.

Figure 1.2: A general face recognition system flow chart.

1.2 Challenges of Face Recognition

As mentioned previously, the development of face recognition shows both spectacular progress and tough challenges. Proposed algorithms and methods have already achieved remarkable effect, so it is safe to say that there are systems that achieve satisfactory recognition rates under particular circumstances. But if the diversity and uncertainty of the environment exceed the capacity of the system, performance depends highly on how well the system handles the challenges listed below.

1.2.1 Facial Movement

Faces in daily life, contrary to the static images researchers usually deal with, are changing all the time. So it has always been a challenging task to make face processing methods designed for static faces work on moving faces as well. Facial movements are manifold, and the system should process different types of movement in different ways. The two major types of facial movement are rigid movement and non-rigid movement[32]. Rigid movement, also called rigid head motion, is the movement of head orientation, which provides many views of the head instead of a static frontal face (e.g., nodding, looking up and turning around). Non-rigid movement refers to the translocation and shape change of facial parts caused by the movement of skeleton and muscle (e.g., facial expression, speech production and gaze change). The combination of both movements constitutes human face behavior in normal life and delivers more information for communication. However, as face images are two-dimensional graphics produced by projecting three-dimensional models, these movements bring uncertainty, complexity and information loss to face recognition.

1.2.2 Illumination Variation

Illumination variation, also known as changing light direction and light conditions, is one of the major challenges a face recognition system must handle. Owing to illumination's uneven distribution on faces, the image can vary greatly in luminance, grayscale and shade, partially or globally. This means illumination does not only affect the image at pixel scale; it can also diminish the information and integrity of features.
Even though the problem has been studied for years with enormous progress[39], researchers still confirm that variation of illumination has a non-negligible negative effect on face recognition performance[26]. For example, results in FRVT 2006[34] indicate that the recognition rate for faces with illumination variation, especially uncontrolled illumination variation, still has plenty of room for improvement.

1.2.3 Occlusion

It is very common for human beings to wear accessories or disguises such as sunglasses, wigs and gauze masks, and humans can easily recognize others even under such conditions. But for an automatic face recognition system, an occluded face is a big challenge owing to the information loss of the covered part. The issue becomes particularly serious when the recognition algorithm is based on features that are corrupted by the occlusion (such as PCA[3] and LDA[50])[58]. Some methods tend to localize the occluded part and then discard features related to it, while others process local features instead of global features so that recognition is not affected as much by occlusion[55].

1.2.4 Others

There are far more challenges than we could solve once and for all when it comes to uncontrolled face recognition: ageing, scars, makeup, image/video size, image/video quality, etc.

1.3 Face Recognition Applications

Although face recognition is a technique facing many challenges and problems, it has now achieved a remarkable number of reliable solutions for today's applications in personal, commercial and governmental environments. The reasons why face recognition is popular as a biometric are listed below:

• Nonintrusive. Unlike fingerprint, iris or handwriting, face image/video acquisition does not force people to perform a specific behavior. The process can be done without being noticed if necessary. This makes it an ideal tool for security and surveillance tasks.

• Compatibility and low cost. Cameras have a high penetration rate not only in public areas but also in personal life, being implemented in most laptops and cellphones. So it is easy and economical to build or extend a face recognition system. In Machine Readable Travel Documents (MRTD), the face received the highest compatibility rating among six popular biometric attributes[14].

• Natural. Face recognition is a daily behavior performed by humans all the time, intentionally or unconsciously. This involuntary characteristic makes face recognition easy for the public to accept.

And some applications of face recognition are listed below:

• Security and surveillance. CCTV control systems and video surveillance systems involving face recognition are widely used in public and private property areas. The Canada Border Services Agency is developing a facial recognition system as part of the self-service border clearance kiosk program in major Canadian airports, which will improve security reliability and raise efficiency[28].

• Social activity and entertainment. Face recognition based on cameras is a new and better method of human-computer interaction. Facebook provides tag suggestions for user photos based on facial recognition software[27]. Their face recognition project, DeepFace, reaches an outstanding accuracy of 0.9735[49].

• Access control. This includes system login and area entry. The face, as a reliable identity label, has good compatibility with other access control methods such as passwords or keys. It further provides higher reliability and lowers the risk of stolen keys or leaked passwords.
The requirement for recognition accuracy is not so strict, thanks to the small number of expected access subjects. As an example, Microsoft implements Windows Hello face authentication in Windows 10 to make user login safer and faster[29].

1.4 Research Background

As described in the history of face recognition, the Convolutional Neural Network (CNN) is the method that brought neural networks back into the academic view of computer vision and machine learning by showing a striking outperformance on the ILSVRC-2010 benchmark[18]. From what we have learned, deep learning, and CNN in particular, has achieved great performance in computer vision and face recognition[33][47][43][49][48]. CNN, which is inspired by the receptive field and derived from the multilayer perceptron, is a combination of classic ideas[22] such as local connectivity, convolutional filters and pooling, with several new concepts such as dropout[25], overlapping, Rectified Linear Units (ReLU) and GPU computation. Each convolutional filter can be viewed as a way of extracting a feature, and the convolution process selects the parts of the image with correspondingly high activation values. The outstanding feature extraction capacity of convolution is one of the key factors that helps CNN achieve great performance on computer vision problems.

In 2010, a face recognition method based on pairwise macropixel (a small pixel block containing a few pixels) comparison was proposed by Liang Chen[6]. It shows that this pixel block matching approach, even at a very early stage of development, performs no worse than other mainstream holistic approaches. However, it is undeniable that comparing macropixels directly is a shallow approach that requires further improvement. By investigating the approach, we find that macropixels act like generalized, fixed-size, indiscriminate features of a face image. Along this line of thinking, there is a resemblance between pairwise macropixel comparison and feature matching. Therefore, considering CNN's decent feature extraction ability, we wish to explore whether CNN's convolutional method can benefit the macropixel approach.

1.5 Research Contributions

There are three major contributions. First, considering the inherent relation between macropixels and basic features in a convolutional neural network, we propose a face recognition approach based on the macropixel approach and inspired by the convolutional neural network solution. Two key features are introduced into the original macropixel method: heavy overlap and the weighted filter. The former, inspired by the convolution operation, extracts macropixels in an overlapping way. The latter, inspired by the convolutional kernel, trains a filter and uses it in the recognition phase. Results from experiments set under the same conditions as the original macropixel method demonstrate that our proposed approach achieves markedly better recognition rates than the original method. Second, as heavy overlap and weight training require more computing resources, we also introduce parallel computation into the system using a graphics processing unit (GPU), as a demonstration of the efficiency potential of the macropixel approach. Third, this research explores the possibility and potential of an original face recognition approach. It also shows that ideas from deep learning frameworks can benefit a traditional face recognition method, which motivates us to further study the relationship between these two kinds of approaches.
1.6 Outline of the Thesis

The structure of this thesis is organized as follows. In Chapter 2, previous related work in the field of face recognition is reviewed; it contains a brief description of the system, followed by several current recognition approaches. In Chapter 3, research directly related to our work is discussed: the first part focuses on the macropixel comparison approach and analyzes its characteristics, while the second provides an introduction to the convolutional neural network together with its state-of-the-art progress. In Chapter 4, two approaches are proposed: the first imports overlapping of different degrees into the comparison, and the second further adds the weighted filter. Different tentative methods for each approach are discussed. In Chapter 5, the procedure of our experimental setup as well as the exploration of graphics processing is described in detail at the beginning; the experimental results of the proposed approaches are then presented step by step. We first report the performance of each approach separately, and then combine them to obtain the best performance. The chapter also provides an efficiency test and analysis. In Chapter 6, the thesis and research are concluded and future work is discussed.

Chapter 2
Literature Review

In this chapter, we first go through the processes of face recognition in general. As a complicated multistage system, it requires us to discuss some stages a little further to get a more complete picture. We then focus on the most essential stage of the system, the recognition stage (including feature extraction, feature matching and classification). Several recognition algorithms that use different methods to comprehend and paraphrase the characteristics of faces are introduced.

2.1 The Face Recognition System

We have already provided a brief flow diagram for a face recognition system in Figure 1.2. As shown in that flow diagram, the recognition system contains image/video acquisition, face detection, preprocessing, feature extraction and feature matching. If we go a step further and take the training process into consideration, the pipeline of the system is as depicted in Figure 2.1. Both training and recognition go through the same processes until the last step. The resulting features of training are stored in the database in order to be matched with the features extracted from images waiting to be recognized.

Figure 2.1: A general face recognition system flow chart with training process.

2.2 Face Detection

The primary task of face recognition, after acquiring images/videos, is to detect the location of faces. Face detection is a fundamental technology for computer vision, as it is not only necessary for face recognition but also indispensable for auto-focus tracking, photo retouching, face modeling, expression recognition and many other facial analysis processes. According to M. H. Yang, D. J. Kriegman and N. Ahuja's definition, "the objective of face detection is to identify if there are faces in the given image and, if present, return face location and extent of each face"[56]. This definition clearly indicates that the inputs of face detection are images and the outputs are coordinates and images of faces. As with face recognition, face detection can be disturbed by the face's position, scale, illumination, expression, etc.
From previous surveys of face detection, researchers in this field have proposed more than 100 approaches, including some outstanding ones that are widely used in people's lives today[56][57]. As a famous classifier, SVM (Support Vector Machine) is used for face detection in Osuna's and Heisele's research[31][13]: the former implements SVM for face detection together with a decomposition algorithm to overcome SVM's low efficiency on large data sets, and the latter presents a two-level SVM classifier with better results on the face rotation problem. Romdhani and Ratsch then speed up the calculation of SVM-based algorithms using reduced set vectors[36][37]. Beyond that, a Bayesian Discriminating Features method proposed by Liu uses a Bayesian classifier to distinguish the face class, modeled as a multivariate normal distribution, from the non-face class[24]; the algorithm models only the non-face class closest to the face class and thereby excludes most non-face objects. Paul Viola and Michael Jones proposed a detection algorithm using the Integral Image as image representation and AdaBoost as the learning method[51]. The Integral Image approach constructs an image representation that allows fast summation over any rectangular sub-area, which makes it easy to compute Haar-like features. This detection approach much improved operational efficiency compared to previous ones and afterwards became widely used in real life. Neural networks have also been applied in this area, from Rowley's classic neural network-based filter[38] to Garcia's convolutional neural architecture[11].

2.3 Preprocessing

As mentioned earlier regarding the challenges face recognition may suffer from, some of these challenges can be very difficult, or almost impossible, to overcome if left to the feature extraction or matching phase. So preprocessing (face normalization, noise elimination, lighting variation correction, etc.) plays a significant role in face recognition in dealing with those kinds of problems.

Among all preprocessing approaches, traditional image enhancement methods have the advantage of not requiring prior knowledge while processing directly and quickly, so they are commonly used by many face recognition systems. Approaches based on modifying the histogram (for example Histogram Equalization) are popular for dealing with illumination variation[41]. Histogram Equalization redistributes the histogram into a uniform one in order to enhance the local contrast of the image and readjust its illumination condition[35][8] (a minimal sketch is given at the end of this section). Histogram Specification/Matching transfers the histogram into a predefined form that is known to have a better illumination condition. Logarithmic Transformation stretches the low-grey part of the histogram and condenses the high-grey part to get better visual adaptation. Aside from classic histogram methods, Gamma Intensity Correction was introduced by Shan to correct the overall brightness of face images toward a predefined "canonical" face image[44].

Face geometric normalization is another important technique in face preprocessing. Its purpose is to normalize face images to obtain fixed eye/nose/mouth positions, uniform face orientation, the same image size, etc. The normalization is usually carried out by selecting a facial feature and modifying the image based on it. The most commonly used feature is the eyes, and the methods to modify the image are clipping, rotation and scaling[5].
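To make the histogram methods above concrete, the following is a minimal NumPy sketch of histogram equalization for an 8-bit grayscale image. It is an illustration rather than the thesis's implementation; the function name and the rescaling convention are our own choices:

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalization: map grey levels through the normalized
    cumulative histogram so the output histogram is roughly uniform.
    Assumes img is a 2D uint8 array."""
    hist = np.bincount(img.ravel(), minlength=256)   # per-level pixel counts
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                        # first non-zero CDF value
    # rescale the CDF to the full [0, 255] range and use it as a lookup table
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```

Histogram Specification follows the same pattern, except that the uniform target distribution is replaced by the histogram of a reference image with good illumination.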
2.4 Feature Extraction

Feature extraction aims at acquiring, from a facial image, the most representative information that can be used to distinguish one face from another. In the study of face recognition, the feature extraction stage has received extensive attention because of its central role in the whole system. Researchers therefore keep exploring possibilities to improve the performance of this stage, including choosing a feasible feature to extract and employing a proper way to extract and describe it. Extraction methods are divided into two categories: holistic representations and local features[42]. A holistic representation focuses on the whole image, which can be treated as a high-dimensional feature space; feature vectors from this space are the extracted features. Because holistic features represent the whole face, some feature vectors hold little effective information for face recognition, which may drag down system efficiency as well as recognition accuracy. Local features are facial characteristics and their location information, including the lips, eyes, nose and their relative positions. Here we introduce some of the most representative extraction methods of both categories.

2.4.1 Eigenfaces and Principal Component Analysis

The original developers of the Eigenfaces method are Sirovich and Kirby, who found it effective as a face feature representation[45]. This holistic approach is one of the most widely used algorithms in face recognition. The key idea behind Eigenfaces is to project face images from pixel space into a subspace that has lower dimension but still holds the significant characteristics of faces. The key method for this projection is Principal Component Analysis (PCA). We first get eigenvectors from the covariance matrix of the facial images by eigenvalue decomposition. We then use the eigenvectors (i.e., eigenfaces) with the largest eigenvalues as the features to represent the characteristics or variation of faces. Each image can be projected into the eigenvector space; in other words, by using a linear combination of these basis eigenvectors, we can get an expression of each image and use that expression for comparison and recognition. By reducing the dimensionality of the input data, the PCA approach removes redundant information from features, achieves higher feature stability and increases computation efficiency, which as a result improves recognition performance.

Here are the steps of the PCA algorithm for face recognition. Given a set of N face images S = {I_1, I_2, I_3, ..., I_N}, step one is to calculate the mean face image \bar{I} of all faces. In step two the covariance matrix A is calculated:

A = \frac{1}{N} \sum_{i=1}^{N} (I_i - \bar{I})(I_i - \bar{I})^\top    (2.1)

The next step is to get the eigenvectors and eigenvalues of the covariance matrix:

A U_i = \lambda_i U_i    (2.2)

where U_i is the eigenvector of the corresponding eigenvalue \lambda_i. Sorting the eigenvalues in descending order, the first N eigenvalues and their corresponding eigenvectors U = {U_1, U_2, ..., U_N} are selected. The last step is to project the input images onto the eigenvectors' directions:

B_i = U (I_i - \bar{I})    (2.3)

where B_i is the eigenvector representation of input image I_i. The projected input images are then used for recognition by comparing Euclidean distances or using a classifier.
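The steps above can be condensed into a few lines of NumPy. This is a minimal sketch, not the thesis's implementation; it obtains the eigenvectors via an SVD of the centred data matrix, which is equivalent to the eigendecomposition of the covariance matrix in equation (2.1):

```python
import numpy as np

def fit_eigenfaces(images, n_components):
    """images: (N, d) matrix, one row-vectorized face per row.
    Returns the mean face and the top n_components eigenfaces."""
    mean = images.mean(axis=0)
    centred = images - mean
    # right singular vectors of the centred data = eigenvectors of A in (2.1)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return mean, Vt[:n_components]

def project(face, mean, eigenfaces):
    """Equation (2.3): coordinates of a face in the eigenface subspace."""
    return eigenfaces @ (face - mean)
```

Recognition then reduces to nearest-neighbour search over the projected coordinates, e.g. picking the gallery face whose projection has the smallest Euclidean distance to the query's projection.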
Figure 2.2: PCA focuses on within-class variation while LDA finds the direction that maximizes between-class variation.

2.4.2 Fisherface and Linear Discriminant Analysis

Fisherface is another traditional holistic method for face recognition[3]. It is based on the theory of Linear Discriminant Analysis (LDA), whose underlying discriminant was first developed by Ronald Fisher in 1936, and it shares some similarities with Eigenface: both project the input data onto a low-dimensional space in order to extract features. But differing from PCA, which focuses on the variation of features, the LDA approach's purpose is to find vectors that maximize discrimination among classes instead of feature differences within a class. The difference between PCA and LDA is shown in Figure 2.2: while the direction obtained by PCA has both classes mixed, LDA finds the direction that best distinguishes the data of the two classes.

We can represent the within-class scatter matrix of the input data as:

S_w = \sum_{i=1}^{C} S_i    (2.4)

And S_i can be defined as:

S_i = \sum_{x \in C_i} (x - m_i)(x - m_i)^\top, \quad i = 1, 2, \ldots, C    (2.5)

where m_i is the mean of class C_i and P_i is the number of samples of the class. In order to make the classes depart from each other, in other words get larger between-class scatter, while gathering together the data within the same class, in other words get smaller within-class scatter, we use the Fisher criterion function below:

J(W_o) = \frac{W^\top S_b W}{W^\top S_w W}    (2.6)

The eigenvectors corresponding to the maximization of J are what we are looking for.

2.4.3 Independent Component Analysis

PCA is based on the second-order relationships of the input data, while the higher-order relationships are ignored. However, in many situations, the discriminative information for face recognition lies in the higher-order relationships of the images. Therefore Independent Component Analysis (ICA), which is a generalization of PCA that makes full use of higher-order relationship information, was brought into view of face recognition research[17]. The idea behind ICA is to decompose the input data into a group of independent components through a linear transformation. Compared to PCA, ICA's basis vectors can better represent localized features, and it is able to perform higher-order decorrelation. These advantages give it better performance than PCA in many cases.

2.4.4 Gabor Wavelets and Elastic Bunch Graph Matching

Dynamic Link Architecture (DLA), proposed in 1981, was meant to fix a deficiency of neural networks, namely that the way to express relations among different types of active neurons was unclear[52]; Lades first implemented it in a face recognition system in 1993[19]. The core idea of DLA was inspired by the plasticity of the synapse: neurons can be formed into graphs containing a group of connected nodes, and both the nodes and their connections vary rapidly in the short term or slowly in the long term. To achieve this, two kinds of weights, T_ij and J_ij (where i and j are connected neurons), are introduced. J_ij changes rapidly under the constraint of T_ij (0 ≤ J_ij ≤ T_ij), corresponding to short-term memory, while T_ij changes slowly on a long time scale, representing permanent memory and the weights of a conventional neural network.

For the DLA method, Gabor-based wavelets are chosen as the local feature representation. A Gabor wavelet is a kind of complex wavelet modulated by a Gaussian function. Researchers found that it is similar to the response of cells in the biological vision system[9]. The wavelet, as a feature representation, has strong robustness to rotation, scaling and distortion as well as good adaptability to illumination changes, so it is widely used in the field of computer vision.
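For illustration, a 2D Gabor kernel of the kind used to build such representations can be written in a few lines. This is a generic textbook form with our own parameter names, not the specific filter family used by DLA or EBGM:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex 2D Gabor kernel: a plane wave of the given wavelength and
    orientation theta, modulated by an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(theta) + y * np.sin(theta)   # rotate to orientation
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(2j * np.pi * x_theta / wavelength)
    return envelope * carrier
```

A "jet" at a pixel is then the vector of responses of a bank of such kernels (several wavelengths and orientations) applied to the patch around that pixel, e.g. the magnitudes of `np.sum(kernel * patch)` for each kernel in the bank.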
The DLA approach first chooses several local feature pixels or regions, such as the eyes, nose and jaw shape, then extracts them with Gabor wavelets as Gabor "jets". After that, a rectangular bunch graph is created containing nodes corresponding to the jets. Comparing the similarity of two faces' jets is the means of recognition. Elastic Bunch Graph Matching (EBGM) was then developed from DLA by Wiskott[54]. Instead of containing only one jet per node, each node now attaches a bunch of jets extracted from different face images in order to locate corresponding features across different faces. The EBGM approach got better results and soon became one of the most popular feature extraction approaches among local-feature-based methods. However, both DLA and EBGM suffer from a large computational load.

2.4.5 Local Binary Pattern

Local Binary Pattern (LBP) is another feature extraction method, proposed by Ojala in 1996[30] and used for face recognition by Ahonen in 2004[1]. This approach uses the LBP operator to build an LBP feature map for comparison and classification. The original LBP operator is composed of a ring of eight neighbor pixels, with the value of the pixel in the center as the threshold. Any surrounding pixel with a gray value higher than (or equal to) the center one's is assigned the number 1, and the others get the number 0. The LBP code is represented by the resulting binary number sequence, which contains basic feature information. The principle of LBP is shown in Figure 2.3, and a minimal implementation of the basic operator is sketched at the end of this subsection.

Figure 2.3: The process of LBP feature extraction. The decimal value represents the texture information.

Improvements to LBP were then introduced, including operators of different sizes, different numbers of neighbor pixels and the uniform pattern. A uniform LBP contains a pattern with at most two bitwise transitions from 0 to 1 or vice versa (Figure 2.4). By using uniform LBP, both computing resources and memory are spared, and only important basic features such as edges, corners and spots are selected.

Figure 2.4: The left pattern, which has 5 transitions, is only assigned to a single label. The right one, with 1 transition (transitions ≤ 2), is a uniform LBP.
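Below is a minimal NumPy sketch of the basic 3 × 3 LBP operator described above. The clockwise bit ordering is one common convention; the thesis does not prescribe a particular one:

```python
import numpy as np

def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch: neighbours >= centre become 1."""
    center = patch[1, 1]
    # eight neighbours, clockwise from the top-left corner
    ring = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
            patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if v >= center else 0 for v in ring]
    return sum(bit << (7 - k) for k, bit in enumerate(bits))

def lbp_map(img):
    """Apply the operator over the interior of a grayscale image."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = lbp_code(img[i:i + 3, j:j + 3])
    return out
```

Checking uniformity is then a matter of counting the 0-to-1 and 1-to-0 transitions in the circular 8-bit string and keeping only codes with at most two.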
2.5 Classification

After extracting features from face images, we can either use them for matching directly or build a classifier for face classification. A large number of classifiers have been proposed for face recognition; we introduce some of the representative methods.

2.5.1 Support Vector Machine

Generally speaking, the Support Vector Machine (SVM) is a binary classification model that separates two classes in the most perfect way. It selects a boundary hyperplane that maximizes the margin (the minimum distance between the hyperplane and the features on both sides). To achieve this, two parallel hyperplanes that push against the nearest features are introduced; the distance between the boundary hyperplane and either of the parallel hyperplanes is the margin. Therefore, the larger the margin, the better the separability of the two classes.

2.5.2 Neural Network

With biological principles applied to bionic theory and computer science, the artificial neural network was born and has been deeply explored and widely used in the field of artificial intelligence. Benefiting from this architecture, classifiers based on neural networks are naturally applied to face recognition. Artificial neurons are the basic components, connected with each other using weights that control the information passed along. Many architectures for neural networks have been proposed, including the feedforward neural network, back propagation, the multilayer perceptron and the deep neural network. The convolutional neural network is currently the focus of face recognition research, and we discuss it further in the next chapter.

Chapter 3
Previous Work

In this chapter, we introduce the previous works that founded and inspired our research. The foundational approach using small pixel blocks as features for recognition, named "Pairwise Macropixel Comparison", is introduced first and its characteristics are discussed. The next section focuses on the Convolutional Neural Network (CNN), which gave us ideas on how to improve the macropixel approach. The general architecture of CNN is introduced together with some state-of-the-art research progress.

3.1 Pairwise Macropixel Comparison

In a 2D face recognition system, the pixel is the fundamental information carrier for human faces, whether they are represented by images or videos. It is self-evident that a single pixel cannot provide effective information for the recognition process; on the other hand, a block of a few pixels might be able to represent some facial feature information. Thus, urged by the question of whether small pixel blocks can be used for face recognition by holistic comparison, the idea of the macropixel was proposed. The research objective of the method is to explore whether it can achieve results no worse than other holistic methods, which would also imply that, since no subspace or dimension reduction is applied, such currently developed techniques may not be necessary for the face recognition process[6].

3.1.1 Definition

The macropixel approach is built on several key concepts, which not only explain the terminology but also provide a complete blueprint of the designed process. These definitions are explained below:

Preprocessing: Face geometric normalization is employed beforehand in order to get face images with fixed orientation, size and position, using the eyes for center alignment. It is assumed that each image after geometric normalization has a size of N × M. As the conventional image enhancement methods mentioned in 2.3 support direct processing without changing the data structure, they are used for the macropixel approach so that it can be compared with other approaches using the same preprocessing techniques. By default, vector normalization is applied to each macropixel to eliminate illumination variation.

Macropixel and Shifting: A macropixel is a square group of pixels segmented from the target image. The size of a macropixel is set to K × K (the default value of K is 4). A macropixel is transformed into a one-dimensional vector V = [v_1, v_2, ..., v_x], where x = K², for distance calculation. For the purpose of aligning corresponding macropixel features between face images, there is a margin area of s pixels (the default value of s is 2) on each side of the image. Macropixels and shifting are shown in Figure 3.1.

Figure 3.1: Assuming an image of size 12 × 12, the 2-pixel margin area is for shifting purposes, and we segment 4 macropixels of size 4 × 4 from this image.

Distance: Measuring similarity is a major task in data analysis, and it also applies to the comparison of macropixels. The Euclidean distance between two macropixels is chosen as the similarity measure[10].
The Euclidean distance D(V_1, V_2) between two macropixel vectors V_1 = [v_{11}, v_{12}, ..., v_{1x}] and V_2 = [v_{21}, v_{22}, ..., v_{2x}] is defined as:

D(V_1, V_2) = \sqrt{(v_{11} - v_{21})^2 + (v_{12} - v_{22})^2 + \cdots + (v_{1x} - v_{2x})^2}    (3.1)

D(V_1, V_2) = \sqrt{\sum_{i=1}^{x} (v_{1i} - v_{2i})^2}    (3.2)

The shorter the distance, the higher the degree of similarity.

Corresponding macropixel: Assuming there is a macropixel M whose top-left corner coordinate is (x_0, y_0) in image A, the eligible macropixels in image B are defined as those with top-left corner coordinates ranging from (x_0 − s, y_0 − s) to (x_0 + s, y_0 + s). After calculating the distances between M and the eligible macropixels, the one with the shortest distance is selected as the corresponding macropixel of M. The candidates and the corresponding object are shown in Figure 3.2.

Figure 3.2: Calculating distances between macropixel M in image A and candidates Ci (i = 1, 2, ..., n) in image B. The corresponding object of M is among the Ci and has the shortest distance.

3.1.2 Recognition Process

The recognition pipeline of the macropixel approach is as follows (a compact sketch in code is given after the steps):

a. Query image A is the input face image waiting for identification, and the system stores images with different identities as templates. All images have the size N × M.

b. A counter is set for each identity among the template images.

c. Assuming N − 2s and M − 2s are divisible by K, image A is segmented into R macropixels of size K × K, leaving a margin area of s pixels for shifting, where R = ((N − 2s)/K) ∗ ((M − 2s)/K).

d. Sequentially picking one macropixel P in A, the system finds the corresponding macropixels in all template images by calculating and comparing the distances between P and the eligible macropixels.

e. After comparing the distances between P and all the corresponding macropixels, the one with the shortest distance is selected together with its identity T.

f. The counter of identity T is increased by 1.

g. The system repeats steps d, e and f until all macropixels in image A have been traversed.

h. Query image A is identified with the identity that has the largest value in the counter.
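The following NumPy sketch condenses steps a-h. It is an illustration under stated assumptions: the per-macropixel vector normalization used as default preprocessing is omitted for brevity, and the function names are our own:

```python
import numpy as np

def corresponding_distance(mp, template, x0, y0, K, s):
    """Shortest Euclidean distance between macropixel mp (top-left (x0, y0))
    and the eligible macropixels of one template image."""
    best = np.inf
    for dx in range(-s, s + 1):
        for dy in range(-s, s + 1):
            cand = template[x0 + dx:x0 + dx + K, y0 + dy:y0 + dy + K]
            best = min(best, np.linalg.norm(mp.ravel() - cand.ravel()))
    return best

def identify(query, templates, labels, K=4, s=2):
    """Steps a-h: each macropixel votes for the identity whose template
    holds its closest corresponding macropixel."""
    N, M = query.shape
    votes = {}
    for x0 in range(s, N - s - K + 1, K):        # non-overlapping grid
        for y0 in range(s, M - s - K + 1, K):
            mp = query[x0:x0 + K, y0:y0 + K]
            dists = [corresponding_distance(mp, t, x0, y0, K, s)
                     for t in templates]
            winner = labels[int(np.argmin(dists))]
            votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```

For the 12 × 12 example of Figure 3.1 (K = 4, s = 2), the grid loop visits exactly the 4 macropixel positions, and each position considers (2s + 1)² = 25 shifted candidates per template.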
3.2 Convolutional Neural Network

We briefly introduced deep learning and CNN in Chapter 1. As an improved architecture of the neural network, deep learning is able to describe a complex concept or object by comprehending more basic facts through layers, which is the state-of-the-art achievement in the field of constructing or simulating artificial intelligence. Among all deep learning methods, CNN is one of the major frameworks. Benefiting from its inherent relationship with the visual cortex, it has achieved great progress in computer vision. CNN was first motivated by Hubel and Wiesel's biological study of the animal visual cortex[15]. They not only discovered the cerebral hierarchical mechanism of visual neurons, but also introduced the concept of the receptive field, a sensory region whose stimulus makes a particular neuron respond. Then, based on the multilayer perceptron (MLP) architecture and the backpropagation (BP) algorithm originally proposed by Werbos[53] and carried forward by Rumelhart[40], LeCun proposed the convolutional neural network using the gradient-descent backpropagation algorithm[21][22], which was then promoted into face recognition[20].

3.2.1 The General Framework

Figure 3.3: LeNet-5 is a typical Convolutional Neural Network framework. C1, C2 and C3 are convolutional layers. P1 and P2 are pooling layers. F1 is a fully-connected layer.

Figure 3.3 shows the typical architecture of LeNet-5[22], a classic CNN model used for digit recognition. The major frame of the first part is a series of feature extraction and feature mapping procedures operating in convolutional layers and subsampling layers respectively. The system gradually takes a number of detailed feature maps (like edges or blobs) as inputs from the previous layer and extracts local features from them, which are then used to constitute more abstract feature maps. In the second part, the features are fed into fully-connected layers, equivalent to a traditional MLP, for the purpose of classification. Each stage of the network is explained in detail below:

Convolutional Layer: As a variant of the feedforward neural network, CNN distinguishes itself through two properties: sparse connectivity and shared weights. These two ideas are implemented in the convolutional layer via the convolutional kernel operation (see equation 3.3). The kernel is a linear filter with a set of weights, and it connects to only one sub-region of the input feature map at a time instead of being fully connected. By executing the computation repeatedly across the input image with the same kernel, which equates to a convolution of the input, a corresponding output feature map is produced. Hence one feature map is generated by only one kernel, which reduces the computational burden. Generally, the output consists of multiple feature maps to give a comprehensive representation of the input. Assuming a 2D input feature map in layer l is h^l (of size M^l × N^l) with sub-regions x^l_{ij} taken at a stride of a, the output feature map is h^{l+1} with entries x^{l+1}_{ij}, the weighted filter is W (of size K × L), and the bias is b, the convolutional operation is:

x^{l+1}_{ij} = A(W * x^l_{ij} + b)    (3.3)

where i ≤ M^{l+1} = (M − K + a)/a, j ≤ N^{l+1} = (N − L + a)/a, and the size of feature map h^{l+1} is M^{l+1} × N^{l+1}. A(x) is an activation function that introduces non-linearity into the network, for example A(x) = max(0, x) (the Rectified Linear Unit function[12]) or A(x) = tanh(x).

Pooling Layer: The pooling layer, also called the subsampling layer, is meant to compress the feature map by merging each sub-region into one value. There are many pooling methods, such as average-pooling and sum-pooling[4], but the typical one is max-pooling, which exports the maximum value of each sub-region. The purpose of pooling is to eliminate unnecessary and redundant information from the feature map while preserving the important parts. This leads to better robustness to position and lighting conditions as well as reduced computation in the next layer.

Fully Connected Layer: While the lower layers are convolution and pooling layers, the upper layers have a traditional MLP architecture in which neurons connect to all units in the previous layer. They play the role of classifiers, mapping the extracted features to the most correlated class.

Training Process: CNN's training process is based on the BP algorithm. BP passes the cost back through the network using gradient descent and updates the weights in order to minimize the cost. The system propagates forward through the network to get the outputs, which are compared with the labeled correct values to compute the cost. It then propagates backward, calculating each neuron's contribution to the cost via gradients. Ultimately, the weights are adjusted based on these costs. But unlike in a traditional multilayer neural network, the BP method in a CNN requires simulated operations restoring the status of the hidden layers before pooling or convolution in order to propagate backward.
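As a concrete single-channel illustration of equation (3.3) together with a max-pooling helper, consider the following sketch. Real CNN implementations vectorize these loops and handle multiple input channels and kernels; this is only the scalar form of the operations just described:

```python
import numpy as np

def conv_layer(h, W, b, a=1):
    """Slide kernel W (K x L) over feature map h with stride a, add bias b,
    and apply the ReLU activation A(x) = max(0, x) from equation (3.3)."""
    M, N = h.shape
    K, L = W.shape
    M2, N2 = (M - K) // a + 1, (N - L) // a + 1   # output size per (3.3)
    out = np.empty((M2, N2))
    for i in range(M2):
        for j in range(N2):
            patch = h[i * a:i * a + K, j * a:j * a + L]
            out[i, j] = max(0.0, float(np.sum(W * patch) + b))
    return out

def max_pool(h, size=2):
    """Non-overlapping max-pooling: keep the maximum of each size x size block."""
    M, N = h.shape
    h = h[:M - M % size, :N - N % size]           # crop to a whole number of blocks
    return h.reshape(M // size, size, N // size, size).max(axis=(1, 3))
```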
3.2.2 Improvement of CNN

All Convolutional Net: In 2015, Springenberg proposed a CNN architecture replacing pooling layers with strided convolutional layers[46]. Since the major function of pooling layers is to reduce the spatial dimension of the feature maps, he came up with the idea of building an all-convolutional net serving the same purpose as the convolution-pooling structure. The results show that an all-convolutional net can sometimes get better results and that pooling layers are not strictly necessary for a CNN.

Global Average Pooling (GAP): The Network In Network (NIN) structure employs a global average pooling layer instead of a fully-connected layer[23]. In this approach, each feature map is a confidence map representing a specific category, with the average value of the feature map used as that category's confidence value. It significantly reduces the scale of the network and alleviates overfitting.

Chapter 4
Proposed Algorithms

Although the macropixel method introduced in Chapter 3 has proved to be competitive among mainstream holistic approaches, it is still at a very primitive stage and requires development. Encouraged by the promising results of the Macropixel Comparison Approach, we explored its possibilities for improvement and propose a new convolutional macropixel approach for face recognition.

4.1 Convolutional Macropixel Comparison Approach

In CNN research, the convolutional layer has been found to be extremely effective at extracting features from face images. Inspired by the structure of this layer, two key concepts are introduced into the original macropixel approach: deeply overlapped comparison and the weighted filter counter. Both serve the purpose of obtaining richer feature information from macropixels. The experiments and results in Chapter 5 show that even though using either one alone cannot guarantee a better result compared to the original method, combining them significantly improves the recognition rate. Figure 4.1 shows the pipeline of our approach.

Figure 4.1: The whole framework. The system is divided into two stages: a training stage and a recognition stage. The training stage produces the weighted filter, which is then used in recognition. Note that all macropixel comparison operations employ deep overlap.

4.2 Overlapping: A Convolutional Way

Overlapping representation of features has proved effective in improving the performance of face recognition methods based on low-level features[7]. Biological evidence also indicates that the animal vision system uses overlapping receptive fields, which developed into the convolution operation of the CNN framework[16]. So in our work, an overlapping extractor is applied to the macropixel comparison method. The standard macropixel approach extracts without overlapping, as shown in 3.1. We keep the extraction but add an overlap operation to it: instead of skipping K pixels (the macropixel size) to the next macropixel, the system skips a smaller number of pixels, which results in overlapping extraction. Parameters p, q ∈ [1, K) are set to constrain the stride of the extractor, where K is the side length of a macropixel. In the extreme configuration, p and q are set to 1 to perform deep overlap for macropixel extraction and comparison. As shown in Figure 4.2, for an image A of size N × M with shifting margin s, the K × K macropixel extractor is convolved across the whole image.

Figure 4.2: For an image of size 10 × 10 (the effective area is 8 × 8 after removing the shifting space) and a macropixel of size 4 × 4, deep overlap extracts 5 × 5 macropixels. In this example, P = Q = 5.

In the original method, the total number of macropixels of A is R = ((N − 2s)/K) ∗ ((M − 2s)/K). After implementing the overlapping extractor, the number of macropixels becomes:

R' = ((N − 2s − K) ÷ p + 1) ∗ ((M − 2s − K) ÷ q + 1)    (4.1)
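A minimal sketch of the overlapping extractor follows. The function is our own illustration: strides p = q = 1 give the deep-overlap configuration, while larger strides reduce the overlap (and a stride of K would recover the original non-overlapping grid, though K itself lies outside the stated range [1, K)):

```python
import numpy as np

def extract_macropixels(img, K=4, s=2, p=1, q=1):
    """Slide a K x K window over the effective area (margin s reserved for
    shifting) with strides p and q; returns the flattened blocks and the
    top-left coordinate of each, R' of them in total (equation (4.1))."""
    N, M = img.shape
    blocks, coords = [], []
    for x0 in range(s, N - s - K + 1, p):
        for y0 in range(s, M - s - K + 1, q):
            blocks.append(img[x0:x0 + K, y0:y0 + K].ravel())
            coords.append((x0, y0))
    return np.array(blocks), coords
```

For the Figure 4.2 example (an 8 × 8 effective area, K = 4, p = q = 1), the loops yield (8 − 4 + 1)² = 25 macropixels, i.e. the 5 × 5 grid with P = Q = 5.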
Overlapping of macropixels provides more complete low-level feature information for comparison. But it also brings more non-significant and interfering features into the system, which might lead to a higher verification error rate. Chapter 5 shows that the macropixel method with the overlapping extractor alone achieves fluctuating performance compared to the original one. Accordingly, the weighted filter is employed to filter out macropixels that do not benefit recognition.

4.3 Weighted Filter Counter

As mentioned in 1.4, macropixels can be treated as indiscriminate features of faces, meaning that all features are given equal importance. However, in face recognition there are features that contribute a lot to distinguishing one face from another and features that provide less useful information. Therefore, a weighted filter is introduced to filter out macropixels with higher recognition error rates. To implement the weighted filter, the system is composed of two stages: a training stage and a recognition stage. The training stage uses template images to calculate the weighted filter based on the error rate at each macropixel location. The training process is repeated several times in order to get a stable weighted filter. The filter is then applied to the counter in the recognition stage. The recognition stage is almost the same as the original macropixel approach, except that the counter is connected to the weighted filter.

4.3.1 Training Stage

Weight Training Operation: Template images are randomly divided into two sets: a weight-training set and a weight-testing set. Each set contains a certain number of images per identity. The weight training operation is executed on these two sets in the same way as the original macropixel comparison operation.

Error Rate Counter: The purpose of the training stage is to figure out the error rate at each macropixel location; the higher the error rate, the more unreliable the macropixel at that location is for face recognition. To do so, we employ a new counter which does not count votes for the identity with the shortest distance as in the weight training operation, but instead counts whether the macropixel at a specific coordinate is recognized as the wrong identity. The counter C_e is built as a matrix whose size equals the number of macropixels extracted from one image (equation 4.1). Each element c_{ij} (0 ≤ i ≤ P, 0 ≤ j ≤ Q) of the matrix counter corresponds to the coordinate of macropixel M_{ij} in the image.

Weighted Filter: The weighted filter is produced from the error rate counter after the training process has finished. First, each element of the counter is subtracted from the total number of weight-testing images O_{test}. Then we find the maximum value c_{max} among the elements of the resulting intermediate matrix, divide all elements by c_{max}, and obtain the weighted filter C'_e:

c'_{ij} = (O_{test} − c_{ij}) ÷ c_{max}    (4.2)

The construction of the weighted filter is illustrated in Figure 4.3.

Figure 4.3: How to transfer an Error Rate Counter to a Weighted Filter. The number of weight-testing images O_{test} is 90, and we subtract each value in the counter from 90 to get an intermediate matrix. After picking out the maximum value 55 from the matrix, we divide each element by 55 to produce the weighted filter.
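A sketch of equation (4.2), with hypothetical counter values chosen so that the numbers line up with the Figure 4.3 example (90 weight-testing images, maximum intermediate value 55):

```python
import numpy as np

def counter_to_filter(C_e, n_test):
    """Equation (4.2): subtract each error count from the number of
    weight-testing images, then normalize by the maximum."""
    intermediate = n_test - C_e          # few errors -> large value
    return intermediate / intermediate.max()

C_e = np.array([[35, 60],
                [80, 90]])               # hypothetical error counts
w = counter_to_filter(C_e, 90)
# intermediate = [[55, 30], [10, 0]]; filter = [[1.0, 0.55], [0.18, 0.0]]
```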
The pipeline of the training stage is shown below (Figure 4.4):

Figure 4.3: How an error rate counter is transformed into a weighted filter. The number of weight testing images O_t is 90, and each value in the counter is subtracted from 90 to obtain an intermediate matrix. After picking out the maximum value 55 from that matrix, we divide each element by 55 to produce the weighted filter.

a. All template images T (with a total number O) are randomly divided into weight training images O_train = {a_t1, a_t2, ..., a_tn} and weight testing images O_test = {b_t1, b_t2, ..., b_tm}, where O = m + n.

b. A new error rate counter C_e,n (n = 1, 2, ..., X) is set up, where X is the number of training repetitions.

c. O_train and O_test go through the original macropixel comparison process for the first macropixel.

d. For any image in O_test that is identified as the wrong person, the counter element at the corresponding macropixel location is increased by 1.

e. The system repeats steps c and d until the last macropixel.

f. The weighted filter C′_e,n is generated from the counter using Equation 4.2.

g. The system repeats steps a to f X times. Adding all the weighted filters together and taking the average produces the stabilized weighted filter:

    C_final = (C′_e,1 + C′_e,2 + ... + C′_e,X) ÷ X    (4.3)

Figure 4.4: By comparing the training images a_t1 and a_t2, the system identifies the first macropixel of a_t1 as ID1, which is a success, so it adds 0 to the counter. For image a_t2 the identification fails, so the system adds 1 to the counter. The total counter value at the first location is 0 + 1 = 1.

4.3.2 Recognition Stage

We explained the process of the original macropixel approach in Section 3.1.2. The new recognition stage differs only in step f: instead of adding 1 to the counter, the system picks the element of the weighted filter that corresponds to the macropixel currently being processed and adds that value to the counter; a schematic sketch of this weighted counting is given at the end of this chapter.

4.3.3 Independent Weight Training

Although the weighted filter is an effective tool that improves the performance of macropixel comparison, its operation requires more computational resources than the original method. To address this, we propose an independent training approach in which the training stage is run separately to obtain a pre-trained weighted filter, which is then reused in the recognition stage. The idea is to train a filter on the template images of one dataset and reuse it for recognizing other faces. The independent training operator is defined as

    I_DB(wt, nt)    (4.4)

where DB is the dataset used for training, wt is the number of weight training images, and nt is the number of template images. The experimental results in Chapter 5 indicate that a pre-trained weight filter produced from one database generalizes well and can increase the recognition rate on other databases.
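To close this chapter, the following schematic Python sketch summarizes how the pre-trained filter enters the weighted counting of the recognition stage. All names are ours, and the shifting search of Section 3.1 is deliberately omitted, so this is an illustration of the idea rather than the exact procedure:

    import numpy as np

    def nearest_identity(mp, templates, i, j):
        # Identity whose macropixel at grid location (i, j) is closest to
        # mp (Euclidean distance; the shifting search is omitted here).
        return min(templates,
                   key=lambda t: np.linalg.norm(templates[t][i, j] - mp))

    def recognize(probe_mps, templates, W):
        # Weighted counter of Section 4.3.2: each macropixel location
        # votes for its nearest identity with weight W[i, j] instead of 1.
        scores = dict.fromkeys(templates, 0.0)
        P, Q = W.shape
        for i in range(P):
            for j in range(Q):
                winner = nearest_identity(probe_mps[i, j], templates, i, j)
                scores[winner] += W[i, j]
        return max(scores, key=scores.get)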
Chapter 5

Experiments

In this chapter the experimental scheme is introduced first. Following the original paper [6], we tested our framework on three databases: the UIUC version of the Yale face database, the ORL Database of Faces, and the CMU Multi-PIE face database, so that the improvement of the system can be measured directly. To explore the characteristics of the new algorithm, we first study the performance of the system on the Yale database under a wide range of conditions, and then move on to the ORL and PIE databases, which are larger in scale.

5.1 Face Dataset

To compare the performance of the proposed approach with the original one, we continue to use the UIUC version of the Yale database and the ORL and PIE databases. Each database comes in two size versions (32 × 32 and 64 × 64 pixels) and has been geometrically normalized.

Figure 5.1: Face images of 4 sample subjects from the Yale database. Each subject has 11 images with different illumination conditions and expressions.

Yale Face Database: Yale is a classic database with 15 subjects and 11 images per subject (see Figure 5.1). The 165 grayscale images cover different facial expressions, various illumination conditions, and glasses occlusion. We take advantage of the dataset's small size and its diverse conditions to evaluate the adaptability of our approach and to find the optimal parameter set.

ORL Database: The ORL database has 40 distinct subjects with 10 images per identity. Images are taken under varying head orientations, facial expressions, and illumination conditions, with and without glasses (see Figure 5.2).

PIE Database: The PIE database contains 11554 images of 68 subjects. Images of each subject are taken under different illumination conditions and head orientations (see Figure 5.3). Because of the large amount of data, we employed a GPU for these experiments, which also explores the potential computing efficiency of the macropixel approach under parallel computing.

Figure 5.2: Face images of 4 sample subjects from the ORL database. Each subject has 10 images with different illumination conditions, face directions and expressions.

Figure 5.3: Face images of 2 sample subjects from the PIE database. The figure shows only 80 images per subject; there are 68 subjects in total, each with more than 100 images.

Table 5.1: The parameter settings of the weighted filter experiment for the Yale database.

    Type of Parameter                          Parameter Set
    Size of Image (N × M)                      {32 × 32, 64 × 64*}
    Template Images per Subject (NT)           {nt | 2 ≤ nt ≤ 8, nt ∈ Z}
    Weight Training Images per Subject (WT)    {wt | 1 ≤ wt ≤ NT − 1, wt ∈ Z}
    Training Frequency (X)                     {25, 50, 100*, 200}
    Independent Training Phase                 {NULL†, I_Yale(wt, nt)}, nt = 2, ..., 8, wt = 1, ..., nt − 1

    Note: * marks the default setting, which is applied when testing the other parameter types.
    †: NULL denotes an experiment without independent training.

5.2 Experiment Design

The experiment consists of three phases that test the performance of the proposed system as a whole and of its parts. Since overlapping and the weighted filter can operate independently, we apply each of them to the recognition separately for analysis and then combine them for optimal results.

5.2.1 Experiment for Weighted Filter

To evaluate the performance of the weighted filter, we start by testing the system on the Yale database over a wide range of parameter sets; the settings are listed in Table 5.1.

The weight training experiment traverses all WT settings in order to pick the WT value that improves the performance of the system the most while avoiding overfitting or underfitting.

Figure 5.4: The total change of the weighted filter over training repetitions. It drops below 5 after 50 repetitions.

We also test the weight training frequency to balance efficiency against effectiveness, as the frequency directly affects the running time of the system. The change of the weighted filter over training repetitions is shown in Figure 5.4.
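In our reading, the quantity plotted in Figure 5.4 is the total element-wise change of the weighted filter between consecutive training repetitions; one plausible way to compute such a stability measure (the helper below is ours, not a formula from the thesis):

    import numpy as np

    def total_change(prev_filter, new_filter):
        # Sum of absolute element-wise differences between the filters
        # produced by two consecutive repetitions; training can stop once
        # this settles (here it drops below 5 after about 50 repetitions).
        return float(np.abs(new_filter - prev_filter).sum())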
This difference value decreases to a small level after 50 repetitions, which is the basis of our parameter selection in the training frequency experiment.

We also study the performance of independent weight training in this experiment. After randomly choosing one data sequence from Yale, we train the filter with every NT-WT combination and use the resulting filters for Yale face recognition to find the best training parameters. Since face images in different databases are taken under various conditions, the filter trained on Yale is then tested on ORL and PIE to see whether it can improve the performance of the system there as well.

Table 5.2: The parameter settings of the overlapping experiment for the Yale, ORL and PIE databases.

    Type of Parameter                   Parameter Set
    Size of Image (N × M)               {32 × 32, 64 × 64}
    Template Images per Subject (NT)    {nt | 2 ≤ nt ≤ 8, nt ∈ Z}
    Stride of Overlap (S)               {1, 2}

5.2.2 Experiment for Overlapping

As overlapping works better on larger datasets, we test each database with different degrees of overlap; the settings are shown in Table 5.2. The optimal parameter setting is then applied in the compact system experiment. Since our experiments are based on macropixels of size 4 × 4, all three databases are tested for overlapping with a stride of 1 or 2.

5.2.3 Experiment for Compact System

Table 5.3 shows the parameter settings of the compact system experiment. For the compact system we choose the optimal parameter settings for testing; the selected parameters were obtained by trial and error.

Table 5.3: The parameter settings of the compact system experiment for the Yale, ORL and PIE databases.

    Type of Parameter                          Parameter Set
    Size of Image (N × M)                      {32 × 32, 64 × 64}
    Template Images per Subject (NT)           {nt | 2 ≤ nt ≤ 8, nt ∈ Z}
    Weight Training Images per Subject (WT)    1
    Training Frequency (X)                     100
    Stride of Overlap (S)                      {1, 2}
    Shifting Pixel Number                      2
    Independent Training Phase                 {NULL, I_Yale(1, 2)}

5.3 Experiment Result

5.3.1 Result for Weight Training

Tables 5.4 and 5.5 show the recognition error rates of the macropixel comparison system equipped with the weighted filter counter on the Yale database. Table 5.4 lists the performance of all WT settings, and Table 5.5 the performance of the system with independent weight training; results in both tables are for images of size 64 × 64. As the results show, the optimal setting for the weighted filter is WT = 1, and for independent training it is I_Yale(1, 2). As the number of weight training images, or the number of template images used for independent training, grows, the error rate rises. Compared with the original result, the error rate has decreased by nearly 40%.

Tables 5.6 and 5.7 show the results for the remaining parameter settings: the 32 × 32 image size and the training frequency X, respectively. The weighted filter also improves recognition for 32 × 32 images, with average error rates falling by nearly 30%.

Table 5.4: Results of the macropixel approach with the weighted filter for the Yale database.
    WT        NT=2     NT=3     NT=4     NT=5     NT=6     NT=7     NT=8
    original  14.9420  11.0306  8.0270   6.1481   4.6800   5.2722   3.4519
    1         10.3111  7.3333   4.9524   3.8444   2.8800   3.4000   1.7778
    2         NULL     9.0000   5.2000   4.0000   2.9333   3.5667   1.9556
    3         NULL     NULL     5.2762   4.1556   3.0133   3.6333   2.0444
    4         NULL     NULL     NULL     4.1778   3.0667   3.7333   2.0000
    5         NULL     NULL     NULL     NULL     3.0667   3.8667   2.0444
    6         NULL     NULL     NULL     NULL     NULL     3.8333   2.0889
    7         NULL     NULL     NULL     NULL     NULL     NULL     2.0444

Tables 5.8 and 5.9 contain the results from ORL and PIE with weight training. The improvement is not as remarkable as for Yale, but it still benefits the recognition rates at both image sizes.

5.3.2 Results for Overlap

Tables 5.10 and 5.11 show the overlapping results on all three databases. Table 5.10 shows that for Yale, owing to its small number of subjects, the recognition rate mostly does not improve with overlapping and becomes slightly worse, the exception being the 32 × 32 image size with a stride of 2. The results also show that the system with overlap (S = 2) performs better than with deep overlap (S = 1), and it is worth noting that the performance improves as the number of template images (NT) rises.

Table 5.5: Results of the macropixel approach with the independent weighted filter for the Yale database.

    Operator       NT=2     NT=3     NT=4    NT=5    NT=6    NT=7    NT=8
    original       14.9420  11.0306  8.0270  6.1481  4.6800  5.2722  3.4519
    I_Yale(1,2)    9.3037   6.8000   4.7238  3.6889  2.8000  3.1667  1.9556
    I_Yale(1,3)    10.9778  7.6500   5.3143  4.0667  3.2533  3.5333  2.0444
    I_Yale(2,3)    10.9926  7.7333   5.4095  4.2444  3.2000  3.6000  2.0889
    I_Yale(1,4)    10.9037  7.8000   5.4286  4.1778  3.2000  3.8333  2.0889
    I_Yale(2,4)    11.0815  7.8667   5.5238  4.2000  3.1467  3.9667  2.0889
    I_Yale(3,4)    11.2815  8.0000   5.5619  4.2444  3.2533  4.0000  2.2667
    I_Yale(1,5)    11.2444  7.8667   5.6381  4.3778  3.2800  3.7333  2.2222
    I_Yale(2,5)    11.4815  8.0667   5.7143  4.4444  3.3067  3.9333  2.2667
    I_Yale(3,5)    11.7037  8.3000   5.8095  4.5111  3.3600  4.0333  2.2667
    I_Yale(4,5)    11.7778  8.4167   5.8476  4.5556  3.4133  4.1000  2.3111
    I_Yale(1,6)    10.9481  7.8167   5.3524  4.2000  3.1467  3.8000  2.0444
    I_Yale(2,6)    11.0815  7.9500   5.4286  4.3778  3.1467  3.7667  2.0889
    I_Yale(3,6)    11.1111  7.9833   5.5429  4.3778  3.2800  3.8000  2.0889
    I_Yale(4,6)    11.2148  8.0333   5.4667  4.3333  3.2800  3.8000  2.0444
    I_Yale(5,6)    11.3630  8.3000   5.6381  4.3778  3.3067  4.0667  2.1333
    I_Yale(1,7)    11.5852  8.2667   5.7524  4.5111  3.4933  4.2333  2.2667
    I_Yale(2,7)    11.8667  8.5167   5.9238  4.6222  3.6267  4.3000  2.5333
    I_Yale(3,7)    11.9556  8.5000   5.9810  4.6889  3.6267  4.3000  2.4000
    I_Yale(4,7)    12.0148  8.5833   6.0952  4.7333  3.6533  4.3333  2.5333
    I_Yale(5,7)    12.1185  8.6500   6.0571  4.7778  3.7067  4.3333  2.5778
    I_Yale(6,7)    12.1630  8.7333   6.0762  4.8000  3.7867  4.3667  2.5778
    I_Yale(1,8)    11.2593  7.9667   5.5238  4.4444  3.2267  3.8333  2.2222
    I_Yale(2,8)    11.4667  8.1500   5.7143  4.5111  3.4400  4.0333  2.2667
    I_Yale(3,8)    11.6296  8.2167   5.7905  4.5333  3.5200  4.1000  2.3111
    I_Yale(4,8)    11.7926  8.3333   5.9238  4.5333  3.5200  4.2000  2.3556
    I_Yale(5,8)    11.8963  8.5333   5.9619  4.5778  3.5467  4.2333  2.3111
    I_Yale(6,8)    11.8963  8.5000   5.9238  4.5778  3.5733  4.2000  2.3111
    I_Yale(7,8)    12.1037  8.6833   6.0381  4.5778  3.5467  4.2333  2.3111

Table 5.6: Results of the smaller-size image experiment for the Yale database.
    Condition     NT=2     NT=3     NT=4     NT=5     NT=6     NT=7     NT=8
    original      24.9267  19.3750  16.2781  14.0241  13.7436  12.6750  11.6529
    WT = 1        17.1630  13.2833  10.3238  8.8222   8.0933   7.3667   6.5333
    I_Yale(1,2)   21.4074  16.3417  13.8095  11.5778  11.1733  10.7667  9.6000

Table 5.7: Results of the training frequency experiment for the Yale database.

    X              NT=2     NT=3    NT=4    NT=5    NT=6    NT=7    NT=8    Total Difference
    original(100)  10.3111  7.3333  4.9524  3.8444  2.8800  3.4000  1.7778  0
    25             10.3333  7.4333  5.0667  3.8222  2.8000  3.4667  1.8222  -0.2454
    50             10.2889  7.4167  5.0095  3.8444  2.8267  3.4000  1.8667  -0.1539
    200            10.2667  7.3833  4.9905  3.8889  2.8267  3.4333  1.8667  -0.1571

Table 5.8: Results of the macropixel approach with the independent weighted filter for the ORL database.

    Image Size  Condition    NT=2     NT=3    NT=4    NT=5    NT=6    NT=7    NT=8
    32 × 32     original     13.2620  7.2505  3.6090  1.9380  1.4038  0.7722  0.7583
                I_ORL(1,2)   13.0875  6.8786  3.4417  1.8200  1.2125  0.5667  0.7250
    64 × 64     original     9.6188   5.1643  2.4083  1.1600  0.8250  0.4167  0.3500
                I_ORL(1,2)   9.1875   5.0250  2.2333  1.0400  0.6875  0.3500  0.3250

Table 5.9: Results of the macropixel approach with the independent weighted filter for the PIE database.

    Image Size  Condition    NT=5     NT=10    NT=20   NT=30   NT=50   NT=70   NT=80   NT=90    NT=110  NT=130
    32 × 32     original     32.8474  17.3366  7.0250  3.9663  1.6895  0.8564  0.6998  0.5469   0.3378  0.2106
                I_PIE(1,2)   23.0199  10.3164  3.7177  1.9252  0.7808  0.3830  0.3255  0.2429   0.1620  0.1002
    64 × 64     original     21.3626  9.7984   3.7438  2.1174  1.1228  0.6589  0.5920  0.46040  0.3933  0.2917
                I_PIE(1,2)   18.9099  8.6086   3.3710  1.9529  0.9973  0.6597  0.5411  0.4233   0.3343  0.2572

For the larger databases with more subjects, the performance of the system improves significantly. For ORL, deep overlap (overlap with a stride of 1) decreases the error rate by more than 30 percent. For the PIE database the situation is similar, with the error rate decreasing by 40 percent on average.

5.3.3 Results for Compact System

By designing and conducting a series of separate experiments and exploring the characteristics of the proposed system, we are able to select the optimal parameters for face recognition. Tables 5.12 and 5.13 show the compact system results for all three databases. The Yale results for the compact system are not as good as those obtained with the weighted filter alone, since overlapping works better on databases with more subjects. On ORL and PIE, the performance of the compact system exceeds that of both the original and the separate systems. We also test the weighted filter trained on Yale on ORL and PIE; although the recognition rates are not as good as the optimal ones, they are still better than the original results. These results imply that the weighted filter has reasonable general applicability: by producing a pre-trained weighted filter and reusing it on other occasions, we can reduce computation and time significantly.

Table 5.10: Results of overlapping for the Yale and ORL databases.
    Target  Image Size  Stride    NT=2     NT=3     NT=4     NT=5     NT=6     NT=7     NT=8
    Yale    32 × 32     original  24.9267  19.3750  16.2781  14.0241  13.7436  12.6750  11.6529
                        S = 2     24.4321  18.9778  15.6540  14.0778  13.6311  12.5778  10.8519
                        S = 1     25.5481  19.6000  16.0571  14.6556  13.9200  12.8833  11.9556
    Yale    64 × 64     original  14.9420  11.0306  8.0270   6.1481   4.6800   5.2722   3.4519
                        S = 2     15.5370  11.8542  8.9286   5.4815   4.0667   5.0417   3.2222
                        S = 1     16.7704  12.3500  9.4381   7.1630   5.0000   5.6333   3.3333
    ORL     32 × 32     original  13.2620  7.2505   3.6090   1.9380   1.4038   0.7722   0.7583
                        S = 2     9.6594   4.6399   1.8597   0.8250   0.5271   0.2833   0.2375
                        S = 1     9.6521   4.5929   1.7306   0.8850   0.4875   0.1917   0.1375
    ORL     64 × 64     original  9.6188   5.1643   2.4083   1.1600   0.8250   0.4167   0.3500
                        S = 2     8.8031   4.3631   1.6750   0.8000   0.5188   0.3667   0.3750
                        S = 1     8.0062   3.8786   1.5250   0.6900   0.4000   0.3000   0.3500

Table 5.11: Results of overlapping for the PIE database.

    Image Size  Stride    NT=5     NT=10    NT=20   NT=30   NT=50   NT=70   NT=80   NT=90    NT=110  NT=130
    32 × 32     original  32.8474  17.3366  7.0250  3.9663  1.6895  0.8564  0.6998  0.5469   0.3378  0.2106
                S = 2     23.2225  9.4408   2.9597  1.3428  0.4298  0.2283  0.1646  0.1301   0.0672  0.0445
                S = 1     23.8812  9.7518   3.0602  1.3805  0.4055  0.1698  0.1076  0.0799   0.0447  0.03135
    64 × 64     original  21.3626  9.7984   3.7438  2.1174  1.1228  0.6589  0.5920  0.46040  0.3933  0.2917
                S = 2     17.1665  6.8520   2.2677  1.1868  0.5762  0.3881  0.3191  0.2554   0.1987  0.1608
                S = 1     16.2967  6.2375   1.9732  0.9869  0.4876  0.3095  0.2306  0.2156   0.1558  0.1197

Table 5.12: Optimized results of the compact system for Yale and ORL.

    Target  Image Size  Stride  Condition    NT=2     NT=3     NT=4     NT=5     NT=6     NT=7     NT=8
    Yale    32 × 32     S = 0   original     24.9267  19.3750  16.2781  14.0241  13.7436  12.6750  11.6529
                        S = 2   I_Yale(1,2)  20.9778  15.7500  12.9143  11.8222  10.7467  10.3000  8.6667
    Yale    64 × 64     S = 0   original     14.9420  11.0306  8.0270   6.1481   4.6800   5.2722   3.4519
                        S = 2   I_Yale(1,2)  10.0593  7.1500   4.9333   3.7778   2.8000   3.2667   2.1778
    ORL     32 × 32     S = 0   original     13.2620  7.2505   3.6090   1.9380   1.4038   0.7722   0.7583
                        S = 1   I_ORL(1,2)   9.2813   4.3214   1.6333   0.7400   0.3875   0.1833   0.1000
    ORL     64 × 64     S = 0   original     9.6188   5.1643   2.4083   1.1600   0.8250   0.4167   0.3500
                        S = 1   I_ORL(1,2)   7.9688   3.7571   1.4833   0.6600   0.3875   0.2833   0.2500
                        S = 1   I_Yale(1,2)  8.1687   3.8429   1.4917   0.6200   0.4000   0.1833   0.1500

Table 5.13: Optimized results of the compact system for PIE.

    Image Size  Stride  Condition    NT=5     NT=10    NT=20   NT=30    NT=50   NT=70   NT=80   NT=90    NT=110   NT=130
    32 × 32     S = 0   original     32.8474  17.3366  7.0250  3.9663   1.6895  0.8564  0.6998  0.5469   0.3378   0.2106
                S = 1   I_PIE(1,2)   21.1784  8.4328   2.6290  1.1993   0.3586  0.1540  0.1050  0.0762   0.0442   0.0287
    64 × 64     S = 0   original     21.3626  9.7984   3.7438  2.1174   1.1228  0.6589  0.5920  0.46040  0.3933   0.2917
                S = 1   I_PIE(1,2)   14.0806  4.4878   1.4028  0.7883   0.3311  0.2366  0.1963  0.1944   0.0982   0.0974
                S = 1   I_Yale(1,2)  15.3077  5.7477   1.7628  0.86959  0.4325  0.2772  0.2169  0.1907   0.14482  0.0983

5.4 Parallel Computing Efficiency

To show the efficiency improvement gained by implementing our system with general-purpose GPU computation, we measured the operation time on the PIE database; the results are shown in Figure 5.5. The experiment records the time of one comparison operation, comparing 221000 macropixels of size 4 × 4 against 2714 macropixels. The results were obtained on an Nvidia GeForce GTX Titan Black 6 GB GPU and an Intel Xeon E5-2603 1.8 GHz CPU.

Figure 5.5: The red bar is the running time using the CPU, with an average of 17 seconds per comparison operation. The blue bar is the running time using the GPU, with an average of 0.58 seconds per operation.

As PIE has more than 10,000 images, the comparison operation (essentially matrix multiplication) requires a significant amount of computing resources. As the figure shows, parallel computing on the GPU accelerates the computation by up to 30 times; a sketch of the underlying matrix formulation follows.
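The comparison at the heart of the method reduces to dense matrix arithmetic, which is why it maps so well onto a GPU. Below is a sketch of the batched distance computation in NumPy; the formulation is ours, and since CuPy largely mirrors the NumPy API, swapping the import runs essentially the same lines on the GPU:

    import numpy as np  # or: import cupy as np, to run on the GPU

    def pairwise_sq_distances(A, B):
        # Squared Euclidean distances between every row of A (n x d) and
        # every row of B (m x d), via ||a-b||^2 = ||a||^2 + ||b||^2 - 2ab.
        return (np.sum(A * A, axis=1)[:, None]
                + np.sum(B * B, axis=1)[None, :]
                - 2.0 * A @ B.T)

    # Sizes from the timing experiment above: 221000 probe macropixels of
    # size 4 x 4 (flattened to d = 16) against 2714 template macropixels.
    # The result is a 221000 x 2714 matrix (about 2.4 GB in float32).
    probes = np.random.rand(221000, 16).astype(np.float32)
    temps = np.random.rand(2714, 16).astype(np.float32)
    D = pairwise_sq_distances(probes, temps)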
Chapter 6

Conclusion and Discussion

6.1 Conclusion

In this thesis, we introduced a face recognition approach named the Convolutional Macropixel Approach, developed from the original Macropixel Comparison Method and inspired by the convolutional neural network architecture. Two major improvements were introduced into the approach and tested on several datasets (Yale, ORL and PIE), both separately and together. The experiments show that the Convolutional Macropixel Approach achieves a significantly better recognition rate than the original Macropixel Comparison Method on all three datasets.

The weighted filter is a pre-trained filter connected to the recognition phase that generally increases the recognition rate; it works especially well with fewer template images and fewer subjects. Overlap, and deep overlap in particular, compares macropixels in a convolutional way and improves the recognition rate significantly when the number of subjects to recognize is large. Parallel computation was also tested for the proposed method and greatly improves recognition efficiency when dealing with large amounts of data.

This research shows that introducing ideas from the CNN framework into a mathematical face recognition method is feasible. It also demonstrates the potential of the Macropixel Comparison Approach, which not only benefits from parallel computing but can also make use of a pre-trained weighted filter.

6.2 Future Work

Possible further improvements of the approach are listed below:

• Face identification. The approach has proved reliable for face verification. To better explore the applicability of the Convolutional Macropixel Approach, we could extend it to face identification.

• Weight training. Other weighted filter training methods, including deep learning frameworks, could be applied to the approach.

• Preprocessing. Inherited from the original macropixel approach, any preprocessing method applied to the images remains compatible with our improved approach.

Bibliography

[1] T. Ahonen, A. Hadid, and M. Pietikäinen. Face recognition with local binary patterns. Computer Vision - ECCV 2004, pages 469–481, 2004.

[2] M. Ballantyne, R. S. Boyer, and L. Hines. Woody Bledsoe: His life and legacy. AI Magazine, 17(1):7, 1996.

[3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.

[4] Y.-L. Boureau, J. Ponce, and Y. LeCun. A theoretical analysis of feature pooling in visual recognition. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 111–118, 2010.

[5] A. Bukis and R. Simutis. Face orientation normalization using eye positions. Computer Technology and Application, 4(10), 2013.

[6] L. Chen. Pairwise macropixel comparison can work at least as well as advanced holistic algorithms for face recognition. In BMVC, pages 1–11. Citeseer, 2010.

[7] D. Cox and N. Pinto. Beyond simple features: A large-scale feature search approach to unconstrained face recognition. In Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on, pages 8–15. IEEE, 2011.
[8] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.

[9] J. G. Daugman. Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20(10):847–856, 1980.

[10] B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315(5814):972–976, 2007.

[11] C. Garcia and M. Delakis. Convolutional face finder: A neural architecture for fast and robust face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11):1408–1423, 2004.

[12] R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789):947, 2000.

[13] B. Heisele, M. Pontil, et al. Face detection in still gray images. Technical report, DTIC Document, 2000.

[14] R. Heitmeyer. Biometric identification promises fast and secure processing of airline passengers. ICAO Journal, 55(9):10–11, 2000.

[15] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1):106–154, 1962.

[16] D. H. Hubel and T. N. Wiesel. Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195(1):215–243, 1968.

[17] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis, volume 46. John Wiley & Sons, 2004.

[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[19] M. Lades, J. C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R. P. Wurtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3):300–311, 1993.

[20] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back. Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, 8(1):98–113, 1997.

[21] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, pages 396–404, 1990.

[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[23] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.

[24] C. Liu. A Bayesian discriminating features method for face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(6):725–740, 2003.

[25] A. Livnat, C. Papadimitriou, N. Pippenger, and M. W. Feldman. Sex, mixability, and modularity. Proceedings of the National Academy of Sciences, 107(4):1452–1457, 2010.

[26] R. M. Makwana. Illumination invariant face recognition: a survey of passive methods. Procedia Computer Science, 2:101–110, 2010.

[27] F. Matt Hicks. Making photo tagging easier, 2011. Online; accessed 01-February-2018.

[28] C. N. Matthew Braga. Facial recognition technology is coming to Canadian airports this spring, 2017. Online; accessed 06-March-2018.

[29] Microsoft. Windows Hello face authentication, 2016. Online; accessed 10-August-2016.
[30] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59, 1996.

[31] E. Osuna, R. Freund, and F. Girosit. Training support vector machines: an application to face detection. In Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on, pages 130–136. IEEE, 1997.

[32] A. J. O'Toole, D. A. Roark, and H. Abdi. Recognizing moving faces: A psychological and neural synthesis. Trends in Cognitive Sciences, 6(6):261–266, 2002.

[33] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In BMVC, volume 1, page 6, 2015.

[34] P. J. Phillips, W. T. Scruggs, A. J. O'Toole, P. J. Flynn, K. W. Bowyer, C. L. Schott, and M. Sharpe. FRVT 2006 and ICE 2006 large-scale results. National Institute of Standards and Technology, NISTIR, 7408(1), 2007.

[35] K. Ramírez-Gutiérrez, D. Cruz-Pérez, and H. Pérez-Meana. Face recognition and verification using histogram equalization. In Proceedings of the 10th WSEAS International Conference on Applied Computer Science, pages 85–89. World Scientific and Engineering Academy and Society (WSEAS), 2010.

[36] M. Rätsch, S. Romdhani, and T. Vetter. Efficient face detection by a cascaded support vector machine using Haar-like features. In Joint Pattern Recognition Symposium, pages 62–70. Springer, 2004.

[37] S. Romdhani, P. Torr, B. Scholkopf, and A. Blake. Computationally efficient face detection. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 2, pages 695–700. IEEE, 2001.

[38] H. A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23–38, 1998.

[39] J. Ruiz-del-Solar and J. Quinteros. Illumination compensation and normalization in eigenspace-based face recognition: A comparative study of different preprocessing approaches. Pattern Recognition Letters, 29(14):1966–1979, 2008.

[40] D. E. Rumelhart, G. E. Hinton, R. J. Williams, et al. Learning representations by back-propagating errors. Cognitive Modeling, 5(3):1, 1988.

[41] J. C. Russ. The Image Processing Handbook. CRC Press, 2016.

[42] M. S. Sarfraz, O. Hellwich, and Z. Riaz. Feature extraction and representation for face recognition. In Face Recognition. InTech, 2010.

[43] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.

[44] S. Shan, W. Gao, B. Cao, and D. Zhao. Illumination normalization for robust face recognition against varying lighting conditions. In Analysis and Modeling of Faces and Gestures, 2003. AMFG 2003. IEEE International Workshop on, pages 157–164. IEEE, 2003.

[45] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces. JOSA A, 4(3):519–524, 1987.

[46] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.

[47] Y. Sun, D. Liang, X. Wang, and X. Tang. DeepID3: Face recognition with very deep neural networks. arXiv preprint arXiv:1502.00873, 2015.

[48] Y. Sun, X. Wang, and X. Tang. Deeply learned face representations are sparse, selective, and robust. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2892–2900, 2015.
[49] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1708, 2014.

[50] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.

[51] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–I. IEEE, 2001.

[52] C. von der Malsburg. The correlation theory of brain function. In Models of Neural Networks, pages 95–119. Springer, 1994.

[53] P. J. Werbos. Beyond regression: New tools for prediction and analysis in the behavioral sciences. Doctoral dissertation, Applied Mathematics, Harvard University, MA, 1974.

[54] L. Wiskott, N. Krüger, N. Kuiger, and C. von der Malsburg. Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):775–779, 1997.

[55] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, 2009.

[56] M.-H. Yang, D. J. Kriegman, and N. Ahuja. Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):34–58, 2002.

[57] C. Zhang and Z. Zhang. A survey of recent advances in face detection, 2010.

[58] Z. Zhou, A. Wagner, H. Mobahi, J. Wright, and Y. Ma. Face recognition with contiguous occlusion using Markov random fields. In Computer Vision, 2009 IEEE 12th International Conference on, pages 1050–1057. IEEE, 2009.