A WEIGHTED REGIONAL VOTING BASED ENSEMBLE OF MULTIPLE CLASSIFIERS FOR FACE RECOGNITION by Jing Cheng B.I.S., Tianjin University, 2007 B.A., Tianjin University, 2007 THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES (COMPUTER SCIENCE) UNIVERSITY OF NORTHERN BRITISH COLUMBIA December 2011 © Jing Cheng, 2011 1+1 Library and Archives Canada Bibliotheque et Archives Canada Published Heritage Branch Direction du Patrimoine de I'edition 395 Wellington Street Ottawa ON K1A0N4 Canada 395, rue Wellington Ottawa ON K1A 0N4 Canada Your file Votre reference ISBN: 978-0-494-87531-5 Our file Notre reference ISBN: 978-0-494-87531-5 NOTICE: AVIS: The author has granted a non­ exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distrbute and sell theses worldwide, for commercial or non­ commercial purposes, in microform, paper, electronic and/or any other formats. L'auteur a accorde une licence non exclusive permettant a la Bibliotheque et Archives Canada de reproduire, publier, archiver, sauvegarder, conserver, transmettre au public par telecommunication ou par I'lnternet, preter, distribuer et vendre des theses partout dans le monde, a des fins commerciales ou autres, sur support microforme, papier, electronique et/ou autres formats. The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. L'auteur conserve la propriete du droit d'auteur et des droits moraux qui protege cette these. Ni la these ni des extraits substantiels de celle-ci ne doivent etre imprimes ou autrement reproduits sans son autorisation. In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. Conformement a la loi canadienne sur la protection de la vie privee, quelques formulaires secondaires ont ete enleves de cette these. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis. Bien que ces formulaires aient inclus dans la pagination, il n'y aura aucun contenu manquant. Canada Table of Contents List of Tables vi List of Figures vii Abstract ix Acknowledgements x Chapter 1 1 Introduction 1.1 Motivation 1 1.2 Major Contribution 2 Chapter 2 Literature Review 4 2.1 Procedure of Face Recognition Systems 5 2.2 Overview of Approaches 7 2.3 Principal Component Analysis (PCA) 9 2.4 Fisherface 11 2.5 Spectral Regression 13 2.6 Spatially Smooth Subspace Learning 15 2.6.1 2.7 Analysis (LDA) 16 2.6.2 Tensor Extensions 17 2.6.3 Spatially Smooth Subspace Learning 17 Regional Voting Chapter 3 3.1 Locality Preserving Projections (LPP) and Linear Discriminant 19 Proposed Approach 21 Weighting Scheme 22 3.1.1 One Applies One 25 3.1.2 One Applies All 25 3.1.3 Joint Weight 25 IV 3.2 Classifier Set Up and Recognition 25 Chapter 4 Experiments 29 Chapter 5 Analysis 38 Chapter 6 Summary 48 Bibliography 50 v List of Tables 3.1 Classifier Set Up 27 3.2 Classification on each region by one holistic algorithm for the probe image 27 WREC and REC in comparison with various face recognition algorithms on the Yale Database of 2, 5 and 8 training datasets with 11x11 and 16 x 16 divisions 32 WREC and REC in comparison with various face recognition algorithms on the ORL Database of 2, 5 and 8 training datasets with 11 x 11 and 16 x 16 divisions 37 WREC and REC in comparison with various face recognition algorithms on the PIE Database of 2, 5 and 8 training datasets with 11 x 11 and 16 x 16 divisions 37 4.1 4.2 4.3 vi List of Figures 2.1 Face Recognition Processing Flow 6 2.2 Pre-Processing 6 3.1 Regional Scheme 22 3.2 Weighting Evaluation 24 3.3 Splitting Iteration 25 3.4 First layer Voting 26 3.5 Voting Model 28 4.1 Yale Face Database 29 4.2 ORL Face Database 30 4.3 PIE Face Database 31 4.4 WREC on Yale database (2 training dataset) with divisions from 7 up to 20 33 WREC on Yale database (5 training dataset) with divisions from 7 up to 20 33 WREC on Yale database (8 training dataset) with divisions from 7 up to 20 34 WREC on ORL database (2 training dataset) with divisions from 7 up to 20 34 WREC on ORL database (5 training dataset) with divisions from 7 up to 20 35 WREC on ORL database (8 training dataset) with divisions from 7 up to 20 35 WREC on PIE database (5 training dataset) with divisions from 7 up to 16 36 WREC on PIE database (30. 80 and 130 training datasets) with divisions from 7 up to 16 36 4.5 4.6 4.7 4.8 4.9 4.10 4.11 vii 5.1 5.2 5.3 5.4 •5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 6.1 WREC compared to various individual holistic algorithms in different sized regions on Yale 2 training dataset 39 WREC compared to various individual holistic algorithms in different sized regions on Yale 5 training dataset 39 WREC compared to various individual holistic algorithms in different sized regions on Yale 8 training dataset 40 WREC compared to various individual holistic algorithms in different sized regions on ORL 2 training dataset 40 WREC compared to various individual holistic algorithms in different sized regions on ORL 5 training dataset 41 WREC compared to various individual holistic algorithms in different sized regions on ORL 8 training dataset 41 W 7 REC compared to various individual holistic algorithms in different sized regions on PIE 5 training dataset 42 WREC compared to various individual holistic algorithms in different sized regions on PIE 30 training dataset 42 WREC compared to various individual holistic algorithms in different sized regions on PIE 80 training dataset 43 W r REC Compared to various individual holistic algorithms in different sized regions on PIE 130 training Dataset 43 Weighting Distribution by Different Classifiers on 16 x 16 divi­ sion of Yale Data Set(2 training dataset) 46 Weighting Distribution by Different Classifiers on 16 x 16 divi­ sion of Yale Data Set(8 training dataset) 47 Example of regional divisions on gallery images 49 viii Abstract Face recognition is heavily studied for its wide range of application in areas such as information security, law enforcement, surveillance of the environment, entertain­ ment, smart cards, etc. Competing techniques have been proposed in computer vision conferences and journals, no algorithm has emerged as superior in all cases over the last decade. In this work, we developed a framework which can embed all available algorithms and achieve better results in all cases over the algorithms that we have embedded, without great sacrifice in time complexity. We build on the success of a recently raised concept - Regional Voting. The new system adds weights to different regions of the human face. Different methods of cooperation among algorithms are also proposed. Extensive experiments, carried out on benchmark face databases, show the proposed system's joint contribution from multiple algorithms is faster and more accurate than Regional Voting in every case. IX Acknowledgements First of all. I would like to thank my friends for their putting up with my impatience working on this thesis even as I interrupted their work in a consistent manner without strong reason. I would like to thank Heinrich Butow. our computer system support person. I would also like to thank Dr. Youqin Wang, our HPC Senior Lab Instructor. Extensive experiments were carried out in an optimized way with the her help. I would like to thank my family, for all of the encouraging (either spiritual or financial) support. Thanks to my humble and open minded parents. I gradually became both spiritually and financially independent. Most importantly, I would like to thank my supervisor Dr. Liang Chen for his guidance on my behalf, in both life and study. Lastly, I would like to thank professors of the department and my committee members: Dr. Desanka Polajnar and Dr. Jueyi Sui for their encouragement and guidance in this and everything else. x Chapter 1 Introduction Biometrics is tagged as one of the 'top ten emerging technologies that will change the world' in 2001 by the MIT Technology Review [Heyer. 2008], With potential to stamp out forgery and theft possible with other methods, it is a good fit for security authentication problems. The face stands out as a biometric compared to several other popular physiological or behavioral characteristics for the purpose of distinguishing between individuals: 1. The fingerprint can be defected by chemical contaminations; 2. The iris is not workable with diabetic victims; 3. The retina is not convenient during data collection for operational biometric systems; 4. The voice changes when a person's throat is infected and is sensitive to noise. As a biometric, face recognition uses faces to identify a person. The history of face recognition, began with excessive optimism followed by scepticism. After a continuous effort to overcome the exposed limitations, face recognition has become 'one of the most successful applications of image analysis and understanding [Zhao et al., 2003]." 1.1 Motivation Face recognition is broadly defined as assigning identity to one or more input face images with a registered identity in the database [Ijiri and Sakuragi. 2006]. Its usage is generally divided into: 1) identification and 2) verification. Identification is a "one-to-many" search to answer questions like "Who is he? " and is more common in surveillance. It is a closed set problem assuming that the input face image corresponds to an identity stored in the system. It is also referred to as a forced-choice experiment in the psychology literature [Moon and Phillips. 2001]. Verification is a "one-to-one" 1 2 search to answer question like "Is he Eric?" Given a probe image and a claimed identity, a decision, either "accept" or "reject" has to be made. This is more frequently used for security access. The latter is an open set problem as there may be no corresponding identities stored in the system. This research deals with the "one-tomany"' problem, but the system is believed to perform equally well on the "one-to-one" problem as well. A face recognition system automatically identifies a human face from database images. It is challenging as it needs to account for all possible appearance variations caused by change in illumination, facial features, occlusions, etc [P. Latha and Annadurai, 2009]. It has drawn a huge surge of attention due to its commercial potential and cultural significance. The need for face recognition can be found in smart envi­ ronment [Pentland and Choudhury, 2000], entertainment, smart cards, information security [Ijiri and Sakuragi, 2006], law enforcement and surveillance systems [Zhao et al., 2003]. Over the past semi-century, many approaches have been developed. All ap­ proaches require comparison of an input face image with face images labelled with known identities stored in a database to claim a match. There are two types of so­ lutions for the comparison problem: the first is a step-by-step based decision-making process, and the other is a 'learning mechanism' based decision-making process such as a neural network. A "step-by-step' system carries out executions previously de­ signed and set. It does not interact with an environment or get modified based on the result accumulated up to that point. A learning mechanism is a more dynamic system, presumably using what it. has learned. It identifies the information needed for its ongoing performance. Most face recognition systems belong to the former category. "'A step-by-step" system is adopted in this thesis. 1.2 Major Contribution In this thesis. I studied strategies for adopting a two-layer voting scheme for face recognition. A quick summary of the major contribution of this thesis is as follows: 1. Based on the regular face recognition procedure. I constructed a new face recog­ nition system adopting a two-layer voting scheme. Five top face recognition algorithms over the last decade have been employed. They are: 3 Prominent traditional approaches include Principle Component Analysis (PCA) and Fisherface which have profound effects. Newly developed algorithms published in leading journals and conferences: Spec­ tral Regression Dimension Analysis (SRDA) [Cai Deng and Jiawei. 2008]. Spa­ tially Smooth Version of Linear Discriminant Analysis (S-LDA) [Cai D. and Huang, 2007] and Spatially Smooth Version of Locality Preserving Projection (S-LPP) [He Xiaofei and Hongjiang, 2005]. 2. The system is not specific for embedding the algorithms mentioned above, but also available for some other-class approaches, local approaches such as Local Binary Patterns (LBP) [Tirno Ahonen and Pietikainen, 2004] for instance. 3. In the newly proposed system, weight takes a key role. I proposed three methods for generating weights. 4. Extensive experiments, carried out on benchmark face databases, show the pro­ posed system is faster and holds a lead in every case over an already proven system-Regional Voting which has been shown to be very stable in the face of a noisy environment. In a lot of cases, the weighting algorithms and ensemble among different embedded approaches proposed here reduce the error recogni­ tion rate by more than half. The same promising results on experiments of datasets with small number of images per person in the gallery images deserves emphasis as it belong to a especially sticky problem in the face recognition area: the SSS (small-sample-size) problem which will be further addressed in Chapter 3. We have only just scratched the surface with this introduction for the time being. The thesis is broken into "chunks" designed to fill different, needs. The following does not cover anything in depth, but instead gives a high-level overview of how the thesis is structured. The rest of the thesis is organized as follows: Chapter 2 traces the pertinent literature on face recognition. Chapter 3 elaborates the design of the proposed system. Chapter 4 demonstrates the performance of the system by results from the experiments followed by a result analysis in chapter 5. Chapter 6 summarizes the system presented and provides suggestions for future work. Chapter 2 Literature Review To distinguish a person from hundreds of others by just a mere glance despite varia­ tions in viewpoint, lighting, emotional expression and hairstyle, face recognition is one of the most amazing features of the visual system [McKone E. and N., 2009]. Previous efforts on prosopagnosia (a symptom that people fail to recognize human faces) have unveiled some properties of how human brain mechanisms function [Tranel and Damasio, 1985]. In 1991, against the conventional opinion for over the past 30 years that face recognition develops very slowly throughout infancy, childhood and adolescence [McKone E. and N., 2009], a two-process theory of infant face recognition based on experiments on infants supported the conclusion that infants are born being aware of information and structure of faces. It further brought forth two terminologies: CONSPEC and CONLEARN to reveal how infants foster their ability in face recog­ nition gradually [Morton and Johnson, 1991]. CONSPEC guides the preference for facelike patterns for newborn infants. CONLEARN is responsible for learning visual characteristic. A review of the effect of inversion upon face recognition claimed that face recognition is 'special" [Valentine, 1988]. This phenomenen was further validated in 1996 by evidence from neuropsychology [Farah, 1996]. In 1997, a cross-species study on face recognition in primates: monkey and adult humans, showed that both species have novel preference for their own species [Pascalis and Bachevalier, 1998]. In 2001, a meta-analytic review claimed that own-race faces are better remembered compared to other faces [Meissner and Brigham. 2001]. Over the last two decades in neuroethology. cognitive and neural mechanisms have been looked into with focus on their potential roles in enabling humans to recognize faces [McKone E. and N., 2009]. Similar to the behavior of a human brain that associates memory to distinguish between faces, face recognition tries to retrieve from a gallery of images labeled with known identities and assign an identity to a given probe image. It is easy to describe but hard to implement. Even though humans have always had the innate ability to 4 o recognize faces, automatic face recognition systems are still in an early stage either due to the lack of awareness of how human cognition works, or the infeasibilitv at computationally modeling billions of neurons. So. different, approaches have been proposed for the face recognition problem. The rest of this chapter is divided into seven sections. The first section starts with a review of the procedures of a face recog­ nition system, followed by an overview of solutions in the second section. Then, the most popular solution, the holistic approach, is elaborated, illustrating five impor­ tant algorithms within this family. The last section is a review of a recently proposed technique which has achieved great success: regional voting. 2.1 Procedure of Face Recognition Systems The procedure can be broken into six segments, to be handled regardless of the specific method used [Zhao et al., 2003]: 1. : Capture image 2. : Face location (detection) in image 3. : Face image pre-processing 4. : Feature extraction 5. : Template comparison 6. : Match declaration Figure 2.1 shows a more detailed processing flow of a face recognition system. First, the image is assumed as a structured collection of pixels. The acquisition can be accomplished either by scanning existing photographs or shooting live pictures of a person with a camera. Usually several samples have to be taken for a bigger possibility to be matched. Video is also considered as a source, as it consists of a sequence of still images. Detecting the face in an image alone is a hard task. Once the face images have been targeted, pre-processing starts the refinements on the images. Figure 2.2 shows the detailed steps in this segment. Pre-Processing includes the following four steps as shown in order: (1) finding the location of the pupil. (2) rotating to have pupils aligned. (3) scaled and (4) 6 Captured normalized/aligned Face Detection Pre-processing face images Normalization Feature Vectors (Templates) Feature Extraction Tracking Templates normalized/aligned Comparison Pre-processing Face Database face images Normalization Training ( Classifier) 5 n =r O co 3 p Figure 2.1: Face Recognition Processing Flow cropped. The "original image"' comes from the Olivetti Research Laboratory (ORL) database 1 . The following three images having pupils darkened show the intermediate stages during the process. At last, after cropping, images finally become the objects which are going to be classified by face recognition systems. To stay focused on the proposed system, we assume that the four steps have already taken place. For an overview of each step, please see Gonzalez et al [Gonzalez et al., 2009]. The algorithms on how Pre-processing plays around the eyes can be found in P. Wang et al [P. Wang and Wayman. 2005]. (a) Original (b) Pupil Image eating Lo- (c) Rotating (d) Scaling (e) Cropping Figure 2.2: Pre-Processing After pre-processing, images are cropped to a smaller size with pupils aligned. A face image becomes an h x iu matrix of pixel values having each of them standing for a colour value (h and w refers to the height and width of the image respectively). The *AT i is the sample vectors g of around its mean /j, t : - th)T\u = Si = Sb = Wj] g) ' (Pi ~ g)' (2.7) (2.8) v tr where n and are defined the same as in equation 2.7. and g is the overall mean for gallery data. The projection matrix P we are looking for this time is oriented to satisfy the following equation: IPT S PI P = arg max ^ p T ^ = \ p m • • • Pm] (2-9) where { p t \ i = 1,2. • • • . m} are generalized eigenvectors corresponding to m largest eigenvalues {A,|> = 1. 2. • • • . rn} of S b and SwSbPI = XiSwPi- i = 1. 2. • • • , m (2.10) where m is upper bounded to v -- 1. for the number of nonzero generalized eigen­ values is at most f — 1 [Duda and Hart. 1973]. 13 This works pretty well for general finding projection problems. While in the face recognition problem, the matrix Sw is probably singular for the number of classes to be classified, n is far less than the number of dimensions, hw. This threatens the denominator in equation 2.9. {P 1 SwP\ is exposed to the risk of becoming zero as no such projection P can be found. Belhumeur et al [Belhumeur. P and D, 1997] overcame the complication of a singular Sw by a method called Fisherface. PCA is first used to reduce the dimension of the feature space to n — y. where an alternative criterion is proposed to 2.9: Pit = PjldPpca (2-11) then Fisher's Linear Discriminant, defined in 2.9, is used to reduce the dimension to 0—1. P p c a = argmax {Pp C a C G P p c a } (2.12) where CG is computed by equation 2.3. \PfldPpca^nPpc - 1 ] 1 1 Given D } where 1 < j < 2. a discrete approximation for two-dimensional Lapla­ cian L is the N x K matrix: A = I)\ /. - /i I) , (2.26) where Ij is the iij x rij identity matrix for j = 1.2 and X is the Kronecker product [Horn and Johnson. 1991]. For a n x x n 2 dimensional vector 7. to check its smoothness on the i) 1 x ??2 lattice, we can check that || A • is proportional to the sum of the 19 squared differences between nearby grid points of 7 with its matrix form. Given a pre-defined graph structure with weight matrix W\ the SSSL approach is oriented for maximizing the following: '' G U ' G ^ (1 — a ) y T G D G T ~ / + Q E( 7 ) : (2.27) where 0 < a < 1 controls the smoothness of the estimator and E is the discretized Laplacian regularization function: E( 7 ) = |l A- 7 || 2 = 7 r A r A 7 (2.28) The 7(>l,t which maximizes the objective function 2.29 can be derived bv the max­ imum eigenvalue solutions to the following general eigenvalue problem. G W G 1 7 = A((l - q) G D G r + a A t A)7 (2.29) With the options of different W as described in subsection 2.6.1, spatially smooth version of LDA and LPP are hence derived. He et al [He Xiaofei and Hongjiang, 2005] showed a dramatic improvement of S-LPP over PCA and LDA for face recognition on the the Carnegie Melon University Pose. Illumination, and Expression database (CMU PIE) 3 and Yale 4 databases. S-LDA and S-LPP developed methods based on the SSSL model outperform the ordinary subspace learning algorithms and their tensor extensions. Zheng et al demonstrate SLPP showing a significant improvement over standard LPP [Z Zheng and Yang, 2007]. 2.7 Regional Voting Large variations in viewpoints, illumination or facial expressions always lead to a highly nonconvex and complex distribution of face images [Biehsel and Pentland. 1994]. Thus their success is limited to their linear nature [Juwei Lu and Li, 2006]. Either nonlinear models or a mixture of locally linear models could handle the nonconvex problem [Juwei Lu and Li. 2006]. Regional Voting, a framework proposed by Chen and Tokuda[Chen and Tokuda, 2010], is one solution for the latter case. It em­ beds all holistic algorithms. The image is broken into non-overlapping equally sized 3 CMU PIE database; http://www.ri.cmu.edu/research_project_detail.html7project. id=418&menu_ id=261. 4 Yale Database http://cvc.yale.edu/projects/yalefaces/yalefaces.html. 20 windows. Holistic classification and national voting are applied inside each window (region). After all the windows are enumerated, record the result for each region. The majority winning windows has the last word. This method is a complementary to holistic approaches which lack the knowledge of the spatial structure of the face and is more accommodating to ; noise\ Chen and Tokuda reconstruct face recognition as a 'stability' problem [Chen and Tokuda, 2010]. Under this concept, the probe image has undergone transformations caused by the environment: 'being obliterated 7 for instance. From this point of view, this system will still work as a whole even if some regions have been occluded. A deeper analysis of regional voting over national voting can be seen in Chen and Tokuda [Chen and Tokuda, 2005, 2003]. While since each window could work as a classifier, it naturally raises questions on the topic of classifier combination [Kittler and Matas, 1998]. Combining classifiers works best when they are different [Ali and Pazzani, 1995]. For example, two popular ensembles of classifiers employ majority voting, based on labels and label ranking respectively [Bagui and Pal, 1995], [T.K. Ho and Srihari, 1994], The hope, when multiple algorithms are embedded into one framework, is they will complement one other and contribute jointly to the final decision making. Chapter 3 Proposed Approach Holistic face recognition approaches based on statistical learning, such as the ones based on LDA. often suffer from the SSS (small-sample-size) problem, where the dimensionality of the sample images far exceeds the number of training sample images available for each subject [Raudys and Jain, 1991], [Juwei Lu and Li, 2006]. Building on the success of Regional Voting, we present a system called Weighted Regional Voting Based Ensemble of Multiple Classifiers (WREC) for face recognition. This idea exploits the fact that face regions are of different significance when recognizing a face. This concept can be traced back as early as 1970's in "'computer recognition of human faces" [Kanade, 1977]. Recent exploitations of the weight distribution can refer to "A Weighted Voting and Sequential Combination of Classifiers Scheme for Human Face Recognition [Xiaoyan Mu and Watta, 2005]", Local Binary Patterns by Timo Ahonen and others [Timo Ahonen and Pietikainen, 2004]. An automatic weighting evaluation is implemented in this thesis so that a more robust system having a larger capacity of bearing the variance of the face images is constructed. It is independent of human knowledge of the underlying structure of the face. For instance, images with exaggerated facial expressions in which symmetry has been violated is still tolerated by the system. The ensemble of multiple face recognition algorithms is motivated by the fact that different algorithms address different obstacles in face recognition. Though new algorithms have been added to the face recognition literature, none of them is able to integrate all the advantages into one. WREC provides an interface for different algorithms bringing their good attributes into full play. It has the following key features: 1. Images are partitioned into non-overlapping regions: 2. For each region, multiple holistic algorithms are employed: 3. A weight is associated with each holistic algorithm for each region: 21 22 4. The decision is made based on layers of voting schemes (Details will be illus­ trated in Subsection 3.2.) 3.1 Weighting Scheme First of all, we partition each image into a number of equally sized non-overlapping regions in a consistent manner. In order to better illustrate this, we make the following assumptions and formalize a set of corresponding denotations. Assuming we partition the image into I x m regions and name the region R(p,q) referring to the observation in row p and column q where 1 < p < I and 1 < q < m . The regional scheme is shown in Figure 3.1. Regional Scheme 1 111 1 2 •• • • • • •• • l •• • 1> Figure 3.1: Regional Scheme Assuming that, in our gallery, there are N subjects S i . S ? . - - - . S y . Each sub­ ject. S t . h a s K images G n . G i 2 . • • • . G x K • T h e r e is a set of holistic algorithms H = 23 { h i . h'2- • • • . h t } . For each holistic algorithm h 6 H . on each region, we use a gener­ ated "leaving one out" strategy to test the effectiveness of the holistic algorithm on that region and take it as the weighting value of that region for algorithm h. For each j, 1 < j < K, we select ; • • • - G NJ as the testing set and take the remaining images in the gallery as the training set. By doing so. for each j , we find correctly rec­ ognized images for each region by that algorithm: riyht(r\hj))- The weighting value on region r for holistic algorithm h w^, r ) * s calculated in an accumulated manner. It is formally defined in equation 3.1: ^2?right{r{hJ)) W { h 'r ) = ~ YVN The weighting evaluation procedure on a region is shown in Figure 3.2. Thus. u\h.r) stand for the average recognition accuracy on region r by holistic algorithm h and 0 < it\h,r) < 1- Besides regional scheme, there are three key segments: "leaving one out" strategy, lower dimensional subspace and regional weights generator. Lower dimensional subspace includes a series of dimension reduction techniques, like holistic algorithms: PCA, Fisherface, SRDA, S-LDA and S-LPP in this case. The regional weights generator compares the subspace regional feature vectors by Euclidean dis­ tance and selects the closest one as the classification sticking to the nearest neighbour classifier. After calculating the regional weighting of holistic algorithm h according to equation 3.1. it implements one of the equations among equations 3.2. 3.3 and 3.4. During this stage, each region is an independent classifier. The "'leaving one out'' strategy is shown in Figure 3.3. For each region, we use all the gallery images for training. The ''leaving one out" strategy is used to divide the regional gallery images into subTrain and subTest sets. It includes K iterations of splitting 0 . To describe the "'leaving one out" strategy, assume a k training dataset 6 with n different people in total. Wo denote the gallery images as: the first image of subject1. subject2, subjectn t h e second image of subject 1 , subject2. • • • subjectn the image of subject 1. subject2, • • • subjectn kth "splitting here refers to the division of gallery images into subTrain and subTest sets, not the split mentioned in the Chapter 4: Experiments, which refers to a component in the database. 6 k refers to the number of images per subject for training (gallery). 24 —SubTesi Kceionai \ ettors Leaving One Out Regional Galierv Regional linages Scheme Strategy —SubTrain Regional Veciors Vectors Gallery Regional Training Regional Lower Dimensional Subspaee Subspaee Vector Vectors Subspaee (Holistic Algorithms) Subspaee regional feature Database of Gallery vectors Regional Subspaee Vectors Regional Weighting Generator Regional Weights Database of Regional Weights Figure 3.2: Weighting Evaluation Each time, each row is "left in" and selected as the probe, all the rest are "left out"' to become training images. Then during all rounds of splitting, we project the regional subTrain and subTest images into a subspaee and find the recognition accuracies. Figure 3.3 shows splitting during one iteration. From the picture, we can see that, each time, the images of the same color are selected as the test images, keeping the rest as the training images. In this thesis, five holistic algorithms are embedded in the framework and thus t = 5. hi, h,2i h 3 . hi and in turn correspond to algorithms: PCA. Fisherface. SRDA. S-LDA and S-LPP. With their weighting distributions over all regions of a face image, three different schemes are adopted in this thesis for the final weight to be used during the test stage after accepting probe images. We call them: '"One Applies One", ""One Applies All' and ""Joint Weight" respectively. The following equations show the difference among the above three weighting schemes. In all cases, stands for the final weight which is going to be assigned to the region r for holistic 25 • Region(p.q) Region(p,q) Region(p.q) for person k for person 1 for person 2 Figure 3.3: Splitting Iteration algorithm h. 3.1.1 One Applies One ^(/i,r) (3.2) Wp(h,r) = w{h5.r) (3-3) ^'F(h.r) 3.1.2 One Applies All Here, during the weighting evaluation on the training set. only the effectiveness of S-LPP algorithm is tested. During the test stage having all probe images included, all algorithms (including S-LPP) use the weighting evaluated by S-LPP: 3.1.3 Joint Weight U'F(h.r) = ^2'lu\hal,r) (3.4) al where u \ h „ , , r ) re f er to the weighting assigned to region r evaluated bv algorithm h a i GH. 3.2 Classifier Set Up and Recognition This section involves two layers of voting/scoring models. The first layer is within the probe image. As we have a database of gallery regional vectors and a database of 26 regional weights, for each holistic algorithm h € H . we implement a regional weighting based classification of any probe image during the test stage. Euclidean distance is used during the matching procedure for comparison purpose and the smallest distance value identifies the subject ID of the probe image. Figure 3.4 describes this process. Test Image Regional Scheme Database of Gallery Regional Subspaee Vectors Gallery Database Regional Subspaee of Regional Vectors Weights Matching Regional Weights Similarity Value Voting/Scoring Model (Classifier Setting up) Figure 3.4: First layer Voting The weighted scoring model is shown in Figure 3.5. To avoid non-meaningful comparisons, shifting is implemented. For even after pre-processing, pupils, mouth, chin, forehead etc. may still locate in different regions with respect to different face images. Thus, we perturb each gallery region up to two pixels in four directions (north, south, west and east) to compensate for misalignment issues. All regional gallery images in all (25 in total) nearby positions are compared with the regional probe image using a nearest neighbour classifier and the results are stored. By selecting the closest one as the identity of the holistic algorithm, h. on that region, we get the classification on region r: h r (p r ). Instead of one vote a region cast for an identity i £ I. u'F(h.r) is used as the "number" of votes identities get from each region. What the voting machine does is sum up the number of votes each identity gets. The one that gets the "biggest number" of votes is taken as the subject ID for probe image. 27 p, by that algorithm (h). At this point, for each region r , for each holistic algorithm h . by using all gallery images, we obtain a classifier h T (p T ), where p is a probe image and p e P. The clas­ sifier set up is shown in Table 3.1. Table 3.2 returns the corresponding classification for each classifier: h r (p r ). The identity that wins the biggest "number" of votes is the final classification of holistic algorithm, h, on that probe image, p. By now, the first layer of voting for classification is set up. Table 3.1: Classifier Set Up hrpr hrpr hrpr hr p r (1,1) (1:2) (l,m) r r r r r r h p h p h p hrpr (2,1) (2,2) (2,m) hTpr (U) hrpr M) hrpr hrpr l,m (. ) Table 3.2: Classification on each region by one holistic algorithm for the probe image 12 k 13 i l 12 12 In ijd n in Table 3.2 is the total number of different identities to be classified and 1 < id < n. Thus, for each holistic algorithm h G H. we get its classification for each region. The second layer of voting is among the different holistic algorithms. Each algo­ rithm casts one vote to its identity and the final decision is made by a "winner takes all" strategy. That is. the identity voted by the majority of holistic algorithms is taken as the final result. After this round of voting, the final decision is made. Testing Image with Gallery Image with partitions into partitions into lxm regions lxm regions •ST 75f" R(p,q) of Test Image R(p. Gallery In tage ii with all its nei§ ht onn^rcgiofi • Calculate the similarity Between Testing Region and Gallery Regions(plus all neighboring regions by 2 shifts in four directions) Nearest Neighbor Database classifier of Regional Weights Closest Identity (most similar) Regional Weights weight lor R(p.q) votes for different identities Voting Machine Recognized Figure 3.5: Voting Model Chapter 4 Experiments In order to validate the WREC approach, three benchmark databases were used. They decrease the technical difficulties with face recognition by: • control of the environment such as background, lighting, camera angle and so on: • control of the subject's pose; • getting cooperation from subjects: having diverse images variant in expressions and accessories, wearing or not wearing glasses for instance. The Yale database from Yale University contains 11 grayscale GIF images variant in facial expression and configuration - centre-light, with glasses, happy, left-light, with no glasses, normal, right-light, sad, sleepy, surprised, and wink 7 - of each of 15 individuals in total. Figure 4.1 is a glimpse of the Yale database 8 . Figure 4.1: Yale Face Database The ORL database comes from Cambridge AT &; T Laboratory, formerly Olivetti Research Laboratory. It contains ten different images of each of 40 distinct subjects. The images were taken at different times, varj'ing the lighting, with different facial expressions (open / closed eves, smiling / not smiling) and with different facial details (glasses / without glasses) 9 . Figure 4.2 comes from the ORL Database 10 . ' See footnote 4 8 See footnote 4 9 See footnote 1 1 0 See footnote 1 29 Figure 4.2: ORL Face Database The Carnegie Melon University Pose, Illumination, and Expression database (CMU PIE) from Carnegie Melon University contains 41,368 images of 68 people. For each person, there are 13 different poses, 43 different illumination conditions and four different facial expressions 11 . Figure 4.3 is a glimpse of the PIE database 12 . Following the custom of researchers in this field, the faces for all three databases were simply manually aligned by pupils and cropped to 64 x 64 pixels with 256 gray levels per pixel. Here, I w r ant to clarify that the alignment with respect to pupils as opposed to other features of the face, such as the nose or lip corners, is for the "state of art comparison". All the experiments are carried out exactly on the same data. As a matter of fact, though eye positions and inter-ocular distance are quite commonly used [P. Wang and Wayman, 2005], it does not mean that all pre­ processing approaches have to wrap around the pupils. For instance, the Texas 3D Face Recognition Database (Texas 3DFRD) 13 has all faces normalized to the frontal position and the tip of the nose positioned at the center of the image 14 . It is difficult to standardize the face images with exactly the same techniques, thus it is hard to reproduce the face recognition algorithms of others to achieve the same performance. Therefore, it is fair to use published standardized data so that the comparison among the face recognition approaches is valid. To exclude any possible bias, including the pupil locating, rotating, scaling and cropping approach used for "standardization" during the Pre-Processing stage above mentioned, the UIUC versions of the ORL. Yale and PIE face databases are used. UIUC's version of the PIE database uses the near frontal poses (COS. C07. C09. C27, C29) which leaves us 11,554 face images. Each subject has 170 images, except subject 38 who has only 164 images. These 1 1 See footnote 3 1 2 See footnote 3 1 3 http://live.ece.utexas.edu/research/texas3dfr/ 1 4 The description of the Texas 3DFRD could be found here: databases/ http://www.face-rec.org/ 31 pre-processed, scaled and rotated images were provided by Deng Cai 10 . Neutral Smile Blinking Figure 4.3: PIE Face Database Two baseline approaches (PCA and Fisherface) and three newly developed ap­ proaches (SRDA, S-LPP and S-LDA) are compared. For the Eigenface approach, the eigenvector corresponding to the largest eigenvalue was removed accounting for noise caused by illumination, a — 0.01 was selected for regularization in SRDA and S-LPP approaches. For SLPP, cosine similarity was used to calculate the distances in the adjacency matrix. All gallery images were perturbed up to two pixels in each direction to make up for misalignment. All vectors(representation of images) were normalized to the length of 1. Probe images were linearly reduced and then classified according to nearest neighbour classification. In the first set of experiments, images were divided d ( d = 11,16) times hori­ zontally and vertically because these numbers yielded the best results with regional voting on the respective databases. And in this set of experiments, only the first weighting scheme was implemented. More specifically, in this set of experiments, WREC was using the "One Applies One"' weighting scheme. The results for 2. 5 and 8 training datasets 16 on ORL and Yale databases are given in Tables 4.1 and 4.2 respectively. The results on the PIE Database for 5. 30. 80 and 130 training datasets are given in Table 4.3. In addition, a simple version of WREC named REC was also carried out in comparison. REC. as seen from the name, is a framework of WREC without weighting. 1 5 All data and holistic algorithms were taken from http://www.zjucadcg.cn/dengcai/Data/ FaceData.html. 1 6 2. 5 and 8 refer to the number of images per subject for training (gallery). The rest data sets mentioned later in this thesis follow the same custom. 32 Table 4.1: WREC and REC in comparison with various face recognition algorithms ; Database o ' 2, 5 and 8 training datasets with l x l l a n d 1 6 x 1 6 d i v i s i o n s 5 Train 2 Train 8 Train Alg. Error Rate±Std Error RateiStd Error Rate±Std PCA 44.40% ±5.15% 33.84% ± 3.38% 30.93% ± 5.67% Fisheface 43.02% ± 4.67% 10.73% ±3.16% 7.24% ± 3.41% 30.71% ± 4.69% 11.38% ±2.96% 6.80% ± 4.06% SRDA S-LDA 29.90% ± 5.09% 13.58% ±3.11% 8.98% ±4.17% S-LPP 32.50% ± 4.40% 12.93% ± 3.60% 8.22% ± 4.38% REC(ll) 13.99% ± 2.54% 5.33% ± 2.45% 2.90% ± 2.24% WREC(ll) 11.16% ±2.37% 4.26% ±2.10% 2.18% ± 2.02% REC(16) 13.57% ± 2.79% 5.29% ± 2.44% 2.51% ± 2.01% WREC (16) 10.26% ± 2.76% 4.10% ± 1.95% 1.91% ± 2.01% In the second set of experiments, conducted on the Yale and ORL datasets, WREC were carried out with different number of regions, from 7 x 7 to 20 x 20. That is, each image was first divided 7 times vertically, and 7 times horizontally, and then 8 times each direction and so on up to 20 divisions vertically and 20 horizontally. This time, all three weighting schemas were implemented in this set of experiments. The results for the Yale database on 2, 5 and 8 training datasets are given in Figures 4.4. 4.5 and 4.6 respectively. And the results for the ORL database on 2, 5 and 8 training dataset are given in Figures 4.7, 4.8 and 4.9 respectively. On the PIE database, due to time constraints as well as the possible distribution of the best results, the experiments were carried out on divisions from 7 up to 16 with only one weighting scheme: One Applies One. As the recognition accuracy for the 5 Train dataset differs a lot from the rest of the datasets. we put it aside in a separate figure to have a better representation of the result . The result is shown in Figure 4.10 and 4.11. 33 •One Applies One Joint Weight • One Applies All 10 12 14 16 Divisions 18 20 Figure 4.4: WREC on Yale database (2 training dataset) with divisions from 7 up to 20 • One Applies One Joint Weight - One Applies All 6 8 10 12 14 16 Divisions 18 20 Figure 4.5: WREC on Yale database (5 training dataset) with divisions from 7 up to 20 34 One Applies One Joint Weight One Applies All 6 8 10 12 14 16 Divisions 18 20 Figure 4.6: WREC on Yale database (8 training dataset) with divisions from 7 up to 20 •One Applies One Joint Weight One Applies All 6 8 10 12 14 16 Divisions 18 20 Figure 4.7: WREC on ORL database (2 training dataset) with divisions from 7 up to 20 35 •One Applies One Joint Weight One Applies All 10 12 14 16 Divisions 18 20 Figure 4.8: WREC on ORL database (5 training dataset) with divisions from 7 up to 20 - One Applies One Joint Weight One Applies All 6 8 10 12 14 16 Divisions 18 20 Figure 4.9: WREC on ORL database (8 training dataset) with divisions from 7 up to 20 36 STrain 10 12 Divisions 14 Figure 4.10: WREC on PIE database (5 training dataset) with divisions from 7 up to 16 30Train SOTrain 130Train 10 12 Divisions 14 16 Figure 4.11: WREC on PIE database (30. 80 and 130 training datasets) with divisions from 7 up to 16 37 Table 4.2: WREC and REC in comparison with various face recognition algorithms on the ORL Database of 2. 5 and 8 training datasets with 11x11 and 16 x 16 divisions 2 Train 5 Train 8 Train Error Rate±Std Error Rate±Std Error Rate±Std Alg. 29.29% ±3.15% 11.48% ± 2.26% 6.05% ± 2.27% PCA Fisheface 22.28% ± 2.82% 3.45% ± 1.30% 1.65% ± 1.18% SRDA 18.19% ±2.81% 3.44% ± 1.19% 1.80% ± 1.50% S-LDA 16.45% ± 2.92% 2.47% ± 1.08% 0.83% ± 1.14% S-LPP 17.23% ± 3.04% 2.62% ± 1.20% 0.93% ± 0.99% REC(ll) 8.84% ± 2.05% 0.81% ± 0.85% 0.16% ±0.39% WREC(ll) 8.44% ± 1.90% 0.64% ±0.78% 0.14% ±0.38% REC(16) 8.97% ±2.14% 0.96% ± 0.75% 0.43% ± 0.58% WREC(16) 8.50% ±2.02% 0.87% ±0.76% 0.31% ± 0.52% Table 4.3: WREC and REC in comparison with various face recognition algorithms on the PIE Database of 2, 5 and 8 training datasets with 11x11 and 16 x 16 divisions 5 Train 30 Train 80 Train 130 Train Error Rate Error Rate Error Rate Error Rate Alg. PCA Fisheface SRDA S-LDA S-LPP REC(ll) WREC(ll) REC(16) \VREC(16) 70.23% 30.28% 28.32% 25.82% 27.71% 15.63% 13.97% 20.19% 17.71% 26.53% 12.04% 5.01% 3.50% 4.79% 1.01% 0.95% 1.79% 1.62% 7.18% 8.16% 3.12% 1.83% 2.58% 0.24% 0.18% 0.57% 0.49% 2.43% 5.71% 2.65% 1.58% 1.66% 0.13% 0.09% 0.31% 0.29% Chapter 5 Analysis The experiments demonstrate WREC's significant performanee advantages compared to several other leading approaches. The results shown in the tables achieve over­ whelmingly better recognition accuracy than all the other algorithms in all cases over all datasets. In a lot of cases, the error recognition rate drops more than half. Figures 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 4.10 and 4.11, all show the same pattern as Regional Voting [Chen and Tokuda, 2005] that accuracy goes up as the number of regions increases. After a certain point, the accuracy begins to drop for the regions become too small to distinguish from national voting. The above observations from the figures match precisely the theory of "Electoral College and Direct Popular vote". Besides, Figures 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9 and 5.10 show the transition point (the division where the recognition performance peaks) in WREC system appear earlier than Regional Voting. In this set of figures. WREC is represented by using the "One Applies One" weighting scheme for face recognition and accuracy does not differ much from the variant weighting versions for WREC. Even though, for each region, more algorithms are involved, we can achieve the best result within a shorter time. From the tables, we can see that the pure ensemble of multiple classifiers without weights also outperforms standard regional voting. Regarding the acceptable error rate for face recognition, we have to keep in mind that it is not governed by a formal theory. Different from that in physics or mathe­ matics. there is no absolute cutoff value (an error rate that can be applied to all other applications). Face recognition, although it has been an active research topic for more than 20 years, is still not mature enough to have a generally acceptable standard. The statistical success and failures are application-dependent. [R. Chellappa and Sirohey. 1995], Figure 5.11 shows the weighting distribution for 16 x 16 divisions on the Yale dataset (2 Training Dataset) by different classifiers. 38 39 •- WREC PCA Fisherface SRDA S-LDA --m-- S-LPP m- 10 12 14 16 Divisions 18 20 Figure 5.1: WREC compared to various individual holistic algorithms in different sized regions on Yale 2 training dataset WREC PCA Fisherface SRDA S-LDA S-LPP & 10 12 14 16 Divisions 18 20 Figure 5.2: WREC compared to various individual holistic algorithms in different sized regions on Yale 5 training dataset 40 WREC PCA Fisher face SRDA S-LDA S-LPP 10 12 14 16 Divisions 18 20 Figure 5.3: WREC compared to various individual holistic algorithms in different sized regions on Yale 8 training dataset WREC PCA Fisherface SRDA S-LDA S-LPP 10 12 14 16 Divisions 18 20 Figure 5.4: WREC compared to various individual holistic algorithms in different sized regions on ORL 2 training dataset 41 WREC -m- PCA • Fisherface SRDA S-LDA S-LPP 6 8 10 12 14 16 Divisions 18 20 Figure 5.5: WREC compared to various individual holistic algorithms in different sized regions on ORL 5 training dataset - WREC PCA -Fisherface - SRDA • S-LDA S-LPP \ \ \ • \A 1 \ » V'VSA 6 8 10 12 14 16 Divisions 18 20 Figure 5.6: WREC compared to various individual holistic algorithms in different sized regions on ORL 8 training dataset 42 WREC PCA Fisherface SRDA • S-LDA S-LPP -• - 10 12 Divisions Figure 5.7: WREC compared to various individual holistic algorithms in different sized regions on PIE 5 training dataset WREC PCA • Fisherface SRDA -v- S-LDA S-LPP -9- 8 10 12 Divisions 14 16 Figure 5.8: WREC compared to various individual holistic algorithms in different sized regions on PIE 30 training dataset 43 WREC PCA Fisher face SRDA S-LDA S-LPP 8 10 12 Divisions Figure -5.9: WREC compared to various individual holistic algorithms in different sized regions on PIE 80 training dataset WREC PCA Fisher face SRDA S-LDA S-LPP 10 12 Divisions 14 Figure 5.10: WREC Compared to various individual holistic algorithms in different sized regions on PIE 130 training Dataset 44 Figure 5.12 shows the weighting distribution for 16 x 16 divisions on the Yale dataset (8 training dataset) this time. The more faces involved in calculating the weighting, the closer it should be able to derive the face features. The brighter the region is. the higher the weight associated with that region. The figures, either Figure 5.11 or Figure 5.12 which draw weighting distributions from a larger pool of samples, tell us that salient features like eyes, mouth and nose do not necessarily yield higher weights. This confirms the superiority of using an automatic weighting estimation technique. Different weighting schemes have been proposed [Xiaoyan Mu and Watta, 2005, Timo Ahonen and Pietikainen, 2004] and in this thesis, an automatic weighting system is adopted. Thus, unlike the weighting algorithm Timo Ahonen et al proposed [Timo Ahonen and Pietikainen, 2004], which produced a symmetric weight distribution on classifiers, it treats each region separately. Com­ bining classifiers mentioned in Chapter 2, should work better than having symmetric weights attached to the regions. Finally, regarding time complexity, obviously, it is dependent on that of the embed­ ded algorithms. Despite the fact of the disparity in complexities for different holistic algorithms, we denote the time cost for each holistic algorithm as r. Then the time complexity of Regional Voting is IT where t is the number of regions. Examining each dimension-reduction algorithm, the time complexity of r is made up of the matrix multiplication with time complexity of O(ihwn) where h and w are the height and width of the image and k is the number of dimensions of the reduced subspace. After that, a classification based on the measurement of the distances between probe image and each gallery image (including all its neighboring images by shifting) is performed. During this stage, the time complexity is O(thivpn) where p is the total number of shifts and n is the total number of different subjects to be classified. Assume we have t holistic algorithms embedded in total. So the total complexity is 0(tihw(K + pn)) in the test stage. Examining the time complexity of the training, it is much like the that of the test stage except multiplying by k. where k is the number of images per person in the gallery, since a "leaving one out strategy" is implemented. For each algorithm, after k iterations, ki,hw(h- + pn) operations (shifting is again applied in the training) are required. Of the three different weighting schemes we have implemented in this thesis. "One 45 Applies One". "One Applies All" and "Joint Weight" - for both "One Applies One" and "Joint Weight" schemes, all holistic algorithms embedded are used. Thus the time complexity is tkihw(K + pn). And lastly, the "One Applies All" weighting scheme shortens the training period roughly to 1/5 as long for only one holistic algorithm is used in the training stage instead of five. The complexity is down to kihw(n+ pn). To this scheme's credit is that face recognition accuracy still runs neck to neck compared with that of the other two schemes. 10 20 30 40 50 50 70 EC 10 ?0 30 40 50 60 70 00 (a) Weighting Distribution by PCA (b) Weighting Distribution by Fisherface (c) Weighting Distribution by SRDA (d) Weighting Distribution by S-LDA (e) Weighting Distribution by S-LPP Figure 5.11: Weighting Distribution by Different Classifiers on 16 x 16 division of Yale Data Set(2 training dataset) ?0 30 40 50 60 70 00 10 30 40 50 6C 7G 80 (a) Weighting Distribution by PCA (b) Weighting Distribution by Fisherface (c) Weighting Distribution by SRDA (d) Weighting Distribution by S-LDA (e) Weighting Distribution by S-LPP Figure 5.12: Weighting Distribution by Different Classifiers on 16 x 16 division of Yale Data Set(8 training dataset) Chapter 6 Summary Face recognition is an emerging technology. Though it is by no means perfect, there is no denying its tremendous potential. Compared to human facial surveillance, au­ tomatic systems have a longer attention span, can be exposed to larger amounts of information and work in the same manner reliably in all cases. Holistic approaches, based on the concept of reducing the high dimensionality of the raw face image are regarded as the most popular solution for the face recognition task. Regional voting is a new approach for the face recognition problem which divides the image into equally sized non-overlapping regions and treats each region as an independent classifier. Within each region, holistic algorithms are implemented and the classification result is recorded. After all, a vote over all regions is carried out and the classification with majority votes is selected as the final result. WREC builds on the success of Regional Voting in the following way: 1. The Electoral College framework is adopted into the proposed system success­ fully. 2. An automatic weighting calculation method independent of human knowledge is implemented. This enables a more robust system accommodating face im­ ages having a bigger range of variance: for example, face images having nonsymmetric face features violated by postures, face expressions or occlusions. 3. Three different schemes for weighting are proposed. 4. In the newly proposed system, embedded algorithms are not independent any­ more. Two layers of voting models are included in the system. o. Extensive experiments are carried out on benchmark face databases. The pro­ posed system shows superiority over several other leading algorithms as well as the already best in class results of regional voting. The same promising results 48 49 are also derived on experiments of datasets with small number of images per person in the gallery. WREC, as a solution to the face recognition problem, is validated by extensive experiments, outperforming any individual holistic approaches embedded. Even a REC (a regional based ensemble of multiple classifiers) system shows promising results on datasets with a smaller number of gallery images. Finally, there are some points regarding the design of this experiment. This may suggest avenues for future work. Firstly, during the splitting, after each round, we get two sets of images: SubTest and SubTrain. Figure 6.1 shows the division on part of the ORL 8 training dataset picking up the first image of a person as the SubTest images. This splitting is easy to implement, while it does not include all objects that can be used for matching. The first image of the person himself is excluded as there is no distance at all between one and himself, while the other people's first image are also excluded by this algorithm. Both images come from the ORL Database 17 . (b) SubTraining Images SubTest Im­ ages Figure 6.1: Example of regional divisions on gallery images So. in the future, we could include more information provided by the gallery images during weighting evaluation to test the effectiveness of an algorithm on each region. Also, we can embed a wider range of algorithms, like the ones under local branch, for example. And lastly, the system could be expanded for face verification. 17 See footnote 1. Bibliography Nasir U. Ahmed and K. Ramamohan Rao. Orthogonal Transforms for Digital Signal Processing. Springer-Verlag New York. Inc.. Secaucus. NJ. USA, 1975. K.M. Ali and M.J. Pazzani. On the Link Between Error Correlation and Error Re­ duction in Decision Tree Ensembles. Technical report. 95-38. ICS-UCI. 1995. G. H. Golub B. L. Buzbee and C. W. Nielson. On Direct methods for solving poisson's equations. SIAM Journal on Numerical Analysis, 7(4):627-656, 1970. S.C. Bagui and N.R. Pal. A Multistage Generalization of the Rank Nearest Neighbor Classification Rule. Pattern Recognition Letters, 16:601-614, 1995. Hespanha. J Belhumeur. P and Kriegman. D. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, 1997. M. Bichsel and A.P. Pentland. Human face recognition and the face image set's topology. CVGIP: Image Understanding, 59:254-261, 1994. Deng Cai, Xiaofei He, and Jiawei Han. Semi-supervised Discriminant Analysis. In 2007 IEEE 11th International Conference on Computer Vision, pages 1-7. IEEE. 2007a. Deng Cai. Xiaofei He, and Jiawei Han. Spectral Regression for Efficient Regular­ ized Subspace Learning. IEEE 11th International Conference on Computer Vision (2007), L(05):l-8, 2007b. Y. Hu J. Han Cai D., X. He and T. Huang. Learning a spatially smooth subspace for face recognition. IEEE International Conference on Computer Vision Pattern Recognition, 0:1-7, 2007. He Xiaofei Cai Deng and Han Jiawei. SRDA: An Efficient Algorithm for Large-Scale Discriminant Analysis. IEEE Transactions on Knowledge and Data Engineering. 20( 1):1—12, 2008. Liang Chen and Naoyuki Tokuda. Robustness of regional matching scheme over global matching scheme. Artificial Intelligence. 144(l-2):213-232. 2003. Liang Chen and Naoyuki Tokuda. A general stability analysis on regional and national voting schemes against noise-why is an electoral college more stable than a direct popular election? Artificial Intelligence, 163(1):47-66. 2005. Liang Chen and Naoyuki Tokuda. A unified framework for improving the accuracy of all holistic face identification algorithms. Artif. Intell. Rev., 33(1-2):107-122. 2010. 50 51 Ravi Das. An Application of Biometric Technology: Facial Recognition: http://www.technologyexecutivesclub.com/Articles/security/ artBiometricsFacialRecognition.php, 2011. R. Duda and P. Hart. Pattern Classification and Scene Analysis. New York: Wiley. 1973. H. D. Ellis. Processes underlying face recognition: Introduction in The Neurophysiol­ ogy of Face perception and Facial Expression. Hillsdale, NJ: Erlbaum, 1986. Martha J. Farah. Is face recognition 'special' Evidence from neuropsychology. Be­ havioural Brain Research, 76:181-189, 1996. Ronald Aylmer Fisher. The Use of Multiple Measures in Taxonomic Problems. Annals of Eugenics, 7:179 188, 1936. Rafael C Gonzalez, Richard E Woods, and Barry R Masters. Digital Image Processing, Third Edition. Journal of Biomedical Optics. 14(2):029901, 2009. H.-W. Chang H.-T. Chen and T.-L. Liu. Local discriminant embedding and its vari­ ants. In Proc. CVPR, 2005. Xiaofei He, Deng Cai, Shuicheng Yan, and Hong-Jiang Zhang. Neighborhood Pre­ serving Embedding. In Tenth IEEE International Conference on Computer Vision ICCV05 Volume 1, volume 2, pages 1208-1213. Ieee, 2005. Xiaofei He and Partha Niyogi. Locality Preserving Projections. Advances in Neural Information Processing Systems, 16:153-160, 2003. Hu Yuxiao Niyogi Partha He Xiaofei, Yan Shuicheng and Zhang Hongjiang. Face recognition using laplacianfaces. IEEE Transactions on Pattern Analysis and Ma­ chine Intelligence, 27(3):328-340. 2005. Rebecca Heyer. Biometrics Technology Review 2008. Land Operations Division De­ fence Science and Technology Organisation, 2008. R. Horn and C. Johnson. Topics in Matrix Analysis. Cambridge University Press. 1991. Y Ijiri and M Sakuragi. Security Management for Mobile Devices by Face Recognition. In 7th International Conference on Mobile Data Management MDM06. pages 4949, 2006. A. N. Venetsanopoulos Juwei Lu, K. N. Plataniotis and Stan Z. Li. Ensemble-based discriminant learning with boosting for face recognition. IEEE Transactions on Neural Networks. 17:178. 2006. Takeo Kanade. Computer recognition of human faces. Interdisciplinary Sys- tems Research, page 47. 1977. 52 Hatef Mohamad Duin Robert P. W. Kittler. Josef and Jiri Matas. On Combining Classifiers. IEEE Trans. Pattern Anal. Mach. Intell., 20:226-239. 1998. T. Kohonen. Self-Organization and Associative Memory. Springer. 1988. J. M. Lee. Introduction to Smooth Manifolds. Springer-Verlag New York. 2002. J. Buhmann J. Lange C. v. d. Malsburg M. Lades. J. Vorbruggen and R. Wurtz. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Computers, 42:300 311, 1993. Crookes K. McKone E. and Kanwisher X. The cognitive and neural development of face recognition in hum,ans. Cambridge, MA, Bradford Books, 2009. Christian A Meissner and John C Brigham. Thirty years of investigating the own-race bias in memory for faces: A meta-analytic review. Psychology Public Policy and Law, 7(l):3-35, 2001. Hyeonjoon Moon and P Jonathon Phillips. Computational and performance aspects of PCA-based face-recognition algorithms. Perception, 30:303-321, 2001. John Morton and Mark H. Johnson. Conspec and Conlearn: A Two-Process Theory of Infant Face Recognition. Psychological Review. 98, No.2:164-181, 1991. F. O'Sullivan. Discretized laplacian smoothing by fourier methods. Journal of the American Statistical Association, 86(415) :634 -643, 1991. L. Ganesan P. Latha and S. Annadurai. Face Recognition using Neural Networks. Signal Processing: An International Journal (SPIJ), 3:153-160, 2009. Q. Ji P. Wang, M.B. Green and J. Wayman. Automatic eye detection and its valida­ tion. In Proc. of 2005 IEEE Conf. on Computer Vision and Pattern Recognition -Workshops, 3:164. 2005. Christopher C Paige and Michael A Saunders. LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares. ACM Transactions on Mathematical Software. 8(1):43-71, 1982. Olivier Pascalis and Jocelyne Bachevalier. Face recognition in primates: a crossspecies study. Behavioural Processes. 43(l):87-96, 1998. P. Penev and J. Atick. Local feature analysis: A general statistical theory for object representation. Netw.: Comput,. Neural Syst.. 7:477-500. 1996. Alex Pentland and Tanzeem Choudhury. Face recognition for smart environments. Computer. 33(2):50-55. 2000. Rauss P Phillips P J, Moon H and Rizvi S. The FERET evaluation methodology for face recognition algorithms. In Proceedings of Computer Vision and Pattern Recognition 97, 1997. 53 C.L. Wilson R. Chellappa and S. Sirohey. Human and Machine Recognition of Faces: A Survey. Proceedings of the IEEE, 83:705 740. 1995. S.J. Raudys and A.K. Jain. Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Trans. Pattern Anal. Mach. Intell., 13: 252-264, 1991. D. L. Ruderman. The statistics of natural images. Netw.: Comput. Neural Syst., 5: 598 605, 1994. J. Shlens. A Tutorial on Principal Component Analysis. Technical report, Institute for Nonlinear Science, UCSD, 2005. T. J. Stonham. Practical face recognition and verication with wisard. Aspects of face processing,Kluwer Academic Publishers, 1986. R Veldhuis R Gehlen S Tao Q, Van Rooseler and Weber. Optimal Decision Fusion and Its Application on 3D Face Recognition: http://www.3dface.org/ files/papers/veldhuis-CAST20070ptimalDecisionFusion.pdf, 2007. Abdenour Hadid Timo Ahonen and Matti Pietikainen. Face recognition with local binary patterns. Lecture Notes in Computer Science, 3021/2004:469-481, 2004. J.J. Hull T.K. Ho and S.N. Srihari. Decision Combination in Multiple Classifier Systems. IEEE Trans. Pattern Anal. Mach. Intell., 16:66-75, 1994. D Tranel and AR Damasio. Knowledge without awareness: an autonomic index of facial recognition by prosopagnosics. Science, 228:1453-1454, 1985. Matthew Turk and Alex Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3:71-86, 1991. T Valentine. Upside-down faces: a review of the effect of inversion upon face recog­ nition. British journal of psychology London England 1953, 79 ( Pt 4)(4):471-491, 1988. Xiaogang Wang and Xiaoou Tang. Unified Subspace Analysis for Face Recognition. Proceedings of the Ninth IEEE International Conference on Computer Vision. 2: 679686, 2003. Mohamad Hassoun Xiaoyan Mu and Paul Watta. A Weighted Voting and Sequential Combination of Classifiers Scheme for Human Face Recognition,. In International Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005. Y. Adini Y. Moses and S. Ullman. Face recognition: The problem of compensating for change in illumination direction. In In European Conf. On Computer Vision. 1994. 54 S Yan, D Xu, B Zhang, and Hong-Jiang Zhang. Graph Embedding: A General Frame­ work for Dimensionality Reduction. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR05. volume 2, pages 830-837. Ieee. 2005. W Tan J Jia Z Zheng, F Yang and J Yang. Gabor feature-based face recognition using supervised locality preserving projection. Signal Processing, 87(10):24732483, 2007. W Zhao, R Chellappa, P J Phillips, and A Rosenfeld. Face recognition: A Literature Survey. ACM Computing Surveys, 35(4):399-458, 2003. Chellappa R Zhao W. and Phillips P. J. Subspace linear discriminant analysis for face recognition. Technical report, Center for Automation Research, University of Maryland, College Park, MD., 1999.