Initial Investigation into Using A Two-Level Regional Voting Approach for Face Verification

by

Jun Ma

B.Sc., University of Northern British Columbia, 2007

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES (COMPUTER SCIENCE)

The University of Northern British Columbia

April 2010

© Jun Ma, 2010

Abstract

In face verification, a pattern whose identity is claimed a priori is compared with that person's individual template in a database, and the system then checks whether the similarity between the pattern and the template is sufficient to grant access. In this thesis we introduce a new face verification procedure that embeds the Electoral College framework, which has been applied successfully in face identification. The approaches are evaluated by experiments on benchmark face databases, with the Electoral College framework embedded with the standard baseline PCA algorithm and the newly developed S-LDA algorithm. The results demonstrate that the proposed face verification systems improve the performance of these holistic algorithms.

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Acknowledgments
Chapter 1 Introduction
1.1 Motivation
1.2 Major Contribution
1.3 Overview of the Thesis
Chapter 2 Literature Review
2.1 History of Face Recognition
2.1.1 Early Development
2.1.2 Recent Improvements
2.2 Identification vs. Verification
2.2.1 Examples of Face Recognition Methods/Algorithms
2.2.2 Face Verification Procedure
2.3 Multi-Level Regional Voting System as Face Recognition Approach
Chapter 3 Approach/Methods
3.1 Proposed Procedure of Face Recognition/Verification
3.1.1 Face Verification Procedure
3.1.2 Regional Scheme
3.1.3 Regional Thresholds Generator
3.1.4 Voting/Scoring Scheme
3.1.5 Shifting
3.2 Performance Measures
3.2.1 Reviews
3.2.2 Experimental Results Evaluation
Chapter 4 Experiments and Results
4.1 Data Sets
4.1.1 The ORL Database of Faces
4.1.2 The Yale Face Database
4.2 Experiment
4.2.1 Setup Mode
4.2.2 Model Training
4.2.3 Model Testing
4.3 Experiment Results Analysis and Discussions
4.3.1 Generating Thresholds
4.3.2 Comparison of Different Training Sets
4.3.3 Database ORL vs. Database Yale
4.3.4 Comparison of Different Methods
4.3.5 Recognition Results
Chapter 5 Conclusions
5.1 Summary
5.2 Future Works
References

List of Figures

Figure 2.1 The procedure of face verification
Figure 3.1 Proposed Training Procedure
Figure 3.2 The procedure for face verification with regional voting/scoring scheme
Figure 3.3 Two-Level Regional Division vs. Multi-Level Regional Division
Figure 3.4 One threshold for each subject (Method 2)
Figure 3.5 One threshold for each subject (Method 3)
Figure 3.6 Two-Level Regional Voting/Scoring Scheme
Figure 3.7 A 2-step shift in each direction makes 25 shifts in total
Figure 3.8 Area Under the ROC Curve
Figure 4.1 ORL Database of Faces
Figure 4.2 Yale Face Database
Figure 4.3 Two-level Regional Voting Model Setup
Figure 4.4 Training Data Collecting
Figure 4.5 Comparison of Different Regional Thresholds (Method 1)
Figure 4.6 Comparison of different training sets (ORL PCA 1x1)
Figure 4.7 Comparison of different training sets (ORL PCA 10x10)
Figure 4.8 Database ORL vs. Yale (PCA & S-LDA 2T 1x1)
Figure 4.9 Database ORL vs. Yale (PCA & S-LDA 2T 10x10)
Figure 4.10 Comparison of Different Methods (2T)
Figure 4.11 Recognition Results (2T)

List of Tables

Table 3.1 Confusion Matrix
Table 3.2 Sensitivity/Specificity
Table 4.1 Training Data Collection
Table 4.2 AUC Results of Different Regional Thresholds (Method 1)
Table 4.3 Threshold Set with Relatively Best AUC Result (Method 2)
Table 4.4 Threshold Set with Relatively Best AUC Result (Method 3)
Table 4.5 Database ORL vs. Database Yale
Table 4.6 Recognition Results

Acknowledgments

I would like to thank my supervisor, Dr. Liang Chen, for his continuous guidance and support throughout my studies at the University of Northern British Columbia. I also wish to express my thanks to Dr. Charles Brown and Dr. Jianbing Li for their encouragement, valuable advice, and positive suggestions on my thesis. As well, I would like to take this opportunity to thank the University of Northern British Columbia for providing every support throughout my studies in Canada. I would like to give a special thank you to Dr. Jean Wang and the HPC lab for their technical support of my research work. I would also like to thank all my friends, Lin, Motto, Ms. Beattie, and James, who showed their daily support, and thank you to Vicki and Graham for their advice and suggestions on my thesis. Last but not least, I give my sincere thanks to my parents for their selfless love, endless patience, and huge support, both financial and spiritual.
Without them, I could not have reached this point.

Chapter 1 Introduction

This chapter presents an introduction to the topic of this thesis and the major contribution made through this research. The outline of this thesis is provided as well.

1.1 Motivation

Face recognition can be generally defined as identifying or verifying one or more persons in a still or video image of a scene by using a stored database of faces (Zhao et al. 2003). The only difference between identification and verification is that identification refers to a positive identification within a predefined group of identities (one-to-many), while verification refers to a positive confirmation of a specific claimed identity (one-to-one). Other than that, identification and verification work in exactly the same way.

Because of increasing commercial and security needs, face recognition, as a biometric technique, has received a great deal of attention in the past decades. Since the 1990s, researchers have put tremendous effort into this research area and achieved remarkable progress in a very short time. Numerous algorithms have been developed, and face recognition systems based on these algorithms have been applied in real life (Biometrics History, NSTC Subcommittee on Biometrics). Holistic algorithms are the most popular approaches so far; they were mostly developed for face identification, and can naturally be applied to face verification systems. However, the field of face recognition is still full of challenges. Current face recognition does not work well under some conditions, such as poor lighting, sunglasses, long hair, or other objects partially covering the subject's face. Sometimes even a low-resolution image or a big smile can degrade a system's performance.

As a decision-making strategy, the voting scheme has recently been applied to face identification systems (Chen and Tokuda 2003; Artiklar et al. 2003; Faltemier et al. 2006). Previous research shows that a multi-level voting scheme is able to significantly improve the performance of holistic algorithms in face identification (Chen and Tokuda 2003; Chen and Tokuda 2005). It has recently been demonstrated that regional voting can be applied to face identification systems, and that a face identification system with a regional voting scheme achieves improved performance (Chen and Tokuda 2009). So far, regional voting has not been applied to face verification systems, and the anticipated difficulty of finding the "thresholds" for each region has kept researchers away from this topic. Therefore, my next step of inquiry is to investigate the possibility of adopting the regional voting scheme in a face verification system. This thesis investigates strategies for embedding the Regional Voting scheme into a regular face verification procedure.

1.2 Major Contribution

In this thesis, I studied strategies for adopting a Multi-level Regional Voting Scheme for face verification. The major contributions of this thesis are:

• Based on the regular face verification procedure, we have constructed a new face verification procedure adopting a Two-level Regional Voting Scheme. By employing two face recognition algorithms, Principal Component Analysis (PCA) and the Spatially Smooth Version of LDA (S-LDA), we have developed two models of the proposed face verification procedure.

• In the newly proposed procedure, the concept of a "threshold" plays a key role.
We propose the following four methods of generating thresholds:

o One threshold for all subjects (Method 1, 0/1 voting);
o One threshold for each subject (Method 2 and Method 3, 0/1 voting);
o Weighted voting (Method 4; the similarity value in each region is used as the weight).

• Extensive experiments have been conducted to test the above models and methods, and their advantages and disadvantages have been compared.

1.3 Overview of the Thesis

The purpose of this research is to explore the applicability of the Multi-level Regional Voting Scheme to face verification. The remaining chapters of this thesis are organized as follows. In the next chapter, a literature review introduces the background of face recognition and reviews the approaches in the literature that are related to the proposed method. Chapter 3 describes the methodology of this research. In Chapter 4, the experiment procedures are demonstrated, and the results and accompanying analysis are presented. Finally, the conclusions and a discussion of future work are provided in Chapter 5.

Chapter 2 Literature Review

This chapter provides a brief review of the field of face recognition and of other research that has been done in this area, in order to facilitate the introduction of the proposed method. As well, a number of face recognition approaches and procedures for face verification are presented. Subsequently, the idea of the college election, which is closely related to the proposed method, is covered.

2.1 History of Face Recognition

Face recognition has become a very popular research area in computer vision and has been studied for the past decades (Biometrics History, NSTC Subcommittee on Biometrics).

2.1.1 Early Development

Face recognition research started in the 1960s. The first systems for face recognition (facial feature-based recognition, developed by Bledsoe and Kelly) (Kelly 1970; Bledsoe 1964) required an administrator to manually input and compute the measurements and locations of facial features, such as hair colour and lip thickness. In the late 1980s, the first semi-automated facial recognition system was deployed (the Lakewood Division of the Los Angeles County Sheriff's Department, 1988). In 1989, Kohonen brought up an idea called the "eigenface," also known as the PCA approach, which computes "a face description by approximating the eigenvectors of the face image's autocorrelation matrix" (Kohonen 1989). Later, Kirby and Sirovich (Kirby and Sirovich 1990) introduced an algebraic manipulation to directly calculate the eigenfaces. The development of the eigenface algorithm was a milestone because it showed that fewer than one hundred values were required to approximate a suitably aligned and normalized face image (Sirovich and Kirby 1987). In 1991, Turk and Pentland (Turk and Pentland 1991) extended PCA to recognize faces, which enabled a reliable real-time automated face recognition system. In the 1990s, Optical Character Recognition (OCR) was launched for consumer applications like scanning and faxing. In 1996, Belhumeur et al. combined the LDA algorithm, originally developed by Fisher in 1936, with PCA and brought up the idea of "fisherfaces" (Belhumeur et al. 1996). By about 1997, a face recognition system called the "Bochum system" had been developed and was sold as a commercial product. It was used by customers such as Deutsche Bank and operators of airports. The software was described as "robust enough to make identifications from less-than-perfect face views.
It can also often see through such impediments to identification as mustaches, beards, changed hair styles and glasses - even sunglasses" (ScienceDaily 1997). In 2000, a standard testing method and database called FERET was established to evaluate and compare facial recognition algorithms. In the same year, the first Face Recognition Vendor Test (FRVT 2000) was held. A popular face recognition algorithm, Independent Component Analysis (ICA), was implemented in 2002 by Bartlett et al. (Bartlett et al. 2002).

2.1.2 Recent Improvements

In 2006, the performance of the latest face recognition algorithms was evaluated in the Face Recognition Grand Challenge (FRGC). High-resolution face images, 3-D face scans, and iris images were used in the tests. The results indicated that "the new algorithms are 10 times more accurate than the face recognition algorithms of 2002 and 100 times more accurate than those of 1995. Some of the algorithms were able to outperform human participants in recognizing faces and could uniquely identify identical twins" (Phillips et al. 2005; Phillips et al. 2007). In the FRVT 2006, an FRR of 0.01 at an FAR of 0.001 was achieved by Neven Vision (NV1-NORM algorithm) on the very high-resolution still images and by Viisage (V-3D-N algorithm) on the 3D images. Furthermore, the FRVT 2006 established the first 3-D face recognition benchmark and showed that significant progress has been made in matching faces across changes in lighting (Phillips et al. 2007).

2.2 Identification vs. Verification

Face recognition systems can be classified into two groups:

Identification - "A one-to-many comparison of the captured face against a face database in an attempt to identify an unknown individual" (Biometrics, Wikipedia). In face identification, the system is trained with the patterns of a group of persons. An unknown pattern that is to be identified is matched against every known template, yielding either a score or a distance describing the similarity between the pattern and the template. The system assigns the pattern to the person with the most similar template.

Verification - "A one-to-one comparison of a captured biometric with a stored template to verify that the individual is who he claims to be" (Biometrics, Wikipedia). In a verification system, the pattern to be verified is compared with the person's claimed individual template in order to decide whether the similarity between pattern and template is sufficient to support the claim.

2.2.1 Examples of Face Recognition Methods/Algorithms

Appearance-based approaches are the most successful and well-studied techniques in face recognition (Turk and Pentland 1991). In appearance-based approaches, an image of size $n \times m$ pixels is usually represented by a vector in an $n \times m$-dimensional space. In practice, however, these $n \times m$-dimensional spaces are too large to allow robust and fast face recognition. To resolve this problem, dimensionality reduction techniques are used (He et al. 2005). Two of the most popular techniques for this purpose are Principal Component Analysis (PCA) (Turk and Pentland 1991) and Linear Discriminant Analysis (LDA) (Belhumeur et al. 1996; Zhao et al. 1998). In the following paragraphs, we briefly introduce the PCA algorithm and a newly developed algorithm called the Spatially Smooth Version of LDA (S-LDA), which is based on LDA.
2.2.1.1 Principal Component Analysis (PCA)

PCA is a statistical dimensionality-reduction method, which retains the majority of the variation present in a dataset while reducing its dimensionality. Kirby and Sirovich (Kirby and Sirovich 1990) applied PCA to representing faces, and Turk and Pentland (Turk and Pentland 1991) extended PCA to recognizing faces. PCA-based face recognition is an eigenvector method designed to model linear variation in high-dimensional data. PCA can be used to find a subspace of a given higher-dimensional vector space. The input of PCA is a training set of $N$ facial images such that the ensemble mean of the training set is zero (Moon and Phillips 2001). PCA projects the original $n$-dimensional data onto the lower-dimensional linear subspace spanned by the leading eigenvectors of the data's covariance matrix (Turk and Pentland 1991; Martinez and Kak 2001).

2.2.1.2 Spatially Smooth Version of LDA (S-LDA)

The Spatially Smooth Subspace Learning (SSSL) model (Cai et al. 2002) is a linear dimensionality reduction method that uses a Laplacian penalty to constrain the coefficients to be spatially smooth, producing a spatially smooth subspace which is better for image representation. Recognition, clustering, and retrieval can then be performed in the image subspace. It was developed based on an approach called "Graph Embedding (GE)," which was also proposed by Dr. Cai and his colleagues (He et al. 2005). The GE approach is defined as $GE(W, D)$, where $W$ denotes a symmetric $m \times m$ matrix with $W_{ij}$ being the weight of the edge joining vertices $i$ and $j$, and $D$ is a diagonal matrix whose entries are the column (or row) sums of $W$, $D_{ii} = \sum_{j} W_{ji}$. Cai et al. have proved that many recently proposed manifold learning algorithms can be interpreted within the Graph Embedding framework by changing $W$ (He et al. 2005; Cai et al. 2008). Therefore, the SSSL model can be applied to all existing subspace learning algorithms, such as LDA. The research of Cai et al. has demonstrated that SSSL consistently outperforms the corresponding ordinary subspace learning algorithms and their tensor extensions (Cai et al. 2002).

2.2.1.3 Summary

There are numerous face recognition algorithms, such as Independent Component Analysis (ICA) (Liu and Wechsler 1998; Delac et al. 2005; Bartlett et al. 2002; Comon 2003), the eigenspace-based adaptive approach (EP) (Liu and Wechsler 1998), Elastic Bunch Graph Matching (EBGM) (Wiskott et al. 1997), and the support vector machine (SVM) (Guo et al. 2000; Jonsson et al. 2000). All the above algorithms are appearance-based. The other face recognition techniques, classified by face representation, are called "feature-based"; they use geometric facial features and the geometric relationships between them.

2.2.2 Face Verification Procedure

The general procedure of face verification is summarized in Figure 2.1.

Figure 2.1 The procedure of face verification

The ordinary face verification procedure, as described in Fig. 2.1, includes three major components: the Holistic Algorithm Model, the Matching Processor, and the Classifier. After some pre-processing, testing images are usually projected into lower-dimensional subspaces by using one of the holistic algorithms in the Holistic Algorithm Model. Common holistic algorithms include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and other newly developed holistic algorithms such as S-LDA. A minimal sketch of this projection step, using PCA, is given below.
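To make the projection step concrete, the following sketch learns a PCA subspace from flattened training images and projects a probe image into it. It is a minimal NumPy illustration under our own naming, not the implementation used in this thesis, and the reduced dimension of 40 is an arbitrary example value.

```python
import numpy as np

def fit_pca(train, d):
    """Learn a d-dimensional PCA subspace from flattened training images.

    train: array of shape (N, P), one image per row.
    Returns the ensemble mean and the top-d eigenvectors (shape (P, d)).
    """
    mean = train.mean(axis=0)
    centered = train - mean               # the ensemble mean becomes zero
    # The right singular vectors of the centered data are the eigenvectors
    # of the data's covariance matrix (the "eigenfaces").
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:d].T

def project(image, mean, basis):
    """Project a flattened image onto the learned linear subspace."""
    return (image - mean) @ basis

# Usage: 100 training images of 64x64 pixels, reduced to 40 dimensions.
rng = np.random.default_rng(0)
train = rng.random((100, 64 * 64))
mean, basis = fit_pca(train, d=40)
probe = rng.random(64 * 64)
subspace_vector = project(probe, mean, basis)   # 40-dimensional vector
print(subspace_vector.shape)
```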
The subspace vector obtained in the Holistic Algorithm Model is then passed into the Matching Processor. In this processor, the subspace vector of the testing image is compared with the subspace vectors of the template images that are stored in the database. The output of the Matching Processor is a set of similarity values, usually obtained by measuring the similarity between the testing image vectors and the template image vectors. The last component is the Classifier, in which the similarity values are compared with preset thresholds so that a verification decision can be made (Kang et al. 2002).

2.3 Multi-Level Regional Voting System as Face Recognition Approach

Voting is a popular and important decision-making process. It has not only been used in daily social and political activities, but has also been used in many scientific studies. Voting usually operates in two ways: national and regional voting. In national voting, a candidate is selected directly by a simple majority of the entire voting population of the nation, while in regional voting the entire nation is divided into regions; the winner of the voting is determined by a majority of the winning regions, based on the winner-take-all principle (Chen and Tokuda 2003).

A $K$-level Electoral College (Regional Voting) is simply defined as follows: "the original nation/area is said to be the 1st level (level 1). This nation is then partitioned into 2nd-level regions. Each 2nd-level region is partitioned into 3rd-level regions, which are then partitioned into 4th-level regions, and so on up to the $K$th-level regions. The winner of each $K$th-level region is determined by a majority of its voting population. The winner of an $i$th-level region ($i = K-1, K-2, \ldots, 1$) is determined by the majority of the winning $(i+1)$th-level regions that the $i$th-level region was partitioned into, based on the winner-takes-all principle." (Chen, L., "Theory of Multi-Level Electoral College for Multi-Candidate Elections and Electoral College Based Face Recognition Surveillance & Intelligent Textual Information Retrieval Systems", NSERC Discovery Grant proposal, 2005.)

A voting scheme has also been introduced into the face identification field, where it showed outstanding performance compared with other existing face identification models. Faltemier et al. used multiple regions of the face for matching, trying to reduce the effects caused by expression variation between gallery and probe images, and their experimental results demonstrated improved performance (Faltemier et al. 2006). Artiklar et al. developed a face recognition system using local voting networks, which combine local distance computations and a voting scheme (Artiklar et al. 2003). Chen et al. recently demonstrated that the regional voting scheme can be used as a general framework to significantly improve the performance of all holistic algorithms in face identification systems (Chen and Tokuda 2009).

Chapter 3 Approach/Methods

First, this chapter formulates a model of the Regional Voting face verification system. Second, the evaluation method is presented.

3.1 Proposed Procedure of Face Recognition/Verification

3.1.1 Face Verification Procedure

The proposed face verification procedure is developed based on the original face verification procedure.
It includes four major components: the Regional Scheme, the Lower Dimensional Space, the Matching Processor, and the Voting/Scoring Model; and two databases: the Database of Gallery Regional Subspace Vectors and the Database of Regional Thresholds.

Figure 3.1 Proposed Training Procedure

Before constructing the new face verification process, we first have to establish the two new databases. As shown in Fig. 3.1, in order to establish them, we first collect training data that include both the gallery images and the training images. Under the regional voting scheme, the gallery images are first divided into non-overlapping regions in the Regional Scheme. A holistic algorithm is then used to obtain subspace vectors and to project the regional vectors into the lower-dimensional subspace. Finally, those regional subspace vectors are saved in the Database of Gallery Regional Subspace Vectors. Meanwhile, the training images are also divided into regions and projected into lower-dimensional subspaces. The regional subspace vectors of the training images and of the corresponding gallery images are then input to the Threshold Generator, where the threshold of each region is generated and saved in the Database of Regional Thresholds. The principle of the Threshold Generator is discussed in section 3.1.3.

Figure 3.2 The procedure for face verification with regional voting/scoring scheme

In the proposed face verification procedure (Fig. 3.2), a testing image is first partitioned into non-overlapping regions, and the subimage of each region is represented by a raw vector. The vector of each region is then projected into the lower-dimensional space to generate a regional subspace vector. After the regional subspace vectors are obtained, the Matching Processor measures the similarity between each of the testing image's regional vectors and the corresponding gallery image's regional vector by calculating the Euclidean distance between the two vectors. Finally, in the Voting/Scoring Model, the regional similarity values are compared to the stored regional thresholds. If the similarity distance is less than or equal to the threshold, the region gets a vote/score of 1; otherwise, it gets a vote/score of 0. Since we are using the Two-level Regional Voting Scheme, after the voting/scoring for each region we obtain the total votes for the whole image (the sum of the votes over all regions of one image). Another threshold is then needed to classify the whole image. For example, take a 3x3 division and assume we set the image-level threshold to 3; if 4 out of the 9 regions vote 1 (i.e., the score is 4), then this image is verified as the identity it declared.

3.1.2 Regional Scheme

The Regional Scheme is described in the following way: the nation (1st level) is represented as a rectangular area of $n \times m$ (where $n, m \in \mathbb{Z}^+$) unit cells. The nation is divided into $C \times C$ equally shaped rectangles, called regions (2nd level), of size $r_i \times r_j$ (where $n$ and $m$ are divisible by $r_i$ and $r_j$, and $r_i$ and $r_j$ are positive integers). Each 2nd-level region can in turn be considered a nation and divided into $C \times C$ equally shaped rectangles, called 3rd-level regions, of size $t_i \times t_j$ (where $r_i$ and $r_j$ are divisible by $t_i$ and $t_j$, and $t_i$ and $t_j$ are positive integers), as shown in Fig. 3.3 (Chen and Tokuda 2005). By repeating the above steps, the nation can be partitioned into $K$ ($K > 0$) levels of regions. A sketch of this partitioning is given below.
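As an illustration of the regional division, the sketch below partitions an image into a grid of non-overlapping rectangular regions and then partitions each region again to obtain a third level. This is a minimal sketch under our own naming and assumes the divisibility conditions of this section; it is not the thesis code.

```python
import numpy as np

def partition(image, rows, cols):
    """Split an n x m image into rows x cols non-overlapping regions.

    Assumes n is divisible by rows and m by cols, matching the
    divisibility condition of Section 3.1.2. Regions are returned
    left-to-right, top-to-bottom.
    """
    n, m = image.shape
    assert n % rows == 0 and m % cols == 0
    ri, rj = n // rows, m // cols                 # region size r_i x r_j
    return [image[a * ri:(a + 1) * ri, b * rj:(b + 1) * rj]
            for a in range(rows) for b in range(cols)]

# Two-level division: a 64x64 image split into a 4x4 grid of 16x16 regions.
img = np.arange(64 * 64, dtype=float).reshape(64, 64)
regions = partition(img, 4, 4)
print(len(regions), regions[0].shape)             # 16 regions of (16, 16)

# Multi-level division: partition each 2nd-level region into 2x2 subregions.
level3 = [partition(r, 2, 2) for r in regions]
```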
Figure 3.3 Two-Level Regional Division vs. Multi-Level Regional Division (left: a nation of $n \times m$ cells divided into 3x3 second-level regions of size $r_i \times r_j$; right: each second-level region further divided into 3x3 third-level regions of size $t_i \times t_j$)

3.1.3 Regional Thresholds Generator

As explained in section 3.1.1 and Fig. 3.1, the "threshold" plays a key role in the proposed procedure. There are two kinds of thresholds involved: a threshold for each region and an overall threshold covering all regions. Assuming that we have one threshold for all subjects in each region and one overall threshold for all regions of each subject, then for a regional scheme of $M$ by $N$ regions there will be $MN + 1$ thresholds. Different thresholds will result in different sensitivity and specificity. We can expect that, when $MN$ out of the $MN + 1$ thresholds are set, an ROC curve of specificity vs. sensitivity can be drawn by varying the value of the remaining threshold. A proper set of thresholds needs to be selected from the $MN$ regional thresholds so that the AUC of the ROC is sub-optimized. We will also suggest different thresholds for different subjects. In the following subsections, we develop four methods for generating the threshold of each region to ensure good performance of the Voting/Scoring model. These four methods are neither the only ways nor necessarily the best ways to generate thresholds, and we believe that better ways can be developed later.

3.1.3.1 One Threshold for All Subjects (Method 1, 0/1 Voting)

For the training phase, we construct a group of training images $G_{training}$ which are used to generate the thresholds.

1. Divide the training images into regions, compare them to the gallery images, and obtain the similarity distance $S_{n,m}$ of each region (the Euclidean distance between the training region and the gallery region), where $n$ indexes the subjects and $m$ indexes the regions.

2. Find $M_{n,m} = \min(S_{n,m})$, and take each $M_{n,m}$ as a candidate threshold $T_{n,m}$.

3. Use the regional similarity distances $S_{n,m}$ again to compare against these $T_{n,m}$, and obtain the voting/scoring matrices by employing the Voting/Scoring scheme, so that the specificity associated with each candidate threshold is calculated.

4. Find a $T_{n,m}$ whose specificity matches the preset value $\mu$ (e.g., for the preset value $\mu = 0.90$, find where the specificity = 0.90), and save this $T_{n,m}$ as the candidate threshold of gallery image $I_n$'s $m$th region.

In order to find the regional thresholds for the gallery images in the database, we set the preset value $\mu$ to 0.90, 0.80, 0.70, 0.60, and 0.50 (note: the preset value $\mu$ is the same for all regions of all subjects). Among these preset values we find the one with the best performance (i.e., we draw the ROC curves of the final results for each preset value $\mu$ and find the one with the largest AUC value), and save the corresponding threshold as the regional threshold of its gallery image in the database.

3.1.3.2 One Threshold for Each Subject (Methods 2 & 3, 0/1 Voting)

Let each subject have its own group of training images ($n$ = number of subjects).

1. Again divide all of the images into regions, and

2. Calculate the similarity between the training image and the gallery image for each region. The regional similarity distances $SC_{n,m}$ ($n$ = number of subjects, $m$ = number of regions) are obtained.

3. Hence, each $SC_{n,m}$ can be considered a candidate threshold $T_{n,m}$ of gallery image $I_n$'s $m$th region.

4. The number of training images varies when the number of gallery images changes.
In order to obtain the same number of candidate thresholds $T_{n,m}$ for training groups of different sizes, we take $SC_{n,m}$ at intervals of $Y$ as our candidate thresholds, where $Y = t / r$ ($t$ = number of subjects x size of the training group, and $r$ is any number that divides $t$ exactly).

So far we have only collected the candidate thresholds. To find, among these candidates, the threshold with the best performance to serve as the threshold of the gallery images in the database, two methods are introduced in the following.

• Method 2: each time, choose the thresholds for all regions from the same entry of the different candidate regional threshold sets, and then save the threshold set with the best performance (i.e., with the largest AUC value) as the thresholds $T_m$ ($m$ denoting the $m$th region of gallery image $I_n$) in the database (Fig. 3.4).

Figure 3.4 One threshold for each subject (Method 2): for each region $m$ of gallery image $I_1$, the candidate list $SC_{I_1,m}(1), \ldots, SC_{I_1,m}(t)$ is scanned, and the same entry index is used across all regions. (Note: $t$ = number of subjects x number of training images per subject.)

• Method 3: find the threshold with the best performance (i.e., with the largest AUC value) for each region independently, and then save it as the threshold $T_m$ of gallery image $I_n$'s $m$th region in the database (Fig. 3.5).

Figure 3.5 One threshold for each subject (Method 3): for each region of gallery image $I_1$, the candidate with the best performance is selected independently.

Theoretically, models using Method 3 should perform better than models using Method 2, and both Method 2 and Method 3 should outperform Method 1.

3.1.3.3 Weighted Voting in Regions (Method 4)

In order to compare the regional voting scheme thoroughly with the national voting scheme, we also employed a fourth method, regional weighted voting, in our experiment. We have mentioned previously that the most important concept of our proposed face verification procedure is the "threshold." In Method 4, weighted voting can be viewed as setting all of the regional thresholds to 0 (since we use Euclidean distance to measure similarity) and using the similarity value in each region as the weight. To implement this method, we simply sum the similarity distance values $S_j$ of all regions to form a score value $S$ for an image $I$. We believe that Method 4 should have the best performance among the four methods we have proposed, because it excludes the error that may be introduced by the thresholds.

3.1.4 Voting/Scoring Scheme

The Voting/Scoring Scheme is constructed as in Fig. 3.6. Each input regional vector of the testing image is matched with its corresponding gallery regional vector by calculating the similarity between the vectors. The corresponding regional threshold is then used to determine whether the input regional vector is classified as a match: the region gets a vote/score of 1 if it is, and 0 otherwise. This process is repeated for all regions, and all the votes/scores received by each image in the database are tracked. Once the voting/scoring is done, a total score for each image is obtained by summing the votes/scores of all regions. Then, by employing another threshold, we can determine whether the whole input image is classified. A compact sketch of this two-level scheme follows.
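The sketch below implements the 0/1 voting just described: each region votes by comparing the Euclidean distance between its testing and gallery subspace vectors against the stored regional threshold, the votes are summed, and an image-level threshold makes the final decision. It is a minimal sketch with our own function names and example values; the experiments in Chapter 4 replace the image-level threshold with majority voting over the whole gallery.

```python
import numpy as np

def region_vote(test_vec, gallery_vec, threshold):
    """One region's 0/1 vote: 1 when the Euclidean distance between the
    regional subspace vectors is within the regional threshold."""
    return 1 if np.linalg.norm(test_vec - gallery_vec) <= threshold else 0

def verify(test_regions, gallery_regions, regional_thresholds, total_threshold):
    """Two-level regional voting: sum the regional votes and accept the
    claimed identity when the total reaches the image-level threshold."""
    votes = sum(region_vote(t, g, th)
                for t, g, th in zip(test_regions, gallery_regions,
                                    regional_thresholds))
    return votes >= total_threshold

# Usage with a 3x3 division (9 regional vectors) and an image-level
# threshold of 3, as in the example of Section 3.1.1.
rng = np.random.default_rng(1)
gallery = [rng.random(40) for _ in range(9)]
test = [g + rng.normal(0.0, 0.02, 40) for g in gallery]
thresholds = [0.3] * 9
print(verify(test, gallery, thresholds, total_threshold=3))   # True
```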
Figure 3.6 Two-Level Regional Voting/Scoring Scheme (a testing image and a gallery image, each partitioned 3x3; the similarity between each testing region and the corresponding gallery region is calculated and compared with the regional threshold to produce a vote/score of 1 or 0; the testing image is verified as the gallery image if the total vote passes the image-level threshold)

3.1.5 Shifting

In order to provide robustness to small amounts of shift, a shift process is normally applied when computing the distance between a testing image region and the gallery image regions. In our study, we use shifts of up to 2 steps in each of the four directions (north, south, east, and west), which gives a 5x5 grid of offsets, i.e., 25 shifts in total for each region, and we record the smallest distance (Artiklar et al. 1999).

Figure 3.7 A 2-step shift in each direction makes 25 shifts in total.

3.2 Performance Measures

3.2.1 Reviews

3.2.1.1 False Reject Rate and False Accept Rate

The false accept rate (FAR) is the probability that the system incorrectly matches the input pattern to a non-matching template in the database. It measures the percentage of invalid inputs that are incorrectly accepted (Biometrics, Wikipedia).

The false reject rate (FRR) is the probability that the system fails to detect a match between the input pattern and a matching template in the database. It measures the percentage of valid inputs that are incorrectly rejected (Biometrics, Wikipedia).

The result of classification, obtained by varying a threshold, can be represented in a confusion matrix as shown in Tab. 3.1.

Table 3.1 Confusion Matrix

                      Test Outcome Positive   Test Outcome Negative
Condition Positive    True Positive           False Negative
Condition Negative    False Positive          True Negative

TP (true positive), FP (false positive), TN (true negative), and FN (false negative) represent the numbers of examples falling into each possible outcome. Consistent with the definitions above, the False Reject Rate (FRR) and the False Accept Rate (FAR) are defined as (Biometrics, Wikipedia):

$\mathrm{FRR} = \dfrac{FN}{TP + FN}$   (Equation 1)

$\mathrm{FAR} = \dfrac{FP}{FP + TN}$   (Equation 2)

3.2.1.2 Sensitivity and Specificity

Sensitivity and specificity are statistical measures of the performance of a binary classification test. The sensitivity (or true positive rate/recall rate) measures the proportion of actual positives that are correctly identified, and the specificity (or true negative rate) measures the proportion of negatives that are correctly identified. A theoretical, optimal predictor can achieve 100% sensitivity and 100% specificity (Altman and Bland 1994).

Table 3.2 Sensitivity/Specificity

                          Condition Positive   Condition Negative
Test Outcome Positive     True Positive        False Positive
Test Outcome Negative     False Negative       True Negative
                          (Sensitivity)        (Specificity)

Sensitivity and Specificity are defined as follows:

$\mathrm{Sensitivity} = \dfrac{TP}{TP + FN} = 1 - \mathrm{FRR}$   (Equation 3)

$\mathrm{Specificity} = \dfrac{TN}{TN + FP} = 1 - \mathrm{FAR}$   (Equation 4)

From Equations 3 and 4 (Biometrics, Wikipedia), we notice that sensitivity/specificity is closely related to FAR/FRR. Therefore, the sensitivity/specificity curve pair works the same way as the FAR/FRR curve pair and is suited to setting an optimal threshold for the biometric system. The higher the acceptance threshold, the lower the sensitivity; raising the acceptance threshold, however, also raises the specificity. Therefore, most practical biometric systems do not adjust the threshold parameter so that Sensitivity = Specificity. The goal must be to have as large a sensitivity as possible for any given specificity, and vice versa; i.e., to compare sensitivities at a common specificity, and vice versa (Biometrics FAQ, Bioidentification). A sketch computing these measures is given below.
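To make these measures concrete, the sketch below computes FRR, FAR, sensitivity, and specificity from the confusion-matrix counts of a distance-based verifier. The counts, names, and the acceptance rule (accept when distance <= threshold) are our own illustrative assumptions, not the thesis code.

```python
import numpy as np

def confusion_counts(genuine, impostor, threshold):
    """Confusion-matrix counts for a distance-based verifier: a trial is
    accepted (test outcome positive) when its distance <= threshold."""
    tp = int(np.sum(genuine <= threshold))    # genuine claims accepted
    fn = int(np.sum(genuine > threshold))     # genuine claims rejected
    fp = int(np.sum(impostor <= threshold))   # impostor claims accepted
    tn = int(np.sum(impostor > threshold))
    return tp, fn, fp, tn

def rates(tp, fn, fp, tn):
    frr = fn / (tp + fn)                      # Equation 1
    far = fp / (fp + tn)                      # Equation 2
    sensitivity = 1.0 - frr                   # Equation 3
    specificity = 1.0 - far                   # Equation 4
    return frr, far, sensitivity, specificity

# Usage: 1000 genuine and 1000 impostor distance samples, one threshold.
rng = np.random.default_rng(2)
genuine = rng.normal(0.5, 0.2, 1000)          # distances for true claims
impostor = rng.normal(1.5, 0.4, 1000)         # distances for impostor claims
print(rates(*confusion_counts(genuine, impostor, threshold=1.0)))
```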
3.2.2 Experimental Results Evaluation

Our experimental results are evaluated based on the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), sensitivity, and specificity. The Receiver Operating Characteristic (ROC) space here is defined with specificity and sensitivity as the x and y axes, respectively, which depicts the relative trade-off between true positives and true negatives.

3.2.2.1 Receiver Operating Characteristic (ROC)

The Receiver Operating Characteristic (ROC, Wikipedia) plots sensitivity values directly against specificity values. In general, the matching algorithm makes a decision based on a threshold which determines how close to a template the input needs to be for it to be considered a match. If the threshold is set low, there will be a lower FRR and specificity but a higher FAR and sensitivity. Correspondingly, a higher threshold will increase the FRR and specificity but reduce the FAR and sensitivity. The ROC is limited to values between 0 and 1 on the x axis (specificity) and the y axis (sensitivity). It has the following characteristics:

1. The ideal ROC only has values that lie either on the x axis (specificity) or the y axis (sensitivity); i.e., when the sensitivity is 1, the specificity is 0, or vice versa.

2. The highest point, for all systems, is given by Specificity = 0 and Sensitivity = 1.

3. The ROC curve cannot increase.

Since the ROC is independent of threshold scaling, it can be used to effectively compare different systems (Biometrics FAQ, Bioidentification).

3.2.2.2 Area Under the ROC Curve (AUC-ROC)

The experimental results of the proposed model are evaluated according to the arithmetic mean of the so-called Area Under the Curve (AUC) (Fogarty et al. 2005). The AUC corresponds to the area under an ROC curve obtained by plotting sensitivity against specificity while varying a threshold on the prediction value that determines the classification result. Like the ROC curve, the AUC is independent of threshold scaling, is limited to values between 0 and 1 on the x axis (specificity) and the y axis (sensitivity), and has the same characteristics as the ROC. Therefore, the AUC-ROC statistic is often used for model comparison (Hanley and McNeil 1983).

Figure 3.8 Area Under the ROC Curve ("KDD Cup 2009", http://www.kddcup-orange.com/evaluation.php)
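Continuing the sketch of Section 3.2.1, the ROC of this section can be traced by sweeping the acceptance threshold, and the AUC estimated with the trapezoidal rule by integrating sensitivity over specificity, the axes used in Fig. 3.8. The genuine/impostor distance samples are again our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
genuine = rng.normal(0.5, 0.2, 1000)      # distances for true claims
impostor = rng.normal(1.5, 0.4, 1000)     # distances for impostor claims

sens, spec = [], []
for th in np.linspace(0.0, 3.0, 301):     # sweep the acceptance threshold
    sens.append(float(np.mean(genuine <= th)))    # TP / (TP + FN)
    spec.append(float(np.mean(impostor > th)))    # TN / (TN + FP)

# Integrate sensitivity over specificity (the ROC axes of this thesis,
# cf. Fig. 3.8), using the trapezoidal rule.
order = np.argsort(spec)
auc = np.trapz(np.array(sens)[order], np.array(spec)[order])
print(f"AUC = {auc:.4f}")
```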
Chapter 4 Experiments and Results

This chapter focuses on presenting the experiments that have been conducted using the Two-level Regional Voting/Scoring face verification procedure proposed in Chapter 3. The research procedures are demonstrated, and the results, accompanying analysis, and evaluation are covered as well.

4.1 Data Sets

Face recognition is one of the most popular research areas in computer vision and machine learning. While many face recognition algorithms have been developed, a large number of face databases, which are necessary to comparatively evaluate these algorithms, have been collected. Since many databases are currently in use, the choice of an appropriate database should usually be made based on the given task. In my experiments I chose the ORL Database of Faces (also known as the AT&T "Database of Faces") and the Yale Face Database as the experimental databases.

4.1.1 The ORL Database of Faces

The ORL face dataset (http://www.cl.cam.ac.uk/Research/DTG/attarchive/facedatabase.html) consists of images of 40 subjects, with 10 grayscale images (92x112) per subject, with random variations in facial expression, pose, and lighting, which amount to a total of 400 faces. The standard task for this set is to identify which individual is present in a given image, based on some number of training examples. Because there are 40 individuals, theoretical chance (i.e., from guessing) is 1/40, or 2.5%. Following the proposed experiment, classifiers were trained using 2, 5, and 8 training examples per individual (reserving the remaining 8, 5, and 2, respectively, for testing), with 50 random splits per configuration.

Figure 4.1 ORL Database of Faces

4.1.2 The Yale Face Database

The Yale data set (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) is a very small face benchmark. It contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. As with the ORL set, the standard task is to identify the individual on the basis of some number of training examples. Theoretical chance is 1/15, or 6.67%. As with the ORL dataset, the proposed classifiers were trained with 2, 5, or 8 training examples per individual (reserving the remaining 9, 6, or 3 images, respectively, for testing), with 50 random splits per configuration.

Figure 4.2 Yale Face Database

4.2 Experiment

We use the UIUC versions (http://www.cs.uiuc.edu/homes/dengcai2/Data/FaceData.html) of the ORL and Yale databases that are available online; all images are already aligned and cropped in a standard way, which allows comparison with future work by other researchers. For the experiments, we embed the holistic PCA algorithm and the newly developed holistic algorithm S-LDA into the Two-Level Regional Voting Scheme. The implementation code of both algorithms is available at UIUC. We chose the number of reduced dimensions to be $\min(M, S) - 1$ for the PCA approach, where $M$ is the total number of images and $S$ is the number of pixels in each image (Chen and Tokuda 2009). We conduct experiments on these data sets with cropped face images of size 64x64 pixels, each with 256 grey levels per pixel. We use all 50 random splits available in the UIUC versions to test the performance of our proposed systems.

In the Two-level Regional Voting Scheme defined in Fig. 3.2, and the Two-level Regional Voting Model set up in Fig. 4.3, we mentioned that there would be two threshold sets for each image: $T_I$ (the regional thresholds) and $T_{total}$ (the total threshold). In the experiments, however, we only involve the thresholds for each region, and do not treat the thresholds for whole images. The reason is that we believe the threshold for each region has the major influence on the recognition results, and that finding the threshold for each region is more difficult than finding the threshold for the whole image. Therefore, in order to implement the proposed procedure without the total threshold, in our experiments we match each input image against all the gallery images in the database, so that, after obtaining the total score over all regions for each gallery image, we can use majority voting to classify the input image (i.e., find the gallery image with the most votes). Different from the other three methods, Method 4 uses a scoring matrix of summed similarity distances instead of a voting matrix; in Method 4 we therefore find the image with the minimum score rather than the image with the most votes. This decision rule is sketched below.
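A compact sketch of this classification rule, under our own naming: the vote totals obtained against every gallery image are compared, and the claim is verified when the claimed gallery image wins; for Method 4, the summed similarity distances are minimized instead.

```python
import numpy as np

def classify_by_votes(vote_totals):
    """Methods 1-3: the gallery image with the most regional votes wins."""
    return int(np.argmax(vote_totals))

def classify_by_score(score_totals):
    """Method 4: the gallery image with the smallest summed distance wins."""
    return int(np.argmin(score_totals))

def verified(claimed_index, vote_totals):
    """The claim is accepted when the claimed gallery image wins the vote."""
    return classify_by_votes(vote_totals) == claimed_index

# Usage: vote totals of one testing image against 5 gallery subjects.
votes = [12, 87, 30, 25, 41]          # subject 1 collects the most votes
print(verified(claimed_index=1, vote_totals=votes))   # True
```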
4.2.1 Setup Mode

Our experimental model is constructed based on the proposed face verification procedure in Figs. 3.1 and 3.2. The general idea is:

1. For data set $D$, each image $I \in D$ is divided into several regions $I_R = \{I_1, I_2, \ldots, I_n\}$, $n \in \mathbb{Z}^+$ (note: $n$ denotes the number of partitions of one image; for example, if image $I$ is divided into 10x10 partitions, $n = 10 \times 10 = 100$);

2. A corresponding threshold $T_I = \{T_1, T_2, \ldots, T_n\}$ for the regions and a threshold $T_{total}$ for each image are generated.

3. When a testing image $I_{testing}$ passes through this model, it is divided into regions as well and projected into the lower-dimensional subspace using the holistic algorithms.

4. After calculating the similarity distance for each region between the testing image $I_{testing}$ and the template $I$, a similarity set $S_R(I_{testing}, I) = \{S_1(I_{testing,1}, I_1), S_2(I_{testing,2}, I_2), \ldots, S_n(I_{testing,n}, I_n)\}$, $n \in \mathbb{Z}^+$, is obtained.

5. Compare the similarities to the thresholds $T_I$: for each component of $S_R(I_{testing}, I)$, if $S_i(I_{testing,i}, I_i) \le T_i$ then $V_i = 1$, else $V_i = 0$.

6. Finally, sum up the votes of all regions: $V_{total} = \sum_{i=1}^{n} V_i$. Repeat the steps until the testing image has been compared with all of the template images in the database.

7. According to the Two-level Regional Voting Model set up in Fig. 4.3, compare the values of $V_{total}$ to each other and find the largest $V_{total}$ (note: in Method 4, we have to find the smallest value); the testing image is then classified.

8. By matching the image to which the testing image is classified against the image that was declared, the testing image is verified.

Figure 4.3 Two-level Regional Voting Model Setup

4.2.2 Model Training

In model training, the major tasks are to establish the database of gallery regional vectors and to generate the corresponding threshold for each gallery region. Creating the gallery regional vectors simply means dividing the gallery images into regions and then saving the lower-dimensional subspace vectors of the regions in the database. In order to generate the corresponding thresholds, training data are required. To set up the training data, we split the testing examples of each subject in the testing dataset $D$ into two groups: the training group $G_{train}$ and the testing group $G_{test}$, as shown in Fig. 4.4 and Tab. 4.1.

Figure 4.4 Training Data Collecting (ORL example: of the 10 grayscale images per subject, two images are randomly picked as the gallery examples; the testing set is split into a training group of 5 images, used to generate the thresholds, and a testing group of 3 images)

Table 4.1 Training Data Collection

                 Gallery Examples   Testing Examples   Training Group   Testing Group
ORL   2 Train          2                  8                  5                3
      5 Train          5                  5                  3                2
      8 Train          8                  2                  1                1
Yale  2 Train          2                  9                  6                3
      5 Train          5                  6                  4                2
      8 Train          8                  3                  2                1

After collecting the training data, we use the Regional Scheme to divide both the gallery images and the training images into regions (in the experiments, images are divided into 10x10 regions), and then project the regional vectors into the lower-dimensional subspace. The regional gallery image subspace vectors are saved in the database. Meanwhile, in the Threshold Generator (Fig. 3.1), we use the regional subspace vectors of the training images to generate the regional thresholds for the corresponding gallery images, employing the methods proposed in section 3.1.3. A sketch of Method 1's threshold generation is given below.
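As an illustration of the Threshold Generator, the sketch below follows the spirit of Method 1 (Section 3.1.3.1): for one region, the candidate thresholds are scanned, and the largest one whose specificity on the training distances still meets the preset value mu is kept. The genuine/impostor distance arrays and all names are our own illustrative assumptions rather than the thesis implementation.

```python
import numpy as np

def region_threshold(genuine_dists, impostor_dists, mu=0.90):
    """Pick the largest candidate threshold whose specificity on the
    training data is at least the preset value mu (Method 1 style)."""
    candidates = np.sort(np.concatenate([genuine_dists, impostor_dists]))
    best = candidates[0]
    for th in candidates:
        specificity = np.mean(impostor_dists > th)   # impostors rejected
        if specificity >= mu:
            best = th            # raise the threshold while mu still holds
        else:
            break                # specificity only decreases from here
    return best

# Usage: one threshold per region for a 10x10 regional scheme.
rng = np.random.default_rng(3)
thresholds = []
for region in range(100):
    genuine = rng.normal(0.5, 0.2, 200)    # distances to the true subject
    impostor = rng.normal(1.5, 0.4, 200)   # distances to other subjects
    thresholds.append(region_threshold(genuine, impostor, mu=0.90))
print(len(thresholds), round(float(np.mean(thresholds)), 3))
```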
4.2.3 Model Testing

We have constructed two testing models by employing the two face recognition approaches, PCA and S-LDA, respectively, in our experimental model. The input of these models is a testing image $I_{testing}$ from the testing group $G_{testing}$. The testing image $I_{testing}$ is divided into 10x10 partitions, the same as the template image $I_{gallery}$. As mentioned in previous sections, the receiver operating characteristic (ROC) curve is one of the effective methods for comparing the efficiency of different systems. In order to employ the ROC in our evaluation of the testing results, each model gives outputs represented by pairs of sensitivity/specificity values and by AUC values. The AUC values are the areas under the ROC curves constructed by plotting sensitivities against specificities.

4.3 Experiment Results Analysis and Discussions

4.3.1 Generating Thresholds

4.3.1.1 One threshold for all subjects (Method 1, 0/1 Voting)

We preset the value of $\mu$ to 0.90, 0.80, 0.70, 0.60, and 0.50, respectively, for each region, so that there are 5 sets of thresholds for each image $I$. Using the AUC values, we find the thresholds with the best performance.

Figure 4.5 Comparison of Different Regional Thresholds (Method 1): ROC curves of sensitivity vs. specificity on ORL (2T, PCA) for the preset specificities 0.90 through 0.50

Fig. 4.5 gives an example of the ROCs for different preset values of $\mu$ on database ORL, with 2 training examples, using the PCA algorithm. The differences between the AUCs are tiny, but the figure still demonstrates that when the value of $\mu$ is set to 0.90, a larger AUC can be obtained. To confirm the conclusion drawn from Fig. 4.5, we extended our experiment to the databases ORL and Yale with 2, 5, and 8 training examples, respectively. The results are shown in Tab. 4.2.

Table 4.2 AUC Results of Different Regional Thresholds (Method 1)

                   AUC:   Spe=0.90   Spe=0.80   Spe=0.70   Spe=0.60   Spe=0.50
2T   ORL   PCA#1          0.9604     0.9595     0.9588     0.9583     0.9552
     ORL   LDA#1          0.9449     0.9412     0.9360     0.9404     0.9379
     YALE  PCA#1          0.9522     0.9537     0.9526     0.9496     0.9462
     YALE  LDA#1          0.9518     0.9552     0.9484     0.9508     0.9506
5T   ORL   PCA#1          0.9944     0.9946     0.9937     0.9922     0.9891
     ORL   LDA#1          0.9921     0.9926     0.9917     0.9891     0.9858
     YALE  PCA#1          0.9755     0.9732     0.9668     0.9668     0.9695
     YALE  LDA#1          0.9783     0.9742     0.9752     0.9803     0.9745
8T   ORL   PCA#1          0.9968     0.9946     0.9978     0.9978     0.9912
     ORL   LDA#1          0.9963     0.9954     0.9933     0.9893     0.9961
     YALE  PCA#1          0.9819     0.9778     0.9704     0.9700     0.9671
     YALE  LDA#1          0.9787     0.9870     0.9823     0.9771     0.9750

4.3.1.2 One threshold for each subject (Method 2, 0/1 Voting)

Following the procedure of Method 2, we set $r = 40$ for database ORL and $r = 30$ for database Yale. By taking candidate regional thresholds at intervals of $Y = t / r$, we obtain 40 sets of candidate regional thresholds for database ORL and 30 sets for database Yale. Tab. 4.3 shows the threshold sets with the relatively best AUC values for the different databases, different numbers of gallery images, and different algorithms.
Table 4.3 Threshold Set with Relatively Best AUC Result (Method 2)
(10x10, with one threshold for each subject)

Database   Algorithm   Training Set   Threshold No.   AUC
ORL        PCA         2T             14              0.9336
                       5T             8               0.9738
                       8T             11              0.9736
ORL        LDA         2T             14              0.9119
                       5T             9               0.9561
                       8T             16              0.9524
YALE       PCA         2T             3               0.9235
                       5T             5               0.9435
                       8T             6               0.9380
YALE       LDA         2T             5               0.9256
                       5T             4               0.9458
                       8T             3               0.9372

4.3.1.3 One threshold for each subject (Method 3, 0/1 Voting)

As in Method 2, in Method 3 we obtain 40 sets of candidate thresholds for database ORL and 30 sets for database Yale. The experimental results of Method 3 are shown in Tab. 4.4.

Table 4.4 Threshold Set with Relatively Best AUC Result (Method 3)
(10x10, with one threshold for each subject)

DataSet & Algorithm   Training Set   Specificity   Sensitivity   AUC
ORL PCA               2T             0.9000        0.8433        0.9392
                      5T             0.9000        0.9318        0.9756
                      8T             0.9000        0.9320        0.9764
ORL LDA               2T             0.9000        0.7898        0.9211
                      5T             0.9000        0.8749        0.9584
                      8T             0.9000        0.8555        0.9526
YALE PCA              2T             0.9000        0.8293        0.9266
                      5T             0.9000        0.8793        0.9445
                      8T             0.9000        0.8760        0.9430
YALE LDA              2T             0.9000        0.8307        0.9326
                      5T             0.9000        0.8757        0.9484
                      8T             0.9000        0.8720        0.9463

4.3.1.4 Conclusion

As shown in Fig. 4.5 and Tab. 4.2, we conclude that when the preset $\mu$ is 0.90, testing models using Method 1 perform better in most cases. For the testing models using Method 2 and Method 3, according to the results shown in Tab. 4.3 and Tab. 4.4, we believe that different ways of collecting candidate regional thresholds generate different regional thresholds for the same image, and we anticipate that different ways of generating thresholds lead to different performance for the same face verification system.

4.3.2 Comparison of Different Training Sets

Martinez et al. demonstrated that the size of the training data set sometimes affects the performance of different algorithms. For example, when the training data set is small, PCA can outperform LDA, and PCA is also less sensitive to differences in the training data set (Martinez and Kak 2001). Therefore, when comparing the performance of different training sets, we take PCA as an example. The results are shown in the following figures.

Figure 4.6 Comparison of different training sets (ORL, PCA, 1x1). (Note: a 1x1 curve denotes the performance of the system with the national voting approach, here and in the later figures.)

Figure 4.7 Comparison of different training sets (ORL, PCA, 10x10): ROC curves of sensitivity vs. specificity for 2T, 5T, and 8T, one panel per threshold method

As shown in Tab. 4.1 and Fig. 4.6, when the size of the training set increases, the size of the training group decreases, which directly affects the experimental results. Usually, when the size of the training set increases, the performance of the system improves; i.e., under the same conditions, the AUC of the database with 8 training examples is larger than the one with 5 training examples, and both are larger than the one with 2 training examples. In Fig. 4.7, the performance of the testing models improves as the number of training examples changes from 2 to 8 for Method 1 and Method 4, while for Method 2 and Method 3 we can hardly see any improvement in performance between 5 training examples and 8 training examples. This is caused by the re-collection of the training groups. In Tab.
4.1, when the number of training examples increases to 8 in database ORL, only one example is left to serve as the training group for generating the thresholds, while the other example has to be reserved for testing purposes. In this way, the range of threshold candidates is greatly reduced. The conclusion, therefore, is that the size of the training group used for generating thresholds also affects the performance of the system.

4.3.3 Database ORL vs. Database Yale

As we know, dataset ORL is a bigger database than Yale, and we believe that the size of the database should have some impact on the performance of our proposed models.

Table 4.5 Database ORL vs. Database Yale

Database   Total Image Examples   Number of Subjects   Image Examples per Subject
ORL        400                    40                   10
Yale       165                    15                   11

Figure 4.8 Database ORL vs. Yale (PCA & S-LDA, 2T, 1x1): ROC curves of sensitivity vs. specificity for ORL PCA, Yale PCA, ORL S-LDA, and Yale S-LDA

Figure 4.9 Database ORL vs. Yale (PCA & S-LDA, 2T, 10x10)

The accompanying results at a preset specificity of 0.9000 are as follows:

1x1 (national voting):

DataSet & Algorithm   Training Set   Specificity   Sensitivity   AUC
ORL PCA               2T             0.9000        0.0525        0.9430
                      5T             0.9000        0.9573        0.9828
                      8T             0.9000        0.9740        0.9898
ORL S-LDA             2T             0.9000        0.9335        0.9751
                      5T             0.9000        0.9963        0.9975
                      8T             0.9000        0.9990        0.9991
YALE PCA              2T             0.9000        0.6330        0.0373
                      5T             0.9000        0.7207        0.0556
                      8T             0.9000        0.7307        0.0374
YALE S-LDA            2T             0.9000        0.8080        0.9329
                      5T             0.9000        0.9380        0.9791
                      8T             0.9000        0.9560        0.9852

10x10 (two-level regional voting):

DataSet & Algorithm   Training Set   Specificity   Sensitivity   AUC
ORL PCA               2T             0.9000        0.9062        0.9604
                      5T             0.9000        0.9900        0.9944
                      8T             0.9000        0.9980        0.9978
ORL S-LDA             2T             0.9000        0.8977        0.9522
                      5T             0.9000        0.9865        0.9921
                      8T             0.9000        0.9945        0.9961
YALE PCA              2T             0.9000        0.8444        0.9449
                      5T             0.9000        0.9246        0.9755
                      8T             0.9000        0.9400        0.9819
YALE S-LDA            2T             0.9000        0.8677        0.9467
                      5T             0.9000        0.9467        0.9803
                      8T             0.9000        0.9560        0.9870