REGIONAL DISPLACEMENT MATCHING SCHEME FOR LBP BASED FACE RECOGNITION

by

Ling Yan

B.Sc., Shandong University, 2005

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES (COMPUTER SCIENCE)

THE UNIVERSITY OF NORTHERN BRITISH COLUMBIA

May, 2013

© Ling Yan, 2013

Abstract

In face recognition, alignment of the face images has been a known open issue. This thesis proposes a displacement-based local aligning scheme that constructs a structural, descriptive image template for comparison. To overcome the registration difficulties caused by the non-rigidity of human face images, a block displacement strategy is introduced that brings the regional voting scheme to the face recognition field. The Local Binary Pattern (LBP) is adopted to construct this block LBP displacement-based local matching approach, which we name LBP-DLMA. Experiments are performed and demonstrate the outstanding performance of LBP-DLMA over the original LBP approach. It is expected, and shown by experiments, that the approach applies to both large and small sized images, and that it also applies to descriptor approaches other than LBP.

Contents

Abstract ii
List of Tables v
List of Figures vii
Acknowledgement ix

1 Introduction 1
1.1 Overview 1
1.2 Research Objective 3
1.3 Contributions 3
1.4 Thesis Objective 4

2 The Face Recognition Problem 6
2.1 Image Capture 8
2.1.1 Digital image 8
2.1.2 Taking a photo 12
2.2 Face Detection 15
2.3 Face Normalization 16
2.4 Face Recognition 17

3 Literature Survey 18
3.1 The LBP Approach and Its Variants 21
3.1.1 LBP approach 21
3.1.2 TPLBP 24
3.1.3 FPLBP 25
3.2 Regional Voting 26

4 Proposed Algorithms 29
4.1 LBP Displacement Concepts 33
4.2 Similarity Metrics 36
4.3 An LBP Displacement-based Local Matching Approach: LBP-DLMA 37
4.4 Another Version of LBP-DLMA: LBP-DTMA 38

5 Experiments 47
5.1 FERET 48
5.2 FRGC 49
5.3 LFW 51

6 Extensibility 56
6.1 Descriptors Other Than LBP 56
6.2 Applications with Low Resolution Images 57

7 Conclusion and Discussion 62

Bibliography 64

List of Tables

4.1 LBP Displacement-based Local Matching Approach - Off-Line 39
4.2 LBP Displacement-based Local Matching Approach - On-Line 40
4.3 LBP Displacement Template Matching Approach - On-Line 42
5.1 Parameters in our experiments 48
5.2 The recognition rates of the original LBP and weighted LBP, the LBP-DTMA, and LBP-DLMA for the FERET probe sets, the mean recognition rates of Fb+Fc+Dup1, and results of a permutation test with a 95% confidence level 50
5.3 The recognition rates of the LBP-DTMA and LBP-DLMA boosted by preprocessing schemes on the FERET probe sets, and a few known approaches 51
5.4 Recognition rates of LBP-DLMA approaches on FRGC Experiment 104 52
5.5 The accuracies of LBP-DLMA, LBP-DTMA and a few no-training approaches for LFW 55
6.1 Parameters for TPLBP and FPLBP in our experiments 57
6.2 The recognition rates of original TPLBP, FPLBP, and TPLBP DLMA and FPLBP DLMA without / with Preprocessing [48] for the FERET probe sets, the mean recognition rate of Fb+Fc+Dup1, and results of a permutation test with a 95% confidence level 58
6.3 Average Error Recognition Rates and Standard Deviations of LBP and LBP DLMA Algorithms, for the Yale face set (32 x 32 pixels) 60
6.4 Average Error Recognition Rates and Standard Deviations of LBP and LBP DLMA Algorithms, for the ORL face set (32 x 32 pixels) 61

List of Figures

2.1 Image processed for face recognition 7
2.2 A face image 10
2.3 20 x 16 sized pixel matrix of the image in Figure 2.2 11
2.4 Images taken with different angles and illumination conditions 13
2.5 Images taken at different times 14
2.6 Image deviations 14
3.1 A basic LBP operator 23
3.2 LBP dictionary 23
3.3 A TPLBP operator 25
3.4 An FPLBP operator 26
3.5 The flag model for voting 27
3.6 Regional voting in face recognition 28
4.1 LBP Map 43
4.2 A pile of LBP displacement blocks of the LBP map in Figure 4.1(a) 44
4.3 The LBP displacement description of the face in Figure 2.2 and an amplified pile 45
4.4 Best block similarity for every gallery image in a gallery set compared with a probe image P 46
4.5 Comparison results of local voting and template 46
5.1 ROC curves over View 2 of LFW 54

Acknowledgement

Great thanks to my supervisor Dr. Liang Chen, who always has great confidence in me, who insists on calling my playing with the data "experiments" and is always ready to provide his unreserved support. He is not only my academic adviser, but also a mentor and real friend (whose wife feeds me the fancy foods that I have never had in my own kitchen). Great thanks to my co-supervisor Dr. David Casperson, who generously squeezes me into his busy schedule all the time. I will benefit forever from his serious attitude toward research. Dr. Casperson, merci beaucoup!

Thanks to my thesis committee member Dr. Jueyi Sui for his encouragement and insightful comments.

Thanks to my parents for their unconditional support. Thanks to my brother and my husband, who keep pushing me so I can finish this work in time. Many thanks to all the people who have shared my days during my studies and work.

Special thanks to my grandma. I will miss her forever.

Chapter 1

Introduction

1.1 Overview

Face recognition, as a branch of the fields of computer vision, pattern recognition, biometric recognition and neuroscience, refers to the verification or identification of a human being based on the visual features of a face. Face recognition has established its importance through its wide range of applications, such as passport verification in customs, identity verification in bank systems, and video surveillance in security systems.
In Canada, ICBC uses face recognition software [29] to help keep driver records in the province of BC; in Mexico, the government adopted FaceIt® face recognition technology [20] to eliminate duplicate voter registrations in presidential elections [19, 56]; around the world, face recognition systems are applied in economic entities, entertainment venues, homes and small appliances for security or entertainment purposes [38, 54].

As face recognition is a practical and popular research field, researchers have proposed a variety of approaches, lending face recognition techniques a high level of maturity. However, we have to admit that the human perception system still remains mysterious to science. Consequently, state-of-the-art face recognition algorithms mostly follow mathematical methods rather than simulating the biological function of human brains.

From the algorithmic perspective, the face recognition task is usually performed by comparing two face images to determine whether they belong to the same individual. Before comparison, a fundamental step for most algorithms is aligning the two face images. Alignment is known to be a key factor in a face recognition algorithm because of its considerable influence on the recognition rate. It is, however, also a thorny issue for which researchers have never come up with a precise definition [53, 62, 52, 51]. Affected by a variety of factors such as facial expressions, facial makeup, pose angle and image quality, a perfect pixel-to-pixel alignment between two images is neither possible, nor is such a perfect alignment ideal or necessary for research needs.

Admitting this, we revise the definition of an ideal alignment: a good alignment does not focus on the best overlap of two images, such that the rightmost corner of the mouth in one image occupies exactly the same position in a coordinate system as it does in the other image, but rather best describes the features of the object to be recognized: it should be tolerant of the deviations among images from the same person and tell the difference between images from different persons. The alignment task under this definition is to find an alignment that approaches the ideal alignment as closely as possible.

An immediate benefit of a better alignment is a relatively accurate description of the offset between two images, and also a higher recognition rate for the face recognition system. The pursuit of a better alignment contributes to a high-performance recognition algorithm and thus becomes an important motivation for research, including this work.

1.2 Research Objective

The objectives of this research work are to:

1. Design an alignment scheme that finds a relatively better alignment of two face images;
2. Make the scheme generally adoptable as a step prior to many face recognition approaches, to improve their performance;
3. Apply voting theory to the face recognition field and test the performance of hard combination and soft combination for face recognition;
4. Develop an executable framework that integrates our approach with existing face recognition approaches to evaluate our approach;
5. Perform further experiments to test its extensibility.

1.3 Contributions

This work presents an innovative displacement-based aligning scheme that has high portability to various descriptors and brings significant improvements to their performance.
The greatest contribution of this alignment scheme is that it takes regional deviations into consideration and dynamically simulates a relatively better alignment for a particular pair of face images.

Regional voting theory is adapted to face recognition problems and has proved its strength in system stability against image deviations, offsets and noise.

The block LBP displacement-based local matching approach reports outstanding experimental performance in comparison with the original LBP approach.

Experiments demonstrate that our approach applies not only to large-sized images but also to small-sized images, and that it also applies to descriptor approaches other than LBP.

Part of the contents of this thesis has been published in [17, 10].

1.4 Thesis Objective

The aim of this thesis is to provide a full view of our algorithm. The following objectives are realized to attain this goal:

1. Introduce the face recognition problem: investigate the face recognition system and discuss the factors that influence the performance of a face recognition system;
2. Review the state-of-the-art achievements in related fields, including those that inspired our approach;
3. Propose our approach;
4. Implement the research design; perform experiments and report experimental outcomes;
5. Refine the algorithm;
6. Test extensibility and report experimental outcomes.

This thesis is an expansion of these objectives and is organized as follows.

Chapter 2 explores the face recognition problem and the processes of a face recognition system.

Chapter 3 provides a literature survey of some popular face recognition approaches and voting scheme studies. In particular, Chapter 3 gives a full description of the LBP approach, which we choose as our representative descriptor for the experiments, and a full description of the study on regional voting, which contributes one of the most important inspirations of this work. Readers with related background can skip Chapters 2 and 3.

Chapter 4 presents our proposed approach. Chapter 5 shows the experimental results and comparisons with some popular approaches for performance evaluation. Chapter 6 discusses the extensibility of our approach to descriptors other than LBP, followed by a conclusion in Chapter 7.

Chapter 2

The Face Recognition Problem

A face recognition task is to verify or identify a person by facial features. Depending on the task objective, most face recognition problems fall into two categories: verification and identification. The former is a one-to-one problem: given a face and an identity, determine whether the face complies with the claimed identity. The latter is a one-to-many problem: given a face, the system needs to claim its identity from among the known identities or declare that the identity is unknown.

From a general point of view, specific tasks and applications have been extensively studied in face recognition, such as facial expression recognition, gender recognition and skin texture recognition. The source images can be 2D images, 3D models, videos, software-generated pictures or other sources. Our study focuses on the recognition of 2D images, and the following discussion stays within this focus.

A face recognition system is a system that performs the face recognition task. It usually consists of three components: a gallery set, a probe set, and the recognition component. A gallery set is a set of gallery images with recognized identities registered with the system.
To the understanding of the system, the gallery image(s)¹ are the only knowledge and the standard description of the associated identity. A probe set is a set of probe images to be identified or verified. Sometimes a system does not store the probe set; instead, it takes in the probe image at a face recognition request. The recognition component, in a face verification task, takes in a probe image and its claimed identity, retrieves the gallery image(s) of the claimed identity, and compares the probe image with the gallery image(s) to make a positive or negative decision. In a face identification task, the recognition component takes in a probe image and compares it with every gallery image to determine the probe image's identity, or to claim that it does not recognize this probe image.

¹There may be more than one image for the same identity in a gallery set.

A face recognition system that performs the above mentioned tasks usually follows four steps:

Step 1: Image capture
Step 2: Face detection
Step 3: Image normalization
Step 4: Face recognition

Figure 2.1 shows the processes that prepare an image for face recognition.

Figure 2.1: Image processed for face recognition: (a) image capture, (b) face detection, (c) normalization.

2.1 Image Capture

We assume that a face recognition system has its gallery set already stored in its memory, either pre-taken or exported from an existing database; the probe image, on the contrary, is usually taken at the recognition request. The image is then fed to the system to perform the recognition task. Recognition performs operations on the images, so before we go further into the next step, we need to explore several characteristics of images that affect the performance of a face recognition system.

2.1.1 Digital image

Images, as a manifestation of data, usually come in two forms: analog images and digital images. An analog image is continuous in tone with progressive changes, such as a photograph developed from film or paint on canvas. A digital image is a discrete, numerical representation stored as a matrix in digital storage such as a portable disk. An image in a computer storage system is always a digital image. A digital image can be taken by a digital camera, scanned from a photograph, projected from a 3D image, captured from a video or created by a graphical program. Analog images can be digitized by technical methods such as scanning.

The two categories of digital images are vector images and raster images. The former are mostly created by graphical programs based on vectors and functions, while the latter are based on dots, the smallest components that construct an image, each of which is called a pixel. In a face recognition system, the probe image is usually a raster image. A raster digital image is characterised by the following features.

An image can be in one of three color modes: binary, greyscale or color. In a particular color mode, a number of bits is used to represent the tones of each pixel. This number is called the bit depth or pixel depth. A bit depth of n yields 2^n tones.
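A quick arithmetic check of this relationship, and of the image size formula discussed below (a minimal sketch; the 150 x 130, 8-bit figures are those of the Figure 2.2 example):

```python
# Tones representable at a given bit depth: a bit depth of n yields 2**n tones.
for n in (1, 8, 24):
    print(f"bit depth {n:2d}: {2 ** n} tones")

# Raw (uncompressed) image size = bit depth x number of pixels, here for an
# 8-bit greyscale image of 150 pixels per column and 130 pixels per row.
bits = 8 * 150 * 130
print(bits, "bits =", bits // 8, "bytes")   # 156000 bits = 19500 bytes
```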
An image is represented as a matrix of pixel values. In a color image, pixel value is understood by the computer under certain color model. Most color models are either subtractive or additive mixing. Some famous color models are RGB, CMY, CMYK, HSV and HSL. A big concern about a digital image is the storage it requires, which closely relates to its image size: the number of bits it takes to represent the image. Image size is the product of bit depth and number of pixels. The number of pixels is represented by pixel dimension, which is the product of number of pixels per column and number of pixels per row. For example, an image containing m pixels per column and n pixels per row is of size m x n (pixels). Usually the bigger the image is, the more information it describes: high bit depth will give a richer tone scale and high pixel dimension will contain a greater scope or detailed texture. A digital image is compressed to reduce space cost. Depending on the compres­ sion methods, number of colors and etc., images can be of various file formats, like TIFF, PNG, GIF, JPG, RAW, BMP, PSD to name a few. Preference of file formats varies by task and usually file formats are mutually transformable. Figure 2.2 shows a 150 x 130 sized2 greyscale digital image. For illustration purposes, this image is resized to 20 x 16 and its pixel value matrix is shown in Figure 2.3. It is stored in a face recognition system as a •png file of bit depth 8 and the accompanying pixel values range between 0 and 255. 2150 x 130 means that this image has 150 pixels per column and 130 pixels per row. 9 A face recognition system adopts the raster digital image. The recognition com­ ponent in fact performs pair comparison(s) between two matrices of pixel values. Figure 2.2: A face image 10 122 79 40 37 37 34 31 29 29 26 24 22 21 22 24 25 27 28 29 29 38 34 36 35 31 28 30 44 37 44 46 44 37 25 17 19 22 23 25 29 41 39 37 33 30 35 56 78 89 103 114 116 113 100 67 26 19 21 21 24 38 39 37 33 38 74 85 82 119 129 129 125 115 119 122 91 34 21 21 22 40 36 32 51 75 95 81 75 120 131 125 114 110 110 111 116 100 68 64 54 36 34 58 92 106 95 95 89 122 121 107 102 113 91 81 112 121 109 91 97 35 61 95 116 117 100 99 101 109 117 113 108 101 98 96 93 108 121 99 95 53 102 128 133 122 112 112 109 117 136 138 108 90 90 112 93 91 115 106 92 76 124 135 134 121 108 108 115 121 141 144 108 86 95 110 97 92 117 107 91 92 125 134 137 121 98 99 98 111 123 120 113 108 103 87 96 116 124 99 89 109 129 135 144 118 98 97 86 118 124 112 105 117 87 87 115 125 112 85 86 119 93 130 113 131 121 145 134 108 117 115 129 83 107 77 99 119 132 129 131 128 131 117 123 106 114 96 121 111 118 118 108 112 71 85 51 80 53 85 51 59 66 83 104 113 121 126 121 134 136 140 129 120 118 99 54 20 22 23 23 Figure 2.3: 20 x 16 sized pixel matrix of the image in Figure 2.2 54 43 40 46 55 68 90 100 97 104 100 89 75 55 31 21 25 25 24 24 90 52 40 37 33 30 33 31 29 27 24 22 20 21 23 24 24 25 26 28 2.1.2 Taking a photo As we may easily claim that the more information the image contains, the higher recognition rate a system achieves. However, it is not true. We can simply learn this by thinking over how many times we took a high resolution picture that did not look like ourselves at all. Under this observation, one question arises immediately is: How can we take a picture that best describes our face? This may be answered differently from the aesthetic point of view or with concerns of face recognition rate. 
We here discuss several key influential factors that could help improve the performance of a face recognition system.

Photographic equipment is the first choice we make in research studies, because we need to provide the parameters of the equipment we use to collect the data. Then follows image size. It seems that big-sized images (or big-sized faces, to be precise) should always be preferred, since they offer more information than smaller ones. However, a bigger image size also requires a longer processing time due to its larger number of bits. In an image processing system, such as a face recognition system, the trade-off between image size and processing time is the trade-off between accuracy and efficiency. Some images, like medical images, require highly detailed information while others might call for a faster processing time. Such a trade-off should take into consideration the emphasized system features and task requirements.

Pose angle is a big concern for face recognition. The best angle for a picture taken for face recognition is the frontal view, as it covers the whole region of the face. Pictures taken at an angle, whether horizontal, vertical or arbitrary, may cause an absence of data, while it is believed that full information on both sides of the face is helpful to the recognition decision, as most human faces are not strictly symmetric. Also, a pose angle causes different facial regions to fall at different focal lengths from the lens, which frequently results in distortion of the face³, and illumination change is a frequent accompanying side effect of pose angle. Research shows that the recognition rate decreases as the pose angle increases, especially when the horizontal angle is greater than 30 degrees or the vertical angle is greater than 15 degrees [24].

³Cosmetic guides suggest a 45-degree depression angle to give a look of skinnier cheeks and bigger eyes, making a doll-like face.

Illumination, as studied in many research works, can greatly lower the recognition rate [41, 1]. That is to say, the change induced by illumination can be larger than the difference between individuals. Some face recognition approaches perform stably against illumination change, such as LDA. Some approaches apply strategies, such as histogram equalization, to reduce illumination effects. We believe an illumination-oriented method should not be the final solution for a face recognition system, given that real-world image distortions vary and are most likely the result of a combination of many factors.

Figure 2.4: Images taken with different angles and illumination conditions: (a) pose angle, (b) illumination.

A possible solution against pose angle and illumination is that, by fine control of the shooting conditions, we may strictly restrict their influence to a small scale and take mostly comparable images of a person; however, this fails to deal with uncontrollable circumstances, such as a video surveillance image taken under any illumination from any angle, or images from different sources where it is infeasible to unify everything, such as photos taken at airports around the country.

There are also unavoidable changes that happen to human faces, such as facial expressions, aging, pimples and scars, makeup, apparel, cosmetic surgery, hair styles (hair growth), glasses, rings, color contacts and other accessories that people wear.
A mature face recognition system should not refuse these changes, as they are taken as part of the features of a human face.

Figure 2.5: Images taken at different times: (a) 2005, (b) 2006, (c) 2010, (d) 2012.

Figure 2.6: Image deviations: (a) facial expression, (c) glasses, (d) makeup.

If an image is scanned from a photograph, the machinery may introduce image noise from an unclean surface, or distortion from a warped or wrinkled original.

From the above, the performance of a face recognition system relies heavily on the images. A face recognition system in practice may focus on specific factors for particular tasks, while the study of face recognition approaches should take all possible factors into consideration. A good recognition approach should minimize the intra-class difference and maximize the extra-class difference; that is, it should be stable against the biological feature distortions of human faces and against image noise, while remaining sensitive to the differences between individuals. We further expect it to be reasonably stable against a non-standard image with an angle, an unfavorable illumination or a different image size.

2.2 Face Detection

Face detection, as a pre-process for the face recognition task, is itself a research field in object-class detection. It aims at finding faces in an image taken under any condition, where there can be none, one or more faces. Sometimes the faces are processed by rotating, scaling or other means if a face in the image is not in a preferred position⁴. In a face recognition task, face detection finds the location and size of the face and excludes the background (non-face areas) from the image. Free face detection software includes the Facial Landmark Detector (Center of Machine Perception, Czech Technical University, Prague), face detection using support vector machines (SVM) (Omid Sakhi), and FDLIB (W. Kienzle et al.); companies like ACSYS, Betaface and Luxand offer commercial products too [22].

⁴One example: when a boy does a handstand, his face is upside down.

2.3 Face Normalization

Face normalization prepares the images for comparison. It comes in two forms: geometric normalization and image condition normalization. Geometric normalization asks for a unified image size with fixed facial feature positions and scales (such as fixed centers of the eyes, distance between the eyes⁵, the middle of the upper lip and other believed key facial features). This is achieved by clipping, resizing, scaling or rotation as necessary. Image condition normalization pre-processes images with unfavourable parameters such as lighting or contrast. This can be done by global filtering, local modification, histogram modification or a lighting compensation mask. Advanced forms of face normalization include facial expression normalization, facial orientation normalization and many others, by global or local modifications. Technical means are adopted to minimize the deviations caused by image conditions.

As face detection and face normalization can be performed separately from face recognition, some recognition systems do not take these tasks into consideration but focus solely on recognition. According to their different attitudes towards the two procedures, face recognition systems fall into two categories: the ones that include face detection and normalization are called fully automated systems, and those that do not include the two procedures are called partially automated systems or semi-automated systems.

⁵Figure 2.1(c) is normalized with the distance between the centers of the eyes being 56 pixels and the centers of the eyes lying on the 53rd pixel of the same column.
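As a sketch of how geometric normalization can be realized (an illustrative implementation, not the procedure used for Figure 2.1(c); the canonical eye positions below are derived from the 56-pixel eye distance and row-53 eye centers of footnote 5, and the detected eye coordinates are made-up example values):

```python
import numpy as np

def similarity_transform(src_l, src_r, dst_l, dst_r):
    """2 x 3 matrix of the similarity transform (rotation, uniform scale and
    translation) that maps the two detected eye centers onto the two
    canonical eye centers; two point correspondences determine it fully."""
    s = complex(*map(float, src_r)) - complex(*map(float, src_l))
    d = complex(*map(float, dst_r)) - complex(*map(float, dst_l))
    a = d / s                                  # rotation + scale, as z -> a*z + b
    b = complex(*map(float, dst_l)) - a * complex(*map(float, src_l))
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag,  a.real, b.imag]])

# Canonical positions: eyes 56 pixels apart on row 53, centered in a
# 130-pixel-wide output, i.e. x = 37 and x = 93 (points are (x, y)).
M = similarity_transform((40.0, 60.0), (95.0, 58.0),   # detected eyes (example)
                         (37.0, 53.0), (93.0, 53.0))
# The face is then resampled with any affine warp, for example with OpenCV:
# normalized = cv2.warpAffine(img, M, (130, 150))
```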
2.4 Face Recognition

Normalized images are ready for comparison. The result of the comparison is the answer to the identification or verification task. In the literature, many approaches have been developed for face recognition purposes with various and reasonable emphases, namely feature-based methods, appearance-based methods, descriptor-based methods, template-based methods, and neural network methods. For example, in a feature-based method, facial features are extracted from the normalized faces to get the nose, eyes and other believed-to-be-important features, and then the feature vectors from the two images are compared to derive a final conclusion. In a descriptor-based method, a descriptor is applied to get a description of the face, usually in the form of a vector; comparison then takes place between vectors. A detailed discussion of the comparison methods is presented in the literature survey.

Chapter 3

Literature Survey

Face recognition is easily seen as a bionics application, as it is never difficult for a human being to recognize an acquaintance; however, it remains unknown how our brain performs such a task. Biologists and engineers keep exploring and have made many insightful and interesting observations.

Wilmer et al [55] found that human face recognition ability is specific and highly heritable, observing that the "correlation of scores between monozygotic twins (0.70) was more than double the dizygotic twin correlation (0.29)" and that "low correlations between face recognition scores and visual and verbal recognition scores indicate that both face recognition ability itself and its genetic basis are largely attributable to face-specific mechanisms" [55]. Similar observations have been made in many studies supporting the claim that the brain has a specific section that performs face recognition. Good evidence for this claim might be the face blindness disorder (prosopagnosia) [23], in the study of which the fusiform face area [31] is believed to be specialized for face recognition. A model built by Haxby et al further suggested that facial identity and expression might be processed by separate systems [25, 42].

Young et al [60] drew the conclusion that facial features are processed holistically
The former calculate the image as an integrated input while the latter breaks the image into regions. In recent years a new scheme arises by adopting the voting theory to face recognition, referred as the regional voting approaches. The research on face recognition initiates with holistic approaches in the late 1980s. Holistic approaches take an entire human face as a numeric matrix which is converted into a vector in multidimensional space by concatenating the rows of ma­ trix one after another. These face vectors are then projected into lower dimension spaces for similarity measurement. Different approaches vary in their methods of projection (standard projection, differential projection or kernel Eigenspace projec­ tion). Examples of holistic approaches are the Eigenspace-based approaches such as Principle Component Analysis (PCA) [43] [49], a later 2D-PCA[59], Fisher Linear Discriminant (FLD)[6 ], Evolutionary Pursuit (EP), Linear Discriminant Analysis 19 (LDA)[21], Independent Component Analysis (ICA) and etc. [2, 37, 50, 58, 5]. In the mid-1990s, research tends to focus on the different contributions of dif­ ferent regions from the face and thus led to a blossomming in the study of regional approaches. Regional approaches break the face into regions, aiming at preserving locality from which more discriminating face features would be used for compari­ son. Examples of regional approaches include subpattern PCA (SpPCA), Elastic Bunch Graph Matching (EBGM), Local Binary Pattern (LBP), Local Gabor Binary Pattern(LGBP), and Histogram Sequence (LGBPHS)[18, 3, 4, 47, 63]. For regional approaches, one thing worth mentioning is that when an approach extracts discriminative information locally from the face, shall it emphasize the biological features of the face, resulting the regions representing the facial features, or shall it emphasize the layout of the face features, resulting the regions representing portions of the face. Based on the assumption that some region might have more influence to identify a person, weight-scheme can be put to the regions to represent this property. Weights can be assigned based on the educated guesses such as that eyes and eyebrows are more discriminating than the cheek or forehead; or they can be empirical values that come out of the training process, if any. Regardless of whether focusing on features or spatial layout of the features, an accompanying concept that comes with many regional approaches is the descriptor, with which, a standard input face image is processed to a representation generated by this descriptor to better serve the calculation. A new category of approaches that arose lately is regional voting approaches, which is more a general scheme[ll, 12, 13, 14] that could apply to many research fields other than face recognition[15, 16, 9]. The main objective of introducing the regional voting scheme is to create a system that is more stable against noise[14]. The voting theory applies to face recognition in such a way that voting scheme 20 prevents the system from changing its decision based on the facial changes caused by aging, illumination changes or other irresistible influences. 3.1 T he LBP Approach and Its Variants 3.1.1 L B P approach Local Binary Pattern(LBP) is a regional descriptor-based approach originally pro­ posed by Ojala et al for texture description [45, 44] and later introduced to face recognition. 
LBP works as follows.

Given a face image, an LBP operator is applied to obtain its LBP map by thresholding P sampling points on a circular neighbourhood of radius R centered at each pixel. Depending on the value of the center pixel and those of its neighborhood, a binary number 0 or 1 is assigned to each neighbor, representing whether the pixel value of this neighbor is less than, or greater than or equal to, the center pixel value. The concatenation of the P binary values is then taken as the label of the center pixel, and all labels construct the LBP map of the image.

The LBP map is then divided into windows. In each window, a histogram representing the distribution of the numerical labels of the pixels in this region is generated to be the texture descriptor of the region, and the histograms from all windows are concatenated to form the LBP description of the whole face image.

Figure 3.1 shows a basic LBP operator. Figure 3.1(a) is a 3 x 3 area from Figure 2.2. To calculate the LBP label for the pixel in the center, by thresholding 8 sampling points on a circle of radius 1 we obtain an eight-bit string, which, if counted anticlockwise from the bottom right one, equals 63 in decimal, as in Figure 3.1(c).

A uniform pattern in LBP is defined as an eight-bit string which contains at most two bitwise 0/1 transitions when examined circularly, i.e. it is a circular concatenation of a series of 0s and a series of 1s. An eight-bit string has 57 uniform patterns. The LBP label dictionary is a vector of 58 elements, containing the 57 uniform patterns and 1 non-uniform element, as shown in Figure 3.2. Under this definition, the LBP value in Figure 3.1(c) is labeled as in Figure 3.1(d).

Ojala et al [44] observed in their experiments that uniform patterns in texture images account for about 90% of all patterns when using 8 sampling points on a neighbourhood of radius 1, and about 70% when using 16 sampling points on a neighbourhood of radius 2, and they proposed to classify only the LBP labels in the uniform patterns and to place all non-uniform ones into a single category.

In a more complicated form, an LBP operator can have a different radius with a different number of sampling points evenly distributed along the circular neighborhood. An LBP operator with P sampling points of radius R is denoted LBP_{P,R}. When a sampling point does not fall into the center of a pixel, bilinear interpolation is adopted to find the value of the sampling point. The LBP operator with consideration of uniform patterns is denoted LBP^{u2}_{P,R}.

LBP has reported high performance by maintaining three levels of locality: the labels on a pixel level, the histogram representation on a regional level and the concatenated histograms on a global level. As we believe regional approaches should outperform many holistic approaches, and LBP is one of the best-performing regional approaches reported, we come to the choice of applying our scheme to the LBP approach.
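A minimal sketch of the basic operator and the windowed histogram description (raw 256-bin labels for clarity; mapping the labels through the 58-entry uniform-pattern dictionary of Figure 3.2, and the exact bit ordering, are conventions assumed here):

```python
import numpy as np

def lbp_map(img):
    """Basic LBP(8,1): threshold the 8 neighbors of each pixel against the
    center (1 if neighbor >= center) and read the bits as an 8-bit label.
    Border pixels have no full neighborhood, so the map is 2 pixels smaller
    in each dimension than the input image."""
    img = np.asarray(img, dtype=np.int32)
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]   # one circular sweep
    labels = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:img.shape[0] - 1 + dy,
                       1 + dx:img.shape[1] - 1 + dx]
        labels |= (neighbor >= center).astype(np.int32) << bit
    return labels

def lbp_description(img, grid=(7, 7)):
    """Divide the label map into windows and concatenate the per-window
    label histograms into the global LBP description of the face."""
    labels = lbp_map(img)
    hists = [np.bincount(cell.ravel(), minlength=256)
             for row in np.array_split(labels, grid[0], axis=0)
             for cell in np.array_split(row, grid[1], axis=1)]
    return np.concatenate(hists)
```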
Figure 3.1: A basic LBP operator: (a) a 3 x 3 neighborhood from Figure 2.2; (b) after thresholding; (c) the pixel LBP value (the bit string 00111111 equals 63 in decimal); (d) the pixel LBP label (63 is the 26th value in the LBP label dictionary).

Figure 3.2: The LBP dictionary: each of the 57 uniform eight-bit strings (00000000, 00000001, 00000010, 00000011, 00000100, ..., 11111110, 11111111) maps to a label from 1 to 57, and all non-uniform strings share label 58.

3.1.2 TPLBP

Three-Patch LBP (TPLBP) was introduced by Wolf et al [57] as a variant of LBP. It works as follows. A patch C is defined as a w x w region centered on a pixel c. TPLBP, as the name suggests, involves three patches to calculate each bit code, which later contributes to the TPLBP code of the pixel. For any pixel c_p, TPLBP first finds the patch C_p and S patches C_i (i in {1, 2, ..., S}) distributed evenly along a circle of radius r around c_p. A pair of patches are two patches that are \alpha patches apart along this circle. One bit code is generated by thresholding the difference of the distances between C_p and each patch of the pair, and the TPLBP code for the center pixel is the concatenation of all bit codes. A TPLBP operator in its general form is denoted TPLBP_{r,S,w,\alpha}, as shown in Figure 3.3. Defining a thresholding function f(x) as in Equation 3.1, the TPLBP code for p is given in Equation 3.2:

f(x) = \begin{cases} 1, & x \ge \tau \\ 0, & x < \tau \end{cases}    (3.1)

\mathrm{TPLBP}_{r,S,w,\alpha}(p) = \sum_{i=0}^{S-1} f\big( d(C_i, C_p) - d(C_{(i+\alpha) \bmod S}, C_p) \big) \, 2^i    (3.2)

where d(\cdot,\cdot) is a distance measure between two patches and \tau is a small positive threshold.

3.1.3 FPLBP

Four-Patch LBP (FPLBP), also proposed by Wolf et al [57], places two rings of S patches each, of radii r_1 and r_2, around the pixel c_p; a center-symmetric pair of patches from the inner ring is compared with a center-symmetric pair from the outer ring, \alpha patches further along the circle, and thresholding the difference of the two patch distances yields one bit code. An FPLBP operator in its general form is denoted FPLBP_{r_1,r_2,S,w,\alpha}, and the FPLBP code for p is:

\mathrm{FPLBP}_{r_1,r_2,S,w,\alpha}(p) = \sum_{i=0}^{S/2-1} f\big( d(C_{1,i}, C_{2,(i+\alpha) \bmod S}) - d(C_{1,(i+S/2) \bmod S}, C_{2,(i+S/2+\alpha) \bmod S}) \big) \, 2^i    (3.3)

The global FPLBP description of the image is generated following the same process as in TPLBP.

Figure 3.4: An FPLBP operator; the 0th bit code for c_p generated by FPLBP_{r_1,r_2,S,w,\alpha} is f( d(C_{1,0}, C_{2,\alpha}) - d(C_{1,S/2}, C_{2,S/2+\alpha}) ).

3.2 Regional Voting

The stability of regional voting was proved by Chen and Tokuda in 2003 [12], and the scheme was later introduced to studies of face recognition. To understand this voting scheme, we first glance at the voting problem in general.

A voting problem asks a population of M voters to select one winner out of N candidates. This selection can be performed in two manners: direct popular voting and regional voting. In direct popular voting, each of the M voters casts a vote for one of the N candidates, and the candidate who gets the most votes wins. In regional voting (also called local voting, or the electoral college), the M electors are first grouped into X regions, and a direct popular vote over the N candidates takes place within each region to generate a local winner on a winner-take-all basis. The regions, each acting as a single voter, then vote for the final decision, and the candidate who gets the most votes from the X regions wins.

Noise is introduced to study the stability of the voting schemes. A noise refers to a sudden change of the decision of one voter, and stability is watched against noise: a system is said to be stable if the final result stands against the noise.

The simplest form of the voting model developed by Chen and Tokuda is illustrated in Figure 3.5. A binary flag of size 6 x 6 is used to represent the 36 voters in a two-candidate vote; a white pixel means the voter votes for candidate A and a black pixel means the voter votes for candidate B.
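The flag model is easy to simulate. The sketch below reproduces the vote counts quoted in Figure 3.5 (25:11 and 4:0 before the noise, 16:20 and 3:1 after); the exact pixel layout of the flags is our own illustrative choice, not the flags of the figure:

```python
import numpy as np

def popular_vote(flag):
    """Direct popular voting over all pixels, winner-take-all (1 = white/A)."""
    white = int(flag.sum())
    return "white" if white > flag.size - white else "black"

def regional_vote(flag, k=3):
    """Regional voting: each k x k region elects a local winner, and the
    regions, one vote each, then elect the final winner."""
    wins = [popular_vote(flag[r:r + k, c:c + k])
            for r in range(0, flag.shape[0], k)
            for c in range(0, flag.shape[1], k)]
    return "white" if wins.count("white") > wins.count("black") else "black"

flag = np.ones((6, 6), dtype=int)            # start all white
flag[:2, :2] = 0                             # 4 black votes, top-left region
flag[0, 3:5] = 0                             # 2 black, top-right
flag[3, :2] = 0                              # 2 black, bottom-left
flag[3, 3:5] = flag[4, 3:4] = 0              # 3 black, bottom-right -> 25:11

noisy = flag.copy()                          # flip 9 white voters
for y, x in [(0, 2), (1, 2), (2, 0), (2, 1),   # 4 of them in the top-left
             (1, 3), (1, 4), (4, 0), (4, 1), (5, 3)]:
    noisy[y, x] = 0

print(popular_vote(flag), regional_vote(flag))    # white white (25:11, 4:0)
print(popular_vote(noisy), regional_vote(noisy))  # black white (16:20, 3:1)
```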
Figures 3.5(a) and 3.5(c) show a block of noise turning over the original decision in direct popular voting, while Figures 3.5(b) and 3.5(d) demonstrate that regional voting retains the original decision when confronted with the same noise.

Figure 3.5: The flag model for voting: (a) before noise, popular voting: a 25:11 white-dominated flag (the top-left region is white-dominated, white vs black 5:4); (b) before noise, regional voting: 4:0, a white-dominated flag; (c) after noise, popular voting: 16:20, a black-dominated flag (the top-left region is black-dominated, white vs black 1:8); (d) after noise, regional voting: 3:1, a white-dominated flag.

Figure 3.6: Regional voting in face recognition: (a) face image A of size 22 x 18, partitioned into 6 x 3 regions; (b) face image B.

Chapter 4

Proposed Algorithms

A deep look into the literature gives us the understanding that holistic approaches and regional approaches both take the whole face image as the input, and each pixel, regardless of whether it is represented by its pixel value or by some other value generated by a descriptor, contributes with equal likelihood to the final decision (even in a weighted scheme, the pixels in same-weighted regions contribute the same). We believe, as supported by [12, 14], that such approaches lack tolerance for what is called noise (a sudden change of the values of some pixels which makes these pixels no longer correspond to their original objective), and neither do they enhance the ability to tolerate the biological deviations of face features, which might happen only in some random regions of the face area and which expose the disadvantage of a fixed-weight scheme. In fact, a pre-set alignment always brings defects in some cases. Regional approaches work better, but still both holistic and regional approaches fail to deal with this issue.

These weaknesses of pre-set alignment schemes and fixed-weight schemes lead us to the conception of a scheme that dynamically locates a best alignment by simulating all possible alignments corresponding to all possible deviations of facial regions, thereby conquering the alignment issue. Our scheme is established on the following observations.

First, admitting the existence of deviations means that a deviation is consistent with real-world face features, and different regions may (and usually do) have different deviations: sometimes the forehead in the probe image has a positive deviation from the forehead in the gallery image while the mouth in the same probe image has a negative deviation from that in the gallery image. Thus the common regional method of assuming that all regions have the same deviation loses precision. Simply cutting an image into regions and combining them back into a face gains no advantage besides the locality of facial features. We believe a different deviation should exist for every pair of corresponding regions from the gallery image and the probe image, and the deviation between each pair should be found dynamically rather than assuming the regions are aligned to their original locations in the image by default. This dynamic aligning process is much like shifting one image/region around the other to find a better position to align the two. We therefore propose a framework that simulates every possible deviation in units of a pair of corresponding regions to conquer this problem.
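As a toy illustration of this dynamic, per-region aligning (a sketch only; np.roll wraps pixels around at the borders, whereas the template construction of Chapter 4 handles borders properly by trimming margins):

```python
import numpy as np

def best_offset(gallery_region, probe_region, s=1):
    """Try every vertical/horizontal deviation up to s pixels and return the
    (dy, dx) under which the two regions agree best (least squared difference)."""
    g = np.asarray(gallery_region, dtype=float)
    p = np.asarray(probe_region, dtype=float)
    best = None
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            shifted = np.roll(np.roll(p, dy, axis=0), dx, axis=1)
            score = float(np.sum((g - shifted) ** 2))
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]   # the simulated deviation of this region pair
```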
Second, even a great change within a small region (corresponding to concentrated noise [12]) should not overturn the final recognition decision under circumstances where most of the other regions remain the same. It also means that even when a light change spreads over every region (corresponding to salt-and-pepper noise [12]), as long as the regions retain the person's identity, the final recognition decision should remain unchanged. For example, one might have temporary blood scabs on the chin and forehead from a car accident, which makes the similarity related to these regions extremely low and may even deny the final decision; yet we intuitively perceive that the scabs should only be reflected in the decisions of their own region(s), leaving the decisions of the other regions unaffected. However, even in regional methods, changes caused by facial feature deviation or image noise within one region will cause a change in the description of the whole face and thus result in a different similarity between the two images. We believe we can gain system robustness by constraining regional deviation and regional noise within their own regions, by applying voting theory to our scheme.

The regional voting scheme is exactly the solution that meets our goal: system robustness against noise. A face image fits into the regional voting scheme easily if each pixel is taken as a voter and the identities are taken as candidates. A face verification task is a one-candidate vote, where pixels vote "positive" or "negative" on the candidate and the final decision is positive if the majority of pixels vote positive, and negative otherwise. A face identification task is a multi-candidate vote, where each registered identity is a candidate, each pixel votes for one candidate, and the final decision is the identity that gains the most votes. Between two images, a noise happens when a pixel value in one image does not equal that of its corresponding pixel in the other. In the terms of the theory, noise corresponds to deviations or facial feature changes. As an example, Figure 3.6 on page 28 shows two images of the same person, where Figure 3.6(b) is a noise-contaminated image caused by smiling.

The regional matching approach is adapted to face recognition in such a manner that a face image is divided into blocks (each representing one region of the face), and the decision of each block is made on statistics within the block, which later votes for the final recognition decision. We can then benefit from the voting scheme to construct an algorithm that is more stable against the deviation of face regions. Another immediate benefit is stability against noise. There are many causes of noise in images, such as regional shading from unfavourable photographing conditions or an unclean scanner surface when digitizing a filmed photo. As [14] suggests, regional voting gains robustness against both concentrated noise and salt-and-pepper noise.

As our intention originates from building an aligning scheme that may fit more than one comparison method, the literature survey leads our attention to the descriptor-based approaches, and to the LBP descriptor in particular [64, 30, 3]. The LBP descriptor outperforms many state-of-the-art face recognition approaches through its highly descriptive localities [3, 4], and a comparison between the original LBP and the regional matching LBP should be capable of exposing the advantage of regional matching schemes over other descriptors.
Also, given that LBP has a reported high recognition rate [3], it should be interesting to see whether we can go further, and how far we can go, in artificial face recognition.

Integrating the ideas proposed above, we come to the conception of constructing a template framework that can dynamically generate all possible alignments, from which we locate the best alignment based on displacement in units of image regions. LBP is used as the descriptor and regional voting is adopted to construct a displacement-based local matching approach, which we name LBP-DLMA. We expect high portability of this template, so that it applies to any descriptor-based matching approach. Furthermore, for a comprehensive framework, various descriptors can be applied to the regions, and thus a higher recognition rate can be expected by taking advantage of the different descriptors.

LBP-DLMA works as follows. Given a face image from the database, we first generate its LBP map. Believing that deviations vary among pairs of corresponding local regions from two images, we partition this LBP map into blocks. By assigning deviation values to the blocks enumeratively and respectively, we generate a set of candidate face alignments, which together constitute the template description of the face. The best alignment is located as the one whose similarity is highest among all. Having located the best aligned face, every block in this best aligned face takes an internal election on a winner-take-all basis to generate the local decision of the block, which contributes one voter to the final decision.

4.1 LBP Displacement Concepts

Given a face image, we obtain its LBP map of size (m + 2s) x (n + 2s) using the LBP descriptor¹ in [3]. By removing h, i, j, k pixels (h, i, j, k >= 0, h + i = 2s, j + k = 2s) from the top, bottom, leftmost and rightmost margins respectively, we obtain (2s + 1)^2 slightly smaller LBP maps of size m x n. Each m x n sized map is called a layer and is denoted I_l (1 <= l <= (2s + 1)^2). We then partition each layer into K x L blocks (K blocks per column, L blocks per row). The block in the r-th row, c-th column of the l-th layer is denoted B_{r,c,l}. The set of corresponding blocks from all layers is called a pile of LBP displacement blocks, or an LBP displacement pile. The pile of blocks in the r-th row, c-th column is denoted P_{r,c} = {B_{r,c,l} | 1 <= l <= (2s + 1)^2}. The set of all LBP displacement piles for a face image generates the template, or the LBP displacement description, of the face image, denoted T = {P_{r,c} | 1 <= r <= K, 1 <= c <= L}. The template for a gallery image is called a gallery template and the template for a probe image is called a probe template.

A candidate face description, or candidate face for simplicity, is a recombination of blocks, one from each pile, in the template. The template has (2s + 1)^{2KL} candidate faces², representing all possible deviations of the individual blocks. Let a test pair be two candidate faces from a gallery template and a probe template respectively. A best matched pair can be located by exhaustively testing the similarities of the test pairs and choosing the pair with the highest similarity.

¹All images mentioned in the following are LBP maps of the images; to be simple, we use the term image to refer to the LBP map of the image.
²Each block is selected from the (2s + 1)^2 blocks of its own pile and a candidate face contains K x L blocks. Thus the number of candidate faces is ((2s + 1)^2)^{K x L}, equally (2s + 1)^{2KL}.
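A sketch of the template construction just described (assuming, for simplicity, that the layer dimensions m and n divide evenly by K and L; the per-window histogram step of Table 4.1 is omitted and blocks are kept as raw label arrays):

```python
import numpy as np

def displacement_template(lbp_map, s, K, L):
    """Build the LBP displacement description: (2s+1)**2 layers obtained by
    removing h and 2s-h pixels (top/bottom) and j and 2s-j pixels
    (left/right), each partitioned into K x L blocks; corresponding blocks
    form a pile. Returns piles[r][c] = list of the (2s+1)**2 blocks B_{r,c,l}."""
    H, W = lbp_map.shape                      # (m + 2s) x (n + 2s)
    m, n = H - 2 * s, W - 2 * s
    layers = [lbp_map[h:h + m, j:j + n]       # one layer per margin choice
              for h in range(2 * s + 1)
              for j in range(2 * s + 1)]
    bh, bw = m // K, n // L                   # block height and width
    return [[[layer[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
              for layer in layers]
             for c in range(L)]
            for r in range(K)]
```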
Such a template retains the three levels of locality that the original LBP operator has: LBP labels on the pixel level, histograms on the regional level and concatenated histograms on the global level. It also represents three levels of deviation: fixed deviation on a window level, dynamic deviation on a regional level and multi-deviation on a global level. Through these three levels of deviation it gains tolerance of the deviations among images from the same person.

As an illustration of this template framework, assume that Figure 4.1(a) is an 18 x 14 sized LBP map of the face image in Figure 2.2. Let s = 1; by removing 2 pixels from the bottom and one pixel each from the leftmost and rightmost margins, we obtain a 16 x 12 sized layer, which we partition into 12 blocks, each of size 4 x 4, as shown in Figure 4.1(b). With different margins taken off, there are a total of 9 such 16 x 12 sized layers, each of which can be partitioned into 12 blocks. Figures 4.2(a) - 4.2(i) show an LBP displacement block pile consisting of the blocks corresponding to the shaded block in Figure 4.1(b) from all layers. Note that the second block (Figure 4.2(b)) in the pile is the shaded block in Figure 4.1(b). We have 12 such LBP displacement piles, as shown in Figure 4.3. The union of the l-th (l = 1, 2, ..., 9) blocks from all piles is the l-th layer, a 16 x 12 sized LBP map obtained by removing i, 2 - i, j and 2 - j pixels from the top, bottom, leftmost and rightmost margins respectively, where 0 <= i, j <= 2. The set of all these 12 LBP displacement piles is the LBP displacement description template of this face image.

A drawback of such a "simulate by enumeration" strategy is the time cost. However, we make the following observations to reduce the time complexity while retaining the descriptiveness of the template.

The first observation is that duplicate test pairs³ exist in cases where the margins cut from the gallery image and the probe image are the same. To reduce redundant comparisons we restrict the margin parameters h, i, j, k in the probe template to all equal s, yielding a probe template with only one block in each pile and only one layer in the template. This restriction reduces the number of comparisons⁴ of test pairs from (2s + 1)^{4KL} to (2s + 1)^{2KL}. Assuming the time cost for computing the similarity between the standard descriptions (without using template structures) of two face images is O(T), the total time complexity of computing the similarity between two description templates would be O((2s + 1)^{2KL} T) if we compared all pairs of candidate faces. Under this restriction, the search for the best alignment becomes the search for the candidate face in the gallery template that is best aligned with the probe template, which contains only one face. We define the best matched face as the gallery face from the best matched pair.

The second observation is that a best matched face retains its "best match" property over all regions⁵. That is, the candidate face from the gallery template which contributes the best match with the probe template is a combination of blocks that are each locally the best match within its own pile. This observation suggests that not all candidate faces need be tested; only the locally best aligned regions need be under consideration.

³Two images with the same margins still vary given different values of the margins; however, the offset is too small to affect the final result, so we can treat them as a "duplicate test pair".
⁴A comparison is associated with a test pair. A test pair is selected by choosing one candidate face from the gallery template, out of (2s + 1)^{2KL} choices, and one candidate face from the probe template, out of the same number of choices, yielding (2s + 1)^{2KL} x (2s + 1)^{2KL} choices of test pairs, equally (2s + 1)^{4KL}. By restricting the probe template to contain one candidate face, the number of test pairs is reduced to (2s + 1)^{2KL} x 1, equally (2s + 1)^{2KL}.
⁵Proof: Assume the best matched face G contains one block B_1 whose similarity with the corresponding block from the probe template is less than that of another block B_2 from its own pile. Replacing B_1 by B_2, we then have a face G' whose similarity with the probe template is higher than that of G, which contradicts G being the best matched face.
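To get a feel for these counts, take the illustration above (s = 1, K x L = 4 x 3 = 12 blocks): exhaustive testing is astronomically expensive, while the per-pile search described next needs only KL(2s + 1)^2 block comparisons:

```python
s, K, L = 1, 4, 3
shifts = (2 * s + 1) ** 2          # blocks per pile: (2s+1)**2 = 9

candidates = shifts ** (K * L)     # candidate faces per template: (2s+1)**(2KL)
pairs_all = candidates ** 2        # unrestricted test pairs: (2s+1)**(4KL)
per_pile = K * L * shifts          # block comparisons with divide-and-conquer

print(f"{candidates:.3e}")         # ~2.824e+11
print(f"{pairs_all:.3e}")          # ~7.977e+22
print(per_pile)                    # 108
```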
Footnote 4: A comparison is associated with a test pair. A test pair is selected by choosing one candidate face from the gallery template, out of (2s + 1)^{2KL} choices, and one candidate face from the probe template, out of the same number of choices, yielding (2s + 1)^{2KL} x (2s + 1)^{2KL} choices of test pairs, equally (2s + 1)^{4KL}. By restricting the probe template to contain one candidate face, the number of choices of test pairs is reduced to (2s + 1)^{2KL} x 1, equally (2s + 1)^{2KL}.
Footnote 5: Proof: Assume the best matched face G contains a block B_1 whose similarity with the corresponding block from the probe template is less than that of another block B_2 from its own pile. Replacing B_1 by B_2, we obtain a face G' whose similarity with the probe template is higher than that of G, which contradicts the assumption that G is the best matched face.

4.2 Similarity Metrics

Assume the global LBP-based representations of two face images are G = {G_1, G_2, ...} and P = {P_1, P_2, ...}. The typical metrics for calculating the similarity between two global LBP descriptions are [3] the Euclidean distance (Footnote 7), histogram intersection, log-likelihood statistic, and chi-square statistic:

Euclidean Distance:        E(G, P) = \sum_i (G_i - P_i)^2                      (4.1)
Histogram Intersection:    H(G, P) = \sum_i \min(G_i, P_i)                     (4.2)
Log-likelihood Statistic:  L(G, P) = \sum_i G_i \log P_i                       (4.3)
Chi-square Statistic:      \chi^2(G, P) = \sum_i (G_i - P_i)^2 / (G_i + P_i)   (4.4)

Such metrics can also be used to calculate the similarity between the block level LBP descriptions of two blocks. As [3] suggests that the log-likelihood measure is not appealing for face recognition, we shall not use it as a similarity measure in this work. Note that in each block there are one or more windows; the block level LBP description of a block is the concatenation of its window level LBP statistics.

Footnote 6: A best aligned block is found within its own pile, out of (2s + 1)^2 blocks, and there are K x L best aligned blocks to find to construct a best aligned face, so the total number of comparisons is (2s + 1)^2 x K x L, equally KL(2s + 1)^2.
Footnote 7: We will use a squared version of the Euclidean distance for simplicity of calculation.

4.3 An LBP Displacement Template Matching Approach: LBP-DLMA

Given a gallery set {G^1, G^2, ..., G^T} of size T and a probe image P, we base the regional voting approach on the following vote definitions.

Let GP_{r,c} denote the block pile in the r-th row, c-th column of a gallery template, GB_{r,c,l} denote the block in the l-th layer of this pile, and PB_{r,c} denote the block in the r-th row, c-th column of the probe template (Footnote 8); PB_{r,c} also denotes the block pile that contains it. Assume the similarity between two blocks is defined as Sim(GB_{r,c,l}, PB_{r,c}), with a higher value representing a higher similarity. We then define the similarity between two block piles as in Equation 4.5, and a local decision can be obtained by thresholding this pile similarity, as in Equation 4.6:

Sim(GP_{r,c}, PB_{r,c}) = \max_{1 \le l \le (2s+1)^2} Sim(GB_{r,c,l}, PB_{r,c})    (4.5)

vote(PB_{r,c}) = THR(Sim(GP_{r,c}, PB_{r,c}))    (4.6)

In a face identification task, one probe template P is compared with every gallery template G^t in the gallery set. Block PB_{r,c} is believed to share the identity of the GP^t_{r,c} that has the greatest similarity among all GP^t_{r,c}, t in {1, 2, ..., T}. The identity can be retrieved via the parameter t following Equation 4.7:

vote(PB_{r,c}) = \arg\max_{t \in \{1, 2, \dots, T\}} Sim(GP^t_{r,c}, PB_{r,c})    (4.7)

Footnote 8: There is only one block in each pile of the probe template, so there is no need for the layer subscript.
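Equations 4.1-4.5 and 4.7 can be sketched compactly. The code below is a minimal illustration only, assuming block descriptions are stored as NumPy vectors of histogram counts; note that Equations 4.1 and 4.4 are distances (smaller is better), so an implementation would negate them or take the minimum, as noted in the comments:

```python
import numpy as np

def euclidean(g, p):              # Equation 4.1 (squared version)
    return np.sum((g - p) ** 2)

def hist_intersection(g, p):      # Equation 4.2
    return np.sum(np.minimum(g, p))

def chi_square(g, p):             # Equation 4.4
    denom = g + p
    mask = denom > 0              # skip empty bins to avoid 0/0
    return np.sum((g[mask] - p[mask]) ** 2 / denom[mask])

def pile_similarity(gallery_pile, probe_block, sim=hist_intersection):
    """Equation 4.5: the best value over the (2s+1)^2 blocks of a pile.
    For the distance-type metrics (4.1, 4.4), take min() or negate sim."""
    return max(sim(b, probe_block) for b in gallery_pile)

def block_vote(gallery_piles, probe_block, sim=hist_intersection):
    """Equation 4.7: the index t of the gallery template whose pile
    GP^t_{r,c} matches the probe block PB_{r,c} best."""
    scores = [pile_similarity(p, probe_block, sim) for p in gallery_piles]
    return int(np.argmax(scores))
```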
From the perspective of algorithm design, to match a probe P against a gallery set of T images, LBP-DLMA involves two stages: an off-line process that prepares the gallery templates, one template for each image, and an on-line process that prepares the probe template and performs the comparison, as shown in Table 4.1 and Table 4.2.

Table 4.1: LBP Displacement-based Local Matching Approach - Off-Line

Parameters Chosen: Number of piles in each image K x L (K piles per column, L piles per row); shifting value s; number of windows per block wc x wl (wc windows per column, wl windows per row).

A. Off-Line Gallery Image LBP Displacement Description Construction:
Require: a gallery of face images; the size of the gallery is T.
For each image G:
1. Obtain the pixel label map by calculating the LBP pattern of each pixel. (Note: the label map is slightly smaller than the original gallery image, since the pixels on the boundaries may not have a label.) Assume the smaller size is (m + 2s) x (n + 2s).
2. For i = 0 to 2s:
   2.1. For j = 0 to 2s:
        2.1.1. Remove i, 2s - i, j and 2s - j pixels from the leftmost, rightmost, topmost and bottommost boundaries of the label map to obtain a layer. (Note: in total, there are (2s + 1)^2 layers.)
        2.1.2. Partition this layer into K x L blocks; partition each block into wc x wl windows, where we obtain the LBP label statistics (histogram of pixel labels); then concatenate the LBP label statistics of all windows in each block into a block level LBP description.
3. Obtain the LBP displacement description of the gallery image by piling up the corresponding block level LBP descriptions into each pile.
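Table 4.1 translates into a few lines of code. The sketch below is again only an illustration, reusing build_template from the earlier sketch; the 59 histogram bins are an assumption matching the uniform LBP_{8,2} labels (values 0-58) used in our experiments:

```python
import numpy as np

def block_description(block, wc, wl, n_labels=59):
    """Step 2.1.2 of Table 4.1: partition a block of the label map into
    wc x wl windows and concatenate the per-window label histograms."""
    bh, bw = block.shape
    wh, ww = bh // wc, bw // wl
    hists = [np.bincount(block[u * wh:(u + 1) * wh,
                               v * ww:(v + 1) * ww].ravel(),
                         minlength=n_labels)
             for u in range(wc) for v in range(wl)]
    return np.concatenate(hists)

def offline_gallery(lbp_maps, s, K, L, wc, wl):
    """Table 4.1: one LBP displacement description per gallery image,
    with every block of every layer replaced by its histogram vector."""
    gallery = []
    for lbp_map in lbp_maps:
        template = build_template(lbp_map, s, K, L)
        gallery.append([[[block_description(b, wc, wl) for b in pile]
                         for pile in row] for row in template])
    return gallery
```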
Table 4.2: LBP Displacement-based Local Matching Approach - On-Line

B. On-Line Face Recognition:
Require: P is an (m + 2s) x (n + 2s) sized probe image.
B-1. Obtain the LBP displacement description for P as follows:
1. Obtain the pixel label map by calculating the LBP pattern of each pixel.
2. Remove s pixels from all four sides of the label map.
3. Partition the label map into K x L blocks.
4. Partition each block into wc x wl windows, where we obtain the window level LBP statistics; then concatenate the window level LBP statistics into a block level LBP description. Each block LBP description constitutes an LBP displacement pile; the set of all LBP piles is the LBP displacement description.
B-2. Do classification as follows:
1. Set vote counters V_t = 0 for all t in {1, 2, ..., T}.
2. For r = 1 to K:
   2.1. For c = 1 to L:
        2.1.1. For GP^t_{r,c}, where t in {1, 2, ..., T}:
               2.1.1.1. Calculate Sim(GP^t_{r,c}, PP_{r,c}) according to Equation 4.5.
        2.1.2. Find the image index I = \arg\max_{t \in \{1, 2, \dots, T\}} Sim(GP^t_{r,c}, PP_{r,c}).
        2.1.3. Increase V_I by 1.
3. Classify the image as the identity of image G^J in the gallery set, where J = \arg\max_t V_t.

4.4 Another Version of LBP-DLMA: LBP-DTMA

The original motivation for adopting the regional voting scheme is to conquer the registration difficulty caused by the non-rigidity of facial features. Regional voting works on the hard combination of the blocks (the local decisions within the blocks generate the final decision by majority voting). It outperforms the soft combination (in which the similarity values obtained in all blocks are accumulated to generate the final similarity value) in general [14]. However, for a specific application such as face recognition, we still see some ground for adopting the soft combination. An example is shown in Figure 4.4 and Figure 4.5. Figure 4.4 shows the best block similarities of each gallery face G^t. Applying LBP-DLMA, each block in P casts a vote for the identity whose block similarity is highest among all; the voting result is shown in Figure 4.5(a), and the final identity decision goes to the identity of G^1, which gains 7 votes out of 12. However, if we apply the soft combination, summing up all block similarities of each G^t to obtain their global similarities and then finding the best matched G^t, the result is overturned: Figure 4.5(b) shows that the final identity decision goes to the identity of G^2, because it has the highest similarity with P as a whole.

Which face, G^1 or G^2, looks most like P could be discussed endlessly. Regardless of the answer itself, we believe such a discussion has its theoretical contributions: it gives us the insight to generate a soft combination, and thus we come to the second version of our algorithm, the direct template matching approach, which we name LBP-DTMA. The main idea of LBP-DTMA is to find the best aligned face for every gallery image G^t and to calculate the global similarities of all best aligned faces; the one with the highest similarity claims the identity of P.

LBP-DTMA works as follows. Given a gallery set of size T and a probe image P, we obtain the template descriptions of each G^t and of P as in LBP-DLMA. Based on the previous observations, the best aligned face can be found locally following Equation 4.8: for each G^t, find its best aligned candidate face, which satisfies Equation 4.5, and calculate its similarity to P. This similarity is denoted by Sim(G^t, P) and taken as the similarity between G^t and P:

Sim(G^t, P) = \sum_{r,c} \max_{1 \le l \le (2s+1)^2} Sim(GB^t_{r,c,l}, PB_{r,c})    (4.8)

P shares the identity of the gallery image with the highest similarity among all G^t, as in Equation 4.9:

ID(P) = ID(G^I), where I = \arg\max_{t \in \{1, 2, \dots, T\}} Sim(G^t, P)    (4.9)

Table 4.3: LBP Displacement Template Matching Approach - On-Line

1. Obtain the LBP displacement description for P as in Table 4.2, step B-1.
2. For each gallery template G^t, t in {1, 2, ..., T}:
   2.1. Compute the block similarities:
        2.1.1. For each block PB_{r,c} in the probe template:
               2.1.1.1. For each GB^t_{r,c,l} in the pile GP^t_{r,c}:
                        2.1.1.1.1. Calculate Sim(GB^t_{r,c,l}, PB_{r,c}) according to Equation (4.1), (4.2), (4.3) or (4.4) (as the similarity formula chosen).
   2.2. Let S^t = \sum_{r,c} \max_l Sim(GB^t_{r,c,l}, PB_{r,c}).
2.3. Classify the image as the identity of image G^I in the gallery, where I = \arg\max_{t \in \{1, 2, \dots, T\}} S^t.
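The hard/soft contrast discussed in Section 4.4 is easy to reproduce numerically. The block similarities below are hypothetical values in the spirit of Figures 4.4 and 4.5, not the figures' actual numbers; they are chosen so that the two combination rules disagree:

```python
import numpy as np

# Hypothetical best-block similarities of three gallery faces against one
# probe over 12 blocks (illustrative values only).
sims = {
    "G1": np.array([0.8] * 7 + [0.2] * 5),   # very strong on 7 blocks
    "G2": np.array([0.7] * 12),              # consistently good everywhere
    "G3": np.array([0.1] * 12),
}
names = list(sims)
stacked = np.stack([sims[n] for n in names])      # shape (3, 12)

# Hard combination (LBP-DLMA): each block votes for its best gallery face.
votes = np.bincount(stacked.argmax(axis=0), minlength=len(names))
print(dict(zip(names, votes)))        # {'G1': 7, 'G2': 5, 'G3': 0} -> G1 wins

# Soft combination (LBP-DTMA): block similarities are summed per face.
totals = stacked.sum(axis=1)
print(dict(zip(names, totals.round(1))))   # {'G1': 6.6, 'G2': 8.4, 'G3': 1.2} -> G2 wins
```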
[Figure 4.1: LBP Map. (a) The original LBP map, of size 18 x 14; (b) a 16 x 12 sized LBP map obtained from (a), partitioned into 12 blocks.]

[Figure 4.2: A pile of LBP displacement blocks of the LBP map in Figure 4.1(a); panels (a)-(i) show Blocks 1-9 of the pile.]
[Figure 4.3: The LBP displacement description of the face in Figure 2.2, and an amplified pile showing blocks B(3,1,1), B(3,1,2), ..., B(3,1,9) from layers 1-9.]

[Figure 4.4: Best block similarity of every gallery image in a gallery set compared with a probe image P; panels (a)-(c) show the best block similarities for G^1, G^2 and G^3.]

[Figure 4.5: Comparison results of local voting and template matching, with Sim(G^1, P) = 7.6, Sim(G^2, P) = 7.9, Sim(G^3, P) = 1.2. (a) Block results for P by hard combination: ID(P) = ID(G^1); (b) block results for P by soft combination: ID(P) = ID(G^2).]

Chapter 5

Experiments

We carry out experiments on FERET [34], "Labeled Faces in the Wild" (LFW) [28] and FRGC [35]. The usage of large, well developed databases avoids bias from the images (Footnote 1) [34], and experimental results following the restrictions of the datasets are compared on the same platform, providing a more convincing evaluation of the algorithms.

The LBP descriptor involves a few parameters. In all our experiments, as suggested in [3], we set the parameters as in Table 5.1. To further reduce the number of LBP displacement blocks in each pile, we restrict the relative offset by requiring |3 - i| + |3 - j| <= 4. Understandably, in a practical system we might further improve the accuracies by adjusting these parameters on a "trial and error" basis; we do not include such a strategy in this work, as we believe it is not necessary for research purposes (Footnote 2).

Footnote 1: As Phillips et al. mentioned in [34]: "Before the database FERET, a large number of papers reported outstanding recognition results, usually > 95 percent correct recognition, on limited-size databases, usually < 50 individuals."
Footnote 2: "If you torture the data long enough, it will confess." - Ronald Coase.

Table 5.1: Parameters in our experiments

LBP Operator: LBP^{u2}_{8,2}
  radius of circle:                     R = 2
  number of sampling points:            P = 8
  apply uniform pattern:                yes

LBP-DLMA:
  number of blocks per LBP map:         5 x 5 (K = 5, L = 5)
  number of windows per block:          7 x 7
  margin cut from the LBP map:          s = 3
  margins on top, bottom, left, right:  h, 6 - h, j, 6 - j >= 0
  other restrictions:                   |3 - h| + |3 - j| <= 4
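The effect of the offset restriction in Table 5.1 can be checked with a few lines; this minimal sketch simply counts the admissible displacement blocks per pile for s = 3:

```python
# With s = 3 each pile would hold (2s+1)^2 = 49 displacement blocks; the
# restriction |3 - i| + |3 - j| <= 4 keeps only the offsets within an L1
# distance of 4 from the centre of the 7 x 7 offset grid.
s = 3
offsets = [(i, j) for i in range(2 * s + 1) for j in range(2 * s + 1)
           if abs(s - i) + abs(s - j) <= 4]
print(len(offsets), "of", (2 * s + 1) ** 2)   # prints: 37 of 49
```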
5.1 FERET

The FERET database [34] was assembled to test and evaluate face recognition algorithms under standard tests and procedures. FERET consists of 14051 gray-scale images from 1199 individuals. The images vary in lighting conditions, facial expressions, pose azimuths, etc. Subsets are presented for different task concerns.

Following the work in [3], five sets of FERET are used: the Fa gallery set, which contains images of 1196 subjects, one image per subject; the Fb probe set, which contains 1195 face images of 1195 subjects as in Fa but with alternative facial expressions; the Fc probe set, which contains 194 face images taken under different illumination conditions on the same day as their respective Fa matches; the Dup1 probe set, which contains 722 face images taken anywhere between one minute and 1031 days after the corresponding images in Fa were taken; and the Dup2 probe set, a subset of Dup1, which contains 234 face images taken at least 18 months after the corresponding Fa images. These five sets are designed for the study of algorithm performance against facial expressions (Fa, Fb), illumination (Fa, Fc) and aging (Dup1, Dup2).

All faces are first normalized to the standard size 150 x 130 (150 pixels per column, 130 pixels per row), where the distance between the centers of the two eyes is 56 pixels and the segment connecting the centers of the two eyes lies on the 53rd pixel below the top boundary. The standard 150 x 130 elliptical mask from the FERET data collection is used to exclude non-face areas from the LBP maps, and a few pixels are removed from each side of the mask, since the LBP map of an image is always smaller than the original image.

Following [3], a permutation test with a 95% confidence level is also carried out using the image list, list640.srt, in the CSU face identification evaluation system package [7]. list640.srt contains 4 images each for 160 subjects. 10000 permutations are tested, each containing one image per subject in the gallery set and another in the probe set. The results are shown in Table 5.2. The results of a few well-known approaches are listed in the same table for comparison. It is shown that LBP-DLMA not only improves the original LBP approach, but also achieves performances at least comparable to the state-of-the-art approaches.

It was explained in [48] that a preprocessing stage can significantly improve the performance of the LBP approach. Therefore, we also run the experiments with the preprocessing suggested in [48]. Results are shown in Table 5.3.
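For reference, the permutation test just described can be sketched as follows. This is only an outline, not the CSU package itself: the recognize callback and the data layout are assumptions for illustration, and the rates at the 2.5% and 97.5% quantiles bound the 95% confidence interval:

```python
import numpy as np

def permutation_test(recognize, subject_images, n_perm=10000, seed=0):
    """subject_images maps each subject to its list of image ids (4 per
    subject in list640.srt); recognize(gallery, probe) is assumed to
    return the recognition rate of one gallery/probe split."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_perm):
        gallery, probe = {}, {}
        for subj, imgs in subject_images.items():
            g, p = rng.choice(len(imgs), size=2, replace=False)
            gallery[subj], probe[subj] = imgs[g], imgs[p]
        rates.append(recognize(gallery, probe))
    rates = np.sort(np.asarray(rates))
    lower = rates[int(0.025 * n_perm)]
    upper = rates[int(0.975 * n_perm)]
    return rates.mean(), (lower, upper)
```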
Table 5.2: The recognition rates of the original LBP and weighted LBP, the LBP-DTMA, and LBP-DLMA for the FERET probe sets, the mean recognition rates over Fb+Fc+Dup1, and the results of the permutation test with a 95% confidence level.

Method                                 Fb       Fc       Dup1     Dup2     Fb,Fc&Dup1   lower    mean     upper
LBP, no weight [4]                     93%      51%      61%      50%      78.20%       71%      76%      81%
LBP, weighted [4]                      97%      79%      66%      64%      84.74%       76%      81%      85%
LBP-DLMA, Euclidean Distance           99.37%   93.60%   79.66%   75.56%   92.10%       84.92%   89.24%   93.31%
LBP-DLMA, Histogram intersection       99.39%   96.16%   82.52%   80.31%   93.32%       87.21%   91.22%   95.09%
LBP-DLMA, Chi square statistic         99.31%   96.20%   82.23%   80.53%   93.18%       87.34%   91.33%   95.18%
LBP-Template, Euclidean Distance       98.49%   90.21%   70.50%   61.11%   88.16%       78.13%   83.26%   88.13%
LBP-Template, Histogram intersection   98.91%   92.78%   76.04%   68.38%   90.53%       83.13%   87.88%   92.50%
LBP-Template, Chi square statistic     98.74%   91.24%   75.62%   65.81%   90.15%       83.13%   87.61%   91.88%

Table 5.3: The recognition rates of LBP-DTMA and LBP-DLMA boosted by preprocessing schemes on the FERET probe sets, and a few known approaches.

Method                                          Fb       Fc       Dup1     Dup2
Preprocessed LBP-DLMA, Euclidean Distance       99.29%   98.97%   85.37%   82.29%
Preprocessed LBP-DLMA, Histogram intersection   99.37%   99.48%   88.40%   85.89%
Preprocessed LBP-DLMA, Chi square statistic     99.37%   99.25%   88.71%   86.89%
Preprocessed LBP-DTMA, Euclidean Distance       98.49%   98.45%   84.07%   82.05%
Preprocessed LBP-DTMA, Histogram intersection   99.00%   98.97%   88.23%   86.75%
Preprocessed LBP-DTMA, Chi square statistic     99.00%   98.45%   88.23%   86.75%
LGBPHS [63]                                     98.0%    97.0%    74.0%    71.0%
HGPP [61]                                       97.6%    98.9%    77.7%    76.1%
SIS [32]                                        91.0%    90.0%    68.0%    68.0%
Schwartz [39]                                   95.7%    99.0%    80.3%    80.3%

5.2 FRGC

We carry out FRGC Experiment 104 [35] of FRGC version 1, which is generally considered the most challenging experiment in the FRGC V1 dataset. It requires recognizing 608 uncontrolled faces from 152 controlled gallery faces. We normalize the face images to size 150 x 130, as we did for the FERET experiments. The results are shown in Table 5.4. We also include the results of LBP-DLMA with a "preprocessing" stage, as suggested by Tan et al. [48]. We can see that LBP-DLMA, both with and without preprocessing, improves significantly on LBP with and without preprocessing.

We should emphasize that our intention is to improve LBP approaches by using a local matching scheme. It is not our intention to show that our approach is better than all possible approaches on all datasets. We understand that some other approaches, such as [39], get better results for this experiment; we should add that those approaches use the settings more flexibly than we do: they use a training approach while we do not.

Table 5.4: Recognition rates of LBP-DLMA approaches on FRGC Experiment 104

Method                             Euclidean   Histogram intersection   Chi square   LBP baseline [26]
LBP-DLMA                           32.17%      34.38%                   33.23%       28.1%
LBP-Template                       42.94%      47.37%                   47.20%       28.1%
LBP-DLMA with Preprocessing        67.47%      58.31%                   67.20%       58.1%
LBP-Template with Preprocessing    74.01%      85.86%                   86.18%       58.1%

(The baseline column gives LBP [26] for the first two rows and LBP with preprocessing [26] for the last two.)

5.3 LFW

We have also carried out experiments on "Labeled Faces in the Wild" (LFW) (Footnote 3) [28]. LFW is a database containing 13,233 face images of 5,749 individuals, collected from the web for the study of unconstrained face recognition. The faces were detected by the Viola-Jones face detector and labeled with the names of the individuals. 1,680 individuals in the database have two or more distinct photos. We test the performance of our approach on the 10 folds of View 2. All face images were taken in unconstrained environments, exhibiting "'natural' variability in pose, lighting, focus, resolution, facial expression, age, gender, race, accessories, make-up, occlusions, background, and photographic quality" [28]. In this task, given two face images, the goal is to decide whether the two images show the same person. This is a binary classification problem with two possible outcomes: "same" or "different".

Footnote 3: The set is available via the LFW official site http://vis-www.cs.umass.edu/lfw.
LFW View 2 provides 10 folds of face sets in which the sets of people in different folds are disjoint; when testing on one fold, the other nine folds can be used for training. Results of various approaches have been reported at the LFW official site (Footnote 4). We use the LFW-a version of the images (the images aligned using a commercial face alignment software) [46]. The images are of size 250 x 250. We first crop them to images of size 90 x 78 (by removing 88-pixel margins from the top, 72 from the bottom, and 86-pixel margins from both the left and right sides). Note that there were errors in the alignment of many images; we keep them as they were, so some of the final cropped faces are indeed not correctly aligned.

In LBP-DLMA, since a "voting" is required in each pile, we need a few "reference faces" against which relative values can be found. Here, we use a dummy set as "reference faces": for the experiments in the i-th fold, we use the first images (named "***_0001.jpg") of the first 10 individuals in the (i-1)-th fold (when i - 1 = 0, we use the 10th fold) as the dummy set. For a pair of images x and y, for each pile we first obtain the similarity array between x and the set consisting of y and the dummy set, and then the similarity array between y and the set consisting of x and the dummy set; the average of these two arrays is taken, and the local decision is made according to this averaged array. Our results are shown in Figure 5.1 and Table 5.5.

Since LBP-DLMA does not have a training process, our approach should be compared with other no-training approaches, as suggested at the LFW site. We therefore include the Receiver Operating Characteristic (ROC) curves of the no-training approaches SD-MATCHES (L & R system with SIFT descriptors and MATCHES flavour), H-XS-40 (histogram of LBP features with chi-square similarity measure and 40 windows), GJD-BC-100 (Gabor jets descriptors with Borda count measure and 100 reference images) and the LARK representation without supervision [40], which are available at both the LFW site and [30], in Figure 5.1 and Table 5.5. We can see that LBP-DLMA, regardless of the similarity metric it uses, is significantly better than all the other approaches.

For the alternative version, LBP-DTMA, we can either use or not use the dummy set; the results are included in Table 5.5.

Footnote 4: Note that most of the reported approaches were developed only for this specific binary classification task; our approach was not intended to be applicable only to this kind of task.

[Figure 5.1: ROC curves over View 2 of LFW (true positive rate versus false positive rate) for H-XS-40, GJD-BC-100, SD-MATCHES, LARK unsupervised, and LBP-DLMA with the Euclidean, histogram intersection and chi-square metrics.]

Table 5.5: The accuracies of LBP-DLMA, LBP-DTMA and a few no-training approaches for LFW

Approach                                          Accuracy
SD-MATCHES                                        0.6410 ± 0.0062
H-XS-40                                           0.6945 ± 0.0048
GJD-BC-100                                        0.6847 ± 0.0065
LARK unsupervised                                 0.7223 ± 0.0049
LBP-DLMA, Euclidean                               0.7517 ± 0.0122
LBP-DLMA, Histogram intersection                  0.7648 ± 0.0186
LBP-DLMA, Chi square statistic                    0.7622 ± 0.0206
LBP-DTMA, Euclidean                               0.6905 ± 0.0235
LBP-DTMA, Histogram intersection                  0.7428 ± 0.0144
LBP-DTMA, Chi square statistic                    0.7417 ± 0.0143
LBP-DTMA with Dummy Set, Euclidean                0.7352 ± 0.0180
LBP-DTMA with Dummy Set, Histogram intersection   0.7633 ± 0.0152
LBP-DTMA with Dummy Set, Chi square statistic     0.7613 ± 0.0172
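The dummy-set procedure used above can be outlined in code. The sketch below is a simplified illustration under several assumptions: it ignores the displacement layers within a pile, reuses hist_intersection from the sketch in Chapter 4, and the acceptance threshold is a hypothetical parameter that would have to be chosen on the other nine folds:

```python
import numpy as np

def verify_pair(x_blocks, y_blocks, dummies, threshold=0.5):
    """x_blocks[r][c] and y_blocks[r][c] are block descriptions of the two
    images; dummies[d][r][c] are those of the 10 reference faces.  A block
    votes "same" when the partner image beats every dummy face; the pair
    is accepted when enough blocks vote "same"."""
    K, L = len(x_blocks), len(x_blocks[0])
    votes = 0
    for r in range(K):
        for c in range(L):
            # candidate 0 is the partner image, the rest are dummies
            sims_x = [hist_intersection(x_blocks[r][c], y_blocks[r][c])] + \
                     [hist_intersection(x_blocks[r][c], d[r][c]) for d in dummies]
            sims_y = [hist_intersection(y_blocks[r][c], x_blocks[r][c])] + \
                     [hist_intersection(y_blocks[r][c], d[r][c]) for d in dummies]
            avg = (np.asarray(sims_x) + np.asarray(sims_y)) / 2.0
            votes += int(np.argmax(avg) == 0)     # partner wins the pile
    return votes / (K * L) > threshold
```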
Chapter 6

Extensibility

We expect that our approach can be applied to other descriptor approaches: simply replacing LBP in Table 4.1 and Table 4.2 by any descriptor approach A, we should be able to generate A-DLMA. We also expect that our approach applies to low resolution images.

6.1 Descriptors Other Than LBP

We test the extensibility of this displacement local matching approach on two variants of LBP: Three-Patch LBP (TPLBP) and Four-Patch LBP (FPLBP). Applying DLMA to TPLBP and FPLBP, we generate TPLBP-DLMA and FPLBP-DLMA. Experiments with TPLBP-DLMA and FPLBP-DLMA are carried out on the FERET datasets. For the parameters required by TPLBP and FPLBP, we use the default values of [57], as shown in Table 6.1. The experimental results are shown in Table 6.2. We can easily see that the performances of TPLBP-DLMA and FPLBP-DLMA are significantly better than those of TPLBP and FPLBP, respectively.

Table 6.1: Parameters for TPLBP and FPLBP in our experiments

TPLBP Operator:
  ring radius of circles:              r = 2
  patch size:                          3 x 3 (w = 3)
  number of additional patches:        S = 8
  distance between two apart patches:  a = 5

FPLBP Operator:
  ring radii of the two circles:       r1 = 4, r2 = 5
  patch size:                          3 x 3 (w = 3)
  number of additional patches:        S = 8
  distance between two apart patches:  a = 1

6.2 Applications with Low Resolution Images

A mathematical assumption for local matching schemes being superior to global matching schemes is that the "nation" (here, the image) should be "large" enough [11, 14], although there is no fixed definition of "large". We now demonstrate that LBP-DLMA also works for applications with small sized images.

We use [8]'s version of the Yale and ORL face sets, available via Cai's website http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html, where all faces are of the standardized size 32 x 32. The Yale dataset contains the images of 15 subjects, each with 11 images captured with variations of lighting conditions and facial expressions (normal, happy, sad, sleepy, surprised and wink). The ORL dataset contains the images of 40 subjects, each with 10 images captured with variations of expressions and details (open/closed eyes, smiling/not smiling, with/without glasses). For any given k (k = 2, 3, ..., 8), kTrain represents a split in which k images per subject are chosen, with labels, for training, and the rest are used for testing. For fair comparison, 50 such random splits for each kTrain of both Yale and ORL are available via Cai's website.

We perform experiments on Yale and ORL with the LBP approach and the LBP-DLMA approach.
Table 6.2: The recognition rates of the original TPLBP and FPLBP, and of TPLBP-DLMA and FPLBP-DLMA without/with preprocessing [48], for the FERET probe sets, the mean recognition rate over Fb+Fc+Dup1, and the results of the permutation test with a 95% confidence level.

Method                                        Fb       Fc       Dup1     Dup2     Fb,Fc&Dup1   lower    mean     upper
TPLBP, Euclidean Distance                     94.64%   74.23%   62.33%   55.98%   81.71%       68.13%   74.12%   80.00%
TPLBP, Histogram intersection                 96.44%   86.08%   74.65%   69.23%   88.04%       80.00%   85.06%   90.00%
TPLBP, Chi square statistic                   95.98%   86.08%   74.79%   69.66%   87.83%       79.38%   84.50%   89.38%
TPLBP-DLMA, Euclidean Distance                99.26%   91.90%   75.97%   71.80%   90.62%       83.05%   87.51%   91.77%
TPLBP-DLMA, Histogram intersection            99.48%   95.15%   79.79%   75.70%   92.35%       85.68%   89.83%   93.91%
TPLBP-DLMA, Chi square statistic              99.38%   93.27%   78.83%   74.30%   91.79%       85.75%   89.90%   93.96%
Preprocessed TPLBP-DLMA, Euclidean Distance   98.88%   98.39%   77.56%   73.54%   91.54%       84.92%   89.27%   93.48%
Preprocessed TPLBP-DLMA, Histogram inters.    99.14%   98.23%   83.17%   81.98%   93.60%       87.87%   91.88%   95.68%
Preprocessed TPLBP-DLMA, Chi square           99.15%   98.99%   82.31%   81.46%   93.38%       87.85%   91.85%   95.68%
FPLBP, Euclidean Distance                     95.73%   69.59%   64.13%   54.70%   82.52%       72.50%   78.07%   83.13%
FPLBP, Histogram intersection                 96.65%   74.23%   67.45%   56.84%   84.60%       75.94%   81.19%   86.25%
FPLBP, Chi square statistic                   96.65%   74.23%   67.73%   56.41%   84.70%       75.63%   81.16%   86.25%
FPLBP-DLMA, Euclidean Distance                98.89%   76.16%   68.68%   57.11%   86.47%       79.64%   84.32%   88.91%
FPLBP-DLMA, Histogram intersection            98.82%   81.09%   69.62%   60.98%   87.21%       80.84%   85.51%   90.09%
FPLBP-DLMA, Chi square statistic              99.04%   84.38%   70.56%   60.50%   87.95%       81.12%   85.78%   90.31%
Preprocessed FPLBP-DLMA, Euclidean Distance   98.74%   98.24%   75.10%   69.65%   90.61%       84.01%   88.27%   92.45%
Preprocessed FPLBP-DLMA, Histogram inters.    99.00%   98.23%   76.96%   73.49%   91.39%       84.79%   89.07%   93.25%
Preprocessed FPLBP-DLMA, Chi square           98.94%   98.22%   77.19%   73.08%   91.44%       85.05%   89.33%   93.49%

For the LBP approach, we set the number of windows per row (and per column) to 8, although numbers between 7 and 9 seem to give very close accuracies. For LBP-DLMA, we set the number of blocks per row (and per column) to 4, and the number of windows per block to 3 per row (and per column). Due to the small size of the images, for LBP and for the LBP embedded within LBP-DLMA, we test with the radius of the circle set to both 2 and 1. The number of sampling points distributed evenly on the circle is kept at 8, as in the "Experiments" chapter.

The results are reported in Tables 6.3 and 6.4. We can easily find that LBP-DLMA improves the accuracy of the LBP approach regardless of the parameters used for generating the local binary pattern labels and regardless of the similarity measurements.

Table 6.3: Average error recognition rates and standard deviations of the LBP and LBP-DLMA algorithms for the Yale face set (32 x 32 pixels). Error ± Std, in percent.

Radius R = 2:
Method            2 Train        3 Train        4 Train        5 Train        6 Train        7 Train        8 Train
LBP, Eucl.        43.16 ± 5.11   37.22 ± 3.87   35.52 ± 3.77   33.10 ± 3.41   30.59 ± 4.10   30.97 ± 3.39   30.40 ± 5.51
LBP, Hist.        39.87 ± 4.95   34.88 ± 4.11   32.27 ± 3.07   29.99 ± 3.77   27.23 ± 4.43   27.48 ± 3.92   25.80 ± 5.33
LBP, Chi          40.47 ± 5.22   35.58 ± 3.89   33.28 ± 3.34   31.20 ± 3.85   28.59 ± 4.47   28.23 ± 4.35   27.51 ± 5.55
LBP-DLMA, Eucl.   34.33 ± 4.94   28.09 ± 3.31   25.08 ± 2.86   23.06 ± 3.29   21.79 ± 3.67   20.75 ± 3.64   19.86 ± 4.27
LBP-DLMA, Hist.   31.98 ± 4.56   25.68 ± 3.10   22.57 ± 2.16   20.72 ± 2.47   20.19 ± 3.36   18.88 ± 3.32   17.92 ± 4.07
LBP-DLMA, Chi     32.85 ± 4.55   26.45 ± 3.10   23.98 ± 2.60   22.36 ± 2.95   21.32 ± 3.18   20.10 ± 4.06   19.18 ± 4.20

Radius R = 1:
Method            2 Train        3 Train        4 Train        5 Train        6 Train        7 Train        8 Train
LBP, Eucl.        41.51 ± 5.69   35.41 ± 3.74   32.98 ± 3.77   31.22 ± 3.57   28.81 ± 3.57   29.33 ± 4.34   27.78 ± 5.93
LBP, Hist.        36.64 ± 4.79   31.39 ± 3.89   29.69 ± 2.95   26.44 ± 3.72   24.39 ± 3.59   23.87 ± 3.82   22.11 ± 5.05
LBP, Chi          37.50 ± 4.75   31.97 ± 3.70   30.15 ± 3.10   26.71 ± 4.06   24.32 ± 3.58   23.30 ± 4.01   21.29 ± 4.95
LBP-DLMA, Eucl.   31.08 ± 4.69   24.22 ± 3.07   21.23 ± 2.60   18.43 ± 3.27   17.46 ± 2.70   16.57 ± 3.41   15.09 ± 4.05
LBP-DLMA, Hist.   28.52 ± 4.08   23.26 ± 2.42   19.70 ± 2.42   16.58 ± 3.39   16.49 ± 3.23   14.50 ± 3.30   13.81 ± 4.02
LBP-DLMA, Chi     29.11 ± 4.22   23.60 ± 2.92   20.16 ± 3.05   16.92 ± 3.89   17.54 ± 3.00   15.61 ± 2.94   14.50 ± 4.37
Table 6.4: Average error recognition rates and standard deviations of the LBP and LBP-DLMA algorithms for the ORL face set (32 x 32 pixels). Error ± Std, in percent.

Method            2 Train        3 Train        4 Train       5 Train       6 Train       7 Train       8 Train
LBP, Eucl.        19.97 ± 2.86   12.75 ± 2.00   8.40 ± 1.84   5.98 ± 1.77   4.49 ± 1.84   3.33 ± 1.59   2.26 ± 1.79
...

Bibliography

[10] ... conquer algorithm for significantly improving descriptor based face recognition approaches. In Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, and Cordelia Schmid, editors, Computer Vision - ECCV 2012, volume 7576 of Lecture Notes in Computer Science, pages 214-227. Springer Berlin Heidelberg, 2012.

[11] L. Chen and N. Tokuda. Regional voting versus national voting: stability of regional voting (extended abstract). In Int. ICSC Symposium on Advances in Intelligent Data Analysis, Rochester, New York, USA, Jun. 22-25, 1999.

[12] L. Chen and N. Tokuda. Robustness of regional matching scheme over global matching scheme. Artificial Intelligence, 144(1-2):213-232, 2003.

[13] L. Chen and N. Tokuda. Stability analysis of regional and national voting schemes by a continuous model. IEEE Trans. Knowledge and Data Engineering, 15(4):1037-1042, 2003.

[14] L. Chen and N. Tokuda. A general stability analysis on regional and national voting schemes against noise: why is an electoral college more stable than a direct popular election? Artificial Intelligence, 163(1):47-66, 2005.

[15] L. Chen and N. Tokuda. A unified framework for improving the accuracy of all holistic face identification algorithms: electoral college for human face identification by computing machinery. Artificial Intelligence Review, 33(1-2), 2010.

[16] L. Chen, N. Tokuda, and A. Nagai. Robustness of regional matching over global matching: experiments and applications to eigenface-based face recognition. In M. R. Syed and O. R. Baiocchi, editors, Intelligent Multimedia, Computing and Communications: Technologies and Applications of the Future (Proc. of 2001 Int. Conf. on Intelligent Multimedia and Distance Education, Fargo, North Dakota, USA, June 1-3, 2001), pages 38-47. John Wiley & Sons, Inc., New York, 2001.

[17] Liang Chen and Ling Yan. Block LBP displacement based local matching approach for human face recognition. In Jong-Il Park and Junmo Kim, editors, Computer Vision - ACCV 2012 Workshops, volume 7728 of Lecture Notes in Computer Science, pages 97-108. Springer Berlin Heidelberg, 2013.

[18] S. Chen and Y. Zhu. Subpattern-based principle component analysis. Pattern Recognition, 37(5):1081-1083, 2004.

[19] MetaData Company. MetaData company website. http://www.metadata.com.mx/, 2013.

[20] Visionics Company. Visionics FaceIt technology. http://www.visionics.com/, 2013.

[21] K. Etemad and R. Chellappa. Discriminant analysis for recognition of human face images. Journal of the Optical Society of America, 14:1724-1733, 1997.

[22] R. Frischholz. The Face Detection Homepage. http://www.facedetection.com/, 2013.

[23] T. Grüter, M. Grüter, and C. C. Carbon. Neural and genetic foundations of face recognition and prosopagnosia. J Neuropsychol, 2(1):79-97, 2008.

[24] H. Z. Gu and S. Y. Lee.
Integrating two-dimensional morphing and pose estimation for face recognition with pose variations. Journal of Information Science and Engineering, 2012.

[25] J. V. Haxby, E. A. Hoffman, and M. I. Gobbini. The distributed human neural system for face perception. Trends in Cognitive Sciences, 4:223-233, 2000.

[26] J. Holappa, T. Ahonen, and M. Pietikainen. An optimized illumination normalization method for face recognition. In Biometrics: Theory, Applications and Systems (BTAS 2008), 2nd IEEE International Conference on, pages 1-6, Sept. 29-Oct. 1, 2008.

[27] C. Huang, S. Zhu, and K. Yu. Large scale strongly supervised ensemble metric learning, with applications to face verification and retrieval. Technical Report TR115, NEC, 2011.

[28] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, Oct. 2007.

[29] ICBC. ICBC face recognition software for security. http://www.icbc.com/driver-licensing/your-privacy, 2013.

[30] R. Javier, R. Verschae, and M. Correa. Recognition of faces in unconstrained environments: A comparative study. EURASIP Journal on Advances in Signal Processing, 2009. Article ID 184617, 19 pages.

[31] N. Kanwisher, J. McDermott, and M. M. Chun. The fusiform face area: a module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17(11):4302-4311, 1997.

[32] J. Liu, S. Chen, Z. Zhou, and X. Tan. Single image subspace for face recognition. In AMFG, pages 205-219, 2007.

[33] E. Nowak and F. Jurie. Learning visual similarity measures for comparing never seen objects. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8, 2007.

[34] P. Phillips, H. Moon, S. A. Rizvi, and P. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Analysis & Machine Intelligence, 22(10):1090-1104, 2000.

[35] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek. Overview of the face recognition grand challenge. In Proc. of Computer Vision and Pattern Recognition, volume I, pages 947-954, San Diego, Jun. 2005.

[36] J. Sadr, I. Jarudi, and P. Sinha. The role of eyebrows in face recognition. Perception, 32:285-293, 2003.

[37] P. Sanguansat, W. Asdornwised, S. Jitapunkul, and S. Marukatat. Class-specific subspace-based two-dimensional principal component analysis for face recognition. 2006.

[38] Z. Schultz. Facial recognition technology helps DMV prevent identity theft. WMTV-News, 2007.

[39] W. R. Schwartz, H. Guo, and L. S. Davis. A robust and scalable approach to face identification. In European Conference on Computer Vision, pages 476-489, 2010.

[40] H. J. Seo and P. Milanfar. Face verification using the LARK representation. IEEE Transactions on Information Forensics and Security, 6:1275-1286, Dec. 2011.

[41] S. Shan, W. Gao, B. Cao, and D. Zhao. Illumination normalization for robust face recognition against varying lighting conditions. In Analysis and Modeling of Faces and Gestures (AMFG 2003), IEEE International Workshop on, pages 157-164, 2003.

[42] P. Sinha, B. Balas, Y. Ostrovsky, and R. Russell. Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE, 94(11):1948-1962, 2006.

[43] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces.
Journal of the Optical Society of America A: Optics and Image Science, 4(3):510-524, 1987.

[44] T. Ojala and M. Pietikainen. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29(1):51-59, 1996.

[45] T. Ojala and M. Pietikainen. Histogram of Gabor phase patterns (HGPP): A novel object representation approach for face recognition. IEEE Transactions on Image Processing, 16:57-68, 2007.

[46] Y. Taigman, L. Wolf, and T. Hassner. Multiple one-shots for utilizing class label information. In The British Machine Vision Conference (BMVC), London, Sep. 2009.

[47] K. Tan and S. Chen. Adaptively weighted sub-pattern PCA for face recognition. Neurocomputing, 64:505-511, 2005.

[48] X. Tan and B. Triggs. Enhanced local texture feature sets for face recognition under difficult lighting conditions. In AMFG, pages 168-182, 2007.

[49] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.

[50] M. Turk and A. Pentland. Face recognition using eigenfaces. In IEEE Conf. on Computer Vision and Pattern Recognition, pages 586-591, 1991.

[51] A. Wagner, J. Wright, A. Ganesh, Z. Zhou, H. Mobahi, and Y. Ma. Toward a practical face recognition system: Robust alignment and illumination by sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2):372-386, 2012.

[52] R. Walker, M. Stokes, M. Socker, and M. Collins. A study of the face recognition ability of orthodontists and lay persons of different age groups. Journal of Orthodontics, 39(1):9-16, 2012.

[53] P. Wang, L. C. Tran, and Q. Ji. Improving face recognition by online image alignment. In Pattern Recognition (ICPR), 18th International Conference on, volume 1, pages 311-314, 2006.

[54] Y. Welinder. A face tells more than a thousand posts: Developing face recognition privacy in social networks. Harvard Journal of Law & Technology, 26(1):165-239, 2012.

[55] J. B. Wilmer, L. Germine, C. Chabris, G. Chatterjee, M. Williams, E. Loken, K. Nakayama, and B. Duchaine. Human face recognition ability is specific and highly heritable. Proceedings of the National Academy of Sciences of the United States of America, 107(11):5238-5241, 2010.

[56] Business Wire. Mexico adopts Visionics' FaceIt technology in permanent system for eliminating duplicate voter registrations, 2000.

[57] L. Wolf, T. Hassner, and Y. Taigman. Descriptor based methods in the wild. In Real-Life Images Workshop at the European Conference on Computer Vision (ECCV), Oct. 2008.

[58] H. Xiong, M. N. S. Swamy, and M. O. Ahmad. Two-dimensional FLD for face recognition. Pattern Recognition, 38(7):1121-1124, Jul. 2005.

[59] J. Yang, D. Zhang, A. F. Frangi, and J. Yang. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26:131-137, 2004.

[60] A. W. Young, D. Hellawell, and D. C. Hay. Configurational information in face perception. Perception, 16:747-759, 1987.

[61] B. Zhang, S. Shan, X. Chen, and W. Gao. Histogram of Gabor phase patterns (HGPP): A novel object representation approach for face recognition. IEEE Transactions on Image Processing, 16:57-68, 2007.

[62] D. Zhang, M. Yang, and X. Feng. Sparse representation or collaborative representation: Which helps face recognition? In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 471-478, 2011.

[63] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang.
Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition. In Computer Vision (ICCV 2005), Tenth IEEE International Conference on, volume 1, pages 786-791, 2005.

[64] J. Zou, Q. Ji, and G. Nagy. A comparative study of local matching approach for face recognition. IEEE Transactions on Image Processing, 16(10):2617-2628, 2007.