Initial Investigation into Using A Two-Level Regional Voting Approach for Face Verification

by

Jun Ma

B.Sc., University of Northern British Columbia, 2007

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES (COMPUTER SCIENCE)

The University of Northern British Columbia

April 2010

© Jun Ma, 2010

Abstract

In face verification, a pattern whose identity is claimed a priori is compared with that person's individual template in a database, and the system then checks whether the similarity between the pattern and the template is sufficient to grant access. In this thesis we introduce a new face verification procedure that embeds the Electoral College framework, which has been applied successfully in face identification. The approaches are evaluated by experiments on benchmark face databases, with the Electoral College framework embedded with the standard baseline PCA algorithm and the newly developed S-LDA algorithm. The results demonstrate that the proposed face verification systems improve the performance of these holistic algorithms.

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Acknowledgments
Chapter 1 Introduction
1.1 Motivation
1.2 Major Contribution
1.3 Overview of the Thesis
Chapter 2 Literature Review
2.1 History of Face Recognition
2.1.1 Early Development
2.1.2 Recent Improvements
2.2 Identification vs. Verification
2.2.1 Examples of Face Recognition Methods/Algorithms
2.2.2 Face Verification Procedure
2.3 Multi-Level Regional Voting System as Face Recognition Approach
Chapter 3 Approach/Methods
3.1 Proposed Procedure of Face Recognition/Verification
3.1.1 Face Verification Procedure
3.1.2 Regional Scheme
3.1.3 Regional Thresholds Generator
3.1.4 Voting/Scoring Scheme
3.1.5 Shifting
3.2 Performance Measures
3.2.1 Reviews
3.2.2 Experimental Results Evaluation
Chapter 4 Experiments and Results
4.1 Data Sets
4.1.1 The ORL Database of Faces
4.1.2 The Yale Face Database
4.2 Experiment
4.2.1 Setup Mode
4.2.2 Model Training
4.2.3 Model Testing
4.3 Experiment Results Analysis and Discussions
4.3.1 Generating Thresholds
4.3.2 Comparison of Different Training Sets
4.3.3 Database ORL vs. Database Yale
4.3.4 Comparison of Different Methods
4.3.5 Recognition Results
Chapter 5 Conclusions
5.1 Summary
5.2 Future Works
References

List of Figures

Figure 2.1 The procedure of face verification
Figure 3.1 Proposed Training Procedure
Figure 3.2 The procedure for face verification with regional voting/scoring scheme
Figure 3.3 Two-Level Regional Division vs. Multi-Level Regional Division
Figure 3.4 One threshold for each subject (Method 2)
Figure 3.5 One threshold for each subject (Method 3)
Figure 3.6 Two-Level Regional Voting/Scoring Scheme
Figure 3.7 A 2-step shift in each direction makes 25 shifts in total
Figure 3.8 Area Under the ROC Curve
Figure 4.1 ORL Database of Faces
Figure 4.2 Yale Face Database
Figure 4.3 Two-level Regional Voting Model Setup
Figure 4.4 Training Data Collecting
Figure 4.5 Comparison of Different Regional Thresholds (Method 1)
Figure 4.6 Comparison of different training sets (ORL PCA 1x1)
Figure 4.7 Comparison of different training sets (ORL PCA 10x10)
Figure 4.8 Database ORL vs. Yale (PCA & S-LDA 2T 1x1)
Figure 4.9 Database ORL vs. Yale (PCA & S-LDA 2T 10x10)
Figure 4.10 Comparison of Different Methods (2T)
Figure 4.11 Recognition Results (2T)

List of Tables

Table 3.1 Confusion Matrix
Table 3.2 Sensitivity/Specificity
Table 4.1 Training Data Collection
Table 4.2 AUC Results of Different Regional Thresholds (Method 1)
Table 4.3 Threshold Set with Relatively Best AUC Result (Method 2)
Table 4.4 Threshold Set with Relatively Best AUC Result (Method 3)
Table 4.5 Database ORL vs. Database Yale
Table 4.6 Recognition Results

Acknowledgments

I would like to thank my supervisor, Dr. Liang Chen, for his continuous guidance and support throughout my studies at the University of Northern British Columbia. I also wish to express my thanks to Dr. Charles Brown and Dr. Jianbing Li for their encouragement, valuable advice, and positive suggestions on my thesis. As well, I would like to take this opportunity to thank the University of Northern British Columbia for providing every support throughout my studies in Canada. I would like to give a special thank you to Dr. Jean Wang and the HPC lab for their technical support of my research work. I would also like to thank all my friends, Lin, Motto, Ms. Beattie, and James, who showed their daily support, and thank you to Vicki and Graham for their advice and suggestions on my thesis. Last but not least, I give my sincere thanks to my parents for their selfless love, endless patience, and huge support, both financial and spiritual.
Without them, I could not have reached this point.

Chapter 1 Introduction

This chapter presents an introduction to the topic of this thesis and the major contribution made through this research. The outline of this thesis is provided as well.

1.1 Motivation

Face recognition can be generally defined as identifying or verifying one or more persons in a still or video image of a scene by using a stored database of faces (Zhao et al. 2003). The only difference between identification and verification is that identification refers to a positive identification within a predefined group of identities (one-to-many), while verification refers to a positive confirmation of a specific claimed identity (one-to-one). Other than that, identification and verification work in exactly the same way.

Because of increasing commercial and security needs, face recognition, as a biometric technique, has received a great deal of attention in the past decades. Since the 1990s, researchers have put tremendous effort into this research area and achieved remarkable progress in a very short time. Numerous algorithms have been developed, and face recognition systems based on these algorithms have been applied in real life (Biometrics History, NSTC Subcommittee on Biometrics). Holistic algorithms are the most popular approaches so far; they were mostly developed for face identification, and can naturally be applied to face verification systems. However, the field of face recognition is still full of challenges. Current face recognition does not work well under some conditions, such as poor lighting, sunglasses, long hair, or other objects partially covering the subject's face. Sometimes even a low-resolution image or a big smile can degrade a system's performance.

As a decision-making strategy, the voting scheme has recently been applied to face identification systems (Chen and Tokuda 2003; Artiklar et al. 2003; Faltemier et al. 2006). Previous research shows that a multi-level voting scheme is able to significantly improve the performance of holistic algorithms in face identification (Chen and Tokuda 2003; Chen and Tokuda 2005). It has recently been demonstrated that regional voting can be applied to face identification systems, and that a face identification system with a regional voting scheme achieves improved performance (Chen and Tokuda 2009). So far, regional voting has not been applied to face verification systems, and the anticipated difficulty of finding the "thresholds" for each region has kept researchers away from this topic. Therefore, my next step of inquiry is to investigate the possibility of adopting the regional voting scheme in a face verification system. This thesis investigates strategies for embedding the Regional Voting scheme into a regular face verification procedure.

1.2 Major Contribution

In this thesis, I studied strategies for adopting a Multi-level Regional Voting Scheme for face verification. The major contributions of this thesis are:

• Based on the regular face verification procedure, we have constructed a new face verification procedure adopting a Two-level Regional Voting Scheme. By employing two face recognition algorithms, Principal Component Analysis (PCA) and the Spatially Smooth Version of LDA (S-LDA), we have developed two models of the proposed face verification procedure.

• In the newly proposed procedure, the concept of a "threshold" plays a key role.
We propose the following four methods of generating thresholds:

o One threshold for all subjects (Method 1, 0/1 voting);
o One threshold for each subject (Method 2 and Method 3, 0/1 voting);
o Weighted voting (Method 4; the similarity value in each region is used as the weight).

• Extensive experiments have been conducted to test the above models and methods, and their advantages and disadvantages have been compared.

1.3 Overview of the Thesis

The purpose of this research is to explore the applicability of the Multi-level Regional Voting Scheme to face verification. The remaining chapters of this thesis are organized as follows. In the next chapter, a literature review introduces the background of face recognition and reviews the approaches in the literature that are related to the proposed method. Chapter 3 describes the methodology of this research. In Chapter 4, the experiment procedures are demonstrated, and the results and accompanying analysis are presented. Finally, the conclusions and a discussion of future work are provided in Chapter 5.

Chapter 2 Literature Review

This chapter provides a brief review of the field of face recognition and of other research that has been done in this area, in order to facilitate the introduction of the proposed method. As well, a number of face recognition approaches and procedures for face verification are presented. Subsequently, the idea of the college election, which is closely related to the proposed method, is covered.

2.1 History of Face Recognition

Face recognition has become a very popular research area in computer vision and has been studied for the past decades (Biometrics History, NSTC Subcommittee on Biometrics).

2.1.1 Early Development

Face recognition research started in the 1960s. The first systems for face recognition (facial feature-based recognition, developed by Bledsoe and Kelly) (Kelly 1970; Bledsoe 1964) required an administrator to manually input and compute the measurements and locations of facial features, such as hair colour and lip thickness. In the late 1980s, the first semi-automated facial recognition system was deployed (the Lakewood Division of the Los Angeles County Sheriff's Department, 1988). In 1989, Kohonen brought up an idea called the "eigenface," also known as the PCA approach, which computes "a face description by approximating the eigenvectors of the face image's autocorrelation matrix" (Kohonen 1989). Later, Kirby and Sirovich (Kirby and Sirovich 1990) introduced an algebraic manipulation to directly calculate the eigenfaces. The development of the eigenface algorithm was a milestone because it showed that fewer than one hundred values were required to approximate a suitably aligned and normalized face image (Sirovich and Kirby 1987). In 1991, Turk and Pentland (Turk and Pentland 1991) extended PCA to recognize faces, which enabled a reliable real-time automated face recognition system. In the 1990s, Optical Character Recognition (OCR) was launched for consumer applications like scanning and faxing. In 1996, Belhumeur et al. combined the LDA algorithm, originally developed by Fisher in 1936, with PCA and brought up the idea of "fisherfaces" (Belhumeur et al. 1996). By about 1997, a face recognition system called the "Bochum system" had been developed and was sold as a commercial product. It was used by customers such as Deutsche Bank and operators of airports. The software was described as "robust enough to make identifications from less-than-perfect face views.
It can also often see through such impediments to identification as mustaches, beards, changed hair styles and glasses - even sunglasses" (ScienceDaily 1997). In 2000, a standard testing method and database called FERET was established to evaluate and compare facial recognition algorithms. In the same year, the first Face Recognition Vendor Test (FRVT 2000) was held. A popular face recognition algorithm, Independent Component Analysis (ICA), was implemented in 2002 by Bartlett et al. (Bartlett et al. 2002).

2.1.2 Recent Improvements

In 2006, the performance of the latest face recognition algorithms was evaluated in the Face Recognition Grand Challenge (FRGC). High-resolution face images, 3-D face scans, and iris images were used in the tests. The results indicated that "the new algorithms are 10 times more accurate than the face recognition algorithms of 2002 and 100 times more accurate than those of 1995. Some of the algorithms were able to outperform human participants in recognizing faces and could uniquely identify identical twins" (Phillips et al. 2005; Phillips et al. 2007). In the FRVT 2006, an FRR of 0.01 at an FAR of 0.001 was achieved by Neven Vision (NV1-NORM algorithm) on the very high-resolution still images and by Viisage (V-3D-N algorithm) on the 3D images. Furthermore, the FRVT 2006 established the first 3-D face recognition benchmark and showed that significant progress has been made in matching faces across changes in lighting (Phillips et al. 2007).

2.2 Identification vs. Verification

Face recognition systems can be classified into two groups:

Identification - "A one-to-many comparison of the captured face against a face database in an attempt to identify an unknown individual" (Biometrics, Wikipedia). In face identification, the system is trained with the patterns of a group of persons. An unknown pattern that is to be identified is matched against every known template, yielding either a score or a distance describing the similarity between the pattern and the template. The system assigns the pattern to the person with the most similar template.

Verification - "A one-to-one comparison of a captured biometric with a stored template to verify that the individual is who he claims to be" (Biometrics, Wikipedia). In a verification system, the pattern to be verified is compared with the person's claimed individual template in order to decide whether the similarity between pattern and template is sufficient to support the claim.

2.2.1 Examples of Face Recognition Methods/Algorithms

Appearance-based approaches are the most successful and well-studied techniques in face recognition (Turk and Pentland 1991). In appearance-based approaches, an image of size $n \times m$ pixels is usually represented by a vector in an $n \times m$-dimensional space. In practice, however, these $n \times m$-dimensional spaces are too large to allow robust and fast face recognition. To resolve this problem, dimensionality reduction techniques are used (He et al. 2005). Two of the most popular techniques for this purpose are Principal Component Analysis (PCA) (Turk and Pentland 1991) and Linear Discriminant Analysis (LDA) (Belhumeur et al. 1996; Zhao et al. 1998). In the following paragraphs, we briefly introduce the PCA algorithm and a newly developed algorithm called the Spatially Smooth Version of LDA (S-LDA), which is based on LDA.
2.2.1.1 Principal Component Analysis (PCA)

PCA is a statistical dimensionality-reduction method, which retains the majority of the variation present in a dataset while reducing its dimensionality. Kirby and Sirovich (Kirby and Sirovich 1990) applied PCA to representing faces, and Turk and Pentland (Turk and Pentland 1991) extended PCA to recognizing faces. PCA-based face recognition is an eigenvector method designed to model linear variation in high-dimensional data. PCA can be used to find a subspace of a given higher-dimensional vector space. The input of PCA is a training set of $N$ facial images such that the ensemble mean of the training set is zero (Moon and Phillips 2001). PCA projects the original $n$-dimensional data onto the lower-dimensional linear subspace spanned by the leading eigenvectors of the data's covariance matrix (Turk and Pentland 1991; Martinez and Kak 2001).

2.2.1.2 Spatially Smooth Version of LDA (S-LDA)

The Spatially Smooth Subspace Learning (SSSL) model (Cai et al. 2002) is a linear dimensionality reduction method that uses a Laplacian penalty to constrain the coefficients to be spatially smooth, producing a spatially smooth subspace which is better for image representation. Recognition, clustering, and retrieval can then be performed in the image subspace. It was developed based on an approach called "Graph Embedding (GE)," which was also proposed by Dr. Cai and his colleagues (He et al. 2005). The GE approach is defined as $GE(W, D)$, where $W$ denotes a symmetric $m \times m$ matrix with $W_{ij}$ being the weight of the edge joining vertices $i$ and $j$, and $D$ is a diagonal matrix whose entries are the column (or row) sums of $W$, $D_{ii} = \sum_{j} W_{ji}$. Cai et al. have proved that many recently proposed manifold learning algorithms can be interpreted within the Graph Embedding framework by changing $W$ (He et al. 2005; Cai et al. 2008). Therefore, the SSSL model can be applied to all existing subspace learning algorithms, such as LDA. The research of Cai et al. has demonstrated that SSSL consistently outperforms the corresponding ordinary subspace learning algorithms and their tensor extensions (Cai et al. 2002).

2.2.1.3 Summary

There are numerous face recognition algorithms, such as Independent Component Analysis (ICA) (Liu and Wechsler 1998; Delac et al. 2005; Bartlett et al. 2002; Comon 2003), the eigenspace-based adaptive approach (EP) (Liu and Wechsler 1998), Elastic Bunch Graph Matching (EBGM) (Wiskott et al. 1997), and the support vector machine (SVM) (Guo et al. 2000; Jonsson et al. 2000). All the above algorithms are appearance-based. The other face recognition techniques, classified by face representation, are called "feature-based"; they use geometric facial features and the geometric relationships between them.

2.2.2 Face Verification Procedure

The general procedure of face verification is summarized in Figure 2.1.

Figure 2.1 The procedure of face verification

The ordinary face verification procedure, as described in Fig. 2.1, includes three major components: the Holistic Algorithm Model, the Matching Processor, and the Classifier. After some pre-processing, testing images are usually projected into lower-dimensional subspaces by using one of the holistic algorithms in the Holistic Algorithm Model. Common holistic algorithms include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and other newly developed holistic algorithms such as S-LDA. A minimal sketch of this projection step, using PCA, is given below.
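To make the projection step concrete, the following sketch learns a PCA subspace from flattened training images and projects a probe image into it. It is a minimal NumPy illustration under our own naming, not the implementation used in this thesis, and the reduced dimension of 40 is an arbitrary example value.

```python
import numpy as np

def fit_pca(train, d):
    """Learn a d-dimensional PCA subspace from flattened training images.

    train: array of shape (N, P), one image per row.
    Returns the ensemble mean and the top-d eigenvectors (shape (P, d)).
    """
    mean = train.mean(axis=0)
    centered = train - mean               # the ensemble mean becomes zero
    # The right singular vectors of the centered data are the eigenvectors
    # of the data's covariance matrix (the "eigenfaces").
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:d].T

def project(image, mean, basis):
    """Project a flattened image onto the learned linear subspace."""
    return (image - mean) @ basis

# Usage: 100 training images of 64x64 pixels, reduced to 40 dimensions.
rng = np.random.default_rng(0)
train = rng.random((100, 64 * 64))
mean, basis = fit_pca(train, d=40)
probe = rng.random(64 * 64)
subspace_vector = project(probe, mean, basis)   # 40-dimensional vector
print(subspace_vector.shape)
```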
The subspace vector obtained in the Holistic Algorithm Model is then passed into the Matching Processor. In this processor, the subspace vector of the testing image is compared with the subspace vectors of the template images that are stored in the database. The output of the Matching Processor is a set of similarity values, usually obtained by measuring the similarity between the testing image vectors and the template image vectors. The last component is the Classifier, in which the similarity values are compared with preset thresholds so that a verification decision can be made (Kang et al. 2002).

2.3 Multi-Level Regional Voting System as Face Recognition Approach

Voting is a popular and important decision-making process. It has not only been used in daily social and political activities, but has also been used in many scientific studies. Voting usually operates in two ways: national and regional voting. In national voting, a candidate is selected directly by a simple majority of the entire voting population of the nation, while in regional voting the entire nation is divided into regions; the winner of the voting is determined by a majority of the winning regions, based on the winner-take-all principle (Chen and Tokuda 2003).

A $K$-level Electoral College (Regional Voting) is simply defined as follows: "the original nation/area is said to be the 1st level (level 1). This nation is then partitioned into 2nd-level regions. Each 2nd-level region is partitioned into 3rd-level regions, which are then partitioned into 4th-level regions, and so on up to the $K$th-level regions. The winner of each $K$th-level region is determined by a majority of its voting population. The winner of an $i$th-level region ($i = K-1, K-2, \ldots, 1$) is determined by the majority of the winning $(i+1)$th-level regions that the $i$th-level region was partitioned into, based on the winner-takes-all principle." (Chen, L., "Theory of Multi-Level Electoral College for Multi-Candidate Elections and Electoral College Based Face Recognition Surveillance & Intelligent Textual Information Retrieval Systems", NSERC Discovery Grant proposal, 2005.)

A voting scheme has also been introduced into the face identification field, where it showed outstanding performance compared with other existing face identification models. Faltemier et al. used multiple regions of the face for matching, trying to reduce the effects caused by expression variation between gallery and probe images, and their experimental results demonstrated improved performance (Faltemier et al. 2006). Artiklar et al. developed a face recognition system using local voting networks, which combine local distance computations and a voting scheme (Artiklar et al. 2003). Chen et al. recently demonstrated that the regional voting scheme can be used as a general framework to significantly improve the performance of all holistic algorithms in face identification systems (Chen and Tokuda 2009).

Chapter 3 Approach/Methods

First, this chapter formulates a model of the Regional Voting face verification system. Second, the evaluation method is presented.

3.1 Proposed Procedure of Face Recognition/Verification

3.1.1 Face Verification Procedure

The proposed face verification procedure is developed based on the original face verification procedure.
It includes four major components: the Regional Scheme, the Lower Dimensional Space, the Matching Processor, and the Voting/Scoring Model; and two databases: the Database of Gallery Regional Subspace Vectors and the Database of Regional Thresholds.

Figure 3.1 Proposed Training Procedure

Before constructing the new face verification process, we first have to establish the two new databases. As shown in Fig. 3.1, in order to establish them, we first collect training data that include both the gallery images and the training images. Under the regional voting scheme, the gallery images are first divided into non-overlapping regions in the Regional Scheme. A holistic algorithm is then used to obtain subspace vectors and to project the regional vectors into the lower-dimensional subspace. Finally, those regional subspace vectors are saved in the Database of Gallery Regional Subspace Vectors. Meanwhile, the training images are also divided into regions and projected into lower-dimensional subspaces. The regional subspace vectors of the training images and of the corresponding gallery images are then input to the Threshold Generator, where the threshold of each region is generated and saved in the Database of Regional Thresholds. The principle of the Threshold Generator is discussed in section 3.1.3.

Figure 3.2 The procedure for face verification with regional voting/scoring scheme

In the proposed face verification procedure (Fig. 3.2), a testing image is first partitioned into non-overlapping regions, and the subimage of each region is represented by a raw vector. The vector of each region is then projected into the lower-dimensional space to generate a regional subspace vector. After the regional subspace vectors are obtained, the Matching Processor measures the similarity between each of the testing image's regional vectors and the corresponding gallery image's regional vector by calculating the Euclidean distance between the two vectors. Finally, in the Voting/Scoring Model, the regional similarity values are compared to the stored regional thresholds. If the similarity distance is less than or equal to the threshold, the region gets a vote/score of 1; otherwise, it gets a vote/score of 0. Since we are using the Two-level Regional Voting Scheme, after the voting/scoring for each region we obtain the total votes for the whole image (the sum of the votes over all regions of one image). Another threshold is then needed to classify the whole image. For example, take a 3x3 division and assume we set the image-level threshold to 3; if 4 out of the 9 regions vote 1 (i.e., the score is 4), then this image is verified as the identity it declared.

3.1.2 Regional Scheme

The Regional Scheme is described in the following way: the nation (1st level) is represented as a rectangular area of $n \times m$ (where $n, m \in \mathbb{Z}^+$) unit cells. The nation is divided into $C \times C$ equally shaped rectangles, called regions (2nd level), of size $r_i \times r_j$ (where $n$ and $m$ are divisible by $r_i$ and $r_j$, and $r_i$ and $r_j$ are positive integers). Each 2nd-level region can in turn be considered a nation and divided into $C \times C$ equally shaped rectangles, called 3rd-level regions, of size $t_i \times t_j$ (where $r_i$ and $r_j$ are divisible by $t_i$ and $t_j$, and $t_i$ and $t_j$ are positive integers), as shown in Fig. 3.3 (Chen and Tokuda 2005). By repeating the above steps, the nation can be partitioned into $K$ ($K > 0$) levels of regions. A sketch of this partitioning is given below.
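As an illustration of the regional division, the sketch below partitions an image into a grid of non-overlapping rectangular regions and then partitions each region again to obtain a third level. This is a minimal sketch under our own naming and assumes the divisibility conditions of this section; it is not the thesis code.

```python
import numpy as np

def partition(image, rows, cols):
    """Split an n x m image into rows x cols non-overlapping regions.

    Assumes n is divisible by rows and m by cols, matching the
    divisibility condition of Section 3.1.2. Regions are returned
    left-to-right, top-to-bottom.
    """
    n, m = image.shape
    assert n % rows == 0 and m % cols == 0
    ri, rj = n // rows, m // cols                 # region size r_i x r_j
    return [image[a * ri:(a + 1) * ri, b * rj:(b + 1) * rj]
            for a in range(rows) for b in range(cols)]

# Two-level division: a 64x64 image split into a 4x4 grid of 16x16 regions.
img = np.arange(64 * 64, dtype=float).reshape(64, 64)
regions = partition(img, 4, 4)
print(len(regions), regions[0].shape)             # 16 regions of (16, 16)

# Multi-level division: partition each 2nd-level region into 2x2 subregions.
level3 = [partition(r, 2, 2) for r in regions]
```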
Figure 3.3 Two-Level Regional Division vs. Multi-Level Regional Division (left: a nation of $n \times m$ cells divided into 3x3 second-level regions of size $r_i \times r_j$; right: each second-level region further divided into 3x3 third-level regions of size $t_i \times t_j$)

3.1.3 Regional Thresholds Generator

As explained in section 3.1.1 and Fig. 3.1, the "threshold" plays a key role in the proposed procedure. There are two kinds of thresholds involved: a threshold for each region and an overall threshold covering all regions. Assuming that we have one threshold for all subjects in each region and one overall threshold for all regions of each subject, then for a regional scheme of $M$ by $N$ regions there will be $MN + 1$ thresholds. Different thresholds will result in different sensitivity and specificity. We can expect that, when $MN$ out of the $MN + 1$ thresholds are set, an ROC curve of specificity vs. sensitivity can be drawn by varying the value of the remaining threshold. A proper set of thresholds needs to be selected from the $MN$ regional thresholds so that the AUC of the ROC is sub-optimized. We will also suggest different thresholds for different subjects. In the following subsections, we develop four methods for generating the threshold of each region to ensure good performance of the Voting/Scoring model. These four methods are neither the only ways nor necessarily the best ways to generate thresholds, and we believe that better ways can be developed later.

3.1.3.1 One Threshold for All Subjects (Method 1, 0/1 Voting)

For the training phase, we construct a group of training images $G_{training}$ which are used to generate the thresholds.

1. Divide the training images into regions, compare them to the gallery images, and obtain the similarity distance $S_{n,m}$ of each region (the Euclidean distance between the training region and the gallery region), where $n$ indexes the subjects and $m$ indexes the regions.

2. Find $M_{n,m} = \min(S_{n,m})$, and take each $M_{n,m}$ as a candidate threshold $T_{n,m}$.

3. Use the regional similarity distances $S_{n,m}$ again to compare against these $T_{n,m}$, and obtain the voting/scoring matrices by employing the Voting/Scoring scheme, so that the specificity associated with each candidate threshold is calculated.

4. Find a $T_{n,m}$ whose specificity matches the preset value $\mu$ (e.g., for the preset value $\mu = 0.90$, find where the specificity = 0.90), and save this $T_{n,m}$ as the candidate threshold of gallery image $I_n$'s $m$th region.

In order to find the regional thresholds for the gallery images in the database, we set the preset value $\mu$ to 0.90, 0.80, 0.70, 0.60, and 0.50 (note: the preset value $\mu$ is the same for all regions of all subjects). Among these preset values we find the one with the best performance (i.e., we draw the ROC curves of the final results for each preset value $\mu$ and find the one with the largest AUC value), and save the corresponding threshold as the regional threshold of its gallery image in the database.

3.1.3.2 One Threshold for Each Subject (Methods 2 & 3, 0/1 Voting)

Let each subject have its own group of training images ($n$ = number of subjects).

1. Again divide all of the images into regions, and

2. Calculate the similarity between the training image and the gallery image for each region. The regional similarity distances $SC_{n,m}$ ($n$ = number of subjects, $m$ = number of regions) are obtained.

3. Hence, each $SC_{n,m}$ can be considered a candidate threshold $T_{n,m}$ of gallery image $I_n$'s $m$th region.

4. The number of training images varies when the number of gallery images changes.
In order to obtain the same number of candidate thresholds $T_{n,m}$ for training groups of different sizes, we take $SC_{n,m}$ at intervals of $Y$ as our candidate thresholds, where $Y = t / r$ ($t$ = number of subjects x size of the training group, and $r$ is any number that divides $t$ exactly).

So far we have only collected the candidate thresholds. To find, among these candidates, the threshold with the best performance to serve as the threshold of the gallery images in the database, two methods are introduced in the following.

• Method 2: each time, choose the thresholds for all regions from the same entry of the different candidate regional threshold sets, and then save the threshold set with the best performance (i.e., with the largest AUC value) as the thresholds $T_m$ ($m$ denoting the $m$th region of gallery image $I_n$) in the database (Fig. 3.4).

Figure 3.4 One threshold for each subject (Method 2): for each region $m$ of gallery image $I_1$, the candidate list $SC_{I_1,m}(1), \ldots, SC_{I_1,m}(t)$ is scanned, and the same entry index is used across all regions. (Note: $t$ = number of subjects x number of training images per subject.)

• Method 3: find the threshold with the best performance (i.e., with the largest AUC value) for each region independently, and then save it as the threshold $T_m$ of gallery image $I_n$'s $m$th region in the database (Fig. 3.5).

Figure 3.5 One threshold for each subject (Method 3): for each region of gallery image $I_1$, the candidate with the best performance is selected independently.

Theoretically, models using Method 3 should perform better than models using Method 2, and both Method 2 and Method 3 should outperform Method 1.

3.1.3.3 Weighted Voting in Regions (Method 4)

In order to compare the regional voting scheme thoroughly with the national voting scheme, we also employed a fourth method, regional weighted voting, in our experiment. We have mentioned previously that the most important concept of our proposed face verification procedure is the "threshold." In Method 4, weighted voting can be viewed as setting all of the regional thresholds to 0 (since we use Euclidean distance to measure similarity) and using the similarity value in each region as the weight. To implement this method, we simply sum the similarity distance values $S_j$ of all regions to form a score value $S$ for an image $I$. We believe that Method 4 should have the best performance among the four methods we have proposed, because it excludes the error that may be introduced by the thresholds.

3.1.4 Voting/Scoring Scheme

The Voting/Scoring Scheme is constructed as in Fig. 3.6. Each input regional vector of the testing image is matched with its corresponding gallery regional vector by calculating the similarity between the vectors. The corresponding regional threshold is then used to determine whether the input regional vector is classified as a match: the region gets a vote/score of 1 if it is, and 0 otherwise. This process is repeated for all regions, and all the votes/scores received by each image in the database are tracked. Once the voting/scoring is done, a total score for each image is obtained by summing the votes/scores of all regions. Then, by employing another threshold, we can determine whether the whole input image is classified. A compact sketch of this two-level scheme follows.
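The sketch below implements the 0/1 voting just described: each region votes by comparing the Euclidean distance between its testing and gallery subspace vectors against the stored regional threshold, the votes are summed, and an image-level threshold makes the final decision. It is a minimal sketch with our own function names and example values; the experiments in Chapter 4 replace the image-level threshold with majority voting over the whole gallery.

```python
import numpy as np

def region_vote(test_vec, gallery_vec, threshold):
    """One region's 0/1 vote: 1 when the Euclidean distance between the
    regional subspace vectors is within the regional threshold."""
    return 1 if np.linalg.norm(test_vec - gallery_vec) <= threshold else 0

def verify(test_regions, gallery_regions, regional_thresholds, total_threshold):
    """Two-level regional voting: sum the regional votes and accept the
    claimed identity when the total reaches the image-level threshold."""
    votes = sum(region_vote(t, g, th)
                for t, g, th in zip(test_regions, gallery_regions,
                                    regional_thresholds))
    return votes >= total_threshold

# Usage with a 3x3 division (9 regional vectors) and an image-level
# threshold of 3, as in the example of Section 3.1.1.
rng = np.random.default_rng(1)
gallery = [rng.random(40) for _ in range(9)]
test = [g + rng.normal(0.0, 0.02, 40) for g in gallery]
thresholds = [0.3] * 9
print(verify(test, gallery, thresholds, total_threshold=3))   # True
```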
Figure 3.6 Two-Level Regional Voting/Scoring Scheme (a testing image and a gallery image, each partitioned 3x3; the similarity between each testing region and the corresponding gallery region is calculated and compared with the regional threshold to produce a vote/score of 1 or 0; the testing image is verified as the gallery image if the total vote passes the image-level threshold)

3.1.5 Shifting

In order to provide robustness to small amounts of shift, a shift process is normally applied when computing the distance between a testing image region and the gallery image regions. In our study, we use shifts of up to 2 steps in each of the four directions (north, south, east, and west), which gives a 5x5 grid of offsets, i.e., 25 shifts in total for each region, and we record the smallest distance (Artiklar et al. 1999).

Figure 3.7 A 2-step shift in each direction makes 25 shifts in total.

3.2 Performance Measures

3.2.1 Reviews

3.2.1.1 False Reject Rate and False Accept Rate

The false accept rate (FAR) is the probability that the system incorrectly matches the input pattern to a non-matching template in the database. It measures the percentage of invalid inputs that are incorrectly accepted (Biometrics, Wikipedia).

The false reject rate (FRR) is the probability that the system fails to detect a match between the input pattern and a matching template in the database. It measures the percentage of valid inputs that are incorrectly rejected (Biometrics, Wikipedia).

The result of classification, obtained by varying a threshold, can be represented in a confusion matrix as shown in Tab. 3.1.

Table 3.1 Confusion Matrix

                      Test Outcome Positive   Test Outcome Negative
Condition Positive    True Positive           False Negative
Condition Negative    False Positive          True Negative

TP (true positive), FP (false positive), TN (true negative), and FN (false negative) represent the numbers of examples falling into each possible outcome. Consistent with the definitions above, the False Reject Rate (FRR) and the False Accept Rate (FAR) are defined as (Biometrics, Wikipedia):

$\mathrm{FRR} = \dfrac{FN}{TP + FN}$   (Equation 1)

$\mathrm{FAR} = \dfrac{FP}{FP + TN}$   (Equation 2)

3.2.1.2 Sensitivity and Specificity

Sensitivity and specificity are statistical measures of the performance of a binary classification test. The sensitivity (or true positive rate/recall rate) measures the proportion of actual positives that are correctly identified, and the specificity (or true negative rate) measures the proportion of negatives that are correctly identified. A theoretical, optimal predictor can achieve 100% sensitivity and 100% specificity (Altman and Bland 1994).

Table 3.2 Sensitivity/Specificity

                          Condition Positive   Condition Negative
Test Outcome Positive     True Positive        False Positive
Test Outcome Negative     False Negative       True Negative
                          (Sensitivity)        (Specificity)

Sensitivity and Specificity are defined as follows:

$\mathrm{Sensitivity} = \dfrac{TP}{TP + FN} = 1 - \mathrm{FRR}$   (Equation 3)

$\mathrm{Specificity} = \dfrac{TN}{TN + FP} = 1 - \mathrm{FAR}$   (Equation 4)

From Equations 3 and 4 (Biometrics, Wikipedia), we notice that sensitivity/specificity is closely related to FAR/FRR. Therefore, the sensitivity/specificity curve pair works the same way as the FAR/FRR curve pair and is suited to setting an optimal threshold for the biometric system. The higher the acceptance threshold, the lower the sensitivity; raising the acceptance threshold, however, also raises the specificity. Therefore, most practical biometric systems do not adjust the threshold parameter so that Sensitivity = Specificity. The goal must be to have as large a sensitivity as possible for any given specificity, and vice versa; i.e., to compare sensitivities at a common specificity, and vice versa (Biometrics FAQ, Bioidentification). A sketch computing these measures is given below.
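To make these measures concrete, the sketch below computes FRR, FAR, sensitivity, and specificity from the confusion-matrix counts of a distance-based verifier. The counts, names, and the acceptance rule (accept when distance <= threshold) are our own illustrative assumptions, not the thesis code.

```python
import numpy as np

def confusion_counts(genuine, impostor, threshold):
    """Confusion-matrix counts for a distance-based verifier: a trial is
    accepted (test outcome positive) when its distance <= threshold."""
    tp = int(np.sum(genuine <= threshold))    # genuine claims accepted
    fn = int(np.sum(genuine > threshold))     # genuine claims rejected
    fp = int(np.sum(impostor <= threshold))   # impostor claims accepted
    tn = int(np.sum(impostor > threshold))
    return tp, fn, fp, tn

def rates(tp, fn, fp, tn):
    frr = fn / (tp + fn)                      # Equation 1
    far = fp / (fp + tn)                      # Equation 2
    sensitivity = 1.0 - frr                   # Equation 3
    specificity = 1.0 - far                   # Equation 4
    return frr, far, sensitivity, specificity

# Usage: 1000 genuine and 1000 impostor distance samples, one threshold.
rng = np.random.default_rng(2)
genuine = rng.normal(0.5, 0.2, 1000)          # distances for true claims
impostor = rng.normal(1.5, 0.4, 1000)         # distances for impostor claims
print(rates(*confusion_counts(genuine, impostor, threshold=1.0)))
```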
3.2.2 Experimental Results Evaluation

Our experimental results are evaluated based on the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), sensitivity, and specificity. The Receiver Operating Characteristic (ROC) space here is defined with specificity and sensitivity as the x and y axes, respectively, which depicts the relative trade-off between true positives and true negatives.

3.2.2.1 Receiver Operating Characteristic (ROC)

The Receiver Operating Characteristic (ROC, Wikipedia) plots sensitivity values directly against specificity values. In general, the matching algorithm makes a decision based on a threshold which determines how close to a template the input needs to be for it to be considered a match. If the threshold is set low, there will be a lower FRR and specificity but a higher FAR and sensitivity. Correspondingly, a higher threshold will increase the FRR and specificity but reduce the FAR and sensitivity. The ROC is limited to values between 0 and 1 on the x axis (specificity) and the y axis (sensitivity). It has the following characteristics:

1. The ideal ROC only has values that lie either on the x axis (specificity) or the y axis (sensitivity); i.e., when the sensitivity is 1, the specificity is 0, or vice versa.

2. The highest point, for all systems, is given by Specificity = 0 and Sensitivity = 1.

3. The ROC curve cannot increase.

Since the ROC is independent of threshold scaling, it can be used to effectively compare different systems (Biometrics FAQ, Bioidentification).

3.2.2.2 Area Under the ROC Curve (AUC-ROC)

The experimental results of the proposed model are evaluated according to the arithmetic mean of the so-called Area Under the Curve (AUC) (Fogarty et al. 2005). The AUC corresponds to the area under an ROC curve obtained by plotting sensitivity against specificity while varying a threshold on the prediction value that determines the classification result. Like the ROC curve, the AUC is independent of threshold scaling, is limited to values between 0 and 1 on the x axis (specificity) and the y axis (sensitivity), and has the same characteristics as the ROC. Therefore, the AUC-ROC statistic is often used for model comparison (Hanley and McNeil 1983).

Figure 3.8 Area Under the ROC Curve ("KDD Cup 2009", http://www.kddcup-orange.com/evaluation.php)
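Continuing the sketch of Section 3.2.1, the ROC of this section can be traced by sweeping the acceptance threshold, and the AUC estimated with the trapezoidal rule by integrating sensitivity over specificity, the axes used in Fig. 3.8. The genuine/impostor distance samples are again our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
genuine = rng.normal(0.5, 0.2, 1000)      # distances for true claims
impostor = rng.normal(1.5, 0.4, 1000)     # distances for impostor claims

sens, spec = [], []
for th in np.linspace(0.0, 3.0, 301):     # sweep the acceptance threshold
    sens.append(float(np.mean(genuine <= th)))    # TP / (TP + FN)
    spec.append(float(np.mean(impostor > th)))    # TN / (TN + FP)

# Integrate sensitivity over specificity (the ROC axes of this thesis,
# cf. Fig. 3.8), using the trapezoidal rule.
order = np.argsort(spec)
auc = np.trapz(np.array(sens)[order], np.array(spec)[order])
print(f"AUC = {auc:.4f}")
```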
Chapter 4 Experiments and Results

This chapter focuses on presenting the experiments that have been conducted using the Two-level Regional Voting/Scoring face verification procedure proposed in Chapter 3. The research procedures are demonstrated, and the results, accompanying analysis, and evaluation are covered as well.

4.1 Data Sets

Face recognition is one of the most popular research areas in computer vision and machine learning. While many face recognition algorithms have been developed, a large number of face databases, which are necessary to comparatively evaluate these algorithms, have been collected. Since many databases are currently in use, the choice of an appropriate database should usually be made based on the given task. In my experiments I chose the ORL Database of Faces (also known as the AT&T "Database of Faces") and the Yale Face Database as the experimental databases.

4.1.1 The ORL Database of Faces

The ORL face dataset (http://www.cl.cam.ac.uk/Research/DTG/attarchive/facedatabase.html) consists of images of 40 subjects, with 10 grayscale images (92x112) per subject, with random variations in facial expression, pose, and lighting, which amount to a total of 400 faces. The standard task for this set is to identify which individual is present in a given image, based on some number of training examples. Because there are 40 individuals, theoretical chance (i.e., from guessing) is 1/40, or 2.5%. Following the proposed experiment, classifiers were trained using 2, 5, and 8 training examples per individual (reserving the remaining 8, 5, and 2, respectively, for testing), with 50 random splits per configuration.

Figure 4.1 ORL Database of Faces

4.1.2 The Yale Face Database

The Yale data set (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) is a very small face benchmark. It contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. As with the ORL set, the standard task is to identify the individual on the basis of some number of training examples. Theoretical chance is 1/15, or 6.67%. As with the ORL dataset, the proposed classifiers were trained with 2, 5, or 8 training examples per individual (reserving the remaining 9, 6, or 3 images, respectively, for testing), with 50 random splits per configuration.

Figure 4.2 Yale Face Database

4.2 Experiment

We use the UIUC versions (http://www.cs.uiuc.edu/homes/dengcai2/Data/FaceData.html) of the ORL and Yale databases that are available online; all images are already aligned and cropped in a standard way, which allows comparison with future work by other researchers. For the experiments, we embed the holistic PCA algorithm and the newly developed holistic algorithm S-LDA into the Two-Level Regional Voting Scheme. The implementation code of both algorithms is available at UIUC. We chose the number of reduced dimensions to be $\min(M, S) - 1$ for the PCA approach, where $M$ is the total number of images and $S$ is the number of pixels in each image (Chen and Tokuda 2009). We conduct experiments on these data sets with cropped face images of size 64x64 pixels, each with 256 grey levels per pixel. We use all 50 random splits available in the UIUC versions to test the performance of our proposed systems.

In the Two-level Regional Voting Scheme defined in Fig. 3.2, and the Two-level Regional Voting Model set up in Fig. 4.3, we mentioned that there would be two threshold sets for each image: $T_I$ (the regional thresholds) and $T_{total}$ (the total threshold). In the experiments, however, we only involve the thresholds for each region, and do not treat the thresholds for whole images. The reason is that we believe the threshold for each region has the major influence on the recognition results, and that finding the threshold for each region is more difficult than finding the threshold for the whole image. Therefore, in order to implement the proposed procedure without the total threshold, in our experiments we match each input image against all the gallery images in the database, so that, after obtaining the total score over all regions for each gallery image, we can use majority voting to classify the input image (i.e., find the gallery image with the most votes). Different from the other three methods, Method 4 uses a scoring matrix of summed similarity distances instead of a voting matrix; in Method 4 we therefore find the image with the minimum score rather than the image with the most votes. This decision rule is sketched below.
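A compact sketch of this classification rule, under our own naming: the vote totals obtained against every gallery image are compared, and the claim is verified when the claimed gallery image wins; for Method 4, the summed similarity distances are minimized instead.

```python
import numpy as np

def classify_by_votes(vote_totals):
    """Methods 1-3: the gallery image with the most regional votes wins."""
    return int(np.argmax(vote_totals))

def classify_by_score(score_totals):
    """Method 4: the gallery image with the smallest summed distance wins."""
    return int(np.argmin(score_totals))

def verified(claimed_index, vote_totals):
    """The claim is accepted when the claimed gallery image wins the vote."""
    return classify_by_votes(vote_totals) == claimed_index

# Usage: vote totals of one testing image against 5 gallery subjects.
votes = [12, 87, 30, 25, 41]          # subject 1 collects the most votes
print(verified(claimed_index=1, vote_totals=votes))   # True
```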
4.2.1 Setup Mode

Our experimental model is constructed based on the proposed face verification procedure in Figs. 3.1 and 3.2. The general idea is:

1. For data set $D$, each image $I \in D$ is divided into several regions $I_R = \{I_1, I_2, \ldots, I_n\}$, $n \in \mathbb{Z}^+$ (note: $n$ denotes the number of partitions of one image; for example, if image $I$ is divided into 10x10 partitions, $n = 10 \times 10 = 100$);

2. A corresponding threshold $T_I = \{T_1, T_2, \ldots, T_n\}$ for the regions and a threshold $T_{total}$ for each image are generated.

3. When a testing image $I_{testing}$ passes through this model, it is divided into regions as well and projected into the lower-dimensional subspace using the holistic algorithms.

4. After calculating the similarity distance for each region between the testing image $I_{testing}$ and the template $I$, a similarity set $S_R(I_{testing}, I) = \{S_1(I_{testing,1}, I_1), S_2(I_{testing,2}, I_2), \ldots, S_n(I_{testing,n}, I_n)\}$, $n \in \mathbb{Z}^+$, is obtained.

5. Compare the similarities to the thresholds $T_I$: for each component of $S_R(I_{testing}, I)$, if $S_i(I_{testing,i}, I_i) \le T_i$ then $V_i = 1$, else $V_i = 0$.

6. Finally, sum up the votes of all regions: $V_{total} = \sum_{i=1}^{n} V_i$. Repeat the steps until the testing image has been compared with all of the template images in the database.

7. According to the Two-level Regional Voting Model set up in Fig. 4.3, compare the values of $V_{total}$ to each other and find the largest $V_{total}$ (note: in Method 4, we have to find the smallest value); the testing image is then classified.

8. By matching the image to which the testing image is classified against the image that was declared, the testing image is verified.

Figure 4.3 Two-level Regional Voting Model Setup

4.2.2 Model Training

In model training, the major tasks are to establish the database of gallery regional vectors and to generate the corresponding threshold for each gallery region. Creating the gallery regional vectors simply means dividing the gallery images into regions and then saving the lower-dimensional subspace vectors of the regions in the database. In order to generate the corresponding thresholds, training data are required. To set up the training data, we split the testing examples of each subject in the testing dataset $D$ into two groups: the training group $G_{train}$ and the testing group $G_{test}$, as shown in Fig. 4.4 and Tab. 4.1.

Figure 4.4 Training Data Collecting (ORL example: of the 10 grayscale images per subject, two images are randomly picked as the gallery examples; the testing set is split into a training group of 5 images, used to generate the thresholds, and a testing group of 3 images)

Table 4.1 Training Data Collection

                 Gallery Examples   Testing Examples   Training Group   Testing Group
ORL   2 Train          2                  8                  5                3
      5 Train          5                  5                  3                2
      8 Train          8                  2                  1                1
Yale  2 Train          2                  9                  6                3
      5 Train          5                  6                  4                2
      8 Train          8                  3                  2                1

After collecting the training data, we use the Regional Scheme to divide both the gallery images and the training images into regions (in the experiments, images are divided into 10x10 regions), and then project the regional vectors into the lower-dimensional subspace. The regional gallery image subspace vectors are saved in the database. Meanwhile, in the Threshold Generator (Fig. 3.1), we use the regional subspace vectors of the training images to generate the regional thresholds for the corresponding gallery images, employing the methods proposed in section 3.1.3. A sketch of Method 1's threshold generation is given below.
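As an illustration of the Threshold Generator, the sketch below follows the spirit of Method 1 (Section 3.1.3.1): for one region, the candidate thresholds are scanned, and the largest one whose specificity on the training distances still meets the preset value mu is kept. The genuine/impostor distance arrays and all names are our own illustrative assumptions rather than the thesis implementation.

```python
import numpy as np

def region_threshold(genuine_dists, impostor_dists, mu=0.90):
    """Pick the largest candidate threshold whose specificity on the
    training data is at least the preset value mu (Method 1 style)."""
    candidates = np.sort(np.concatenate([genuine_dists, impostor_dists]))
    best = candidates[0]
    for th in candidates:
        specificity = np.mean(impostor_dists > th)   # impostors rejected
        if specificity >= mu:
            best = th            # raise the threshold while mu still holds
        else:
            break                # specificity only decreases from here
    return best

# Usage: one threshold per region for a 10x10 regional scheme.
rng = np.random.default_rng(3)
thresholds = []
for region in range(100):
    genuine = rng.normal(0.5, 0.2, 200)    # distances to the true subject
    impostor = rng.normal(1.5, 0.4, 200)   # distances to other subjects
    thresholds.append(region_threshold(genuine, impostor, mu=0.90))
print(len(thresholds), round(float(np.mean(thresholds)), 3))
```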
4.2.3 Model Testing

We have constructed two testing models by employing the two face recognition approaches, PCA and S-LDA, respectively, in our experimental model. The input of these models is a testing image $I_{testing}$ from the testing group $G_{testing}$. The testing image $I_{testing}$ is divided into 10x10 partitions, the same as the template image $I_{gallery}$. As mentioned in previous sections, the receiver operating characteristic (ROC) curve is one of the effective methods for comparing the efficiency of different systems. In order to employ the ROC in our evaluation of the testing results, each model gives outputs represented by pairs of sensitivity/specificity values and by AUC values. The AUC values are the areas under the ROC curves constructed by plotting sensitivities against specificities.

4.3 Experiment Results Analysis and Discussions

4.3.1 Generating Thresholds

4.3.1.1 One threshold for all subjects (Method 1, 0/1 Voting)

We preset the value of $\mu$ to 0.90, 0.80, 0.70, 0.60, and 0.50, respectively, for each region, so that there are 5 sets of thresholds for each image $I$. Using the AUC values, we find the thresholds with the best performance.

Figure 4.5 Comparison of Different Regional Thresholds (Method 1): ROC curves of sensitivity vs. specificity on ORL (2T, PCA) for the preset specificities 0.90 through 0.50

Fig. 4.5 gives an example of the ROCs for different preset values of $\mu$ on database ORL, with 2 training examples, using the PCA algorithm. The differences between the AUCs are tiny, but the figure still demonstrates that when the value of $\mu$ is set to 0.90, a larger AUC can be obtained. To confirm the conclusion drawn from Fig. 4.5, we extended our experiment to the databases ORL and Yale with 2, 5, and 8 training examples, respectively. The results are shown in Tab. 4.2.

Table 4.2 AUC Results of Different Regional Thresholds (Method 1)

                   AUC:   Spe=0.90   Spe=0.80   Spe=0.70   Spe=0.60   Spe=0.50
2T   ORL   PCA#1          0.9604     0.9595     0.9588     0.9583     0.9552
     ORL   LDA#1          0.9449     0.9412     0.9360     0.9404     0.9379
     YALE  PCA#1          0.9522     0.9537     0.9526     0.9496     0.9462
     YALE  LDA#1          0.9518     0.9552     0.9484     0.9508     0.9506
5T   ORL   PCA#1          0.9944     0.9946     0.9937     0.9922     0.9891
     ORL   LDA#1          0.9921     0.9926     0.9917     0.9891     0.9858
     YALE  PCA#1          0.9755     0.9732     0.9668     0.9668     0.9695
     YALE  LDA#1          0.9783     0.9742     0.9752     0.9803     0.9745
8T   ORL   PCA#1          0.9968     0.9946     0.9978     0.9978     0.9912
     ORL   LDA#1          0.9963     0.9954     0.9933     0.9893     0.9961
     YALE  PCA#1          0.9819     0.9778     0.9704     0.9700     0.9671
     YALE  LDA#1          0.9787     0.9870     0.9823     0.9771     0.9750

4.3.1.2 One threshold for each subject (Method 2, 0/1 Voting)

Following the procedure of Method 2, we set $r = 40$ for database ORL and $r = 30$ for database Yale. By taking candidate regional thresholds at intervals of $Y = t / r$, we obtain 40 sets of candidate regional thresholds for database ORL and 30 sets for database Yale. Tab. 4.3 shows the threshold sets with the relatively best AUC values for the different databases, different numbers of gallery images, and different algorithms.
Table 4.3 Threshold Set with Relatively Best AUC Result (Method 2)
(10x10, with one threshold for each subject)

Database   Algorithm   Training Set   Threshold No.   AUC
ORL        PCA         2T             14              0.9336
                       5T             8               0.9738
                       8T             11              0.9736
ORL        LDA         2T             14              0.9119
                       5T             9               0.9561
                       8T             16              0.9524
YALE       PCA         2T             3               0.9235
                       5T             5               0.9435
                       8T             6               0.9380
YALE       LDA         2T             5               0.9256
                       5T             4               0.9458
                       8T             3               0.9372

4.3.1.3 One threshold for each subject (Method 3, 0/1 Voting)

As in Method 2, in Method 3 we obtain 40 sets of candidate thresholds for database ORL and 30 sets for database Yale. The experimental results of Method 3 are shown in Tab. 4.4.

Table 4.4 Threshold Set with Relatively Best AUC Result (Method 3)
(10x10, with one threshold for each subject)

DataSet & Algorithm   Training Set   Specificity   Sensitivity   AUC
ORL PCA               2T             0.9000        0.8433        0.9392
                      5T             0.9000        0.9318        0.9756
                      8T             0.9000        0.9320        0.9764
ORL LDA               2T             0.9000        0.7898        0.9211
                      5T             0.9000        0.8749        0.9584
                      8T             0.9000        0.8555        0.9526
YALE PCA              2T             0.9000        0.8293        0.9266
                      5T             0.9000        0.8793        0.9445
                      8T             0.9000        0.8760        0.9430
YALE LDA              2T             0.9000        0.8307        0.9326
                      5T             0.9000        0.8757        0.9484
                      8T             0.9000        0.8720        0.9463

4.3.1.4 Conclusion

As shown in Fig. 4.5 and Tab. 4.2, we conclude that when the preset $\mu$ is 0.90, testing models using Method 1 perform better in most cases. For the testing models using Method 2 and Method 3, according to the results shown in Tab. 4.3 and Tab. 4.4, we believe that different ways of collecting candidate regional thresholds generate different regional thresholds for the same image, and we anticipate that different ways of generating thresholds lead to different performance for the same face verification system.

4.3.2 Comparison of Different Training Sets

Martinez et al. demonstrated that the size of the training data set sometimes affects the performance of different algorithms. For example, when the training data set is small, PCA can outperform LDA, and PCA is also less sensitive to differences in the training data set (Martinez and Kak 2001). Therefore, when comparing the performance of different training sets, we take PCA as an example. The results are shown in the following figures.

Figure 4.6 Comparison of different training sets (ORL, PCA, 1x1). (Note: a 1x1 curve denotes the performance of the system with the national voting approach, here and in the later figures.)

Figure 4.7 Comparison of different training sets (ORL, PCA, 10x10): ROC curves of sensitivity vs. specificity for 2T, 5T, and 8T, one panel per threshold method

As shown in Tab. 4.1 and Fig. 4.6, when the size of the training set increases, the size of the training group decreases, which directly affects the experimental results. Usually, when the size of the training set increases, the performance of the system improves; i.e., under the same conditions, the AUC of the database with 8 training examples is larger than the one with 5 training examples, and both are larger than the one with 2 training examples. In Fig. 4.7, the performance of the testing models improves as the number of training examples changes from 2 to 8 for Method 1 and Method 4, while for Method 2 and Method 3 we can hardly see any improvement in performance between 5 training examples and 8 training examples. This is caused by the re-collection of the training groups. In Tab.
4.1, when the number of training examples increases to 8 in database ORL, only one example is left to serve as the training group for generating the thresholds, while the other example has to be reserved for testing purposes. In this way, the range of threshold candidates is greatly reduced. The conclusion, therefore, is that the size of the training group used for generating thresholds also affects the performance of the system.

4.3.3 Database ORL vs. Database Yale

As we know, dataset ORL is a bigger database than Yale, and we believe that the size of the database should have some impact on the performance of our proposed models.

Table 4.5 Database ORL vs. Database Yale

Database   Total Image Examples   Number of Subjects   Image Examples per Subject
ORL        400                    40                   10
Yale       165                    15                   11

Figure 4.8 Database ORL vs. Yale (PCA & S-LDA, 2T, 1x1): ROC curves of sensitivity vs. specificity for ORL PCA, Yale PCA, ORL S-LDA, and Yale S-LDA

Figure 4.9 Database ORL vs. Yale (PCA & S-LDA, 2T, 10x10)

The accompanying results at a preset specificity of 0.9000 are as follows:

1x1 (national voting):

DataSet & Algorithm   Training Set   Specificity   Sensitivity   AUC
ORL PCA               2T             0.9000        0.0525        0.9430
                      5T             0.9000        0.9573        0.9828
                      8T             0.9000        0.9740        0.9898
ORL S-LDA             2T             0.9000        0.9335        0.9751
                      5T             0.9000        0.9963        0.9975
                      8T             0.9000        0.9990        0.9991
YALE PCA              2T             0.9000        0.6330        0.0373
                      5T             0.9000        0.7207        0.0556
                      8T             0.9000        0.7307        0.0374
YALE S-LDA            2T             0.9000        0.8080        0.9329
                      5T             0.9000        0.9380        0.9791
                      8T             0.9000        0.9560        0.9852

10x10 (two-level regional voting):

DataSet & Algorithm   Training Set   Specificity   Sensitivity   AUC
ORL PCA               2T             0.9000        0.9062        0.9604
                      5T             0.9000        0.9900        0.9944
                      8T             0.9000        0.9980        0.9978
ORL S-LDA             2T             0.9000        0.8977        0.9522
                      5T             0.9000        0.9865        0.9921
                      8T             0.9000        0.9945        0.9961
YALE PCA              2T             0.9000        0.8444        0.9449
                      5T             0.9000        0.9246        0.9755
                      8T             0.9000        0.9400        0.9819
YALE S-LDA            2T             0.9000        0.8677        0.9467
                      5T             0.9000        0.9467        0.9803
                      8T             0.9000        0.9560        0.9870