SCALE-SPACE AND WAVELET DECOMPOSITION BASED SCHEME FOR FACE RECOGNITION USING NEAREST LINEAR COMBINATION

by

Farhana Hoque

B.Sc., University of Dhaka, 2008
M.Sc., University of Dhaka, 2009

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA

February 2014

© Farhana Hoque, 2014

Abstract

Face recognition has attracted much attention from Artificial Intelligence researchers due to its wide acceptability in many applications. Many techniques have been suggested to develop a practical face recognition system that has the ability to handle different challenges. Illumination variation is one of the major issues that significantly affects the performance of face recognition systems. Among the many illumination-robust approaches, scale-space decomposition based methods play an important role in reducing the lighting effects in facial images. This research presents a face recognition approach utilizing both the scale-space decomposition and wavelet decomposition methods.

In most cases, the existing scale-space decomposition methods perform recognition based only on the illumination-invariant small-scale features. The proposed approach uses both large-scale and small-scale features through scale-space decomposition and wavelet decomposition. Together with the Nearest Linear Combination (NLC) approach, the proposed system is validated on different databases. The experimental results have shown that the system outperforms many recognition methods in the same category.

Contents

Abstract
List of Tables
List of Figures
Acknowledgement
1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Overview of the Thesis
2 Literature Review
  2.1 Face Recognition System
  2.2 Feature Extraction Approaches
    2.2.1 Local Feature based Approach
    2.2.2 Holistic Template or Appearance based Approach
  2.3 Classification Approaches
    2.3.1 Nearest Linear Combination
  2.4 Summary
3 Illumination Invariant Approaches and Wavelet Decomposition
  3.1 Statistical Approach
  3.2 Light Modelling Approach
    3.2.1 Illumination Cone Methods
    3.2.2 Quotient Image (QI) based Approaches
    3.2.3 Spherical Harmonic based Approaches
  3.3 The TV+L1 Model for Additive Signal Decomposition
    3.3.1 Parameter λ and Image Feature Scale
    3.3.2 Properties of the TV+L1 Model
    3.3.3 Application of the TV+L1 Model
  3.4 Wavelet Transform
  3.5 Summary
4 Proposed Approach
  4.1 Face Recognition System based on the Proposed Approach
    4.1.1 Training Stage
    4.1.2 Recognition Stage
  4.2 Pseudo-code of the Approach
  4.3 Summary
5 Experimental Results
  5.1 Experiments on Yale, ORL and PIE Datasets
    5.1.1 Datasets: Yale, ORL and PIE
    5.1.2 Results on Yale, ORL and PIE Datasets
    5.1.3 Experiment on FERET Dataset
  5.2 Computational Complexity
  5.3 Summary
6 Conclusions
Bibliography

List of Tables

5.1 Average error rates and STD on Yale Database with different Train/Test sizes.
5.2 Average error rates and STD on ORL Database with different Train/Test sizes.
5.3 Error rates on PIE Database with different Train/Test sizes.
5.4 Error rates on different Probe sets of FERET Database.

List of Figures

2.1 Face Recognition System.
3.1 Face images under different illumination conditions in CMU PIE Database.
3.2 Image decomposed into cartoon (u) and texture (v) using varying values of λ.
3.3 Extraction of signals with different scales by the TV+L1 model [109].
3.4 Input images and their v components as the result of the TV+L1 model.
3.5 Discrete Wavelet Transform (DWT) [65].
3.6 An image is decomposed into four images after one level DWT.
3.7 Two level wavelet decomposition [65].
4.1 Block diagram of the proposed face recognition system.
4.2 Extraction of features in different bands after scale-space decomposition of an image.
5.1 Images from Yale Database.
5.2 Face images of 3 persons in ORL Database.
5.3 Eight different face images of 3 persons in PIE Database.
5.4 Comparison of error rates of the NLC+LS with two other methods for Yale Database.
5.5 Comparison of error rates of the NLC+LS with two other methods for ORL Database.
5.6 Gallery image and different types of probe images of a subject from FERET Database.
5.7 Example of probe images from Fc probe set.

Acknowledgement

Firstly, I would like to express my sincere thanks to my supervisor, Dr. Liang Chen, for his continued support and guidance throughout my studies at the University of Northern British Columbia. Without his assistance I would not have been able to complete my thesis. Words cannot express my deepest sense of gratitude to him. I would also like to thank my supervisory committee members, Dr. Desanka Polajnar and Dr. Jianbing Li, for their guidance and encouragement.

I would like to take this opportunity to thank the University of Northern British Columbia for providing constant support throughout the last two years. I would like to thank Marva Byfield, Administrative Assistant of the Computer Science Department. I would also like to thank Dr. Jean Wang, HPC Senior Lab Instructor, for providing technical help on my research work. Also, I would like to thank all my friends, especially Dhawal, Negar, Adiba, Raman and Giridhar, for their support and help.

Finally, I give special thanks to my parents for their unconditional love and encouragement. It is only because of their endless help that I have reached this point. I cannot conclude without thanking my husband, Kaisur Rashed, for his encouragement and support throughout my studies.

Chapter 1

Introduction

1.1 Motivation

Face recognition is a major and well-known research area in computer vision. It is also closely related to pattern recognition, image processing, neural networks, computer graphics, and psychology [112]. It is a biometric method that is used to identify or verify persons from a digital image or video, based on different facial features.

Face recognition has several advantages over other biometric methods, which make it the most favourable choice for identifying people today. Face images can be easily captured from a distance by a camera without much involvement of the user, which is useful for security and surveillance purposes [50]. The presence of noise and variation of illumination in the obtained face images can be removed by using proper preprocessing and recognition algorithms. Due to these advantages, face recognition has many important applications in biometrics, access control, security [47], video surveillance, law enforcement [112], and entertainment [49].
The increasing demand for commercial and security applications has inspired researchers to focus on developing various machine face recognition systems. However, to design a robust and practical face recognition system, researchers have to overcome different limitations, such as variation in lighting or illumination, head pose, facial expression, accessories, aging, and so on [45]. The changes that occur in facial appearance under abnormal lighting conditions can severely degrade face recognition performance.

To solve the problem of face recognition under varying illumination, various approaches have been proposed in the past. Among these approaches, scale-space decomposition based methods have attracted a lot of attention because of their simplicity and efficiency. Scale-space decomposition methods roughly decompose a facial image into large-scale features (like illumination and cast shadows) and small-scale features (like eyes, nose, and mouth). These methods use only the illumination-insensitive small-scale features for recognition and ignore the large-scale features [45]. However, according to various studies [44] [106] [105], large-scale features are also useful for good recognition performance. The goal of this research is to use both large- and small-scale features of facial images for the purpose of recognition.

1.2 Contribution

The proposed approach is developed based on utilizing facial features of different scales. The major contribution of this research is that it uses a combination of scale-space decomposition and wavelet decomposition to extract the facial features from images. Then, for classification, the representation of an individual is generated as a linear combination of decomposed feature vectors by using the Nearest Linear Combination (NLC) approach. Results obtained from experiments on several benchmark databases have shown that the proposed approach is better than various standard face recognition methods. Also, the approach reduces the error rates of several scale-space decomposition based methods.

1.3 Overview of the Thesis

The research presented in this thesis is organized into several chapters. Chapter 2 contains the basic ideas and background of the face recognition problem. It also reviews the existing approaches that are related to the proposed method. Chapter 3 provides an overview of different illumination invariant approaches. It also introduces the scale-space decomposition and wavelet decomposition methods. Chapter 4 describes the design of the proposed methodology. Chapter 5 demonstrates the experimental procedure, the results, and a comparative study with other methods to evaluate the effectiveness of the proposed approach. Finally, Chapter 6 concludes the thesis, summarizes the proposed approach, and provides suggestions for future exploration.

Chapter 2

Literature Review

Many researchers have tried to use knowledge of the human recognition process in order to solve the automatic face recognition problem. Recent studies [14] [70] have found that the ability to recognize faces is an exceptional feature of our brains and that a special part of the brain is used to do this. However, researchers are still working to find accurate information about how the brain performs face recognition. It is useful to observe that when the special area of the brain responds to faces, many neural networks become involved in processing information from the faces.
This mechanism uses on the order of hundreds of billions of neurons; therefore, it may be impractical to implement the same process in a machine face recognition system. Furthermore, the human brain can easily recognize a face even if the appearance of the face is changed by illumination, expression, pose, or accessories, whereas an automatic face recognition system may not always be successful in such cases.

Considering these facts, it is very difficult to design a face recognition system that exactly simulates the human face recognition process. Instead, various approaches have been developed with different mathematical tools to compute the similarity between faces rather than relying on human-relevant features.

2.1 Face Recognition System

Face recognition systems simply take an image or video as input and identify the person or people in it. To recognize a face, a system can be designed using three steps [112], as shown in Figure 2.1: face detection, feature extraction, and face recognition, which outputs the identity of the person.

Figure 2.1: Face Recognition System.

The first stage involves locating and extracting all faces in the image using different algorithms. The performance of the face recognition partially depends on how well the faces are detected, so this step plays a vital role in recognition systems. Most recognition approaches are evaluated using image databases in which faces are already detected, aligned, and cropped from the original picture.

Machine recognition systems work with two sets of face images: first, a gallery image set, where the faces are known; second, a test or probe image that needs to be identified. The gallery image set contains one or more images for each person, with variation in expression, pose, lighting, or accessories.

The second step of the recognition system is designed to obtain significant features from the gallery face image set using various feature extraction techniques. These techniques represent the gallery images and the test image through lower dimensional feature vectors [67].

Finally, the system uses a classification algorithm that compares the gallery image representations and the test image vector (both obtained from the second step) to compute the similarity between them, and the recognition result is reported. Numerous face recognition approaches have been proposed based on combinations of the last two steps. The issues of different feature extraction, as well as classification algorithms, will be highlighted in this chapter.
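To make the three-stage structure of Figure 2.1 concrete, the following minimal Python sketch shows one possible interface between the stages. The function names and the use of NumPy arrays are illustrative assumptions, not part of any particular system discussed in this thesis; the benchmark databases used later already provide detected and cropped faces, so the detection stage is shown as a pass-through.

    import numpy as np

    def detect_face(image):
        """Stage 1: locate, align, and crop the face region.
        A real system would run a face detector here; the benchmark
        databases used in this thesis are already cropped and aligned."""
        return image

    def extract_features(face):
        """Stage 2: map the cropped face to a lower dimensional
        feature vector (raw pixels here as a placeholder)."""
        return face.astype(np.float64).flatten()

    def recognize(probe, gallery_vectors, gallery_labels):
        """Stage 3: compare the probe vector against all gallery
        representations and report the closest identity."""
        probe_vector = extract_features(detect_face(probe))
        distances = np.linalg.norm(gallery_vectors - probe_vector, axis=1)
        return gallery_labels[int(np.argmin(distances))]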
2.2 Feature Extraction Approaches

Humans can easily recognize a person even when that person is wearing sunglasses, a hat, or other accessories. The age of the person also does not affect the recognition ability of humans. All of these appear as challenging factors for machine face recognition systems, and their effects can be handled through the use of efficient feature extraction techniques. The goal of feature extraction is to obtain a small set of relevant features from a face, which will decrease the false positive ratio in classifiers [51]. Feature extraction is also useful for dimension reduction, which in turn reduces the computation time and memory requirements. Therefore, feature extraction is treated as a core part of the face recognition problem.

There are many algorithms for feature extraction, and these approaches are divided into two major categories: geometrical local feature based methods and holistic template matching or appearance based approaches [50] [17] [41] [46]. Holistic approaches use the overall representation of the whole facial image, while local feature based approaches use some prominent features of the face to perform recognition. An overview of these two approaches is given below.

2.2.1 Local Feature based Approach

In this approach, discrete facial features like eyes, mouth, nose, hair, and chin are extracted, and then geometric relationships including areas, distances, and angles among those features are measured [50] [63]. Face images are compared through these geometric feature representations using various standard recognition techniques [50] [76]. The Elastic Bunch Graph Matching (EBGM) method [102], Deformable Templates [110], and Hough Transform methods [74] are well-known local feature based approaches. An advantage of this scheme is that it is insensitive to pose, illumination, or size variation in the input images [50]. Although it provides a condensed feature representation of a face image that makes matching faster [16], the usefulness of these features depends on the efficiency of the feature detection stage [50] [30] [63]. In practice, the low accuracy and high computational complexity of the available feature detection techniques degrade the performance of local feature based approaches [57]. Hence, nowadays holistic approaches are extensively used in solving most face recognition problems.

2.2.2 Holistic Template or Appearance based Approach

This approach represents a face image as a two-dimensional array of intensity variations and compares the image to several templates of other faces to classify it [50]. In earlier solutions, face matching was performed using direct correlation based approaches. Due to the very high dimensional space, these approaches were not able to overcome the limitations of face size, orientation, lighting, and noise [50]. The common solution is to apply dimension reduction tools that extract the basis information required for classification. These projection methods create a lower dimensional face space, where faces of different classes reside in separate regions of that space. The general idea behind the dimension reduction method is stated below.

In projection based approaches, training data G can be represented as a matrix of N rows of mn-dimensional image vectors. After the dimension reduction step, a lower dimensional subspace denoted as P of size mn × k is obtained. To obtain the most significant information from the images, G is projected into the subspace as follows:

    X_p = G P    (2.1)

where X_p is an N × k matrix that is the basis of the low dimensional projection of G. From a test image Y of size m × n, we obtain Y_p = Y P. Finally, a comparison between Y_p and X_p is performed using some measure to find the highest similarity.

Many statistical projection based procedures are used as feature extraction methods, such as PCA [98], LDA [12], ICA [10], weighted PCA [101], two-dimensional PCA [108] [64], kernel PCA [71], and so on. Some feature extraction approaches, like the Discrete Cosine Transform [29] and the Discrete Wavelet Transform [60], are used to downsample images into lower resolution components.
Mostly, these feature extraction approaches are used as part of face recognition systems with different classification techniques. The basic principles of Principal Component Analysis (PCA), which is used in the proposed approach to extract features as well as to reduce dimension, are explained in the following section.

2.2.2.1 Principal Component Analysis

Principal Component Analysis (PCA) is a common statistical technique for deriving features from multi-dimensional data so that the features contain most of the variance of the data. In addition to feature extraction, PCA is also used to reduce dimension, eliminate redundancy, and obtain a compact representation of large data without much loss of information. Turk and Pentland [98] first introduced the technique to solve the problem of face recognition. They proposed the Eigenfaces approach based on the fact that compressed but important patterns from a facial image could improve the performance of face recognition approaches.

The Eigenfaces approach consists of two stages: training and recognition. Assume the training set contains M images of size m × n, and each image is converted to a one-dimensional column vector of size mn × 1 or N × 1 (let N = mn). These column vectors are inserted row-wise into a matrix. Direct use of this high dimensional space for the face recognition task makes it very complex. Therefore, to perform recognition, we need to derive a lower dimensional space that retains as much variance of the images as possible [5]. In this method, the maximum nonconformity of each face from the mean face image is computed to form the covariance matrix of all faces. The variance of the face images is then represented through eigenvectors obtained from the covariance matrix [9].

Let X be the resultant training data matrix of size M × N. Firstly, we calculate the mean vector u along each dimension i = 1, 2, ..., M:

    u_i = (1/N) Σ_{j=1}^{N} X_ij    (2.2)

Then the mean vector is subtracted from each column of X to create the mean-centred matrix X_m, where

    X_m = [ X_11 − u_1   X_12 − u_1   ...   X_1N − u_1 ]
          [ X_21 − u_2   X_22 − u_2   ...   X_2N − u_2 ]    (2.3)
          [     ...          ...                ...    ]
          [ X_M1 − u_M   X_M2 − u_M   ...   X_MN − u_M ]

The next step is to find the N × N covariance matrix C of X_m:

    C = X_m' X_m    (2.4)

Then the matrices of eigenvectors and related eigenvalues are computed from the covariance matrix. Both of them will have dimension N × N. This computation causes high memory and time consumption. An alternative solution to this problem is to check whether M < N; if so, the eigenvectors and eigenvalues are calculated using the following covariance matrix C of size M × M, which can lessen the computational complexity:

    C = X_m X_m'    (2.5)

In this case, the eigenvector and eigenvalue matrices will have dimension M × M. The computed eigenvectors indicate a set of features, or eigenfaces, that express the directions of maximum variance among faces, and the corresponding eigenvalues denote the amount of variance [98]. Eigenvectors are usually arranged in decreasing order of their eigenvalues.

To reduce the dimension, let P eigenvectors with the P highest eigenvalues be selected from the eigenfaces to form a set V. These P eigenfaces are used to create a P-dimensional subspace, or face space, that describes the most variance within the set of face images [98]. Afterwards, the training data matrix X_m is projected into the face space to obtain the matrix X_p of size M × P:

    X_p = X_m V    (2.6)

where X_p is the representation of the training images in the face space.
The advantage of this transformation is that each training image is now converted into a vector of size 1 × P instead of 1 × N, where P < N.

In the recognition stage, a probe face image with the same dimensions as the training images is taken and converted into a matrix Y of size 1 × N (where N = m × n), which comprises the mean-subtracted values in its columns. To get the feature vector from the probe image, Y is projected into the same subspace to get a row vector Y_p consisting of P columns:

    Y_p = Y V    (2.7)

Classification is done by using the Nearest Neighbour (NN) classifier. The probe feature vector is compared to every training projection, and the identity of the training image that best matches the input face is chosen as the result of recognition. Recognition is done by computing the Euclidean distance D from the probe image representation to all training image representations:

    D = ||Y_p − X_p||    (2.8)

There will be a total of M distances for the M training images, and the unknown probe image is assigned to the class of the most similar training image, i.e., the one with the minimum distance to the input image.

With the introduction of the Eigenfaces approach, other subspace projection techniques could be explored. In 1997, Belhumeur [12] proposed the Fisherfaces approach based on Fisher's Linear Discriminant Analysis (LDA) [31]. It creates a lower dimensional space in which to project the input data, where patterns belonging to different classes are well separated from each other while patterns of the same class stay relatively close. This task is accomplished by extracting a set of vectors that best discriminates among the classes, in order to minimize the within-class scatter and maximize the between-class scatter [26]. PCA is used as a preprocessing step to reduce the computational complexity. The recognition of face images is done in the same way as in the Eigenfaces approach. The limitations of LDA are that it requires more than one gallery image per person to work properly, and it is also sensitive to different training data sets [50].

Independent Component Analysis (ICA) is another subspace projection method, proposed by Bartlett [10], which is used to find basis vectors that are spatially localized and statistically independent [26]. This approach was tested using two different representations on the FERET [79] database. Although pre-application of PCA increases the performance of ICA by reducing dimension, it still requires more time to compute the ICA basis vectors compared to PCA.

Despite the existence of other subspace projection approaches, PCA is considered one of the most popular and efficient approaches for feature extraction, and it is embedded into the proposed research to extract features. However, in the PCA representation, two images of the same person and two images of different persons are treated almost in the same way, which characterizes the poor discriminatory power of PCA [30] [63] [86]. This problem can be solved by using Linear Discriminant Analysis (LDA) [12], but this requires a large number of images per class. The expensive process of finding eigenvectors is also a disadvantage of PCA. If the number of training images M is larger than N (the number of pixels in the training images), then it involves very high computational complexity that depends on N. To reduce the complexity, the eigenvectors are calculated in an alternative way using (2.5) when M < N. However, the complexity still increases in polynomial order with the increase of M. This problem can be mitigated by combining the Discrete Wavelet Transform (DWT) with PCA to extract features, which reduces the computational load and also improves the discriminatory power of PCA [30].
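For illustration, the Eigenfaces computation of equations (2.2) through (2.8), including the M × M covariance trick of (2.5), can be sketched in NumPy as follows. This is a minimal sketch that mirrors the notation of this section (including its per-image mean convention), not the exact implementation used in this thesis; the function names are illustrative.

    import numpy as np

    def eigenfaces_train(X, P):
        """X: M x N matrix with one flattened training image per row.
        Returns the N x P eigenface basis V and the M x P projected
        training representations X_p (eqs. 2.2-2.6)."""
        u = X.mean(axis=1, keepdims=True)      # mean vector, eq. (2.2)
        Xm = X - u                             # mean-centred matrix, eq. (2.3)
        C = Xm @ Xm.T                          # M x M covariance trick, eq. (2.5)
        eigvals, W = np.linalg.eigh(C)         # eigenvalues in ascending order
        top = np.argsort(eigvals)[::-1][:P]    # indices of the P largest
        V = Xm.T @ W[:, top]                   # back to N-dimensional eigenfaces
        V /= np.linalg.norm(V, axis=0)         # normalise each eigenface
        return V, Xm @ V                       # basis and projection X_p, eq. (2.6)

    def eigenfaces_recognize(y, V, Xp):
        """Project a flattened probe image y (eq. 2.7) and return the
        index of the nearest training image under Euclidean distance
        (eq. 2.8), i.e. Nearest Neighbour classification."""
        yp = (y - y.mean()) @ V
        return int(np.argmin(np.linalg.norm(Xp - yp, axis=1)))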
2.3 Classification Approaches

Classification approaches are applied to the feature space obtained from the feature extraction step. The goal of the classification problem is to properly separate the patterns belonging to different classes by decision boundaries. The decision boundaries are determined through the probability distribution of the features of each class [87]. Classification is performed based on the similarity of the input image to the gallery images, which is measured by minimizing the error between them. Different classifiers are commonly used in face recognition algorithms, such as the Nearest Neighbour (NN) [3], Nearest Linear Combination (NLC) [58], Bayesian classifier [81], Neural Networks [66], Support Vector Machines [43], and Decision trees [28]. To solve the classification problem in this study, the Nearest Linear Combination (NLC) classifier is preferred.

2.3.1 Nearest Linear Combination

Nearest Linear Combination (NLC) is a pattern classification approach that has been successfully applied to face recognition [58]. The main concept of this technique is that patterns belonging to the same subject or class are assumed to be elements of one linear subspace [12] [11] [48]. Based on this concept, a linear combination of the gallery image vectors of an individual is developed to represent a probe or query image vector of that person [58]. The linear combination is termed the NLC since it is closest to the probe face. The NLC establishes the relation between the gallery images and the probe image. For classification, distances between the given probe image and all the linear combinations that represent classes are computed, and the class of the probe image is decided in favour of the class with minimum distance [48].

Let N be the number of classes and m be the number of prototypes for each class. Assume that {x_1, x_2, ..., x_m} is a set of prototypical vectors of gallery images from a face class C. In the training stage, each prototypical vector in n-dimensional space can be represented as x_k = (x_k1, ..., x_kn)', where x_k ∈ R^n, k = 1, 2, ..., m. A linear combination Y_p ∈ R^{n×1} of the prototypical vectors of the same class is formed as

    Y_p = x_1 β_1 + x_2 β_2 + ... + x_m β_m = X β    (2.9)

where X ∈ R^{n×m} is the class-specific model, or predictor, of the class C, formed by stacking the n-dimensional vectors of that class as its columns, and β ∈ R^{m×1} is the vector of parameters or weights of the linear model. The expression can be written in matrix notation:

    [ Y_p1 ]   [ x_11  x_12  ...  x_1m ] [ β_1 ]
    [ Y_p2 ] = [ x_21  x_22  ...  x_2m ] [ β_2 ]    (2.10)
    [  ... ]   [  ...   ...        ... ] [ ... ]
    [ Y_pn ]   [ x_n1  x_n2  ...  x_nm ] [ β_m ]

For a given unknown probe image vector Y ∈ R^{n×1}, the weights can be estimated using least-squares estimation [48] [88] [84]:

    β = (X'X)^{-1} X'Y    (2.11)

With the calculated weights β and the predictor X, the linear combination, or response vector, Y_p for the class C can be determined using (2.9) and (2.11):

    Y_p = X (X'X)^{-1} X'Y    (2.12)

At the recognition step, the Euclidean distance between the response vector Y_p and the probe image vector Y is measured using the following formula [48]:

    Dist(Y, Y_p) = ||Y − Y_p||_2    (2.13)

There will be N linear combinations representing the N classes, and therefore a total of N distances need to be computed to classify the probe image Y:

    Dist(Y, Y_p^i) = ||Y − Y_p^i||_2,    i = 1, 2, ..., N    (2.14)

The probe image is classified into the one of the N classes with which it has the minimum distance.
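The per-class computation of equations (2.9) through (2.14) translates directly into NumPy; the sketch below is an illustrative rendering of the NLC classifier, assuming each class is given as an n × m matrix whose columns are its prototypical gallery vectors. numpy.linalg.lstsq returns the same least-squares solution as the normal equations (2.11) while avoiding the explicit inverse.

    import numpy as np

    def nlc_classify(Y, class_models):
        """Y: n-dimensional probe vector.  class_models: list of n x m
        matrices, one per class, whose columns are the prototypical
        gallery vectors of that class.  Returns the index of the class
        whose nearest linear combination is closest to Y."""
        distances = []
        for X in class_models:
            beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # weights, eq. (2.11)
            Yp = X @ beta                                 # response vector, eq. (2.12)
            distances.append(np.linalg.norm(Y - Yp))      # distance, eq. (2.13)
        return int(np.argmin(distances))                  # minimum over classes, eq. (2.14)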
Compared to the Nearest Neighbour (NN), the NLC approach is more suitable when there exists more than one training image per class [58]. NN classification needs to compare a probe image to every training image, where the expression and illumination variations in the training images affect the result. Unfortunately, in most cases, sufficient numbers of prototypes that can account for all the changes in images are not available for a face class. To overcome this limitation of the NN, the NLC approach makes a linear combination of the prototypical vectors of each class and compares the probe image to each linear combination. The changes in the estimated parameters that form the linear combinations are considered to cover all the possible variations in face images [58]. In this way, it provides a large number of representative prototypical points for each class, which characterize more changes in images than the available prototypes [59].

Li [58] proposed the NLC approach using the Eigenfaces technique [98] in the feature extraction step. The approach achieved considerably better recognition results than the standard Eigenfaces technique and the NN method when experimented on two custom databases.

Later, in 1999, Li and Lu [59] proposed another classification method, called the Nearest Feature Line (NFL) approach, based on the idea of the NLC approach. In the NFL method, a linear model with a pair of images of a person is used to form a feature line (FL) that passes through the two prototype feature points of the class and extends beyond them. The probe image is then compared with each feature line for recognition purposes. Experiments on a combined dataset reported that the error rate of the NFL method was 43.7% to 65.4% lower than the standard Eigenfaces method. It also showed better performance compared to the Eigenfaces method on the ORL [8] dataset.

In 2010, Naseem, Togneri, and Bennamoun [48] suggested an approach that makes linear combinations of downsampled gallery images. The efficiency of the proposed method was evaluated by performing experiments on five well-known databases: the AT&T [85], Georgia Tech [1], FERET [79], Extended YaleB [35] [55], and AR [61]. The experimental results showed that this approach improved performance compared to several standard approaches, like the Eigenfaces, Fisherfaces [12], Kernel eigenfaces [108], 2DPCA [108], and ICA [108] [10].

Moon and Lee [92] proposed an illumination invariant face recognition approach based on linear combinations of face prototypes. The prototypes are synthesized from the photometric stereo images of each subject by using PCA. After that, a linear combination of the exemplars is formed to represent a new image of a subject, and an input image is classified by determining the minimum distance between the input image and the linear combinations of prototypes. The approach was evaluated using the MPI [15] face database and showed better performance when compared with the Eigenfaces and Bilinear analysis approaches [32].

The efficiency of the NLC approach has thus been evaluated on several standard databases. To classify the face images, this classification method is used in the proposed research.

2.4 Summary

In this chapter, the basic steps of a face recognition system were introduced.
This chapter also provided a detailed description of a well-known feature extraction method and a classification technique that are utilized in the proposed approach. The main difficulty caused by illumination variation in face images, and the different approaches that can be used to handle the problem, will be stated in the next chapter. A brief overview of the Discrete Wavelet Transform (DWT) is also provided there.

Chapter 3

Illumination Invariant Approaches and Wavelet Decomposition

Illumination variations in images usually appear when the images are taken under uncontrolled lighting environments. The illumination effect can severely degrade the performance of various face recognition approaches [106]. Sometimes, due to extreme lighting conditions, the whole face or part of the face can be in shadow, which makes it very difficult to extract any features from the face or the affected part. It has been proven both experimentally [4] and theoretically [111] that the variation in face images due to changing illumination is more significant than the differences among different individuals under the same illumination conditions. The effect of lighting variation on face images in the CMU PIE [96] database is shown in Figure 3.1, where face images of the same person under illumination variations appear very different compared to images under normal lighting conditions.

Figure 3.1: Face images under different illumination conditions in CMU PIE Database.

In recent years, many face recognition algorithms have been proposed to tackle the illumination problem. Some of these approaches are summarized in the following sections.

3.1 Statistical Approach

In several studies [89] [38], statistical methods have been used in illumination invariant face recognition approaches. Linear subspace algorithms, including Linear Discriminant Analysis (LDA) [12] [95] and Bayesian methods, showed better performance than the Eigenfaces [98] approach in terms of illumination variation, but they are still not successful in practical applications. To improve the performance, nonlinear methods such as Kernel PCA [71] and Kernel LDA [82] are used. In many cases, these algorithms struggle to deal with large differences in lighting among face images [78].

3.2 Light Modelling Approach

To develop illumination invariant face recognition approaches, researchers are now focusing on establishing different methods to balance lighting in images, such as Illumination Cone methods [13] [35], spherical harmonic based representations [11] [83], and quotient image based approaches [91] [90] [93].

3.2.1 Illumination Cone Methods

The Illumination Cone method proposed by Belhumeur and Kriegman [13] showed that a set of images under varying illumination but with a fixed pose can be used to create a convex cone to represent each individual. The illumination cone is estimated through low dimensional linear subspaces that are built from the training images by using traditional approaches. The test image is recognized according to the identity of the illumination cone that is nearest to the image. The method is based on the assumption that the objects' surfaces have Lambertian reflectance functions, as well as convex shapes to avoid cast shadows. The objects are illuminated using an arbitrary number of point light sources, and the illumination cone is built using very few images.
Georghiades and Belhumeur [35] proposed another variation of the Illumination Cone method, where an illumination cone is generated for each pose of a face class, and the illumination cones for all poses of a face class are combined to represent that face class. For recognition, the Euclidean distance between a test image and each face class representation is computed, and the closest representation is determined based on the minimum distance. The drawback of the illumination cone based approach is that, to create a linear subspace from the training data, it requires multiple images of a face class under uneven lighting conditions [39].

3.2.2 Quotient Image (QI) based Approaches

Quotient Image (QI) based approaches extract an illumination invariant image for each object class from a small set of images. The QI approach was first proposed by Shashua and Riklin-Raviv [91]. Initially, this method generates a linear combination of the set of face examples from each object class. Then, a lighting invariant quotient image is obtained by computing the ratio of albedo between an image of an arbitrary face class and the computed linear combination. The performance of the method is better than the Eigenfaces approach, but it requires a large training data set [33]. S. Shan et al. [90] presented a Quotient Image Relighting (QIR) approach using some models of the QI method. The proposed approach used face images taken under abnormal lighting conditions to obtain a face image re-lighted under normal lighting. This approach improved performance compared with other illumination normalization approaches; however, one of its limitations is that it requires prior knowledge about the lighting system of the images.

3.2.3 Spherical Harmonic based Approaches

R. Basri and D. Jacobs [11] suggested the use of spherical harmonic based representations to describe lighting. The set of images of an object seen under harmonic lighting are defined as harmonic images, and these harmonic basis images form a low-dimensional linear subspace model of that object. A given unknown image with arbitrary lighting conditions can be recognized by comparing the image with each linear subspace model that represents an object. This method showed some limitations in dealing with cast shadows, specularity, and pose variation in images. Since the representation of the harmonic image set is dependent on the way of lighting, accuracy can be improved by selecting appropriate lighting.

From the above analysis, it can be said that most of these approaches have some constraints. For example, they may need assumptions about the lighting model, or they may require a large number of training images; these are considered major obstacles to their application in a practical system. To deal with illumination variation in practical applications, researchers have recently paid much attention to other approaches that are based on image scale-space decomposition using total variation image models. One of the most well-known total variation image models is the TV+L1 model [21].

3.3 The TV+L1 Model for Additive Signal Decomposition

Extracting useful feature components from an image by scale-driven decomposition is an important task in image analysis. This decomposition can be accomplished by applying Total Variation image models. A 2-dimensional grey-scale image can be denoted by a function f, which is defined on a bounded image domain Ω of R², where Ω is generally a rectangle in R² [75] [103] [20] [19].
In the Total Variation (TV) regularization framework, the observed image f is considered as an additive signal that is composed of an image cartoon u and texture v, where u and v are functions on Ω [23]. The cartoon u mainly contains extrinsic or large-scale features, like background hues, shadow, and important boundaries such as sharp edges. The texture v covers small-scale patterns or intrinsic features and some noise. The image decomposition problem is viewed as splitting an image f into u and v components. The image f can be decomposed by using Total Variation regularization with an L1 fidelity term, also known as the TV+L1 model. Since u is more regular than v, u can be obtained from the image f by solving the following variational regularization problem [21] [24]:

    min_u ∫_Ω |∇u| + λ ||f − u||_{L1}    (3.1)

The TV+L1 model involves two different terms and a scalar constant λ (lambda) that adjusts the truncation scale between u and v [23]. The first term, ∫_Ω |∇u|, denotes the total variation of u over the image domain Ω, and the second term is the L1 norm ||f − u||_{L1}, which measures the closeness between f and u [21] [23]. Since f = u + v, with u solved from (3.1), the small-scale component v can be calculated as v = f − u.

In Figure 3.2, an example of image decomposition using the TV+L1 model is shown along with the corresponding cartoon (u) and texture (v). It is easily observable that, with different λ values, the u and v parts show varying large-scale and small-scale features respectively.

Figure 3.2: Image decomposed into cartoon (u) and texture (v) using varying values of λ.

To solve (3.1), different approaches have been proposed, such as the PDE-based gradient descent technique [21], the Second-order Cone Program (SOCP) [6] [36], and the Parametric Maximum Flow (PMF) algorithm [37]. The PDE approach can be implemented easily and requires less memory, although it sometimes creates numerical difficulties [24]. To escape this problem, the SOCP approach can be used, which finds a solution of (3.1) using modern interior-point methods [6] [36]. It provides better accuracy, but a significant amount of memory is needed [24]. Using the PMF algorithm, one can obtain approximately the same result as with the SOCP approach [106], but the computation is much faster. This has influenced many researchers to adopt this algorithm for solving (3.1).

3.3.1 Parameter λ and Image Feature Scale

By changing the value of the parameter λ, we can determine which image features are kept, based on their scale. The scale of a feature is defined as the ratio of the area of the feature to its perimeter [94]. For example, the scale of a circle with radius r can be written as follows:

    s = πr² / (2πr) = r/2    (3.2)

D. Strong and T. Chan [94] proved that the intensity change of any image feature caused by regularization is proportional to the truncation scale α and inversely proportional to the scale of the image feature. Here, the truncation scale (α) determines the closeness between u and f by controlling regularization, and it is inversely proportional to λ. For a smaller truncation scale, or larger value of λ, regularization is minimized and u stays close to f by retaining the sharp edges or large-scale features in u. The opposite effects occur when a larger truncation scale, or smaller value of λ, is selected.
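As a rough illustration of this cartoon/texture split, the sketch below uses scikit-image's denoise_tv_chambolle, which solves the closely related ROF (TV+L2) problem rather than the TV+L1 model of (3.1); treat it as an accessible stand-in, not the PMF-based TV+L1 solver discussed above. Its weight parameter behaves like the truncation scale, i.e., it is inversely related to λ.

    import numpy as np
    from skimage.restoration import denoise_tv_chambolle

    def tv_decompose(f, weight):
        """Split a grey-scale image f into a cartoon u (large-scale
        features) and a texture residual v = f - u (small-scale features).
        NOTE: this uses a TV+L2 (ROF) solver as a stand-in for the TV+L1
        model of eq. (3.1); a larger weight (larger truncation scale,
        smaller lambda) moves more small-scale content into v."""
        f = f.astype(np.float64)
        u = denoise_tv_chambolle(f, weight=weight)  # cartoon component u
        return u, f - u                             # (u, texture component v)

Sweeping the weight over a range reproduces the qualitative behaviour of Figure 3.2: small weights leave u close to f, while large weights push almost all small-scale structure into v.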
According to the definition of scale, the intensity change δ is defined as follows [94]:

    δ = α / scale    (3.3)

Now, the parameter λ can be related to the features at different scales that need to be preserved. For a small truncation scale (α), the change in large-scale features will be small and they will be kept in u. Thus, u will be similar to f for a larger value of λ, because only a small amount of small-scale features will be removed into v. If we need to remove most of the small-scale features into v, then we need to choose a large truncation scale and a smaller λ. This effect is illustrated in Figure 3.2, where, as the value of λ decreases from larger to smaller, almost all the small-scale features are extracted into v, and only some large-scale features, such as shadow, lighting effects, and background hues, are kept in u.

Thus, the parameter λ plays an important role in extracting image features at different scales through multi-scale image decomposition. Based on the size of an image, a proper λ can be selected from some recommended values or by exploration [23]. To use the TV+L1 model for different face image sizes, the following recommended λ values can be used: 100×100 (pixels): λ = 0.7 to 0.8; 200×200: λ = 0.35 to 0.4; 400×400: λ = 0.175 to 0.2 [24]. According to [44], for handling most of the lighting variations in face images of size 64×80, λ can be selected in the range [0, 1.2].

3.3.2 Properties of the TV+L1 Model

The TV+L1 model is popular for its important merit of preserving edges [94]. This property can be analysed in a simple way. Firstly, minimizing the term ∫_Ω |∇u| attempts to reduce the variation of u over Ω [23]. Secondly, minimizing the L1 term ||f − u||_{L1} has a tendency to keep u almost the same as f, and preserves the sharp object edges of the image f in u [23]. In this way, the sharp edges of shadows cast on faces can be preserved entirely in u by minimizing ∫_Ω |∇u| + λ ||f − u||_{L1} with an appropriately large λ. These large-scale features do not help the recognition process, so after removing u, the impact of lighting on f can be reduced. Finally, the features in the small-scale band (i.e., shape, contours, and the size and relative positions of the eyes, nose, mouth, and eyebrows) are obtained in v. These features are more important for distinguishing different persons, because classification is made based on a comparison of the positions of small-scale facial features in two face images [23] [24].

Another exceptional property of the TV+L1 model is its ability to perform scale-dependent and intensity-independent decomposition [24]. For a simple 2-dimensional image f, this property can be illustrated using the approach described in [103] [23] [24]. Let f = c·B_r(y)(x) be a function that has intensity 0 everywhere except at the points inside the disk B_r(y) with centre y and radius r, where the intensity is c > 0. For this f, Chan and Esedoglu [21] showed that after solving (3.1), we get the following:

    u_λ = 0                               if λ < 2/r
    u_λ = c'·f  for any c' ∈ [0, 1]       if λ = 2/r    (3.4)
    u_λ = f                               if λ > 2/r

From the above analysis of the example, we can infer two important facts. First, the value of the parameter λ, which controls whether u_λ will be equal to 0 or f, depends only on the radius of the disk; the intensity c does not have any impact on λ. Secondly, the signal of the disk, or f, is either completely preserved in u_λ or in v_λ for all values of λ (except λ = 2/r), which depicts the least signal corruption property of the TV+L1 model [103] [23].
An example of the scale-dependent image decomposition property of the TV+L1 model is presented in Figure 3.3. It can be observed that signals with different scales are separated into either u or v, but not into both u and v. Based on these properties, the TV+L1 model with an appropriate λ can reduce the lighting effects in face images and extract the illumination invariant small-scale features, which can improve the performance of face recognition algorithms. The effect of applying the TV+L1 model to some input images with illumination variation from the Yale [107] database is shown in Figure 3.4.

Figure 3.3: Extraction of signals with different scales by the TV+L1 model [109].

Figure 3.4: Input images and their v components as the result of the TV+L1 model.

3.3.3 Application of the TV+L1 Model

The TV+L1 model has been successfully applied to many other image processing tasks based on its properties. Alliney [7] and Nikolova [72] [73] analysed the properties and applied the 1-dimensional or 2-dimensional version of the TV+L1 model to different signal processing tasks. Nikolova [73] suggested the use of the TV+L1 model for noise removal and outlier identification. Chan and Esedoglu [21] first utilized its cartoon-texture decomposition property. Taking advantage of this property, W. Yin and T. Chen applied this model to correct the irregular background of cDNA microarray images [104] and digital microscope images [23].

Based on the basic properties of the TV+L1 model, various improvements have been made over it. W. Yin and T. Chen [23] [24] enhanced the TV+L1 model to be used as a multiplicative model that can factorize images under multiplicative illumination effects and can provide lighting-robust facial features. They proposed the Total Variation based Quotient Image (TVQI) model [23] and the Logarithmic Total Variation (LTV) model [24] to normalize the illumination conditions in face images. In the TVQI model, an image f is first decomposed using the TV+L1 model; then f is divided by u to get the illumination normalized component v (v = f/u), where small-scale features are evenly emphasized. This approach showed significant performance improvement over the other normalization techniques on the Yale face database B [35]. The LTV model [24] first applies the logarithmic transformation to the image f, converting the multiplicative image model into an additive one, and then applies the TV+L1 model to the logarithm of the image. To compare the performance of the model with other algorithms, it was tested on the Yale face database B [35] and the CMU PIE [96] database, and it showed better recognition rates on both databases.

Xie et al. [105] [106] suggested an approach using both small- and large-scale features. After decomposing a face image into large- and small-scale feature images, illumination normalization is performed on the large-scale feature image and a smoothing operation is performed on the small-scale feature image to get better visual results. Finally, the normalized large-scale feature and smoothed small-scale feature images are combined to form a normalized face image, which is classified using the Nearest Neighbour (NN) classifier. The approach was evaluated on the Extended YaleB [35] and CMU PIE [96] databases, where it outperformed other existing methods.
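As a small illustration of the LTV idea described above, one can apply the same decomposition to the logarithm of the image, which turns the multiplicative illumination model into an additive one. The sketch reuses the hypothetical tv_decompose helper from Section 3.3.1 (and therefore inherits its TV+L2 approximation); the small constant added before taking the logarithm is an assumption to avoid log(0).

    import numpy as np

    def ltv_small_scale(f, weight, eps=1.0):
        """Approximate the LTV model [24]: decompose log(f) additively
        and return the illumination-insensitive small-scale component.
        Relies on tv_decompose() from the sketch in Section 3.3.1."""
        log_f = np.log(f.astype(np.float64) + eps)  # multiplicative -> additive
        u, v = tv_decompose(log_f, weight)          # TV split of the log-image
        return v                                    # small-scale features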
H. Han et al. [44] proposed the Separability Oriented Preprocessing (SOP) approach, which utilizes facial features of different scales to pre-process a face image for illumination invariant face recognition. In this approach, features are extracted using the TV+L1 model. The extracted features are linearly combined to infer an illumination robust image, which is used with different face recognition approaches. When forming the linear combination, the weights are determined from the training images by maximizing the Fisher separability of different subjects' faces. In the SOP formulation step, the solution of the optimization problem is obtained by using simulated annealing [54]. In the conducted experiments, SOP with the Fisherfaces [12] approach achieved a 75% recognition rate on the Multi-PIE [40] database and 84.1% on the Extended YaleB [35] database, which were higher than other preprocessing methods. To evaluate SOP with face recognition approaches other than the Fisherfaces, it was combined with the Eigenfaces [98] and LBP [97], and it showed significant improvement in their performance when experimented on the FRGC Ver2.0 Exp.4 [77] dataset.

From the above-mentioned applications, it is clear that the TV+L1 model can be a good choice for face recognition under lighting variation, due to its ability to preserve the illumination invariant facial structure from a single face image. The model has other advantages: it does not require any lighting assumptions, and only one parameter, λ, needs to be set. Due to these benefits, the TV+L1 model is embedded into the proposed approach for scale-space decomposition of images.

3.4 Wavelet Transform

The wavelet transform [25] is a popular image decomposition approach that provides both spatial (location in time) and frequency information of an image simultaneously [34]. It is more suitable than Fourier transform methods for analysing signals that contain small discontinuities. Fourier transform methods analyse all frequencies in the signal through the same window or resolution, while the wavelet transform uses a multi-resolution technique. The multi-resolution technique examines different frequencies with different sized windows to indicate the exact position (in time) of a discontinuity. The wavelet transform uses a small window for high frequencies to provide good time resolution but poor frequency resolution. On the other hand, low frequencies are viewed through a large window that gives poor time but good frequency resolution [52]. The wavelet transform provides information about the image in the form of a small number of wavelet coefficients that can be stored and analysed for further use [53].

There are two classes of wavelet transform: continuous and discrete. The Continuous Wavelet Transform (CWT) is defined by the following equation [65]:

    C(scale, position) = ∫_{−∞}^{∞} x(t) ψ(scale, position, t) dt    (3.5)

where C represents the set of wavelet coefficients obtained from the signal x(t). Many wavelet coefficients are generated by applying the wavelet function ψ(t) for each time interval over the signal through changing position and scaling (stretching or compressing). Low frequencies in the signal are stretched, and overall information about the signal is represented. On the other hand, compression is applied to the high frequency parts of the signal, which provides detail information about the signal.
Since wavelet coefficients are computed at every possible scale and shifting position, the CWT requires a lot of time and resources to store redundant information.

In contrast, the Discrete Wavelet Transform (DWT) is performed only on a subset of scales and positions, which results in faster and easier computation of the wavelet coefficients. The DWT decomposes a signal into coarse approximation (low frequency) and detail (high frequency) information by using a low pass and a high pass filter, respectively. After the filtering step, a downsampling operation is performed, which halves the resolution by discarding every second sample. The process of the one-dimensional DWT is shown in Figure 3.5.

Figure 3.5: Discrete Wavelet Transform (DWT) [65].

Considering the advantages of the DWT, the two-dimensional DWT is generally preferred for performing wavelet decomposition of two-dimensional images. It is performed in two steps: first, the one-dimensional DWT is applied to each row of an image to produce a row-transformed image; then the same one-dimensional DWT is performed on the columns of the new image. As a result, the image is decomposed into four sub-band images corresponding to various frequency ranges. The process is illustrated in Figure 3.6.

Figure 3.6: An image is decomposed into four images after one level DWT.

As shown in Figure 3.6, the sub-band images are denoted as low-low (LL), low-high (LH), high-low (HL), and high-high (HH). The LL sub-band reflects global information (approximation) and represents the visual mood of the original image. Among the high frequency components, the LH band denotes horizontal details, the HL band provides vertical details, and the HH band represents diagonal details and noise [52] [27] [42]. Each of these sub-band images is a smaller version of the original image. If the size of the original image is N×N, then four N/2 × N/2 sub-band images are obtained after the first level of decomposition. The LL band image represents the identity of the original image and contains the most prominent information. Further levels of decomposition are carried out on the LL sub-band, as in Figure 3.7. The maximum number of levels is chosen depending on the size of the image or the type of application.

Figure 3.7: Two level wavelet decomposition [65].

To implement the DWT, a mother wavelet function is needed as a source, and all wavelet functions in the transformation can be produced from this original function through shifting and scaling. The efficiency of the wavelet transform depends on the use of an appropriate wavelet function, which is selected based on the needs of the application. A number of wavelet functions are currently available, such as Haar, Daubechies, Coiflet1, Symlet2, Meyer, Morlet, and Mexican Hat. Among these wavelet functions, the Daubechies wavelet [25] is usually used because of its ability to retain most of the information in the data.

In wavelet decomposition based recognition, the selection of a sub-band image plays an important role in recognition performance. Nastar et al. [68] [69] found that the high frequency part of the image is affected by facial expression and small accessories, while variation in pose affects the low frequency part. By removing the high frequency components, the effects of various facial expressions and occlusions in images of the same person can be reduced [100]. Therefore, using the low frequency sub-band image (the LL band image) for recognition can improve the matching of different images of the same person. Furthermore, the selected LL band images have a lower resolution than the original image, which reduces the computational load of computing eigenvectors with PCA.
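In Python, one level of the two-dimensional DWT described above is a single call to PyWavelets, and repeated application to the approximation band yields the multi-level decomposition of Figure 3.7. The sketch below extracts the LL sub-band; the Daubechies-2 wavelet and the decomposition depth are illustrative choices, not those prescribed by this thesis.

    import numpy as np
    import pywt

    def ll_subband(image, wavelet="db2", levels=1):
        """Apply `levels` rounds of 2-D DWT and return the final LL
        (approximation) sub-band; each level halves both dimensions."""
        ll = np.asarray(image, dtype=np.float64)
        for _ in range(levels):
            # pywt.dwt2 returns (LL, (LH, HL, HH)) for one decomposition level.
            ll, (lh, hl, hh) = pywt.dwt2(ll, wavelet)
        return ll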
Through the use of the low-frequency sub-band image (the LL band image) for recognition, the matching of different images of the same person can therefore be improved. Furthermore, the selected LL band images have a lower resolution than the original image, which reduces the computational load of computing eigenvectors by PCA.

[Figure 3.7: Two-level wavelet decomposition [65]. The level-1 approximation (CA1) is decomposed again into a level-2 approximation (CA2) and details (CD2), so the image yields sub-bands LL2, LH2, HL2, HH2 at the second level along with LH1, HL1, HH1 from the first level.]

From the above analysis, it can be said that the following advantages make the DWT a widely used tool in face recognition [30] [63] [86]:

1. In the DWT, by discarding the high-frequency information from the low-frequency data at each level, each dimension of the sub-band images is reduced by a factor of 2^n from the original image after n levels of decomposition. Therefore, by extracting features from a compressed sub-band image, the computational work can be reduced.

2. Time and frequency information can be located more precisely thanks to the multiresolution decomposition, which is useful for attenuating distortion in an image.

Numerous studies on face recognition, including [30] [63] [86] [52] [53] [42] [100] [62], have used the DWT to improve performance. In [100], different types of wavelet transform were applied to the ORL [8] dataset to evaluate the recognition performance with different distance measures. The benefits of feature extraction through PCA from the wavelet sub-bands were investigated in [30] [86] [62]. In [63] [42], a combination of wavelet decomposition and the Eigenfaces method was used to compute features from images, and the recognition task was performed using the Support Vector Machines (SVM) classifier.

For its multilevel dimension-reduction property, the wavelet transform is combined with the TV+L1 model in the proposed approach. This is done to reduce the computational load of extracting features from images.

3.5 Summary

Different algorithms for solving illumination problems were presented in this chapter. The basic concepts of the TV+L1 model and the Discrete Wavelet Transform (DWT), which play an important role in the proposed approach, were also stated concisely. A brief overview of the proposed approach is given in the next chapter.

Chapter 4

Proposed Approach

The proposed face recognition approach is developed based on scale-space decomposition by the TV+L1 model and wavelet decomposition. Instead of using only small-scale features for recognition, the approach presented here takes into consideration the importance of both large-scale and small-scale features. It is designed to utilize the features at different scales that are extracted from face images using the TV+L1 model. After that, wavelet decomposition followed by PCA is applied to the extracted components of various scales to find the compact and important features that improve the performance of the classification step. Some of the factors that are important to the design of this approach are discussed below.

The TV+L1 model, which is used for scale-space decomposition, can be solved using the following equation:

    min_u ∫ |∇u| dx + λ ∫ |f − u| dx        (4.1)

Here the observed image f is decomposed into the image cartoon u and the texture v = f − u. The illumination component u contains large-scale features, such as background hues and shadow or lighting effects. The small-scale component v contains intrinsic features, like the eyes, nose, and mouth. The parameter λ (lambda) adjusts the truncation scale between u and v. A naive numerical sketch of this decomposition is given below.
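As an illustration only, the following is a minimal sketch of the TV+L1 decomposition by gradient descent on a smoothed surrogate of the functional in (4.1). The thesis itself solves the exact model with a parametric maximum-flow algorithm (see Chapter 5); `tv_l1_decompose`, its step size, iteration count, and smoothing constant are all illustrative assumptions, not the author's implementation.

    import numpy as np

    def tv_l1_decompose(f, lam, n_iter=500, step=0.05, eps=1e-3):
        # Hypothetical helper: minimize TV(u) + lam * ||f - u||_1 on a
        # smoothed surrogate (eps avoids division by zero at |.| = 0).
        u = f.astype(float).copy()
        for _ in range(n_iter):
            # Forward differences of u (replicated boundary).
            ux = np.diff(u, axis=1, append=u[:, -1:])
            uy = np.diff(u, axis=0, append=u[-1:, :])
            mag = np.sqrt(ux ** 2 + uy ** 2 + eps)
            px, py = ux / mag, uy / mag
            # Discrete divergence of (px, py); -div is the (smoothed)
            # gradient of the TV term.
            div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
            # Smoothed subgradient of the L1 fidelity term.
            fid = (u - f) / np.sqrt((u - f) ** 2 + eps)
            u -= step * (-div + lam * fid)
        return u, f - u  # cartoon u (large scale), texture v (small scale)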
Analysis of the traditional uses of the Total Variation model (see section 3.3.1) reveals that it is sometimes hard to determine the truncation scale, which is inversely proportional to λ. According to the analysis of the parameter λ in [21], face images with normal lighting need a larger truncation scale so that fewer illumination components are removed from the image f, and a smaller λ should be set for that purpose. This moves only a very small amount of large-scale features to u, and the small-scale component v will be close to f. In the case of abnormal lighting, more large-scale features need to be moved from f to u; therefore, a small truncation scale is used with a larger λ. Thus, the result will not be desirable if a single λ is chosen for face images with different lighting attributes.

Another important aspect of the TV+L1 based face recognition methods (see section 3.3.3) is that they assume that the features in the small-scale component alone are sufficient to identify a person. The features in the large-scale component are therefore ignored, yet large-scale features can also be helpful in classifying different faces [44] [106] [105]. Large-scale features are important for conveying sex information, and they also represent the overall description of the individual, which is useful for feature-based recognition [112]. Moreover, it is very difficult to produce a frontal-illuminated image of good visual quality without large-scale features [106]. On the other hand, small-scale features provide finer details that facilitate the identification process. Therefore, both small-scale and large-scale features are important in the face recognition process.

To overcome the above-mentioned constraints, the proposed approach is designed as a combination of multiscale decomposition and wavelet decomposition. In the multiscale decomposition method using the TV+L1 model, the values of λ are selected linearly within a range that covers all the lighting variations. Then the features at different scales of the face images are exploited with the help of wavelet decomposition and PCA. Finally, the use of the Nearest Linear Combination (NLC) approach for classification expands the representational capacity of images, which is expected to reduce the margin of error.

4.1 Face Recognition System based on the Proposed Approach

The proposed approach is designed in two stages: training and recognition. In the training stage, a set of training or gallery images is used to derive the training image representations that are required in the recognition step. After obtaining the probe image representation, the similarity between the probe and training image representations is computed to identify the probe image. The block diagram of the proposed approach is shown in Figure 4.1.

[Figure 4.1: Block diagram of the proposed face recognition system. Gallery images are decomposed into features at different scales, transformed by the wavelet transform into LL band images, and projected by PCA into a subspace; the probe image follows the same path to produce a probe image vector for the recognition stage.]

4.1.1 Training Stage

The training stage can be divided into four steps. A brief overview of each step is given below:

(I) Multiscale Decomposition: The first step in the training stage is to extract facial features of different scales through multiscale decomposition. To address the issue of lighting variations in images, a gallery image of a person is decomposed using the TV+L1 model into pairs of large-scale and small-scale components {u_k, v_k}, k = 1, 2, ..., M, by changing the value of λ.
It is observed that λ in the range [0, 1.2] is sufficient to handle most of the lighting variations for face images of size 64×80 [45]. The range is dependent on the size of the image. To include all possible lighting variations, it is necessary to determine the values of the parameter λ so that they remain in that range. Therefore, to decompose images cleanly into texture and cartoon components by the TV+L1 model, a schedule for the parameter λ is determined as follows:

    λ_j = 2jλ_1,  j = 2, 3, ..., M        (4.2)

Here the value of the regularization parameter λ increases linearly. For the initial value λ_1 = 0.1 and the number of decompositions M = 8, the resulting values of λ remain in the range [0.1, 1.6], which is not far from [0, 1.2].

After obtaining the set of λ values using (4.2), the values are arranged in decreasing order, from the largest to the smallest value in that range. As a result of multiscale decomposition using these values, eight different pairs of large-scale and small-scale components {u_k, v_k}, k = 1, 2, ..., M, are obtained that are sufficient to cover almost all of the features of the different bands (from fine to coarse). Initially, there are only small-scale facial features in v_1. With the decrease of λ, more face shape information is added from v_1 to v_M. Since the image is decomposed into u and v based on an additive model, the relationship among all v_j can be written as [44]:

    v_1 ⊂ v_2 ⊂ ... ⊂ v_M        (4.3)

At first, the v_1 component contains very little small-scale information. After that, the features that are already in v_1 are included in v_2 along with some additional features. Continuing the process, more and more features are added from v_2 to v_M. Facial features are extracted from these small-scale components. The features of the different bands in scale-space can be obtained after decomposition as shown below [44]:

    S_j = v_1            if j = 1
    S_j = v_j − v_{j−1}  otherwise        (4.4)

where S_j, j = 1, 2, ..., M, are the extracted facial features that include both small-scale and large-scale features. Here, the S_1 band contains the smallest-scale features, like the eyes and mouth. The largest-scale features, such as shape and shading, are contained in the S_M band. In Figure 4.2, some examples of the extracted features are presented along with the small-scale components.

[Figure 4.2: Extraction of features in different bands (S_1, ..., S_M) after scale-space decomposition of an image.]

(II) Wavelet Decomposition: In the second step, the Discrete Wavelet Transform (DWT) is applied to the extracted components to reduce the dimension of the resultant images. After each level of DWT, there are four sub-band images (three high-frequency sub-bands and one low-frequency sub-band) that represent a wide range of variance whose sum is equal to that of the whole image [30]. An outline of the wavelet decomposition process is provided in Chapter 3 (see section 3.4). According to the analysis in [68] [69] [100], the effects of various facial expressions and occlusions in images of the same person can be reduced by using the low-frequency sub-band (LL sub-band) image, which can therefore improve recognition performance. Also, the lower resolution of the LL sub-band images induces less computational load in computing eigenvectors by PCA. Steps (I) and (II) are sketched below.
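The following sketch ties steps (I) and (II) together for one gallery image, reusing the hypothetical `tv_l1_decompose` helper from Chapter 3's illustration and PyWavelets; the function name `multiscale_features` and the choice of the 'db2' wavelet with periodization are assumptions for this example, not the author's code.

    import pywt

    # lam schedule from equation (4.2): lam_1 = 0.1 and lam_j = 2*j*lam_1,
    # giving M = 8 values in [0.1, 1.6], arranged in decreasing order.
    lam1, M = 0.1, 8
    lams = sorted([lam1] + [2 * j * lam1 for j in range(2, M + 1)],
                  reverse=True)

    def multiscale_features(f):
        # Small-scale components v_1, ..., v_M for decreasing lam, nested
        # as in eq. (4.3); feature bands S_j as in eq. (4.4); then the
        # level-2 LL sub-band of each band (step II).
        v = [tv_l1_decompose(f, lam)[1] for lam in lams]
        S = [v[0]] + [v[j] - v[j - 1] for j in range(1, M)]
        return [pywt.wavedec2(s, 'db2', level=2, mode='periodization')[0]
                for s in S]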
(III) Principal Component Analysis (PCA): PCA (see section 2.2.2.1) is applied to the LL sub-band images to capture most of the image information using the minimum number of representative vectors. As described in Chapter 2, the computational complexity of obtaining the eigenvectors depends on the dimension (denoted by R) of the training images, which may typically be 64 × 64 = 4096 pixels. This induces a high computational load in computing eigenvectors, which is usually lowered by working instead with the number of images N when N < R. However, the complexity increases non-linearly with the increase of N [30] [86]. In the proposed approach, to obtain the features in the different scale bands through multiscale decomposition, each image is decomposed into M images. There are then a total of Q (Q = N × M) images instead of N, which can induce a significantly higher computational weight in the feature extraction step of PCA.

By using the LL sub-band images obtained after the DWT, the complexity of computing the eigenvectors can be reduced significantly. For example, by using second-level decomposition on 64 × 64 training images, LL sub-band images with resolution 21 × 21 (R = 21 × 21 = 441) are obtained. Suppose there are 120 (N = 120) images in the training data set and each image is decomposed into 8 (M = 8) feature bands. Since the total number of training images, Q (120 × 8 = 960), is now greater than the resolution (R = 441) of the LL sub-band images, the computational complexity depends on the reduced resolution of 441. In this way, a substantial reduction in the calculation load is achieved whenever the number of training images is larger than the resolution of 441, which is the normal situation in most practical applications. The result of PCA is a set of eigenvectors that form an eigen subspace, and the gallery image representations, or prototypical vectors, are computed by projecting the gallery or training images into the subspace.

(IV) Linear Combination of Extracted Features: Using the synthesized features of the different bands, suppose m prototypical vectors are formed for each person, where m = k × M, with k the number of images per person in the training data set and M the number of extracted features as described above. For classification using the Nearest Linear Combination (NLC) method, the prototypical vectors of the images of an individual are linearly combined to form a response vector Y_p, which represents that individual:

    Y_p = X_1β_1 + X_2β_2 + ... + X_mβ_m        (4.5)

where X_1, X_2, ..., X_m are the prototypes of the same class and β_1, β_2, ..., β_m are the weights that are determined using the NLC approach, which is stated in Chapter 2 (see section 2.3.1). A sketch of this nearest-linear-combination distance, which is also used in the recognition stage, is given below.
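As an illustration of equation (4.5), the following sketch computes the weights β and the residual norm for one class, assuming the least-squares solution commonly used with NLC [58]; `nlc_distance` is a hypothetical helper name, and the residual norm computed here corresponds to the distance formalized in equation (4.6) of the recognition stage below.

    import numpy as np

    def nlc_distance(probe, prototypes):
        # Distance from the probe vector to its nearest linear combination
        # of one class's prototypes (columns of the d x m matrix
        # `prototypes`), with least-squares weights beta.
        beta, *_ = np.linalg.lstsq(prototypes, probe, rcond=None)
        response = prototypes @ beta      # Y_p = X_1 b_1 + ... + X_m b_m
        return np.linalg.norm(probe - response)

    # Classification assigns the probe to the class whose linear
    # combination is nearest:
    # decision = min(classes, key=lambda c: nlc_distance(y, protos[c]))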
4.1.2 Recognition Stage

The recognition task is also performed in four steps. Initially, the test image is decomposed using the TV+L1 model to obtain the illumination-invariant, or small-scale, facial structure (the v component) of the image. According to the analysis in [45], use of the TV+L1 model with a large λ value is appropriate for images under abnormal lighting conditions, but it may have a negative impact if the input image was taken under normal lighting conditions, since most of the small-scale features are removed from the image for a larger λ value. To avoid this situation, decomposition is performed using a smaller λ value in the range [0, 1.2], which is sufficient to remove the lighting effects from the probe image. In the next step, wavelet decomposition is applied to the extracted component to transform the test image into the LL sub-band image, which has the same resolution as the training images.

The resultant LL sub-band image is represented as a probe image vector Y by projecting the LL image into the subspace obtained in the training stage. The probe image representation and the computed linear combinations, or response vectors, that represent all the classes are used in the final step to measure the similarity between them. If there are N_c classes, then the probe image needs to be compared with a total of N_c linear combinations to determine whether the image matches any of the classes. Similarity is measured using the following formula, which computes the distance between the probe image vector Y and each of the response vectors Y_{p_i}:

    D_i(Y, Y_{p_i}) = ||Y − Y_{p_i}||,  i = 1, 2, ..., N_c        (4.6)

The probe image is recognized as belonging to the class with which it has the minimum distance:

    Decision = argmin_i D_i(Y, Y_{p_i}),  i = 1, 2, ..., N_c        (4.7)

4.2 Pseudo-code of the Approach

The overall algorithm can be presented in the following pseudo-code:

A. Training Stage:

Selection of λ values: Form a schedule (denoted by L) of λ values using (4.2) so that it changes linearly. Then arrange the λ values of L from the largest to the smallest value within a specific range.

Require: A gallery or training image set G that contains n × n sized face images.

1. For each image f in G:
   (i) Obtain the decomposed pairs of large-scale (u) and small-scale (v) components by using the TV+L1 model with the λ values in L.
   (ii) Extract the features of different scales from the v components using (4.4). Let S be the set of features of the various bands.
   (iii) Perform wavelet decomposition on each extracted feature component in S and obtain the LL sub-band images.
2. Apply PCA to the LL sub-band images of all gallery images and generate an eigen subspace.
3. Calculate the prototypical vectors by projecting the LL sub-band images into the subspace.
4. For each individual in G, generate a linear combination (Y_p) of the prototypical vectors that is used as a representation of that individual.

B. Recognition Stage:

Require: T is an n × n sized probe image.

1. Decompose T by the TV+L1 model using a single λ to reduce the illumination effects in the small-scale component (v_t).
2. Obtain the LL sub-band image after applying wavelet decomposition to v_t.
3. Compute the probe image vector Y by projecting the LL sub-band image into the face space.
4. Compute the distance between the probe image representation and all the linear combinations according to equation (4.6).
5. Find the class C that has the minimum distance to Y, and classify Y as class C.

A compact driver combining the sketches given earlier in this chapter is shown below.
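The following sketch restates the pseudo-code as runnable Python under the assumptions made in the earlier illustrations; it relies on the hypothetical helpers `tv_l1_decompose`, `multiscale_features`, and `nlc_distance` defined above, and on PyWavelets. It is a sketch of the algorithm's structure, not the author's implementation.

    import numpy as np
    import pywt

    def train(gallery):
        # gallery: {person_id: list of equally sized 2-D face images}.
        feats, labels = [], []
        for pid, images in gallery.items():
            for img in images:
                for ll in multiscale_features(img):    # steps 1(i)-(iii)
                    feats.append(ll.ravel())
                    labels.append(pid)
        X = np.stack(feats)                            # Q x R data matrix
        mean = X.mean(axis=0)
        # Step 2: eigen subspace via SVD of the centred data.
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        W = Vt.T
        P = (X - mean) @ W                             # step 3: prototypes
        protos = {pid: P[np.array(labels) == pid].T for pid in gallery}
        return mean, W, protos

    def recognize(T, mean, W, protos, lam=0.1):
        v_t = tv_l1_decompose(T, lam)[1]                        # step 1
        ll = pywt.wavedec2(v_t, 'db2', level=2,
                           mode='periodization')[0]             # step 2
        y = (ll.ravel() - mean) @ W                             # step 3
        dists = {pid: nlc_distance(y, X_c)                      # step 4
                 for pid, X_c in protos.items()}
        return min(dists, key=dists.get)                        # step 5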
4.3 Summary

An overview of all of the steps of the proposed method was given in this chapter. Of most importance is the use of a multiscale decomposition process to extract components representing features at multiple scales. The performance evaluation of the proposed approach on different databases is provided in the next chapter.

Chapter 5

Experimental Results

Several experiments were conducted on different datasets to demonstrate the effectiveness of the proposed approach. In the experiments, the gallery face images were decomposed into different feature bands by using the TV+L1 model. To solve the TV+L1 model, the Parametric Maximum Flow (PMF) algorithm [37] was adopted, and the code of the PMF algorithm was obtained from the CAAM website at Rice University [99]. To determine the λ values for multiscale decomposition using the TV+L1 model, the λ-schedule stated in Chapter 4 (see section 4.1.1) was used. To obtain illumination-insensitive small-scale facial features from the test images, the TV+L1 model was used with a small λ (approximately 0.1). For wavelet decomposition, the Daubechies wavelet D4 [25] was used, as the experimental results in [30] [62] revealed that the Daubechies wavelet D4 gives better recognition performance and computation time compared to other wavelets [27]. The implementation code of the baseline PCA algorithm was downloaded from Deng Cai's website [18].

To evaluate the proposed approach, four different datasets were selected for the experiments: the Yale database [107], the Olivetti Research Laboratory (ORL) database [8], the Carnegie Mellon University Pose, Illumination, and Expression (CMU PIE) database [80], and the Facial Recognition Technology (FERET) database [78]. The performance of the proposed approach was compared with four different methods: the standard Eigenfaces or PCA [98] (see section 2.2.2.1), Nearest Linear Combination (NLC) [58] (see section 2.3.1), and the combination of the TV+L1 model [21] (see section 3.3) with PCA (TV+L1&PCA) as well as with NLC (TV+L1&NLC). For the NLC approach, the images were first converted into prototypical vectors using PCA and then classified using the NLC classifier. To combine the TV+L1 model with PCA and the NLC, the images were first decomposed with a smaller λ (approximately 0.1) to extract the small-scale features. For feature extraction, PCA was performed on the small-scale feature images, and then the images were classified using the Nearest Neighbour (NN) classifier (for TV+L1&PCA) or the NLC classifier (for TV+L1&NLC). For ease of presentation, the proposed approach is called the Nearest Linear Combination of large- and small-scale feature images (NLC+LS). The following sections cover descriptions of the datasets used, the results, and the analysis and evaluation of the results.

5.1 Experiments on Yale, ORL and PIE Datasets

To develop a reliable face recognition system, different databases of facial images and a testing procedure to evaluate the system need to be used [78]. As part of the testing, the Yale, ORL, and PIE databases were used to verify the accuracy of the proposed approach initially. A short description of these datasets is provided below.

5.1.1 Datasets: Yale, ORL and PIE

The Yale dataset from Yale University contains 165 grayscale images of 15 persons. There are 11 images per person with variations in facial expression and configuration: centre-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink [107]. The set of face images belonging to one of the persons in the Yale dataset is shown in Figure 5.1.

[Figure 5.1: Images from the Yale Database.]

The ORL dataset comes from the Cambridge AT&T Laboratory, previously the Olivetti Research Laboratory. This dataset contains 40 different subjects, and each subject has 10 distinct images, for a total of 400 grayscale images taken at different times. The images have variations in lighting, facial expression (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses) [63]. Figure 5.2 shows face images of three persons (in three rows) in the ORL dataset.

[Figure 5.2: Face images of 3 persons in the ORL Database.]

The CMU PIE database from Carnegie Mellon University consists of 41,368 images of 68 persons.
There are 13 different poses, 43 different illumination conditions, and 4 different facial expressions for each person [80]. However, only the five near-frontal poses (C05, C07, C09, C27, C29) were used in the experiments, which makes a total of 11,554 face images. There are 170 images for each person, except subject 38, which has 164 images. Some of the face images of three persons in the PIE database are shown in Figure 5.3.

[Figure 5.3: Eight different face images of 3 persons in the PIE Database.]

On Deng Cai's website [18], the face images of all three datasets have been manually aligned by the pupils and cropped to 64×64 pixels, with 256 gray levels per pixel [22]. The datasets were downloaded from the website, and the experiments were carried out on exactly the same data.

5.1.2 Results on Yale, ORL and PIE Datasets

The experiments were conducted for each k-train setting (k = 2, 3, 4, 5, 6, 7, 8 for Yale; k = 2, 5, 8 for ORL; and k = 5, 30, 80 for PIE), where k indicates the number of images per person in the gallery or training set. The remaining images were used as test images. For each k-train setting, a total of 50 random splits are available in the original versions of these three datasets. For both the Yale and ORL datasets, the first 25 random splits were used for each k-train setting, and the average and standard deviation (STD) of the error rates were calculated. Due to the large size of the PIE dataset, only the first random split was used and the error rates were estimated.

For the Yale and ORL databases, features were extracted from 8 different bands through multiscale decomposition. For the PIE database, the images were decomposed into 3 feature bands. For the PCA approach, the number of reduced dimensions was chosen as min(M, S) − 1, where M is the total number of gallery images (the number of features extracted from the gallery images) and S is the number of pixels in each image [22]. The results for different training sets on the Yale and ORL databases are given in Table 5.1 and Table 5.2 respectively. Table 5.3 shows the results on the PIE database for the 5, 30 and 80 training sets.

Table 5.1: Average error rates (%) and STD on the Yale Database with different Train/Test sizes.

    Train Size | PCA        | NLC        | TV+L1&PCA  | TV+L1&NLC  | NLC+LS
    2 Train    | 45.72±4.82 | 41.63±4.84 | 42.55±6.17 | 39.79±5.69 | 35.88±5.12
    3 Train    | 43.13±4.44 | 36.33±3.72 | 39.60±4.72 | 35.57±4.02 | 30.83±3.95
    4 Train    | 39.92±3.43 | 34.36±4.08 | 35.96±3.34 | 33.52±3.48 | 28.42±4.09
    5 Train    | 35.60±3.32 | 30.27±4.80 | 30.53±3.77 | 28.71±4.91 | 24.44±4.94
    6 Train    | 35.63±4.61 | 28.59±4.84 | 31.36±4.05 | 29.23±4.38 | 25.87±4.30
    7 Train    | 34.47±4.36 | 30.33±4.88 | 28.47±5.05 | 30.07±5.02 | 26.47±4.99
    8 Train    | 32.80±6.20 | 29.60±7.92 | 28.27±5.81 | 29.16±5.81 | 26.58±6.33

Table 5.2: Average error rates (%) and STD on the ORL Database with different Train/Test sizes.

    Train Size | PCA        | NLC        | TV+L1&PCA  | TV+L1&NLC  | NLC+LS
    2 Train    | 33.01±2.95 | 29.36±2.51 | 25.62±2.03 | 22.82±2.11 | 22.47±2.62
    5 Train    | 14.34±2.32 | 8.40±1.83  | 10.78±2.31 | 7.43±1.84  | 6.20±1.25
    8 Train    | 8.80±2.11  | 3.70±1.71  | 7.27±2.12  | 4.12±1.46  | 3.10±1.39

Table 5.3: Error rates (%) on the PIE Database with different Train/Test sizes.

    Train Size | PCA   | NLC   | TV+L1&PCA | TV+L1&NLC | NLC+LS
    5 Train    | 68.50 | 49.92 | 68.52     | 53.51     | 45.85
    30 Train   | 25.81 | 6.32  | 25.81     | 6.52      | 5.76
    80 Train   | 7.34  | 3.07  | 7.34      | 3.39      | 3.33

It is easy to see that the proposed approach shows encouraging performance on the Yale, ORL and PIE datasets.
It consistently achieves better recognition accuracy than all the other methods for the different train sizes of the Yale dataset (see Table 5.1). For the Yale dataset, the best result of the NLC+LS is obtained for 5 train, where it outperforms the Eigenfaces, NLC, TV+L1&PCA, and TV+L1&NLC methods by 11.16%, 5.83%, 6.09% and 4.27% respectively. The proposed NLC+LS approach on the ORL dataset (see Table 5.2) outperforms the traditional PCA and NLC approaches, and it also improves on the accuracy of the TV+L1&PCA approach. It is fairly comparable to the TV+L1&NLC method for all train sizes of the ORL dataset.

As shown in Table 5.3, the NLC+LS approach achieves markedly better performance than the Eigenfaces and TV+L1&PCA methods on the PIE dataset. It attains low error rates of 5.76% and 3.33% for PIE 30 train and 80 train respectively, which are favourably comparable to the accuracies achieved by the NLC and TV+L1&NLC methods.

To obtain a clearer view of the experimental results, the effects of combining the large-scale and small-scale features by the NLC+LS method on the Yale and ORL datasets are depicted in Figure 5.4 and Figure 5.5 respectively.

[Figure 5.4: Comparison of the error rates of the NLC+LS with two other methods (TV+L1&NLC and TV+L1&PCA) across train sizes for the Yale Database.]

These figures show the extent to which the performances of the standard TV+L1 based methods, TV+L1&PCA and TV+L1&NLC, can be improved by the NLC+LS method. These two methods (TV+L1&PCA and TV+L1&NLC) used only the small-scale features for recognition and discarded the large-scale features. The better performance of the NLC+LS approach compared to these two approaches validates the fact that using only the small-scale features of face images is not enough for recognition. Large-scale features also contain useful information and should not be discarded.

[Figure 5.5: Comparison of the error rates of the NLC+LS with two other methods (TV+L1&NLC and TV+L1&PCA) across train sizes for the ORL Database.]

From these figures, it is observed that the error rates of the methods decrease as the size of the training set increases. However, one exception is noticed in Figure 5.4, where the accuracy of the TV+L1&NLC method begins to drop for 7 train and 8 train. This can be explained by the analysis in [58], which states that the performance of the NLC approach may not always improve with the increase of the number of images per person in the training dataset. In Figure 5.4, it is easy to see that the addition of features of different bands by the NLC+LS method significantly improves on the accuracies of the TV+L1&PCA and TV+L1&NLC. From Figure 5.5, it can be seen that the NLC+LS method is more effective at improving performance than the TV+L1&PCA method, and it also outperforms the TV+L1&NLC by a small margin on the ORL dataset.

Based on the above analysis, it can be observed that the proposed approach performs particularly well with small train sizes, which is consistent with the conclusions of the study in [58].

5.1.3 Experiment on the FERET Dataset

The facial images in the FERET database were collected by NIST from 1993 to 1997 [56]. As described in [78], the standard FERET database consists of a total of 14,051 images of 1,199 persons. It includes five subsets (one gallery set and four probe sets) that were used in this recognition experiment. The gallery set Fa contains 1,196 images of 1,196 subjects; therefore, it has one image per subject.
The set of probe images is divided into four partitions (subsets) based on expression variation, lighting change, and the time difference from the gallery images. The Fb probe set consists of 1,195 images of the subjects in Fa, but with different facial expressions. The Fc probe set contains 194 images of the same subjects as Fa, taken under different lighting environments. The Dup1 (Duplicate 1) probe set contains 722 images that were captured anywhere between one minute and 1,031 days after the matching gallery image was taken. The Dup2 (Duplicate 2) probe set, a subset of Dup1, consists of 234 images taken at least 18 months after the corresponding image in Fa. Figure 5.6 shows different images of a subject from the FERET database, and some images from the Fc probe set are shown in Figure 5.7.

[Figure 5.6: Gallery image (Fa) and different types of probe images (Fb, Dup1, Dup2) of a subject from the FERET Database.]

[Figure 5.7: Example probe images from the Fc probe set.]

The images were resized to 70×60 from the actual size of 150×130. To compute the eigenvectors from the extracted features, the minimum of the total number of extracted images and the resolution of the images was chosen initially. Then, according to the FERET recommendation [56] [26], the subspace was formed using the first 40% of the eigenvectors, which contain about 98% of the information stored in the eigenvectors. Since the gallery set Fa contains one image per subject, the same result was obtained for the Eigenfaces and NLC approaches. To avoid redundancy, the comparison of the NLC, TV+L1&NLC, and NLC+LS approaches on the FERET dataset is shown in Table 5.4.

Table 5.4: Error rates (%) on different probe sets of the FERET Database.

    Probe Set       | NLC   | TV+L1&NLC | NLC+LS
    Fb+Fc+Dup1+Dup2 | 41.53 | 43.07     | 41.28
    Fb              | 15.32 | 18.66     | 16.40
    Fc              | 66.49 | 67.53     | 58.76
    Dup1            | 63.30 | 63.02     | 62.60
    Dup2            | 87.61 | 85.90     | 88.03
    Fb+Fc+Dup1      | 36.43 | 38.32     | 36.09

From the results in Table 5.4, it is observed that the NLC+LS copes well with the problem of illumination variation, achieving a comparatively better result on the Fc probe set. This is the toughest probe set, as it contains images of the subjects under considerably different lighting conditions [56]. On the Fc probe set, the NLC+LS outperforms the NLC approach and the TV+L1&NLC by approximately 7.7% and 8.8% respectively, which demonstrates the better tolerance of the NLC+LS to varying illumination compared to the NLC and TV+L1&NLC. However, it shows slightly degraded performance compared to the NLC for the expression variation of the Fb probe set. For the Dup1 and Dup2 probe sets, the accuracies achieved by the NLC+LS are fairly comparable to those of the NLC and TV+L1&NLC.

As stated in [48] [58], more than one gallery image per person is needed to form a reliable linear subspace with the Nearest Linear Combination (NLC) approach. The gallery images should also represent variations in viewing conditions, expressions, and illumination to cover all possible changes in the probe images. Since the gallery set Fa contains a single image per person, the predictor of each class was built from the feature images extracted from a single image to match with the probe image. However, the resulting predictor is not sufficient to cover all of the required variations, which is essential for improving the performance of the NLC.
The results in Table 5.4 clearly show that the recognition performance of the NLC+LS is at least comparable to the NLC and TV+L1&NLC methods in almost all of the cases of the FERET dataset.

5.2 Computational Complexity

The time complexity of the proposed approach can be determined in the following way. To find the time complexity of the testing stage, the complexity of solving the TV+L1 model by the Parametric Maximum Flow (PMF) algorithm is considered first. According to [37], the TV+L1 model can be solved in O(mn log(mn)) time, where m and n are the height and width of the image. Wavelet decomposition is performed with complexity O(mnl) [2], where l is the decomposition level. After wavelet decomposition, the resolution of the image becomes r × s, where r < m and s < n. The complexity of the dimension-reduction algorithm (PCA) is O(rsd), where d is the number of dimensions in the reduced-dimension space. After that, the distance between the probe image and each subject in the gallery is calculated using the Nearest Linear Combination (NLC) [58] classification approach; the required time complexity of this step is O(rsc), where c is the total number of subjects to be compared. Thus, the total time complexity of the testing stage is O(mn(log(mn) + l) + rs(d + c)).

The complexity of the training stage is similar to that of the testing stage. Assume that each gallery image of a person is decomposed into q feature images using the TV+L1 model, and that k is the number of images per person in the gallery. The estimated complexity is then O(kq(mn(log(mn) + l) + rs(d + c))).

5.3 Summary

A comprehensive comparison of the proposed approach with state-of-the-art techniques has been presented in this chapter. The developed approach has been shown to perform well in the case of illumination variations, and it shows quite agreeable results on large databases as well. The chapter also includes a complexity analysis of the proposed approach.

Chapter 6

Conclusions

Performing face recognition under different illumination conditions is a challenging field for researchers. Many approaches have been proposed to tackle the problem of illumination changes, yet the problem is far from being solved. The approach presented in this research attempts to provide an illumination-invariant face recognition framework based on the combination of scale-space decomposition and wavelet decomposition.

Scale-space decomposition methods decompose a facial image into large-scale and small-scale features. One of the popular scale-space decomposition methods is the Total Variation model (TV+L1), which can be utilized to estimate the illumination component. The advantage of this model is that it requires only one parameter, lambda (λ). Most of the TV+L1 based methods use only the illumination-invariant small-scale features for recognition after removing the large-scale features.

Another popular tool for image analysis is the Discrete Wavelet Transform (DWT), which decomposes an image into different sub-band images covering various frequency ranges. The resultant low-frequency sub-band (LL sub-band) image has a lower resolution than the original image and contains the most important information. Unlike other scale-space decomposition based methods, the proposed approach is designed to use both the large-scale and the small-scale facial features of the gallery images.
For the multiscale decomposition process using the TV+L1 model, a schedule for the values of λ is provided. To include all possible lighting conditions (from normal to abnormal), features of different scales are extracted using the λ values within a predefined range. The LL sub-band images are then obtained by applying the DWT to the extracted feature images, which reduces the computational complexity of deriving the feature vectors by Principal Component Analysis (PCA). For classification, the Nearest Linear Combination (NLC) approach is used to combine the prototypical vectors of a person into a representation of that individual. Lastly, the probe image is compared to all the linear combinations to determine its class.

To evaluate the proposed approach, several experiments were conducted on different databases. The proposed approach was shown to be superior to the two baseline approaches, PCA and NLC, on the Yale, ORL and PIE datasets. It also has better or fairly comparable results to the other TV+L1 based methods, TV+L1&PCA and TV+L1&NLC. This shows that the integration of large-scale and small-scale features by the proposed method improves the results of the methods (TV+L1&PCA and TV+L1&NLC) that perform recognition based only on small-scale features.

In this research, the proposed approach was validated on different databases. In the future, further research is needed to improve the performance of the system on large databases. Although the TV+L1 model is utilized in the proposed framework, the framework should remain effective in conjunction with other scale-space decomposition approaches for illumination normalization. Further research on replacing the TV+L1 with the LTV [24] and TVQI [23] models is needed to validate the generality of the proposed approach. The possibility of embedding other feature extraction methods will also be studied in future work.

Furthermore, the NLC classification approach is utilized in the proposed approach for the recognition task. Additional research could be done to obtain better performance by weighting the computed linear combinations. The performance of the system could also be tested with other classification techniques, such as the Nearest Constrained Linear Combination (NCLC) approach and Support Vector Machines (SVM).

The proposed approach outperforms two different holistic approaches and is superior or equivalent in accuracy to other TV+L1 based methods. The initial studies in this research show that there is still room for improvement, which can increase the effectiveness of the proposed method.

Bibliography

[1] Georgia Tech Face Database. www.anefan.com/research/facereco.htm. Feb. 16, 2014.

[2] The Wavelet. http://en.wikipedia.org/wiki/Wavelet. Feb. 16, 2014.

[3] P. R. A. J. Noraini and N. Selvanathan. Human face recognition using Zernike moments and Nearest Neighbour Classifier. In 4th Student Conference on Research and Development, pages 120-123, 2006.

[4] Y. Adini, Y. Moses, and S. Ullman. Face Recognition: the Problem of Compensating for Changes in Illumination Direction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:721-732, 1997.

[5] N. U. Ahmed and K. R. Rao. Orthogonal Transforms for Digital Signal Processing. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1975.

[6] F. Alizadeh and D. Goldfarb. Second-Order Cone Programming. Mathematical Programming, 95:3-51, 2001.

[7] S. Alliney. A property of the minimum vectors of a regularizing functional defined by means of the absolute norm. IEEE Transactions on Signal Processing, 45(4):913-917, 1997.
[8] AT&T. The ORL Dataset. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html. Feb. 16, 2014.

[9] G. A. Bahtivar. Holistic Face Recognition by Dimension Reduction. M.Sc. Thesis, Department of Electrical and Electronics Engineering, Middle East Technical University, Sept. 2003.

[10] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski. Face recognition by Independent Component Analysis. IEEE Transactions on Neural Networks, 13(6):1450-1464, 2002.

[11] R. Basri and D. W. Jacobs. Lambertian Reflectance and Linear Subspaces. IEEE Trans. Pattern Anal. Mach. Intell., 25(2):218-233, Feb. 2003.

[12] P. N. Belhumeur, P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, pages 711-720, 1997.

[13] P. N. Belhumeur and D. J. Kriegman. What is the Set of Images of an Object Under All Possible Lighting Conditions? In Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR '96), page 270, Washington, DC, USA, 1996.

[14] S. Bentin, T. Allison, A. Puce, E. Perez, and G. McCarthy. Electrophysiological Studies of Face Perception in Humans. J. Cognitive Neuroscience, 8(6):551-565, Nov. 1996.

[15] V. Blanz and T. Vetter. Face Recognition Based on Fitting a 3D Morphable Model. IEEE Trans. Pattern Anal. Mach. Intell., 25(9):1063-1074, 2003.

[16] R. Brunelli and T. Poggio. Face Recognition through Geometrical Features. In European Conference on Computer Vision (ECCV), pages 792-800, 1992.

[17] R. Brunelli and T. Poggio. Face recognition: features versus templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(10):1042-1052, 1993.

[18] D. Cai. Matlab codes for dimensionality reduction (subspace learning). http://www.cad.zju.edu.cn/home/dengcai/Data/DimensionReduction.html. Feb. 16, 2014.

[19] A. Chambolle and P.-L. Lions. Image recovery via total variation minimization and related problems. Numerische Mathematik, 76(2):167-188, 1997.

[20] T. Chan, S. Esedoglu, F. Park, and A. Yip. Recent Developments in Total Variation Image Restoration. In Mathematical Models of Computer Vision. Springer Verlag, 2005.

[21] T. F. Chan and S. Esedoglu. Aspects of total variation regularized L1 function approximation. SIAM J. Appl. Math, 2005.

[22] L. Chen and N. Tokuda. A unified framework for improving the accuracy of all holistic face identification algorithms. Artificial Intelligence Review, 33(1-2):107-122, 2010.

[23] T. Chen, W. Yin, X. S. Zhou, D. Comaniciu, and T. S. Huang. Illumination normalization for face recognition and uneven background correction using total variation based image models. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pages 532-539, 2005.

[24] T. Chen, W. Yin, X. S. Zhou, D. Comaniciu, and T. S. Huang. Total Variation Models for Variable Lighting Face Recognition. IEEE Trans. Pattern Anal. Mach. Intell., 28(9):1519-1524, Sept. 2006.

[25] I. Daubechies. The Wavelet Transform, Time-frequency Localization and Signal Analysis. IEEE Trans. Inf. Theor., 36(5):961-1005, Sept. 2006.

[26] K. Delac, M. Grgic, and S. Grgic. Independent comparative study of PCA, ICA, and LDA on the FERET data set. Int. J. Imaging Systems and Technology, 15(5):252-260, 2005.
[27] H. K. Ekenel and R. Stiefelhagen. Local Wavelet Analysis for Face Recognition. In 15th IEEE Conference on Signal Processing and Communications Applications, pages 1-4, 2007.

[28] M. S. El-Bashir. Face Recognition using Multi-Classifier. Applied Mathematical Sciences, 6(45):2235-2244, 2012.

[29] M. J. Er, W. Chen, and S. Wu. High-speed Face Recognition Based on Discrete Cosine Transform and RBF Neural Networks. Trans. Neur. Netw., 16(3):679-691, May 2005.

[30] G.-C. Feng, P. C. Yuen, and D.-Q. Dai. Human face recognition using PCA on wavelet subband. J. Electronic Imaging, 9(2):226-233, 2000.

[31] R. A. Fisher. The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(7):179-188, 1936.

[32] W. T. Freeman and J. B. Tenenbaum. Learning bilinear models for two-factor problems in vision. In Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (CVPR), 1997.

[33] G.-Y. An, J.-Y. W., and Q.-Q. Ruan. Enhanced TV-based Quotient Image Model and its application to Face Recognition with One Sample per Subject Employing Subspace Methods. Journal of Information Science and Engineering, 26(1):279-290, 2010.

[34] C. Garcia, G. Zikos, and G. Tziritas. A Wavelet-based Framework for Face Recognition. In Workshop on Advances in Facial Image Analysis and Recognition Technology, 5th European Conference on Computer Vision, pages 84-92, 1998.

[35] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman. From Few to Many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:643-660, 2001.

[36] D. Goldfarb and W. Yin. Second-Order Cone Programming Methods for Total Variation-Based Image Restoration. SIAM J. Sci. Comput., 27:622-645, 2004.

[37] D. Goldfarb and W. Yin. Parametric Maximum Flow Algorithms for Fast Total Variation Minimization. SIAM J. Sci. Comput., 31(5):3712-3743, Oct. 2009.

[38] R. Gross, S. Baker, I. Matthews, and T. Kanade. Face Recognition Across Pose and Illumination. In S. Z. Li and A. K. Jain, editors, Handbook of Face Recognition, pages 197-221. Springer, 2011.

[39] R. Gross and V. Brajovic. An Image Preprocessing Algorithm for Illumination Invariant Face Recognition. In 4th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA), June 2003.

[40] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. Multi-PIE. Image Vision Comput., 28(5):807-813, May 2010.

[41] M. A. Grudin. On internal representations in face recognition systems. Pattern Recognition, 33(7):1161-1177, 2000.

[42] E. Gumus, N. Kilic, A. Sertbas, and O. N. Ucan. Evaluation of Face Recognition Techniques Using PCA, Wavelets and SVM. Expert Syst. Appl., 37(9):6404-6408, Sept. 2010.

[43] G. Guo, S. Z. Li, and K. Chan. Face Recognition by Support Vector Machines. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000, FG '00, page 196, Washington, DC, USA, 2000. IEEE Computer Society.

[44] H. Han, S. Shan, X. Chen, S. Lao, and W. Gao. Separability Oriented Preprocessing for Illumination-insensitive Face Recognition. In Proceedings of the 12th European Conference on Computer Vision: Part VII, ECCV'12, pages 307-320, Berlin, Heidelberg, 2012. Springer-Verlag.

[45] H. Han, S. Shan, L. Qing, X. Chen, and W. Gao. Lighting Aware Preprocessing for Face Recognition Across Varying Illumination. In Proceedings of the 11th European Conference on Computer Vision: Part II, ECCV'10, pages 308-321, Berlin, Heidelberg, 2010. Springer-Verlag.
[46] B. Heisele, P. Ho, J. Wu, and T. Poggio. Face Recognition: Component-Based versus Global Approaches. Computer Vision and Image Understanding, 2003.

[47] Y. Ijiri, M. Sakuragi, and S. Lao. Security Management for Mobile Devices by Face Recognition. In Proceedings of the 7th International Conference on Mobile Data Management, MDM '06, page 49, Washington, DC, USA, 2006. IEEE Computer Society.

[48] I. Naseem, R. Togneri, and M. Bennamoun. Linear Regression for Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11):2106-2112, 2010.

[49] M. Ingebretsen. The Brave New Intelligent Interface. IEEE Intelligent Systems, 25:4-8, 2010.

[50] R. Jafri and H. R. Arabnia. A Survey of Face Recognition Techniques. JIPS, 5(2):41-68, 2009.

[51] A. K. Jain, R. P. W. Duin, and J. Mao. Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4-37, 2000.

[52] M. K. K. P. Chandar, M. M. Chandra, and B. Latha. Multi Scale Feature Extraction and Enhancement using SVD towards Secure Face Recognition System. In International Conference on Signal Processing, Communication, Computing and Networking Technologies, pages 64-69, 2011.

[53] S. Kakarwal and R. Deshmukh. Wavelet Transform based Feature Extraction for Face Recognition. International Journal of Computer Science and Application, pages 100-104, 2010.

[54] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by Simulated Annealing. Science, 220:671-680, 1983.

[55] K.-C. Lee, J. Ho, and D. J. Kriegman. Acquiring Linear Subspaces for Face Recognition under Variable Lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):684-698, 2005.

[56] K. Baek, B. A. Draper, J. R. Beveridge, and K. She. PCA vs. ICA: A comparison on the FERET dataset. In Joint Conference on Information Sciences, 2002.

[57] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back. Face Recognition: A Convolutional Neural Network Approach. IEEE Transactions on Neural Networks, 8:98-113, 1997.

[58] S. Z. Li. Face Recognition Based on Nearest Linear Combinations. IEEE Trans. on Neural Networks, pages 839-844, 1998.

[59] S. Z. Li and J. Lu. Face Recognition Using the Nearest Feature Line Method. IEEE Transactions on Neural Networks, 10:439-443, 1999.

[60] K. Ma and X. Tang. Discrete Wavelet Face Graph Matching. In International Conference on Image Processing, volume 2, pages 217-220, 2001.

[61] A. Martinez and R. Benavente. The AR Face Database. CVC Technical Report 24, 1998.

[62] M. Mazloom and S. Kasaei. Combination of Wavelet and PCA for Face Recognition. In IEEE GCC Conference, pages 1-5, 2006.

[63] M. Mazloom, S. Kasaei, and H. A. Neissi. Construction and Application of SVM Model and Wavelet-PCA for Face Recognition. In Proceedings of the 2009 Second International Conference on Computer and Electrical Engineering - Volume 01, ICCEE '09, pages 391-398, Washington, DC, USA, 2009. IEEE Computer Society.

[64] J. Meng and W. Zhang. Volume Measure in 2DPCA-based Face Recognition. Pattern Recogn. Lett., 28(10):1203-1208, July 2007.

[65] M. Misiti, Y. Misiti, G. Oppenheim, and J.-M. Poggi. Wavelet Toolbox for Use with MATLAB. The MathWorks, 1996.

[66] T. Mitchell. Artificial Neural Networks. McGraw Hill, 1997.

[67] X. Mu and M. H. Hassoun. Combining Gabor Features: Summing vs. Voting in Human Face Recognition. In IEEE International Conference on Systems, Man and Cybernetics, volume 1, pages 737-743, 2003.
[68] C. Nastar and N. Ayache. Frequency-Based Nonrigid Motion Analysis: Application to Four Dimensional Medical Images. IEEE Trans. Pattern Anal. Mach. Intell., 18(11):1067-1079, 1996.

[69] C. Nastar, B. Moghaddam, and A. Pentland. Flexible Images: Matching and Recognition Using Learned Deformations. Computer Vision and Image Understanding, 65(2):179-191, 1997.

[70] C. A. Nelson. The Development and Neural Bases of Face Recognition. Infant and Child Development, 10:3-18, 2001.

[71] V. D. M. Nhat and S. Lee. An Improvement on PCA Algorithm for Face Recognition. In J. Wang, X. Liao, and Z. Yi, editors, ISNN (1), volume 3496 of Lecture Notes in Computer Science, pages 1016-1021. Springer, 2005.

[72] M. Nikolova. Minimizers of Cost-Functions Involving Nonsmooth Data-Fidelity Terms. Application to the Processing of Outliers. SIAM J. Numerical Analysis, 40(3):965-994, 2002.

[73] M. Nikolova. A Variational Approach to Remove Outliers and Impulse Noise. Journal of Mathematical Imaging and Vision, 20(1-2):99-120, 2004.

[74] M. Nixon. Eye Spacing Measurement for Facial Recognition. In Application of Digital Image Processing, pages 279-285, 1985.

[75] S. Osher, A. Sole, and L. Vese. Image Decomposition and Restoration Using Total Variation Minimization and the H^-1 Norm. Multiscale Model. Simul., 1:349-370, 2002.

[76] D. K. P. Temdee and K. Chamnongthai. Face Recognition by Using Fractal Encoding and Backpropagation Neural Network. In 5th International Symposium on Signal Processing and its Applications, 1999.

[77] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek. Overview of the Face Recognition Grand Challenge. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05) - Volume 1, pages 947-954, Washington, DC, USA, 2005. IEEE Computer Society.

[78] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET Evaluation Methodology for Face-Recognition Algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 22(10):1090-1104, Oct. 2000.

[79] P. J. Phillips, H. Wechsler, J. Huang, and P. Rauss. The FERET Database and Evaluation Procedure for Face Recognition Algorithms. Image and Vision Computing, 16(5):295-306, 1998.

[80] PIE. The CMU PIE Database. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html. Feb. 16, 2014.

[81] Q. Liu, H. Lu, and S. Ma. A non-parameter Bayesian classifier for face recognition. Journal of Electronics, 20(5):362-370, 2003.

[82] Q. Liu, H. Lu, and S. Ma. Improving Kernel Fisher Discriminant Analysis for Face Recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):42-49, 2004.

[83] R. Ramamoorthi and P. Hanrahan. On the relationship between radiance and irradiance: determining the illumination from images of a convex Lambertian object. Journal of the Optical Society of America A, 18(10):2448-2459, 2001.

[84] T. P. Ryan. Modern Regression Methods. Wiley-Interscience, 1997.

[85] F. S. Samaria. Parameterisation of a Stochastic Model for Human Face Identification. In Proceedings of the Second IEEE Workshop on Applications of Computer Vision, pages 138-142, Dec. 1994.

[86] M. P. Satone and G. K. Kharate. Face Recognition Based on PCA on Wavelet Subband of Average-Half-Face. JIPS, 8(3):483-494, 2012.

[87] R. J. Schalkoff. Pattern Recognition. John Wiley & Sons, 1992.

[88] G. A. F. Seber. Linear Regression Analysis. Wiley-Interscience, 2003.
[89] G. Shakhnarovich and B. Moghaddam. Face Recognition in Subspaces. In S. Z. Li and A. K. Jain, editors, Handbook of Face Recognition, pages 141-168. Springer, 2004.

[90] S. Shan, W. Gao, B. Cao, and D. Zhao. Illumination Normalization for Robust Face Recognition Against Varying Lighting Conditions. In Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures, AMFG '03, page 157, Washington, DC, USA, 2003. IEEE Computer Society.

[91] A. Shashua and T. Riklin-Raviv. The Quotient Image: Class-Based Re-Rendering and Recognition with Varying Illuminations. IEEE Trans. Pattern Anal. Mach. Intell., 23(2):129-139, 2001.

[92] S.-H. Moon and S.-W. Lee. Illumination Invariant Face Recognition Using Linear Combination of Face Exemplars. Lecture Notes in Computer Science, Springer, 3546:112-121, 2005.

[93] H. Wang, S. Z. Li, and Y. Wang. Generalized Quotient Image. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pages 498-505, 2004.

[94] D. M. Strong and T. F. Chan. Edge-preserving and Scale-dependent Properties of Total Variation Regularization. Inverse Problems, 19(6):165-187, 2003.

[95] D. Swets and J. Weng. Using Discriminant Eigenfeatures for Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18:831-836, 1996.

[96] T. Sim, S. Baker, and M. Bsat. The CMU Pose, Illumination, and Expression (PIE) Database. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25:1615-1618, 2003.

[97] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans. Pattern Anal. Mach. Intell., 24(7):971-987, July 2002.

[98] M. Turk and A. Pentland. Eigenfaces for Recognition. J. Cognitive Neuroscience, 3(1):71-86, Jan. 1991.

[99] Rice University. The TV+L1 Model. http://www.caam.rice.edu/~wyl/ParaMaxFlow/. Feb. 16, 2014.

[100] P. D. Wadkar and M. Wakhande. Face Recognition using Discrete Wavelet Transforms. International Journal of Advanced Engineering Technology, 3(1):230-242, 2012.

[101] H.-Y. Wang and X.-J. Wu. Weighted PCA space and its application in face recognition. Machine Learning and Cybernetics, 7:4522-4527, 2005.

[102] L. Wiskott, J.-M. Fellous, N. Kruger, and C. von der Malsburg. Face Recognition By Elastic Bunch Graph Matching. IEEE Trans. Pattern Analysis and Machine Intelligence, 19:775-779, 1997.

[103] W. Yin, D. Goldfarb, and S. Osher. The Total Variation Regularized L1 Model for Multiscale Decomposition. Multiscale Modeling & Simulation, 6(1):190-211, 2007.

[104] W. Yin, T. Chen, X. S. Zhou, and A. Chakraborty. Background correction for cDNA microarray images using the TV+L1 model. Bioinformatics, 21(10):2410-2416, 2005.

[105] X. Xie, W. Zheng, J.-H. Lai, and P. C. Yuen. Face illumination normalization on large and small scale features. In CVPR. IEEE Computer Society, 2008.

[106] X. Xie, W.-S. Zheng, J.-H. Lai, P. C. Yuen, and C. Y. Suen. Normalization of Face Illumination Based on Large- and Small-Scale Features. IEEE Transactions on Image Processing, 20(7):1807-1821, 2011.

[107] Yale. The YALE Dataset. http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html. Feb. 16, 2014.

[108] J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang. Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition. IEEE Trans. Pattern Anal. Mach. Intell., 26(1):131-137, Jan. 2004.
[109] W. Yin, D. Goldfarb, and S. Osher. Image Cartoon-Texture Decomposition and Feature Selection Using the Total Variation Regularized L1 Functional. Lecture Notes in Computer Science, Springer, 3752:73-84, 2005.

[110] A. L. Yuille, P. W. Hallinan, and D. S. Cohen. Feature Extraction from Faces Using Deformable Templates. Int. J. Comput. Vision, 8(2):99-111, Aug. 1992.

[111] W. Zhao and R. Chellappa. Robust Face Recognition Using Symmetric Shape-from-Shading. Technical Report, Center for Automation Research, University of Maryland, 1999.

[112] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face Recognition: A Literature Survey. ACM Comput. Surv., 35(4):399-458, Dec. 2003.