Machine Learning Techniques in Pain Recognition

Md. Maruf Monwar
B.Sc., University of Rajshahi, Bangladesh, 1996
M.Sc., University of Rajshahi, Bangladesh, 1997

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Mathematical, Computer and Physical Sciences (Computer Science)

The University of Northern British Columbia
December, 2006

© Md. Maruf Monwar, 2006

Abstract

Facial expressions are a key index of emotion. To make use of the information afforded by facial expressions in emotion science and clinical practice, reliable, valid, and efficient methods of measurement are critical. Enabling computer systems to recognize facial expressions and infer emotions from them is a challenging research topic. This thesis presents an appearance-based approach for pain recognition from video sequences. An automatic face detector that uses skin color modeling is employed to detect the human face in the video sequence. The pain-affected portions of the face are obtained by using a mask image. Facial features are processed both holistically and locally. Two machine learning approaches, eigenimage and multilayer neural network, are used for recognition. The first approach processes features holistically and projects them onto a feature space to produce the biometric template.
Recognition in this approach is performed by projecting a new image onto the feature spaces spanned by the eigenimages and then classifying the painful face by comparing its position in the feature spaces with the positions of known individuals. Eigenface, eigeneye and eigenlip techniques are used for this approach. The multilayer neural network technique processes facial features locally. Two types of features, location features and shape features, are computed and then used as inputs to the artificial neural network, which uses the standard error back-propagation algorithm for classification of painful and non-painful faces.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Publications from the Thesis
Acknowledgement

1 Introduction
   1.1 Overview
   1.2 Scope of the Thesis
   1.3 Outline of the Thesis

2 Theoretical Background and Previous Work
   2.1 Introduction
   2.2 Human Recognition of Facial Expressions
   2.3 Machine Recognition of Facial Expressions
      2.3.1 Statistical Approach
         2.3.1.1 Face Recognition by PCA
         2.3.1.2 Face Recognition by LDA
         2.3.1.3 Transformation-based Systems
         2.3.1.4 Face Recognition by SVM
         2.3.1.5 Feature-based Approaches
      2.3.2 Neural Network Approach
      2.3.3 Hybrid Approach
      2.3.4 Other Issues
   2.4 Previous Research on Expression Recognition
   2.5 Conclusion

3 Face Detection
   3.1 Introduction
   3.2 Background of Face Detection
      3.2.1 Feature-based Approaches
         3.2.1.1 Low Level Feature Analysis
         3.2.1.2 Template Matching
         3.2.1.3 Generalized Knowledge Rules
      3.2.2 Image-based Approaches
         3.2.2.1 Linear Subspace Methods
         3.2.2.2 Learning Networks
         3.2.2.3 Statistical Approaches
   3.3 Skin Color Modeling for Face Detection
      3.3.1 Skin Color Based Face Detection in RGB Color Space
         3.3.1.1 Building Skin Color Model
         3.3.1.2 Skin Region Segmentation
         3.3.1.3 Face Detection
   3.4 Conclusion

4 Machine Learning Approaches for Pain Recognition
   4.1 Introduction
   4.2 Eigenimage-based Pain Recognition
      4.2.1 Calculating Eigenfaces
      4.2.2 Recognition using Eigenfaces
      4.2.3 Rebuilding an Image using Eigenfaces
      4.2.4 Eigeneye and Eigenlip Methods
   4.3 Multilayer Neural Network-based Pain Recognition
      4.3.1 Neural Network Basics
         4.3.1.1 Artificial Neural Network
         4.3.1.2 Structure of Multilayer Perceptrons
         4.3.1.3 Back-propagation for Multilayer Perceptrons
      4.3.2 Features Extraction
         4.3.2.1 Location Features Extraction
            4.3.2.1.1 Eye Corners and Eyebrow Inner Endpoints
            4.3.2.1.2 Mouth Corners
         4.3.2.2 Location Features Representation
         4.3.2.3 Shape Features Extraction
      4.3.3 Pain Recognition using Neural Network
   4.4 Conclusion

5 Simulations and Results
   5.1 Introduction
   5.2 Image Acquisition
   5.3 Results of Eigenimage Method
   5.4 Results of Neural Network Method
   5.5 Comparison between Eigenimage and Neural Network Method
      5.5.1 Speed Comparison
      5.5.2 Accuracy Comparison
   5.6 Summary

6 Conclusions
   6.1 Contributions
   6.2 Limitations
   6.3 Future Works

Bibliography

List of Tables

5.1 Average face detection rate
5.2 Comparison of various eigenimage methods for pain recognition
5.3 Effect of the number of neurons (in 1 hidden layer) on system accuracy
5.4 Effect of the number of neurons (in 2 hidden layers) on system accuracy
5.5 Effect of the number of neurons (in 3 hidden layers) on system accuracy
5.6 Effect of the number of neurons (in 5 hidden layers) on system accuracy
5.7 Speed comparison
5.8 Accuracy comparison

List of Figures

1.1 Sources of facial expressions
1.2 Block diagram of the proposed pain recognition system
3.1 Examples of several variations of background
3.2 Classification of face detection methods
3.3 The face detection system by Rowley et al.
3.4 Face detection examples from Schneiderman and Kanade
3.5 RGB color cube
3.6 Double cone model of HSI color space
3.7 (a) Selected skin region in RGB image (b) Selected skin in chromatic color
3.8 (a) Cluster in color space (RGB) (b) Cluster in chromatic space (r,g)
3.9 Gaussian model
3.10 Block diagram for face segmentation
3.11 Segmentation and approximate face location process
3.12 (a) Original video frame (b) Gray level image (c) Mask image (d) Resultant image
4.1 Block diagram of eigenimage-based pain recognition system
4.2 Training images for eigenfaces
4.3 Average image
4.4 Eigenfaces for recognition
4.5 Reconstructed training images for eigenfaces
4.6 Image to recognize
4.7 Image after the reconstruction process
4.8 (a) Average eye (b) Eigeneyes
4.9 (a) Average lip (b) Eigenlips
4.10 Block diagram of neural network-based pain recognition system
4.11 Sigmoid function
4.12 Structure of a multilayer perceptron
4.13 Iterative thresholding of the face to find eyes and brows
4.14 Face location feature representation for expression recognition
4.15 Zones of the edge map of the normalized face
4.16 (a) Four quantization levels (b) Histogram corresponding to the middle zone of the mouth
4.17 Neural network-based pain recognizer
5.1 Block diagram of eigenimage-based pain recognition system
5.2 Bar diagram of accuracy results for various eigenimage methods
5.3 Block diagram of neural network-based pain recognition system

Publications from the Thesis

[1] Md. Maruf Monwar and Dr. Siamak Rezaei, Pain Recognition Using Artificial Neural Network, in the proceedings of the 6th IEEE International Symposium on Signal Processing and Information Technology 2006 (ISSPIT 2006), ISBN: 0-7803-9754-1, August 27-30, 2006, Vancouver, Canada, pp. 28-33.

[2] Md. Maruf Monwar and Dr. Siamak Rezaei, Appearance-based Pain Recognition from Video Sequences, in the proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN 2006), ISBN: 0-7803-9490-9, July 16-21, 2006, Vancouver, Canada, pp. 2429-2434.

[3] Md. Maruf Monwar and Dr. Siamak Rezaei, A Robust Technique for Pain Recognition from Video Sequences using Skin Color Modeling, in the proceedings of the International MultiConference of Engineers and Computer Scientists 2006 (IMECS 2006), ISBN: 978-988-98671-3-3, June 20-22, 2006, Hong Kong, pp. 513-518.
[4] Md. Maruf Monwar, Padma Polash Paul, Md. Wahedul Islam and Dr. Siamak Rezaei, A Real-Time Face Recognition Approach from Video Sequence Using Skin Color Model and Eigenface Method, in the proceedings of the 19th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2006), ISBN: 1-4244-0038-4, May 7-10, 2006, Ottawa, Canada, pp. 2150-2154.

Acknowledgment

I am greatly indebted to my supervisor, Dr. Siamak Rezaei, for kindly providing suggestions and encouragement which helped me at all times during the research and writing of this thesis. His comments have been of the greatest help at all times. My gratitude also goes to my supervisory committee members for their suggestions and encouragement, which led to substantial improvements of this thesis. In particular, I am grateful to Dr. Ken Prkachin for his valuable feedback. I would like to express my thanks to Dr. Moustafa of the Physics department of the University of Northern British Columbia, Canada, for his important suggestions and feedback. I thank the Dean of Graduate Studies, Dr. Robert Tait, and the secretary of the Dean of Graduate Studies, Ms. Bethany Haffner, for having always been so supportive. Finally, I would like to acknowledge my thanks to my family. My wife, Nahid Sultana, provided me with constant support which helped me to overcome the many difficulties and discouragements on the way to completing this dissertation. My parents, Mirza Md. Monwar Hossain and Mrs. Sufia Monwar, and my sister, Sharmin Monwar, supported me with their encouragement and comprehension. My daughter, Rushama Nahiyan Raiyan, has been a constant source of joy and gave me the strength I needed to go through the preparation of this dissertation.

Chapter 1
Introduction

1.1 Overview

The face plays a crucial role in interpersonal communication. Seeing a face, we can recognize a person's identity, gender, age, expression, etc. This information is irreplaceable for the normal conduct of human communication. If machines could recognize such information from a human face, humans and machines might thereby communicate more smoothly, robustly, and harmoniously. In recent years, a tremendous amount of research has been carried out on the automatic recognition of facial expressions (such as joy, anger, sadness, fear, disgust and surprise) from video sequences, and there is still significant potential for further research and development. This, coupled with the vast array of commercial applications (e.g. in medical systems and in psychological research), makes it an attractive area of research. Facial expressions can arise from several sources. Figure 1.1 shows the sources of facial expressions [1].

Fig. 1.1. Sources of facial expressions (felt emotions, unfelt emotions, conviction, cogitation, mental states, verbal and non-verbal communication, emblems, social winks, manipulators, pain, listener responses, and regulators)

In this thesis, we propose a method for automatically inferring pain in video sequences and treat the system as a special type of facial expression recognition system. Pain recognition techniques can be subdivided into two categories: electroencephalography (EEG) signal-based and image-based.
In the first method, the current that flows during synaptic excitations of the dendrites of many pyramidal neurons in the cerebral cortex during pain is measured and used for pain or expression classification. In the image-based method, face images, collected either from static images or from video sequences, are used for pain recognition. This approach can be further subdivided into appearance-based and model-based methods. Appearance-based approaches represent an object in terms of several object views. An image is considered as a high-dimensional vector, i.e. a point in a high-dimensional vector space. Many view-based approaches use statistical techniques to analyze the distribution of the object image vectors in the vector space, and derive an efficient and effective representation (feature space) according to different applications. Given a test image, the similarity between the stored prototypes and the test view is then computed in the feature space. This image vector representation allows the use of learning techniques for both the analysis and the synthesis of images.

Facial feature extraction methods can be categorized according to whether they focus on motion or on deformation of faces and facial features, and according to whether they act locally or holistically. Holistic feature processing means the face is processed as a whole. Local feature processing means processing the features that are prone to change with facial expressions. Facial features can also be subdivided into transient and intransient. Intransient facial features are always present in the face, but may be deformed by facial expressions. Among these, the eyelids, eyebrows and the mouth are involved in facial expressions. Tissue texture and facial hair, as well as permanent furrows, constitute other types of intransient facial features that influence the appearance of facial expressions. Transient facial features encompass the different kinds of wrinkles and bulges that occur with facial expressions. The forehead and the regions surrounding the mouth and the eyes are especially prone to transient facial features. Opening and closing of the eyes and the mouth may furthermore lead to iconic changes in texture that cannot be predicted from antecedent frames.

We will focus our research on developing a pain recognition scheme that does not depend on excessive geometry and computation, such as deformable templates. Instead, some linear appearance-based methods will be used. We will process the facial features both holistically and locally.

1.2 Scope of the Thesis

In this thesis, the performance of two machine learning approaches has been studied for automatic pain recognition from video sequences. The two approaches are the eigenimage and multilayer neural network methods. We have used a database of painful and neutral video files. In this database, there are 68 video files of 34 persons of different skin colors, ethnicities, ages and genders. In one file, the person displays a neutral facial expression and in the other file, the person displays pain. The individuals in the videos were all people who had shoulder problems and participated in an experiment in which pain was produced by manipulation of the affected shoulder. We have used a skin color modeling technique for face detection. After that, features are extracted from the detected face portions. Finally, pain recognition is performed by a set of recognizers. Two approaches, the eigenimage method and the multilayer neural network, will be used to train the recognizers. This will allow us to compare the computational time and accuracy of the two methods. The eigenimage method will process the facial features holistically and the neural network-based pain recognizer will process the facial features locally. The simplified block diagram of our proposed system is shown in figure 1.2.

Fig. 1.2 Block diagram of the proposed pain recognition system (input video; face detection using skin color modeling; feature extraction; recognition using the eigenimage and neural network methods; output: pain or painless)

1.3 Outline of the Thesis

The rest of this thesis is organized as follows. Chapter 2 introduces the literature on machine recognition research. The basic differences between human recognition and machine recognition are reported in this chapter, and various methods for machine recognition and pattern classification are discussed. Chapter 3 introduces some face detection basics and how the system learns to discriminate face and non-face examples from each other; in other words, it illustrates how the face is detected in this research using the skin color modeling technique. Chapter 4 describes the two machine learning approaches: first, the eigenimage method is described in detail, and then the neural network classifier is discussed. Chapter 5 describes the simulation results for comparing the two machine learning methods for pain recognition. Speed performance and accuracy are compared between the eigenimage method and the neural network-based classifier. Chapter 6 summarizes the conclusions of this thesis and gives some future directions for further research.

Chapter 2
Theoretical Background and Previous Work

2.1 Introduction

Recognition of facial expressions has been an interesting issue for both neuroscientists and computer engineers dealing with artificial intelligence (AI). A healthy human can detect and identify a face easily and then recognize the expression on that face, whereas for a computer to recognize an expression, the face area must be detected first, and recognition comes next. Hence, for a computer to recognize expressions from faces, the photographs or video should be taken in a controlled environment; a uniform background and identical poses make the problem easier to solve. These face images are called mug shots [2]. From these mug shots, canonical face images can be produced manually or automatically by preprocessing techniques such as cropping, rotating, histogram equalization and masking. An image-based pain recognition system is similar to a facial expression recognition system. Unlike most expression recognition systems, it does not classify joy, sadness, disgust, anger, fear and surprise expressions of faces; instead, it recognizes pain in the face. In this chapter, we will look at human versus
machine recognition of faces and facial expression recognition, and at the research that has been done in this field by previous researchers.

2.2 Human Recognition of Facial Expressions

When building artificial facial expression recognition systems, scientists need to understand the architecture of the human facial expression recognition system. Focusing on the methodology of the human expression recognition system may be useful for understanding the basic system. However, the human expression recognition system utilizes more than just 2-dimensional data. The human facial expression recognition system uses data obtained from some or all of the senses. All these pieces of data are used either individually or collectively for the storage and remembering of faces. In many cases, the surroundings also play an important role in the human facial expression recognition system. It is hard for a machine recognition system to handle so much data and their combinations. However, it is also hard for a human to remember many faces due to storage limitations. A key potential advantage of a machine system is its memory capacity [3], whereas for the human facial expression recognition system the important feature is its parallel processing capacity. Both holistic and feature information are important for the human facial expression recognition system. Studies suggest the possibility of global descriptions serving as a front end for better feature-based perception [3]. If there are dominant features present, such as big ears and a small nose, holistic descriptions may not be used. Also, recent studies show that an inverted face (i.e. all the intensity values are subtracted from 255 to obtain the inverse image in the grey scale) is much harder to recognize than a normal face [4]. The eyes, mouth and face outline have been determined to be more important than the nose for perceiving and remembering faces and recognizing expressions. It has also been found that the eye and eyebrow region and the mouth region of the face are more useful than the other parts of the face for expression recognition [121]. For humans, expressions from photographic negatives of faces are difficult to recognize, but there has been little study of why this is so. Also, a study on the direction of illumination [4] showed the importance of top lighting; it is easier for humans to recognize faces illuminated from top to bottom than faces illuminated from bottom to top. In the next section, we will discuss previous work on machine recognition of facial expressions and compare human and machine recognition of facial expressions.

2.3 Machine Recognition of Facial Expressions

Although studies on human recognition of facial expressions were expected to be a reference for machine recognition of facial expressions, research on machine recognition of facial expressions has developed independently of studies on human recognition. During the 1970's, typical pattern classification techniques, which use measurements between features in faces or face profiles, were used [5]. During the 1980's, work on face recognition remained nearly stable. Since the early 1990's, research interest in machine recognition of faces and facial expressions has grown tremendously. The reasons for this are: (1) an increase in emphasis on civilian/commercial research projects, (2) studies on neural network classifiers with an emphasis on real-time computation and adaptation, (3) the availability of real-time hardware, and (4) the growing need for surveillance and robotics applications. The basic question relevant to expression classification is what form the structural code (for encoding the facial features) should take to achieve face recognition. Two major approaches are used for machine identification of human faces: geometrical local feature based methods, and holistic template matching based systems. Also, combinations of these two methods, namely hybrid methods, are used. The first approach, the geometrical local feature based one, extracts and measures discrete local features (such as the eyes, nose, mouth, hair, etc.) for retrieving and identifying faces. Then, standard statistical pattern recognition techniques and/or neural network approaches are employed for matching faces using these measurements [6]. One of the well known geometrical local feature based methods is the Elastic Bunch Graph Matching (EBGM) technique. The other approach, the holistic one, conceptually related to template matching, attempts to identify faces using global representations [7]. Holistic methods approach the face image as a whole and try to extract features from the whole face region. In this approach, as in the previous approach, pattern classifiers are applied to classify the image after extracting the features. One way to extract features in a holistic system is to apply statistical methods such as Principal Component Analysis (PCA) to the whole image. PCA can also be applied to a face image locally; in that case the approach is not holistic. Whichever method is used, the most important problem in face recognition is the problem of dimensionality. Appropriate methods should be applied to reduce the dimension of the studied space. Working in higher dimensions causes overfitting, where the system starts to memorize. Also, computational complexity is an important problem when working on large databases. In the following sections, the main studies are summarized. The recognition techniques are grouped into statistical and neural based approaches. The next section discusses statistical approaches, while section 2.3.2 discusses neural-based approaches.

2.3.1 Statistical Approaches

Statistical methods include template matching based systems, where the training and test images are matched by measuring the correlation between them. Moreover, statistical methods include projection based methods such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). In fact, projection based systems came about due to the shortcomings of the straightforward template matching based approaches, that is, trying to carry out the required classification task in a space of extremely high dimensionality.

Template Matching: Brunelli and Poggio [8] suggested that the optimal strategy for face image analysis is holistic and corresponds to template matching. In their study, they compared a geometric feature based technique with a template matching system. In the simplest form of template matching, the image (as 2-D intensity values) is compared with a single template representing the whole face, using a distance metric. Although recognition by matching raw images has been successful under limited circumstances, it suffers from the usual shortcomings of straightforward correlation-based approaches, such as sensitivity to face orientation, size, variable lighting conditions, and noise. The reason for this vulnerability of direct matching methods lies in their attempt to carry out the required classification in a space of extremely high dimensionality. In order to overcome the problem of dimensionality, the connectionist equivalent of data compression methods is employed first. However, it has been argued that the resulting feature dimensions do not necessarily retain the structure needed for classification, and that more general and powerful methods for feature extraction, such as projection based systems, are required. The basic idea behind projection based systems is to construct low dimensional projections of a high dimensional point cloud by maximizing an objective function such as the deviation from normality.

2.3.1.1 Face Recognition by PCA

The Eigenface Method of Turk and Pentland [9] is one of the main methods applied in the literature and is based on the Karhunen-Loeve expansion. Their study was motivated by the earlier work of Sirovich and Kirby [10] [11]. It is based on the application of Principal Component Analysis to the human face. It treats the face images as 2-D data, and classifies the face images by projecting them onto the eigenface space, which is composed of the eigenvectors obtained from the variance of the face images. Eigenface recognition derives its name from the German prefix eigen, meaning own or individual. The Eigenface method of facial recognition is considered the first working facial recognition technology [12]. When the method was first proposed by Turk and Pentland [9], they worked on the image as a whole. Also, they used a Nearest Mean classifier to classify the face images. By using the observation that the projections of a face image and of a non-face image are quite different, a method of detecting the face in an image is obtained. They applied the method to a database of 2500 face images of 16 subjects, digitized at all combinations of 3 head orientations, 3 head sizes and 3 lighting conditions. They conducted several experiments to test the robustness of their approach to illumination changes, variations in size, head orientation, and differences between training and test conditions. They reported that the system was fairly robust to illumination changes, but degrades quickly as the scale changes [9]. This can be explained by the correlation between images obtained under different illumination conditions, whereas the correlation between face images at different scales is rather low. The eigenface approach works well as long as the test image is similar to the training images used for obtaining the eigenfaces. Later, derivations of the original PCA approach were proposed for different applications.

PCA and Image Compression: In their study, Moghaddam and Pentland [13] used the Eigenface Method for image coding of human faces for potential applications such as video telephony, database image compression and face recognition.

Face Detection and Recognition Using PCA: Lee et al.
[14] proposed a method using PCA which detects the head of an individual in a complex background and then recognizes the person by comparing the characteristics of the face to those of known individuals.

PCA Performance on Large Databases: Lee et al. [15] proposed a method for generalizing the representational capacity of an available face database.

PCA & Video: In a study by Crowley and Schwerdt [16], PCA was used for coding and compression of video streams of talking heads. They suggest that a typical video sequence of a talking head can often be coded in less than 16 dimensions.

Bayesian PCA: Another method, which is also studied throughout this thesis, is the Bayesian PCA method suggested by Moghaddam et al. [17] [18] [19]. In this system, the Eigenface method based on simple subspace-restricted norms is extended to use a probabilistic measure of similarity. Another difference from the standard Eigenface approach is that this method uses image differences in the training and test stages. The difference between each pair of images belonging to the same individual is fed into the system as an intrapersonal difference, and the difference between one image and an image from a different class is fed into the system as an extra-personal difference. Finally, when a test image arrives, it is subtracted from each image in the database and each difference is fed into the system. The test image is assigned to the class of the training image with the greatest similarity (i.e. the smallest difference). The mathematical theory was mainly studied by B. Moghaddam and A. Pentland [20]. Also, in [21] Moghaddam introduced his study of several techniques: Principal Component Analysis (PCA), Independent Component Analysis (ICA), and nonlinear Kernel PCA (KPCA). He examined and tested these systems using the Facial Recognition Technology (FERET) Database. He argued that the experimental results demonstrate the simplicity, computational economy and performance superiority of the Bayesian PCA method over other methods. Finally, Liu and Wechsler [22] [23] worked on a Bayesian approach to face recognition.

PCA and Gabor Filters: Chung et al. [24] suggested the combined use of PCA and Gabor filters (linear filters whose impulse responses are defined by a harmonic function multiplied by a Gaussian function). Their method consists of two parts: in the first part, Gabor filters are used to extract facial features from the original image at predefined fiducial points; in the second part, PCA is used to classify the facial features optimally. They suggested combining these two methods in order to overcome the shortcomings of PCA. They argue that when raw images are used as the input matrix of PCA, the eigenspace cannot reflect the correlation of facial features well, as original face images are deformed by in-plane and in-depth rotation and by illumination and contrast variation. They argued that they overcame these problems by using Gabor filters to extract the facial features.

2.3.1.2 Face Recognition by LDA

Etemad and Chellappa [25] proposed applying Linear/Fisher Discriminant Analysis to the face recognition process. LDA is carried out via scatter matrix analysis.
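In standard notation (a generic statement of the Fisher criterion, given here for illustration and not quoted from [25]), let $S_B$ denote the between-class scatter matrix and $S_W$ the within-class scatter matrix of the face data. The projection matrix $W$ is then chosen to maximize

$$
J(W) = \frac{\left| W^{T} S_{B} W \right|}{\left| W^{T} S_{W} W \right|},
$$

and, as noted next, the columns of the maximizing $W$ are obtained from a generalized eigenvalue problem.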
The aim is to find the optimal projection which maximizes the between-class scatter of the face data and minimizes the within-class scatter of the face data. As in the case of PCA, where the eigenfaces are calculated by eigenvalue analysis, the projections of LDA are calculated by solving a generalized eigenvalue equation.

Subspace LDA: An alternative method which combines PCA and LDA has also been studied [26] [27] [28] [29]. This method consists of two steps: the face image is projected into the eigenface space, which is constructed by PCA, and then the eigenface space projected vectors are projected into the LDA classification space to construct a linear classifier. In this method, the choice of the number of eigenfaces used for the first step is critical; the choice enables the system to generate class-separable features via LDA from the eigenface space representation. The generalization/overfitting problem can be addressed in this manner. In these studies, a weighted distance metric guided by the LDA eigenvalues was also employed to improve the performance of the subspace LDA method.

2.3.1.3 Transformation-based Systems

There are three types of transformation-based systems. One of them uses the Discrete Cosine Transform, one uses a combination of pseudo 2-dimensional Hidden Markov Models (HMMs) and the Discrete Cosine Transform, and the other uses the Fourier Transform.

DCT: Podilchuk and Zang [30] proposed a method which finds the feature vectors using the Discrete Cosine Transform (DCT). Their system tries to detect the critical areas of the face. The system is based on matching the image to a map of invariant facial attributes associated with specific areas of the face. They claim that this technique is quite robust, since it relies on global operations over a whole region of the face. A codebook of feature vectors or codewords is determined for each person from the training set. They examined recognition performance based on feature selection, the number of features or codebook size, and feature dimensionality. For feature selection, they tried several block-based transformations and the K-means clustering algorithm [31] to generate the codewords for each codebook. They argued that the block-based DCT coefficients produce good low-dimensional feature vectors with high recognition performance. This brings the possibility of performing face recognition directly on a DCT-based compressed bitstream without having to decode the image.

DCT & HMMs: Eickeler et al. [32] suggested a system based on pseudo 2-D Hidden Markov Models (HMMs) with coefficients of the 2-D DCT as the features. A major advantage of their approach is that the system works directly on JPEG-compressed face images, i.e. it uses the DCT coefficients provided by the JPEG standard. Thus, it does not need any further decompression of the image. Also, Nefian and Hayes [33] used DCT and HMMs as their feature vectors in their research.

Fourier Transform (FT): Spies and Ricketts [34] describe a face recognition system based on an analysis of faces via their Fourier spectra. Recognition is achieved by finding the closest match between feature vectors containing the Fourier coefficients at selected frequencies. This technique is based on the Fourier spectra of facial images, so it relies on a global transformation, i.e., every pixel in the image contributes to each value of its spectrum. The Fourier spectrum is a plot of the energy against spatial frequencies, where spatial frequencies relate to the spatial relations of intensities in the image. In the case of face recognition, this translates to distances between areas of particular brightness, such as the overall size of the head or the distance between the eyes. Higher frequencies describe finer details, and they claimed that these are less useful for identification of a person. They also suggested that humans can recognize a face from a brief look without focusing on small details. They perform the recognition of faces by finding the Euclidean distance between a newly presented face and all the training faces. The distances are calculated
This technique is based on the Fourier spectra of facial images, thus it relies on a global transformation, i.e., every pixel in the image contributes to each value of its spectrum. The Fourier spectrum is a plot of the energy against spatial frequencies, where spatial frequencies relate to the spatial relations of intensities in the image. In the case of face recognition, this translates to distances between areas of particular brightness, such as the overall size of the head, or the distance of the eyes. Higher frequencies describe finer details and they claimed that these are less useful for identification of a person. They also suggested that, humans can recognize a face from a brief look without focusing on small details. They perform the recognition of faces by finding the Euclidian distance between a newly presented face and all the training faces. The distances are calculated 17 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. between feature vectors with entries that are the Fourier transform values at specially chosen frequencies. They argue that, as few as 27 frequencies yield good results (98 %). Moreover, this small feature vector combined with the efficient Fast Fourier Transform (FFT) makes this system extremely fast. 2.3.1.4. Face Recognition by SVM Phillips [35] applied Support Vector Machines (SVM) to face recognition. Face recognition is a K-class problem, where K is the number of known individuals; and SVM is a binary classification method. By reformulating the face recognition problem and reinterpreting the output of the SVM classifier, they developed a SVM-based face recognition algorithm. They formulated the face recognition problem in difference space, which models dissimilarities between two facial images. In difference space, they formulated the face recognition as a two class problem. The classes are; dissimilarities between faces of the same person and dissimilarities between faces of different people. By modifying the interpretation of the decision surface generated by SVM, they generated a similarity metric between faces, learned from examples of differences between faces [35]. 2.3.1.5. Feature-based Approaches Bobis et al. [36] developed a feature based face recognition system. They suggested that a face can be recognized by extracting the relative position and other parameters of distinctive features such as eyes, mouth, nose and chin. The system described the overall geometrical configuration of face features by a vector of numerical data representing 18 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. position and size of main facial features. First, they extracted eyes coordinates. The interocular distance and eye position is used to determine the size and the position of the areas of search for face features. In these areas, binary thresholding is performed by modifying the threshold automatically to detect features. In order to find their coordinates, discontinuities are searched in the binary image. They claimed that their experimental results showed that their method is robust, valid for numerous kind of facial image in real scene, works in real time with low hardware requirements and the whole process is conducted automatically. Cagnoni and Poggi [37] suggested a feature-based approach in which they applied the eigenface method to sub-images (eye, nose, and mouth). 
They also applied a rotation correction to the faces in order to obtain better results. Guan and Szu [38] compared the performance of PCA and ICA on face images. They argued that ICA encodes face images with statistically independent variables, which are not necessarily associated with orthogonal axes, while PCA is always associated with orthogonal eigenvectors. While PCA seeks directions in feature space that best represent the data in a sum-squared error sense, ICA seeks directions that are most independent from each other. They also argued that both these pixel-based algorithms have the major drawback that they weight the whole face equally and therefore lack local geometry information. Hence, Guan and Szu suggested approaching the face recognition problem with ICA or PCA applied to local features. Martinez [39] proposed a different approach based on identifying frontal faces. This approach divides a face image into n different regions, analyzes each region with PCA, and then uses a Bayesian approach to find the best possible global match between a probe and a database image. The relationship between the n parts is modeled by using Hidden Markov Models (HMMs).

2.3.2 Neural Network Approaches

Neural network approaches have been used in face and facial expression recognition generally in a geometrical local feature based manner, but there are also some methods where neural networks are applied holistically. Below, we discuss three types of neural network-based recognition of faces and facial expressions.

Feature based Backpropagation NN: Temdee et al. [40] presented a frontal view face recognition method using fractal codes which are determined by a fractal encoding method from the edge pattern of the face region (using the eyebrows, eyes and nose). In their recognition system, the obtained fractal codes are fed as inputs to a backpropagation neural network for identifying an individual. They tested their system's performance on the ORL (Olivetti Research Laboratory) face database and reported an 85% correct recognition rate.

Dynamic Link Architectures (DLA): Lades et al. [41] presented an object recognition system based on Dynamic Link Architectures, which is an extension of artificial neural networks. The DLA uses correlations in the fine-scale cellular signals to group neurons dynamically into higher order entities. These entities can be used to code high-level objects, such as a 2-D face image. The face images are represented by sparse graphs, whose vertices are labeled by a multi-resolution description in terms of a local power spectrum, and whose edges are labeled by geometrical distance vectors. Face recognition can be formulated as elastic graph matching, which is performed in this study
Image graphs of new faces are extracted by an Elastic Graph Matching process and can be compared by a simple similarity function. In this system, phase information is used for accurate node positioning and object-adapted graphs are used to handle large rotations in depth. The image graph extraction is based on the bunch graph, which is constructed from a small set of sample image graphs. In contrast to many neural-network systems, no extensive training for new faces or new object classes is required. Only a small number of typical examples have to be inspected to build up a bunch graph, and individuals can then be recognized after storing a single image. The system inhibits most of the variance caused by position, size, expression and pose changes by extracting concise face descriptors in the form of image graphs. In these image graphs, some predetermined points on the face (eyes, nose, mouth, etc.) are described by sets of wavelet components (jets). The image graph extraction is based on the bunch graph, which is constructed from a small set of image graphs. 2.3.3 Hybrid Approaches There are some other approaches which use both statistical pattern recognition techniques and Neural Network systems. 21 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Er et al. [43] worked on the use of Radial Basis Function (RBF) Neural Networks on the data extracted by discriminant eigenfeatures. They used a hybrid learning algorithm to decrease the dimension of the search space in the gradient method, which is crucial on optimization of high dimension problem. First, they tried to extract the face features by both the PCA and LDA methods. Next, they presented a hybrid learning algorithm to train the RBF Neural Networks, so the dimension of the search space is significantly decreased in the gradient method. Thomaz et al. [44] also developed a system by combining PCA and RBF neural network. Their system is a face recognition system consisting of a PCA stage which inputs the projections of a face image over the principal components into a RBF network acting as a classifier. Their main concern is to analyze how different network designs perform in a PCA+RBF face recognition system. They used a forward selection algorithm, and a Gaussian mixture model. According to the results of their experiments, the Gaussian mixture model optimization achieves the best performance even using less neurons than the forward selection algorithm. Their results also show that the Gaussian mixture model design is less sensitive to the choice of the training set. 2.3.4 Other Issues Besides statistical and neural network approaches, there are some other approaches, e.g., range data, infrared scanning and profile images methods, used in facial expressions recognition system development. In the next section, those two methods will be discussed. In the range data method, range images are used for recognition. In this method data is obtained by scanning the individual with a laser scanner system. This system also has 22 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. the depth information so the system processes 3-dimensional data to classify face images [5], The infrared scanning method scans the face image by an infrared light source. Yoshitomi et al. [45] used thermal sensors to detect temperature distribution of a face. 
In this method, the front-view face in input image is normalized in terms of location and size, followed by measuring the temperature distribution, the locally averaged temperature and the shape factors of the face. The measured temperature distribution and the locally averaged temperature are separately used as input data to feed a Neural Network, while the values of shape factors are used for supervised classification. By integrating information from the Neural Network and supervised classification, the face is identified. One disadvantage of visible ray image analysis is that the accuracy of face identification is strongly influenced by lighting conditions including variation of shadow, reflection and darkness. The profile images approach was first introduced by Liposcak and Loncaric [46]. This method is based on the representation of the original and morphological derived profile shapes. Their aim is to use the profile outline that bounds the face and the hair. They take a grey-level profile image, threshold it to produce a binary image, representing the face region. They normalize the area and orientation of this shape using dilation and erosion. Then, they simulate hair growth and haircut and produce two new profile silhouettes. From these three profile shapes they obtain the feature vectors. After normalizing the vector components, they use the Euclidean distance measure for measuring the similarity of the feature vectors derived from different profiles. 23 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 2.4 Previous Research on Expression Recognition Since the early 1970s, Paul Ekman and his colleagues [47] have performed extensive studies of human facial expressions. They found evidence to support universality in facial expressions. These “ universal facial expressions” are those representing happiness, sadness, anger, fear, surprise, and disgust. They studied facial expressions in different cultures, including preliterate cultures, and found much commonality in the expression and recognition of emotions on the face. However, they observed some differences in expressions as well, and proposed that facial expressions are governed by “ display rules” in different social contexts. For example, Japanese subjects and American subjects showed similar facial expressions while viewing the same stimulus film. However, in the presence of authorities, the Japanese viewers were more reluctant to show their real expressions. Babies seem to exhibit a wide range of facial expressions without being taught, thus suggesting that these expressions are innate [48]. Ekman and Friesen [49] developed the Facial Action Coding System (FACS) in 1976 to code facial expressions where movements on the face are described by a set of action units (AUs). There are 46 action units and expression is defined as one of these action units, which is a contraction or relaxation of one or more muscles. For example, it can be used to distinguish the two types of smiles as follows: • insincere and voluntary Pan American smile: contraction of zygomatic major (a muscle of facial expression which draws the angle of the mouth superiorly and posteriorly) alone • sincere and involuntary Duchenne smile: contraction of zygomatic major and inferior part of orbicularis oculi (arises from the nasal part of the frontal bone, 24 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
from the frontal process of the maxilla in front of the lacrimal groove, and from the anterior surface and borders of a short fibrous band, the medial palpebral ligament).

Each AU has a related muscular basis. This system of coding facial expressions is carried out manually by following a set of prescribed rules. The inputs are still images of facial expressions, often at the peak of the expression. A constraint in the development of FACS was that it deals only with what is clearly visible in the face, ignoring invisible changes (e.g. certain changes in muscle tonus) and discarding visible changes too subtle for reliable distinction. FACS excludes visible changes in muscle tonus which do not entail movement; changes in skin coloration are usually not visible on black and white records. Also excluded from FACS are facial sweating, tears, rashes, pimples and permanent facial characteristics. The user of FACS must learn the mechanics (the muscular basis) of facial movement, not just the consequence of movement or a description of a static landmark. FACS emphasizes patterns of movement, the changing nature of facial appearance. Distinctive actions are described: the movements of the skin, the temporary changes in shape and location of the features, and the gathering, pouching, bulging and wrinkling of the skin. Although the labeling of expressions currently requires trained experts, researchers have had some success in using computers to automatically identify FACS codes, and thus quickly identify emotions.

Despite being time-consuming and subject to these constraints, FACS is the most popular standard currently used to systematically categorize the physical expression of emotions, and it has proven useful both to psychologists and to animators.

Ekman's work inspired many researchers to analyze facial expressions by means of image and video processing. By tracking facial features and measuring the amount of facial movement, they attempted to categorize different facial expressions. Recent work on facial expression analysis and recognition [50-61] has used these "basic expressions" or a subset of them. In [62], Pantic and Rothkranz provide an in-depth review of much of the research done in automatic facial expression recognition in recent years.

The work in computer-assisted quantification of facial expressions did not start until the 1990s. Mase [56] used optical flow (OF) to recognize facial expressions; he was one of the first to use image processing techniques for this purpose. Lanitis et al. [53] used a flexible shape and appearance model for image coding, person identification, pose recovery, gender recognition, and facial expression recognition. Black and Yacoob [50] used local parameterized models of image motion to recover non-rigid motion. Once recovered, these parameters were used as inputs to a rule-based classifier to recognize the six basic facial expressions. Yacoob and Davis [63] computed optical flow and used similar rules to classify the six facial expressions. Rosenblum et al. [60] also computed optical flow of regions on the face, then applied a radial basis function network to classify expressions.
Essa and Pentland [52] used an optical flow region-based method to recognize expressions. Donato et al. [51] tested different features for recognizing facial AUs and inferring the facial expression in the frame. Otsuka and Ohya [59] first computed optical flow, then computed the 2D Fourier transform coefficients, which were used as feature vectors for a hidden Markov model (HMM) to classify expressions. The trained system was able to recognize one of the six expressions in near real time (about 10 Hz). Furthermore, they used the tracked motions to control the facial expression of an animated Kabuki system [64]. A similar approach, using different features, was used by Lien [54]. Nefian and Hayes [57] proposed an embedded HMM approach for face recognition that uses an efficient set of observation vectors based on DCT coefficients. Martinez [55] introduced an indexing approach based on the identification of frontal face images under different illumination conditions, facial expressions, and occlusions. A Bayesian approach was used to find the best match between the local observations and the learned local feature model, and an HMM was employed to achieve good recognition even when the new conditions did not correspond to the conditions previously encountered during the learning phase. Oliver et al. [58] used lower-face tracking to extract mouth shape features and used them as inputs to an HMM-based facial expression recognition system (recognizing neutral, happy, sad, and an open mouth). In 2003, Hok-chun Lo and Ronald Chung used the eigenface approach, first introduced by M. Turk and A. Pentland in 1991 and later modified by many researchers, for facial expression recognition.

In 2004, Jeffrey Cohn of the University of Pittsburgh and T. Kanade of Carnegie Mellon University [123] developed an automated facial image analysis system for automatic recognition of facial action units and quantitative analysis of their dynamics, such as timing. In their system, the changes in both permanent (e.g., brows) and transient (e.g., furrows) facial features are automatically detected and tracked offline throughout the image sequence. Using FACS, they grouped facial features into separate collections of feature parameters. These parameters include feature displacement, velocity, and appearance. The extracted facial feature and head motion trajectories are fed to a classifier for action unit recognition. In addition to recognizing action units, the system quantifies the timing of facial actions and head gestures for studies of the timing of facial actions.

All of these methods are similar in that they first extract some features from the images, these features are then used as inputs to a classification system, and the outcome is one of the pre-selected emotion categories. They differ mainly in the features extracted from the video images and in the classifiers used to distinguish different emotions. In this study, we followed the same strategy: first we extract features from the video images and then feed them into classifiers. We did not use FACS in our research. Instead, we have used PCA and local face features for our feature extraction. As recognizers, we used the eigenimage method and back-propagation neural networks.
2.5 Conclusion In this chapter, we have discussed the various methods of facial expressions recognition. Also, we have discussed the previous researches on this field. Some of these works are good for all types of images and some of them are not good for low intensity images. Some researchers recognized expressions by using some action units of the faces. 28 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. In our research, we have not used any action units but have considered the eye and lip portions of a face because these regions of face are most sensitive to pain. For learning and recognition, we have used eigenimage method and neural network approaches due to their simplicity and accuracy. One other important issue of any facial expression recognition system is the face detection. In the next section, we will discuss various face detection techniques and illustrate our face detection system using skin color modeling. 29 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Chapter 3 Face Detection 3.1 Introduction Computer vision, in general, aims to duplicate (or in some cases compensate) human vision, and traditionally, has been used in performing routine, repetitive tasks, such as classification in massive assembly lines. Today, research on computer vision is spreading enormously so that it is almost impossible to itemize all of its subtopics. Despite this fact, one can list several relevant applications, such as face processing (i.e. face, expression, and gesture recognition), computer human interaction, crowd surveillance, and contentbased image retrieval. All of these applications require face detection, which can be simply viewed as a preprocessing step, for obtaining the “object”. In other words, many of the techniques that are proposed for these applications assume that the location of the face is pre-identified and available for the next step. Face detection is one of the tasks which human vision can do effortlessly. However, for computer vision, this task is not that easy. A general definition of the problem can be stated as follows: Identify all of the regions that contain a face, in a still image or image sequence, independent of any three dimensional transformation of the face and lighting condition of the scene. There are several methods issued for this problem and they can be broadly classified in two main classes, which are feature-based, and image-based 30 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. approaches. Previous research has shown that both feature-based and image-based approaches perform effectively while detecting upright frontal faces, whereas featurebased approaches show a better performance for the detection scenarios especially in simple scenes. Face detection is the problem of determining whether a sub-window of an image contains a face. From the point of view of learning, any variation which increases the complexity of decision boundary between face and non-face classes will also increase the difficulty of the problem. For example, adding tilted faces into the training set increases the variability of the set, and may increase the complexity of the decision boundary. Such complexity may cause the classification to be harder. There are many sources introducing variability when dealing with the face. 
They can be summarized as follows: Image plane variation is the first simple variation type that one may encounter. Image transformations, such as rotation, translation, scaling and mirroring may introduce such kind of variations. Utilization of image pyramids with a sliding detector window is one common way to deal with such transformations for the input image. Variations in the global brightness, contrast level can also be expressed in the same category. Typical examples for such variations can be seen in Figure 3.1. Pose variations can also be listed under image plane variations aspects. However, changes in the orientation of the face itself on the image can have larger impacts on its appearance. Rotation in depth and perspective transformation may also cause distortion. The common way to deal with pose variation is to isolate pose types (i.e. frontal, profile, rotated). Some examples for such pose variations are shown in Figure 3.1. 31 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Lighting variations may dramatically change face appearance in the image. Such variations are the most difficult type to deal with due to the fact that pixel intensities are directly affected in a nonlinear way by changing illumination intensity or direction. For example, when using skin color as a feature for face detection, varying color temperature [65] of the light source may cause skin color filtering to fail. Some examples for lighting variations are shown in Figure 3.1 Background variations are another challenging factor for face detection in cluttered scenes. Discriminating windows including a face from those of non-face is more difficult when no constraints exist on background. Most of the examples shown in figure 3.1 have complex backgrounds which makes the face detection problem harder. Fig. 3.1. Examples of several variations of background 32 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. In the rest of this chapter, we will give a general overview of face detection approaches in section 3.2 and then in section 3.3, we will discuss our face detection approach. 3.2 Background of Face Detection Over the past ten years, there has been a great deal of research concerning important aspects of face detection. Using generalized face shape rules, motion, and color information many segmentation schemes have been presented [66] [67] [68]. The use of probabilistic [69] and neural network methods [70] has made face detection possible in cluttered scenes and variable scales. Face detection research can be heuristically classified in two main categories: feature-based approaches and image-based approaches. Face Detection Feature Based Approaches Low Level Feature Analysis Temporal Matching Image Based Approaches Genralized Knowledge Rules Linear Subspace Methods Learning Networks Statistical Methods Fig. 3.2. Classification of face detection methods According to the taxonomy in figure 3.2, feature-based methods make explicit use of 33 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. face knowledge and follow the classical detection methodology, in which low level features that are used prior to analysis mostly rely on heuristics or advance templates. The apparent properties of the face, such as skin color and face geometry, are used at different levels of the system. 
Since features are the main ingredients, these techniques are named as the feature-based approach. These approaches [71] have embodied the majority of interest in face detection research starting as early as the 1970s. Taking advantage of current advances in pattern recognition theory, image-based approaches address face detection as a general pattern recognition problem. Partly due to well known work by K. Sung and T. Poggio [72], these approaches have attracted much attention in recent years, and have demonstrated remarkable results. According to the image-based methods, face detection is a two class (face, non-face) object recognition problem which uses pure image (intensity) representations instead of abstract feature representations. 3.2.1 Feature-Based Approaches Most feature-based approaches share similar consecutive steps. Usually, the first step is to make pixel level eliminations by utilizing low level feature(s), e.g. skin color filtering, edge detection. Due to the low level properties, the result that is generated in the first step is ambiguous. In the second step, visual features which are not eliminated in the first step are organized within a global face knowledge or geometry. Using this feature analysis, feature ambiguities are reduced and the locations of face and facial features are determined. The final step may involve the use of templates or active shape models. In the next section, we discuss the three feature-based approaches - low level feature analysis, template matching and generalized knowledge rules. 34 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3.2.1.1 Low Level Feature Analysis There are three low level features - edges, skin color and motion, that are used in face detection process. In the following section we will discuss these three low level features E dges: As a useful primitive feature in computer vision, edge representation was applied to early face detection system by Sakai et al. [73]. Later, based on this work, a hierarchical framework was proposed by Craw et al. [74] to trace the human head line. This approach included a line follower which is implemented with a curvature constraint. Some more recent examples of edge-based techniques can be found in references [75-78]. Edge detection is the important step in edge-based techniques. For detecting edges, various types of edge detector operators are used. The Sobel operator is the most common filter among others for detecting edges [76] [79]. Also, a variety of 1st and 2nd derivatives (Laplacian) of Gaussians have been used in some approaches [73] [80]. A large scale Laplacian was used to obtain lines [73], and steer-able and multi-scaleorientation filters are preferred in [80]. Skin Color: Human skin color has been used and proven to be effective feature for face detection, and related applications. Although skin color differs among individuals, several studies have shown that the major difference exists in the intensity rather than the chrominance. Several color spaces have been used to label skin pixels including RGB [81] [82], NRGB (normalized RGB) [83-85], HSV (or HSI) [86-87], YCrCb [88], CIE-XYZ [89], CIE-LUV [66]. Although, the effectiveness of the different color spaces is arguable, common point of all above works is the removal of the intensity component. Color segmentation can basically be performed using appropriate skin color thresholds where skin color is modeled through histograms or charts [90-91] [87]. 
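To make the threshold-based strategy just described concrete, the short sketch below labels a pixel as skin when its normalized red and green components fall inside a fixed rectangular region of chromatic space. It is only an illustration of the general idea, assuming a simple NRGB filter; the threshold values and the helper name skin_mask_nrgb are placeholders of this sketch, not values taken from the cited works.

    import numpy as np

    def skin_mask_nrgb(rgb, r_range=(0.36, 0.46), g_range=(0.28, 0.36)):
        """Boolean mask of 'skin' pixels using fixed thresholds in normalized
        RGB (chromatic) space. rgb is an H x W x 3 uint8 array; the ranges
        here are illustrative placeholders only."""
        rgb = rgb.astype(np.float64)
        total = rgb.sum(axis=2) + 1e-6          # avoid division by zero
        r = rgb[..., 0] / total                  # normalized red
        g = rgb[..., 1] / total                  # normalized green
        return ((r >= r_range[0]) & (r <= r_range[1]) &
                (g >= g_range[0]) & (g <= g_range[1]))

A histogram- or chart-based filter replaces the fixed rectangle above with ranges learned from labeled skin samples.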
More complex methods make use of statistical measures that model face variation within a wide user spectrum [83-84] [92-93]. For instance, Oliver et al. [84] and Yang et al. [93] employ a Gaussian distribution to represent a skin color cluster consisting of thousands of skin color samples taken from different human races. Even though color information seems to be an efficient tool for identifying facial areas, skin color models may fail when the spectrum (correlated color temperature) of the light source varies significantly.

We have also studied skin color information in order to utilize a skin color filter as a preprocessing step in face detection. In general, however, skin color filters are constructed by using fixed boundaries (thresholds) for sample pixel distributions in color space; illumination and camera parameters are omitted. Hence, no matter how exhaustive the variations in the sample pixel set are, the performance of the resulting skin color filter may be limited. The response of two skin color filters for the same color image can be seen in Figure 3.3. Note that the HSI skin color filter with fixed thresholds is unsuccessful in determining skin color pixels. On the other hand, the NRGB skin color filter using adjustable thresholds is successful in determining skin color pixels, at the cost of adding false alarms. Although this requires further experimentation, we may state that a varying-threshold skin color filter which adapts to the illumination properties of the image (e.g. CCT) may give more effective skin color filtering results. We have used this method for face detection in our research; more details of our implementation are given in section 3.3.

Motion: Motion information is a convenient way of locating moving objects when a video sequence is provided. It is possible to narrow the face search area by utilizing this information. The simplest way to obtain motion information is frame difference analysis. Accumulated frame difference is an improved frame difference analysis which is used by many reported face detection research studies [68, 94]. Besides the face region, Luthon et al. [95] also employ frame difference to locate facial features, such as the eyes. Another way of measuring visual motion is through the estimation of moving image contours. Compared to frame difference, results generated from moving contours are more reliable, especially when the motion is insignificant [96].

3.2.1.2 Template Matching

Given an input image, the correlation values in predetermined standard regions, such as the face contour, eyes, nose and mouth, are calculated independently. Although this approach has the advantage of simplicity, it has been insufficient for face detection since it cannot handle variations in scale, rotation, pose and shape. Multi-resolution, multi-scale, sub-templates and deformable templates have been proposed to achieve scale- and shape-invariant template matching [97-98]. Many studies have been done on template matching. In [97], Miao et al. proposed a hierarchical template matching method for face detection. Kwon et al. [98] proposed a detection method based on snakes and templates. Lanitis et al. [99] established a detection method utilizing both shape and intensity information.
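As a minimal illustration of correlation-based template matching, the sketch below slides a small template over a gray-level image and records the normalized cross-correlation at every position; the best-scoring location is taken as the match. The function name and the brute-force search are assumptions made for clarity, not a reproduction of the cited hierarchical or deformable-template systems.

    import numpy as np

    def match_template(image, template):
        """Exhaustive normalized cross-correlation of a 2-D template over a
        2-D gray-level image. Returns (best_row, best_col, best_score)."""
        ih, iw = image.shape
        th, tw = template.shape
        t = template - template.mean()
        t_norm = np.sqrt((t ** 2).sum()) + 1e-12
        best = (-1.0, 0, 0)
        for y in range(ih - th + 1):
            for x in range(iw - tw + 1):
                w = image[y:y + th, x:x + tw]
                wz = w - w.mean()
                denom = np.sqrt((wz ** 2).sum()) * t_norm + 1e-12
                score = float((wz * t).sum() / denom)
                if score > best[0]:
                    best = (score, y, x)
        score, y, x = best
        return y, x, score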
3.2.1.3 Generalized Knowledge Rules

In generalized knowledge-based approaches, the algorithms are developed based on heuristics about face appearance. Although it is simple to create heuristics for describing the face, the major difficulty is in translating these heuristics into classification rules in an efficient way. If the rules are too detailed, they may produce missed detections; on the other hand, if they are too general, they may produce many false detections. In spite of this, some heuristics can be used at an acceptable rate on frontal faces against uncluttered backgrounds. Yang and Huang [67] used a hierarchical knowledge-based method to detect faces. Their system consists of three levels of rules going from general to detailed. Although this method does not report a high detection rate, its ideas of mosaic (multi-resolution) images and multiple levels of rules have been used in more recent methods.

3.2.2 Image-Based Approaches

In contrast to feature-based approaches, image-based approaches utilize example image representations instead of abstract representations consisting of features. In general, image-based approaches rely on machine learning and statistical analysis. Face detection is treated as a two-class (face, non-face) classification problem which relies on learned characteristics, generally in the form of distributions. The specific need for face knowledge is avoided by formulating the problem as a learning paradigm that discriminates a face pattern from a non-face pattern.

Most of the image-based approaches apply a window scanning technique for detecting faces. The window scanning algorithm employs an exhaustive search of the input image for possible face locations at all scales, but there are variations in the implementation of this algorithm in almost all image-based systems. Typically, the size of the scanning window, the sub-sampling rate, the step size, and the number of iterations vary depending on the method proposed and the need for a computationally efficient system. A concrete sketch of this scanning scheme is given after the next paragraph.

In the following sections, we discuss three image-based approaches for face detection: linear subspace methods, learning networks and statistical approaches.

3.2.2.1 Linear Subspace Methods

In the late 1980s, Sirovich and Kirby [100] developed a technique using PCA (Principal Component Analysis) to efficiently represent human faces. Given a set of face images, the technique obtains the principal components of the distribution of faces, expressed in terms of the eigenvectors of the covariance matrix of the distribution. Each individual face in the set can then be approximated by a linear combination of the eigenvectors (eigenfaces) corresponding to the largest eigenvalues, using appropriate weights. Later, Turk and Pentland [94] improved this technique for face recognition. Their method takes advantage of the distinct nature of the eigenface weights for individual face representation. Since face reconstruction using its principal components is an approximation, a residual error is defined in the algorithm as a preliminary measure of "faceness". This residual error, which they termed "distance-from-face-space" (DFFS), gives an indication of face existence through the observation of global minima in the distance map.
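The window scanning idea mentioned above can be written as a small driver that rescales the image into a pyramid and slides a fixed-size window over each level, passing every window to some face/non-face classifier. The classifier itself is left abstract here; the step size, scale factor and function names are illustrative assumptions of this sketch.

    import numpy as np

    def scan_for_faces(image, classify_window, window=20, step=2, scale=1.2):
        """Slide a window x window detector over an image pyramid.
        `classify_window` is any callable returning True for a face patch.
        Returns a list of (row, col, size) detections in original coordinates."""
        detections = []
        factor = 1.0
        current = image.astype(np.float64)
        while min(current.shape) >= window:
            h, w = current.shape
            for y in range(0, h - window + 1, step):
                for x in range(0, w - window + 1, step):
                    if classify_window(current[y:y + window, x:x + window]):
                        detections.append((int(y * factor), int(x * factor),
                                           int(window * factor)))
            # down-sample for the next pyramid level (nearest neighbour for brevity)
            factor *= scale
            new_h = int(image.shape[0] / factor)
            new_w = int(image.shape[1] / factor)
            if new_h < window or new_w < window:
                break
            ys = (np.arange(new_h) * factor).astype(int)
            xs = (np.arange(new_w) * factor).astype(int)
            current = image[np.ix_(ys, xs)].astype(np.float64)
        return detections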
Pentland et al. [101] later proposed a facial feature detector using DFFS, generated from eigenfeatures (eigeneye, eigennose, eigenmouth) which are obtained from various facial feature templates in a training set. The performance of the eye locator was reported to be 94% with a 6% false positive rate on a database of 7562 frontal face images against a plain background.

More recently, Moghaddam and Pentland further developed this technique within a probabilistic framework [102]. Unlike the usual PCA framework, they did not discard the orthogonal complement of the face space, since doing so leads to a uniform density assumption. Hence, they developed a maximum likelihood detector which takes into account both the face space and its orthogonal complement in order to handle arbitrary densities. They report a detection rate of 95% on a set of 7000 face images for detecting the left eye. Compared to the DFFS detector, the results were significantly better. On a task of multi-scale head detection on 2000 face images from the FERET [32] database, which consists of mug shot faces against a uniform background, the detection rate was 97%.

3.2.2.2 Learning Networks

Since face detection can be understood as a two-class pattern recognition problem, several neural network-based approaches have been introduced as solutions. A review of neural network-based face detection methods can be found in Viennet et al. [103]. The first advanced neural network approach which reported significant results on a large, complex dataset was introduced by Rowley et al. [70]. The system incorporates face knowledge in a retinally connected neural network, as shown in figure 3.3. The neural network is designed to look at windows of 20 x 20 pixels. There is one hidden layer with 26 units, where 4 hidden units are connected to 10 x 10 pixel sub-regions, 16 units are connected to 5 x 5 sub-regions, and 6 units are connected to 20 x 5 pixel stripes of the input units. The input window is pre-processed through lighting correction (a best-fit linear function is subtracted) and histogram equalization. This pre-processing method was adopted from Sung and Poggio's system.

Fig 3.3. The face detection system by Rowley et al. (input image pyramid, extracted window, pre-processing by lighting correction and histogram equalization, receptive fields, hidden units, neural network output).

A major problem which arises with window scanning techniques is overlapping detections. Rowley et al. [104] deal with this problem using the following two heuristics:

Thresholding: the number of detections in a small neighborhood surrounding the current location is counted in the output pyramid, which includes both location and scale; if this count is above a certain threshold, a face is assumed to be present at this location.

Overlap elimination: when a region is classified as a face using the heuristic thresholding, overlapping detections are likely to be false positives and are removed.

In Lin et al. [105], a fully automatic face recognition system is proposed based on probabilistic decision-based neural networks (PDBNN). A PDBNN is a classification neural network with a hierarchical modular structure. Instead of converting input images to a raw vector, they preferred to use features based on intensity and edge information.
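The two window-scanning heuristics of Rowley et al. described above can be sketched roughly as follows: detections are counted in a small spatial neighborhood, locations whose count exceeds a threshold are kept as faces, and any remaining detections that overlap an accepted face are discarded. This is a simplified, assumed reading of the heuristics, not the exact procedure of the cited system.

    def merge_detections(detections, neighborhood=10, min_count=3):
        """detections: list of (row, col, size) from a sliding-window detector.
        Keep locations supported by at least `min_count` nearby detections,
        then drop detections overlapping an already accepted face."""
        accepted = []
        for (y, x, s) in detections:
            support = sum(1 for (yy, xx, _) in detections
                          if abs(yy - y) <= neighborhood and abs(xx - x) <= neighborhood)
            if support < min_count:
                continue                      # thresholding heuristic
            overlaps = any(abs(ay - y) < asz and abs(ax - x) < asz
                           for (ay, ax, asz) in accepted)
            if not overlaps:                  # overlap elimination heuristic
                accepted.append((y, x, s))
        return accepted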
Sparse Network of Winnows (SNoW), a new learning architecture in the visual domain, was applied to face detection by Roth et al. [106]. Similar to the previously mentioned methods, Roth et al. also use the bootstrap method of Sung and Poggio for generating training samples and preprocess all images with histogram equalization. Moreover, the window scanning technique is used at multiple scales during the evaluation stage, as in the previously mentioned methods. This method is one of the underlying methods used in this thesis, hence it will be examined in detail in the next chapter.

3.2.2.3 Statistical Approaches

There are several statistical approaches for face detection. Some of the proposed systems are based on information theory [107], support vector machines [108] and the Bayes decision rule [69].

Colmenarez and Huang [107] proposed a system based on Kullback relative information (Kullback divergence). This divergence is a non-negative measure between two probability density functions for a random process Xn. During training, for each pair of pixels in the training set, a joint histogram is used to create probability functions for the face and non-face classes. Since pixel values are highly dependent on neighboring pixel values, Xn is treated as a first-order Markov process and the pixel values in the gray-level images are re-quantized to four levels. The authors used a large set of 11 x 11 images of faces and non-faces for training, and the training procedure results in a set of look-up tables of likelihood ratios. In order to further improve performance, pairs of pixels which contribute poorly to the overall divergence are dropped from the look-up tables and not used in the face detection system. This technique was further improved by including error bootstrapping, and later the technique was also incorporated into a real-time face tracking system [107].

Another major approach is Support Vector Machines (SVM), which can be considered a new paradigm for training polynomial functions or neural network classifiers. While most training classifiers (e.g. Bayesian, neural networks) are based on minimizing the training error (empirical risk), SVM is based on another principle called structural risk minimization, which aims to minimize an upper bound on the expected generalization error. The SVM classifier is a linear classifier whose optimal hyperplane is defined by a weighted combination of a set of training (support) vectors, chosen to minimize the expected classification error on previously unseen test patterns. Osuna et al. [109] developed an efficient method to train an SVM for large-scale problems and applied it to face detection. SVMs have also been applied to the problem in the wavelet domain to detect pedestrians and faces [108]. Kumar and Poggio [110] recently incorporated Osuna et al.'s SVM algorithm in a system for real-time tracking and analysis of faces. They apply the SVM algorithm on segmented skin regions of the input images to avoid exhaustive scanning.

As another approach, Schneiderman and Kanade [111, 69] described two face detectors based on the Bayes decision rule, expressed as a likelihood ratio test:

P(image | object) / P(image | non-object)  >  P(non-object) / P(object)

If the likelihood ratio (the left side) is greater than the right side, it is decided that an object (a face) is present at the current location.
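A minimal sketch of this decision rule is given below: given estimates of P(image | object), P(image | non-object) and the class priors, a window is declared a face when the likelihood ratio exceeds the prior ratio. The function name and the idea of passing the four probabilities directly are illustrative assumptions; in the actual detectors these quantities come from learned, wavelet-based models.

    def is_face(p_img_given_obj, p_img_given_nonobj, p_obj, p_nonobj):
        """Bayes decision rule written as a likelihood ratio test: declare 'face'
        when P(image|object)/P(image|non-object) exceeds P(non-object)/P(object)."""
        likelihood_ratio = p_img_given_obj / p_img_given_nonobj
        return likelihood_ratio > (p_nonobj / p_obj)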
The advantage of this approach is the optimality of the Bayes decision rule [112], provided the representations are accurate. With the help of this approach, a view-based detector was developed with a frontal-view detector and a right-profile detector (to detect left profile images, the right-profile detector is applied to mirror-reversed images). Some examples of outputs, processed using wavelets, can be seen in figure 3.4.

Fig. 3.4. Face detection examples from Schneiderman and Kanade

3.3 Skin Color Modeling for Face Detection

It would be fair to say that the most popular approach to face localization is the use of color information, whereby estimating areas with skin color is often the first vital step of such a strategy. Hence, skin color classification has become an important task. Much of the research in skin color based face localization and detection is based on the RGB, YCbCr and HSI color spaces. These three color spaces are described in the following.

RGB Color Space: The RGB color space consists of the three additive primaries: red (R), green (G) and blue (B). Spectral components of these colors combine additively to produce a resultant color. The RGB model (figure 3.5) is represented by a 3-dimensional cube with red, green and blue at the corners of the axes. Black is at the origin and white is at the opposite end of the cube; the gray scale follows the line from black to white. In a 24-bit color graphics system with 8 bits per color channel, red is (255, 0, 0); on the unit color cube, it is (1, 0, 0). The RGB model simplifies the design of computer graphics systems but is not ideal for all applications, because the red, green and blue color components are highly correlated. This makes it difficult to execute some image processing algorithms; many processing techniques, such as histogram equalization, work on the intensity component of an image only.

Fig. 3.5. RGB color cube (corners: Red = (1,0,0), Green = (0,1,0), Blue = (0,0,1), Yellow = (1,1,0), Cyan = (0,1,1), Magenta = (1,0,1))

YCbCr Color Space: The YCbCr color space has been defined in response to increasing demands for digital algorithms for handling video information, and has since become a widely used model in digital video. Here Y is the luma component and Cb and Cr are the blue and red chroma components. It belongs to the family of television transmission color spaces, which also includes YUV and YIQ. YCbCr is a digital color system, while YUV and YIQ are analog spaces for the respective PAL and NTSC systems. These color spaces separate RGB (red-green-blue) into luminance and chrominance information and are useful in compression applications; however, the specification of colors is somewhat unintuitive. The 601 recommendation specifies 8-bit (i.e. 0 to 255) coding of YCbCr, whereby the luminance component Y has an excursion of 219 and an offset of +16. This coding places black at code 16 and white at code 235, reserving the extremes of the range for signal processing footroom and headroom. The chrominance components Cb and Cr, on the other hand, have excursions of ±112 and an offset of +128, producing a range from 16 to 240 inclusive.
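The 8-bit coding just described can be illustrated with the usual BT.601-style conversion from RGB in [0, 1] to studio-range YCbCr. The luma weights below are the standard 601 coefficients, but the helper name and vectorized form are our own; this is a sketch, not the only valid formulation.

    import numpy as np

    def rgb_to_ycbcr601(rgb):
        """Convert an H x W x 3 float array (R, G, B in [0, 1]) to studio-range
        YCbCr: Y in [16, 235], Cb and Cr in [16, 240]."""
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        y = 0.299 * r + 0.587 * g + 0.114 * b       # full-range luma in [0, 1]
        pb = (b - y) / 1.772                         # full-range chroma in [-0.5, 0.5]
        pr = (r - y) / 1.402
        Y = 16.0 + 219.0 * y                         # offset +16, excursion 219
        Cb = 128.0 + 224.0 * pb                      # offset +128, excursion 224 (+/-112)
        Cr = 128.0 + 224.0 * pr
        return np.stack([Y, Cb, Cr], axis=-1)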
HSI Color Space: Since hue (H), saturation (S) and intensity (I) are three properties used to describe color, it seems logical that there should be a corresponding HSI color model. When using the HSI color space, one does not need to know what percentage of blue or green is required to produce a color; one simply adjusts the hue to get the color one wishes. To change a deep red to pink, one adjusts the saturation. To make it darker or lighter, one alters the intensity.

Many applications use the HSI color model. Machine vision uses the HSI color space to identify the color of different objects. Image processing applications such as histogram operations, intensity transformations and convolutions operate only on an intensity image; these operations are performed with much more ease on an image in the HSI color space.

HSI is modeled with cylindrical coordinates; see figure 3.6. The hue (H) is represented as the angle θ, varying from 0° to 360°. Saturation (S) corresponds to the radius, varying from 0 to 1. Intensity (I) varies along the z axis, with 0 being black and 1 being white. When S = 0, the color is a gray value of intensity I. When S = 1, the color is on the boundary of the top cone base. The greater the saturation, the farther the color is from white/gray/black (depending on the intensity). Adjusting the hue varies the color from red at 0°, through green at 120° and blue at 240°, and back to red at 360°. When I = 0, the color is black and therefore H is undefined. When S = 0, the color is grayscale and H is likewise undefined. By adjusting I, a color can be made darker or lighter. By maintaining S = 1 and adjusting I, shades of that color are created.

Fig 3.6. Double cone model of the HSI color space (hue angle runs through red, yellow, green at 120°, cyan, blue at 240° and magenta; the vertical axis runs from black to white).

We have used the RGB color space for face detection using skin color information in our research. The details of this process are given in the following section.

3.3.1 Skin Color Based Face Detection in RGB Color Space

Crowley and Coutaz [113] argue that one of the simplest algorithms for detecting skin pixels is to use a skin color model. The perceived human color varies as a function of the relative direction to the illumination. The pixels of a skin region can be detected using a normalized color histogram, and can be further normalized for changes in intensity by dividing by luminance. Thus an [R, G, B] vector is converted into an [r, g] vector of normalized color, which provides a fast means of skin detection. This gives the skin color region which localizes the face. As in [113], the output is a face-detected image obtained from the skin region. This algorithm fails when other skin regions, such as the legs and arms, are present in the image.

3.3.1.1 Building a Skin Color Model

We have used almost the same technique in our implementation, since the common RGB representation of color images is not suitable for characterizing skin color. In the RGB (red, green and blue) space, the triple component (r, g, b) represents not only color but also luminance.
Luminance may vary across a person's face due to the ambient lighting and is not a reliable measure for separating skin from non-skin regions [114]. Luminance can be removed from the color representation in the chromatic color space. Chromatic colors [115], also known as "pure" colors in the absence of luminance, are defined by the normalization process shown below:

r = R / (R + G + B)
b = B / (R + G + B)

The green component is redundant after the normalization because r + g + b = 1. If two points P1 = [r1, g1, b1] and P2 = [r2, g2, b2] are proportional, i.e.

r1 / r2 = g1 / g2 = b1 / b2,

then P1 and P2 have the same color but different brightness.

Chromatic colors have been effectively used to segment color images in many applications [116]. They are also well suited in this case to segmenting skin regions from non-skin regions. The color distribution of the skin colors of different people was found to be clustered in a small area of the chromatic color space: skin colors of different people are very close, but they differ mainly in intensity [117]. With this finding, we could proceed to develop a skin color model in the chromatic color space.

A total of 68 skin samples from 68 color images, taken from the same number of videos (neutral and painful), were used to determine the color distribution of human skin in chromatic color space and to generate the statistical skin color model. Our samples were taken from persons of different ethnicities (Asian, Caucasian and African), of different ages and genders, and under varying illumination conditions. As the skin samples were extracted from color images, they were filtered using a low-pass filter to reduce the effect of noise. The low-pass filter used is the 3 x 3 averaging kernel whose coefficients are all equal to 1/9:

(1/9) | 1 1 1 |
      | 1 1 1 |
      | 1 1 1 |

Fig. 3.7(a) and 3.7(b) illustrate the training process, in which a skin-color region is selected and its RGB representation is stored. It was verified, using the training data, that skin colors are clustered in color space, as illustrated in Fig. 3.8(a): although the skin colors of different people appear to vary over a wide range, they differ much less in color than in brightness.

Fig. 3.7(a). Selected skin region in RGB image. Fig. 3.7(b). Selected skin in chromatic color space.
Fig. 3.8(a). Cluster in color space (RGB). Fig. 3.8(b). Cluster in chromatic space (r, g).

The color histogram revealed that the distributions of the skin color of different people are clustered in the chromatic color space, and that the skin color distribution can be represented by a Gaussian model N(m, C), where:

Mean, m = E{x}, where x = (r, b)^T

Covariance, C = E{(x - m)(x - m)^T} = | σ_rr  σ_rb |
                                      | σ_br  σ_bb |

The Gaussian model generated from the training data is illustrated in fig 3.9.

Fig. 3.9. Gaussian model

With this Gaussian-fitted skin color model, we can now obtain the likelihood of skin for any pixel of an image. If a pixel, having been transformed from RGB color space to chromatic color space, has a chromatic pair value of (r, b), the likelihood of skin for this pixel can be computed as:

Likelihood = P(r, b) = exp[ -0.5 (x - m)^T C^(-1) (x - m) ],  where x = (r, b)^T
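A minimal sketch of this modeling step is shown below: chromatic (r, b) pairs are collected from labeled skin pixels, the sample mean and covariance are computed, and the resulting Gaussian is evaluated to produce a skin-likelihood value for every pixel, following the expression above. The function names and the use of numpy are our own choices for illustration, not the thesis implementation.

    import numpy as np

    def chromatic_rb(rgb):
        """Convert an H x W x 3 RGB array into the chromatic pair (r, b)."""
        rgb = rgb.astype(np.float64)
        total = rgb.sum(axis=2) + 1e-6
        return np.dstack([rgb[..., 0] / total, rgb[..., 2] / total])

    def fit_skin_gaussian(skin_rgb_samples):
        """Fit N(m, C) to labeled skin pixels (an N x 3 array of RGB samples)."""
        samples = skin_rgb_samples.astype(np.float64)
        total = samples.sum(axis=1, keepdims=True) + 1e-6
        x = np.column_stack([samples[:, 0], samples[:, 2]]) / total   # (r, b) pairs
        m = x.mean(axis=0)                       # mean vector
        C = np.cov(x, rowvar=False)              # 2 x 2 covariance matrix
        return m, C

    def skin_likelihood(rgb, m, C):
        """Per-pixel likelihood exp(-0.5 (x - m)^T C^-1 (x - m))."""
        x = chromatic_rb(rgb).reshape(-1, 2) - m
        C_inv = np.linalg.inv(C)
        mahal = np.einsum('ni,ij,nj->n', x, C_inv, x)
        return np.exp(-0.5 * mahal).reshape(rgb.shape[:2])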
Hence, this skin color model can transform a color image into a gray-scale image such that the gray value at each pixel shows the likelihood of that pixel belonging to skin. With appropriate thresholding, the gray-scale images can then be further transformed into binary images showing skin and non-skin regions.

3.3.1.2 Skin Region Segmentation

Our main goal in this segmentation process was to separate the skin regions of the image from the background using the previously discussed skin color model. First, the input image is converted to the chromatic color space. Then, using the Gaussian model, a grayscale image of skin-likelihood values is constructed. Skin pixels fall within a fixed range of values for each of the normalized components, and every pixel in the normalized image has three values: normalized red, normalized green and normalized blue. The segmentation process extracts these normalized components and constructs two images (fig. 3.11(c) and fig. 3.11(d)). Each of these images is converted into a black and white image by applying a different threshold to the normalized input image (fig. 3.11(e) and fig. 3.11(f)), such that r = 0.41-0.50 and g = 0.21-0.30. Finally, we perform an 'AND' operation between these two black and white images, where white pixels are skin and black pixels are non-skin.

In this approach, due to noise and distortion in the input image, the color information of some skin pixels behaves like a non-skin region and generates non-contiguous skin color regions. To solve this problem, a morphological closing operator is first used to obtain skin-color blobs (fig. 3.11(g)). A median filter is also used to eliminate spurious pixels (fig. 3.11(h)). Boundaries of skin-color regions are then determined using a region growing algorithm on the binary image, and regions with a size of less than 1% of the image size are eliminated [116]. At the end of the segmentation process, the black and white skin-region image is multiplied by the original RGB image, and we obtain the skin region (fig. 3.11(i)) of the face. Fig. 3.10 illustrates a simple block diagram of the segmentation process, and fig. 3.11 shows an example of the segmentation and face location detection process performed on a painful image.

Fig. 3.10. Block diagram of face segmentation: the original RGB image from the video frame is converted into the chromatic color space, thresholded using the skin color thresholds to generate a black and white face-area template, the region growing algorithm is applied and non-face areas are filtered out, and the original RGB image is multiplied by the template to give the segmented RGB face.

Fig. 3.11. Segmentation and approximate face location process: (a) original RGB image, (b) normalized image, (c) extracted R (red) component, (d) extracted G (green) component, (e)-(f) black and white images after thresholding, (g) image after 'AND' and closing, (h) noise removal, (i) final face.
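A rough sketch of the segmentation steps described above is given below, using scipy's binary closing and median filter as stand-ins for the morphological operations. The structuring element size and the connected-component size cut-off are illustrative assumptions, not the exact settings used in the thesis.

    import numpy as np
    from scipy import ndimage

    def segment_skin(rgb, r_range=(0.41, 0.50), g_range=(0.21, 0.30), min_area_frac=0.01):
        """Binary skin segmentation from a normalized-RGB threshold pair,
        followed by morphological cleanup and small-region removal."""
        rgb = rgb.astype(np.float64)
        total = rgb.sum(axis=2) + 1e-6
        r = rgb[..., 0] / total
        g = rgb[..., 1] / total
        # threshold the two normalized components and AND the results
        mask = ((r >= r_range[0]) & (r <= r_range[1]) &
                (g >= g_range[0]) & (g <= g_range[1]))
        # morphological closing joins nearby skin pixels into blobs
        mask = ndimage.binary_closing(mask, structure=np.ones((5, 5)))
        # median filtering removes isolated spurious pixels
        mask = ndimage.median_filter(mask.astype(np.uint8), size=5).astype(bool)
        # drop connected regions smaller than 1% of the image area
        labels, n = ndimage.label(mask)
        min_area = min_area_frac * mask.size
        for lab in range(1, n + 1):
            if (labels == lab).sum() < min_area:
                mask[labels == lab] = False
        # multiplying the mask into the original image keeps only skin pixels
        return rgb * mask[..., None]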
3.3.1.3 Face Detection

To reduce the search space for eye template matching, the bounding rectangles of all connected areas in the black and white template are taken into consideration and the center of each face area is calculated; this is the mass point (centroid) of the template area. The height and width of the bounding rectangle can then be calculated. If the height-width proportions are consistent with a face-like shape, we keep the area; otherwise we remove it. The template with the approximate face area is then multiplied by the original image and we obtain the face. To consider only the meaningful portions of the face, we use a mask image: a bitwise 'AND' operation applies the mask image to the original face image, so that only features which coincide with the white areas of the mask image are displayed. The original video frame, the obtained gray-level image, the mask image and the resultant image are shown in fig. 3.12.

Fig. 3.12. (a) Original video frame (b) gray-level image (c) mask image and (d) resultant image

3.4 Conclusion

Face detection is the first step of an image-based pain recognition system. In this chapter, we have given an introduction to face detection techniques and have discussed our face detection approach. In the next chapter, we will discuss the pain recognition algorithms, which use the output of the face detection algorithm.

Chapter 4
Machine Learning Approaches for Pain Recognition

4.1 Introduction

The face plays a crucial role in interpersonal communication. Seeing a face, we can recognize a person's identity, gender, age, expression, etc. This information is irreplaceable for the normal conduct of human communication. If machines could recognize such information from a human face, humans and machines might communicate more smoothly, robustly, and harmoniously. In the previous chapters, we looked at various face detection and facial expression recognition techniques, and we distinguished between feature-based and image-based expression recognition techniques. In our work, we use an image-based pain recognition method. Analysis of the face is the main task in an image-based pain recognition method because, during pain, distinct changes occur in the face region. Like humans, who recognize the pain of a person by looking at that person's face, machines can also detect pain (and other expressions) by analyzing facial images. In our research, we have developed two image-based pain recognition systems using two machine learning techniques for training and recognizing pain from the input videos: the eigenimage method and the multilayer neural network method.

The rest of the chapter describes these two approaches. Section 4.2 describes the eigenimage-based pain recognition system, while section 4.3 describes the multilayer neural network-based pain recognition system.

4.2 Eigenimage-based Pain Recognition

The eigenimage approach is a principal component analysis method in which a small set of characteristic pictures is used to describe the variation between the images.
In this method, the goal is to find the eigenvectors (eigenimages) of the covariance matrix of the distribution spanned by a training set of images. Every image is represented by a linear combination of these eigenvectors. Computing these eigenvectors directly is quite difficult for typical image sizes, but an approximation can be made. Recognition is performed by first projecting a new image into the subspace spanned by the eigenimages and then classifying the image by comparing its position in the subspace with the positions of the known individuals. The general block diagram for eigenimage-based pain recognition is given in figure 4.1.

Fig. 4.1. Block diagram of the eigenimage-based pain recognition system. Learning process: input video, video analysis (RGB color image), skin region detection, face detection, face image, eigenimage creation, training set formation. Recognition process: image acquisition, face detection, projection to feature space, recognition result.

We have used whole faces, eye regions and lip regions as our image sets, so three sets of eigenimages are produced in our work: eigenfaces, eigeneyes and eigenlips. The reason for choosing the eye and lip regions is that these regions are the most affected by pain; most changes occur in the eye and lip regions of the face during pain. We have used a total of 68 videos for our pain recognition system. Half of these videos are of neutral mood and half are of painful mood. Figure 4.2 shows the sample training images used for producing eigenfaces.

Fig. 4.2. Training images for eigenfaces

4.2.1 Calculating Eigenfaces

Let a face image I(x, y) be a two-dimensional N x N array of (8-bit) intensity values. Such an image may also be considered as a vector of dimension N^2, so that a typical image of size N x N becomes a vector of dimension N^2 or, equivalently, a point in N^2-dimensional space. Images of faces, being similar in overall configuration, will not be randomly distributed in this huge image space and can thus be described by a relatively low-dimensional subspace. The main idea of the PCA method is to find the vectors which best account for the distribution of face images within the entire image space. These vectors define the subspace of face images called "face space". Each vector is of length N^2, describes an N x N image, and is a linear combination of the original face images. Because these vectors are the eigenvectors of the covariance matrix corresponding to the original face images, and because they are face-like in appearance, they are referred to as eigenfaces [94].

Steps of the eigenface calculation:

1. The first step is to obtain a set S of M face images Γ_1, Γ_2, ..., Γ_M. Each image is transformed into a vector of size N^2 and placed into the set.

2. The second step is to obtain the mean image Ψ:

Ψ = (1/M) Σ_{n=1}^{M} Γ_n

Fig. 4.3. Average image

3. Then we find the difference Φ between each input image and the mean image:

Φ_i = Γ_i - Ψ

4. Next we seek a set of M orthonormal vectors u_n which best describe the distribution of the data. The kth vector, u_k, is chosen such that
λ_k = (1/M) Σ_{n=1}^{M} (u_k^T Φ_n)^2

5. is a maximum, subject to the orthonormality constraint

u_l^T u_k = δ_lk = 1 if l = k, and 0 otherwise,

where the u_k and λ_k are the eigenvectors and eigenvalues of the covariance matrix C.

6. The covariance matrix C is obtained as

C = (1/M) Σ_{n=1}^{M} Φ_n Φ_n^T = A A^T,  where A = [Φ_1 Φ_2 ... Φ_M]

7. Finding the eigenvectors of this covariance matrix directly is a huge computational task. Since M is far smaller than N^2 x N^2, we can instead construct the M x M matrix

L = A^T A,  where L_mn = Φ_m^T Φ_n

8. We find the M eigenvectors v_l of L.

9. These vectors v_l determine linear combinations of the M training set face images that form the eigenfaces u_l (figure 4.4):

u_l = Σ_{k=1}^{M} v_{lk} Φ_k,  l = 1, 2, 3, ..., M

Fig. 4.4. Eigenfaces for recognition

After computing the eigenvectors and eigenvalues of the covariance matrix of the training images, the M eigenvectors are sorted in order of descending eigenvalue and chosen to represent the eigenspace. Finally, we project each of the original images into the eigenspace. This gives a vector of weights representing the contribution of each eigenface to the reconstruction of the given image.

Fig. 4.5. Reconstructed training images for eigenfaces

4.2.2 Recognition using Eigenfaces

Once the eigenspace has been defined, we can project any image into the eigenspace by a simple matrix multiplication:

ω_k = u_k^T (Γ - Ψ),  k = 1, ..., M'

Ω^T = [ω_1, ω_2, ..., ω_M']

where u_k is the kth eigenvector and ω_k is the kth weight in the vector Ω = [ω_1, ω_2, ω_3, ..., ω_M']. The M' weights represent the contribution of each respective eigenface. The vector Ω is taken as the 'image-key' of an image projected into the eigenspace. We compare any two image-keys by a simple Euclidean distance measure,

ε_k = || Ω - Ω_k ||

An acceptance (the two images match) or a rejection (the two images do not match) is determined by applying a threshold: any comparison producing a distance below the threshold is a match.

The steps of the recognition process are as follows:
1. When an unknown face is found, project it into the eigenspace.
2. Measure the Euclidean distance between the unknown face's position in the eigenspace and the positions of all the known faces.
3. Select the face closest to the unknown face in the eigenspace as the match.

4.2.3 Rebuilding an Image using Eigenfaces

A face image can be approximately reconstructed (rebuilt) by using its feature vector and the eigenfaces as

Γ' = Ψ + Φ_f,  where Φ_f = Σ_{i=1}^{M'} ω_i u_i

is the projected image. We see that the face image under consideration is rebuilt simply by adding each eigenface, with a contribution of ω_i, to the average of the training set images. The degree of fit, or "rebuild error ratio", can be expressed by means of the Euclidean distance between the original and the reconstructed face image as

Rebuild Error Ratio, RER = || Γ - Γ' || / || Γ ||

It has been observed that the rebuild error ratio increases as the training set members differ heavily from each other. This is due to the addition of the average face image: when the members differ from each other, especially in image background, the average face image becomes messier, and this increases the rebuild error ratio.
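The calculation and recognition steps above can be summarized in a short sketch. It follows the small-matrix trick (L = A^T A) for the eigenvector computation, projects images into the eigenspace, and matches by Euclidean distance between image-keys; the variable names, data layout and threshold handling are assumptions of this illustration rather than the thesis's exact implementation.

    import numpy as np

    def train_eigenfaces(images, n_components):
        """images: M x (N*N) matrix, one flattened face per row.
        Returns (mean_face, eigenfaces) with eigenfaces as rows."""
        mean_face = images.mean(axis=0)
        A = (images - mean_face).T                    # N^2 x M matrix of differences
        L = A.T @ A                                   # small M x M matrix
        eigvals, V = np.linalg.eigh(L)                # eigenvectors of L
        order = np.argsort(eigvals)[::-1][:n_components]
        U = A @ V[:, order]                           # map back: u_l = A v_l
        U /= np.linalg.norm(U, axis=0, keepdims=True) # normalize each eigenface
        return mean_face, U.T

    def project(image, mean_face, eigenfaces):
        """Image-key: weights of the image in the eigenface basis."""
        return eigenfaces @ (image - mean_face)

    def recognize(image, mean_face, eigenfaces, known_keys, labels, threshold):
        """Nearest-neighbour match in eigenspace, rejected above `threshold`."""
        key = project(image, mean_face, eigenfaces)
        dists = np.linalg.norm(known_keys - key, axis=1)
        best = int(np.argmin(dists))
        if dists[best] < threshold:
            return labels[best], dists[best]
        return None, dists[best]

    def reconstruct(key, mean_face, eigenfaces):
        """Rebuild an approximate face from its image-key."""
        return mean_face + eigenfaces.T @ key

In the setting of this thesis, the same routine would be run three times, on whole-face, eye-region and lip-region images, to obtain the eigenfaces, eigeneyes and eigenlips respectively.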
There are four possibilities for an input image and its pattern vector:
1. Near the face space and near a face class,
2. Near the face space but not near a known face class,
3. Distant from the face space and near a face class,
4. Distant from the face space and not near a known face class.

In the first case, an individual is recognized and identified. In the second case, an unknown individual is presented. The last two cases indicate that the image is not a face image. Case three typically shows up as a false classification. It is possible to avoid this false classification in this system by requiring

|| Ω - Ω_k || < θ_k

where θ_k is a user-defined threshold on the faceness of input face images belonging to the kth face class. If the image is found to be an unknown face, we can decide whether or not to add it to our training set for future recognitions. Figures 4.6 and 4.7 depict an image before and after recognition, respectively.

Fig 4.6. Image to recognize
Fig 4.7. Image after the reconstruction process

4.2.4 Eigeneye and Eigenlip Methods

The methods for eigeneyes and eigenlips are similar to the eigenface method, except that instead of using the whole face, only the segmented eye or lip portions of the face images are used. The average eye and lip images, and the eigeneyes and eigenlips, are shown in figure 4.8 and figure 4.9.

Fig. 4.8. (a) Average eye (b) Eigeneyes
Fig. 4.9. (a) Average lip (b) Eigenlips

We have discussed the eigenimage-based pain recognition system in the previous section: first the eigenface technique was described, and then the eigeneye and eigenlip methods were illustrated. In the next section, we discuss the multilayer neural network-based pain recognition method.

4.3 Multilayer Neural Network-based Pain Recognition

A neural network is the other machine learning technique that we have used in our pain recognition system. The general block diagram of the neural network-based pain recognition system is shown in figure 4.10.

Fig. 4.10. Block diagram of the neural network-based pain recognition system: the video frame from the input video is processed by skin color modeling to detect the human face, location and shape features are extracted, and a neural network classifies the face as painful or painless/normal.

The following sections give an overview and illustrate our implementation. Section 4.3.1 covers some neural network basics, section 4.3.2 illustrates the feature extraction process and section 4.3.3 describes the learning and recognition process.

4.3.1 Neural Network Basics

Neural networks can be used in many ways for any pattern recognition problem. In our implementation, we have used a multilayer error back-propagation algorithm.
4.3.1.1 Artificial Neural Networks

An artificial neural network (ANN) is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons; this is true for ANNs as well.

With their remarkable ability to derive meaning from complicated data, ANNs can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and to answer "what if" questions. Other advantages include:

1. Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
2. Self-organization: an ANN can create its own organization or representation of the information it receives during learning time.
3. Real-time operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
4. Fault tolerance via redundant information coding: partial destruction of a network leads to a corresponding degradation of performance; however, some network capabilities may be retained even with major network damage.

The important components of neural networks are the "unit" and the "connection." Neural networks can be categorized in two ways: by how the units are connected, and by the type of information processing in the unit. From the point of view of how the units are connected, neural networks are categorized as layered networks and mutually connected networks. A layered network has a layered structure of units, with layers ordered from the input layer to the output layer; a unit in a layer is only connected to the units in the next higher layer. Radial Basis Function (RBF) networks are layered networks with three layers, and multilayer perceptrons in general fall in this category. A mutually connected network allows connections between any two units in both directions. The Hopfield networks and Boltzmann machines mentioned below use networks of this type.

From the point of view of the information processing in the unit, neural networks are categorized as follows:

• Hopfield networks
• Boltzmann machines
• Backpropagation

A Hopfield network [118] uses a mutually connected network with symmetrical weights. Hopfield networks are used for associative memories and for solving optimization problems. A Boltzmann machine [119] is essentially a stochastic version of the Hopfield network. A Boltzmann machine can also learn the weights of states of the environment and can simulate the environment later.
Backpropagation is a learning algorithm (procedure) proposed for multilayer networks. We explain this algorithm in section 4.3.1.3. In the rest of this chapter, we use the term "neural network" to mean a multilayer perceptron, without confusion, since we do not handle the other types of networks hereafter.

4.3.1.2 Structure of Multilayer Perceptrons

A multilayer perceptron is a network of units, where a unit receives the outputs of other units or the input from the environment as its inputs, and where a unit outputs to other units or to the environment. A unit has one or more inputs and a single output. The weighted sum of the inputs is combined with a bias and then operated on by a sigmoid function to produce the output. Let $n$ be the number of inputs to the unit, $o$ be the output, the $x_i$'s be the inputs, the $w_i$'s be the weights, $b$ be the bias, and $\sigma$ be the sigmoid function. The output of the unit is given as follows:

(i) $o = \sigma\!\left(\sum_{i=1}^{n} w_i x_i + b\right)$

Here $n$ is a positive integer; $o$, the $x_i$'s, the $w_i$'s, and $b$ are real numbers; and the sigmoid function $\sigma$ is a one-dimensional, non-linear, monotonic, differentiable function. Even though any function can serve as a sigmoid function for a unit as long as it is one-dimensional, non-linear, monotonic, and differentiable, the following function, given by equation (ii), is used in our implementation of neural networks and throughout the experiments described in this thesis; a graph of the function is given in figure 4.11.

(ii) $\sigma(x) = \dfrac{1}{1 + e^{-x}}$

Fig. 4.11. Sigmoid function

A unit which receives the input from the environment is called an "input unit." A unit which outputs to the environment is called an "output unit." A multilayer perceptron must have at least one input unit and one output unit. Here the "environment" means the outside of the network. We decided to allow any connection in the network; the only restriction is that the network is connected to the environment (i.e. the outside) in both ways. We do not assume any layer structure for a multilayer perceptron, and units in the network can have different sigmoid functions. In practice, however, the network used here is a layered network. The input layer is the layer which consists entirely of the input units. The output layer is the layer which consists entirely of the output units. Hidden layers are the layers which consist entirely of units without connections to the environment. A hidden layer is sometimes called a middle layer.

Fig. 4.12. Structure of a Multilayer Perceptron

Multilayer perceptrons are called feed-forward ANNs when they allow signals to travel one way only, from input to output. There is no feedback (no loops), i.e. the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs, and they are extensively used in pattern recognition. This type of organization is also referred to as bottom-up or top-down.
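Equations (i) and (ii) together describe the entire computation performed by a unit. The following minimal Python sketch, written only as an illustration (the array shapes and names are assumptions, not taken from the thesis implementation), computes the output of a single unit and of one layer of units.

import numpy as np

def sigmoid(x):
    # Equation (ii): sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def unit_output(x, w, b):
    # Equation (i): o = sigma(sum_i w_i * x_i + b)
    return sigmoid(np.dot(w, x) + b)

def layer_output(x, W, b):
    # A layer simply applies equation (i) once per unit;
    # row i of W holds the weights of unit i, and b[i] its bias.
    return sigmoid(W @ x + b)

# Example: one unit with three inputs
o = unit_output(np.array([0.2, 0.5, 0.1]), np.array([0.4, -0.3, 0.8]), b=0.1)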
4.3.1.3 Back Propagation for Multilayer Perceptrons

The backpropagation technique consists of the backpropagation of the errors through the network, from the output units to the input units, together with weight and bias updates. The purpose of backpropagation is to adjust the internal state (weights and biases) of the multilayer perceptron so that the multilayer perceptron produces the desired output for the specified input. In order to realize this, the following error function, sometimes called the "energy function", $E$, is defined for the desired input and output pairs $\{(I_p, O_p)\}$, where $n$ denotes the function defined by the multilayer perceptron:

(iii) $E = \dfrac{1}{2}\sum_{p}\bigl(O_p - n(I_p)\bigr)^{2}$

For each pair, the multilayer perceptron is made to propagate the input $I_p$ forward, and the squared distance between the output of the network $n(I_p)$ and the desired output is calculated. The squared distances are summed over all the pairs and divided by 2 to produce the error function. If the error function is 0, the multilayer network produces exactly the desired output for each input.

The backpropagation algorithm is essentially a gradient descent procedure with respect to this error function. The weights (and the biases) are therefore updated as follows, where $\alpha$ is some positive constant called the "learning constant":

(iv) $\Delta\omega = -\alpha\,\dfrac{\partial E}{\partial \omega}$

Furthermore, a momentum term proportional to the past weight update value is added as follows, to avoid oscillation and for the practical purpose of making rapid learning possible [120]:

(v) $\Delta\omega = -\alpha\,\dfrac{\partial E}{\partial \omega} + \beta\,\Delta\omega_{\mathrm{prev}}$

where $\beta$ is a positive constant less than 1 and $\Delta\omega_{\mathrm{prev}}$ is the previous weight update. The backpropagation algorithm is used to calculate a value $\delta$ assigned to each unit; this $\delta$ is propagated backward from a unit to the units which output to that unit. Backpropagation is in fact an embodiment of repeated applications of the chain rule for partial derivatives. The algorithm is completely localized to each unit: a weight update can be calculated from $\delta$ and the output of the unit involved. This is the reason why backpropagation is applicable not only to single-layer perceptrons, but also to multilayer perceptrons and even to networks without layer structures.
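As a concrete illustration of equations (iii)-(v), the sketch below performs one backpropagation step with momentum for a one-hidden-layer perceptron with sigmoid units. The layer sizes, learning constant and momentum value in the example are illustrative assumptions, not the settings used in this thesis.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyMLP:
    """One-hidden-layer perceptron trained by backpropagation with momentum."""

    def __init__(self, n_in, n_hidden, n_out, alpha=0.1, beta=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)
        self.alpha, self.beta = alpha, beta          # learning constant and momentum
        self.vel = [np.zeros_like(p) for p in (self.W1, self.b1, self.W2, self.b2)]

    def forward(self, x):
        self.h = sigmoid(self.W1 @ x + self.b1)      # hidden activations
        self.o = sigmoid(self.W2 @ self.h + self.b2) # network output n(I_p)
        return self.o

    def train_step(self, x, target):
        o = self.forward(x)
        # delta terms from the chain rule, for the squared error of equation (iii)
        delta_out = (o - target) * o * (1.0 - o)
        delta_hid = (self.W2.T @ delta_out) * self.h * (1.0 - self.h)
        grads = [np.outer(delta_hid, x), delta_hid,
                 np.outer(delta_out, self.h), delta_out]
        # equation (v): new update = -alpha * dE/dw + beta * previous update
        for i, (p, g) in enumerate(zip((self.W1, self.b1, self.W2, self.b2), grads)):
            self.vel[i] = -self.alpha * g + self.beta * self.vel[i]
            p += self.vel[i]
        return 0.5 * np.sum((target - o) ** 2)       # error contribution of this pair

With the feature vector introduced in the following sections, a call such as TinyMLP(n_in=21, n_hidden=10, n_out=2) would mirror the 21-input, two-output configuration used later in this thesis.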
4.3.2 Feature Extraction

After detecting the human face in the video frames, we need to detect reliable facial features. We observe that most facial feature changes caused by pain occur in the areas of the eyes, brows and mouth. In this thesis, two types of facial features in these areas are extracted: location features and shape features. The idea for extracting the features presented here is similar to that taken by Yang et al. [121] and Ying-li Tian et al. [122]; it is an attempt to make the feature extraction robust to the video sequences available for this field of research.

4.3.2.1 Location Feature Extraction

In this system, six location features are extracted for pain recognition. They are the two eye centers, the two eyebrow inner endpoints and the two corners of the mouth.

4.3.2.1.1 Eye Centers and Eyebrow Inner Endpoints

To find the eye centers and eyebrow inner endpoints inside the detected frontal or near-frontal face, we have developed an algorithm that searches for two pairs of dark regions corresponding to the eyes and the brows, using certain geometric constraints such as position inside the face, size, and symmetry with respect to the facial symmetry axis. Similar to reference [121], the algorithm employs an iterative thresholding method to find these dark regions. Generally, after four iterations, all the eyes and brows are found. If satisfactory results are not found after 15 iterations, we conclude that the eyes or the brows are occluded. Unlike the work of Yang et al., which finds one pair of dark regions for the eyes only, we find two pairs of parallel dark regions, for both the eyes and the eyebrows. By doing this, not only are more features obtained, but the accuracy of the extracted features is also improved.

Figure 4.13 illustrates this process. Figures 4.13(a), 4.13(b) and 4.13(c) are the binary images after applying the threshold values 45, 55 and 65 respectively to the original image. As shown in figure 4.13(a), the right brow and the left eye are wrongly extracted as the two eyes in Yang's approach. Figure 4.13(c) shows that the correct positions are extracted for all the eyes and eyebrows in our method. The eye centers and eyebrow inner endpoints can then be easily determined.

Fig. 4.13. Iterative thresholding of the face to find eyes and brows

4.3.2.1.2 Mouth Corners

After finding the positions of the eyes, the location of the mouth is first predicted. Then the vertical position of the line between the lips is found using an integral projection of the mouth region, as proposed by Yang et al. [121]. Finally, the horizontal borders of the line between the lips are found using an integral projection over an edge image of the mouth. Following Yang et al., the following steps are used to track the corners of the mouth:

1) Finding two points on the line between the lips near the previous positions of the corners in the image;
2) Searching along the darkest path to the left and right, until the corners are found.

Finding the points on the line between the lips can be done by searching for the darkest pixels in search windows near the previous mouth corner positions. Because there is a strong change from dark to bright at the location of the corners, the corners can be found by looking for the maximum contrast along the search path [122].

4.3.2.2 Location Feature Representation

After extracting the location features, all faces are normalized to 90 x 90 pixels. We transform the extracted features into a set of parameters. We represent the face location features by 5 parameters, which are shown in figure 4.14: the distances between the eye-line and the corners of the mouth, the distances between the eye-line and the inner eyebrows, and the width of the mouth (the distance between the two corners of the mouth).

Fig. 4.14. Face location feature representation for expression recognition

In figure 4.14, L1 and L2 are the distances between the eye-line and the inner eyebrows, L3 and L4 are the distances between the eye-line and the corners of the mouth, and L5 is the width of the mouth (the distance between the two corners of the mouth).
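To make the five parameters concrete, the sketch below computes L1-L5 from the six detected points. The representation of the points as (x, y) pixel coordinates on the normalized 90 x 90 face, and the use of the perpendicular distance to the eye-line, are illustrative assumptions about details the text leaves implicit.

import numpy as np

def location_features(left_eye, right_eye, left_brow_in, right_brow_in,
                      left_mouth, right_mouth):
    """L1..L5 from six facial points given as (x, y) pixel coordinates."""
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    d = right_eye - left_eye                     # direction of the eye-line
    n = np.array([-d[1], d[0]])
    n /= np.linalg.norm(n)                       # unit normal of the eye-line

    def dist_to_eyeline(p):
        # perpendicular distance from point p to the line through the eye centers
        return abs(np.dot(np.asarray(p, dtype=float) - left_eye, n))

    L1 = dist_to_eyeline(left_brow_in)           # eye-line to left inner eyebrow
    L2 = dist_to_eyeline(right_brow_in)          # eye-line to right inner eyebrow
    L3 = dist_to_eyeline(left_mouth)             # eye-line to left mouth corner
    L4 = dist_to_eyeline(right_mouth)            # eye-line to right mouth corner
    L5 = np.linalg.norm(np.asarray(right_mouth, dtype=float)
                        - np.asarray(left_mouth, dtype=float))   # mouth width
    return np.array([L1, L2, L3, L4, L5])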
4.3.2.3 Shape Feature Extraction

In order to extract the mouth shape features, an edge detector is first applied to the normalized face to obtain an edge map. The edge map is divided into 2 x 2 zones, as shown in figure 4.15.

Fig. 4.15. Zones of the edge map of a sample normalized face

The eye and mouth shape features are computed from zonal shape histograms of the edges in the mouth and eye regions. To place the 2 x 2 zones onto the face image, the upper two zones are placed at the locations of the eyes and the lower two zones are placed at the location of the mouth. The coarsely quantized edge directions are used as local shape features, and more global shape features are represented as histograms of the local shape (edge directions) along the shape contour. The edge directions are quantized into 4 angular segments (figure 4.16(a)). Representing the whole mouth as one histogram does not capture the local shape properties that are needed to distinguish pain expressions; therefore we use the zones to compute four histograms of the edge directions. Hence, the eyes and mouth are represented as a feature vector of 16 components (4 histograms of 4 components each). An example of the histogram of edge directions corresponding to the lower right zone is shown in figure 4.16(b).

Fig. 4.16. (a) Four quantization levels and (b) Histogram corresponding to the lower right zone of the mouth

4.3.3 Pain Recognition using Neural Network

The proposed system is applied to a wide variety of painful and neutral video sequences collected from a variety of people (students, faculty members, officers, etc.) at the University of Rajshahi, Bangladesh. The videos were taken individually, under different lighting conditions and with different backgrounds. It is found that the system successfully detects the skin regions of the images collected from the videos. However, it is important to note that not all detected regions contain faces: some correspond to parts of the human body, while others correspond to objects with colors similar to those of skin. We implemented the entire algorithm in MATLAB 7.0 on a Pentium IV Windows XP workstation.

We have used a neural network-based recognizer with the structure shown in figure 4.17. Standard back-propagation, in the form of a layered neural network with a varying number of hidden layers, was used to recognize facial expressions. The inputs to the network were the 5 location features (figure 4.14) and the 16 zone components of the shape features of the eye and mouth regions (figure 4.16). Hence, a total of 21 features were used to represent the amount of pain in a face image. The outputs were a set of two values: painful face or painless face. We tested various numbers of hidden units and found that 10 hidden units gave the best performance.

Fig. 4.17. Neural network-based pain recognizer (inputs: 5 location features and 16 shape features; outputs: painful face / painless face)
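Putting sections 4.3.2 and 4.3.3 together, the following sketch assembles the 21-component input vector (5 location parameters plus 4 zones x 4 edge-direction bins) and passes it to a two-output classifier. The edge representation (an angle array plus a boolean edge mask) and the simple zone splitting are illustrative assumptions; the classification lines reuse the TinyMLP sketch given after section 4.3.1.3.

import numpy as np

def zonal_edge_histograms(edge_angles, edge_mask):
    """16 shape features: a 4-bin histogram of quantized edge directions
    for each of the 2 x 2 zones of the edge map."""
    h, w = edge_mask.shape
    bins = ((edge_angles % np.pi) // (np.pi / 4.0)).astype(int)   # 4 direction classes 0..3
    feats = []
    for rows in (slice(0, h // 2), slice(h // 2, h)):             # upper / lower zones
        for cols in (slice(0, w // 2), slice(w // 2, w)):
            zone = bins[rows, cols][edge_mask[rows, cols]]
            hist = np.bincount(zone, minlength=4)[:4].astype(float)
            feats.append(hist / max(hist.sum(), 1.0))             # normalized histogram
    return np.concatenate(feats)                                  # 16 components

def pain_feature_vector(location_feats, edge_angles, edge_mask):
    """21-component network input: 5 location features + 16 shape features."""
    return np.concatenate([np.asarray(location_feats, dtype=float),
                           zonal_edge_histograms(edge_angles, edge_mask)])

# Classification, reusing the TinyMLP sketch from section 4.3.1.3:
# net = TinyMLP(n_in=21, n_hidden=10, n_out=2)
# scores = net.forward(pain_feature_vector(loc, angles, mask))
# label = "painful face" if scores[0] > scores[1] else "painless face"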
4.4 Conclusion

In this chapter we have discussed the two machine learning approaches used for pain recognition. After a short introduction in section 4.1, the eigenimage-based pain recognition system was described in section 4.2, and in section 4.3 the multilayer neural network-based pain recognition system was illustrated. In the next chapter, we report the results obtained with the eigenimage-based and neural network-based pain recognition approaches and then compare them.

Chapter 5

Simulations and Results

5.1 Introduction

Two image-based pain recognition systems are developed in this work. One of the systems uses the eigenimage method, while the other uses a multilayer neural network for learning and recognition. For face detection from the available frames, the skin color modeling technique is used. We have described the face detection procedure in chapter 3, and in chapter 4 we have described the learning and recognition procedures of the eigenimage and neural network methods. The results of these experiments are presented and explained in this chapter.

In section 5.2, some details of the obtained dataset are presented: the origin of the dataset, the number of videos, the nature of the contents, the technical details of those videos, and the image extraction procedure. In sections 5.3 and 5.4, the results of the eigenimage-based method and the neural network-based method are discussed respectively. Three eigenimage-based methods are presented in section 5.3: eigenface, eigeneye and eigenlip. The individual and combined results of these three methods are presented. In section 5.4, the results of the multilayer neural network-based classifier are discussed, and the variations of the results with different numbers of hidden layers and different numbers of neurons per hidden layer are compared. Section 5.5 shows the comparison between these two machine learning methods for pain recognition; two types of comparison results are shown - speed comparison and accuracy comparison. Finally, section 5.6 summarizes the results of the research.

5.2 Image Acquisition

We have used a video database of persons in painful and neutral moods. This database was collected from the Computer Science & Engineering department of the University of Rajshahi, Bangladesh. In this database we have 68 video files of 34 persons. These persons are of different colors, ethnicities, ages and genders, and all have shoulder pain. Two videos were taken for each person: one in a neutral mood and the other in a painful mood. The pain was generated by raising the hand or shaking the head, etc. All of these videos were taken intentionally (i.e., the respondents were aware of the event). The resolution of the videos was 96 x 96. These video files were first read and the number of frames of each video was determined. The middle frame of each video was then stored as an image in the database for further processing. The reason for taking the middle frame is that, in almost all the pain videos, the expression of pain begins some time after the start of the video and ends some time before its end; by taking the middle frame, it was therefore ensured that the expression of pain in a pain video would be captured. The videos were roughly 1 to 1.5 seconds long.
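The middle-frame selection described above is easy to reproduce. The sketch below uses OpenCV purely as an illustrative stand-in for the MATLAB 7.0 routines actually used in the thesis; the file name in the example is hypothetical.

import cv2  # illustrative choice; the thesis implementation used MATLAB 7.0

def middle_frame(video_path):
    """Return the middle frame of a video clip as an image array."""
    cap = cv2.VideoCapture(video_path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, n_frames // 2)   # seek to the middle frame
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise IOError("could not read the middle frame of " + video_path)
    return frame

# Example (hypothetical file name): store the middle frame of a painful-mood clip
# frame = middle_frame("subject01_pain.avi")
# cv2.imwrite("subject01_pain.png", frame)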
5.3 Results of Eigenimage Method

The general block diagram of the eigenimage-based pain recognition method is depicted in figure 5.1.

Fig. 5.1. Block diagram of eigenimage-based pain recognition system (input video → image acquisition → face detection → matching against the eigenimages created in the learning phase → class with the maximum feature match → painful / painless video)

The motivation behind the use of the eigenimage approach is that previous work ignored the question of which features are important for classification and which are not. The eigenimage approach seeks to answer this by using Principal Component Analysis (PCA) of the facial images. This analysis reduces the dimensionality of the training set, leaving only those features that are critical for face recognition.

The system is initialized by first acquiring the training set (ideally a number of examples of each subject with varied lighting and expression). Eigenvectors and eigenvalues are computed from the covariance matrix of the training images, and the M eigenvectors with the highest eigenvalues are kept. Finally, the known individuals are projected into the face space and their weights are stored. This process is repeated as necessary.

The steps of the recognition process are as follows:

1. When an unknown image is found, we project it into eigenspace.
2. We measure the Euclidean distance between the unknown image's position in eigenspace and all the known images' positions in eigenspace.
3. We select the face closest in eigenspace to the unknown image as the match.

The system is implemented in MATLAB 7.0 on a Pentium IV Windows XP workstation. The following table shows the average skin region detection rate, the average false skin detection rate and the average face detection rate on the detected skin regions of the proposed system.

Table 5.1. Average face detection rate

Test no. | No. of test videos | Skin region detection rate | False skin region detection rate | No. of detected faces
1        | 10                 | 89% ± 1%                   | 14% ± 2%                         | 9 ± 1
2        | 20                 | 91% ± 2%                   | 12% ± 2%                         | 17 ± 1
3        | 40                 | 81% ± 1%                   | 8% ± 3%                          | 35 ± 3
4        | 55                 | 89% ± 3%                   | 16% ± 3%                         | 52 ± 2
5        | 68                 | 92% ± 3%                   | 15% ± 1%                         | 63 ± 3

From the above table it is found that the average face detection rate is 90% ± 2% and that it does not depend on the number of input videos. From the experimental data of the Gaussian distribution it is also observed that there is no difference between the chromatic color spaces of infants and adults.

We have used three eigenimage techniques in our research: eigenface, eigeneye and eigenlip. To check the accuracy of our pain recognition system, we have tested each eigenimage method individually and in combination. Table 5.2 shows the accuracy results of the different eigenimage methods.

Table 5.2. Comparison of various eigenimage methods for pain recognition

Method(s) used                    | Average pain recognition rate
Eigenfaces                        | 89% ± 2%
Eigeneyes                         | 82% ± 3%
Eigenlips                         | 84% ± 5%
Eigenfaces & Eigeneyes            | 89% ± 1%
Eigenfaces & Eigenlips            | 90% ± 3%
Eigeneyes & Eigenlips             | 86% ± 4%
Eigenfaces, Eigeneyes & Eigenlips | 92% ± 2%

As the table shows, among the eigenface, eigeneye and eigenlip methods used alone, the eigenface method gives the best performance, which is 89% ± 2%. We also note the results obtained by combining two or three eigenimage methods: the combination of the eigenface and eigeneye methods gives 89% ± 1%, the combination of the eigenface and eigenlip methods gives 90% ± 3%, and the combination of the eigeneye and eigenlip methods gives 86% ± 4%. But the best result (92% ± 2%) is obtained by combining the eigenface, eigeneye and eigenlip methods. The results are also shown as a bar diagram in figure 5.2.

Fig. 5.2. Bar diagram of accuracy results for the various eigenimage methods
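The thesis does not spell out the rule by which the eigenface, eigeneye and eigenlip decisions are combined. Purely as an illustration of one possible fusion, the sketch below takes a majority vote over the three individual nearest-neighbour decisions; the function and label names are assumptions, not the method actually used.

from collections import Counter

def fused_decision(eigenface_label, eigeneye_label, eigenlip_label):
    """Illustrative fusion only: majority vote over the labels ('painful' or
    'painless') returned by the three eigenimage sub-recognizers."""
    votes = Counter([eigenface_label, eigeneye_label, eigenlip_label])
    return votes.most_common(1)[0][0]

# Example: two of the three sub-recognizers report a painful face
# fused_decision("painful", "painless", "painful")   # -> "painful"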
This section has given the results of the eigenimage-based pain recognition process. The following section gives the results obtained with the multilayer back-propagation neural network-based pain recognition technique.

5.4 Results of Neural Network Method

The multilayer neural network with the error backpropagation technique is used for pain recognition. In this approach, the errors are backpropagated through the network from the output units to the input units, and the weights and biases are updated accordingly. The purpose of backpropagation is to adjust the internal state (weights and biases) of the multilayer perceptron so that it produces the desired output for the specified input. The general block diagram of neural network-based pain recognition is shown in figure 5.3.

Fig. 5.3. Block diagram of neural network-based pain recognition system (input video → skin color modeling → detected human face → feature extraction → neural network → painful face / painless (neutral) face)

We have used two types of facial features as input to our neural network system: location features and shape features. Five location features and sixteen shape features are input to the 21-input neural network system. The number of output units in the output layer is two, as there are only two possible outcomes - painful face and non-painful face. Table 5.3 shows the effect of the number of hidden units for a network with one hidden layer and 21 inputs.

Table 5.3. Effect of the number of neurons (in 1 hidden layer) on system accuracy

Number of neurons                 | 5           | 10          | 20
Training time (min)               | 1.01 ± 0.1  | 1.56 ± 0.14 | 1.89 ± 0.03
Recognition time (min)            | 0.45 ± 0.04 | 0.65 ± 0.09 | 0.69 ± 0.02
Non-painful face recognition rate | 71% ± 3%    | 95% ± 1%    | 80% ± 2%
Painful face recognition rate     | 46% ± 4%    | 88% ± 3%    | 73% ± 1%
Average accuracy                  | 59% ± 3%    | 92% ± 2%    | 76% ± 2%

From the above table, we can see that for a network with one hidden layer, the network gives the best accuracy for both painful and non-painful face recognition with 10 hidden neurons; with 5 hidden units, the network gives the worst accuracy. To obtain the best accuracy, we have also checked our system with 2, 3 and 5 hidden layers. Tables 5.4, 5.5 and 5.6 show the effect of the number of hidden layers and the number of neurons in those hidden layers, for 21 inputs, on training and recognition time and on system accuracy.
Table 5.4. Effect of the number of neurons (in 2 hidden layers) on system accuracy

Number of neurons                 | 5           | 10          | 20
Training time (min)               | 1.07 ± 0.05 | 1.90 ± 0.02 | 3.56 ± 0.25
Recognition time (min)            | 0.53 ± 0.02 | 0.71 ± 0.08 | 0.97 ± 0.04
Non-painful face recognition rate | 76% ± 5%    | 97% ± 1%    | 82% ± 1%
Painful face recognition rate     | 66% ± 3%    | 91% ± 2%    | 76% ± 3%
Average accuracy                  | 71% ± 4%    | 94% ± 2%    | 78% ± 3%

Table 5.4 shows that for a network with two hidden layers, the system gives the best performance with 10 neurons in each hidden layer. The training and recognition times with this network setup are 1.90 ± 0.02 minutes and 0.71 ± 0.08 minutes respectively, and the average accuracy is 94% ± 2%. It can also be seen that the training and recognition times increase with the number of hidden layers and the number of neurons in the hidden layers.

Table 5.5. Effect of the number of neurons (in 3 hidden layers) on system accuracy

Number of neurons                 | 5           | 10          | 20
Training time (min)               | 1.98 ± 0.3  | 3.16 ± 0.34 | 5.89 ± 1.23
Recognition time (min)            | 0.98 ± 0.09 | 1.77 ± 0.21 | 1.99 ± 0.67
Non-painful face recognition rate | 92% ± 2%    | 92% ± 1%    | 87% ± 3%
Painful face recognition rate     | 91.68%      | 90% ± 1%    | 80% ± 4%
Average accuracy                  | 92% ± 2%    | 91% ± 1%    | 83% ± 3%

Table 5.5 shows the simulation results for a network with three hidden layers. The system gives the best performance with 5 neurons in each hidden layer. The training and recognition times with this network setup are 1.98 ± 0.3 minutes and 0.98 ± 0.09 minutes respectively, and the average accuracy is 92% ± 2%.

Table 5.6. Effect of the number of neurons (in 5 hidden layers) on system accuracy

Number of neurons                 | 5           | 10          | 20
Training time (min)               | 2.89 ± 0.3  | 5.78 ± 1.01 | 9.98 ± 2.31
Recognition time (min)            | 1.65 ± 0.14 | 2.01 ± 0.6  | 2.99 ± 0.8
Non-painful face recognition rate | 82% ± 3%    | 92% ± 1%    | 89% ± 1%
Painful face recognition rate     | 75% ± 5%    | 89% ± 4%    | 90% ± 1%
Average accuracy                  | 79% ± 3%    | 91% ± 2%    | 89% ± 1%

Table 5.6 shows the simulation results for a network with five hidden layers. Here the training and recognition times are lowest with 5 neurons in each hidden layer (2.89 ± 0.3 minutes and 1.65 ± 0.14 minutes respectively), while the best average accuracy for 5 hidden layers, 91% ± 2%, is obtained with 10 neurons in each hidden layer.

From the above tables, it is clear that the system works best with the neural network with 2 hidden layers and 10 neurons in each hidden layer. It can also be seen that the times for training and recognition grow with the number of hidden units of the system. The next section shows the timing and accuracy comparison between the two machine learning techniques.

5.5 Comparison between Eigenimage Method and Neural Network Method

In this section, we compare the eigenimage method and the neural network method. First, in section 5.5.1, the results of the speed comparison are shown, and then in section 5.5.2 the accuracy comparison results are shown. For the speed and accuracy comparison, the best eigenimage method and the neural network with the best-performing number of hidden layers and hidden units are considered.

5.5.1 Speed Comparison

Processing speed is an important factor from the computational point of view.
In this speed comparison, for the eigenimage method we consider the combination of the eigenface, eigeneye and eigenlip methods, because this combination has given us the best results. For the neural network-based method, we consider a network with 2 hidden layers and 10 neurons in each of those layers. Table 5.7 shows the speed comparison results.

Table 5.7. Speed comparison

Machine learning method | Training time (min) | Recognition time (min)
Eigenimage              | 3.06 ± 0.56         | 1.87 ± 0.22
Neural network          | 1.90 ± 0.02         | 0.71 ± 0.08

From the above table, it is clear that the neural network-based pain recognition system is faster than the eigenimage-based pain recognition system. The training and recognition times for the neural network-based system are 1.90 ± 0.02 minutes and 0.71 ± 0.08 minutes respectively, whereas the training and recognition times for the eigenimage-based system are 3.06 ± 0.56 minutes and 1.87 ± 0.22 minutes respectively.

5.5.2 Accuracy Comparison

The accuracy of a pain recognition system means the percentage of painful and neutral videos that are correctly recognized by the system; timing is not considered in this measurement. For both approaches, 34 neutral videos and the same number of painful videos were considered in the accuracy comparison. Table 5.8 shows the accuracy comparison results.

Table 5.8. Accuracy comparison

Machine learning method | No. of input faces (painful / non-painful) | Correctly recognized painful faces | Correctly recognized non-painful faces | Accuracy (%)
Eigenimage              | 34 / 34                                    | 29 ± 2                             | 33 ± 1                                 | 91% ± 2%
Neural network          | 34 / 34                                    | 32 ± 3                             | 33 ± 1                                 | 93% ± 2%

From the above table, it can be seen that the neural network-based pain recognition system gives better results than the eigenimage-based pain recognition system, but the difference between the accuracies of the two methods is very small. The accuracy of the eigenimage-based pain recognition system is 91% ± 2% and the accuracy of the neural network-based pain recognition system is 93% ± 2%.

5.6 Summary

In this chapter, we have first shown the accuracy results of the face detection technique based on skin color modeling. We have then shown the results of the eigenimage-based and neural network-based pain recognition systems, followed by the comparison between these two methods in terms of processing speed and accuracy.

The average face detection rate is 90% ± 2% and it does not depend on the number of input videos. From the experimental data of the Gaussian distribution it is also observed that there is no difference between the chromatic color spaces of infants and adults.

For eigenimage-based pain recognition, we have used seven different eigenimage configurations: eigenface, eigeneye, eigenlip, the combination of eigeneye and eigenlip, the combination of eigenface and eigenlip, the combination of eigeneye and eigenface, and the combination of eigenface, eigeneye and eigenlip. Among the single methods, the eigenface method gives the best performance, which is 89% ± 2% when used alone. We also note the results obtained by combining two or more of these three eigenimage methods: the combination of the eigenface and eigeneye methods gives 89% ± 1%, the combination of the eigenface and eigenlip methods gives 90% ± 3%, and the combination of the eigeneye and eigenlip methods gives 86% ± 4%.
But the best result (92% ± 2%) is obtained by combining the eigenface, eigeneye and eigenlip methods.

For neural network-based pain recognition, we have used the multilayer backpropagation algorithm. In this approach, the errors are backpropagated through the network from the output units to the input units, and the weights and biases are updated accordingly. For a network with 1 hidden layer, the best accuracy for both painful and non-painful face recognition is obtained with 10 neurons. The system was also checked with more than 1 hidden layer (and with different numbers of hidden neurons). From the obtained results, it is clear that the system works best with the neural network with 2 hidden layers and 10 neurons in each hidden layer. It can also be seen that the times for training and recognition grow with the number of hidden units of the system.

For the speed and accuracy comparison between the eigenimage and neural network methods, we have considered the best option for both the eigenimage-based and the neural network-based method. From the results, we can say that the neural network-based method is better than the eigenimage-based method in terms of both speed and accuracy.

Chapter 6

Conclusions

In this chapter, we present a summary of contributions, a discussion of limitations, and suggestions for future work.

6.1 Contributions

In this study, we have introduced two machine learning approaches for pain recognition from facial images collected from video sequences. First, we introduced the architecture and an algorithm for eigenimage-based pain recognition. We have used three eigenimage techniques for this purpose - eigenface, eigeneye and eigenlip - and we used these techniques individually and collectively to check the performance of the system. Secondly, we described an implementation of the architecture and the algorithm for a multilayer neural network-based pain recognition system. The error back-propagation technique is used for learning and recognition. We have tested the system with different numbers of hidden layers and different numbers of neurons in the hidden layers.

Face detection was the first step of our implementation, and we have used the skin color modeling technique for this purpose. The average skin region detection rate is 88% ± 2%, whereas the average false skin detection rate is 12% ± 3%. The average face detection rate is 90% ± 2%. From the experimental data of the Gaussian distribution it is also observed that there is no difference between the chromatic color spaces of infants and adults.

Videos of people of different ethnicities are used as input for this system, and the system behaved consistently across all the videos. The input and testing video files were collected from the video database of the Computer Science & Engineering department of the University of Rajshahi, Bangladesh. In this database, there are two video files for every person, and a total of 34 persons of different colors, ethnicities, ages and genders are considered. In one file, the subject is in a neutral mood and in the other, the person is in a painful mood due to moving the shoulder or moving the head, etc. The resolution of the videos is 96 x 96. The videos are roughly 1 to 1.5 seconds long.
Among the three eigenimage techniques, the eigenface method gives the best performance, which is 89% ± 2%. We also noted the results obtained by combining two or more of these three eigenimage methods: the combination of the eigenface and eigeneye methods gave 89% ± 1%, the combination of the eigenface and eigenlip methods gave 90% ± 3%, and the combination of the eigeneye and eigenlip methods gave 86% ± 4%. But the best result (92% ± 2%) was obtained by combining the eigenface, eigeneye and eigenlip methods.

For the multilayer neural network-based approach, the system works best with the neural network with two hidden layers and 10 units in each hidden layer. Also in this case, the times for training and recognition grow with the number of hidden units of the system.

Comparing the eigenimage and neural network-based approaches, the second one is faster and also gives better accuracy than the first one. The training and recognition times for the neural network-based system are 1.90 ± 0.02 minutes and 0.71 ± 0.08 minutes respectively, whereas the training and recognition times for the eigenimage-based system are 3.06 ± 0.56 minutes and 1.87 ± 0.22 minutes respectively. The accuracy of the eigenimage-based pain recognition system is 91% ± 2% and the accuracy of the neural network-based pain recognition system is 93% ± 2%.

6.2 Limitations

In this section, we describe the limitations of the architecture and results presented in this study. The limitations are the following:

• With the skin color modeling technique for face detection, the system cannot give the best result for videos of fully or partially bald-headed people; in those cases, portions of the head are detected as face. The system also has some problems with face detection if the color of the dress of the person in the video is very similar to skin color.
• The system is not tested with videos of low resolution and poor quality. No steps were taken to extract the image from a low-quality video, so it will not work well in that case.
• The implementation of the neural network approach is limited to layered neural networks with full connections between layers, and the algorithm is limited to error backpropagation.
• The choices of parameters such as the numbers of units in the hidden layers, the learning constants, and the momentum are arbitrary, made mostly by trial and error.
• The system is implemented in MATLAB, so it is comparatively slower.
• The results were not compared with other facial expression recognition systems.

6.3 Future Works

In this section, we provide suggestions for future work in the research area of pain or expression recognition. Even though we have discussed two approaches to pain recognition, there are still items to be explored to make them really useful tools in practical life. These items include the following:

• Development and implementation of algorithms or methods that work in a real-time environment, i.e. in a hospital.
• Comparisons with other algorithms and methods that can improve the system in terms of accuracy. This can also enhance the application areas by recognizing all expressions from facial images.
• Choosing a better face detection technique that can work better in all situations and in the case of lower quality videos.
•
Testing of other neural network algorithms and different architectures which give us better results. 100 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Bibliography [1] Beat Fasel and Juergen Luettin (2003), “Automatic Facial Expression Analysis: A Survey”, Pattern Recognition, Vol. 36, No. 1, pp. 259-275. [2] J. J. Weng and D. L. Swets (1999), “Face Recognition”, in A. K. Jain, R. Bolle, and S. Pankanti (Editors), BIOMETRICS: PERSONAL IDENTIFICATION IN NETWORKED SOCIETY, Kluwer Academic Press. [3] S.A. Rizvi, P.J. Phillips, and H. Moon (1998), “A verification protocol and statistical performance analysis for face recognition algorithms”, in the proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Santa Barbara, USA, pp. 833-838. [4] V. Bruce (1999), “Identification of Human Faces”, Image Processing and Its Applications, Conference Publication No. 465, IEEE, pp. 615-619. [5] R. Chellappa, C. L. Wilson and S. Sirohey (1995), “Human and Machine Recognition of Faces: A Survey”, Proceedings of the IEEE, Vol. 83, No. 5, pp. 705-740. [6] P. Temdee, D. Khawparisuth, and K. Chamnongthai (1999), “Face Recognition by Using Fractal Encoding and Backpropagation Neural Network”, in the proceedings of the 5th International Symposium on Signal Processing and its Applications, ISSPA ’99, Australia, pp. 159-161. [7] J. Huang (1998), “Detection Strategies For Face Recognition Using Learning and Evolution”, PhD. Thesis, George Mason University. [8] R. B runelli, and T. P oggio (1993), “Face R ecognition: F eatures versus T em plates” , IE E E T ransactions, P A M I, 15(10), pp. 1042-1052. [9] M. A. Turk and A. P. Pentland (1991), “Face recognition using Eigenfaces”, in the proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Maui, Hawaii, USA, pp. 586-591. 101 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [10] L. Sirovich, and M. Kirby (1987), “Low-dimensional Procedure for the Characterization of Human Faces”, Journal of the Optical Society of America, Vol. 4, No. 3, pp. 519-524. [11] M. Kirby, and L. Sirovich (1990), “Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, pp. 103-108. [12] B. Kepenekci (2001), “Face Recognition Using Gabor Wavelet Transform”, MSc. Thesis, Middle East Technical University, Turkey. [13] B. Moghaddam, and A. Pentland (1995), “An Automatic System for ModelBased Coding of Faces”, in the proceedings of the IEEE Data Compression Conference, pp. 362-370. [14] S. J. Lee, S. B. Yung, J. W. Kwon, and S. H. Hong (1999), “Face Detection and Recognition Using PCA”, IEEE TENCON, pp. 84-87. [15] S. Z. Lee, and J. Lu (1998), “Generalizing Capacity of Face Database for Face Recognition”, IEEE, pp. 402-406. [16] J. L. Crowley, and K. Schwerdt (1999), “Robust Tracking and Compression for Video Communication”, in the proceedings of the IEEE International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real Time Systems, Corfu, Greece, pp. 2-9. [17] B. Moghaddam, and A. Pentland (1998), “Beyond Euclidean Eigenspaces: Bayesian Matching for Visual Recognition”, FACE RECOGNITION: FROM THEORIES TO APPLICATIONS, pp. 921-930. [18] B. Moghaddam, T. Jebara, and A. Pentland (2000), “Bayesian Face Recognition”, Pattern Recognition, Vol. 33, No. 11, pp. 1771-1782. [19] B. Moghaddam, and A. 
Pentland (1995), “Probabilistic Visual Learning for Object Detection”, in the proceedings of the 5th International Conference on Computer Vision, Cambridge, MA, USA. [20] B. Moghaddam, and A. Pentland (1997), “Probabilistic Visual Learning for Object Representation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7. [21] B. Moghaddam (2002), “Principal Manifolds and Probabilistic Subspaces 102 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. for Visual Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 6. [22] C. Liu, and H. Wechsler (1998), “A Unified Bayesian Framework for Face Recognition”, in the proceedings of the 1998 IEEE International Conference on Image Processing (ICIP '98), Chicago, Illinois, USA, pp. 151-155. [23] C. Liu, and H. Wechsler (1998), “Probabilistic Reasoning Models for Face Recognition”, in the Proceedings of the 1998 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '98), Santa Barbara, CA., USA, pp. 827-832. [24] K. C. Chung, S. C. Kee, and S. R. Kim (1999), “Face Recognition using Principal Component Analysis of Gabor Filter Responses”, IEEE, p. 53-57. [25] K. Etemad, and R. Chellappa(1996), “Face Recognition Using Discriminant Eigenvectors”, in the proceedings of the 1996 IEEE International Conference on Acoustics, Speech and Signal Processing, Atlanta, Georgia, USA, pp. 2148-2151. [26] P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman (1997), “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7. [27] W. Zhao, A. Krishnaswamy, R. Chellappa, D. L. Swets, and J. Weng (1998), “Discriminant Analysis of Principal Components for Face Recognition”, in the proceedings of the International Conference on Automatic Face and Gesture Recognition, Nara, Japan, pp. 336-341. [28] W. Zhao, R. Chellappa, and N. Nandhakumar (1998), “Empirical Performance Analysis of Linear Discriminant Classifiers”, in the Proceedings of the 1998 IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’98), Santa Barbara, CA., USA, pp. 164-169. [29] W. Zhao (1999), “Subspace Methods in Object/Face Recognition”, in the proceedings of the International Joint Conference on Neural Networks, Washington DC, USA, pp. 3260-3264. [30] C. Podilchuk, and X. Zhang (1996), “Face Recognition Using DCT-Based 103 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Feature Vectors”, in the proceedings of the 1996 IEEE International Conference on Acoustics, Speech and Signal Processing, Atlanta, Georgia, USA, pp. 2144-2146. [31] R. O. Duda, P. E. Hart, and D. G. Stork (2001), “PATTERN CLASSIFICATION”, John Wiley & Sons, 2nd Edition. [32] S. Eickeler, S. Muller, and G. Rigoll (1999), “High Quality Face Recognition in JPEG Compressed Images”, in the proceedings of the IEEE International Conference on Image Processing (ICIP ‘99), Cobe, Japan, pp. 672-676. [33] A. V. Nefian, and M. H. Hayes (1998), “Hidden Markov Models for Face Recognition”, in the proceedings of the 1996 IEEE International Conference on Acoustics, Speech and Signal Processing, Atlanta, Georgia, USA, pp. 2721-2724. [34] H. Spies, and I. Ricketts (2000), “Face Recognition in Fourier Space”, in the proceedings of “Vision Interface 2000”, Montreal, Canada, pp. 38-44. [35] P. J. 
Phillips (1999), “Support Vector Machines Applied to Face Recognition”, Advances in Neural Information Processing Systems 11, MIT Press, USA, pp. 803-809. [36] C. S. Bobis, R. C. Gonzalez, J. A. Cancelas, I. Alvarez, and J. M. Enguita (1999), “Face Recognition Using Binary Thresholding for Features Extraction”, in the proceedings of the IEEE CIAP ‘99, pp. 1077-1080. [37] S. Cagnoni, A. Poggi (1999), “A Modified Modular Eigenspace Approach to Face Recognition”, in the proceedings of the IEEE CIAP ‘99, pp. 490495. [38] A. X. Guan, and H. H. Szu (1999), “A Local Face Statistics Recognition Methodology beyond ICA and/or PCA”, in the proceedings of the International Joint Conference on Neural Networks, Washington DC, USA, pp. 1016-1027. [39] A. Martinez (1999), “Face Image Retrieval Using HMMs”, in the proceedings of the IEEE Workshop on Content-Based. Access of Image and Video Libraries, Fort Collins, CO, USA, pp. 35-39. 104 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [40] P. Temdee, D. Khawparisuth, and K. Chamnongthai (1999), “Face Recognition by Using Fractal Encoding and Backpropagation Neural Network”, in the proceedings of the 5th International Symposium on Signal Processing and its Applications (ISSPA ’99), Brisbane, Australia, pp. 159161. [41] M. Lades, J. C. Vorbruggen, J. Buhmann, J. Lange, C. Von der Malsburg, R. P. Wurtz, and W. Konen (1993), “Distortion Invariant Object Recognition in the Dynamic Link Architecture”, IEEE Transactions on Computers, Vol. 42, pp. 300-310. [42] L. Wiskott, J. M. Fellous, N. Kruger, and C. von der Malsburg (1997), “Face Recognition by Elastic Bunch Graph Matching”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 129-132. [43] M. J. Er, S. Wu, and J. Lu (1999), “Face Recognition Using Radial Basis Function (RBF) Neural Networks”, in the proceedings of the 38th Conference on Decision & Control, Phoenix, Arizona USA, pp. 2162-2167. [44] C. E. Thomaz, R. Q. Feitosa, and A. Veiga (1998), “Design of Radial Basis Function Network as Classifier in Face Recognition Using Eigenfaces”, IEEE, pp. 118-123. [45] A. Martinez (1999), “Face Image Retrieval Using HMMs”, in the proceedings of the IEEE Workshop on Content-Based. Access of Image and Video Libraries, Fort Collins, CO, USA, pp. 35-39. [46] Z. Liposcak, and S. Loncaric (1999), “Face Recognition from Profiles Using Morphological Operations”, IEEE Computer Society, ISBN: 0-79650378-0, pp. 47-52. [47] P. Ekman (1994), “Strong Evidence for Universals in Facial Expressions: A Reply to Russell’s Mistaken Critique”, Psychology Bulletin, Vol. 115, No. 2, pp. 268-287. [48] C.E. Izard (1994), “Innate and Universal Facial Expressions: Evidence from Developmental and Cross-cultural Research”, Psychology Bulletin, Vol. 115, No. 2, pp. 288-299. [49] P. Ekman and W.V. Friesen (1978), “FACIAL ACTION CODING 105 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. SYSTEM: INVESTIGATOR’S GUIDE”, Consulting Psychologists Press, Palo Alto, CA, USA. [50] M.J. Blackm and Y. Yacoob (1995), “Tracking and Recognizing Rigid and Non-rigid Facial Motions using Local Parametric Models of Image Motion”, in the proceedings of the International Conference on Computer Vision (ICCV ‘95), Cambridge, MA, USA, pp. 374-381. [51] G. Donato, M.S. Bartlett, J.C. Hager, P. Ekman and T.J. Sejnowski (1999), “Classifying Facial Actions”, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 21, No. 
10, pp. 974-989. [52] I.A. Essa and A.P. Pentland (1997), “Coding, Analysis, Interpretation and Recognition of Facial Expressions”, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 757-763. [53] A. Lanitis, C.J. Taylor and T.F. Cootes (1995), “A Unified Approach to Coding and Interpreting Face Images”, in the proceedings of the 5th International Conference on Computer Vision (ICCV ’95), Cambridge, MA, USA, pp. 368-373. [54] J. Lien (1998), “Automatic Recognition of Facial Expressions using Hidden Markov Models and Estimation of Expression Intensity”, Ph.D. Thesis, Carnegie Mellon University, USA. [55] A. Martinez (1999), “Face Image Retrieval using HMMs”, in the proceedings of the IEEE Workshop on Content-Based. Access of Image and Video Libraries, Fort Collins, CO, USA, pp. 35-39. [56] K. Mase (1991), “Recognition of Facial Expression from Optical Flow”, IEICE Transaction, Vol. E74, No. 10, pp. 3474-3483. [57] A. Nefian and M. Hayes (1999), “Face Recognition using an Embedded HMM”, in the proceedings of the IEEE Conference on Audio and Videobased Biometric Person Authentication”, Washington DC, USA, pp. 19-24. [58] N. Oliver, A. Pentland and F. Berard (2000), “LAFTER: A Real-time Face and Lips Tracker with Facial Expression Recognition”, Pattern Recognition, Vol. 33, pp. 1369-1382. [59] T. Otsuka and J. Ohya (1997), “Recognizing Multiple Person’s Facial 106 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Expressions using HMM Based on Automatic Extraction of Significant Frames from Image Sequences”, in the proceedings of the. International Conference on Image Processing (ICIP ‘97), Washington D C, USA, pp. 546-549. [60] M. Rosenblum, Y. Yacoob and L.S. Davis(1996), “Human Expression Recognition from Motion using A Radial Basis Function Network Architecture”, IEEE Transaction on Neural Network, Vol. 7 No. 5, pp. 1121-1138. [61] N. Ueki, S. Morishima, H. Yamada and H. Harashima (1994), “Expression Analysis/Synthesis System Based on Emotion Space Constructed by Multilayered Neural Network”, Systems Computation, Japan, Vol. 25, No. 13, pp. 95-103. [62] M. Pantic and L.J.M. Rothkrantz (2000), “Automatic Analysis of Facial Expressions: the State of the Art”, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 22 No. 12, pp. 1424-1445. [63] Y. Yacoob and L.S. Davis (1996), “Recognizing Human Facial Expressions from Long Image Sequences using Optical Flow”, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 18, No. 6, pp. 636-642. [64] T. Otsuka and J. Ohya (1997), “A Study of Transformation of Facial Expressions Based on Expression Recognition from Temporal Image Sequences”, Technical Report, Institute of Electronic, Information, and Communications Engineers (IEICE). [65] M. Stifforring, H. J. Andersen and E. Granum (1999), “Skin Color Detection Under Changing Lighting Conditions”, in the proceedings of the 7th Symposium on Intelligent Robotic Systems, Coimbra, Portugal, pp. 2023. [66] M.-H. Yang and N. Ahuja (1998), “Detecting Human Faces in Color Images”, in the proceedings of the IEEE Conference on Image Processing (ICIP ’98), Chicago, Illinois, USA, pp. 127-130. [67] G. Yang and T. S Huang (1994), “Human Face Detection in Complex Background”, Pattern Recognition, Vol. 27 pp. 53-63. 107 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [68] M. J. T. Reinders, P. J. L. van Beek, B. Sankur and J. C. A. 
van der Lubbe, (1995), “Facial Feature Localization and Adaptation of a Generic Face Model for Model-based Coding”, Signal Processing: Image Communication, pp. 57-74. [69] H. Schneiderman and T. Kanade (2000), “A statistical Model for 3d Object Detection Applied to Faces and Cars”, in the proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, South Carolina, USA, pp. 746-751. [70] H. A. Rowley, S. Baluja and T. Kanade (1998), “Neural Network-based Face Detection”, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 20, N o.l, pp. 23-38. [71] E. Helomas and B. K. Low (2001),”Face Detection: A Survey”, Computer Vision and Image Understanding, Vol. 83, pp. 236-274. [72] K. Sung and T. Poggio (1998), “Example-based Learning for View-based Human Face Detection”, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp.39-51. [73] T. Sakai, M. Nagao and T. Kanade (1972), “Computer Analysis and Classification of Photographs of Human Faces”, in the proceedings of the First USA-Japan Computer Conference, pp. 2-7. [74] I. Craw, H. Ellis and J. R. Lishman (1987), “Automatic Etraction of Face Features”, Pattern Recognition Letter, pp. 183-187. [75] V. Govindaraju (1996), “Locating Human Faces in Photographs”, International Journal on Computer Vision, Vol. 19. [76] A. Jacquin and A. Eleftheriadis (1995), “Automatic Location Tracking of Faces and Facial Features in Video Sequences”, in the proceedings of the IEEE International Workshop on Automatic Face and Gesture Recognition. [77] J. Wang and T. Tan (2000), “A New Face Detection Method Based on Shape Information”, Pattern Recognition Letter, Vol. 21, pp. 463^171. [78] A. L. Yuille, P. W. Hallinan and D. S. Cohen (1992), “Feature Extraction from Faces using Deformable Templates”, International Journal on Computer Vision, Vol. 8, pp.99-111. 108 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [79] J. Choi, S. Kim and P. Rhee (1999), “Facial Components Segmentation for Extracting Facial Feature”, in the proceedings of the 2nd International Conference on Audio and Video-based Biometric Person Authentication (AVBPA ’99), Washington DC, USA. [80] R. Herpers, K.-H. Lichtenauer and G. Sommer (1996), “Edge and Keypoint Detection in Facial Regions”, in the proceedings of the IEEE 2nd International Conference on Automatic Face and Gesture Recognition, Killington, Vermont, USA, pp. 212-217. [81] T. S. Jebara and A. Pentland (1997), “Parameterized Structure form Motion for 3D Adaptive Feedback Tracking of Faces”, in the proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’97), San Juan, Puerto Rico, pp. 144-150. [82] S. Satoh, Y. Nakamura and T. Kanade (1999), “Name-It: Naming and Detecting Faces in News Videos”, IEEE Multimedia, Vol. 6, pp. 22-35. [83] J. L. Crowley and F. Berard (1997), “Multi-model Tracking of Faces for Video Communications”, in the proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. [84] N. Oliver, A. Pentland and F. Berard (2000), “A Real-time Face and Lips Tracker with Facial Expression Recognition”, IEEE Transaction on Pattern Recognition, Vol. 33, pp.1369-1382. [85] S. Kim, N. Kim, S. C. Ahn and H. Kim (1998), “Object Oriented Face Detection using Range and Color Information”, in the proceedings of the 3rd International Conference on Automatic Face and Gesture Recognition, Nara, Japan, pp. 76-81. [86] R. Kjedsen and J. 
[87] K. Sobottka and I. Pitas (1996), "Extraction of Facial Regions and Features using Color and Shape Information", in the proceedings of the International Conference on Pattern Recognition, Vienna, Austria.
[88] H. Wang and S.-F. Chang (1994), "A Highly Efficient System for Automatic Face Region Detection in MPEG Video", IEEE Transactions on Circuits and Systems for Video Technology, pp. 615-628.
[89] Q. Chen, H. Wu and M. Yachida (1995), "Face Detection by Fuzzy Matching", in the proceedings of the 5th IEEE International Conference on Computer Vision, Cambridge, MA, USA.
[90] J. Cai and A. Goshtasby (1999), "Detecting Human Faces in Color Images", Image and Vision Computing, Vol. 18, pp. 63-75.
[91] S. Kawato and J. Ohya (2000), "Real-time Detection of Nodding and Head-shaking by Directly Detecting and Tracking Between the Eyes", in the proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France.
[92] A. Albiol, C. A. Bouman and E. J. Delp (1999), "Face Detection for Pseudo-semantic Labeling in Video Databases", in the proceedings of the International Conference on Image Processing, Kobe, Japan.
[93] J. Yang and A. Waibel (1996), "A Real-time Face Tracker", in the proceedings of the 3rd IEEE Workshop on Applications of Computer Vision (WACV '96), Sarasota, Florida, USA, pp. 142-147.
[94] M. Turk and A. Pentland (1991), "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86.
[95] F. Luthon and M. Lievin (1997), "Lip Motion Automatic Detection", in the proceedings of the Scandinavian Conference on Image Analysis.
[96] S. McKenna, S. Gong and H. Liddell (1995), "Real-time Tracking for an Integrated Face Recognition System", in the proceedings of the Workshop on Parallel Modeling of Neural Operators, Faro, Portugal.
[97] J. Miao, B. Yin, K. Wang, L. Shen and X. Chen (1999), "A Hierarchical Multi-scale and Multi-angle System for Human Face Detection in a Complex Background using Gravity-center Template", Pattern Recognition, Vol. 32, pp. 1237-1248.
[98] Y. H. Kwon and N. da Vitoria Lobo (1994), "Face Detection using Templates", in the proceedings of the International Conference on Pattern Recognition, pp. 764-767.
[99] A. Lanitis, C. J. Taylor and T. F. Cootes (1995), "An Automatic Face Identification System using Flexible Appearance Models", Image and Vision Computing, Vol. 13, pp. 393-401.
[100] L. Sirovich and M. Kirby (1987), "Low-dimensional Procedure for the Characterization of Human Faces", Journal of the Optical Society of America, Vol. 4, pp. 519-524.
[101] A. Pentland, B. Moghaddam and T. Starner (1994), "View-based and Modular Eigenspaces for Face Recognition", in the proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp. 84-91.
[102] B. Moghaddam and A. Pentland (1997), "Probabilistic Visual Learning for Object Representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 1.
[103] E. Viennet and F. Fogelman Soulié (1998), "Connectionist Methods for Human Face Processing", in FACE RECOGNITION: FROM THEORY TO APPLICATION, Springer-Verlag, Berlin/New York.
[104] H. A. Rowley (1999), "Neural Network-based Face Detection", Ph.D. Thesis, Carnegie Mellon University, USA.
[105] S.-H. Lin, S.-Y. Kung and L.-J. Lin (1997), "Face Recognition/Detection by Probabilistic Decision-based Neural Network", IEEE Transactions on Neural Networks, Vol. 8, pp. 114-132.
[106] D. Roth, M.-H. Yang and N. Ahuja (2000), "A SNoW-based Face Detector", Advances in Neural Information Processing Systems, Vol. 12.
[107] A. J. Colmenarez and T. S. Huang (1997), "Face Detection with Information-based Maximum Discrimination", in the proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
[108] C. Papageorgiou, M. Oren and T. Poggio (1998), "A General Framework for Object Detection", in the proceedings of the 6th International Conference on Computer Vision, pp. 555-562.
[109] E. Osuna, R. Freund and F. Girosi (1997), "Training Support Vector Machines: An Application to Face Detection", in the proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico.
[110] V. Kumar and T. Poggio (2000), "Learning-based Approach to Real Time Tracking and Analysis of Faces", in the proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France.
[111] H. Schneiderman and T. Kanade (1998), "Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition", in the proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition.
[112] R. O. Duda, P. Hart and D. G. Stork (2001), "PATTERN CLASSIFICATION", 2nd edition, Wiley-Interscience.
[113] J. L. Crowley and J. Coutaz (1997), "Vision for Man-Machine Interaction", Robotics and Autonomous Systems, Vol. 19, pp. 347-358.
[114] J. Cai and A. Goshtasby (1999), "Detecting Human Faces in Color Images", Image and Vision Computing, Vol. 18, pp. 63-75.
[115] G. Wyszecki and W. S. Stiles (1982), "COLOR SCIENCE: CONCEPTS AND METHODS, QUANTITATIVE DATA AND FORMULAE", 2nd edition, John Wiley & Sons, New York, USA.
[116] Y. Gong and M. Sakauchi (1995), "Detection of Regions Matching Specified Chromatic Features", Computer Vision and Image Understanding, Vol. 61, No. 2, pp. 263-269.
[117] J. Yang and A. Waibel (1996), "A Real-time Face Tracker", in the proceedings of the 3rd IEEE Workshop on Applications of Computer Vision (WACV '96), Sarasota, Florida, USA, pp. 142-147.
[118] J. J. Hopfield (1982), "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558.
[119] G. E. Hinton and T. J. Sejnowski (1986), "Learning and Relearning in Boltzmann Machines", Parallel Distributed Processing, Vol. 1, chap. 7, the MIT Press, USA.
[120] D. E. Rumelhart, G. E. Hinton and R. J. Williams (1986), "Learning Internal Representations by Error Propagation", Parallel Distributed Processing, Vol. 1, chap. 8, the MIT Press, USA.
[121] J. Yang, R. Stiefelhagen, U. Meier and A. Waibel (1998), "Real-time Face and Facial Feature Tracking and Applications", in the proceedings of Auditory-Visual Speech Processing (AVSP '98), New South Wales, Australia.
[122] Y. Tian and L. Brown (2003), "Real World Real-time Automatic Recognition of Facial Expressions", in the proceedings of the IEEE Workshop on Performance Evaluation of Tracking and Surveillance, Graz, Austria.
[123] J. Cohn and T. Kanade (2006), "Use of Automated Facial Image Analysis for Measurement of Emotion Expression", in The Handbook of Emotion Elicitation and Assessment, J. A. Coan and J. B. Allen (eds.), Oxford University Press Series in Affective Science.