NOISE REDUCTION FOR FACE IDENTIFICATION IN VIDEO

by

Negar Hassanpour

B.Sc., Electrical (Control) Engineering, University of Tehran

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES (COMPUTER SCIENCE)

UNIVERSITY OF NORTHERN BRITISH COLUMBIA
2015

© Negar Hassanpour, 2015

Abstract

The wealth of information extracted from a sequence of frames in a video provides samples of the subject in different illuminations, head poses and facial expressions. However, various sources can impose noise on data (e.g., occlusion, low resolution and face detection failure).

In this thesis, a novel framework is proposed that employs the well-studied concepts in quantum probability theory to design a representation structure capable of making inferences with multiple sources of uncertainty. The dual extension of this framework is aimed at reducing the effect of noisy frames in a video. It is also used to guide the sampling process in a novel learning scheme, called specialization - generalization, which is designed to support efficient learning, as well as neutralizing the effect of noisy samples in the identification process.

The contributions of this thesis are not method-specific and can be utilized for enhancement of other face identification approaches in the literature.

Contents

Abstract
List of Figures
Acknowledgements

1 Introduction
  1.1 Overview of this thesis
  1.2 Main contributions
  1.3 Organization of this thesis

2 Previous Work
  2.1 Image-set based Face Identification
    2.1.1 Representation
    2.1.2 Similarity Measurement
    2.1.3 Other Methods
  2.2 Quantum Theory in Information Retrieval
  2.3 Gaussian Processes in Computer Vision

3 Background Information
  3.1 Mathematics of Quantum Theory
    3.1.1 Initial State
    3.1.2 Probabilistic Event
    3.1.3 Prediction
  3.2 Gaussian Processes
    3.2.1 Regression
    3.2.2 [illegible]
  3.3 Face Detection
    3.3.1 Viola-Jones method
    3.3.2 Incremental learning for Visual Tracking
  3.4 Feature Extraction
    3.4.1 Histogram Equalization
    3.4.2 Local Binary Pattern
    3.4.3 Histogram of Oriented Gradients
  3.5 Welch t-test

4 Video-based Face Identification
  4.1 Quantum Probability Inspired Framework
    4.1.1 Image-sets of known identities as events
    4.1.2 Image-sets of unknown identities as states
    4.1.3 Recognition
  4.2 Ensemble of Abstract Sequence Representatives
    4.2.1 Representation
    4.2.2 Similarity Measurement
    4.2.3 Identification
  4.3 Ensemble of Gaussian Process Models on top of EASR, a Hierarchical Approach
    4.3.1 Gaussian Process Models
    4.3.2 Specialization - Generalization Learning Scheme
    4.3.3 Identification: a Hierarchical Approach

5 Experimental [illegible]
  5.1 Datasets
    5.1.1 Honda/UCSD
    5.1.2 CMU MoBo
    5.1.3 YouTube Celebrities
  5.2 Evaluation settings
    5.2.1 Face Detection
    5.2.2 Resolution
    5.2.3 Features
    5.2.4 Train & Test image-set arrangement

6 Results and Discussion
  6.1 Computational Complexity

7 Conclusion and Future Work
  7.1 Summary of Contributions
  7.2 Future Work

Bibliography

List of Figures

3.1 (a) 10 sample functions estimated from the GP prior; (b) 10 sample functions estimated from the GP posterior; (c) the blue plot is the true function, the dashed red plot is the predicted function by GP (based on μ*) and the gray area marks the standard deviation away from the predicted function (based on σ*).

3.2 Sample rectangle features. Two-rectangle features are shown in (A) and (B), (C) shows a three-rectangle feature, and (D) a four-rectangle feature. Figure is adopted from the paper authored by Viola and Jones [2004].

3.3 Image before (left) and after (right) histogram equalization. The histograms of intensity levels of each image are illustrated in the second row.

3.4 Local Binary Pattern operator.

3.5 Histogram of Oriented Gradients. Left: raw image; Right: extracted HOG features (using blocks of 16 x 16 pixels).

4.1 A sample sequence from the YouTube Celebrities dataset collected by Kim et al. [200 ] (top), its EASR based on intensity features (middle), and a matched EASR from another clip (bottom).

4.2 The specialization step for k = 4 (best viewed in color).

4.3 Sample noisy frames detected in the generalization step when training a [illegible].

4.4 Flow chart for the identification process (left); exploring the effect of minimum cut-off for EASR confidence (T) on accuracy (right) on the Honda/UCSD dataset.

5.1 Sample faces extracted from the CMU MoBo dataset. Provided by Cevikalp and Triggs [2010].

5.2 Sample faces extracted from the YouTube Celebrities dataset: (a), (b), and (c) illustrate samples from 3 different clips of the same person.

Acknowledgements

First, I would like to thank my supervisor, Professor Liang Chen, for his support, encouragement, patience, and help in both my education and life. His insight and guidance was what made this thesis possible, and I am grateful to him for all the academic achievements I have had during this degree. I have learned an enormous amount from working with him in terms of how to be both a researcher and a supervisor. I hope to follow his example in the future.

A number of sources funded my research. In particular, I would like to thank the University of Northern British Columbia, Mitacs, and NSERC for the financial support that they provided either directly or through grants awarded to Dr. Chen.

Thanks to all my friends in Prince George, Nahid and Mani, Mona and Rahim, Mojtaba, Dhawal, and Behrooz. You were what made living here so amazingly rewarding.

I am deeply grateful to my family. My father, Nemat Hassanpour, who emphasized the importance of education and who instilled in me the inspiration to set high goals and the confidence to achieve them. My mother, Soheyla Norouzi, who has always been a source of motivation and strength, and my role-model for hard work, persistence and personal sacrifices. And my sister, Bahar Hassanpour, who has been my emotional anchor through my life.

Finally, I would like to thank my lovely husband, Samad Kardan. It was difficult for us, since we lived across the province from one another for most of the time of this degree. Samad supported me at all times, in a loving, empathetic, optimistic, calm, and patient manner. This thesis is dedicated to him.

Chapter 1

Introduction

Face identification is considered one of the most important applications of image analysis and understanding and has received a lot of attention from the machine vision community through the past several years. The classical face identification task involves identifying a subject from a single image and with only a few training samples available. For such a configuration, the sample images must be carefully recorded in a controlled environment. However, in real-world applications, such good quality samples are not easily attainable.

Fortunately, with the extensive availability of digital imaging devices, sufficient data is accessible to allow the recognition process to be based on image-set to image-set matching. An image-set could be either a collection of single shot images featuring a person, or a sequence of frames in a video.

The wealth of information extracted from a sequence of frames in a video featuring an individual's face offers the potential to overcome the recognition errors that may occur due to the imperfect quality of the images, which is an absent privilege in the single shot face identification task. In this sense, image-set based face identification in general and video based face identification in particular is potentially more promising than using single-shot images. This type of face identification tends to be more robust since the recognizer gets to see many more possible variations in appearance of the subject.

However, with image-sets, a new challenge emerges due to the uncertainty on how well each image represents the individual who it is associated with. Uncertainties may be due to (i) poor quality of the image (e.g. low resolution, illumination contrast, etc.), (ii) partial existence of the face in the image's field of view (e.g. occlusion, pose, etc.), and (iii) failure of the face detector algorithm to accurately spot the face. All in all, uncertainties are imposed due to the fact that each image may not fully characterize the subject's face.

Therefore, it is important to devise algorithms that can fully and efficiently exploit the available data. This can be done through reducing the effect of noisy samples by designing a representation structure capable of relaxing the noise in each sequence, complemented by developing a recognition procedure that rejects the wrong decisions affected by noise. Devising a principled way to systematically deal with the uncertainties on how well each frame can represent the individual has not yet been directly addressed in the literature. This is the focus of this thesis.

1.1 Overview of this thesis

The quantum probability theory offers a powerful framework for representing information and making inferences in systems with multiple sources of uncertainty. To deal with uncertainties, quantum theory proposes to consider an ensemble of all possible initial states simultaneously, and attempts to find the most probable outcome (i.e., event) of the system. To achieve this, the mathematical formalism of quantum theory extends the ordinary logic by the concept of simultaneous decidability - a concept introduced by von Neumann and Beyer [1955] - which allows physicists to continue reasoning while considering the uncertainty associated with the state of the system. Inspired by this line of research, we propose a Quantum Probability Inspired Framework (referred to as QPIF) for face identification in videos which uses the quantum probabilities to address the underlying uncertainties associated with each image.

It is suggested that the mathematical foundations of quantum probabilities would construct a sound knowledge representation system. In this representation, information extracted from the images of the known identities forms subspaces in a Hilbert space, where each subspace represents one subject of known identity. When presented with an image-set that belongs to a subject of unknown identity, an ensemble of uncertain states is generated from the image-set. Recognition is posed as an optimization problem to find the best-ranked subspace to which the ensemble of states corresponds.

Next, we will provide a dual extension of the quantum framework called Ensemble of Abstract Sequence Representatives (referred to as EASR; the nomenclature will be described in section 4.2). In the EASR approach, the representation structure of all images - either of known or unknown identity - is the same as the concept of initial state in the quantum formalism. This approach towards data representation would uniformly relax the noise in all of the raw data points by transferring them into a higher level representation space. Each EASR is built through a process that includes sampling and superposition of the raw images in order to reduce noise. This is followed by a filtering mechanism to deal with outliers.

Similar to the majority of the image-set based face identification approaches (e.g. work by Wang and Chen [2009], Wang et al. [2012b], Kim et al. [2007], Hu et al. [2012], and Yang et al. [2013]) that use a single structure to model each image-set, each EASR tries to model the variations in appearance of the subject in an image-set. Similarity of EASRs is calculated as the distance between each pair of train (of known identity) and test (of unknown identity) image-sets. Identification is performed by finding the most similar representation of a known candidate to different representations of the unknown subject, and then aggregating the identification results of all candidates via majority voting over the different representations generated for the unknown subject.
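
The identification rule just described (find the nearest known candidate for each representation of the unknown subject, then take a majority vote) can be sketched as follows. This is a minimal illustration and not the thesis implementation: the toy feature vectors, the use of Euclidean distance, and the function names `nearest_identity` and `identify_by_majority_vote` are assumptions made for the example.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_identity(probe_rep, gallery):
    """Return the identity whose gallery representation is closest to probe_rep.

    gallery maps an identity label to a list of representation vectors.
    """
    best_label, best_dist = None, float("inf")
    for label, reps in gallery.items():
        for rep in reps:
            dist = euclidean(probe_rep, rep)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


def identify_by_majority_vote(probe_reps, gallery):
    """Each probe representation votes for its nearest identity; most votes wins."""
    votes = Counter(nearest_identity(rep, gallery) for rep in probe_reps)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two known subjects, three representations of the probe.
gallery = {
    "subject_a": [[0.0, 0.0], [0.1, 0.2]],
    "subject_b": [[1.0, 1.0], [0.9, 1.1]],
}
probe_reps = [[0.2, 0.1], [0.8, 0.9], [0.0, 0.3]]
predicted = identify_by_majority_vote(probe_reps, gallery)
```

In this toy run two of the three probe representations fall nearest to `subject_a`, so the vote selects that label. Ties are not handled specially here; a real system would need an explicit tie-breaking rule.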

Although EASRs reduce the noise in data, they are linear representations and are not capable of capturing the underlying non-linear structure of the data. Therefore, on top of the EASR representation method we introduce an ensemble of binary Gaussian process (GP) models in a one-versus-rest setting for capturing the underlying non-linearity in the data. To reduce the amount of noise presented to the GP models during the training we use a learning scheme called specialization - generalization. The specialization step attempts to find a subset of training data samples such that the highest discriminative power is achieved (i.e., only select the most challenging samples to train the classifier, not all the training samples). The generalization step attempts to reduce the effect of possibly noisy training samples by re-training the model on the unseen samples from the training set that the model failed to classify correctly, thus making sure that the model generalizes well. Finally, in the identification process, we use the predictions of both methods to identify the subject in the probe sequence. We refer to this hybrid identification system as EASR+GP.

1.2 Main contributions

The main contribution of this work is two-fold:

First, we propose two representation structures for image-sets that are designed to minimize the effect of noisy frames. Inspired by the concepts in quantum probability theory, QPIF and its dual extension EASR are designed such that the impact of those frames that are not useful for the identification task (probably due to occlusion, low resolution, or failure of the face tracker algorithm) is reduced.

Second, a novel learning scheme was proposed for efficient training of an ensemble of binary Gaussian process models. This learning scheme selectively samples from the training data in order to not only increase the discrimination power of the classifier, but also to build the models using the least possible computational cost and with minimum introduction of noise.

Assessment of the proposed methods on three publicly available benchmark datasets demonstrates significantly higher performance compared to the previous methods in the literature, including the state-of-the-art.

Part of the contents of this thesis has been published in the 11th IEEE International Conference on Automatic Face and Gesture Recognition (Hassanpour and Chen [2015a]) and another part is published as a technical report at the Computational Intelligence Laboratory - University of Northern British Columbia (Hassanpour and Chen [2015b]).

1.3 Organization of this thesis

The rest of this document is organized as follows:

In chapter 2, the previous works on the three main aspects of this research are discussed. This includes Image-set based Face Identification, Quantum Theory in Information Retrieval, and Gaussian Processes in Computer Vision.

In chapter 3, the background information on which the foundation of the proposed methods is based is explained. In the first two sections of this chapter, the mathematical concepts of Quantum Theory and Gaussian Processes are discussed. In the third section, the Face Detection algorithms that are used to locate faces in a video frame are briefly introduced. The fourth section touches on the Feature Extraction methods. And in the final section, a brief introduction on the Welch t-test is provided. This statistical test is employed to check whether the improvements are significant.

In chapter 4, the proposed algorithms are elaborated in light of the mathematical foundations discussed in chapter 3. The proposed methods are: (i) Quantum Probability Inspired Framework (QPIF), (ii) Ensemble of Abstract Sequence Representatives (EASR), and (iii) Ensemble of Gaussian Process Models on top of the EASR approach (EASR+GP) with a hierarchical approach on the identification process.

In chapter 5, the experimental setup for performance evaluation of the proposed methods against the previous methods in the literature is described. This chapter includes an introduction to the datasets (namely, Honda/UCSD, CMU MoBo, and YouTube Celebrities), as well as the evaluation settings that were equally set for evaluating all methods to allow for fair comparison. These settings include: face detection algorithm, image resolution, extracted features, and partitioning the available data into train and test subsets.

In chapter 6, the identification accuracies of the three proposed approaches as well as the most successful methods in the literature are reported. Additionally, the average computation times of all methods are noted in order to give a sense of each algorithm's computational complexity.

Finally, in chapter 7, this document is concluded by summarizing the main contributions of the proposed methods and highlighting the future directions of this research.

Chapter 2

Previous Work

In this chapter, the previous works conducted in the literature are discussed. In the first section, an overview of the current image-set based face identification techniques is provided. Next, in the second section, a brief survey summarizing the applications of quantum probability theory in computer science in general and machine learning tasks in particular is provided. Finally, the third section gives a brief history on employing Gaussian process models to solve machine vision problems.

2.1 Image-set based Face Identification

In most of the published studies, the task of image-set based face identification is addressed in two steps: (i) representation of the image-sets, (ii) finding a suitable similarity measure between them. In the remainder of this section, the basic concepts of the most known approaches (including the state-of-the-art) are elaborated.

2.1.1 Representation

It is essential to be able to represent every image-set (often with varying number of images inside) in a unified manner, so that they can be compared with each other. The aim of a representation structure is to provide a well-defined method to transfer information embedded in a set of images into a unified structure and preserve as much information as possible. Representation structure of the image-sets can be either parametric or non-parametric.

Parametric methods

Parametric methods attempt to represent each image-set with a data-driven distribution function. For instance, Arandjelovic et al. [200 ] fit a Gaussian mixture model to each image-set and use this distribution as the representative of the respective image-set. Parametric methods, however, suffer from the assumption that all image-sets representing the same identity are drawn from the same distribution. However, empirical results have shown that this is most likely not the case. Therefore, the majority of the current works design a non-parametric representation structure.

Non-parametric methods

The non-parametric representations are divided into two categories: linear and non-linear.

Linear Representations. Most notable linear methods include:

• Mutual Subspace Method MSM, proposed by Yamaguchi et al. [199 ], constructs a linear subspace for each image-set and calculates the similarity using the Euclidean angle between the two subspaces.

• Discriminant Canonical Correlation DCC, proposed by Kim et al. [2007], finds an optimal discriminant function that transforms the image-sets into another space in which the within-class canonical correlations are maximized while the between-class canonical correlations are minimized.

Non-linear Representations. Non-linear methods include:

• Constrained Mutual Subspace Method CMSM, proposed by Fukui and Yamaguchi [200 ], constructs a constrained subspace that only includes the effective components of the input images for recognition (using principal component analysis) and measures the similarity between image-sets as the multiple canonical angles between the two subspaces.

• Kernel Grassmannian Distance KGD, proposed by Wang and Shi [200 ], is a kernel generalization of the Grassmannian distance in order to capture the nonlinear structures in the image-set.

• Manifold Discriminant Analysis MDA, proposed by Wang and Chen [2009], forms the subspaces for each image-set with locally linear models (i.e., manifolds) and attempts to learn an embedding space, where each manifold is compact but manifolds of different classes are as separated as possible.

• Manifold-Manifold Distance MMD, proposed by Wang et al. [2012b], formulates the recognition task as computation of distance between two locally linear subspaces of data (i.e. manifolds).
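
Several of the methods listed above (MSM, DCC and CMSM) compare image-sets through canonical angles between subspaces. As a reminder, the standard textbook definition is given below; it is not a formula quoted from the cited papers, and the symbols $U_1$, $U_2$, $d_1$, $d_2$ are introduced here only for illustration. Let the columns of $U_1$ and $U_2$ be orthonormal bases of two subspaces of dimensions $d_1$ and $d_2$, and let $\sigma_k(\cdot)$ denote the $k$-th largest singular value.

```latex
\[
\cos\theta_k = \sigma_k\!\left(U_1^{T} U_2\right),
\qquad k = 1, \dots, \min(d_1, d_2),
\qquad 0 \le \theta_1 \le \theta_2 \le \dots \le \tfrac{\pi}{2}.
\]
```

The cosines $\cos\theta_k$ are the canonical correlations. A small first angle $\theta_1$ means that the two subspaces share at least one nearly common direction.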


2.1.2 Similarity Measurement

Once every image-set is represented in a unified manner, we can start to find out which image-sets are the most similar to each other, and consequently, develop a way to perform tasks such as identification. The method of similarity measurement, however, depends on the representation method. If the representation is parametric, as in the work by Arandjelovic et al. [200 ] where each image-set is represented as a Gaussian mixture model, the similarity between a pair of image-sets is measured by calculating the between-set distribution distance (e.g., Kullback-Leibler divergence).
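
To make the idea of a between-set distribution distance concrete, the sketch below computes the Kullback-Leibler divergence in the simplest possible setting. It is a deliberate simplification: each image-set is assumed to be summarized by a single univariate Gaussian rather than a Gaussian mixture (for mixtures no closed form exists and the divergence has to be approximated). The function names are hypothetical.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


def symmetric_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Symmetrised divergence, since KL itself is not symmetric in its arguments."""
    return (
        kl_gaussian(mu_p, sigma_p, mu_q, sigma_q)
        + kl_gaussian(mu_q, sigma_q, mu_p, sigma_p)
    )


same = kl_gaussian(0.0, 1.0, 0.0, 1.0)     # identical distributions
shifted = kl_gaussian(0.0, 1.0, 1.0, 1.0)  # unit shift of the mean
```

Identical distributions give a divergence of zero, and the value grows as the two distributions move apart. Because the divergence is asymmetric, a symmetrised variant such as `symmetric_kl` is often the more natural choice when comparing two image-sets.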

Measurement of similarity in non-parametric representations can be divided into three categories:

Exemplar Based. The first category of similarity measurement methods are the ones that are based on calculating the distance between representatives of the two image-sets. Instances include:

• Affine/Convex Hull based Image-Set Distance AHISD and CHISD, proposed by Cevikalp and Triggs [2010], represent each image-set by an affine/convex hull derived by spanning the subspace using the images in the set. The similarity is measured by the distance between the closest exemplars of each image-set.

Structure Based. Similarity can also be measured based on the representation structure as a whole. Instances include:

• Covariance Discriminant Learning CDL, proposed by Wang et al. [2012a], represents the image-set by its covariance matrix (i.e., second-order statistic). This formulates the problem as classification in the Riemannian manifold. The proposed function converts the covariance matrix from the Riemannian manifold to Euclidean space, where measurement of similarity is straightforward.

Both Exemplar and Structure Based.

• Sparse Approximated Nearest Point SANP, proposed by Hu et al. [2012], proposes a similarity measurement method that utilizes both the structural information of the image-sets as well as their representatives. The kernel extension of this approach, KSANP, allows for modelling the complex non-linear structures that are embedded in the data.

• Regularized Nearest Points RNP, proposed by Yang et al. [2013], models each image-set as a regularized affine hull and measures the similarity between the two sets by calculating the distance between the nearest points between the two hulls representing each image-set. RNP is an improvement over SANP in terms of complexity reduction.
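
As an illustration of a structure-based representation, the covariance matrix that CDL uses to summarize an image-set can be computed as in the sketch below. Only this first step is shown; the mapping from the Riemannian manifold to Euclidean space and the discriminant learning are omitted. The toy feature vectors and the function name `covariance_matrix` are assumptions made for the example.

```python
def covariance_matrix(samples):
    """Sample covariance (unbiased, divides by n - 1) of a list of feature vectors."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    d = len(samples[0])
    # Mean of every feature dimension across the set.
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


# Toy image-set (hypothetical): three frames, each reduced to a 2-D feature vector.
image_set = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
cov = covariance_matrix(image_set)
```

The result is a symmetric d x d matrix whose size does not depend on the number of frames, which is what allows image-sets of different lengths to be compared in a unified way.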

2.1.3 Other Methods

Some recent methods have a holistic approach towards their representation structure which is complemented by their own representation-specific approach for similarity measurement. Instances include:

• Dictionary-based Face Identification from Video DFRV, proposed by Chen et al. [2012], captures variations in videos and records them in a dictionary while removing their redundancies. Identification is performed via majority voting.

• Mean Sequence Sparse Representation-based Classification MSSRC, proposed by Ortiz et al. [201 ], performs a joint optimization to determine the linear relationship between all available training images to build its model.

• A joint representation-based method, proposed by Cui et al. [2014], represents all frames in a probe video sequence as an ensemble to suppress the effect of noise for a more stable recovery.

• Image-Set based Collaborative Representation and Classification ISCRC, proposed by Zhu et al. [2014], models the probe image-set as a convex or regularized hull and calculates the distance to the image-sets in the gallery considering the correlation between the two.

2.2 Quantum Theory in Information Retrieval

In the past two decades, quantum theory has found its way through theoretical computer science problems: from algorithms for database search (work by Grover [1 6]) to decision theory (work by Pothos and Busemeyer [2009]), game theory (work by Piotrowski and Sladkowski [2003]), and information retrieval (work by Piwowarski et al. [2010]), to name a few.

The quantum information retrieval framework, for instance, focuses on representing queries and documents in terms of quantum probability theory in order to deal with the uncertainty imposed by ambiguous queries. Ambiguous queries might include cases such as polysemy and/or partial expression of the information need (consult work by Piwowarski et al. [2010] for more details).


2.3 Gaussian Processes in Computer Vision

Gaussian Process (GP) models have been previously used in several machine vision related applications, including:

• Human pose estimation: given an image, estimate the location and orientation of the body parts (see work by [...] et al. [200 ])

• Flow estimation: model the trajectory of a moving object (see work by Kim et al. [2011])

• Object recognition in an active learning paradigm: Kapoor et al. [2007] use GP confidence estimates at unlabelled data points in an active learning paradigm for interactive labelling. This active learning approach is of interest for datasets in which abundant unlabelled data is available, given that manual labelling is often expensive and/or time consuming in large datasets.

To the best of our knowledge, Gaussian process models have not yet been used by the machine vision community to address the task of video-based face identification.

Chapter 3

Background Information

3.1 Mathematics of Quantum Theory

In physics terms, quantum theory provides a mathematical description of matter at atomic and subatomic length scales, where there is uncertainty about the state of the particles. The reason is that at such length scales, the measurements cannot provide a rigid knowledge about the initial state of the system. To deal with such uncertainty, quantum theory proposes to consider an ensemble of all possible initial states simultaneously. To achieve this, the mathematical formalism of quantum theory extends the ordinary logic by the concept of simultaneous decidability - a concept introduced by von Neumann and Beyer [1955] - which allows physicists to continue reasoning while considering uncertainty about the state of the system.

Quantum probability is stated to be a geometrical extension of the classical probability theory. In classical probability theory, the probability space of a physical system corresponds to a categorical property of the system. It is defined as a discrete set of all possible events (i.e., outcomes) that may occur if the system is in each known initial state. Quantum probability theory, on the other hand, employs the Hilbert space $H$, that is, a vector space together with an inner product, as the probability space. Each probabilistic event is defined as a continuous subspace $S$ in $H$.

According to Ballentine [1970], quantum theory consists of a set of primitive concepts which must be mapped to the real world of experience. The primitive concepts of quantum theory are: (i) "state vector", which represents the initial state of the system, and (ii) "event subspace", which represents the probabilistic event that may occur to the system based on its initial state. In the following, we will elaborate on these two concepts and the relations between them.

3.1.1 Initial State

E a h . t ate of a quantun1 syst em i d efin d a. a unit featurr v ct or 9 in an n di1n ensional Hilb r t sp a e. The Y ctor dJ repr . ents
st at

pu.r

stat

of t h e sy:t 111 . Th initi 1

of th syst em m ay b e a result of the sup erp osition of 1'/ pur . tat

t at , following (3.1). The m.ix d t a t es sh ar e th

mn pr p crti sa · P 'UT

, i.e., m.ixed
·ta t s: b oth

ar e n dim n sion al and of unit len gth.

(]'

~)=w

(3.1)
1= 1

In case the initial state is not unique,¹ an ensemble of all possible state vectors $\psi_i$ is considered to contribute in derivation of the outcome of the system. Each state $\psi_i$ is associated with $p(\psi_i)$ that determines the probability of $\psi_i$ being the true state. If the states are mutually exclusive,² the probability distribution over the ensemble is represented as a diagonal matrix $p(\psi)$:

$$p(\psi) = \begin{bmatrix} p(\psi_1) & & 0 \\ & \ddots & \\ 0 & & p(\psi_M) \end{bmatrix} \qquad (3.2)$$

where $M$ is the number of all possible initial states. The ensemble of states in quantum formalism is defined by aggregation of the participating state vectors and their probabilities, in form of a density operator:

$$\rho = \sum_{i=1}^{M} p(\psi_i)\, \psi_i \psi_i^{T} \qquad (3.3)$$

Hence, the state of a quantum system can be fully represented by an ensemble of all possible states. Ballentine [1970] notes that the concept of ensemble allows for continuation of reasoning in an uncertain environment, and it is known as the statistical interpretation of the quantum theory.
1 ... measurement is done.
2 In case the system cannot be in two or more states at the same time, all possible initial states of the system are mutually exclusive. Otherwise, $p(\psi)$ is not diagonal and the ensemble of initial states is considered to be entangled, which is out of the scope of this study.

3.1.2 Probabilistic Event

The outcome of a physical system is defined as a probabilistic event. In terms of quantum probability, it is represented as a subspace $S$ in $H$. The probability of the event $S$, given that the system is in initial state $\psi$, is derived as the projection of vector $\psi$ onto the subspace $S$:

$$q(S \mid \psi) = \mathrm{tr}(\psi^{T} \hat{S} \psi) \qquad (3.4)$$

where $\hat{S}$ is the projection matrix that gives a vector space projection from $H$ onto subspace $S$.

3.1.3 Prediction

Prediction of the final event that may occur in the system is made by aggregating the effect of all possible states. In order to calculate the probability of event $S$ happening, we need to marginalize (3.4) with respect to $\psi_i$:

$$\begin{aligned}
q(S) &= \sum_{i=1}^{M} p(\psi_i)\, q(S \mid \psi_i) = \sum_{i=1}^{M} p(\psi_i)\, \mathrm{tr}(\psi_i^{T} \hat{S} \psi_i) \\
&= \sum_{i=1}^{M} p(\psi_i)\, \mathrm{tr}(\hat{S} \psi_i \psi_i^{T}) = \sum_{i=1}^{M} \mathrm{tr}(\hat{S}\, p(\psi_i)\, \psi_i \psi_i^{T}) \\
&= \mathrm{tr}\Big(\hat{S} \sum_{i=1}^{M} p(\psi_i)\, \psi_i \psi_i^{T}\Big) = \mathrm{tr}(\hat{S} \rho)
\end{aligned} \qquad (3.5)$$

The above equation suggests that the probability of an event in quantum theory is derived as a weighted sum of the projections of all possible states over the event subspace.
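The identity in (3.5) can be checked numerically. The following is a minimal sketch, assuming real-valued state vectors and a subspace given by a set of linearly independent spanning vectors; the helper names `projector` and `density_operator` and all numbers are illustrative and not part of the original formulation.

```python
import numpy as np

def projector(basis_vectors):
    """Orthogonal projector onto the span of the given column vectors.
    Assumes the columns are linearly independent."""
    q, _ = np.linalg.qr(basis_vectors)   # orthonormal basis of the subspace
    return q @ q.T

def density_operator(states, probs):
    """rho = sum_i p(psi_i) psi_i psi_i^T for unit-length state vectors (rows)."""
    return sum(p * np.outer(s, s) for s, p in zip(states, probs))

# Toy 3-dimensional ensemble (made-up numbers for illustration only).
states = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.6, 0.0, 0.8]])
probs = np.array([0.5, 0.3, 0.2])

# Event subspace: the plane spanned by the first two coordinate axes.
S_hat = projector(np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]))

rho = density_operator(states, probs)
q_trace = np.trace(S_hat @ rho)                                 # tr(S_hat rho)
q_sum = sum(p * (s @ S_hat @ s) for s, p in zip(states, probs))  # weighted sum of projections
```

Both `q_trace` and `q_sum` evaluate the same quantity, once through the density operator and once as the explicit weighted sum over states.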

This probability is calculated for all possible candidate events $S_i$, and the event with the highest $q(S_i)$ is reported as the prediction of the model on which event would occur.

3.2 Gaussian Process

Gaussian process (GP) is a generalization of the Gaussian probability distribution and is a Bayesian alternative to the kernel methods such as Support Vector Machines. Since the models learned by GP are non-parametric, any hard assumptions on the structure of the model are safely avoided (e.g. assuming that the data points are drawn from the same distribution, i.e. the parametric models described in chapter 2). In this section, we briefly discuss GPs for regression and classification following the notation used by Rasmussen and Williams [2006] and Murphy [2012].

3.2.1 Regression

In supervised learning, regression attempts to predict continuous quantities based on a set of observations. Formally, given a set of $N$ samples $X = \{x_1, x_2, \ldots, x_N\}$, where each $x_i$ represents a feature vector, and the output of the unknown function at those points $y = \{y_1, y_2, \ldots, y_N\}$, we are interested to find the output of this unknown function at $X_*$ data points.

The Gaussian process solution for regression assumes that a latent function $f(x)$ exists such that $y = f(x) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_y^2)$ links the observed variable $y$ to the hidden value $f(x)$ via a Gaussian noise model. GP assumes that $p(f \mid X) = p(f(x_1), \ldots, f(x_N))$ is jointly Gaussian with mean $m(x) = \mathbb{E}[f(x)]$ and covariance $k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))^{T}]$:

$$f(x) \sim \mathcal{GP}(m(x), k(x, x')) \qquad (3.6)$$

Without loss of generality and for the sake of simplicity, the mean function $m(\cdot)$ is commonly set to zero. Function $k$ is a positive definite kernel function, defined based on our prior belief over the kind of functions we expect to observe in data (e.g. level of smoothness). In other words, the kernel function $k(x_i, x_j)$ controls the relativeness of points $x_i$ and $x_j$, i.e. if the kernel considers $x_i$ and $x_j$ as similar, then the output of the function at those points is expected to be similar as well. In this work we use a radial basis function (RBF) kernel that is in form of $k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{1}{2l^2}(x_i - x_j)^2\right)$. Parameters $\sigma_f$ and $l$ are optimized based on cross-validation over the training data.
With a new set of unobserved data samples $X_*$, GP needs to predict $f_*$. If $f_*$ is to be calculated only based on our prior knowledge (determined by function $k$), then $f_* \sim \mathcal{N}(0, K(X_*, X_*))$. Figure 3.1-a illustrates 10 sample functions estimated from the GP prior.

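GP uses the training samples to train its model and calculate the posterior as follows:

$$\begin{pmatrix} y \\ f_* \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} K_y & K_* \\ K_*^{T} & K_{**} \end{pmatrix}\right) \qquad (3.7)$$

where $K_y = k(X, X) + \sigma_y^2 I_N$ is $N \times N$, $K_* = k(X, X_*)$ is $N \times N_*$, and $K_{**} = k(X_*, X_*)$ is $N_* \times N_*$.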

The goal is to compute the posterior $p(f_* \mid X_*, X, y)$, which has the following form:

$$\begin{aligned}
p(f_* \mid X_*, X, y) &= \mathcal{N}(f_* \mid \mu_*, \Sigma_*) \\
\mu_* &= K_*^{T} K_y^{-1} y \\
\Sigma_* &= K_{**} - K_*^{T} K_y^{-1} K_*
\end{aligned} \qquad (3.8)$$

derived by applying the rules for conditioning Gaussian distributions.

Basically, for each new unobserved sample $x_*$, the GP regressor described above calculates a mean $\mu_*$ that is the expected output of the function predicted by GP at the point $x_*$, accompanied by a variance $\sigma_*^2$ which can be used to interpret the GP's confidence of its prediction.

Figure 3.1-b illustrates 10 sample functions estimated from the GP posterior. As expected, the estimated functions converge to the same output value at the training samples. In Figure 3.1-c, the blue plot is the true function, the dashed red plot is the predicted function by GP (based on $\mu_*$) and the gray areas mark 3 standard deviation away from the predicted function (based on $\sigma_*$).

As a final note, it is good to mention that the Cholesky decomposition is used to compute $K_y^{-1} = L^{-T} L^{-1}$ instead of direct inversion of the matrix, since it is faster and also avoids numerical stability issues (suggested by Rasmussen and Williams [2006]).
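The posterior in (3.8), computed through a Cholesky factor rather than an explicit inverse, can be sketched as follows. This is an illustrative sketch only: the function names, the default hyper-parameter values, and the use of triangular solves via `np.linalg.solve` are assumptions, not the implementation used in this work.

```python
import numpy as np

def rbf_kernel(a, b, sigma_f=1.0, length=1.0):
    """k(x, x') = sigma_f^2 * exp(-||x - x'||^2 / (2 * length^2)) for row-wise samples."""
    sq_dist = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length**2)

def gp_posterior(X, y, X_star, sigma_y=0.1, **kernel_args):
    """Posterior mean and covariance of f_* (zero prior mean assumed)."""
    K_y = rbf_kernel(X, X, **kernel_args) + sigma_y**2 * np.eye(len(X))
    K_s = rbf_kernel(X, X_star, **kernel_args)
    K_ss = rbf_kernel(X_star, X_star, **kernel_args)
    L = np.linalg.cholesky(K_y)                          # K_y = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K_y^{-1} y without forming the inverse
    v = np.linalg.solve(L, K_s)                          # L^{-1} K_*
    mu = K_s.T @ alpha                                   # K_*^T K_y^{-1} y
    cov = K_ss - v.T @ v                                 # K_** - K_*^T K_y^{-1} K_*
    return mu, cov

# Hypothetical toy data: noisy-free samples of sin(x).
X = np.linspace(-3.0, 3.0, 8)[:, None]
y = np.sin(X).ravel()
X_star = np.array([[0.5], [2.0]])
mu, cov = gp_posterior(X, y, X_star, sigma_y=0.1)
```

The diagonal of `cov` gives the predictive variance of the latent function at each test point, which is the quantity drawn as the shaded band in plots such as Figure 3.1-c.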

3.2.2 Classification

The prediction of labels that Gaussian process provides is probabilistic and the confidence of each prediction can be formally calculated in terms of statistics. This contrasts with the conventional kernel-based methods such as Support Vector Machines that only provide a guess at the class label which is not associated with a formal confidence estimate.

Figure 3.1: (a) 10 sample functions estimated from the GP prior; (b) 10 sample functions estimated from the GP posterior; (c) the blue plot is the true function, the dashed red plot is the predicted function by GP (based on $\mu_*$) and the gray areas mark 3 standard deviation away from the predicted function (based on $\sigma_*$).

Following the procedure suggested by Rasmussen and Williams [2006], GPs can be easily converted to a binary classifier. To do so, we assign either +1 or -1 labels to the outputs of the observed data samples, i.e., $y \in \{-1, +1\}$. Then, once the model is trained and the posterior distribution is calculated, for unobserved $X_*$ we compute $\mu$, where $\mathrm{sign}(\mu)$ determines the predicted class label.

Since the task at hand is a multi-class classification (i.e., identifying each subject from a host of subjects in the gallery), we need to adopt a procedure to perform multi-class classification using an ensemble of GP binary classifiers. The binary classification can be extended to cover multi-class classification problems. There are two conventional strategies for reducing the task of multi-class classification to multiple binary classifications: (i) one vs. one and (ii) one vs. rest.

In one vs. one reduction, one binary classifier is trained to distinguish between every two classes; this means $\frac{K(K-1)}{2}$ classifiers need to be trained. A voting scheme is then applied on the results of all $\frac{K(K-1)}{2}$ classifiers to come up with the final prediction of the model.

In one vs. rest (a.k.a. one vs. all) reduction, only $K$ classifiers are trained to distinguish between each class versus the rest of the classes in data. In this strategy, classifiers should not only be able to predict the class labels, but also provide a confidence score for their decision. In case multiple classifiers predict a positive label for a test data sample, the confidence scores are used for disambiguation by ranking the labels and picking the most probable one.

In the current study, we have addressed our multi-class classification problem with a one vs. rest strategy. The rationale behind selecting this strategy is that much fewer classifiers need to be trained, which makes more sense due to the high computational complexity of training a GP-based classifier.
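The difference in the number of binary classifiers, and the disambiguation step of the one vs. rest strategy, can be sketched as below. The sketch assumes that each binary classifier returns a real-valued confidence score (for instance the posterior mean $\mu$ of a GP binary classifier); the function names and the score matrix are hypothetical.

```python
import numpy as np

def n_classifiers(K):
    """Number of binary classifiers each reduction needs for K classes."""
    return {"one_vs_one": K * (K - 1) // 2, "one_vs_rest": K}

def one_vs_rest_predict(scores):
    """scores[k, j]: confidence of binary classifier k that sample j belongs to class k.
    Ties between several positive classifiers are resolved by picking the highest score."""
    scores = np.asarray(scores, dtype=float)
    return np.argmax(scores, axis=0)

# Hypothetical scores of 3 classifiers on 2 test samples.
scores = [[0.2, 0.9],
          [0.7, 0.8],
          [0.1, -0.3]]
labels = one_vs_rest_predict(scores)
counts = n_classifiers(10)
```

For the second sample, two classifiers report a positive score, and the label with the larger confidence is selected.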

3.3 Face Detection

Since the objective of this study is face identification, it is more convenient, and a common practice in the literature, to first detect the subject's face. Therefore, it is necessary to apply a prior algorithm to track/detect and automatically crop the faces from each image (video frame) and pass them to the recognizer. In the following of this section, the two most cited algorithms for face detection that are used in this study are introduced and briefly described.

3.3.1 Viola-Jones method

The face detection algorithm proposed by Viola and Jones [2004] is fast and robust. It consists of three stages:

In the first stage, it uses a representation model called the "Integral Image" that allows for fast computation of the features from images. This representation model includes three kinds of features (see Figure 3.2):

1. The value of a two-rectangle feature is the difference between the sum of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent.

2. A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle.

3. Finally, a four-rectangle feature computes the difference between diagonal pairs of rectangles.

There are some limitations associated with the above-mentioned rectangle features, such as being sensitive to the presence of edges, bars, and other simple image structures. However, empirical results show that such representation supports effective learning while being computationally efficient.

Figure 3.2: Sample rectangle features. Two-rectangle features are shown in (A) and (B). (C) shows a three-rectangle feature, and (D) a four-rectangle feature. Figure is adopted from the paper authored by Viola and Jones [2004].
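To illustrate why the integral image makes rectangle features cheap, the sketch below computes a rectangle sum with four table lookups and builds a two-rectangle feature from it. The zero-padded layout, the function names, and the choice of a horizontally adjacent pair are assumptions made for this example.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[:r, :c]; padded with a leading zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four lookups."""
    b, r = top + height, left + width
    return ii[b, r] - ii[top, r] - ii[b, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Difference between two horizontally adjacent rectangles of equal size."""
    return (rect_sum(ii, top, left, height, width)
            - rect_sum(ii, top, left + width, height, width))

img = np.arange(16).reshape(4, 4)        # toy 4x4 "image"
ii = integral_image(img)
inner = rect_sum(ii, 1, 1, 2, 2)         # sum of the central 2x2 block
feature = two_rect_feature(ii, 0, 0, 4, 2)
```

Once `ii` is built, every rectangle sum costs the same regardless of the rectangle size, which is what allows a very large set of candidate features to be evaluated.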

In the second stage, a simple classifier is trained to select a few critical visual features from a very large set of potential features. The classifier is AdaBoost, proposed by Schapire et al. [1998], which in this work has been used to build a greedy feature selector. AdaBoost aggregates the predictions of a large set of (weak) learners via a weighted majority vote. The weight given to each weak learner determines the importance of that learner, and in this case, the feature.

In the third stage, the classifiers are combined in a "cascade" setting that allows for discarding the background regions of the image so that the focus remains solely on the promising face-like regions. Each layer of the cascade classifier only lets through the sub-windows in the image that it predicts to be a positive one, i.e. partially containing the face. The final output of the cascade classifier determines the area of the face in the stationary image.

3.3.2 Incremental Learning for Visual Tracking

The Incremental Learning for Visual Tracking (IVT) algorithm proposed by Ross et al. [2008] attempts to deal with non-stationary data (videos) where both the target object and the background change over time (i.e., camera motion). This algorithm efficiently learns and updates a low dimensional subspace representation of the target object. The target object is first modelled by a compact representation structure that facilitates object recognition. The subspace model is continuously updated to reflect the changes in appearance of the target object via an efficient incremental method, which allows for tracking.

3.4 Feature Extraction

In order to experiment with different feature types, and show that the proposed algorithm would work well with any of them, we use common visual features used in the literature, namely, histogram equalized intensity levels, Local Binary Pattern (LBP) codes, and Histogram of Oriented Gradient (HOG) descriptors. The calculation procedures of these features are briefly discussed in the rest of this section.

3.4.1 Histogram Equalization

Histogram equalization is a contrast enhancement technique that adjusts the intensities of image pixels by evenly distributing them in the image histogram. This adjustment results in a higher contrast for those regions previously with a lower local contrast. Histogram equalization greatly enhances the quality of the images with poor illumination. This method is specifically useful for low quality videos. Figure 3.3 shows an image before and after histogram equalization.

Figure 3.3: Image before (left), and after (right) histogram equalization. The histogram of intensity levels of each image is illustrated in the second row.
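A common way to implement histogram equalization is to remap each intensity through the normalized cumulative histogram. The sketch below assumes an 8-bit grayscale image stored as a `uint8` array; the function name and the particular rescaling to the range [0, 255] are choices made for this illustration.

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalization for an 8-bit grayscale image (uint8 array)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(hist)[0][0]]     # CDF value at the darkest occupied level
    n = img.size
    if n == cdf_min:                          # constant image: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) / (n - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                           # apply the look-up table pixel-wise

# Low-contrast toy image whose intensities occupy only the range 50..70.
img = np.array([[50, 50], [60, 70]], dtype=np.uint8)
out = equalize_histogram(img)
```

After the remapping, the occupied intensity levels are spread over the full 8-bit range while their ordering is preserved.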

3.4.2 Local Binary Pattern

The Local Binary Pattern (LBP) operator, first introduced to the machine vision community by Ojala et al. [19..], is a gray-scale invariant texture descriptor. The LBP operator is known to be robust to monotonic gray-scale variation, which allows for elimination of the effect of poor illumination in images. Also, its computational simplicity makes it a perfect tool for image analysis in challenging real-time settings. This section is an overview of how to derive the LBP codes. A more comprehensive explanation of this process is given in Pietikäinen [2011].

Figure 3.4 illustrates an arbitrary 3 × 3 block cropped from a gray-level image. The LBP pattern for the pixel in the center (named as $g_c$) is derived by thresholding the value of its neighbouring pixels (named as $g_{p_i}$) with respect to the value of $g_c$:

$$\text{LBP code} = \sum_{i=0}^{P-1} \mathrm{sign}(g_{p_i} - g_c)\, 2^{i} \qquad (3.9)$$

where $P$ is the number of neighbouring pixels and the sign function is defined as:

$$\mathrm{sign}(x) = \begin{cases} 1 & x \geq 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3.10)$$

Figure 3.4: Local Binary Pattern operator. Example 3 × 3 block (centre value 48) and the resulting binary code 0011 1110.

Once the local binary patterns are calculated for every pixel throughout the whole image/patch, the LBP code for that image/patch is derived as the histogram of these patterns. Please note that for $P$ neighbouring pixels, there will be $2^P$ different LBP patterns. However, LBP does not use $2^P$ bins in constructing the histogram. In other words, some patterns would fall in the same bin. To perform the binning, the LBP patterns are classified into two categories: uniform and non-uniform. The LBP pattern of a pixel is considered uniform if there are none or two transitions between 0 and 1 in its binary code.³ It is known that there are $P(P-1)$ patterns that have two transitions⁴ and 2 patterns that have no transitions.⁵ The rest of the patterns, which are non-uniform, fall into the last bin. Therefore, the final histogram has $P(P-1) + 3$ bins.

Finally, the histograms generated on each patch of the image are concatenated, which forms a vector called the LBP feature vector. Please note that in this work we use the normalized version of this LBP feature vector, meaning that all images construct a unit vector when represented by their respective LBP feature vector.
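The binning rule can be made concrete with a short sketch. It assumes that transitions are counted on the circular bit pattern (the usual convention for uniform LBP), that neighbours are supplied in the order $g_{p_0}, \ldots, g_{p_{P-1}}$, and that bins are assigned in increasing code order; all function names are illustrative.

```python
import numpy as np

def lbp_code(neighbours, centre):
    """LBP code of one pixel: sum_i sign(g_pi - g_c) * 2^i, with sign(x) = 1 for x >= 0."""
    bits = (np.asarray(neighbours) >= centre).astype(int)
    return int(np.sum(bits * (2 ** np.arange(len(bits)))))

def transitions(code, P=8):
    """Number of 0/1 transitions in the circular P-bit pattern (assumed convention)."""
    bits = [(code >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))

def uniform_bin_lookup(P=8):
    """Map each of the 2^P codes to a histogram bin; non-uniform codes share the last bin."""
    lookup, next_bin = {}, 0
    for code in range(2 ** P):
        if transitions(code, P) <= 2:        # uniform pattern: gets its own bin
            lookup[code] = next_bin
            next_bin += 1
    last_bin = next_bin                      # single shared bin for non-uniform patterns
    for code in range(2 ** P):
        lookup.setdefault(code, last_bin)
    return lookup

lookup = uniform_bin_lookup(8)
n_bins = len(set(lookup.values()))
example_transitions = transitions(0b00111110)
```

For $P = 8$ this yields $P(P-1) + 2 = 58$ uniform bins plus one shared bin, so a per-patch histogram has 59 entries instead of 256.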

3.4.3 Histogram of Oriented Gradients

The Histogram of Oriented Gradient (HOG) descriptor was first proposed by Dalal and Triggs [2005] for the purpose of object detection in general and human detection in particular. This method is based on the idea that the local appearance of an object can be described by the distribution of intensity gradients (i.e. edge directions).

To calculate the HOG descriptor, each image is first divided into small connected blocks and a histogram of gradient directions is calculated for the pixels within each block. This is derived by filtering each block with a derivative mask such as $[-1, 0, 1]$ (horizontal edge detector), $[-1, 0, 1]^{T}$ (vertical edge detector), and other more complex masks such as the Sobel mask. The HOG descriptor is represented by the concatenation of these histograms (see Figure 3.5).

Figure 3.5: Histogram of Oriented Gradients. Left: raw image; Right: extracted HOG features (using blocks of 16 × 16 pixels).

3 For instance, the binary code stated in Figure 3.4 contains two transitions; one from the 2nd bit to the 3rd, and the other from the 7th bit to the 8th.
4 Choose (P - 1) different number of 0s (or 1s), then put them in (P) desired places.
5 Either all bits are 0 or all are 1.

W e lch t-t est

For pPrfonnanc cmnparison purposes 1> tween different nwthods. w nC'ecl c:m statistical tool to cmnpare the RccuraciPs (mran ± standard deviation ) cl rived b~, diffcrrnt
rnethocls. Wrkh's t-test (or unequal vetriances t-test ), proposed b~, \Y<'kh [ 1~) 11]. is a
1w -sa1nplr test that i. nsecl to test t hr h~' pot hesis t h · t hvo popnla tions haw' equal

means.

Welch's t-te::-;t is performed whrn thC' ·1ssmuptiou of equal variance bE'twC'<'ll two

populcli ion::-; is not sa tisfirc l. In this si ncl:v. t hC' accuracy of differ<'nt algori t lnns , ·nry

at cliff<'H'll<'C' ratr (i.< ., cliif<'H'll<'<' variances), thc'reforC', thC' npproprint<' test is \\'c,kh
30

t-tc st.

dditi n a ll~' vY lch t-t st i 1nor r li able th a n th m or c 1nmonly us cl stud nt
t -1 , t wh n th t\;~,r
if thrr is n
with th
our r

a

th in11l m nt r l C' O l fo r a n algorithn1 so tha t '~ r a n rnn it

t

xa t sam

ult .

an1pl , h aY u n qual an1pl siz s. T h is pror rt~' conws h a ndy

tra in/tr ·t p a rt it ion of d a t a. howevrr, w

::;t ill n

cl t o comp are

gain t th r Lp ctiv m th od 's.

\\' lch ', t-t

t cl 'fin

t lw. tRti.tic t h t h fo llowin g for mula:

(:3.11 )

wher

X 1 , .':>I a nd l\T1 ar e the fir t

The \ V l h- att rt hwait
a o iat d with t hi Ya n a nc

·a mple ·s m a n , v rian ·r, a nd sizr r rs pectively.

eq uati on r. · u,· d t

calc ulat

t h e lrgrrcs of frcrdom v

tin1a e:

(3. 12 )

wh r e ZJ1 =
fir st

1 -

1 and v2 = N2- 1 are the d egr e · of freed mn a sociat d \\·ith t h e

nd . econd varian e esti1nate re. p ect ivel . The a pproxi1na t c d egr

of fr ecdon1

is rounded down to the n ar est int eger .

These two calculated statistics, $t$ and $\nu$, can be used with the t-distribution to test the null hypothesis that the two populations have equal means (using a two-tailed test). Alternatively, as in this study, a one-tailed test can be used to evaluate the hypothesis that the mean of one population is greater than or equal to the other.
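The two statistics can be computed directly from the reported summary values (mean, variance, and number of runs). The sketch below follows the standard definitions of Welch's statistic and the Welch-Satterthwaite approximation; the function name and the numbers in the usage lines are made up for illustration.

```python
import math

def welch_t_test(mean1, var1, n1, mean2, var2, n2):
    """Return Welch's t statistic and the rounded-down degrees of freedom."""
    se1, se2 = var1 / n1, var2 / n2          # s_k^2 / N_k for each sample
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite: (se1 + se2)^2 / (se1^2 / nu1 + se2^2 / nu2)
    nu = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, math.floor(nu)

# Hypothetical accuracies: 0.90 +/- 0.02 versus 0.85 +/- 0.04 over 10 runs each.
t, nu = welch_t_test(0.90, 0.02 ** 2, 10, 0.85, 0.04 ** 2, 10)
```

The resulting pair `(t, nu)` is then compared against the t-distribution with `nu` degrees of freedom to obtain a one-tailed or two-tailed p-value.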

Chapter 4

Video-based Face Identification

In this chapter, the proposed methods for video-based face identification are discussed.

In Section 4.1, the concepts of quantum formalism are applied to the task of face identification in videos, and the approach to prediction of identity is elaborated in that framework. This approach is referred to as "Quantum Probability Inspired Framework" (QPIF).

In Section 4.2, a dual extension of the quantum framework, referred to as "Ensemble of Abstract Sequence Representatives" (EASR), is introduced and the identification procedure in this model is discussed.

In Section 4.3, a machine learning based approach is added on top of EASR in order to capture the non-linearity available in data and maximize the prediction performance. Gaussian process (GP) is used to implement the machine learning side of the algorithm. GP is a probabilistic but non-parametric method that poses no hard assumption on the structure of the model. A novel learning scheme, referred to as specialization-generalization, is proposed to support efficient learning for the GP in the task of identification. The predictions of both the GP and EASR modules are then combined to identify the subject in a probe sequence.

4.1 Quantum Probability Inspired Framework

In an image-set based face identification task, each individual image may not fully characterize the face of the person in the image-set. This may be due to (i) poor quality of the image (e.g. low resolution, illumination, etc.), (ii) partial existence of the face in the image's field of view (e.g. occlusion, pose, etc.), and (iii) failure of the face detector algorithm to accurately spot the face. This issue, as mentioned earlier, casts uncertainty on how much a face identification method can rely on each individual image in an image-set (both for known and unknown identities).

Quantum theory can model such uncertainties with its comprehensive notion of probability calculation. It provides a good construct to exploit the knowledge embedded in the image-sets, and its representation structure can be employed to solve the complex task of face identification. First, we need to map the concepts of quantum theory in the physical world to the problem of face identification with image-sets. Assume there exists a Hilbert space $H$ which includes all the possible images, either observable or unobservable, from all individuals. Each image is represented as a feature vector in the Hilbert space.
.1 .1

c

t

Im

h iinctg -,'

i htc l 1 ac

unt

f kn w

am ng th

v nt

tr, ining imag -, ts (i. , gctll ry ) b longing t o th

f r a knmn1 icl nti t..

corrcspowling S<'l of f<'a l 111<'

to th c llCC'l t

id ntiti

indi-

cch known id ntit · is r 1r s nt cl as thP

<'cl m s whi ch . p rlll 1h <' ~ nh~ J HH ' < '

f \' nt in que n t um for m alism ).

' 1 C If (j .c'., <'qui valent

ach subspace' r pr sPnt s one singl '

idcntit. · whi ·h i,' qui\·al nt t o t h nc t ion of cur nt or oh.w ruohlc in t h ori gin al t h 'ory
f qu antum ph. : ical : yst m .. In ot lH'r wmd .. Pach sub pac<' . 1 is a collPct iv(' rc'prrsC'nt ativc of the pmt ia l info11na ti on <I<'Ii\T d fro m cli f-kr< ·nt i nJ ag<·s of tlw i]](lividnal 1.

T\1 r int er , ting l~'. ' 1 implicit ly cont <in.' t h<' unob.·c-rvC'd unagcs o f t hC' indi vidu al 1 as

d crib cl by th f atur s xtract C'cl from t h ' observC'd inwgc's (<'.g .. cl SJH'cifi c ('Xtnrc').

worth m nt ioning th at adding a 11 \\' subj 'd to th (' gall<'ry ca n 1><' easily
addre

db d finin g a n w v nt , ub: pac ancl Px t r nd ing t he f'V<' nt s spac<' t o cov<'r

t hi n w ub 1 a

4.1 .2

. Th r t

Image-

f th algorithm r nw ins int ac t .

t s of unknown id ntiti s a

tate

Each unknown identity (i.e., images in the corresponding probe image-set) is represented as an ensemble of states. The feature vector of each image in the probe image-set is considered as an elementary object which defines a pure state $\phi$. Using (3.1), we then construct several mixed states $\psi$ as the superposition of a few randomly selected pure states, in order to maximize the number of attributes that each state can represent, as well as to minimize the effect of uncontrolled variants in the images. Superposition leads to generating initial states that are more robust than the noisy single images, and therefore actively improves the recognition process.

In order to represent the unknown identity, it is required to define a probability distribution $p(\psi)$ over the ensemble of participating mixed states. The distribution is defined in such a way that each $p(\psi_i)$ reflects a relative inter-similarity measure of the random pure states $\{\phi_j,\ j = 1{:}n\}$ that construct the $i$-th mixed state $\psi_i$:

n,

p(4'1 ) =-

\

11

I

11

D,

rl ( ),. r/Jk)

( .l)

D,
I

1

where metric $d(x, y)$ computes the Euclidean distance between points $x$ and $y$; $n$ is the number of pure states constructing each mixed state; and $N$ is the number of mixed states in the ensemble. Finally, $p(\psi_i)$ is determined by (4.1).

4.1.3 Recognition

Now that the concepts of events and ensemble of states are clarified in the context of face identification in image-sets, we describe the process of recognition of an unknown identity when the system is presented with a new unseen test image-set. In this section, we explain how QPIF assigns the most probable identity to the test image-set, represented by an ensemble of initial states, via searching in the events space generated from the training image-sets.

Without loss of generality, let us assume we have $M$ image-sets for the known identities. Then, when presented with an image-set of an unknown identity, we generate $N$ initial states. Now, the task of recognition is carried out as follows: first, the similarity between all initial states $\{\psi_j,\ j = 1{:}N\}$ and events $\{S_i,\ i = 1{:}M\}$ is calculated following (3.4). This results in the projection matrix $P_{M \times N}$, where each element $P_{ij}$ is $q(S_i \mid \psi_j)$:

$$P = \begin{bmatrix} q(S_1 \mid \psi_1) & \cdots & q(S_1 \mid \psi_N) \\ \vdots & \ddots & \vdots \\ q(S_M \mid \psi_1) & \cdots & q(S_M \mid \psi_N) \end{bmatrix} \qquad (4.2)$$

where the $i$-th row of matrix $P$ (i.e., $\{q(S_i \mid \psi_j),\ j = 1{:}N\}$) represents the quantum probabilities of each initial state belonging to the known identity $S_i$. Then, if it is inserted into (3.5), along with the probability distribution of initial states derived by (4.1), the $\{\psi_j \mid j = 1{:}N\}$ will be marginalized out and the probability of the event $S_i$ being true (i.e., the probability that this probe image-set belongs to the known identity represented by $S_i$) is calculated. However, prior to this process, we pass the distribution $\{q(S_i \mid \psi_j),\ j = 1{:}N\}$ through a shaping function:

(4.3)

where $\exp(x)$ represents the exponential function $e^{x}$.

This shaping function was selected empirically based on the experimental results. We employ the shaped distribution $\{q_s(S_i \mid \psi_j),\ j = 1{:}N\}$ to derive $q(S_i)$. The $q(S_i)$ is the result of marginalizing $\{q_s(S_i \mid \psi_j),\ j = 1{:}N\}$ over the set of all initial states $\{\psi_j \mid j = 1{:}N\}$. The $q(S_i)$ is calculated for all probabilistic events following this procedure, and finally the event with the highest probability is reported as the system's prediction of the identity of the unknown (i.e., probe) image-set.

As described above, QPIF employs a simplified interpretation of the quantum formalism to aggregate the information of the known and unknown identities, and uses the ensemble of initial states for searching in the events space to assign the most probable identity to each probe image-set.
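The recognition procedure can be sketched end to end as follows. This is an illustrative sketch under several assumptions that are not taken from the text: each event subspace is obtained from a QR factorization of the (linearly independent) gallery vectors, the state probabilities are supplied by the caller rather than computed from (4.1), and the shaping step (4.3) is omitted. The function names are hypothetical.

```python
import numpy as np

def event_projector(gallery_vectors):
    """Projector onto the subspace spanned by one identity's feature vectors (rows).
    Assumes the vectors are linearly independent."""
    q, _ = np.linalg.qr(np.asarray(gallery_vectors, dtype=float).T)
    return q @ q.T

def qpif_predict(projectors, mixed_states, state_probs):
    """Return (index of the most probable event, vector of q(S_i))."""
    psi = np.asarray(mixed_states, dtype=float)          # N x n, unit-length rows
    # P[i, j] = q(S_i | psi_j) = psi_j^T S_hat_i psi_j, as in (4.2)
    P = np.array([np.einsum("jn,nm,jm->j", psi, S, psi) for S in projectors])
    q = P @ np.asarray(state_probs, dtype=float)         # marginalize over the states
    return int(np.argmax(q)), q

# Toy gallery with two identities in a 3-dimensional feature space.
projectors = [event_projector([[1.0, 0.0, 0.0]]),
              event_projector([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])]
states = [[0.6, 0.8, 0.0], [0.0, 1.0, 0.0]]              # two mixed states of the probe
probs = [0.5, 0.5]                                       # assumed uniform p(psi_j)
identity, q = qpif_predict(projectors, states, probs)
```

Adding a new subject only requires appending one more projector to the list, which mirrors the remark that the rest of the algorithm remains intact.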

4.2 Ensemble of Abstract Sequence Representatives

Data representation employed in the EASR approach is inspired by the QPIF structure introduced earlier. We extend QPIF's representation method for unknown identities (i.e., ensemble of states) to represent the known identities as well. As a result, both representations of the known and unknown identities can exploit the advantages of sampling and superposition, such as noise relaxation.

As mentioned earlier, in image-set based face identification, each image may not fully characterize the individual's face. This may be due to (i) poor quality of the image (e.g. low resolution, illumination, etc.), (ii) partial existence of the face in the image's field of view (e.g. occlusion, pose, etc.), and (iii) failure of the face detector algorithm to accurately spot the face. Such issues cast uncertainty on the degree that a face identification method should rely on each individual image in an image-set (for images in both gallery and probe sets).

EASR is a vector-based representation structure for image-sets that addresses the uncertainties mentioned above as follows: it relaxes the noise in the raw data points by transferring them into a higher level representation structure using stratified sampling and superposition. Then, it calculates the similarities between each pair of train and test sequences. Afterwards, the recognition is performed by finding the most similar known EASRs to different generated unknown EASR candidates, and then aggregating the identification results of all candidates via majority voting.

In the rest of this section, we first explain the representation structure of EASR, and then discuss the method of similarity measurement between each EASR that is necessary for either performing the identification task or ranking different candidates in terms of their similarity to the probe image-set.

4.2.1 Representation

A video sequence (top of Figure 4.1) can be represented as a set of normalized $n$-dimensional feature vectors (each referred to as $\alpha$) extracted from every frame. However, because such a primary representation is prone to noise, it is beneficial to transform the $\alpha$ vectors into a noise-relaxed secondary representation structure. Using stratified sampling, we draw (with replacement) a set of $\alpha$ vectors from each stratum (i.e. sequence of frames in a video), which are then grouped into several non-overlapping subsets of size $m$ (i.e. $m$ vectors per subset). Then, for each subset, a new feature vector (represented by $\beta$) is constructed using (4.4).

(4.4)

We refer to these new $n$-dimensional unit feature vectors as Abstract Sequence Representatives (ASRs).¹ Superposition leads to constructing more robust samples by minimizing the effect of undesired variations in single noisy images, and therefore actively improves the identification accuracy.

For each sequence we construct a set of ASRs of size $M$ and refer to it as Ensemble of Abstract Sequence Representatives (EASR). The top of Figure 4.1 shows the first 27 frames of a raw sequence labelled J1. In the middle of Figure 4.1, a subset of ASRs forming J1's EASR is presented. The idea behind introducing ensembles along with the majority voting is inspired by the concepts of bagging and exploiting the knowledge of the crowd, which are known for their robust performance in highly noisy data.

Figure 4.1: A sample sequence from the YouTube Celebrities dataset collected by Kim et al. [2008] (top), its ASR based on intensity features (middle), and a matched ASR from another clip (bottom).

1 Please note that this is the same concept as generating a mixed state from a set of pure states as in Equation 3.1.

4.2.2 Similarity Measurement

In order to find the similarity between two video sequences $i$ and $j$ (denoted as $s_{ij}$), we first find the similarity $\psi_{ij}^{pq}$ between all possible ASR pairs of the form $(\beta_i^p, \beta_j^q)$, where $\beta_i^p$ is the $p$th ASR of sequence $i$ and $\beta_j^q$ is the $q$th ASR of sequence $j$'s ensemble. The pair-wise similarity between two ASRs $\beta_i^p$ and $\beta_j^q$ (i.e., $\psi_{ij}^{pq}$) is calculated as the inner product of the two ASRs as in (4.5).

$$\psi_{ij}^{pq} = \langle \beta_i^p, \beta_j^q \rangle = \cos\theta_{ij}^{pq} \qquad (4.5)$$

where $\theta_{ij}^{pq}$ is the angle between the two unit vectors $\beta_i^p$ and $\beta_j^q$.

All the sequence-wise similarity values $\psi_{ij}^{pq}$ derived by (4.5) are collected in the matrix $\Psi_{ij}$, which is an $M \times M$ matrix, where $\Psi_{ij} = [\psi_{ij}^{pq}]$ for $p, q \in [1..M]$. The nearest ASR pair of the two sequences $i$ and $j$ (i.e., the $\beta_i^p$ and $\beta_j^q$ with the maximum $\psi_{ij}^{pq}$ among all pairs) determines the similarity measure $s_{ij}$.

$$s_{ij} = \max_{p,q} \psi_{ij}^{pq} \qquad (4.6)$$

Following our illustrated example, the bottom of Figure 4.1 shows the EASR for a sequence labelled J5, which represents the same subject as in sequence J1 but belongs to another video clip of hers in the dataset. The similarity between these two EASRs is measured by the similarity of their closest pair of ASRs, as highlighted in Figure 4.1 and calculated via (4.6).
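The pair-wise and sequence-wise similarities can be sketched in a few lines of Python. This is a minimal illustration that assumes each ASR is stored as a plain list of floats of unit length; the function names `similarity_matrix` and `sequence_similarity` are hypothetical.

```python
def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def similarity_matrix(easr_i, easr_j):
    """Matrix of inner products between every ASR of one ensemble and every ASR of the other."""
    return [[inner(bp, bq) for bq in easr_j] for bp in easr_i]


def sequence_similarity(easr_i, easr_j):
    """Similarity of two sequences: the largest pair-wise ASR similarity."""
    return max(max(row) for row in similarity_matrix(easr_i, easr_j))
```

Because the ASRs are unit vectors, each entry of the matrix equals the cosine of the angle between the corresponding pair.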


It is a good practice to monitor the quality of the ASRs that are generated by random sub-sampling. For each ASR $\beta_p$, we calculate its mean pair-wise similarity $\bar{\Psi}_p$ by averaging over row $p$ of the within-ensemble similarity matrix $\Psi$. We then calculate the average ($\bar{\Psi}$) and standard deviation ($\sigma_{\Psi}$) of $\bar{\Psi}_p$ for $p \in [1..M]$. Finally, we filter out the possible outliers, i.e., any ASR $\beta_o$ with an average pair-wise similarity ($\bar{\Psi}_o$) that is two standard deviations ($\sigma_{\Psi}$) lower than the average within-ensemble similarity ($\bar{\Psi}$).

$$\bar{\Psi}_o < \bar{\Psi} - 2\sigma_{\Psi} \qquad (4.7)$$
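The outlier filter can be sketched as follows. This is an illustrative sketch only: it assumes the self-similarity of an ASR is excluded from its row average and that the population standard deviation is used, neither of which is stated in the text; `filter_outlier_asrs` is a hypothetical name.

```python
import statistics


def filter_outlier_asrs(asrs):
    """Keep ASRs whose mean similarity to the rest of the ensemble is not
    more than two standard deviations below the ensemble average."""
    if len(asrs) < 3:
        return list(asrs)  # too few ASRs for a meaningful spread

    def inner(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Mean pair-wise similarity of each ASR to all other ASRs (assumption:
    # the diagonal, i.e. self-similarity, is left out of the average).
    mean_sims = []
    for p, bp in enumerate(asrs):
        sims = [inner(bp, bq) for q, bq in enumerate(asrs) if q != p]
        mean_sims.append(sum(sims) / len(sims))

    avg = statistics.mean(mean_sims)
    std = statistics.pstdev(mean_sims)  # assumption: population std
    return [a for a, s in zip(asrs, mean_sims) if s >= avg - 2.0 * std]
```

With a single strongly deviating ASR in a reasonably sized ensemble, only that ASR falls below the cut-off; in very small ensembles a two-sigma rule may never trigger.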

We identified two sources for generating outlier ASRs: (i) superposition of frames that present the subject in highly different conditions; and (ii) presence of noisy frames (e.g., where the face tracking failed) in the ASR. Three sample ASRs that were rejected in the process of constructing the EASR for the J1 sequence are shown in Figure 4.1. The two rejected ASRs on the left are generated due to source (i), while source (ii) is behind the rejection of the third ASR (mostly a result of face tracker failure on a number of frames).

4.2.3 Identification

In the identification stage, the votes of several independent decision-makers (i.e., several ensembles of ASRs for each probe video sequence) are aggregated to come up with a robust identity recognition of a sequence belonging to an unknown identity. For each probe video sequence (let us say the $i$th), the proposed method builds several (let us say $K$) ensembles and calculates their similarity ($s_{ij}^{k}$, $k = 1 : K$) to every ensemble of all training sequences ($j = 1 : Q$). This generates the matrix $S_i^{K \times Q}$ for the $i$th test sequence. Therefore, the $k$th row in $S_i$ compares the $k$th ensemble generated from the probe test sequence against all the training sequences. Then, for each row, we find the training sequence that has the maximum similarity to this ensemble and report the identity associated with this sequence using (4.8).

$$I_{K \times 1} = \mathrm{index}\big(\mathrm{ArgMax}_{\mathrm{rows}} \{ S_i^{K \times Q} \}\big) \qquad (4.8)$$

where $\mathrm{index}(x)$ returns the identity associated with the $x$th image-set.

Afterwards, the function $\mathrm{vote}$ counts the number of times that each identity has been selected as the most similar one to the ensembles of the current probe video sequence and records it in the respective cell of the matrix $V$, which consists of $P$ cells, where $P$ is the number of known identities (i.e., the number of subjects).²

$$V_{P \times 1} = \mathrm{vote}(I_{K \times 1}) \qquad (4.9)$$

Ultimately, the known identity associated with the highest number of votes is reported as the system's final prediction of identity, following (4.10).

$$\mathrm{identity} = \mathrm{ArgMax}\{ V_{P \times 1} \} \qquad (4.10)$$

In case of a tie in the number of votes between two or more candidates, a score is assigned to each of them for ranking purposes. This score is calculated for candidate $t$, denoted as $\mathrm{score}_t$, as the sum of the similarity values from the ensemble(s) that voted for this candidate using (4.11).

$$\mathrm{score}_t = \sum_{k \in \mathrm{find}(I == t)} \mathrm{Max}\{ S_i^{k} \} \qquad (4.11)$$

where the function $\mathrm{find}(x == a)$ returns the indices of the elements in vector $x$ which are equal to the scalar $a$, and $S_i^{k}$ is the $k$th row of the matrix $S_i$ in (4.8). The identity with the highest score is reported as the final prediction of the system.

2 $P = Q$ only if there is exactly one training video sequence per subject available in the gallery.
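The voting and tie-breaking procedure can be sketched in Python as below. This is a minimal sketch under the assumption that the similarity matrix is given as a list of $K$ rows with $Q$ entries each and that `train_identities[j]` gives the identity of training sequence $j$; the function name `identify` is hypothetical.

```python
from collections import Counter


def identify(sim_matrix, train_identities):
    """Majority vote over probe ensembles, with a similarity-sum tie-break.

    sim_matrix: K rows (probe ensembles) by Q columns (training sequences).
    train_identities: list of length Q with the identity of each column.
    """
    picks = []  # (voted identity, winning similarity) per probe ensemble
    for row in sim_matrix:
        best_j = max(range(len(row)), key=row.__getitem__)
        picks.append((train_identities[best_j], row[best_j]))

    votes = Counter(identity for identity, _ in picks)
    top = max(votes.values())
    tied = [identity for identity, count in votes.items() if count == top]
    if len(tied) == 1:
        return tied[0]

    # Tie: sum the winning similarities of the ensembles that voted for each candidate.
    scores = {t: sum(s for identity, s in picks if identity == t) for t in tied}
    return max(scores, key=scores.get)
```

Several training sequences may share one identity, so the vote is counted per identity rather than per column.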

4.3 Ensemble of Gaussian Process Models on top of EASR, a Hierarchical Approach

In the previous section, the Ensemble of Abstract Sequence Representatives (EASR) was introduced, which is a vector-based method for representing a video sequence. We mentioned that each EASR is built by sampling and superposition to reduce noise, followed by a filtering mechanism to deal with outliers. EASR models the variations of the subject in an image-set, and the similarity of EASRs can be used for identification purposes. However, the EASR representation is linear and would fail to capture the non-linear underlying structure of the data.

In order to address the non-linearity in data, we propose to use an ensemble of binary Gaussian process (GP) models in a one-versus-rest setting on top of the EASR representation method. The result is a hierarchy of two main modules: the EASR module and the GP module. The EASR module offers better resistance to noisy data, and the GP module incorporates a learning scheme called specialization-generalization for effective training of an ensemble of binary GP classifiers (enabling further noise reduction). The identification process combines both modules using a hierarchical structure to maximize the identification rate. The rest of this section describes the GP module in detail.

4.3.1 Gaussian Process Model

In the current work, we use GP regression to construct a GP binary classifier (i.e., $y_i \in \{-1, +1\}$). For each subject $i$, this classifier is capable of identifying the subject $i$ versus the rest of the subjects. For the sake of implementation, let us assume subject $i$ has $m_i$ samples in total for training. We label these samples as (+1). In order to collect the (-1) labelled samples, we sub-sample from the training set an equal number of data points belonging to the rest of the subjects in the gallery, such that the total number of samples for training is as close as possible to $2m_i$. This is to make sure that the training data samples are balanced and to avoid bias in the classifier's decision making process.

For each subject $i$ in a gallery with $L$ subjects, we construct one GP model $GP_i$. This model is trained to predict whether a sequence belongs to the subject $i$ or not.

To predict the identity of a probe video sequence $p$ with $m_p$ frames, the frames are presented to all $L$ models. Each $GP_i$ predicts its expected value $\bar{f}^*$, which is a vector of length $m_p$, where the $j$th item shows the expected value of $GP_i$'s underlying function ($f^*$) with the $j$th frame as input. Classification of each frame is based on the sign of $\bar{f}^*$: if it is negative, it means $GP_i$ rejects the possibility that this frame belongs to the subject $i$, and vice versa. In order to aggregate the outputs of all $m_p$ frames, we calculate the average of all $\bar{f}^*_j$, $j \in [1..m_p]$, and record it as the overall output of $GP_i$ (i.e., sum-fusion). After calculating the aggregated output for every model, the identity of the subject with the highest aggregated output is reported as the predicted identity by the GP ensemble.

Figure 4.2: The specialization step for k = … (best viewed in color)
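The sum-fusion step can be illustrated with a small Python sketch. It assumes the per-frame predictive means of each subject's GP model are already available as lists of floats; `fuse_gp_outputs` and the dictionary layout are hypothetical.

```python
def fuse_gp_outputs(frame_means_by_subject):
    """Average each model's per-frame predictive means and pick the largest.

    frame_means_by_subject: dict mapping a subject to the list of predictive
    means produced by that subject's GP model, one value per probe frame.
    Returns (predicted subject, dict of aggregated outputs).
    """
    aggregated = {
        subject: sum(means) / len(means)
        for subject, means in frame_means_by_subject.items()
    }
    predicted = max(aggregated, key=aggregated.get)
    return predicted, aggregated
```

A frame with a negative mean counts against the subject, so a model that rejects most frames ends up with a low aggregated output.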

4.3.2 Specialization-Generalization Learning Scheme

GP binary classifiers are sensitive to the quality of training samples; thus a simple random sampling process without any provision for avoiding noisy samples reduces the identification power of the resulting model. In this section, we describe our learning scheme, which relies on EASRs for finding the most relevant sequences for training each binary GP model (i.e., the specialization step, schematically shown in Figure 4.2), complemented by a generalization step which tries to alleviate the effect of potentially noisy frames in the training samples.

Starting with $n$ subjects and $m$ sequences for each subject in the training data, we have a set $SQ_{n \times m}$ which contains the sequences for each subject.

Specialization step (Figure 4.2):

2. Calculate the pair-wise similarity $s_{ij}$ between each two subjects $i$ and $j$, following (4.6).
3. For each subject $i$, find the top $k$ nearest subjects with the highest $s_{ij}$ and store them in $NS_i$.
4. Train $GP_i$ with all frames from $SQ_{i,[1..m]}$ as (+1) instances and randomly sub-sample an equal number of frames from $SQ_{j,[1..m]}$, $j \in NS_i$, as (-1) instances.

Generalization step:

5. Use $GP_i$ to label each sequence in $SQ_{j,[1..m]}$, $j \notin NS_i$; for each frame $f$, if $GP_i(f) > 0$ (i.e., mislabelled), add it to the $GenL_i$ list to be retrained to $GP_i$.
6. Update $GP_i$ with all frames $f$ in $GenL_i$ as (-1) instances.
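The two training steps can be sketched end to end in Python. This is a toy sketch and not the implementation used in this work: `TinyGP` is a hypothetical stand-in that computes only the GP regression predictive mean with an RBF kernel of unit length scale and an assumed noise variance of 0.1, the update in step 6 is done by refitting on the enlarged training set, and every frame of the unused subjects is evaluated rather than a random sub-sample.

```python
import math
import random


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two vectors."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


class TinyGP:
    """GP regression on +/-1 labels; only the predictive mean is computed."""

    def __init__(self, noise=0.1):
        self.noise = noise  # assumed noise variance added to the kernel diagonal
        self.X, self.y, self.alpha = [], [], []

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        K = [[rbf(a, b) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        self.alpha = solve(K, self.y)

    def mean(self, x):
        return sum(a * rbf(x, xi) for a, xi in zip(self.alpha, self.X))


def train_specialized_gp(target, frames, nearest, rng=None):
    """frames: dict subject -> list of frame vectors; nearest: the k nearest subjects."""
    rng = rng or random.Random(0)
    # Specialization: target frames are (+1); an equal number of frames
    # sub-sampled from the nearest subjects are (-1).
    pos = frames[target]
    pool = [f for j in nearest for f in frames[j]]
    neg = rng.sample(pool, min(len(pos), len(pool)))
    gp = TinyGP()
    gp.fit(pos + neg, [1.0] * len(pos) + [-1.0] * len(neg))
    # Generalization: frames of the remaining subjects that the model
    # wrongly accepts (mean > 0) are collected and added as (-1) instances.
    gen_list = [f for j in frames if j != target and j not in nearest
                for f in frames[j] if gp.mean(f) > 0]
    if gen_list:
        gp.fit(gp.X + gen_list, gp.y + [-1.0] * len(gen_list))
    return gp, gen_list
```

The sign test on the predictive mean mirrors the frame-level rule described for the GP classifiers; a production implementation would normally rely on an established GP library instead of the hand-written linear solver.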

In the specialization step for each GP model, frames of the training sequences for the target identity are used as (+1) instances. The (-1) instances are randomly sub-sampled from the sequences belonging to the $k$ nearest subjects to the target identity, as determined by the EASR similarity (Figure 4.2). The goal of the specialization step is to force each GP to learn distinctive features that separate each subject from the most similar subjects to him/her in the gallery.

However, we have to make sure that the GP binary classifier generalizes well to the subjects whom it has not seen during its first batch of training (i.e., the specialization step). In the generalization step, we randomly sub-sample some frames from videos featuring those subjects not used in the specialization step and evaluate them with the binary classifier. If the label is not correct³ (the correct label would be (-1), because we know this frame is definitely not featuring the respective subject), then the model is re-trained with this sample as a (-1) instance. In other words, the generalization step provides more (-1) instances to the GP model in areas of the problem space where the model fails to correctly identify such instances, thus improving the generalizability of the GP model.

The generalization step also minimizes the effect of noisy frames in the (+1) instances. For example, consider the first 3 frames of sequence J1 shown at the top of Figure 4.1. These frames do not provide any useful information for identifying the subject in that video. In the initial training of the GP model for sequence J1, these frames are provided as (+1) instances, which misleads the model to classify any similar noisy frame as (+1). In the generalization step, such noisy frames belonging to the rest of the training sequences are detected. Figure 4.3 shows a selection of these detected frames when training the model for J1. These frames are then used as new (-1) instances to update the GP model. This process helps to successfully cancel out the effect of noisy frames in the (+1) instances.

Now that the GP models have been constructed consulting the EASR suggestions, the next stage is to make predictions based on the models.

Figure 4.3: Samples detected in the generalization step when training a GP model for the J1 sequence

3 Based on the empirical results, the label is correct in most of the samples.

Figure 4.4: Flowchart for the identification process (left); exploring the effect of the minimum cut-off for EASR confidence (T) on accuracy (right)

4.3.3 Identification: a Hierarchical Approach

In this section, we discuss our proposed hierarchical approach for aggregating the predictions of the two modules, namely the EASR module and the GP module, to come up with the most accurate prediction of identity for a probe video sequence. Figure 4.4-left illustrates the flowchart of the proposed approach.

Clearly, if the predictions of both modules agree on the same identity, that prediction is reported. Otherwise, the hierarchical approach is used as follows. First, we give priority to the EASR module since it is more noise tolerant. We trust the EASR-based prediction when it identifies the probe video sequence $p$ by a clear winner; that is, when the difference between the similarity of $p$ and the winner ($s_{pw}$) and the similarity of $p$ and the closest next candidate ($s_{pc}$), as derived by (4.6), is above a pre-defined threshold $T$.⁴ If the constraint for the cut-off is not satisfied, it indicates that the EASR module is not confident in its prediction; therefore the label generated by the GP module is reported as the final predicted identity.
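The decision rule can be written as a short Python function. This is a sketch under the assumption that the EASR module exposes one similarity value per candidate identity; `hierarchical_identity` and its argument layout are hypothetical.

```python
def hierarchical_identity(easr_scores, gp_label, threshold):
    """Combine the EASR and GP predictions.

    easr_scores: dict mapping each candidate identity to its similarity
                 to the probe sequence.
    gp_label:    identity predicted by the GP ensemble.
    threshold:   minimum margin T between the EASR winner and runner-up.
    """
    ranked = sorted(easr_scores.items(), key=lambda kv: kv[1], reverse=True)
    easr_label, s_winner = ranked[0]
    if easr_label == gp_label:
        return easr_label                 # both modules agree
    s_next = ranked[1][1] if len(ranked) > 1 else float("-inf")
    if s_winner - s_next > threshold:
        return easr_label                 # EASR wins by a clear margin
    return gp_label                       # EASR not confident: fall back to GP
```

A small threshold makes the rule lean on the EASR module, while a large one hands most disagreements to the GP module.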

Figure 4.4-right shows the overall accuracy of our method for different values of $T$ in two different experiments (described later). It shows that accuracy drops when we rely too much on the EASR module (low $T$ values). On the other hand, relying too much on the GP module has the same effect. The value of $T$ for each dataset is selected based on cross-validation over the training set (the selected points are highlighted in the graph in Figure 4.4-right).

4 Since the similarity of two ASRs falls between zero and one, $\max(T) = 1$. However, in our experiments, $T$ is much smaller (always less than 0.0[…]).


Chapter 5

Experimental Design

In this chapter, we briefly describe the datasets and evaluation settings for our experiments.

5.1 Datasets

Three publicly available benchmark datasets are used for evaluation of the proposed methods in this study: the Honda/UCSD dataset collected by Lee et al. [2003], the CMU-MoBo dataset collected by Gross and Shi [2001], and the more challenging YouTube Celebrities dataset collected by Kim et al. [2008].

5.1.1 Honda/UCSD

The Honda/UCSD dataset is a collection of 59 videos recorded from 20 subjects in order to form a common ground for the assessment of different face identification algorithms (see Figure 5.1). Each subject has at least 2 videos (except for one subject). All the videos have an equal resolution of 640 × 480 and are recorded at a rate of 15 fps. The duration of the videos varies from 71 to 645 frames, with a mean and standard deviation of 2[…].2 and 110.[…] respectively.

Figure 5.1: Sample faces extracted from the Honda/UCSD dataset

5.1.2 CMU-MoBo

The CMU-MoBo dataset was primarily collected for automatic identification of people by gait. However, it has recently been used for image-set based face identification studies as well (see Figure 5.2). This dataset contains video sequences from 6 camera views of 25 subjects performing four different walking activities on a treadmill: slow, fast, on an inclined surface, and holding a ball. Following the literature, the subject with fewer than four walking patterns is excluded from the dataset; thus only the first 24 subjects are used. All videos are of 640 × 480 resolution and recorded at a rate of 30 fps. The duration of the videos varies from 202 to […]7 frames, with a mean and standard deviation of 495.6 and 169.[…] respectively.

Figure 5.2: Sample faces extracted from the CMU-MoBo dataset, provided by Cevikalp and Triggs [2010]

5.1.3 YouTube Celebrities

The YouTube Celebrities dataset is a collection of real-world videos from the YouTube website featuring 47 celebrities (see Figure 5.3). The videos are noisy and low resolution, and demonstrate large variations in illumination, pose, expression, and other uncontrolled conditions. For each subject there are 3 video clips, where each clip is divided into several sequences of unequal resolution and duration (between 7 and 3[…]0 frames, with a mean and standard deviation of 16[…].0 and […] respectively). There is a total of 1910 sequences, all encoded in MPEG4 at a rate of 25 fps.

Figure 5.3: Sample faces extracted from the YouTube Celebrities dataset; (a), (b), and (c) illustrate samples from 3 different clips of the same person

5.2 Evaluation Settings

In this section, we describe the procedure for preparing the training and test data. We followed the common settings used in the literature to allow for fair comparison.

5.2.1 Face Detection

It is a common practice to first track and crop faces from each frame and only pass the subjects' faces to the recognizer. Since the objective of this study is face identification, it is necessary to apply a prior algorithm to track/detect and automatically crop the faces from each video frame. Similar to the previous works in the literature, the Viola-Jones method, proposed by Viola and Jones [2004], is used for extracting faces in the Honda/UCSD¹ and CMU-MoBo² datasets. For the YouTube Celebrities dataset,³ the Viola-Jones algorithm fails to detect faces in a number of sequences. Thus, following Hu et al. [2012], we use the Incremental Learning for Visual Tracking (IVT) algorithm, proposed by Ross et al. [2008]. IVT returns the face area in all frames of all 1910 sequences; however, some may not represent a correct face (see Figure 4.3).⁴

5.2.2 Resolution

All the cropped faces are resized to an equal resolution. Images in the Honda/UCSD dataset are resized to 20 × 20 pixels, CMU-MoBo to 40 × 40 pixels, and the YouTube dataset to 20 × 20 pixels (the 20 × 20 resolution was selected to reduce the computational cost).

5.2.3 Features

In order to experiment with different feature types, we use histogram-equalized intensity levels for the Honda/UCSD dataset, Local Binary Pattern (LBP) codes, proposed by Ojala et al. [2002], for the CMU-MoBo dataset, and Histogram of Oriented Gradient (HOG) descriptors, proposed by Dalal and Triggs [2005], for the YouTube Celebrities dataset.⁵

1 The author would like to thank Dr. Liang Chen for providing the Honda/UCSD dataset with faces detected.
2 In this work, we have directly used the pre-processed version of the CMU-MoBo dataset provided by the authors of Cevikalp and Triggs [2010]. The pre-processing procedure includes face tracking, resolution, and feature extraction.
3 The author would like to thank Dr. Liang Chen for providing the YouTube Celebrities dataset with faces detected using the IVT algorithm.
4 Wang et al. [2012b] have created another version of this dataset in which they also use the Viola-Jones algorithm; however, the sequences containing frames with falsely detected faces were manually removed, resulting in a dataset with fewer than the 1910 video sequences of the original version. In this study, no evaluations were run on the Wang et al. [2012b] version of the YouTube Celebrities dataset.

5.2.4 Train/Test Arrangement

For the Honda/UCSD dataset we randomly select 20 sequences (one video per subject) for training and the rest for testing. It should be noted that there is an alternative evaluation setting for the Honda/UCSD dataset which uses a predefined set of 20 sequences (one video per subject) for training without any random permutation. Since most recent algorithms (e.g., RNP, […], and the proposed methods) achieve 100% accuracy with this predefined setting, we use the random setting, which provides more variation, in order to have a more meaningful comparison.

For the CMU-MoBo dataset we also randomly select 24 sequences, one video per subject, for training and the rest for testing.

For the YouTube Celebrities dataset we perform 5-fold cross-validation, following the evaluation protocol used by Hu et al. [2012]. The sequences of each subject are sequentially partitioned (no prior shuffling) into 5 folds, where each fold contains exactly 9 sequences (from 3 clips) with minimal overlap between folds. In each fold, 1 clip is randomly selected for training (3 sequences) and the other 2 clips are used as test data (6 sequences).

It is important to mention that there is another evaluation setting for the YouTube Celebrities dataset, first used by Wang et al. [2012b]. In this setting, for every subject in each fold 9 sequences (3 per clip) are randomly selected: 3 sequences (1 per clip) for training, and the rest for testing. Obviously it is an easier task to identify the subject with the second setting, because there is already one video sequence from each clip available in the training set, which factors out differences in the appearance of the subject in different clips. For this reason we believe that the first setting is closer to real-world scenarios; thus we adopted the protocol used by Hu et al. [2012].

For all three datasets, we report accuracy results for the full length sequences, as well as for truncated sequences that only contain the first 50 consecutive frames of each video sequence. All evaluations are done using 5-fold cross-validation.

5 The author would like to thank Dr. Liang Chen for providing his code for efficient calculation of the HOG features.

Chapter 6

Results and Discussions

In this section, we summarize the identification accuracies of the three proposed approaches and compare them against the most successful and recent methods in the literature (namely, MSM, MDA, AHISD/CHISD, SANP, RNP, MSSRC, JSR, and ISCRC in chronological order). Except for JSR, for all other methods we used the code provided by the authors, adjusted with their suggested parameter values. For JSR we did not have access to the code and thus report the results provided by the authors. However, it should be noted that the evaluation settings for JSR are different from what we are using in this study: they used the Wang et al. setting for the YouTube Celebrities dataset (which leads to higher accuracy results compared to the Hu et al. [2012] setting used here), and 30 × 30 resolution for both the YouTube Celebrities and CMU-MoBo datasets.

To make the comparisons fair we used the same evaluation settings, including the feature type, for training all algorithms (i.e., intensity levels for Honda/UCSD, LBP for CMU-MoBo, and HOG for YouTube Celebrities). Interestingly, this enhancement led to improved accuracy for all algorithms (including the older algorithms such as SANP) on the YouTube Celebrities dataset compared to the results reported in the original papers. Also, it must be noted that the original evaluation of RNP was done only on 29 subjects of the YouTube Celebrities dataset, and the results obtained in the respective paper (Yang et al. [2013]) are higher than the results obtained on the full dataset. Additionally, MSSRC (Ortiz et al. [2013]) comes with its own face tracking algorithm, which was disabled in our evaluations since the aim is to compare only the identification power of different algorithms; therefore, the same tracking algorithm is used for all evaluations.

Performance results on each of the three benchmark datasets are derived by exactly following the protocol described in the Evaluation Settings section in Chapter 5. This protocol is the same as that in the related works in the literature, to allow for fair comparison. We perform Welch's t-test (see Welch [1947]) to check whether the improvement in performance of the proposed method(s) is statistically significant compared to the best performance of the contender methods. Outcomes of the significance test are described along with the summary of performance results.
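For reference, Welch's test statistic and its approximate degrees of freedom can be computed directly from the summary statistics of two methods. The sketch below uses the standard Welch and Welch-Satterthwaite formulas; the function name `welch_t` is hypothetical and the numbers in the usage lines are made up for illustration, not taken from the result tables.

```python
import math


def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    std1 and std2 are sample standard deviations; n1 and n2 are the
    numbers of runs (e.g., folds) behind each mean.
    """
    v1 = std1 ** 2 / n1
    v2 = std2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof


# Illustrative values only (not results from this study).
t_stat, dof = welch_t(10.0, 2.0, 5, 8.0, 2.0, 5)
```

Turning the statistic into a p-value additionally requires the cumulative distribution of Student's t, which is available in common statistics libraries but is not included in this sketch.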

Tables 6.1-6.3 summarize the Mean ± Standard Deviation of the identification rates for different methods in the literature and the three methods proposed in this work (namely, QPIF, EASR, and EASR+GP) on the Honda/UCSD, CMU-MoBo, and YouTube Celebrities datasets for both the truncated sequences (only the first 50 frames are available to perform the identification task) and the full length video sequences.

On the Honda/UCSD dataset, both the QPIF and EASR methods performed well for the full length video sequences, and EASR+GP outperformed all other methods for both the truncated and the full length video sequences. The identification rates of the EASR+GP method are statistically significantly higher than those of the best contending method in the literature.

Table 6.1: Identification Rate (%) of Different Methods on Honda/UCSD Dataset (Mean ± Standard Deviation)

Method     Year   50 frames         Full length
MSM        1998   […] ± 6.12        90.26 ± 2.1[…]
MDA        2009   […]               96.[…] ± 1.40
AHISD      2010   […]               […]
CHISD      2010   […]               […]
SANP       2011   […]               […]
RNP        2013   […]               […]
MSSRC      2013   […]               […]
ISCRC      2014   […]               […]
QPIF              […] ± 2.29        97.44 ± 3.63
EASR              91.2[…] ± 1.40    98.46 ± 1.40°
EASR+GP           94.87 ± 4.05      99.[…]9 ± 1.15*

Note: * indicates statistically significant improvement of accuracy compared to the second best result at α = 0.05
° indicates statistically significant improvement of accuracy compared to the second best result at α = 0.1
On the CMU-MoBo dataset (see Table 6.2), the EASR+GP method achieved a slightly lower identification rate compared to ISCRC (less than 1%). However, based on the statistical analysis, there is no significant difference between the results.

It should be noted that the Honda/UCSD and CMU-MoBo datasets are commonly used as benchmarks and considered easier recognition tasks, since most of the algorithms in the literature have already achieved above 90% accuracy. Therefore, there is not much room for improvement. However, we believe that the results on the most challenging dataset, YouTube Celebrities, can rank different algorithms in terms of performance and efficiency.

Table 6.2: Identification Rate (%) of Different Methods on CMU-MoBo Dataset (Mean ± Standard Deviation)

Method     Year   50 frames            Full length
MSM        1998   […]2.50 ± 2.71       97.22 ± 1.70
MDA        2009   […]1.17 ± 6.[…]      95.2[…] ± 2.[…]
AHISD      2010   92.50 ± 2.71         […]5.56 ± 2.[…]1
CHISD      2010   […]2.50 ± 2.71       […]
SANP       2011   […]2.50 ± […]        […]
RNP        2013   […]2.50 ± 2.71       […]
MSSRC      2013   […]1.11 ± 3.1[…]     9[…].33 ± 1.52
ISCRC      2014   94.44 ± 2.20†        99.44 ± 0.76†
QPIF              […]2.50 ± 2.71       […]7.50 ± […]
EASR              91.9[…] ± […]        9[…].[…]9 ± 1.16†
EASR+GP           93.61 ± 2.71†        9[…].[…]9 ± 1.16†

Note: † indicates no significant difference between the best performance result (bold) and the proposed approaches (statistically)

For the YouTube Celebrities dataset (see Table 6.3), QPIF and EASR slightly outperformed the contending methods on the full length video sequences, and the EASR+GP approach achieved significantly better results and improved the state of the art by ~4% for the full length sequences. EASR+GP also achieves the highest accuracy for the truncated sequences. The superior results of the proposed method can be attributed to its capability of handling the extremely noisy samples in the YouTube Celebrities dataset more efficiently than the rest of the methods in the literature.

Table 6.3: Identification Rate (%) of Different Methods on YouTube Celebrities Dataset (Mean ± Standard Deviation)

Method     Year   50 frames          Full length
MSM        1998   70.57 ± […].33     65.[…]2 ± 4.56
MDA        2009   6[…].26 ± […]      69.22 ± 1.90
AHISD      2010   6[…].13 ± […]      […]
CHISD      2010   67.73 ± […]        69.65 ± 1.5[…]
SANP       2011   […] ± 5.71         73.10 ± 3.1[…]
RNP        2013   […]                […] ± 3.65
MSSRC      2013   70.7[…] ± […]      72.20 ± 3.52
ISCRC      2014   66.[…] ± […].73    70.71 ± 3.11
QPIF              69.01 ± […]        74.82 ± 2.49
EASR              70.13 ± […]        74.18 ± 3.35
EASR+GP           73.12 ± 3.11       77.23 ± 3.81°

Note: ° indicates statistically significant improvement of accuracy compared to the second best result at α = 0.1

It is also worth mentioning that a simple ensemble of GP binary classifiers without the specialization-generalization learning strategy performed poorly on YouTube Celebrities, testifying to the merit of our approach.

As mentioned before, the competing methods also benefited from using HOG features, especially the top two performers, namely RNP and SANP. When HOG features are used, the average accuracy of the RNP and SANP algorithms increases by over […]% compared to the accuracies reported in the respective papers (Yang et al. [2013] and Hu et al. [2012]), in which the intensity levels were used as features.

The identification rate for JSR on the YouTube Celebrities dataset with full length video sequences is 73.7%, as reported by Cui et al. [2014]. This accuracy is only 0.2% higher than the best contender reported here (and it does not affect the result of the significance test). However, as mentioned before, the evaluation setting used in Cui et al. [2014] is different and the results are reported for 30 × 30 resolution; therefore we did not include this result in Table 6.3.

The QPIF and EASR methods performed very close to each other on all three datasets, while, as expected, EASR+GP performed better than both of these methods. The combination of GP and EASR enabled us to achieve a better performance, noticeably higher than the individual components of the method (EASR and GP). We attribute this increase in identification rate to the EASR's strength in dealing with noisy frames, as well as the GP's strength in capturing underlying non-linear structures in data.

Computational Complexity

6 .1

We also report the average computation time of all methods in experiments on the YouTube Celebrities dataset for the truncated sequences (with 50 frames). All the timing results are reported based on running Matlab codes provided by the authors of each algorithm on a machine with an Intel Xeon E5-2603 (1.[illegible] GHz) processor and 40 Gigabytes of RAM. We report the average online identification time (in seconds) for one sequence (Table 6.4). We also provide the total offline training time (in seconds) for methods that required training, including the EASR+GP method.

While our proposed method requires an initial offline training of the models (over 70% of this time is used for training the GP models), it is important to note that the offline training time is a one-time only overhead. For comparison, SANP will require an extra 160 seconds for identifying only 10 test sequences compared to our method. Also, adding a new subject to the gallery requires far less training time, since only one new model needs to be constructed.

Table 6.4: The average computation time (seconds) of different methods on the YouTube Celebrities dataset with truncated sequences (50 frames). T1: Total offline training time. T2: Average online testing time for one sequence.

Methods: MSM, MDA, AHISD, CHISD, SANP, RNP, MSSRC, ISCRC, EASR+GP
[The T1 and T2 cells did not survive extraction in order; surviving values: 21.21, N/A, 22.31, 156.1, 70.7, 2.04, 2.79.]

Note: N/A indicates online-only methods.
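The trade-off between a one-time offline training cost and a per-sequence online cost can be made concrete with a small calculation. The sketch below is illustrative only: the function names `total_time` and `break_even` are hypothetical, and the numbers in the usage lines are placeholders, not entries of Table 6.4.

```python
import math


def total_time(offline_s, online_s, n_sequences):
    """Seconds needed to train once and then identify n_sequences."""
    return offline_s + n_sequences * online_s


def break_even(offline_a, online_a, offline_b, online_b):
    """Smallest number of test sequences at which method A is no slower
    overall than method B, or None if A never catches up."""
    extra_training = offline_a - offline_b
    if extra_training <= 0:
        return 0  # A is already no slower before any sequence is processed
    if online_a >= online_b:
        return None  # A pays more up front and gains nothing per sequence
    return math.ceil(extra_training / (online_b - online_a))


# Placeholder timings in seconds (assumptions, not measured values):
# method A trains offline, method B is online-only.
n = break_even(offline_a=100.0, online_a=2.0, offline_b=0.0, online_b=12.0)
print(n, total_time(100.0, 2.0, n), total_time(0.0, 12.0, n))
```

Once the number of test sequences passes the break-even point, every further sequence widens the gap in favour of the method with the cheaper online step.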

Chapter 7

Conclusions and Future Work

7.1

Summary of Contributions

• Introduction of two non-sophisticated representation structures using the notion of quantum probability theory, namely, QPIF and its dual extension EASR. The proposed representation structures are designed to minimize the effect of noisy frames in a video based face identification task. These two representation structures specifically target those frames that are not useful for the identification task, mainly due to face occlusion, low resolution of image, or failure of the face tracker algorithm. Therefore, unlike most of the methods in the literature which use sophisticated non-linear representations, these two methods keep the representation linear and simple while retaining the superior performance.

• A novel learning scheme was proposed for efficient training of an ensemble of binary Gaussian process models. This learning scheme selectively samples from the training data in order to not only increase the discrimination power of the classifier but also to build its model using the least possible computational cost and with minimum introduction of noise.

As a final note, the contributions of this work are not method-specific and can be utilized for enhancement of other face identification approaches in the literature.

7.2

Future Work

• In the current work, each representative is either accepted or rejected to contribute in building the model and predicting the identity. A promising extension of this work would be to modify the outlier filtering process in EASR by utilizing a probabilistic approach that is able to assign a degree of uncertainty on how well each frame is a good representative of an individual's face. This should be accompanied by a prediction method that can exploit the extra information provided by such weighted samples. Consequently, we will have a classification approach that is aware of the quality of the samples and knows on which of them it should rely the most in order to perform the identification task effectively.
• Employ the EASR approach along with its outlier filtering process as a general purpose filtering approach to improve other methods in the literature in terms of their resilience to noisy frames.

• Use other kernels (e.g., pyramid match kernel proposed by Grauman and Darrell [2007]) for the Gaussian process models which can better take advantage of localized feature descriptors such as Scale-Invariant Feature Transform (SIFT) proposed by Lowe [2004].
• Go beyond the face identification task and test the proposed method with datasets for other types of video based recognition tasks (e.g., object categorization).
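The first item in the list above contrasts the current hard accept/reject decision with a weighted use of frames. As a purely hypothetical illustration of that contrast (not a method proposed or evaluated in this thesis), the sketch below combines per-frame subject scores with per-frame quality weights; hard filtering is the special case where every weight is 0 or 1. The function name `identify`, the score dictionaries, and the weights are all assumptions made for the example.

```python
def identify(frame_scores, frame_weights):
    """Pick the subject with the largest quality-weighted sum of frame scores.

    frame_scores: list of dicts mapping subject label -> score for one frame.
    frame_weights: list of non-negative quality weights, one per frame.
    """
    if not frame_scores or len(frame_scores) != len(frame_weights):
        raise ValueError("need one weight per frame and at least one frame")
    totals = {}
    for scores, weight in zip(frame_scores, frame_weights):
        for subject, score in scores.items():
            totals[subject] = totals.get(subject, 0.0) + weight * score
    return max(totals, key=totals.get)


# Hypothetical scores: one clean frame favours "a", two noisy frames favour "b".
frames = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}, {"a": 0.3, "b": 0.7}]
print(identify(frames, [1.0, 1.0, 1.0]))  # every frame trusted equally
print(identify(frames, [1.0, 0.1, 0.1]))  # noisy frames down-weighted
```

With equal weights the two noisy frames outvote the clean one; down-weighting them lets the clean frame decide, which is the behaviour a quality-aware classifier would aim for.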


Bibliography

O. Arandjelovic, G. Shakhnarovich, J. Fisher, R. Cipolla, and T. Darrell. Face recognition with image sets using manifold density divergence. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 581-588, 2005.

Leslie E Ballentine. The statistical interpretation of quantum mechanics. Reviews of Modern Physics, 42(4):358, 1970.

Leo Breiman. Bagging predictors. Machine learning, 24(2):123-140, 1996.

H. Cevikalp and B. Triggs. Face recognition based on image sets. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2567-2573, 2010.

Yi-Chen Chen, Vishal M. Patel, P. Jonathon Phillips, and Rama Chellappa. Dictionary-based face recognition from video. In Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, and Cordelia Schmid, editors, Computer Vision, ECCV 2012, volume 7577 of Lecture Notes in Computer Science, pages 766-779. Springer Berlin Heidelberg, 2012.

Zhen Cui, Hong Chang, Shiguang Shan, Bingpeng Ma, and Xilin Chen. Joint sparse representation for video-based face recognition. Neurocomputing, 135(0):306-312, 2014. ISSN 0925-2312. doi: http://dx.doi.org/10.1016/j.neucom.2013.12.004. URL http://www.sciencedirect.com/science/article/pii/S092523121301148X.

Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886-893. IEEE, 2005.

Carl Henrik Ek, Philip HS Torr, and Neil D Lawrence. Gaussian process latent variable models for human pose estimation. In Machine learning for multimodal interaction, pages 132-143. Springer, 2008.

Kazuhiro Fukui and Osamu Yamaguchi. Face recognition using multi-viewpoint patterns for robot vision. In Robotics Research, pages 192-201. Springer, 2005.

Kristen Grauman and Trevor Darrell. The pyramid match kernel: Efficient learning with sets of features. The Journal of Machine Learning Research, 8:725-760, 2007.

Ralph Gross and Jianbo Shi. The cmu motion of body (mobo) database. Technical Report CMU-RI-TR-01-18, Robotics Institute, Pittsburgh, PA, June 2001.

Lov K Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 212-219. ACM, 1996.

Negar Hassanpour and Liang Chen. A hierarchical training and identification method using gaussian process models for face recognition in videos. In The 11th IEEE International Conference on Automatic Face and Gesture Recognition, 2015a.

Negar Hassanpour and Liang Chen. A quantum theory inspired framework for face identification in videos. Technical report, University of Northern British Columbia, Department of Computer Science, Computational Intelligence Laboratory, 2015b.

Yiqun Hu, Ajmal S Mian, and Robyn Owens. Face recognition using sparse approximated nearest points between image sets. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(10):1992-2004, 2012.

Ashish Kapoor, Kristen Grauman, Raquel Urtasun, and Trevor Darrell. Active learning with gaussian processes for object categorization. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1-8. IEEE, 2007.

Kihwan Kim, Dongryeol Lee, and Irfan Essa. Gaussian process regression flow for analysis of motion trajectories. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1164-1171. IEEE, 2011.

Minyoung Kim, S. Kumar, V. Pavlovic, and H. Rowley. Face tracking and recognition with visual constraints in real-world videos. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8, 2008.

Tae-Kyun Kim, Josef Kittler, and Roberto Cipolla. Discriminative learning and recognition of image set classes using canonical correlations. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(6):1005-1018, 2007.

Kuang-Chih Lee, J. Ho, Ming-Hsuan Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, volume 1, pages I-313-I-320 vol.1, 2003.

David G Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91-110, 2004.

Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.

Timo Ojala, Matti Pietikäinen, and David Harwood. A comparative study of texture [...]

[...] learning for robust visual tracking. International Journal of Computer Vision, 77(1-3):125-141, 2008. ISSN 0920-5691.

Robert E Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of statistics, pages 1651-1686, 1998.

Paul Viola and Michael Jones. Robust real-time face detection. International Journal of Computer Vision, 57:137-154, 2004.

John von Neumann and Robert T Beyer. Mathematical foundations of quantum mechanics. Princeton University Press, 1955.

Ruiping Wang and Xilin Chen. Manifold discriminant analysis. In IEEE Conference on Computer Vision and Pattern Recognition, pages 429-436, 2009.

Ruiping Wang, Huimin Guo, L.S. Davis, and Qionghai Dai. Covariance discriminative learning: A natural and efficient approach to image set classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2496-2503, June 2012a. doi: 10.1109/CVPR.2012.6247965.

Ruiping Wang, Shiguang Shan, Xilin Chen, Qionghai Dai, and Wen Gao. Manifold-manifold distance and its application to face recognition with image sets. IEEE Transactions on Image Processing, 21(10):4466-4479, 2012b.

Tiesheng Wang and Pengfei Shi. Kernel grassmannian distances and discriminant analysis for face recognition from image sets. Pattern Recognition Letters, 30(13):1161-1165, 2009.

Bernard L Welch. The generalization of 'Student's' problem when several different population variances are involved. Biometrika, 34(1/2):28-35, 1947.


O. Yamaguchi, K. Fukui, and K. Maeda. Face recognition using temporal image sequence. In Proceedings of 3rd IEEE International Conference on Automatic Face and Gesture Recognition, pages 318-323, 1998.

Meng Yang, Pengfei Zhu, L. Van Gool, and Lei Zhang. Face recognition based on regularized nearest points between image sets. In 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, pages 1-7, 2013.

Pengfei Zhu, Wangmeng Zuo, Lei Zhang, S.C.-K. Shiu, and D. Zhang. Image set-based collaborative representation for face recognition. Information Forensics and Security, IEEE Transactions on, 9(7):1120-1132, 2014.
