A Neural Network Model of the Primary Visual Cortex

Alan Spara
B.Sc., Simon Fraser University, 2001

Thesis submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Mathematical, Computer and Physical Sciences (Computer Science)

University of Northern British Columbia
July, 2007

© Alan Spara, 2007

Abstract

Many problems in modern computing require a visual component; it is fairly common for applications to need to see their environments. These applications typically employ techniques designed specifically for the task at hand, techniques that have little or no relation to the human visual system. Humans generally do not have difficulty interpreting the world around us. When traveling through known environments, we can easily recognize particular walls, doors and other objects in our view. We are not confused by the huge number of factors that can complicate an image. The generalization and robustness of the human system would provide a huge benefit to any system that requires more advanced vision than is possible with previously developed ad-hoc methods. If the underlying principles that make the human visual system so powerful can be identified and implemented programmatically, then a machine could reap the same benefits.
The purpose of this thesis is to demonstrate that a visual system modeled after the human visual system can be robust and accurate enough to solve real-world problems and to be useful in a non-trivial application. By developing neural networks that directly model the most primitive image-processing cells of the human visual system, a platform can be built on which advanced vision systems can be developed.

Table of Contents

Abstract
1. History of the Project
1.1. A New Vision System for Robot Navigation
1.2. Issues to be Resolved
1.3. The Path to Resolution
1.4. Comparisons to Other Research
2. A Model of the Human Visual System
2.1. Image Capture and Early Processing
2.2. Early Visual Processing
2.3. Visual Pathways
3. A Neural Model of the Early Visual System
3.1. Early Image Processing
4. A Model of Primary Visual Cortex
4.1. Cell Characteristics
4.1.1. Cell Configuration Summary
4.2. Cortical Maps
4.3. Complex Cells
4.4. End-Stopped Cells
5. Training Complex Cells
5.1. Training Cycle 01
5.2. Training Cycle 02
5.3. Training Cycle 03
5.4. Training Cycle 04
6. Training End-Stopped Cells
6.1. Training Cycle 01
6.2. Training Cycle 02
6.2.1. New Cell Configuration Summary
6.3. Training Cycle 03
6.4. Training Cycle 04
7. Off Angle Edges
7.1. Arc Region Transition
7.2. Issue Analysis
8. Result Analysis
8.1. Correct Line Interpretation
8.2. Correct Joint Interpretation
8.3. Curve Interpretation
8.4. Off-Angle Orientation Interpretation
8.5. Rounded Edges
8.6. End Stopped Cells and Joint Identification
9. Future Work
10. Appendix
10.1. Exact Target Angles
10.1.1. -75 Degree Results
10.1.2. -60 Degree Results
10.1.3. -45 Degree Results
10.1.4. -30 Degree Results
10.1.5. -15 Degree Results
10.1.6. 0 Degree Results
10.1.7. 15 Degree Results
10.1.8. 30 Degree Results
10.1.9. 45 Degree Results
10.1.10. 60 Degree Results
10.1.11. 75 Degree Results
10.1.12. 90 Degree Results
10.2. 20 Degree Line
10.2.1. -75 Degree Results
10.2.2. -60 Degree Results
10.2.3. -45 Degree Results
10.2.4. -30 Degree Results
10.2.5. -15 Degree Results
10.2.6. 0 Degree Results
10.2.7. 15 Degree Results
10.2.8. 30 Degree Results
10.2.9. 45 Degree Results
10.2.10. 60 Degree Results
10.2.11. 75 Degree Results
10.2.12. 90 Degree Results
10.3. 23 Degree Line
10.3.1. -75 Degree Results
10.3.2. -60 Degree Results
10.3.3. -45 Degree Results
10.3.4. -30 Degree Results
10.3.5. -15 Degree Results
10.3.6. 0 Degree Results
10.3.7. 15 Degree Results
10.3.8. 30 Degree Results
10.3.9. 45 Degree Results
10.3.10. 60 Degree Results
10.3.11. 75 Degree Results
10.3.12. 90 Degree Results
10.4. Hallway Results
10.4.1. -75 Degree Results
10.4.2. -60 Degree Results
10.4.3. -45 Degree Results
10.4.4. -30 Degree Results
10.4.5. -15 Degree Results
10.4.6. 0 Degree Results
10.4.7. 15 Degree Results
10.4.8. 30 Degree Results
10.4.9. 45 Degree Results
10.4.10. 60 Degree Results
10.4.11. 75 Degree Results
10.4.12. 90 Degree Results
11. References
11.1. Papers
11.2. Human Spatial and Visual Systems Textbooks
11.3. Vision and Artificial Intelligence Textbooks
List of Figures

Figure 1 - A hallway image and an interpretation graph of that image
Figure 2 - Early Dataflow of the Human Vision System
Figure 3 - On and Off Response of Retinal Ganglion Cells
Figure 4 - Vision Areas of the Occipital Lobe
Figure 5 - Simple and Complex Cell Reactions
Figure 6 - Layout of a Hypercolumn
Figure 7 - A Model of the Human Vision System
Figure 8 - Overlapping Receptive Fields
Figure 9 - From Original Image to End-Stopped Cells
Figure 10 - Complex Cell Positive Training Examples
Figure 11 - Complex Cell Negative Training Examples
Figure 12 - Testing Image
Figure 13 - -50 Degree Cell and 10 Degree Cell Output
Figure 14 - 10 Degree Target
Figure 15 - Different 10 Degree Line Segments
Figure 16 - Image Segments that Generate False Positives
Figure 17 - Short Line Segments
Figure 18 - Incorrect Processing at End Point
Figure 19 - Positive End Stopped Training Examples
Figure 20 - Negative End Stopped Training Examples
Figure 21 - Original Dataflow
Figure 22 - Modified Dataflow
Figure 23 - Overlapping Positive Training Examples
Figure 24 - Partially Activated End Stopped Inputs
Figure 25 - Complex Cell Output with Differing Thickness
Figure 26 - End Stopped Output with False Positives
Figure 27 - New Negative Training Example
Figure 28 - Off Angle Activation
Figure 29 - Hard Separation between Neighboring Cells
Figure 30 - Overlapping Cell Activation Regions
Figure 31 - Shared Segments in Differently Oriented Cells
Figure 32 - Actual Reaction to Off-Angle Cells
Figure 33 - Colours to Represent Complex and End Stopped Output
Figure 34 - Correctly Interpreted Edges
Figure 35 - Correctly Interpreted Joint
Figure 36 - Activation Around a Curve
Figure 37 - 23 Degree Edge Triggers Multiple Cells
Figure 38 - 20 Degree Edge Partially Activates Cell
Figure 39 - Partially Activated Cells to Off-Angle Edge
Figure 40 - Edge Reconstruction Optical Illusion
Figure 41 - Cell Reaction to Rounded Joint
Figure 42 - Cell Reaction to Imperfect Edges and Joint
Figure 43 - Location of End Stopped Cell Activation

1. History of the Project

1.1. A New Vision System for Robot Navigation

Robots that have been designed to navigate autonomously typically rely on range finders as their primary source of information about the world in which they are trying to navigate. Clearly, this is a very limited way of interpreting the robot's environment. The development of a vision system for these robots would greatly improve their navigation ability.

In theory, an existing technique known as Markov Localization [Fox, Burgard & Thrun, 1999] could be modified to use camera sensor readings rather than range-finding images. Markov Localization is a dense sensor approach to localization, meaning that it works by comparing the robot's current sensor reading to expected sensor readings. Expected sensor readings are pre-calculated for every possible position and orientation that the robot could have. The robot's position is then calculated by finding locations where the robot's current sensor reading matches the pre-calculated expected readings. Dense sensor localization strategies traditionally rely on range finders as sensor readings.
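The measurement step of this idea can be sketched in a few lines. The following is a minimal illustration of a dense-sensor update, not the formulation of Fox, Burgard & Thrun; the Gaussian match score and all of the names here are assumptions made for the example.

```python
import numpy as np

def markov_localization_step(belief, expected, reading, sigma=0.1):
    """One dense-sensor measurement update over a grid of candidate poses.

    belief   -- prior probability of each pose, shape (n_poses,)
    expected -- pre-calculated expected reading per pose, shape (n_poses, n_beams)
    reading  -- the robot's actual sensor reading, shape (n_beams,)

    Poses whose expected reading matches the actual reading are boosted;
    all others are suppressed. The Gaussian score stands in for a real
    sensor model (an assumption of this sketch).
    """
    mismatch = np.linalg.norm(expected - reading, axis=1)
    likelihood = np.exp(-0.5 * (mismatch / sigma) ** 2)
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Starting from a uniform prior, repeated updates concentrate probability
# on the poses that best explain the readings:
# belief = np.full(n_poses, 1.0 / n_poses)
# belief = markov_localization_step(belief, expected, reading)
```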
Range readings are very limited, but they do provide a number of benefits. First, the expected sensor readings can be easily predetermined from a map of the environment. Second, the actual sensor readings generated by the robot can be directly compared to these pre-calculated values. While a vision-based system would provide vastly more detailed information, neither of the advantages of a range finder applies any longer. It is impossible to predetermine the expected sensor reading generated by a camera. Since image data is so complex, the slightest change in circumstance can result in a vastly different image when doing a pixel-by-pixel comparison.

The key to making this process work is to develop a new vision system. This system will abstract away from the original image and build a graph that represents the image presented to the robot. Such a graph should be built in a manner that is independent of conditions such as lighting, robot orientation or slight changes in the area's configuration (such as a door being open or closed). A well-known structure that suits the needs of this work is the Conceptual Graph [Sowa, 1994].

Figure 1 - A hallway image and an interpretation graph of that image. The original intent was to find a structure that could completely represent the original image in a manner that was insensitive to conditions such as lighting or camera position.

Figure 1 illustrates a graph that might be used to represent a typical hallway image. Every object in the original image is represented on the interpretation graph by a rectangular node. Oval nodes represent the relationships between those objects. From this graph, it is easy to see that wall B lies beneath the ceiling, above the floor, connects to wall C with a left-hand turn and disappears behind wall A. These sorts of relationships can be easily determined from a map of the environment and stored in a database of expected sensor results.

1.2. Issues to be Resolved

Clearly, the primary step in this theoretical vision system would be the ability for a robot to build a graph representing its currently perceived image. In order to do this, boundaries between objects need to be interpreted in such a way as to make the objects identifiable. An edge map is a well-known technique for finding those boundaries. However, an edge map is simply a sketch of the original photographic image; there is no interpretation of that image. Building a graph representing the scene requires an interpretation of that edge map.

It is known that an interpretation of an edge map can be built by a careful study of the corners in a scene. That is to say, the number and orientation of edges that come together to create a corner present a lot of information that can be used to interpret the objects defined by the edges. A joint that is composed of three edges would most likely represent three surfaces coming together at a single point. These surfaces could include two walls and either the floor or the ceiling. By studying the angles of each of these edges, the exact configuration can be determined. The techniques for doing this are quite well known [Winston, 1992]. The underlying problems with using these techniques involve finding the locations of the joints, and identifying the types of those joints. In order to identify any particular joint, one must first identify the orientation of each edge from which the joint is built.

1.3. The Path to Resolution

As humans, we have no problems identifying objects in our environment.
The boundaries between objects are quickly and easily identifiable. When trying to find methods to interpret those boundaries in an artificial environment, it follows that a reasonable starting point would be to examine those human systems.

A careful study of the human visual system reveals structures that seem tailor-made to the problem at hand. It seems that the underlying building blocks of human vision are based on an ability to determine the orientation of boundaries between objects and to determine the start and end points of these boundaries (which will have a strong correspondence to the joints and corners in an image) [Mundel, Dimitrov & Cowan, 1997][Heitger, Rosenthaler et al., 1992][Wurtz & Lourens, 1997][Lowe, 2000][Dimitrov, 1998][Coren, Ward & Enns, 1996]. Some of the earliest portions of the human visual system consist of cells that react to boundaries of particular orientation. Furthermore, other cells in this region react to the end points of the boundaries found by those directionally sensitive cells. These boundary-determining cells form much of the underlying building blocks for our visual system. The primary theory behind the work presented in this thesis is that artificial neural networks can be built which model this functionality, and thereby produce a framework on which a robust vision system could be built for some artificial system (such as a robot).

A fully functional system that could build a graph representation of an input image is beyond the scope of a Master's thesis. It would require modeling a huge variety of different parts of the human visual system. However, the fundamental building block of such a system would be a model of the early human visual system. This system is responsible for segmenting the original image - identifying edges and edge ends (which correspond to potential joint locations) of the original image.

1.4. Comparisons to Other Research

Models of this portion of the human visual system have been attempted before [Heitger, Rosenthaler, von der Heydt et al., 1992][Lowe, 1999][Wurtz & Lourens, 1997]. This research differs from those other projects in two principal ways. Other systems have modeled the primary visual cortex using mathematical operators, while my system functions through the use of neural networks. The other main difference is one of scale. Others have kept their models simple by limiting the number of different orientations recognized. In contrast, the neurons built here are designed to be a much closer match to the human system.

Other systems that model the underlying functionality of the primary visual system have done so with mathematical operations: functions that are applied to regions of a target image. This system will take a neural network approach to these models. This allows the research to abstract away from the mathematics, and instead concentrate on the underlying functionality. That is to say, this research can focus on the particular conditions that define a Complex or End-Stopped cell's activation, and not on the mathematics behind that definition. This will provide a better understanding of why the cell activates rather than how the cell activates.

Another benefit of a neural design is that a neural network approach can be better scaled to accommodate future developments. For example, it is known that the orientation and line-ending cells of the human system rely on information fed back from later stages of the vision system.
The existing mathematical operators cannot accommodate this feedback information. Instead, a completely new operator would have to be developed, completely invalidating the old model. However, this information can be incorporated into a neural approach. Once it has been determined that a neural approach to the human visual system is valid, those higher-level visual systems can be developed. At that time, new neural networks can be built that use this information. While the networks themselves are new, the underlying approach is unchanged. Furthermore, as we learn more about the human visual system, that new understanding can be built into future neural networks.

The other main difference from other models of the primary visual system is that the cells developed in this research have orientations that are a much closer match to actual human cell activation. Determining how cells with neighboring orientations should react in relation to each other comprises much of the difficulty in developing this model. To put it another way, as an angle shifts from one orientation to the next, one must determine at exactly which point the first cell ceases to activate and exactly when the second cell begins to activate. When other researchers limit their models to only four or six different orientations, they reduce the complexity of this inter-cellular relationship.

2. A Model of the Human Visual System

The human visual system is a highly complex system that has evolved to be both highly robust and accurate. We are capable of easily identifying a wide variety of objects and placing those objects into their context in an extremely timely manner. We can recognize those objects even when they are occluded, have unexpected colour, are oriented in an unusual direction or are of a strange size. Furthermore, we are capable of doing this under almost any lighting condition. In short, the human visual system provides a perfect example of what we would like an artificial system to be able to accomplish.

Researchers have collected details on the human visual system from several sources [Coren, Ward & Enns, 2004][Kolb & Whishaw, 1996][Rolls, Aggelopoulos & Zheng, 2003][Dimitrov, 1997]. For example, there is direct examination of the brain structures involved in similar mammalian vision systems. Inserting probes into a monkey's occipital lobe and studying the stimuli that cause a spike in electrical activity has provided considerable information. Less direct sources of information include case studies of people with damage to their visual systems. When a particular region of a person's brain is injured, there are often very specific effects on that person's vision. Through these various sources, we have a fair understanding of how we are able to see the world around us.

The human visual system is composed of a large number of structures, each of which processes the image information in different ways. The image is passed from structure to structure, at each step being converted into a form which is closer and closer to interpretation. A general overview of this process can be found in Sensation and Perception, 6th Ed. [Coren, Ward & Enns, 2004]. The purpose of this section is to provide a basic explanation of some of these key structures; enough to build a model from which the system can be built. There are other structures which are not a part of this model (such as the Tectopulvinar Pathway). Details on these structures are not provided.
2.1. Image Capture and Early Processing

Any vision system must begin by capturing an image, and the human visual system is no exception. Once that image has been captured, early portions of the human visual system process that image in order to make it easier to interpret. Figure 2 illustrates a number of the structures in the human system, which will be detailed below.

Figure 2 - Early Dataflow of the Human Vision System (retina, optic nerve, optic chiasm, optic tract, lateral geniculate nucleus, optic radiations, occipital lobe/visual cortex). Visual information is presented to the retina. It then undergoes several stages of intermediate processing as the image is passed from structure to structure until it reaches the occipital lobe, where the first stages of image interpretation begin.

The human system captures an image through a number of photoreceptive cells located in the retina [Coren, Ward & Enns, 2004]. It is well known that we have two basic types of photoreceptive cells: rods and cones. These cells allow both nighttime and daytime vision. Rods activate in low-light conditions, and do not allow for colour. In contrast, cones react to daylight conditions. Furthermore, they come in three varieties, each reacting most strongly to different wavelengths of light. This provides our ability to see colour.

Almost immediately, there is some processing done on this image. The outputs of numerous nearby photoreceptive cells are aggregated into various types of Retinal Ganglion cells, which transmit the image data to later stages of the vision system [Coren, Ward & Enns, 2004]. The set of photoreceptive cells that cause a particular Retinal Ganglion cell to activate is known as the cell's Receptive Field. Neighboring Retinal Ganglion cells will have overlapping Receptive Fields, which naturally implies that any particular rod or cone will affect the output of many different Retinal Ganglion cells. The original image captured by the retina is not maintained. Instead, the Retinal Ganglion cells transmit this altered version, an enhancement of the original image, toward the Optic Chiasm.

Different types of ganglion cells react differently to the intensity of a light stimulus [Coren, Ward & Enns, 2004]. If a small point of light is moved through a cell's Receptive Field, the cell can either respond to the light being turned on (known as an On Response) or to the light being turned off (an Off Response).

Figure 3 - On and Off Response of Retinal Ganglion Cells. Certain retinal ganglion cells produce a strong positive reaction to bright regions in the middle of their receptive field, and a strong negative reaction to dark areas outside of the center. Other ganglion cells produce exactly the opposite reaction.

It seems that some cells will produce an On Response when a point of light stimulates the center of the Receptive Field, and an Off Response in the outlying regions of the Receptive Field. Other cells will react in the opposite manner (an Off Response in the center of the receptive field and an On Response on the outside). These two different types of response allow the visual system to provide different processing of light and dark, and to better scale the relative brightness and darkness of the image to make processing easier.
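This center-surround behavior is commonly approximated in computational models by a difference of Gaussians. The sketch below is one such approximation, offered purely as an illustration rather than as part of the thesis's model; the two sigma values are assumed, not measured.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def on_center_response(image, center_sigma=1.0, surround_sigma=3.0):
    """Difference-of-Gaussians approximation of an on-center ganglion cell.

    Bright stimulus in the narrow center raises the response, while bright
    stimulus in the wider surround suppresses it. Swapping the two terms
    gives the complementary off-center response.
    """
    center = gaussian_filter(image.astype(float), center_sigma)
    surround = gaussian_filter(image.astype(float), surround_sigma)
    return center - surround  # positive where the center outshines its surround
```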
The ganglion cells (combined to form the optic nerve) transmit the image to the Optic Chiasm: a structure which separates the image into the left and right fields of view [Coren, Ward & Enns, 2004]. These fields of view are then sent to be interpreted by the right and left lobes of the primary visual cortex respectively. The Geniculostriate Pathway transmits this image to the Visual Cortex. The main structure of this system is the Lateral Geniculate Nucleus (LGN) [Coren, Ward & Enns, 2004]. The LGN integrates the processed image data coming from the Optic Chiasm with back-projected image data from higher levels of the visual system, and transmits this to the various cortical maps of the visual system through a series of cells known as the Optic Radiations.

2.2. Early Visual Processing

The Visual Cortex is located in the Occipital Lobe, at the very back of the brain. It consists of several sub-regions, each responsible for a different type of analysis of the image [Coren, Ward & Enns, 2004][Kolb & Whishaw, 1996]. For example, V3 is used to process colour information and V5 is used in processing motion. As is illustrated in Figure 4, these visual areas exist among the folds and wrinkles of the Occipital Lobe at the back of the brain.

Figure 4 - Vision Areas of the Occipital Lobe. The now-processed image is presented to several different cortical maps on the occipital lobe. These maps represent the first stage of image interpretation. Each region processes a different aspect of the image (such as intensity, texture, colour or motion). The Primary Visual Cortex combines the results of this processing to be used by higher-level interpretation.

Each of these sub-regions is a map of the original image presented to the retina. That is to say, there is a correlation between the location of photosensitive cells in the retina and the location of cells in each visual area which process that information. If exciting a particular cell in the retina causes some cell in a visual map to be stimulated, then one can estimate that stimulating a nearby cell in the retina will cause a nearby cell in the visual area to be activated as well. Naturally, due to the processing of the image by the Retinal Ganglion cells, this is not a direct 1-1 mapping, but a general relationship between the locations of the cells.

The primary visual cortex (known as V1) is located in the back of the occipital lobe. This area is the most important of the visual areas. As such, this is the area that will be the focus of this thesis. V1 is a highly complex region. It is organized into numerous layers that allow input from the LGN, input from other visual areas in the Visual Cortex and other higher processing centers of the vision system, and output to various neural streams.

The main purpose of the Primary Visual Cortex is image segmentation. It breaks down the original image into structures which define the boundary and joint conditions for that image. For example, when viewing a hallway image, V1 will produce reactions at the edges between walls or around doors or windows, and at the places where those edges meet. There are three cell types used by V1 for image segmentation: simple cells, complex cells and end-stopped cells. These cells respond to differences in the intensity of the input image, but each in different ways.

Simple cells [Coren, Ward & Enns, 2004][Heitger, Rosenthaler et al., 1992] respond to the boundaries between dark and light regions in the image.
Simple cells are designed to react when one side of the cell's receptive field is dark and the other is bright. As such, these cells are somewhat orientation sensitive. One simple cell will react to the transition from dark to light at a particular orientation, and the neighboring simple cell will react to the transition from dark to light at an orientation around 10 to 15 degrees away.

Complex cells [Coren, Ward & Enns, 2004][Heitger, Rosenthaler et al., 1992][Mundel, Dimitrov & Cowan, 1997] also respond to oriented edges. These cells take the input from simple cells and perform further processing. There are two types of Simple Cells which could react to an edge, depending on which side of the edge is brighter and which is darker. However, Complex Cells will aggregate this information, so only one Complex Cell will be responsive to that edge. That is to say, Complex Cells are purely orientation selective.

Figure 5 - Simple and Complex Cell Reactions. Simple Cells react to variations in intensity along a particular orientation. Complex Cells process the output of those cells to produce coherent depictions of an oriented edge.

End-stopped cells [Coren, Ward & Enns, 2004][Heitger, Rosenthaler et al., 1992][Wurtz & Lourens, 1997][Henricsson & Heitger, 1994][Mundel, Dimitrov & Cowan, 1997] are also directionally sensitive. However, they respond to areas where the edges end. End-Stopped cells have Complex cells as their input. The distinction is that End-Stopped Cells only react to the termination of a series of activated Complex Cells in the correct orientation. A short sequence of activated Complex Cells in an End-Stopped Cell's Receptive Field will cause the End-Stopped cell to activate. However, a longer row of activated Complex Cells in that same field will have an inhibitory effect on the End-Stopped cell, and it will not activate. In this way, End-Stopped cells will activate at line terminations. It is also important to notice that they will often activate in the areas of the visual scene that contain edge joints - which contain large amounts of information necessary to interpret the image. This makes these cells very useful as joint recognizers.

These cells are arranged in a very specific manner [Coren, Ward & Enns, 2004][Mundel, Dimitrov & Cowan, 1997]. There is one cell capable of responding at approximately every 10 to 15 degrees of orientation. These cells are grouped together and arranged in order: successive cells in the arrangement will react to successive orientations. This arrangement of cells is known as a hypercolumn. A hypercolumn consists of cells which respond to a full 360 degrees of orientation.

Figure 6 - Layout of a Hypercolumn. The various orientation selective cells of V1 are arranged in a particular order. Cells of neighboring orientation are clustered in a sequence to form a group known as a hypercolumn. There are millions of these hypercolumns covering V1.

Hypercolumns are scattered across V1, each responding to its own Receptive Field. V1 also contains cells which operate across hypercolumns [Coren, Ward & Enns, 2004][Mundel, Dimitrov & Cowan, 1997]. These cells can suppress or enhance a detected edge based on information from surrounding hypercolumns.
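The excitatory/inhibitory behavior just described can be captured in a toy calculation over a row of same-orientation Complex Cell activations. This is only an illustration of the principle; the window size and the inhibition weight are assumptions, and the cells actually built in this thesis are trained networks developed in later chapters.

```python
def end_stop_response(run, center, reach=4):
    """Toy end-stopped unit over a row of same-orientation Complex Cell
    activations (0/1 values). An edge reaching the center pixel excites
    the unit; activity continuing past the center inhibits it."""
    if not run[center]:                # the edge must reach the center at all
        return 0.0
    before = sum(run[max(0, center - reach):center + 1])  # excitatory flank
    after = sum(run[center + 1:center + 1 + reach])       # inhibitory flank
    return max(0.0, float(before - 2 * after))

# A line terminating at the center activates the unit...
print(end_stop_response([1, 1, 1, 1, 1, 0, 0, 0, 0], center=4))  # -> 5.0
# ...while a line continuing through the field suppresses it.
print(end_stop_response([1] * 9, center=4))                      # -> 0.0
```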
Edge information from each different visual map is aggregated together to build a complete view of the edges and edge ends, which can then be passed on to the Object Recognition and Object Localization streams to produce a meaningful interpretation of the scene.

2.3. Visual Pathways

The object recognition portion of the human visual system is located in the temporal cortex, along the side of the brain [Coren, Ward & Enns, 2004][Li & Atick, 1994][Lowe, 2000][Rolls, Aggelopoulos & Zheng, 2003]. Cells have been identified in the inferior temporal cortex which respond to very specific features in the visual scene (such as a particular person's face). These cells are able to respond to the same feature from a variety of different positions and lighting conditions. In other words, they are insensitive to changes in size, orientation and other conditions. These cells have inputs that respond to less specific objects and shapes. For example, a specific face recognizer will respond to cells that react to general face characteristics. Those cells in turn react to cells which recognize simpler objects. This process continues in this manner back to cells that respond to very simple shapes, such as lines and curves. Ultimately, the pathway begins with the image-parsing mechanisms of the early visual system.

The object localization portion is located along the top of the brain (and is therefore referred to as the dorsal pathway). It has connections to the object recognition stream, and to the internal map (stored in the hippocampus). Like the object recognition stream, this stream begins with the segmentation information from the early visual processing.

3. A Neural Model of the Early Visual System

A complete vision system is beyond the scope of a Master's thesis. Instead, the work will focus on the development of a working model of the Early Visual System. This model will lay the groundwork from which future development can proceed. The image segmenting cells of the Primary Visual Cortex were chosen as the focus of this research because this is the first step in the vision system which actually interprets the image. Earlier portions of the visual system alter the image to make it easier to interpret, but the image segmentation of V1 is the first step where any level of understanding of the image takes place.

If the image can be segmented in a manner similar to the segmentation done by the human system, a later process could be developed which would use the edges and edge terminators to find lines and joints. These can be used to build specific joint recognizers, which can ultimately be used to build cells to recognize walls, doors, windows and other objects found in an image (modeling the successive levels of complexity found in the human Object Recognition stream). These joints could then be combined into a comprehensive interpretation of the original visual scene (using a model of the Object Localization stream of the human system). However, the first step must be to build the foundation for this processing - image segmentation.

The overall model used for building this system is detailed in Figure 7. The original image that is to be interpreted (a natural model for the photoreceptive cells of the retina)
is processed with some basic techniques to simulate the early image processing of the Retinal Ganglion and other early cells, including the Simple Cells. The next step is the main processing of the V1 model: the implementation of orientation and line-ending detectors. This leaves a lot of work that can be done in the future, such as the modeling of other early visual areas, cross-hypercolumn processing, higher-level image recognition systems, and the incorporation of feedback from these systems back into the earlier processing steps.

Figure 7 - A Model of the Human Vision System (Original Image, Histogram Equalization, Edge Enhancement, Edge Detection, Complex Cells, End Stopped Cells, Cross Hypercolumn Processing, Object Identification, Object Localization). In a manner similar to the actual human system, image data can be processed step by step in order to make the image easier to interpret. Neural networks modeled after the Complex and End Stopped Cells can then be used to interpret the image. Further processing of the image is left to future work.

3.1. Early Image Processing

The interpretation of an image with this model begins with some basic image processing steps. Since the current system is building a model of V1, there is no colour processing. Therefore, the image is first converted into black and white. A histogram equalization routine is used to better accentuate the bright and dark regions of the image, and some basic edge enhancement is used to make the resulting image easier to interpret. These steps are done with basic, well-known image processing techniques and do not require further explanation. This processing is similar to some of the early processing done by the Retinal Ganglion cells and other early vision systems.

The next part of this process is a somewhat controversial choice. Simple Cells have not been modeled with neural networks, as has been done with Complex and End Stopped Cells. Instead, their role is filled by a simple edge detection routine. The primary consequence of this decision is that the output map from this step has no orientation information; it is a simple map that indicates where edges can be found for later interpretation. There are several reasons for making this decision. It can be argued that the main point of the Simple Cell is edge detection, since those cells ultimately detect edges in a manner very similar to standard edge detection routines: by reacting to rapid changes in the underlying image intensity. It is also worth noting that other researchers have followed this approach and have only modeled the Complex and End Stopped Cells. However, the main reason for choosing to implement Simple Cells with a standard edge detection routine is that there is really no need for two sets of cells that react to different orientations, especially when processing a static image. Further processing routines do not require output from the Simple Cells; they rely only on the Complex and End-Stopped cells.

4. A Model of Primary Visual Cortex

The primary work in this thesis is the model of the Primary Visual Cortex: that is to say, the directionally sensitive and line-termination sensitive cells that will produce the basis from which other vision systems can be developed. It has already been described how the early parts of the visual system will be modeled with some simple image processing techniques. The higher-level processing streams will be left to future work. What remains is the development of the Complex and End Stopped Cells that will model the cells found in the human Primary Visual Cortex.
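Before specifying those cells, the preprocessing stage of Section 3.1 can be made concrete. The sketch below strings together standard OpenCV primitives; the thesis names histogram equalization, edge enhancement and edge detection but not the exact routines, so the unsharp-mask step and the Canny operator here are stand-ins chosen for illustration.

```python
import cv2

def preprocess(path):
    """Approximate the pre-V1 stages of Figure 7 with standard operations:
    black-and-white conversion, histogram equalization, mild edge
    enhancement, and a binary edge map standing in for the Simple Cells."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)    # no colour processing in the V1 model
    eq = cv2.equalizeHist(gray)                      # accentuate bright and dark regions
    blur = cv2.GaussianBlur(eq, (3, 3), 0)
    sharp = cv2.addWeighted(eq, 1.5, blur, -0.5, 0)  # unsharp mask as edge enhancement
    edges = cv2.Canny(sharp, 100, 200)               # edge map; orientation is discarded
    return edges
```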
The full set of these Complex and End Stopped Cells will represent a logical hypercolumn, which can be used to interpret the edge map. These cells will be built using standard feed-forward neural networks. Each cell reacts to a Receptive Field: a small region which maps to a point on the original image. When a cell processes an image, it scans the entire Receptive Field, and marks the output of the cell at a matching position on the cell's output map. Every position on the input maps will be individually processed by the hypercolumn, so the cell outputs will be the result of processing a large number of overlapping Receptive Fields.

Figure 8 - Overlapping Receptive Fields. Every 9x9 region of the input is individually scanned with the neural network. The results of this processing are recorded on a direct pixel-by-pixel basis to an output map.

It has already been mentioned that one of the benefits of a neural network approach is that we can abstract away from the mathematics and focus on the conditions which will make the cells work correctly. This needs to be reiterated. Mathematics is a tool which can be used to accomplish many interesting things. However, the mathematics used to create these applications should not be of primary concern, as it really is only a tool. Instead, we must focus on understanding the problem and each step taken to resolve that problem. The mathematics behind neural network theory is well understood and there is no reason to reiterate it here. Instead, the focus will be on the conditions that will make the neural networks work correctly.

Neural networks work through a training routine. They are given a series of both positive and negative training examples and are trained to respond correctly to those examples. If those examples have been chosen correctly, then the network will respond to previously unseen data in the correct manner. The work in this thesis will not focus on functions, formulas, calculations or algorithms. It will focus on the positive and negative training examples that will be used to create networks which respond correctly. It will also deal with incorrect behavior of those networks, and the modifications to the training examples that will improve that behavior.

4.1. Cell Characteristics

Before the artificial hypercolumn can be developed, the characteristics of the cells must be determined. There are two issues which must be addressed. First, the separation between the orientations which will stimulate neighboring cells (which will implicitly
Complex Cells are designed to react only to the orientation, and a line at any particular orientation is identical to another line which is 180 degrees away. As a consequence, only half a circle needs to be considered. As an example, there is no difference between a line with an orientation of 0 degrees and a line of 180 degrees, so only one of these angles needs to be considered. The model therefore calls for 18 Complex Cells. In contrast, End-Stopped cells must consider angles from around the complete circle. This is because every line will have two end points; each pointing in opposite directions. Consider a line with orientation 30 degrees. This line with have two end points, one oriented at 30 degrees and the other oriented at -150 degrees. It follows that there must be twice as many End-Stopped cells as Complex cells. 21 In order to determine the size of the input region, several simple premises are considered. First, the cell should react to edges that pass through the exact center of the test region. To make this work, the region should be square and have odd number of pixels on each side. Secondly, the target model for each cell should be distinct for every different cell. Through some straightforward experimentation, it was determined that the smallest square region (with an odd number of pixels on each side) that could represent 10 degrees of separation is 9x9. Therefore, this is the size of the region used as input to the network. 4.1.1. Cell Configuration Summary Receptive Field: 9x9 pixels Orientation Separation: 10 Degrees Number of Complex Cells: 18 Number of End Stopped Cells: 36 22 4.2. Cortical Maps The output of the cells 0rigina| |mage Edge Map Complex Cells End-Stopped Cells must be stored in a manner which is easy to interpret and is useful to later image processing steps. In humans, these cells are arranged in a cortical map. The most natural output for these cells is a series of images which represent a map of the cellular output. If the region being processed contains an Figure 9 - From Original Image to End-Stopped Cells. To interpret the original image, begin by generating an edge map. Complex Cells then isolate the sections of the edge map that lie in a particular orientation. edee of the correct ^ w 0 ^nc* Stopped c e " s W 'N t n e n **n^tne s t a r t an<* st°P °f tnat edge. orientation, then it activates. The results of the scanning with these cells are stored on a map that corresponds to the original image. So, if the edge map is scanned at position (X,Y), then the results of that scan will be stored at position (X,Y) on the map that corresponds to that Complex Cell's output. There will be one output map for each Complex and End Stopped Cell. 23 4.3. Complex Cells Complex cells will be designed to react to a small region of the edge map derived from the original image. The neural networks built will be designed to activate when an edge of the correct orientation is presented to the network. That is to say, when a line with the correct angle passes through the center of the network's receptive field, the cell activates. Any other image presented to the cell should cause it to fail to activate. 4.4. End-Stopped Cells End-Stopped Cells work in a similar manner. They will scan a region the same size as the Complex Cells. However, they scan the results of the Complex Cells rather than the edge map. 
When a region that was previously determined to be of a particular orientation is found to terminate, the End-Stopped Cell that reacts to the orientation of the stopping edge activates. In other words, the cell fires when the center pixel of the cell's activation region contains the last pixel of an edge with the correct orientation. The output from these cells is stored in another series of maps that correspond to both the Complex Cell output and the original image.

5. Training Complex Cells

Training examples are generated that will cause the cells to react correctly. The process used to create valid Complex Cells is cyclical in nature. Since the necessary training examples cannot be determined beforehand, a reasonable "first guess" at rules for generating training examples is created. Then, as any issue is uncovered, those rules are altered to correct the problem. The networks created from the new rules are evaluated for any new issues, and a new set of training example rules is created.

Issues that may need to be addressed are not always obvious. In some cases, the output generated by the Complex Cell is clearly wrong and the training examples are in need of correction. However, in other cases, the issues with the Complex Cells cannot be recognized until those cells' output is processed by the End Stopped Cells.

Figures 10 and 11 illustrate some examples of both positive and negative training examples. A network trained to respond to 30 degree edges will be trained to return a 1 for any of the images in Figure 10, and a 0 for any of the images in Figure 11.

Figure 10 - Complex Cell Positive Training Examples (correct orientation; close orientation; short edges; nearby alternate edges). Some examples of small regions of an edge map that should generate positive Complex Cell output. These are presented to a network in order to teach it to respond correctly.
However, there are a couple of issues. First, there is quite a bit of noise around the joint. More significantly, the lines that have been isolated are not always complete - there are gaps in the line which causes a dashed-line appearance to the output. Consider Figures 12 and 13. It is clear that the output for these cells do generally find the line with the orientation that they are designed to. However, the noisy joint and the dashed lines are particularly noticeable in the output for the cell oriented at 10 Degrees. Figure 12 - Testing Image Figure 13 - -50 Degree Cell and 10 Degree Cell Output. The results of scanning the testing image with two different Complex Cells. Although the negative fifty degree cell reacts well, there are a number of problems with the ten degree cell's output. 27 5.2. Training Cycle 02 The dashed lines seen in the 10 Degree recogniser are an example of the first issue to be resolved. Although the 10 Degree line has been isolated, only certain portions of the line actually cause the cell to activate. As a result, the line is not correctly recognied. When the end-stopped cells are applied to this kind of output, they active all along the line rather than just at the ends. This is a fairly straight forward problem, and is to be expected. While a line segment through a 9x9 region seems like a simple object to generate, it is in fact more complicated than one may initially expect. One must remember that a line is composed of pixels that are arranged in such a way b J as to be close ^ ^ F 'g u r e 14 ~ 1 0 Degree Target to the line in question. Those pixels are, in many cases, not always aligned perfectly evenly. Consider, for example, figures 14 and 15. Figure 15 represents the line segment generated as the target for an edge with 10 degree orientation. That is to say, the 10 degree Complex cell is trained to respond with a 1 when it encounters a region of the edge map that matches this image. However, as is illustrated by Figure 14, an actual edge with orientation 10 degrees is much more complicated. Different positions along the edge can look very different. Figure 15 - Different 10 Degree Line Segments. Although both of these regions are a sub image of an edge with ten degree orientation, they look completely different from each other, and are different at almost every pixel. The clear resolution to this problem is to add the missing positive targets to the training examples. While these different line segments 28 may look very different, they all lie along the same orientation, and can be found along a much larger line in that orientation (see Figure 14). In order to correct this issue, a large line is drawn in the orientation of the desired cell. Then, a 9x9 region is scanned over this image. Every possible region that passes exactly through the center of the region is then added to the training examples with a target of 1. By adding these training examples to the routine that generates training examples, the dashed lines will be not be generated by the Complex Cells. 5.3. Training Cycle 03 The next step is to clean up the noise seen around the joints. This false activation was completely unexpected, as there is no apparent reason for the Complex Cells to trigger around the joint. In order to diagnose this issue, an analysis of the conditions that caused the cell to react must be taken. First, a position where false activation takes place is found on the Cell Output map. 
Since a position on this map relates directly to a position on the original edge map, it is a straight forward process to find the image region that caused this false activation. This process is taken over several false activations of the cell. The edge map regions that caused the false activation are compared so that similarities can be determined. Figure 16 displays some of the regions that are causing a Complex Cell to activate incorrectly. A cell that was Figure 16 - Image Segments that Generate False Positives. trained to react to an edge o f - 5 0 It was not anticipated that two lines of completely incorrect orientation would generate a positive reaction, even when each individual incorrect edge would not 29 degrees was investigated, and it was found that these regions of the edge map were causing stray activation. When looking directly at the edge map regions that cause false positive activation of the cells, it becomes immediately clear what is happening. While single edges that are off center and do not pass through the center of the region are specifically trained to cause the cell to not activate, this training does not generalize to multiple bad edges in the testing region. When two (or potentially more) completely incorrect edges fall into the region being tested, the cell can activate even though it would not activate if any of those lines were to be tested individually. Specific training examples must be added to deal with multiple bad edges in the testing region. Combining multiple existing negative training examples into a single image did this. That is to say, every negatively oriented training example is considered, and paired with every other negative training example, one by one. Each of these pairs is then added to the set of negative training examples. Naturally, this increases the set of negative training examples by an order of magnitude (which slows down the training process considerably), but the results are worth the extra processing. These extra training examples cause an unexpected side effect: the input space is no longer linearly separable. That is to say, it is no longer possible for a single cell to distinguish between correct activation and incorrect activation. Instead, Complex Cells must be modeled with a small network of interacting cells. This is likely a side-effect of the decision to model complex cells directly, rather than separate the functionality of simple cells. In the human visual system, the inherent (and somewhat ironic) non- 30 linearity of recognizing a straight line is processed through a multi-level orientation processing system, and it seems that a similar multi-level processing system must be built for an artificial vision system as well. 5.4. Training Cycle 04 Until this point, correcting issues with the directionally sensitive cells has consisted of retraining the cells until they match a straightforward understanding of how a cell should work. The next issue to be resolved does not have such a clear resolution, and the best solution can only be found through experimentation. The cells need to have some mechanism for dealing with lines of a correct orientation, but do not span the entire testing region. F i g u r e 17 _S h o r t L i n e Segments. only edges that are long enough to reach the center of the testing region should be able to trigger a complex cell to activate. Consider Figure 17. 
While it is immediately obvious that the first region displayed here must trigger a 45 Degree Complex Cell to activate, and that the last region should never cause that same cell to respond, it is not at all clear how the cell should react to the other three line segments. There are several possible ways that the Complex Cells could be trained. First, they could be trained to only respond to a line segment that traverses the entire receptive field for the cell. That is to say, the first region would be trained to activate while the rest would be trained to not respond. 31 The concern with this approach has to do with the calculation of the End Stopped cell. When a long edge is processed by the Complex Cells, those regions of the edge which completely span the receptive field of the cells in that will activate those cells. However, the cells near the end of the edge will fail to activate, as the edge does not span the entire receptive field. When the End-Stop Cells process the output of these Complex Cells, it is clear that they will react at the wrong location, since the actual end of the edge was not recognized by the Complex Cells. Figure 18 - Incorrect Processing at End Point. If only edges that span the entire receptive field is allowed to activate the Complex Cell, then the results w j U b e shortened, and the End Stopped Cell will be triggered at the wrong location The next option would be to allow the cells to provide partial reaction, scaled to react proportionally to the number of pixels left in the sample image. For example, the first test image of Figure 17 would provide 100% activation, the second with 78% (7/9 pixels in the image) and 56% activate for the third (5/9 pixels). However, it is unclear how the End Stopped Cells should react in this case; for similar reasons as the previous option. The line recognized by the Complex Cells will not terminate at the correct point, but will rather fade out over a longer distance. This will provide no clear point for the End Stop to react to. The best results have been found when the Complex Cells react to any correctly oriented edge that passes through the exact center of the receptive field. Furthermore, all cells should be trained to respond with 100% activity. That is to say, a Complex Cell is reacting to an area around pixel (X,Y), then it should provide a full response whenever a 32 correctly oriented edge lies on the point (X,Y); even if it is only a partial edge. In that manner, the complete edge will be recognized by the Complex Cells and will provide good input to the End Stopped Cells that react in that area. 33 6. Training End-Stopped Cells Training End Stopped Cells is similar to training Complex Cells. Rules are generated to produce training examples that can be used to cause the cells to react properly. Where Complex Cells were trained with simulated Edge Map values, the End Stopped Cell are trained with Simulated Complex Cell output. Once again, when it is found that the current set of rules do not produce correctly functioning End Stopped Cells, those rules are adjusted and a new set of cells is generated and tested. Figures 19 and 20 display some examples of the training examples generated for an End Stopped Cell. These images represent a region of the output from the Complex Cell with matching orientation to the End Stopped Cell being trained Correct End Stops Close Orientation End Stops J Correct End Stops with Alternate Incorrect Edge Figure 19 - Positive End Stopped Training Examples. 
6. Training End-Stopped Cells

Training End Stopped Cells is similar to training Complex Cells. Rules are generated to produce training examples that can be used to cause the cells to react properly. Where Complex Cells were trained with simulated edge map values, the End Stopped Cells are trained with simulated Complex Cell output. Once again, when it is found that the current set of rules does not produce correctly functioning End Stopped Cells, those rules are adjusted and a new set of cells is generated and tested. Figures 19 and 20 display some of the training examples generated for an End Stopped Cell. These images represent a region of the output from the Complex Cell with matching orientation to the End Stopped Cell being trained.

Figure 19 - Positive End Stopped Training Examples (Correct End Stops; Close Orientation End Stops; Correct End Stops with Alternate Incorrect Edge). Examples of cases where it is known that an End Stopped Cell should activate must be presented to the cell in order for it to be trained correctly.

Figure 20 - Negative End Stopped Training Examples (Short Line Segments; Long Line Segments; Incorrect Orientations). Examples of cases where it is known that an End Stopped Cell should not activate must also be presented to the cell to make it work correctly.

6.1. Training Cycle 01

The inputs of the first End Stopped Cells were designed strictly according to the model described earlier. That is to say, the outputs of all Complex Cells were aggregated into a single input vector and presented to the End Stopped Cell, as illustrated in Figure 21. It has already been pointed out that End Stopped Cells are directionally sensitive, as Complex Cells are. As a consequence, they must react strongly when the Complex Cell that corresponds to the End Stopped Cell's orientation displays an ending. The activations of the other Complex Cells have little impact.

Figure 21 - Original Dataflow (Edge Map to Complex Cells to End Stopped Cell). It is known that End Stopped Cells process the output of Complex Cells. However, it is not reasonable for an End Stopped Cell to process the output of every Complex Cell.

In reality, this implementation is completely unfeasible. First, the size of the input vectors is quite large (the size of the receptive field multiplied by the number of Complex Cells). This dramatically slows down the speed at which the cell can react. Even more importantly, the number of training examples needed to produce cells that react this way is completely unacceptable. First, the training examples for the matching-orientation Complex Cell must be calculated (according to rules found in the next section). Then, training examples must be generated for all of the off-angle Complex Cells, which will cause those inputs to be largely ignored. This is done with a large number of wildly varied input samples. When the cells are trained on this input, the off-angle cell inputs must not affect the activation of the cell, allowing the edge-ending characteristics of the on-angle inputs to determine the activation.

The main problem with training End Stopped Cells according to this model is the number of training examples that must be generated. Consider that every single on-angle training example must also train all other (off-angle) inputs to be irrelevant. This implies that one training example must be generated for the combination of every on-angle training example with every off-angle example for every off-angle cell. This combination of combinations produces a total set that is so many orders of magnitude larger than the on-angle training examples alone that it is not realistically possible to train cells to react in that manner.

Instead, a much simpler approach is taken. Since off-angle input cells should not affect the activation of the End Stopped Cell, they are completely removed from the input to that cell. Instead, an End Stopped Cell will react only to the Complex Cell that matches its orientation. Since there are twice as many End Stopped Cells as Complex Cells, each Complex Cell will feed two End Stopped Cells, but when processing an End Stop, only one Complex Cell needs to be considered.

Figure 22 - Modified Dataflow (Edge Map to Complex Cell to End Stopped Cell). End Stopped Cells should only consider the output of the Complex Cell that matches their own orientation.

Using this approach, the training can focus on the on-angle Complex Cells. The off-angle Complex Cells will be implicitly ignored simply by virtue of the fact that they are not taken into consideration.
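A quick back-of-envelope calculation shows what the modified dataflow saves. The 9x9 receptive field is as stated; the orientation count for this training cycle is an inference from the 10 degree separation in use at this point (the count is stated explicitly only for the later 15 degree configuration).

    FIELD = 9 * 9   # pixels in one 9x9 receptive field

    def input_sizes(n_orientations):
        # Length of one End Stopped Cell's input vector under the two
        # dataflows: the original aggregates every Complex Cell's output,
        # the modified wiring keeps only the matching orientation.
        original = FIELD * n_orientations
        modified = FIELD
        return original, modified

    # 10 degree separation implies 180 / 10 = 18 orientation channels:
    print(input_sizes(18))   # (1458, 81)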
6.2. Training Cycle 02

Generating training examples for End Stopped Cells works in much the same way as the generation of Complex Cell training examples; a routine has been written to automatically produce both positive and negative training examples. Both correct and incorrect exemplars are generated according to rules that will lead the trained cell to react accordingly. The initial rules used to generate End Stopped training examples are listed below. All of these rules produce training examples expressed as the output of the Complex Cell with the matching orientation.

1) An edge of the proper angle that ends in exactly the center pixel should train to 1.
2) White regions with black lines always train to 0.
3) All-black regions train to 0.
4) One single activated pixel in the receptive field is just noise, and should train to 0.
5) A line of the correct orientation, but not long enough to reach the center of the receptive field, should train to 0.
6) A line of the correct orientation, but too long (i.e., extending well past the center of the receptive field), should train to 0.
7) Correctly oriented lines that miss the center of the receptive field should train to 0. This includes lines which are too short, the correct length or too long.
8) Lines with an incorrect orientation train to 0. This includes edges that pass through the center of the receptive field as well as those that are offset from the center of the field.
9) A correct end stop should train to 1, even when there is also an incorrectly oriented line in the receptive field. That incorrect line may also be offset from the center of the receptive field.
10) A line segment which has a very similar orientation to the correct one, and stops exactly at the center of the receptive field, should train to 1.

The first set of cells that used these rules could not be trained. Every time a cell was generated using these rules, a large number of the training examples would always be mis-classified by the resulting networks. To try and determine the cause of this network error, a careful study was made of the training examples created with these rules. The problem arises from some very basic training examples. Every different line segment that could be part of an edge with the correct orientation must cause the cell to activate if it ends exactly on the center of the receptive field. The problem is that, in some cases, there is no difference between a necessary training example for one cell and a necessary training example for its neighboring cell (which must train to 0). There are a large number of line segments that could trigger a 10 Degree End Stopped Cell; in fact, there is such a wide variety of examples that some are indistinguishable from necessary 0 Degree End Stopped Cell targets. A small sketch of this collision follows.

Figure 23 - Overlapping Positive Training Examples (Offset 10 Degree End Stop Targets; 0 Degree End Stop Target). Although a 9x9 region can distinguish ten degrees of separation for a Complex Cell, it is not sufficient to maintain separate input values for the End Stopped Cell. It is seen here that both the zero and ten degree cells will react to the same input.
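The collision can be reproduced with a simple experiment. The sketch below assumes a nearest-pixel rasterization of line segments (the thesis's actual example generator is not shown); it demonstrates that a short end-stop stub at 10 degrees lands on exactly the same pixels as the 0 degree stub, while full-field lines remain distinguishable.

    import math

    def raster(angle_deg, radii, size=9):
        # Nearest-pixel rasterization of a segment through the centre of
        # a size x size receptive field (an assumed representation).
        c, th = size // 2, math.radians(angle_deg)
        return {(c + round(r * math.cos(th)), c + round(r * math.sin(th)))
                for r in radii}

    half = range(0, 3)    # a short stub ending at the centre pixel
    full = range(-4, 5)   # a line spanning the whole 9x9 field

    print(raster(0, full) == raster(10, full))   # False: full lines separate
    print(raster(0, half) == raster(10, half))   # True: end-stop targets collide
    print(raster(0, half) == raster(15, half))   # False: 15 degrees resolves them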
As was mentioned previously, the configuration of the cells (10 degrees of separation between orientations, with 9x9 pixel receptive fields) was chosen based on two criteria: the separation is similar to the separation between cells in the human visual system, and 9x9 pixels was sufficient to separate these edges. However, this analysis was based on Complex Cells, not on End Stopped Cells. It seems that the short line segments required by End Stopped Cells cannot be separated with this configuration.

In order to generate the required cells, the configuration must be altered. There are two options for making this alteration: the separation between cell orientations can be increased, or the size of the receptive fields can be increased. The decision between these options is somewhat arbitrary. Since 9x9 is already a fairly large receptive field, and there is some room in the cell orientation separation (cells in the human visual system tend to have a separation of between 10 and 15 degrees), the angle between cells was increased to 15 degrees.

6.2.1. New Cell Configuration Summary

Receptive Field: 9x9 pixels
Orientation Separation: 15 Degrees
Number of Complex Cells: 12
Number of End Stopped Cells: 24

6.3. Training Cycle 03

Until this point, all cells have been built assuming perfect input, in the hope that the cells would generalize to cases where the input has some unexpected conditions. This works well for Complex Cells, since edge maps provide quite good input to those cells. However, Complex Cells do not always produce a perfect response for the End Stopped Cells.

Figure 24 - Partially Activated End Stopped Inputs. When the Complex Cell's output is not at perfect strength, the End Stopped Cell's output should be scaled to match.

When the image presents unexpected edges to the Complex Cells, they will occasionally produce partial activation. That is to say, rather than generating a 0 or a 1, the cell will react with some in-between value, such as 0.6 or 0.7. When an End Stopped Cell is presented with such a partially activated cell as input, it can produce unexpected results. In order to deal with these partially activated cells, new positive training examples were generated which are themselves partially activated. The target for these partially activated inputs is scaled to match the activation level of the cell input. So, if the input to the End Stopped Cell is only 80% of the normal input, then the cell will react with 80% activation. If the Complex Cell provides only 40% input, then the End Stopped Cell will provide 40% activation.
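Generating these scaled examples is straightforward. A minimal sketch, assuming patches are arrays of activations in [0, 1] and that a handful of fixed strength levels is used (the exact levels are an assumption):

    import numpy as np

    def scaled_positives(patch, levels=(1.0, 0.8, 0.6, 0.4)):
        # From one full-strength positive End Stopped training patch,
        # derive partially activated variants whose targets are scaled
        # to match the input strength.
        return [(patch * s, s) for s in levels]   # (input, target) pairs

    # An 80% Complex Cell response should yield an 80% End Stop response:
    example = np.zeros((9, 9))
    example[4, :5] = 1.0    # an edge ending at the centre of the field
    for x, t in scaled_positives(example):
        print(x.max(), "->", t)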
6.4. Training Cycle 04

The next issue to occur when building End Stopped Cells is another one caused by unexpected processing in the Complex Cell. It seems as though, in certain circumstances, an off-angle edge can produce some stray activation around the edge, which causes the Complex Cell output to alternate between somewhat thicker and thinner sections along the edge that it finds.

Figure 25 - Complex Cell Output with Differing Thickness (Test Image; 45 Degree Cell Output). When an off-angle edge is composed of small jagged lines, small imperfections in the Complex Cell output can arise.

Figure 26 - End Stopped Output with False Positives (45 Degree End Stop; 135 Degree End Stop). These imperfections in the Complex Cell can produce false activation in the End Stopped Cell, which can be trained out with some particular training examples.

It seems that the transition between the thicker and thinner sections of the Complex Cell output is sufficient to stimulate the End Stopped Cells, at least to a small degree. There are two options that could be used to correct this issue. It is reasonable to believe that this is an issue with the Complex Cell; however, there is no straightforward technique which can separate the false positives in the Complex Cell output from the correct cell output. Instead, this issue can be resolved by adding thick-to-thin negative training examples to the End Stopped Cell training examples.

Figure 27 - New Negative Training Example.

7. Off Angle Edges

Once the cells are working reasonably well, there is still quite a bit of work that must be done to fine-tune them to ensure correct results. The main issue when creating these cells concerns edges that do not lie exactly along the orientation that the cell is designed to respond to. In any realistic situation, most edges will not line up exactly with the cell's primary orientation. The cells must be able to respond to angles that are near the correct orientation. In essence, these cells need to react not to a particular angle, but to a region of arc.

7.1. Arc Region Transition

Clearly, a cell should react to an edge that is very close to its main target angle. Also, the cell should not react to an edge that is very close to the next cell over. However, it is not clear how the cell should react to an edge that lies immediately in between the two cells' main orientations. Consider Figure 28. Orientation A and Orientation A+1 are adjacent to each other, and each has a cell trained to react to that orientation. Now, if cell A is presented with an edge that is some small orientation away from cell A's main orientation (in green), then clearly cell A should react. However, if presented with an edge that is much closer to cell A+1 (in red), then cell A should not react. The issue in question concerns an edge (in blue) that lies immediately between cell A's and cell A+1's main orientations.

Figure 28 - Off Angle Activation. Most edges do not lie exactly on the main orientation of a cell. Instead, rules must be developed which determine how off-angle edges transition from one cell to the next.

There are many possible ways of dealing with this situation. The most obvious approach is to try and draw a hard boundary between the two cells' activation regions. An angle is chosen that lies in between Orientation A and Orientation A+1; call it Orientation A+1/2. Now, if the edge being tested by cell A has an orientation less than Orientation A+1/2, then the cell activates. If that edge has an orientation greater than Orientation A+1/2, then it does not.

Figure 29 - Hard Separation between Neighboring Cells. Early cells attempted to classify an edge with a particular cell. A hard decision boundary was established between neighboring cells, and an edge was to be classified according to which side of the boundary it fell on. This approach was doomed to failure.

The other option is to have some overlap in the reactions between the cells. In this case, an edge that lies right between the two cells' main orientations will cause both cells to react. This can be illustrated with Figure 30. Cell A is trained to react to the blue region. Cell A+1 is trained to react to the red region. In addition, there is a purple region, and both cells will react to an edge whose orientation lies in it. From here on, this purple region will be known as the Joint Activation Region.

Figure 30 - Overlapping Cell Activation Regions. Later cells incorporated a Joint Activation Region - a small set of angles that could stimulate both neighboring cells.
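To make the second option concrete, the following sketch assigns training targets for two neighboring cells. The 0.75 degree half-width is illustrative only, chosen so the shared band is roughly 10% of the 15 degree spacing, in line with the region sizes reported in the next section.

    def region_targets(theta, a=0.0, sep=15.0, half_joint=0.75):
        # Training targets (cell A, cell A+1) for an edge at `theta`
        # degrees. Cell A sits at orientation `a`, cell A+1 at `a + sep`.
        # Edges within `half_joint` of the midpoint fall in the Joint
        # Activation Region and train both cells to respond.
        mid = a + sep / 2.0
        if abs(theta - mid) <= half_joint:
            return 1.0, 1.0          # Joint Activation Region
        return (1.0, 0.0) if theta < mid else (0.0, 1.0)

    print(region_targets(7.5))    # (1.0, 1.0) -- right between the cells
    print(region_targets(3.0))    # (1.0, 0.0) -- clearly cell A's edge
    print(region_targets(13.0))   # (0.0, 1.0) -- clearly cell A+1's edge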
7.2. Issue Analysis

Through experimentation, it was found that the second option is the only one which can actually work. The problem with attempting to define an orientation that serves as a hard boundary between two cells' output regions is that small segments of differently oriented lines can look identical. When two lines have similar but different orientations, it often happens that certain segments of those edges will look exactly alike. The consequence is that an edge on one side of the hard cutoff line will have areas that look exactly like areas on the other side. Attempts to build a network with a hard cutoff at a particular orientation can never succeed, because the network would be required to be both active and inactive for the same line segment. Consider Figure 31. While the two edges have very different orientations, and should produce reactions from different cells, there are segments of those edges that are identical and cannot be separated.

Figure 31 - Shared Segments in Differently Oriented Cells (React to only 0 Degree Cell; Identical Regions; React to only 15 Degree Cell). Although these two edges should be classified by different cells, sub-regions of these edges are identical.

In fact, there is no method to separate the two regions in a perfectly clean manner. No matter where the border between the two regions lies, there will be orientations immediately outside of the activation region which have segments that still trigger the cell. Instead, the cells will react with degrading activation as the edge being tested moves further away from the cell's target angle. It has been found experimentally that the best results are obtained when there is a small area where two neighboring cells will both react. Edges outside of the joint activation region provide partial reaction throughout the activation region of the neighboring cell. In this case, partial activation means that certain segments of the edge will cause the cell to activate and other segments will not. This Partial Activation Region arises implicitly; no special training examples are needed to cause it.

Figure 32 - Actual Reaction of Cells to Off-Angle Edges. In order to produce cells that operate properly, both Joint and Partial Activation Regions must be established. Joint regions stimulate both neighboring cells. Edges in a partial activation region may cause some stray activation of the wrong neighboring cell, but not enough to later trigger the End Stopped Cell to react.

When training cells, the goal is to carefully balance the on and off responses of the cell. This implies that training examples must be chosen to meet some very specific criteria:

• on responses must be clearly separated from off responses
• a cell should never produce an on response for its neighbor's primary orientation
• the region of joint activation between neighboring cells must be minimized
• for Complex Cells, the vast majority of the partial activation region should provide only stray activation, which is insufficient to stimulate the End Stopped Cells
• for End Stopped Cells, there should be a minimum amount of stray activation caused by the partial activation regions
Cells that meet these conditions have been found experimentally. For Complex Cells, a shared activation region of approximately 5% to 10% of the total activation region produces reasonably good results. End Stopped Cells must be built with a smaller joint activation region: if more than 2% to 5% of the total activation region is trained as a joint activation region, it becomes inevitable that a cell will be trained to react to its neighbor's primary orientation.

8. Result Analysis

The following sections display images which have been included to demonstrate the functionality of these cells. They have been created to show how the different cells react to specific areas of the edge map. The results of the neural network outputs have been combined into a single image. The Complex Cell results have been loaded into the red component of the image. The green and blue components contain the results for the pair of corresponding End Stopped Cells (for the angle that matches the Complex Cell, and for 180 degrees opposite). A point that only stimulates the Complex Cell will be marked in red. A point that stimulates the End Stopped Cell of the same orientation as the Complex Cell will be marked in yellow (red + green). A point that stimulates the End Stopped Cell 180 degrees opposite of the Complex Cell orientation will be marked in magenta (red + blue).

Figure 33 - Colours Used to Represent Complex and End Stopped Output. Red indicates that a Complex Cell has been activated, yellow shows where an End Stopped Cell of the matching orientation is active, and magenta indicates that the oppositely oriented End Stopped Cell is active.

These images have been created specifically to demonstrate the activation of these cells in certain circumstances. They are heavily processed in order to demonstrate this functionality, but they do represent a true and complete record of the cell output in that area. The unaltered outputs for the testing situations have been provided in the Appendix.

For the sake of brevity, throughout this analysis the Complex Cell and the two matching End Stopped Cells will be referred to only by the orientation that the Complex Cell has been trained to respond to. For example, the 30 degree cells will include the Complex Cell trained to 30 degrees, the End Stopped Cell trained to 30 degrees and the End Stopped Cell trained to -150 degrees.
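The composite images in the following sections could be produced along these lines. A minimal sketch, assuming each cell's output is a 2-D activation map in [0, 1]; the function and array names are hypothetical.

    import numpy as np

    def composite(complex_out, end_same, end_opposite):
        # Pack one orientation's outputs into an RGB image, per the
        # colour scheme above: red alone = Complex Cell; red + green
        # (yellow) = matching End Stopped Cell; red + blue (magenta)
        # = oppositely oriented End Stopped Cell.
        h, w = complex_out.shape
        img = np.zeros((h, w, 3))
        img[..., 0] = complex_out     # red: Complex Cell
        img[..., 1] = end_same        # green: same-orientation End Stop
        img[..., 2] = end_opposite    # blue: opposite End Stop
        return img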
8.1. Correct Line Interpretation

Figure 34 - Correctly Interpreted Edges. An edge is correctly classified by a connected series of Complex Cell outputs, bounded by matching and oppositely oriented End Stopped outputs.

Figure 34 shows how the cells react to an edge. Proper interpretation of an edge should consist of a complete red line along the edge (Complex Cell activation), with a yellow mark (Complex Cell and matching-orientation End Stopped Cell) at one end and a magenta mark (Complex Cell and opposite-angle End Stop) at the other. This is exactly what the output of the cells shows. The 45 degree cells clearly separate out the boundary between the wall and the floor, while the -60 degree cells have found the wall/ceiling boundary. The ends of these boundaries (where the edges combine with other edges to form joints) have been clearly labeled in yellow and magenta.

8.2. Correct Joint Interpretation

Figure 35 - Correctly Interpreted Joint. A joint is classified by a point where several differently oriented End Stopped Cell outputs are all triggered.

Figure 35 displays the correct interpretation of a joint. Since the joint is clearly defined in the underlying edge map, the joint is easily found. The Complex Cells have clearly identified the edges of the correct orientation. Notice that each cell only finds the edge(s) that match that cell's orientation. Furthermore, End Stopped activation is correctly generated at the end of each line meeting at the joint. For example, the output of the 0 degree Complex Cell has caused the 0 degree End Stopped Cell to activate. This can be seen in the yellow mark at the end of the 0 degree output. In contrast, both the -45 and 90 degree Complex Cells have caused activation of the End Stopped Cells that are 180 degrees opposite of the Complex Cell orientation (135 and -90 degrees respectively). These outputs can be clearly seen in the magenta spots at the end of the matching output. A joint can be seen as a point where several End Stopped Cells are activated in the same area. In theory, the cells would be activated at exactly one pixel; in any realistic scenario, however, those activations will be grouped into a nearby region.

8.3. Curve Interpretation

Figure 36 - Activation Around a Curve. Each Complex Cell identifies the regions of a curve that match its target orientation. Nearby edge segments will tend to be related to each other by neighbor-orientation End Stops. Rather than seeing several joints around the edge, we tend to see the joining of neighboring cells as one continuous curving edge.

The cells react not only to edges and joints, but also to curves. Since a Complex Cell is essentially a first derivative operator, it reacts to segments of the curve that are close to the cell's orientation. As was already pointed out, there is some overlap in the activity between adjacent cells. This has a consequence when dealing with curves: adjacent End Stopped Cells do not activate at a single pixel. Instead, in many circumstances they will tend to activate along the curve, but potentially some small distance away from each other. They will activate on the curve because End Stopped Cells are trained to activate when the pixel being tested has a strong output value.
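How a curve is divided among the orientation channels can be illustrated with a small sketch: each local tangent orientation is claimed by the nearest cell, so a closed curve is partitioned into arcs, one per channel. This is a hypothetical illustration using the 15 degree separation, not the thesis's code.

    def responding_cell(tangent_deg, sep=15):
        # The Complex Cell channel whose orientation lies nearest to a
        # local tangent orientation (orientations taken modulo 180).
        t = tangent_deg % 180.0
        return round(t / sep) * sep % 180

    # Tangents around a circle drift through every channel in turn, so
    # each cell sees only the arc whose tangents are near its orientation:
    for step in range(0, 180, 20):
        print(step, "->", responding_cell(step))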
8.4. Off-Angle Orientation Interpretation

As discussed previously, adjacent Complex and End Stopped Cells are trained to have overlapping regions of activation. As a consequence, some edges will be duplicated in the cell outputs. That is to say, the edge will show up in adjacent output maps. Consider an edge with a 23 degree orientation (seen in Figure 37). This edge is in the Joint Activation Region between the 15 and 30 degree oriented cells (22.5 degrees being the exact middle). As expected, both cells find this edge. Careful examination of the 15 and 30 degree cells shows that the edge and its boundaries have been identified by both sets of cells.

Figure 37 - 23 Degree Edge Triggers Multiple Cells. An edge that has a twenty-three degree orientation will trigger a reaction from both the fifteen and thirty degree oriented Complex Cells.

An edge with a 20 degree orientation, while only separated by 3 degrees from the previous example, is processed quite differently. Its orientation falls in the 15 degree cell's Activation Region and the 30 degree cell's Partial Activation Region. In this case, the 15 degree cells properly identify the edge. In contrast, the 30 degree cells only find small portions of the edge. Dashed lines are produced as the Complex Cells react to some portions of the line and not to others. This in turn triggers the End Stopped Cells to activate all along the edge as well.

Figure 38 - 20 Degree Edge Partially Activates Cell. A twenty degree edge also triggers both the fifteen and thirty degree Complex Cells.

Taking a closer look at this partial activation, it can be seen that the cells actually find many small line segments. These line segments line up nicely, with each End Stopped Cell matching an oppositely oriented cell. The edges in a cell's Partial Activation Region are thus represented by a chain of edges. The task of combining the links of this chain into a single edge is left to future work.

Figure 39 - Partially Activated Cells on an Off-Angle Edge. The thirty degree cell only partially identifies the edge. It identifies small regions which must be stitched together by later processing via the matched End Stopped Cell activation.

The way that the cells react to edges in the Partial Activation Region has one interesting property: it displays the need for a mechanism to combine edge segments into a single larger edge. This behavior is very similar to behavior exhibited by humans. There is a well-known optical illusion in which humans automatically fill in missing edge segments (see Figure 40) and combine edge segments into larger edges. It seems as though the human visual system is dealing with issues similar to those generated implicitly by the cells of this model.

Figure 40 - Edge Reconstruction Optical Illusion. There seems to be a similar need to combine matched edges in the human visual system, as this occurs in this well-known optical illusion.

8.5. Rounded Edges

Taking a first glance at the results generated by the hallway image, it is tempting to believe that there is a lot of noise in these images. However, a closer examination of the cell outputs around these "noisy" spots reveals some interesting details. It can be seen in Figure 41 that the short line segments found in the -15 and -45 degree cell outputs are not actually noise, but reflect small changes in the orientation of the edges around the corner.

Figure 41 - Cell Reaction to Rounded Joint. Noisy output from the edge detection algorithm can cause the edge map to have a rounded appearance at the joints. To classify this, neighboring cells react around the curve. Non-neighbor cells then combine at their End Stops to form the joint.

Figure 42 - Cell Reaction to Imperfect Edges and Joint. Another example where a rounded corner is interpreted as neighboring cell outputs forming a single curved edge. These rounded edges join together with non-neighbor End Stopped Cells.

The region around the door frame displays similar "noise". That is to say, small line segments displayed in the cell output actually reflect small imperfections in the underlying edge map. Careful study of the edge map will reveal that the lower portion of the frame has a small deformation causing the edge to bend slightly upwards. This deformation is matched by a small line segment detected by the 15 degree cells. All such noise in the cell output can be matched to small deformations in the underlying edge map (including the rounded edges seen in the previous example). It seems that the mechanism used to stitch line segments together into a longer single edge must be expanded to search neighboring cells rather than just the output of a single cell. That way, the small rounded and jagged sections will be properly incorporated into the larger edge to which they correctly belong. Another way of stating this is that an Edge Recognizer must track segments across nearby cells, and not just react to a long reaction in a single orientation.
8.6. End Stopped Cells and Joint Identification

Figure 43 - Location of End Stopped Cell Activation. It can be seen that each joint is comprised of a cluster of End Stopped Cell activations. In addition, End Stopped Cells are activated around curved areas, allowing for arbitrary curve tracking.

One of the reasons for embarking on the work in this thesis was the belief that the End Stopped Cells could be used as joint recognizers. It has already been shown that the End Stopped Cells do activate at junctions. However, there has still been no explicit check of how strong the relationship between joints and End Stopped Cell activation actually is. The location of the End Stopped activation in relation to the original edge map is displayed in Figure 43. The relationship between End Stopped Cell activation and the joints in the edge map is immediately obvious: every single joint has been clearly marked by the End Stopped Cells. There are also a number of places where the End Stopped Cells have activated which are not joints.

The non-joint activations of the End Stopped Cells can be categorized into two types. First, the cells fire all around a curve (i.e., the light fixture). Since the End Stopped Cells are reacting to changes in the Complex Cell activation, and the circle has continuously changing orientation, there is considerable activation of the End Stopped Cells. It seems that End Stopped Cells serve multiple purposes; joint identification and aids to curve tracking are prime examples. The End Stopped Cells around this curve fire along a neighboring Complex Cell, and close to that Complex Cell's matching End Stopped Cell. In contrast, End Stopped Cells at joints lie at almost exactly the same point as an End Stopped Cell from a non-neighboring orientation.

The second type of place where End Stopped Cells can fire is where there is a discontinuity in the original edge map. When there is a small gap in an edge, or that edge is somewhat uneven, the resulting breaks in the Complex Cell output will cause the End Stopped Cells to activate. It has already been mentioned that there will have to be a mechanism to chain together continuous edges that have been identified. This chaining of small edge segments will remove these sorts of End Stops from consideration. It should be noted that the chaining of small edge segments into one larger edge is very similar to the curve tracking already discussed. The only real distinction is that chaining is along the same orientation whereas curve tracking is along neighboring orientations. It seems likely that the mechanism that does one will be the same mechanism used to do the other.

The conclusion of this study is that End Stopped Cells can be used for two purposes. If an End Stopped Cell can be matched to another End Stopped Cell with a nearby (either neighboring or exactly opposite) orientation, then it indicates that the line segments should be chained together to form one coherent object (either a straight line or a curve). However, if the End Stopped Cell can be matched to one or more distant End Stops, then the End Stop could represent a joint. In the case of the rounded corners discussed previously, both of these features can be seen. Each edge coming into the joint can be chained to smaller edge segments, causing a rounded appearance. These extended edges then have non-neighbor End Stops (from the ends of the short segments added to the edges) which fall at common points that represent a joint.
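This two-way rule is easy to state as code. A minimal sketch, assuming each End Stopped activation carries the orientation of its cell in degrees; the function name and the use of the 15 degree neighbor spacing as the matching tolerance are assumptions.

    def interpret_end_stop_pair(theta_a, theta_b, sep=15):
        # Classify a pair of co-located End Stopped activations per the
        # rule above: nearby (neighboring or exactly opposite)
        # orientations imply one chained edge or curve; distant
        # orientations imply a joint between distinct edges.
        d = abs(theta_a - theta_b) % 360
        d = min(d, 360 - d)                    # smallest angular distance
        if d <= sep or abs(d - 180) <= sep:
            return "chain"
        return "joint"

    print(interpret_end_stop_pair(30, 45))     # chain (neighboring arcs)
    print(interpret_end_stop_pair(30, -150))   # chain (exactly opposite)
    print(interpret_end_stop_pair(0, 90))      # joint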
9. Future Work

The research in this thesis has focused on the development of a logical hypercolumn: the building block from which other vision systems could be built. However, this is clearly only the first step in building a robust vision system.

The first work to be done should be the development of cross-hypercolumn cells. These cells would be used to join together and extend edges of the same or neighboring orientations which have been found in the immediate area. This would clean up a lot of the noise found around joints, as well as reproduce the edge-completion effect found when a single edge lies in a cell's Partial Activation Region.

The logical hypercolumns described here have been based on changes in grey scale image intensity. However, this is not the only feature which can be used to segment an image. Other types of features that could be considered include colour, surface orientation, pattern and (if a video camera is employed) motion. There are other visual areas in the human visual system that process these types of features, and a more thorough study of those areas could produce better image segmentation than intensity alone. The orientation and end-stop results of all the different discriminators could then be combined to produce a single, reliable set of features which could be sent on to higher level processes (i.e., to the object recognition and localization streams of the visual system).

The next major piece of work that must be considered in the future is the development of the higher-level image interpretation streams. This includes both the Image Localization and the Image Interpretation streams of the human visual system. It would require a detailed study of how these systems work in humans, so that a detailed model of these parts of the system could be built and implemented.

Finally, the human system has a large amount of feedback built into it. The nature of this feedback could provide valuable information to earlier portions of the vision system, particularly if the system is processing motion (as the change from image to image over time could be processed). Careful study of the mechanisms through which the current state of the vision system informs the processing of new information could provide some very interesting new mechanisms for future upgrades to the cells developed here.

10. Appendix

Following are the raw outputs from the Complex and End Stopped Cells when presented with a number of different testing examples. As has been done previously, the cells with matching orientations have their outputs displayed together for easier reference.

10.1. Exact Target Angles

This section displays the results of the Complex and End Stopped Cells against an artificial edge map containing lines that match exactly the target edges that the cells have been trained to react to.

10.1.1. -75 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.2. -60 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.
10.1.3. -45 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.4. -30 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.5. -15 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.6. 0 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.7. 15 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.8. 30 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.9. 45 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.10. 60 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.11. 75 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.1.12. 90 Degree Results
Edge correctly found by Complex Cell; End Stopped Cells correctly identify the start and end points of both edge segments.

10.2. 20 Degree Line

10.2.1. -75 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.2. -60 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.3. -45 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.4. -30 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.5. -15 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.6. 0 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.7. 15 Degree Results
These cells correctly identify the nearby-orientation edge presented.

10.2.8. 30 Degree Results
These cells partially identify the nearby edge presented. Extra End Stopped Cell outputs are generated, which can later be used to stitch together the dashed output.

10.2.9. 45 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.10. 60 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.11. 75 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.2.12. 90 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3. 23 Degree Line

10.3.1. -75 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3.2. -60 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3.3. -45 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3.4. -30 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3.5. -15 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3.6. 0 Degree Results
No reactions from any cells, since there are no edges that match this orientation.
10.3.7. 15 Degree Results
These networks correctly identify the nearby angle.

10.3.8. 30 Degree Results
These networks correctly identify the nearby angle.

10.3.9. 45 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3.10. 60 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3.11. 75 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.3.12. 90 Degree Results
No reactions from any cells, since there are no edges that match this orientation.

10.4. Hallway Results

The following image represents a more realistic test of the V1 model: an image closer to what a deployed system would have to interpret is presented to the system. Preprocessing steps are applied to this image and an edge map is generated, which is then fed to the Complex and End Stopped Cells.

10.4.1. -75 Degree Results
Small line segments aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments.

10.4.2. -60 Degree Results
Small line segments and one longer edge aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments and edge.

10.4.3. -45 Degree Results
Small line segments and a long edge aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments and edge.

10.4.4. -30 Degree Results
Small line segments aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments.

10.4.5. -15 Degree Results
Small line segments aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments.

10.4.6. 0 Degree Results
Small line segments as well as long edges aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments and edges.

10.4.7. 15 Degree Results
Small line segments aligned in the matching orientation are correctly identified. As well, a small amount of stray activation occurs. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments. The stray activation of the Complex Cell is not sufficient to produce activation of the End Stopped Cells, and cannot affect further processing.
10.4.8. 30 Degree Results
Small line segments aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments.

10.4.9. 45 Degree Results
Small line segments as well as edges aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments as well as the edges.

10.4.10. 60 Degree Results
Small line segments as well as long edges aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments and edges.

10.4.11. 75 Degree Results
Small line segments aligned in the matching orientation are correctly identified. In addition, an edge with an orientation close to the matching cell's orientation is partially identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments. The partial activation of the Complex Cell is not sufficient to produce activation of the End Stopped Cell.

10.4.12. 90 Degree Results
Small line segments as well as long edges aligned in the matching orientation are correctly identified. Similarly oriented End Stopped Cells correctly identify the start and end of the identified line segments and edges.