A BASIS FOR PRONOMINAL ANAPHORA RESOLUTION USING A MODEL OF WORKING MEMORY AND LONG-TERM MEMORY

Clifford James Thompson
B.Sc., University of Northern British Columbia, 2001

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Mathematical, Computer, and Physical Sciences (Computer Science)

The University of Northern British Columbia
May 2006

© Clifford James Thompson, 2006

Abstract

This thesis presents a new theory of information modelling in natural language processing that attempts to resolve anaphoric references, while also addressing the problem of knowledge complexity. A modular model of semantic representation is introduced that addresses the deficiencies of existing representations, as well as the drawbacks associated with expanding these semantic representations. Rather than using a single semantic representation to model human knowledge and the knowledge within a sentence, the theory proposes a modular, multi-level model which is based around a semantic network. The behaviour of the model uses theories on the nature of working and long-term memory from cognitive psychology. Two methods of artificial neuron activation and decay were implemented - the ACT-R model and the Thompson model. Maximum success rates of 54.10% and 83.61% were achieved for The Three Brothers corpus, and maximum success rates of 56.00% and 86.67% were achieved for the Rumpelstiltskin corpus.

This thesis is approved for recommendation to the Graduate Council.

Thesis Director: Dr. Charles Brown
Thesis Committee: Dr. Charles Brown, Dr. William Owen, Dr. Liang Chen

Table of Contents

Abstract
List of Figures
List of Tables

Chapter 1 - Overview
1.1 Introduction
1.2 Pronominal Anaphora Resolution
1.3 Outline of Thesis

Chapter 2 - Declarative Semantic Structures
2.1 Semantic Networks
2.2 Conceptual Graphs
2.3 Conceptual Hierarchies
2.4 Event Timeline Models
2.5 Thematic Roles
2.6 Logical Form and Quasi Logical Form
2.7 Conclusion

Chapter 3 - Working & Long-Term Memory
3.1 Working Memory Models
3.2 Long-Term Memory Models and Associativity
3.3 ACT-R
3.4 Computational Models of Memory
3.5 Neural Network Models of Associative Memory
3.6 Conclusion

Chapter 4 - Theory on Combining Semantic Structures
4.1 Linking Semantic Structures
4.2 Modularization and The Human Mind
4.3 Conclusion

Chapter 5 - Anaphoric Reference and Reference Resolution
5.1 Types of Anaphoric Reference
5.2 Problems in Anaphora Resolution
5.3 Anaphora Resolution Algorithms
5.4 Resolution Failure
5.5 Conclusion

Chapter 6 - System Modelling
6.1 Modelling Long-Term Memory
6.2 Working Memory
6.3 Prolog Model of Working Memory
6.4 English Grammar Rules
6.5 Annotated Parse Tree Model
6.6 Lexical Feature Sets
6.7 Anaphora Resolution Algorithm
6.8 Summary of Chapter

Chapter 7 - System Testing
7.1 Overview of Testing Methodology
7.2 Testing Platform
7.3 Decay Rates
7.4 Default Memory Configuration
7.5 Testing Procedure
7.6 Anaphora Resolution Results
7.7 Working Memory Capacity Results
7.8 Discussion of Results
7.9 Classification of Observed Error Types

Chapter 8 - Conclusions and Future Work
8.1 Comparison with Related Work
8.2 Future Work
8.3 Conclusion

Bibliography
Appendix A - Corpora
Appendix B - Tagged Corpora

List of Figures

2.1 Probabilistic Semantic Network
2.2 Conceptual Relations of Arity (a) 1, (b) 2, and (c) 3
2.3 Conceptual Graph for Mary gave John the book
2.4 Conceptual Graph for John is going to Boston
2.5 Another Conceptual Graph for John is going to Boston
2.6 Conceptual Hierarchy
2.7 Timeline Model
2.8 Reichenbach Timeline Model
2.9 [Altmann 1999] Thematic Implausibility
2.10 Sentences Implying The Same Event
2.11 Logical Form Rules from the Core Language Engine [Alshawi et al 1989]
2.12 Logical Form for Every doctor visited Mary
3.1 Baddeley and Hitch's Working Memory Model
3.2 Hunt's Distributed Memory Model
3.3 Biological Neuron
3.4 Multi-Input Neuron
3.5 Multi-Layered Artificial Neural Network
4.1 Semantic Network Linking Semantic Representations
4.2 Multiple Semantic Representations
5.1 Brown's Anaphora Resolution Algorithm
5.2 Two Parsings of I saw a man with a telescope
5.3 Mitkov's Anaphora Resolution Algorithm
6.1 Annotated Parse Trees with (a) Unresolved and (b) Resolved Antecedents
6.2 Pronoun Reference Resolution Algorithm Pseudo-code
7.1 The ACT-R Model Resolution Results - 3 Brothers
7.2 The ACT-R Model Resolution Results - Rumpelstiltskin
7.3 The Thompson Model Resolution Results - 3 Brothers
7.4 The Thompson Model Resolution Results - Rumpelstiltskin
7.5 The ACT-R Model Working Memory Max Capacity - 3 Brothers
7.6 The ACT-R Model Working Memory Max Capacity - Rumpelstiltskin
7.7 The Thompson Model Working Memory Max Capacity - 3 Brothers
7.8 The Thompson Model Working Memory Max Capacity - Rumpelstiltskin
7.9 Working Memory Contents Comparison - 3 Brothers
7.10 Working Memory Contents Comparison - Rumpelstiltskin

List of Tables

2.1 Example Sentences Taken from [Allen 1995]
2.2 Thematic Roles
6.1 Semantic Network Predicates
6.2 Lexical Feature Set
6.3 Semantic Feature Set
8.1 Algorithm Comparison

Acknowledgements

This thesis would not have been a success without the patience and hard work of many individuals to whom I am greatly indebted. First, and foremost, I must thank my supervising professor, Dr. Charles Brown, for introducing me to the field of Artificial Intelligence and encouraging me to enter graduate studies. Your selfless dedication of time and effort will always be remembered. I must also thank Dr. William Owen for his advice in a field that was completely foreign to me. Your help and direction allowed me to achieve the multi-disciplinary approach that I strived for. I also would like to thank Dr. Liang Chen for his advice and comments, Dr. Paul Siakaluk for taking the time to be the external examiner for my thesis, Erin Beveridge for the original LaTeX style files, and UNBC for awarding me the Graduate Entrance Scholarship (which made the decision of graduate studies much easier).
And finally, I must thank my wife Coral. Without your love, advice, encouragement, and general kick in the butt, this thesis would not have been possible.

CHAPTER ONE
Overview

She had a pretty gift for quotation, which is a serviceable substitute for wit.
- W. SOMERSET MAUGHAM

1.1 Introduction

It seems ironic that although natural languages are very difficult to model, the languages themselves are quite effective and efficient for communication. If humans use their native language with ease, then why is it so hard for computers to understand natural languages? One of the most obvious answers is that the human brain is so complex. The complexity of human knowledge, and the medium on which it is stored and processed, cannot be overstated. This thesis presents a new theory of memory modelling in natural language processing that attempts to resolve anaphoric references, while also addressing the problem of complexity. Rather than using a single semantic model to represent human knowledge and the knowledge within a sentence, the theory proposes a more general model where multiple semantic representations can be used in a system that models the observed behaviour of working and long-term memory.

1.2 Pronominal Anaphora Resolution

The goal of this thesis was to develop a multi-level model of human memory that is modular and flexible, processes multiple well-known semantic representations such as semantic networks, conceptual graphs and quasi-logical form, and uses these models to resolve anaphoric references. In particular, this thesis focused on the resolution of simple pronouns such as he, she, they, etc. In the most general of cases, pronoun resolution is quite simply a matter of searching backwards through a corpus of text until the first noun phrase that matches such attributes as number and gender is found:

    Bright and early the next morning, [the shoemaker]_1 rose and went to his work bench. To his amazement, there on the table were [two shoes]_2, already finished. [They]_2 were beautifully made, neat and true, and with not a single false stitch.

The situation can be made slightly more complex by making antecedents separate entities in the context:

    For some time that same thing happened, until [the good man]_1 and [his wife]_2 were thriving and prosperous. But [they]_{1,2} were not satisfied to have so much done for whom [they]_{1,2} should be grateful.

But of course, this is not always the case. The next example, adapted from [Sidner 1983], demonstrates where this method of resolution can break down:

    [My neighbours]_1 have [a monster Harley 1200]_2. [They]_2 are really huge but gas efficient bikes.

In the second sentence, if an individual was to read just the pronoun they, their initial preference for the reference may not be a monster Harley 1200 based on number alone. In this context, a common preference for the pronoun they would be my neighbours. After reading the remainder of the second sentence, it is apparent that this conclusion was incorrect. Given the additional context, common knowledge concludes that the neighbours are not motorcycles.¹

¹ That is, unless your neighbours actually are motorcycles.

1.3 Outline of Thesis

Chapter 2 discusses numerous semantic representations that have been introduced over the past few decades. The general domain of use is covered for each semantic representation, as well as some of the drawbacks of each structure.
Chapter 3 will discuss the evolution of theories of short-term and long-term memory, as well as current theories in human memory. Several modern psychological models of short-term and long-term memory will be elaborated on, as well as some computational memory models.

In Chapter 4, the discussion on semantic representations will move towards a theory on combining semantic representations to overcome their individual deficiencies, with the intention of creating a system that is easier to understand and easier to expand. The theory will also model human behaviour more closely.

Chapter 5 will discuss the current state of understanding in pronominal anaphoric reference and anaphora resolution. Several theoretical problems will be introduced and discussed. A number of modern anaphora resolution algorithms will be presented that attempt to solve anaphoric reference issues. The chapter will conclude with an examination of what occurs when humans fail to resolve anaphoric references.

In Chapter 6, the combined semantic representation model, memory models, and anaphora resolution algorithms will be integrated into a system that will attempt to solve the anaphoric reference problems introduced in Chapter 5.

Chapter 7 will cover the methodology for testing the implemented model, the results of testing, and a discussion of those results.

Chapter 8 will conclude the thesis with a comparison of the testing results against the results of other models, and a discussion of how the model presented in this thesis could be improved.

CHAPTER TWO
Declarative Semantic Structures

Oh, and sir, you're wrong. We won't be apart - we just won't be together.
- ARNOLD J. RIMMER (Holoship)

Although many types of semantic representations have emerged during the history of natural language processing research, understanding in the domain of semantics is still limited. Some models fall short and are intended for a limited knowledge domain. Others can be expanded, but the resulting expansions are often unclear or more difficult to computationally manage. In this chapter, a number of semantic representation models are examined. As each model is investigated, the shortcomings of each model will be shown. The examination of these shortcomings will lay the initial groundwork for a hypothesis on improving these models. By integrating each semantic model separately into a larger, multi-level system, it is hypothesized that the resulting system would be easier to expand than a system with a single complex semantic model, and would provide a diverse knowledge base from which an anaphora resolution algorithm, or group of algorithms, could draw. James Allen makes a statement in this vein in [Allen 1995]:

    ... a vigorous debate about knowledge representation is actually the result of each of the debaters focusing on one of the aspects of representation without considering the concerns of the other.

Humans apply much implicit knowledge when understanding an utterance. Information in long-term memory is not considered in many structures, and even if it is, the information is stored only at the discourse level. Ignoring the complexity of a human knowledge base only trivializes the vast learning power of the human mind.

2.1 Semantic Networks

Some of the earliest research with respect to semantic networks can be found in [Quillian 1968] and [Collins and Quillian 1969]. Semantic networks were first introduced as a model of human memory. How semantic networks are realized as models in computation is quite a broad topic.
Interpretation varies from graphs with concepts as nodes and the associations between the nodes as links, to more complex graphs such as Sowa's conceptual graphs [Sowa 1984] or conceptual hierarchies [Ma and Isahara 2000] [Chung and Moldovan 1993]. For the purposes of this thesis, semantic networks will be restricted to the first definition, graphs with concepts as nodes and associations as links.

Figure 2.1 represents a semantic network that has a strength associated with each link. The network roughly resembles an artificial neural network, which will be discussed further in Chapter 3. Each node¹ represents a single topic or concept. Thus, the relationship between two semantic concepts is based on the strength of the association between the two semantic concepts. The drawback of this model is that it only models a loose relationship between topics. It does not identify what the relationship is. The next example illustrates how drawing the appropriate knowledge from a semantic network would be difficult:

    John had a son named Bob. His son is an excellent skier.

In this example, an anaphora resolution algorithm would have a difficult time resolving the possessive pronoun His without the father-son relationship being modelled more explicitly. [Kazuhiro et al 1992] and [Berger et al 2004, Belew 1987] demonstrate examples of semantic networks with weighted links being used in kana-kanji conversion and information retrieval, respectively. In Chapter 4 we will see that semantic networks will not be used to model knowledge directly. Rather, they will be used to connect semantic concepts and their associated semantic representations.

¹ A node does not necessarily represent a single neuron within the human brain. A node could represent a group of neurons.

[Figure 2.1: Probabilistic Semantic Network]

2.2 Conceptual Graphs

Some of the earliest work with respect to conceptual graphs can be attributed to Sowa in [Sowa 1976, Sowa 1979, Sowa 1984]. Conceptual graphs are defined as directed bipartite graphs with two types of nodes. Each node in the graph can be either a concept or a conceptual relation. Concepts can be concrete (such as cat), or they can be abstract (such as sadness). Conceptual relations can have an arity of n ≥ 1. Figure 2.2 illustrates conceptual relations with various arities.

[Figure 2.2: Conceptual Relations of Arity (a) 1, (b) 2, and (c) 3]

Conceptual graphs are not limited to the simple relations shown in Figure 2.2. They can also model simple sentences, as seen in Figure 2.3. Since conceptual graphs are used extensively in database systems, relational database theory allows us to perform certain operations to obtain new conceptual graphs, such as copy, restrict, join, and simplify.

[Figure 2.3: Conceptual Graph for Mary gave John the book [Sowa 2000]]

In their basic form, conceptual graphs do not model the strength of relationships. Common knowledge dictates that information stored in long-term memory is not as concrete as the conceptual graph models it to be. Fuzziness with respect to relations is not accounted for. Conceptual graphs, as defined by Sowa, do not model temporal information implied by verb tense and verb aspect. This deficiency is apparent in Figure 2.4, adapted from [Sowa 2000].

[Figure 2.4: Conceptual Graph for John is going to Boston]

It is apparent that the tense and aspect have been lost for the verb phrase is going. Tense and aspect could be reflected by adding another relation, as shown in Figure 2.5.

[Figure 2.5: Another Conceptual Graph for John is going to Boston]
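Rendered in the Prolog used later in this thesis, the graph of Figure 2.3 might be written as the facts below. This is only an illustrative sketch - the node identifiers and relation names are assumptions, not notation taken from Sowa:

    % A hedged sketch of Figure 2.3 as Prolog facts: concept nodes plus
    % binary conceptual relations between them.
    concept(give1, give).
    concept(mary1, person).
    concept(john1, person).
    concept(book1, book).
    relation(agent,     give1, mary1).   % Mary performs the giving
    relation(recipient, give1, john1).   % John receives
    relation(object,    give1, book1).   % the book is what is given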
How would we model one event occurring before another event? Another relation would seem to be a possible answer. This process of adding relations to reflect previously missed information could lead to quite a rat's nest. It becomes difficult to separate verbs and nouns from other semantic information without adding more relations, and it potentially becomes a model that is more difficult to expand and maintain.

2.3 Conceptual Hierarchies

Conceptual hierarchies are a very common structure in object oriented programming. They allow programmers to show how certain objects inherit the properties of another object. Figure 2.6 illustrates an example of how classification and sub-classification have been observed by biology throughout the world. Subclasses inherit attributes from their superclass as well as adding their own attributes. Research in cognitive categorization, such as [Kay 1971, Rosch et al 1976], suggests that the human mind stores and groups information based on taxonomy in long-term memory. One of the main advantages of conceptual hierarchies is that they are very efficient at storing information [Ma and Isahara 2000]. Information common to many concepts is only stored once.

[Figure 2.6: Conceptual Hierarchy]

One of the main disadvantages of conceptual hierarchies is their intended use: modelling concepts and the inheritance of characteristics between concepts. They are not intended to model entire sentences or any kind of temporal information.

2.4 Event Timeline Models

The Bull Framework

In [Celce-Murcia and Larsen-Freeman 1999], Celce-Murcia and Larsen-Freeman adapted The Bull Framework [Bull 1960] (originally created for Spanish) for teaching ESL students English. The Bull Framework proposes four axes of time: past, present, future, and hypothetical. The first three axes contain a point of reference in the centre and the times occurring before and after the time of reference to the left and right, respectively. Figure 2.7 demonstrates an example using the verb ski. The fourth axis, hypothetical, is used to model hypothetical events, for example, events created using constructions like if...then.

[Figure 2.7: Timeline Model - placement of "I skied in the Whistler-Blackcomb backcountry" on the past, present, and future axes]

Since the Bull Framework was only intended for explaining verb tenses, it does not suitably model many aspects of natural languages such as nouns, adjectives, and adverbs.

The Reichenbach Theory

In [Allen 1995], Allen presents Reichenbach's theory on timeline representation from [Reichenbach 1947]. Reichenbach theorized that each verb embeds information about three points in time: time of speech (S), time of the event/state (E), and time of reference (R). In the simple aspect², the time of the event/state and the time of reference are always equivalent. This equivalency does not exist for the perfect and posterior aspects³. Figure 2.8 gives a table outlining the various timelines for different tense and aspect combinations. Table 2.1 shows some example sentences and their tense/aspect.

² It must be noted that the progressive aspect and perfect progressive aspect are missing from Allen's listings.
³ Allen refers to simple as being a tense, not an aspect.
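The orderings that Figure 2.8 tabulates can also be written down as facts. The following sketch assumes the standard Reichenbach orderings rather than reproducing the figure verbatim; sublists group points in time that coincide:

    % timeline(Aspect, Tense, Ordering): S, E, and R listed from earliest
    % to latest; points inside the same sublist are simultaneous.
    timeline(simple,    present, [[s, r, e]]).
    timeline(simple,    past,    [[e, r], [s]]).
    timeline(simple,    future,  [[s], [r, e]]).
    timeline(perfect,   present, [[e], [s, r]]).
    timeline(perfect,   past,    [[e], [r], [s]]).
    timeline(perfect,   future,  [[s], [e], [r]]).   % S vs E order not fixed
    timeline(posterior, present, [[s, r], [e]]).
    timeline(posterior, past,    [[r], [e], [s]]).   % E vs S order not fixed
    timeline(posterior, future,  [[s], [r], [e]]).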
Reichenbach's methodology is very similar to that of Bull's in that they both attempt to model a type of temporal ordering, which is implied by varying tense and aspect combinations in sentences. Naturally, Reichenbach's theory does not attempt to account for the nature of nouns, adjectives, and adverbs.

    Tense               Example Sentence
    Simple Present      Jack sings
    Simple Past         Jack sang
    Simple Future       Jack will sing
    Perfect Present     Jack has sung
    Perfect Past        Jack had sung
    Perfect Future      Jack will have sung
    Posterior Present   Jack is going to sing
    Posterior Past      Jack was going to sing
    Posterior Future    Jack will be going to sing

Table 2.1: Example Sentences Taken from [Allen 1995]

[Figure 2.8: Reichenbach Timeline Model - the orderings of S, E, and R for each tense/aspect combination]

2.5 Thematic Roles

Thematic roles are linguistic entities (embodied in the form of noun phrases) that satisfy certain semantic constraints implied by the main verb phrase of a sentence. The idea of thematic roles draws a close parallel to the morphological case systems found in languages such as German and Latin, but expands on the case system by adding a much larger number of cases. Verb phrases require that these thematic roles are present before a sentence can make sense semantically. Altmann demonstrated in [Altmann 1999] that even if all thematic roles are met for a verb phrase, if the antecedent of a thematic role is not plausible, the sentence will not make sense. Figure 2.9 gives an example of (a) an implausible antecedent to a thematic role, and (b) a plausible antecedent to a thematic role.

    (a) A young toddler was running around his playroom. It was empty except for some chairs in one corner and some pet cats in the other. He chased a chair that he had run into before.
    (b) A young toddler was running around his playroom. It was empty except for some chairs in one corner and some pet cats in the other. He bumped a chair that he had run into before.

Figure 2.9: [Altmann 1999] Thematic Implausibility

A major drawback of the Thematic Role model is that it only considers concepts at the sentence level. It does not attempt to address how concepts can be inter-related throughout a large body of text. The structure of English allows thematic roles to be located at different syntactic positions within a sentence. The result is a sentence with a different syntactic structure, and more emphasis can be placed on certain roles. Although the syntactic structure is different, when constructed properly, the new sentence should describe the same event. Figure 2.10 gives an example.

    (a) Bart threw a chicken at the house.
    (b) A chicken was thrown at the house by Bart.

Figure 2.10: Sentences Implying The Same Event

It is apparent that the antecedents of thematic roles can be extracted from a sentence based on their semantic contribution to that sentence, rather than the syntactic contribution. Table 2.2 outlines some of the thematic roles proposed by Sowa in [Sowa 2000].

    Agent, Beneficiary, Completion, Destination, Duration, Effector,
    Experiencer, Instrument, Location, Matter, Medium, Origin, Path,
    Patient, Point In Time, Recipient, Result, Start, Theme

Table 2.2: Thematic Roles
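Under this model, both sentences of Figure 2.10 give rise to the same role assignments. A hedged sketch, with hypothetical identifiers and role names drawn from Table 2.2:

    % Sentences (a) and (b) of Figure 2.10 yield the same role bindings
    % for the throwing event; only the syntax differs.
    role(throw1, agent,       bart1).
    role(throw1, theme,       chicken1).
    role(throw1, destination, house1).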
2.6 Logical Form and Quasi Logical Form

The Core Language Engine was developed at the Stanford Research Institute and The Center for the Study of Language and Information at Stanford University. The methods of anaphora resolution in the Core Language Engine [Alshawi et al 1989] are heavily motivated by its internal semantic representation, logical form and quasi-logical form. Quasi-logical form is based on first order logic, which has been used widely in the fields of philosophy and linguistics. The structure of logical forms is motivated by the desire to use and extend first order logic, which is well suited for modelling quantifier scoping and anaphora. This section will discuss the Core Language Engine's fully scoped logical form, and a logical form where scoping rules are relaxed for reference resolution, quasi-logical form. Although logical form and quasi-logical form were designed to handle many types of reference phenomena such as unscoped quantifiers, unscoped descriptions, and unresolved relations, the phenomenon that will be focused on is unresolved reference. Resolution in the Core Language Engine uses a set of reference resolution rules that propose possible logical forms that can transform a quasi-logical form statement into a logical form statement. Fully resolved logical forms must conform to the following set of properties:

- They should be expressions in a disambiguated language.
- They should be suitable for representing the meanings of natural language expressions.
- They should provide a suitable medium for the representation of knowledge expressed in natural language, and they should be a suitable vehicle for reasoning.

Figure 2.11 lists some of the grammar rules used in the Core Language Engine to model the logical form language, and Figure 2.12 shows an example of logical form for the sentence Every doctor visited Mary.

    <lf_formula>  -> quant(<quantifier>, <variable>, <restriction>, <body>)
    <lf_formula>  -> [<functor>, <argument1>, <argument2>, ..., <argumentn>]
    <functor>     -> <atom>
    <quantifier>  -> forall | exists | ...
    <restriction> -> <lf_formula>
    <body>        -> <lf_formula>
    <argument>    -> <lf_formula>

Figure 2.11: Logical Form Rules from the Core Language Engine [Alshawi et al 1989]

    quant(forall, D, [doctor1, D],
          [past, quant(exists, E, [event, E], [visit, E, D, mary1])])

Figure 2.12: Logical Form for Every doctor visited Mary

Not all references can be resolved immediately using logical form. Sentences such as (1) Most doctors read every article, and (2) the bishops arrived contain references where the scope of quantification is not exactly clear. In sentence (1), does each doctor in the most doctors set read all articles, or does the most doctors set collectively read all articles? This question can also be considered for sentence (2): is the arrival of each bishop from the bishops a separate event (the distributive reading), or does a single arrival event encompass all the bishops (the collective reading)? The quasi-logical form language extends the grammar of the logical form language to include rules that handle unscoped quantifiers, as in sentence (1), under-specified relations, as in sentence (2), as well as many other reference phenomena.
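For sentence (1), the two readings correspond to two fully scoped logical forms. The sketch below follows the grammar of Figure 2.11 but is not taken from [Alshawi et al 1989]; the quantifier symbol most and the predicates doctor1, article1, and read1 are assumptions, and tense is omitted for brevity:

    % Reading 1: for most doctors D, D reads every article.
    quant(most, D, [doctor1, D],
          quant(forall, A, [article1, A], [read1, D, A]))

    % Reading 2: every article A is such that most doctors read A.
    quant(forall, A, [article1, A],
          quant(most, D, [doctor1, D], [read1, D, A]))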
The logical form semantic representation is very well suited for modelling reference information where quantification is of key importance, and where the discourse contains utterances in which the exact scope of a quantifier is not clear. One of the biggest advantages to using logical form is that it allows us to use a large body of knowledge relating to first order logic. Although the Core Language Engine models some verb structures⁴ using logical form and quasi-logical form, it does not attempt to perform any temporal ordering on the verb structures (as in the case of the timeline models from Section 2.4). Another slight deficiency in logical form is how it handles "fuzzy" quantifiers such as some, many, and few. In the Core Language Engine, the threshold defining the boundaries of these quantifiers is given a definite value. Common sense dictates that certain quantifiers, such as some, are fuzzy, and subject to contextual factors and personal preference.

⁴ Using first-order logic to model a verb phrase, intuitively, does not seem the most natural method to model that knowledge.

2.7 Conclusion

This chapter has examined a number of semantic representations that are used in the fields of natural language processing and knowledge management, such as semantic networks, conceptual hierarchies, logical form and quasi-logical form. The benefits and potential drawbacks were outlined for each representation. Chapter 3 will examine models of human knowledge from the perspective of cognitive science. Chapter 4 will present a new model of semantic representation that attempts to address the shortcomings mentioned in this chapter by combining the representations into a modular multi-level system. This modular system is easier to maintain and allows multiple anaphora resolution algorithms to operate on a corpus simultaneously.

CHAPTER THREE
Working & Long-Term Memory

Although there is still great debate about the exact nature and structure of working memory and long-term memory, the majority of cognitive psychologists agree that the two forms of memory are distinct in their behaviour and capacity [Logie 1996]. The human mind does not have infinite time and infinite working memory capacity. If a natural language processing system is to more closely model how humans process language, it would make sense for that system to be constrained by the limitations and behaviour of human memory. It is the intent of this thesis to stimulate more interest in memory model approaches. This chapter will outline some of the various views on working memory and long-term memory. It will briefly discuss the evolution of theories of short-term and long-term memory as well as current theories in human memory. Several modern psychological models of short-term and long-term memory will be elaborated on, as well as some memory models with a computational approach.

3.1 Working Memory Models

Some of the earliest work with respect to working memory can be found in the works of William James (1905) [Richardson 1996]. James described working memory, then termed primary memory, as being limited in capacity and volatile in nature. Primary memory was considered to be a distinct system from long-term memory (called secondary memory at the time). Information was retained in primary memory by rehearsal. Rehearsal was also used to move information to and from secondary memory. Within this model, primary memory did not control the flow or manipulation of the information; it only provided a medium of storage.
Richardson continues by describing that during the 1960s, the theory of working memory, then termed short-term memory, was extended to include a control mechanism. This mechanism was responsible for the flow of information as well as processing. Instead of being used only for the storage of information, the space in short-term memory was shared with the processing of the control mechanism. Thus, in this model, there was a trade-off in working memory between processing power (in the control mechanism) and storage capacity.

The work of Baddeley on working memory is some of the most prominent. Gathercole and Baddeley outline in [Gathercole and Baddeley 1993] the structure and behaviour of Baddeley and Hitch's working memory model. Figure 3.1 gives a visual representation of their model. Gathercole and Baddeley state that the central executive is the most important component of the model. The central executive is responsible for controlling the flow of data within working memory, the retrieval of data from long-term memory and other memory systems, and the processing and storage of data. Baddeley expanded the model to include an episodic buffer in [Baddeley 2000]. In addition to the central executive, two slave systems are also included in the working memory model, the phonological loop and the visuo-spatial sketch-pad. The phonological loop is responsible for verbal information while the visuo-spatial sketch-pad handles visuo-spatial information. Baddeley and Hitch used dual-task experiments in [Baddeley and Hitch 1974] to justify the separation of the two slave systems. They discovered that when a subject performed a verbal and a visual task concurrently, the individual could perform the tasks as efficiently as if the tasks were performed serially. When the number of tasks for a single slave system was increased to two tasks, the subject could not perform the tasks as efficiently as performing them one at a time.

[Figure 3.1: Baddeley and Hitch's Working Memory Model - the central executive with the visuo-spatial sketch pad and phonological loop as slave systems]

Limits of Working Memory

The limits of working memory are as intensely debated as the structure of working memory. Early theories, such as Miller [Miller 1956], place specific limits on working memory. In Miller's view, working memory was seen as a short-term store without any processing ability. More recent work, such as that found in [Baddeley 1986, Haarmann et al 2003], assumes that working memory is a multi-component system with processing capacity being inversely proportional to storage capacity. In [Baddeley 1990], Baddeley hypothesized that the span of working memory could partially be the result of the refresh-rate of items within the current memory span:

    If we assume that memory fades, then the memory span will be determined by the number of items that can be refreshed before they fade away. That number, of course, will depend on how rapidly the trace fades and on how long it takes to articulate each item and hence refresh each memory trace.

3.2 Long-Term Memory Models and Associativity

Federmeier and Kutas mention in [Federmeier and Kutas 1999] that although long-term memory is an integral component of sentence processing, the exact nature of how working memory interacts with long-term memory information is still largely unknown. Just as there exists a debate as to the exact nature of working memory, differences in opinion also exist on how pieces of information are associated within long-term memory. Federmeier and Kutas outline two hypotheses, the independent association hypothesis and the associative symmetry hypothesis. The independent association hypothesis states that associations in memory are not bidirectional. That is, given that the recall of item A triggers the recall of item B, A → B, it does not necessarily follow that the converse, B → A, is true.
Federmeier and Kutas outline two hypotheses, the independent association hypothesis and the associative symmetry hypothesis. The independent association hypothesis states that asso- ciations in memory are not bidirectional. That is, given that the recall of item A triggers 19 the recall of B , A ---+ B , does not necessarily imply that the converse, B ---+ A , is true. In the associative symmetry hypothesis, given A ---+ B implies that B ---+ A. Using the independent association and associative symmetry hypotheses in the context of reading, when a word corresponding to a semantic concept is accessed through the reading of a sentence, the activation levels of neighbouring concepts may also increase. 3.3 ACT-R The ACT-R theory of human cognition is rooted in ACT-E theory and ACT* theory which were introduced by John Anderson in [Anderson 1976, Anderson 1983], respectively. ACT-R models the interaction between two types of knowledge: procedural knowledge and declarative. Procedural knowledge involves rules that define human cognitive behaviour. Anderson formally calls these rules productions in [Anderson et al 2001]. Declarative knowledge encompasses factual information that defines behaviour of cognition, defined by Anderson as chunks. Examples of declarative knowledge are the sky is blue or snow is white. One of the major factors that influences cognitive performance in the ACT-R system is the granularity at which processing occurs. Production rules take at least 50ms and at most 500ms to fire. Procedural Long-Term Memory ACT-R is a goal-oriented system that uses productions to define the cognitive behaviour that acts upon declarative memory. Productions define actions such as retrieving information to be processed, as well as actions that define the manipulation of the retrieved information. Declarative Long-Term Memory In addition to the procedural memory, the ACT-R system also models declarative knowledge, that is, knowledge that is defined as being not factual and does not control the behaviour of cognition. Declarative memory is composed of chunks that are differentiated 20 using a unique identifier. Each chuck has a type and may contain multiple slots, with each slot linking to additional chunks. A good example can be taken from [Group 2004]. Using the sentence the dog chased the cat, we can derive the following chunk of declarative memory: Action023: isa chase agent dog object cat In this example, the type of the outer chunk is isa chase, and the two slots of the chunk are filled with the chunks agent dog and object cat. Declarative Memory Activation In the ACT-R system, the retrieval of declarative chunks in memory is governed by the speed at which they can be accessed. In [Anderson and Matessa 1997], Anderson defines activation equations that predict the power law of learning and the power law of forgetting. The activation level, Ai, of a declarative memory chunk 1 is define as follows: (3.1) where Bi is the base level activation of the chunk i, Aj is the activation of a chunk j within the current focus of attention, and sji is the strength of the association between chunk j and chunk i. The base level activation, Bi, models the recency and frequency of activation of the chunk i, and thus has a factor of decay associated with it. Bi is defined by ACT-R as: Anderson notes that the definition of chunks should not be confused with the definition from [Miller 1956] 1 21 (3.2) where tk is the time since the kth 2 use of the chunk i, and w is the activation decay. 
3.4 Computational Models of Memory

A survey of memory would not be complete without examining those models created from a computational perspective. In particular, this section will examine the models of memory presented by Schank in [Schank 1986] and Hunt in [Hunt 1973]. Although the types of memory described by Schank and Hunt may have much information overlap with previously described models, it is still relevant to examine computational models along with psychological approaches. To some extent, the idea of memory modelling has been largely ignored in the field of computational linguistics.

Event Memory and Generalized Event Memory

Event memory contains semantic knowledge for particular events experienced in a person's life. Schank states that events can be such things as going to Dr. Smith's dental office last Tuesday, and getting your tooth pulled or forgetting your dental appointment and having them call you up and charge you for it. As a specific event remains in memory longer, the exact details of the event become less salient, and eventually, the event may become a more generalized event or it may disappear entirely. Generalized event memory, as the name describes, is a more generalized version of the event memory described earlier. Generalized event memory is modelled as a portion of memory that contains abstract events, i.e. events that have occurred numerous times and thus have a template associated with them. As events from event memory are brought into short-term memory, the associated generalized event is also brought in to aid in cognitive processing. One of the results of this behaviour is that events from event memory will become less and less salient and only the more generalized event will remain.

A Distributed Memory Model

In [Hunt 1973], Hunt describes a model of memory that builds upon the basic memory model containing only short-term memory and long-term memory. As shown in Figure 3.2, Hunt adds an intermediate-term memory structure that resides between short-term and long-term memory, and a buffer memory which is analogous to sensory memory in other literature. Intermediate-term memory stores information about the current situation or episode, and thus intermediate-term memory is volatile like short-term memory. The buffer memory is the most volatile of the structures, storing stimulus from sensory input, such as auditory input, for only brief periods of time. In contrast to Baddeley's model of working memory, where information flow within working memory is controlled by the central executive, Hunt's model places control within the respective memory substructures. Hunt notes that in his distributed memory model each memory component is likely to be associated with varying anatomical areas of the human brain, but does not provide evidence.

[Figure 3.2: Hunt's Distributed Memory Model - buffer memory, short-term, intermediate-term, and long-term memory, with data flow and control information passing between them]

3.5 Neural Network Models of Associative Memory

Neural networks, or more correctly artificial neural networks, attempt to model the behaviour of neurons within the human brain. Some of the earliest work on the modelling of artificial neurons can be attributed to McCulloch and Pitts in [McCulloch and Pitts 1943] and Hebb in [Hebb 1949]. Since the early work of McCulloch and Pitts, the field of artificial neural networks has developed into a mature field with large amounts of research in areas such as network topology and neuron structure.
Some of the earliest work on the modelling of artificial neurons can be attributed to McCulloch and Pitts in [McCulloch and Pitts 1943] and Hebb in [Hebb 1949]. Since the early work of McCulloch and Pitts, the field of artificial neural networks has developed into a mature field with large amounts of research in 23 Data flow --------- .,_ Control information Short term memory Intermediate term memory Buffers Buffers Long term memory ~ e ~ Figure 3.2: Hunt's Distributed Memory Model areas such as network topology and neuron structure. Artificial neurons are modelled after biological neurons. Axons send out signals to another neuron's dendrite. If the sum of the signals received by a neuron is greater than some threshold, then the neuron fires, sending signals along its axons to other neurons. Figure 3.3, adapted from [Russel and Norvig 1995], is an example of a typical biological neuron. Artificial neurons attempt to model neurons at the level of a single biological neuron. An artificial neuron does not attempt to model the actual physical chemical reactions occurring, rather, they model neurons at more of a cause and effect level. A typical artificial neuron can be found in Figure 3.4. The input values for neuron i, Aj , can be thought of as the dendrites. The output value, Ai, is an axon. The strength of the link between node i and some node j # i is modelled using the Sji term. The bias 24 A xon- - -- Synapse Axon - - -- Figure 3.3: Biological Neuron value, Bi , is often used to influence base-level activation, allowing Ai to be non-zero. The activation function of a neuron takes into consideration the input values, Aj, and their associated link strengths, Sji, and generates a result f. Typically, the neuron activation function f for a node i is defined as follows: f k Bi + LAjSji (3.3) j=O The final portion of an artificial neuron is the threshold unit. If the total signal received by the neuron is greater than some threshold, the neuron will fire. The actual value of the threshold is defined by a threshold function, g(f). The resulting output from the neuron, is represented as A. In general, g can be any single-variable function , but the Equations 3.4-3. 7 are some of the more commonly used. 25 I Inputs Hard Limit Neuron I I ~ A,;s: L _[ f I A; g ~B; I I I I Figure 3.4: Multi-Input Neuron hardlim(n) = if n 2: 0 {: if < 0 (3.4) n satlins(n) = logsig(n) = tansig(n) = 1 if n > 1 -1 if n < -1 n if -1 < n < 1 1 (3 .5) (3.6) (3.7) Neural Network Topology A single set of inputs to a neuron is not very useful, or realistic. Of the lOll neurons in the brain , each neuron is connected to 104 other neurons. One of the most common configurations of neural network topology is shown in Figure 3.5 3 . This three-layer configuration allows arbitrary functions to be represented, and is the most commonly used in pattern matching applications. 3 Each circle in the diagram represents a multi-inpu t artificial neuron. 26 Input Layer Hidden Layer Output Layer Figure 3.5: Multi-Layered Artificial Neural Network Hebbian Unsupervised Learning An artificial neuron by itself does not learn to perform complex tasks or match complex patterns. A learning strategy must also be applied. Two strategies of artificial neuron learning are supervised and unsupervised learning. This thesis will concentrate on the latter of the two learning strategies. Within the human brain, neurons are connected to many neighbouring neurons. The neuron must be capable of creating implicit associations without direct intervention. 
Hebbian Unsupervised Learning

An artificial neuron by itself does not learn to perform complex tasks or match complex patterns. A learning strategy must also be applied. Two strategies of artificial neuron learning are supervised and unsupervised learning. This thesis will concentrate on the latter of the two learning strategies. Within the human brain, neurons are connected to many neighbouring neurons. The neuron must be capable of creating implicit associations without direct intervention. Before outlining how an artificial neuron can learn to make associations, it is important that neuron association be properly defined. Hebb's postulate from [Hebb 1949] states:

    When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.

Using Hebb's observation, the strength of the association between a neuron $i$ and a neuron $j$ can be realized using Equation 3.8, adapted from [Luger and Stubblefield 1998]:

$$\Delta S_{ij} = \alpha A_i A_j \qquad (3.8)$$

where $S_{ij}$ is the strength of the association between neuron $i$ and neuron $j$, $\alpha$ is the learning rate of the semantic link, and $A_i$ and $A_j$ are the current activation levels of neuron $i$ and neuron $j$, respectively.
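In code, one Hebbian update is a single arithmetic step. A minimal sketch, assuming the increment form of Equation 3.8:

    % Strengthen the link between two neurons in proportion to how active
    % both currently are (Equation 3.8).
    hebbian_update(S0, Alpha, Ai, Aj, S) :-
        S is S0 + Alpha * Ai * Aj.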
Figure 4.1 expands on Figure 2.1 by connecting the semantic concept ship to various semantic representations such as quasi-logical form, conceptual graphs, conceptual hierarchy, and a timeline model. The information from a semantic representation could also be connected to other concepts nodes within the semantic network. In addition to the bare links, the strength of the links between concept nodes can also be considered by adding weights to each link. In theory, the more often the concept node in the semantic network is activated, the stronger the link will be to other concept nodes. 29 This is similar to the behaviour of synapses originally t heorized by Hebb in [Hebb 1949]. In addition, activation levels for the concept nodes can be modelled using activation level models from artificial neural networks and ACT-R theory. The topology of the resulting network differs from the traditional art ificial neural network topology in t hat t here is no distinct input, hidden, or output layers. Rather, the concept nodes are connected in a non-specific fashion. The resulting network is commonly called a localist network in other literature. Single concepts are represented as a single node within the network. This representation differs from networks like artificial neural networks, where a single concept is represented as activations across a set of nodes, and nodes can represent more than one concept. Figure 4.1: Semantic Network Linking Semantic Representations Implicit Semantic Links It is plausible that knowledge which is represented using one semantic representation is often intertwined with t he knowledge within another. Figure 4.2 shows a good example of how multiple semantic representations collectively model t he sentence Once there was a shoemaker who worked hard and was very honest. Notice that certain information such 30 as shoemaker: 1 and work: 1 occur in more than one structure. We would like to some how connect t his information together, and a semantic network would be a good method to achieve these connections. Concepts in the semantic network would be automatically linked due to the information wit hin their semantic representations. quant(exists: I ,B,[shoemaker: I ,B]) ex ists: I Past - - - - - - - - - -· - - - - - - - - - - - - - Future Perfect ex ists: I 1 1 Linear - -• - - - '• - - - -• - - - - - - - - -• - Past Perfect Simple Past Present Simple Future Past work : I - - - - - - - - - - -· - - - - - - - - - - - - - - Future Perfect wo rk: I 1 1 Linear - -• - - -• - - - -• - - - - - - - - - -'• - Past Perfect Simple Past Present Simple Future Past honest: I - - - - - - - - - - -· - - -- -- -- -- - - - - Future Perfect 1 I honest: I Linear - -• - - -• '- - - -• - - - - - - - - -• - Past Perfect Simple Past Present Simple Future Figure 4.2: Multiple Semantic Representations 31 4.2 Modularization and The Human Mind Creating a modularized model of semantic structures is not only a good engineering technique, it is also supported as model of the behaviour of the human mind by such psychological literature as [Foder 1983]. Modularity of mind theorizes that parts of the human mind are modular in nature and act autonomously with respect to other modules in the mind. To some degree, the modularity is thought to be genetically determined. Fader's model of human mind requires that modules be specialized in their domain, encapsulate their information from other modules, and have limited outputs to other modules. 
4.2 Modularization and The Human Mind

Creating a modularized model of semantic structures is not only a good engineering technique, it is also supported as a model of the behaviour of the human mind by such psychological literature as [Fodor 1983]. Modularity of mind theorizes that parts of the human mind are modular in nature and act autonomously with respect to other modules in the mind. To some degree, the modularity is thought to be genetically determined. Fodor's model of the human mind requires that modules be specialized in their domain, encapsulate their information from other modules, and have limited outputs to other modules. These requirements are a natural result of the model described in this chapter.

4.3 Conclusion

This chapter has introduced a modular model of semantic representation that addresses the drawbacks of existing semantic representations by connecting them using a semantic network. This model allows new semantic representations to be added to and removed from a system without impacting the existing ones. It also addresses the desire to expand a semantic representation to deal with new forms of knowledge. Instead of expanding a current semantic representation, a new semantic representation can be created and plugged into an existing system. This modular separation of semantic representations also permits the development of anaphora resolution algorithms for each semantic representation independently. It is apparent that in order to combine the semantic structures from Chapter 2 using a semantic network, a theory of the exact nature of the network must be considered. The link strength between concepts needs to be modelled, as well as the activation levels for each node. Chapter 3 discussed theories on how link strengths and activation levels are determined in the ACT-R system [Anderson et al 2001] and using traditional artificial neural network theory. Chapter 6 will outline how these activation theories will be used in conjunction with a semantic network to define the behaviour of the model proposed in this chapter.

CHAPTER FIVE
Anaphoric Reference and Reference Resolution

Anaphoric reference is a linguistic mechanism with which reference can be made to objects that have been introduced at an earlier point. References are typically made with pronouns or different variations of definite/indefinite articles within a noun phrase. Anaphoric references are also used to reference verb phrase structures. Anaphora, especially in the case of pronouns, can often be resolved by scanning backwards through a corpus of text until the first noun phrase that matches such features as number and gender is found, although Barbara Grosz demonstrated in [Grosz 1977] that this technique can break down.

5.1 Types of Anaphoric Reference

Traditionally, anaphoric reference is observed in the use of pronouns such as he, she, or it. But within recent decades, there have been numerous proposals to extend the definition of anaphoric reference to include other linguistic phenomena such as verb phrase ellipsis [Grosz 1977, Hardt 1997, Nash-Webber and Reiter 1977, Ginzburg and Cooper 2001], presupposition [Piwek and Krahmer 2000, Geurts 1999], and temporal anaphora [Partee 1984]. In the context of this thesis, the domain of anaphoric references will be restricted to pronominal references of noun phrases.

5.2 Problems in Anaphora Resolution

As stated before, in many cases resolving the antecedent for a pronoun is as simple as searching backwards in a body of text for the first noun phrase that matches based on such attributes as gender or number. But in some cases, a more complex model of the discourse must be constructed in order to resolve a pronoun reference:
The next example, adapted from [Sidner 1983], demonstrates where this method of resolution can break down: I My neighbours 11 have I a monster Harley 1200 12 . I They~ are really huge but gas efficient bikes. In the second sentence, if an individual was to read just the pronoun they, their initial preference for the reference may not be a monster Harley 1200 based on number alone. In this context, a common preference for the pronoun they would be my neighbours. After reading the remainder of the second sentence, it is apparent that this conclusion was incorrect. Given the additional context, common knowledge concludes that the neighbours are not motorcycles 5.3 Anaphora Resolution Algorithms Determining the antecedent of an anaphor is central to the study of anaphoric reference. Over the past few decades much research has involved creating computational methods to resolve these references. Works such as [Sidner 1979 , Sidner 1983], [Grosz 1977, Grosz and Sidner 1986], and [Carter 1985, Carter 1990] have concentrated on the study of discourse and the theory of anaphora within a discourse, while [Hobbs 1986, Brown 2003 , Mitkov 1998] have focused more specifically on pronominal anaphora resolution. The next 34 few sections will outline Carter, Hobbs, Brown, and Mitkov's approaches to pronominal anaphora resolution. Hobbs' Approach and Brown's Algorithm In [Hobbs 1986], Hobbs outlines a simple algorithm for the resolution of pronouns, and although naive, it provides goods results. The algorithm works by starting at the location of the pronoun and working back through the parse tree in a breadth-first manner until a suitable antecedent match based on gender and plurality is found. When tested on 300 occurrences of references in selected corpora, the algorithm had a success rate of 88.7% in resolving the anaphoric reference. Hobbs notes though that in over half of the cases, there was only one plausible antecedent. Hobbs analyzed the results further and went on to consider the results for the cases when there was more than one plausible antecedent. Of the 132 cases where an antecedent confli ct existed, 98 were resolved by the algorithm, thus a 74.4% success rate. Hobbs goes on to improve the naive algorithm by adding simple restrictions for resolving pronouns, such as dat es can't move, places can't move, and large fixed objects can't move. Without these restrictions, the success of the resolution algorithm was, 81.8%, overall. When the selectional restrictions were used , a 91.7% success rate was achieved. In [Brown 2003], Brown outlines an algorit hm for resolving noun phrase references that is a variation on Hobbs algorithm. Figure 5.1 illustrates the algorithm in pseudo-code 1 . Brown's algorithm has the benefit of not specifying how a reference is resolved when there are multiple antecedents for a single noun phrase, which consequently, allows the implementor to choose how the antecedent can be resolved. Carter's Approach In his PhD thesis, [Carter 1985], Carter describes in [Carter 1990] the approach used in the SPAR system . The SPAR system initially starts by resolving semantic and syntactic 1 In Figure 5.1, NP is an abbreviation for noun phrase. 
Carter's Approach

In his PhD thesis [Carter 1985] and in [Carter 1990], Carter describes the approach used in the SPAR system. The SPAR system initially starts by resolving semantic and syntactic issues without concerning itself with potential anaphoric references. Multiple structures can result from this process, depending upon the word-senses of the words within a sentence. Given the sentence He picked up a jack, Carter theorizes two possible structures: one where jack is interpreted as a playing card, and a second where it is a tool used to raise an automobile. According to Carter's algorithm, the pronoun he is left unbound and will be dealt with in later stages. After the initial structures are generated, they are reprocessed and given scores based on factors such as repeated relevant information and its influence on syntactic structure. For example, in Figure 5.2, (b) would be given a higher score than (a) because a telescope is used for seeing things, and thus it is more highly related to the verb saw than to the noun phrase a man. After assigning scores, the algorithm proceeds to use anaphora resolution rules which Carter describes as being similar to those found in [Sidner 1979].

[Figure 5.2: Two Parsings of I saw a man with a telescope — two parse trees in which the prepositional phrase with a telescope attaches to (a) the noun phrase a man and (b) the verb saw.]

Mitkov's Work

In [Mitkov 1998], Mitkov outlines an anaphora resolution algorithm that uses scoring factors to determine a plausible antecedent for a reference. The scoring factors are based on the analysis of what Mitkov terms indicators. Indicators can be such things as the definiteness of the possible antecedent, its givenness, indicating verbs, lexical reiteration, prepositional position, and referential distance. The domain of possible scores is {-1, 0, 1, 2}, with varying values being chosen for each indicator class. Figure 5.3 outlines Mitkov's algorithm in pseudo-code format. Mitkov claims a success rate of 89.7% with this algorithm.

    EXAMINE the current sentence and the two preceding sentences
    SELECT the noun phrases which agree in gender and number
    APPLY the antecedent indicators to each candidate and assign scores
    IF two candidates have an equal score THEN
        SELECT the candidate with the higher score for immediate reference
    IF immediate reference does not hold THEN
        SELECT the candidate with the higher score for a collocational pattern
    IF the collocational pattern suggests a tie or does not hold THEN
        SELECT the candidate with the higher score for indicating verbs
    IF this indicator does not hold THEN
        SELECT the most recent candidate

Figure 5.3: Mitkov's Anaphora Resolution Algorithm
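The indicator-based scoring can be sketched in Prolog as follows. This is a hedged illustration only: the indicator names and the values awarded below are invented for the example and are not Mitkov's published indicator set; only the overall shape (per-indicator values in {-1, 0, 1, 2}, summed per candidate, highest total wins) follows the description above.

    % Illustrative indicator values in the domain {-1, 0, 1, 2}.
    indicator_value(definiteness,        yes,  1).
    indicator_value(definiteness,        no,   0).
    indicator_value(lexical_reiteration, yes,  2).
    indicator_value(lexical_reiteration, no,   0).
    indicator_value(prepositional_np,    yes, -1).   % a penalising indicator
    indicator_value(prepositional_np,    no,   0).

    % Sum the indicator values awarded to one candidate.
    candidate_score(Indicators, Score) :-
        findall(V, (member(I=Flag, Indicators),
                    indicator_value(I, Flag, V)), Vs),
        sum_list(Vs, Score).

    % Pick the candidate with the highest total score.
    best_candidate(Candidates, Best) :-
        findall(S-C, (member(C-Is, Candidates),
                      candidate_score(Is, S)), Scored),
        max_member(_-Best, Scored).

    % ?- best_candidate(
    %        [neighbours-[definiteness=yes, lexical_reiteration=no,
    %                     prepositional_np=no],
    %         harley-[definiteness=no, lexical_reiteration=no,
    %                 prepositional_np=yes]],
    %        Best).
    % Best = neighbours.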
5.4 Resolution Failure

Many resolution algorithms make the assumption that the antecedent of an anaphor must be resolved. Levine hypothesizes in [Levine et al 2000] that there are conditions under which readers fail to resolve an anaphoric reference, yet are still able to comprehend the text. Levine showed that if the antecedent was salient and distant enough from the point of reference, readers were content with not resolving the reference if doing so was not disruptive to the comprehension of the text. Although this finding could have a big impact on how anaphora resolution algorithms will work in the future, it does not begin to explain what a comprehensible piece of text is.

5.5 Conclusion

This chapter has given an overview of the problem of anaphoric reference with respect to pronouns, as well as algorithms that address reference resolution. Chapter 6 will describe an anaphora resolution algorithm that uses the theories on working memory and activation introduced in Chapter 3 to create a list of possible antecedents for a pronoun. The algorithm will then use the idea of feature set scoring from Carter to form a basis for resolving conflicts when multiple plausible antecedents exist. Chapter 7 will outline the results of the algorithm when tested on The Three Brothers and the Rumpelstiltskin corpora using the ACT-R model and the Thompson model of activation.

CHAPTER SIX
System Modelling

If you don't gosub a program loop, you'll never get a subroutine.
- KRYTEN (Justice)

Throughout Chapters 2, 3, and 4, varying theories on semantic representation and on the structure and behaviour of human memory have been discussed. This chapter will focus on combining parts of these various theories to solve the problems of anaphora resolution outlined in Chapter 5. The model of long-term memory and working memory will be outlined, as well as the grammar rules and their interaction with the memory models. The algorithm for anaphora resolution will also be discussed. For the purposes of this thesis, the semantic network will be the only semantic representation that is implemented. The implementation of other semantic representations, such as quasi-logical form and conceptual graphs, will be deferred to future work. The implementation and subsequent testing of the semantic network will provide baseline results against which this future work can be compared.

6.1 Modelling Long-Term Memory

In this thesis, long-term memory will be modelled using a combination of the semantic network theory described in Chapter 2 and the neural network theory described in Chapter 3.

Semantic Networks for Long-Term Memory

Semantic networks are undirected graphs with strengths associated with the links between nodes. Within the context of this thesis, as was described in Chapter 4, semantic networks will not be used directly to store semantic information; rather, they are used to link the other existing semantic structures (i.e. conceptual graphs, classification hierarchies, event timelines, and quasi-logical form). This decision is intended to achieve a behaviour similar to that observed in neural network theory. As the activation level of a node within the network increases, the activation of neighbouring nodes will also increase. Semantic markers will be used to uniquely identify semantic structures and concepts. Table 6.1 outlines the Prolog predicates.
In the nn_semNode predicate, SemMarker signifies the semantic marker for the node, Activation holds the current activation, and ActHistory contains the activation history of the node. In the nn_semLink predicate, SemMarker1 and SemMarker2 identify the two structures or concepts being linked, and Strength is the strength of the link. The nn_semNode and nn_semLink predicates provide all that is required to build and modify a semantic network.

    Predicate                                                  Description
    nn_semNode(+SemMarker, +Bias, +Activation, +ActHistory)    Semantic Node
    nn_semLink(+SemMarker1, +SemMarker2, +Strength)            Semantic Link

Table 6.1: Semantic Network Predicates

Activation Level Models

Two different models of semantic node activation levels will be used and tested in this thesis: (1) the ACT-R model for activations of declarative memory, and (2) a model derived empirically, called Thompson's model. From the ACT-R model, Anderson's model for the activation level and activation decay of declarative memory will be used, based on Equations 3.1 and 3.2, as described in Chapter 3. Since the activation level of the ACT-R equation is unbounded, the satlins threshold equation, Equation 3.5, will be applied to bound the resulting activations to the range -1 ≤ A_i ≤ 1.

Thompson's model will be based on Equation 3.3 for the activation function of a semantic node, Equation 3.7 for the threshold function, and the following equation for fading the activation:

    A_i(t+1) = ω · A_i(t)    (6.1)

where ω is the decay rate of node i's activation. The base level activation of node i will be set to B_i = 0.0. It must be noted that the decay equation for Thompson's model is not applied during node activation, as in the case of the ACT-R model. Rather, decay will occur due to mental processing in working memory, which will be described in more detail later in this chapter.

6.2 Working Memory

If the activation level of node i is greater than some activation threshold, σ, node i will be brought from long-term memory into working memory. Working memory acts as a repository for concepts that are easily accessible for mental processing.

Working Memory Structure

In this thesis, working memory does not contain the actual structures that represent the currently active concepts; rather, working memory is conceptualized as a list containing semantic markers. The semantic markers act as links to the concepts within long-term memory. As semantic concepts are activated, they are placed within the list representing working memory, and as they decay, they are removed from the list.

Semantic Node Behaviour

The working memory model that was implemented is based on the model described by Baddeley in [Baddeley 1986, Baddeley 1990, Gathercole and Baddeley 1993]. From Baddeley, the theory on working memory capacity was used. The limits of working memory will be based on the contention between storage capacity and processing time. Storage and processing are inversely proportional to each other, and thus processing will affect how quickly active concepts in working memory fade. In this thesis, processing will be restricted to the application of grammar rules only. As grammar rules are used (i.e. mental processing), the activation levels of concepts in working memory will fade. This fading effect of grammar rules gives the storage-versus-processing behaviour described in Baddeley's model. When the activation level of node i falls below σ, node i will be removed from working memory.
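As a concrete illustration of the predicates of Table 6.1 and of the threshold behaviour just described, the following Prolog sketch promotes every node whose activation exceeds σ into the working-memory list. The sample facts, the threshold value, and the promote/0 predicate are illustrative simplifications, not the thesis's implementation.

    :- dynamic nn_semNode/4, nn_semLink/3, wm_workingMemory/1.

    nn_semNode(rimmer:1, 0.0, 0.85, [0.85]).   % marker, bias, activation, history
    nn_semNode(lister:1, 0.0, 0.40, [0.40]).
    nn_semLink(rimmer:1, lister:1, 0.2).
    nn_semLink(lister:1, rimmer:1, 0.2).

    wm_workingMemory([]).

    sigma(0.5).                                % activation threshold

    % promote/0: rebuild working memory from every node above threshold.
    promote :-
        sigma(S),
        findall(M, (nn_semNode(M, _, A, _), A >= S), Active),
        retractall(wm_workingMemory(_)),
        assertz(wm_workingMemory(Active)).

    % ?- promote, wm_workingMemory(WM).
    % WM = [rimmer:1].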
Semantic Link Behaviour

As will also be described in Chapter 7, the semantic links between semantic nodes do not exist when the system is initialized. Semantic links are created between the nodes in working memory after each sentence is parsed and are given an initial link weight, S_ij and S_ji (a semantic link for each direction). If links already exist between two nodes, new links will not be created. Although the strengths of the links between two nodes are created with the same initial value, they are updated independently after this creation. When a semantic node i is activated into working memory, the strength of the link to node j is updated. The equation for Hebbian learning, Equation 3.8, will be used for calculating the new semantic link strength.

6.3 Prolog Model of Working Memory

The Prolog model of working memory will be a functor, wm_workingMemory/1, with a single list as an argument. The list will contain the semantic markers of the concepts that are currently active in working memory. The current activation level of a concept will not be contained within the list; rather, the semantic marker will be used to look up the activation in the semantic network described earlier. For example, given that the concepts rimmer:1, lister:1, and kryton:1 are active in working memory, the following functor would result:

    wm_workingMemory([rimmer:1, lister:1, kryton:1])

6.4 English Grammar Rules

One of the major advantages of using Prolog as an implementation language is that it allows the use of a Definite Clause Grammar to specify grammar rules. Definite Clause Grammars also reduce the amount of time required to code grammar rules by eliminating the need to specify a mechanism for consuming words from a sentence while parsing. For example, rather than using the rule

    adj([long|B], B).

to process the adjective long, we can use the Definite Clause Grammar rule

    adj --> [long].

Definite Clause Grammars are much more elegant because, notationally, they are very similar to context-free grammars. The result is that the source code will be more readable, easier to maintain, and less prone to errors.

The cost associated with using a Prolog grammar rule will be explicitly modelled within the grammar rules. Prolog allows us to add goals to Definite Clause Grammar rules that, when expanded into regular Prolog predicates and clauses, do not consume words from the input string while parsing. As an example, consider the following grammar rule:

    sent(sent(NP, VP)) --> np(NP), vp(VP)

Adding a processing cost, the resulting rule would be something similar to the following:

    sent(sent(NP, VP)) --> {nn_fadeNodes}, np(NP), vp(VP)

Here, nn_fadeNodes is a predicate that updates the activation levels of concepts in working memory. The placement of the cost predicate within the grammar rule is important. By placing nn_fadeNodes at the front of the rule, the cost is incurred as soon as the rule is used. This placement creates the behaviour that the more backtracking is performed on grammar rules, the more complex the processing. Concepts will fade much more quickly from working memory when backtracking occurs, as opposed to when no backtracking occurs. If nn_fadeNodes were placed at the end of the grammar rule, the cost would only be incurred after the successful completion of a grammar rule.
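The following runnable sketch shows the cost-goal placement end to end: each use of the sentence rule first calls nn_fadeNodes/0, which multiplies every stored activation by a decay rate. The activation store, the decay value ω = 0.8, and the two-word lexicon are illustrative assumptions; only the rule shape follows the text above.

    :- dynamic activation/2.

    activation(man:1, 0.9).
    activation(park:1, 0.6).

    omega(0.8).                        % illustrative decay rate

    % Multiply every stored activation by the decay rate.
    nn_fadeNodes :-
        omega(W),
        forall(( activation(M, A0), A is A0 * W ),
               ( retract(activation(M, A0)), assertz(activation(M, A)) )).

    sent(sent(NP, VP)) --> { nn_fadeNodes }, np(NP), vp(VP).
    np(np(he))  --> [he].
    vp(vp(ran)) --> [ran].

    % ?- phrase(sent(T), [he, ran]), activation(man:1, A).
    % T = sent(np(he), vp(ran)),
    % A = 0.72 (modulo floating-point rounding).

Because the {nn_fadeNodes} goal sits at the front of the rule, any backtracking that re-enters the rule incurs the cost again, which is what produces the faster fading under heavy backtracking described above.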
6.5 Annotated Parse Tree Model

In this thesis, the modelling of parse trees will be an extension of the model found in [Sterling and Shapiro 1999], where parse trees are stored in an embedded-functor form. For example, given the following Definite Clause Grammar rule

    sent --> np, vp.

the equivalent Definite Clause Grammar rule with embedded parse trees would be

    sent(sent(NP, VP)) --> np(NP), vp(VP)

Sterling and Shapiro's parse tree model will be extended to also include the antecedent for pronoun references. Parse trees of the form noun(N) would take the following form for pronouns, where Ant is bound to a plausible antecedent for the pronoun, or to null if no antecedent exists:

    noun(N, ant(Ant))

Since this extension to the parse tree model still adheres to the syntax of the original model, existing pretty printers can be used on the model without any modifications. Figure 6.1 shows two example parse trees with an unresolved and a resolved antecedent.

[Figure 6.1: Annotated Parse Trees with (a) Unresolved and (b) Resolved Antecedents — two parse trees for He ran in the park, with the pronoun's ant slot bound to null in (a) and to man:1 in (b).]

6.6 Lexical Feature Sets

Lexical feature sets have been used in various theories of natural language processing, such as Generalized Phrase Structure Grammars [Bennet 1995], the Core Language Engine [Alshawi et al 1989], and the LangEng project [Brown et al 2001]. Feature sets allow parsers to restrict parsing based on a set of semantic attributes inherent to certain words. For example, in addition to semantic meaning, nouns also have attributes that imply the gender, person perspective, and plurality of the noun. The feature set syntax used in this thesis will be an extension of the feature set syntax of the LangEng project. Each noun entry will contain two feature sets. The first feature set, the lexical feature set, will contain lexical entries such as the noun's case, category, and so forth. The second feature set, called the semantic feature set, will contain semantic features such as the gender, number, and person perspective. The following is an example of a noun entry with the above feature sets:

    noun([case:nom, cat:np], [gender:masc, num:sing, person:3]) --> [he]

Feature sets are used by anaphora resolution algorithms to resolve references when multiple antecedents exist. Section 6.7 will discuss how the feature sets are used with an anaphora resolution algorithm. Tables 6.2 and 6.3 illustrate the noun feature sets that were used in this thesis.

    Feature   Value Set                                          Description
    sem       Prolog atom                                        semantic marker
    cat       {np}                                               marks lexical category
    case      {nom, acc, gen, dat, abl, loc, tmp}                morphological case
    sound     {soft, hard}                                       first consonant sound
    type      {common, proper, gerund, pronoun, rlfx_pronoun}    noun type

Table 6.2: Lexical Feature Set

    Feature   Value Set             Description
    gender    {masc, femn, neut}    gender of a noun
    num       {sing, plur, mass}    plurality of a noun
    person    {1, 2, 3}             person perspective

Table 6.3: Semantic Feature Set
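The following small Prolog sketch shows noun entries carrying both feature sets of Tables 6.2 and 6.3, plus a helper for recovering the semantic features of a word. The entry for he mirrors the example in the text; the remaining entries and the sem_features/2 helper are illustrative additions.

    noun([sem:he,    cat:np, case:nom, type:pronoun],
         [gender:masc, num:sing, person:3]) --> [he].
    noun([sem:man,   cat:np, case:nom, type:common],
         [gender:masc, num:sing, person:3]) --> [man].
    noun([sem:house, cat:np, case:nom, type:common],
         [gender:neut, num:sing, person:3]) --> [house].

    % Recover the semantic feature set of a single word.
    sem_features(Word, SemFeats) :-
        phrase(noun(_Lex, SemFeats), [Word]).

    % ?- sem_features(man, F).
    % F = [gender:masc, num:sing, person:3].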
6.7 Anaphora Resolution Algorithm

In Chapter 5, various anaphora resolution algorithms were discussed. A common thread between all of the algorithms is that, in the absence of multiple antecedents for a reference, the correct antecedent is identified, with the exception of those cases outlined in [Levine et al 2000]. The algorithm that was implemented is a combination of the ideas outlined by Brown in [Brown 2003] and Carter in [Carter 1985, Carter 1990], and the results from Koh and Clifton in [Koh and Clifton 2002]. Brown's algorithm will provide resolution for references that have only one antecedent. From Carter, the idea of scoring factors to influence how conflicts between lexically identical words are resolved was used. The scoring factor for a semantic concept i will be a combination of the current activation of node i within working memory and the relative similarity of the semantic feature set of concept i to the semantic feature set of a pronoun j:

    Score_i = A_i + F_ij    (6.2)

where F_ij is a score based upon the relative similarity of the feature sets of concept i and pronoun j. The value of F_ij is computed by starting with an initial score of 1.0. Each entry in the semantic feature set of concept i that matches an entry in pronoun j increases F_ij by a factor of φ, and each entry that does not match decreases F_ij by a factor of ψ = 1/φ. The current activation of node i will influence whether it is compared to pronoun j in the anaphora resolution algorithm: the algorithm will ignore concepts with an activation of less than σ. Figure 6.2 outlines the algorithm in pseudo-code.

    FIND the semantic concepts currently in working memory
    IF (at least one concept was found) THEN
        FOR (each concept i found) DO
            SET Score_i = 1.0
            FOR (each feature in the feature set of concept i) DO
                COMPARE the feature value to the feature of the pronoun
                IF (the values match) THEN
                    SET Score_i = Score_i * φ
                ELSE
                    SET Score_i = Score_i * ψ
            ADD the activation level (A_i) of node i to Score_i
        FIND the concept with the highest value for Score_i
        SET the antecedent of the pronoun to that concept
    ELSE
        SET the antecedent of the pronoun to NULL

Figure 6.2: Pronoun Reference Resolution Algorithm Pseudo-code
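A minimal Prolog sketch of the scoring loop of Figure 6.2 follows: every concept starts at 1.0, is multiplied by φ for each matching semantic feature and by ψ = 1/φ for each mismatch, and the concept's activation is then added. The value φ = 1.5 and the data in the example query are illustrative assumptions only.

    phi(1.5).
    psi(Psi) :- phi(Phi), Psi is 1 / Phi.

    % Multiply the running factor by phi or psi per feature.
    feature_factor([], _, F, F).
    feature_factor([Feat:Val|Rest], PronFeats, F0, F) :-
        (   member(Feat:Val, PronFeats)
        ->  phi(M)
        ;   psi(M)
        ),
        F1 is F0 * M,
        feature_factor(Rest, PronFeats, F1, F).

    % score(+ConceptFeats, +Activation, +PronounFeats, -Score)
    score(ConceptFeats, A, PronFeats, Score) :-
        feature_factor(ConceptFeats, PronFeats, 1.0, F),
        Score is F + A.

    % ?- score([gender:masc, num:sing, person:3], 0.7,
    %          [gender:masc, num:sing, person:3], S).
    % S = 4.075.    % 1.0 * 1.5^3 + 0.7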
6.8 Summary of Chapter

This chapter has examined models for long-term and working memory and how they were realized in Prolog. It has also given a general overview of the format of the grammar rules that are used, as well as the feature sets and annotated parse trees that accompany the rules. In Chapter 7, the model described in this chapter will be tested against a number of corpora with varying decay rates for the semantic nodes.

CHAPTER SEVEN
System Testing

This chapter will cover the methodology and procedures used for testing the anaphora resolution algorithm described in Chapter 6, using the activation model from ACT-R and the Thompson activation model. Various activation decay rates will be tested for each model. The results will be compared to human-based resolution.

7.1 Overview of Testing Methodology

The anaphora resolution algorithm will be tested against corpora selected from the Grimm Brothers library found at [Ockerbloom 2006]. The selected bodies of text were slightly modified from their original form to facilitate ease of parsing while retaining the spirit of reference placement. The modified text can be found in Appendix A. Each corpus will be tested independently and will not have influence on the tests of the other corpora. That is to say, the working memory and long-term memory will be reset to a default configuration for each test phase. The bodies of text will be tested a number of times, each with different decay rates, ω, for the semantic nodes within long-term memory.

7.2 Testing Platform

The testing was performed on a 1.8GHz PowerPC G5 with 1.25 GB RAM, under Mac OS X v10.4.4, using SWI-Prolog v5.4.7.

7.3 Decay Rates

Since the decay rate, ω, affects the activation levels in the ACT-R model differently than those in the Thompson model, different sets of decay rates were chosen. The sets of decay rates were chosen in such a way that they presented a broad spectrum of the behaviour of each model. The values of ω ranged from values that caused short activation times, i.e. fast activation decay, to values that caused slow activation decay, and thus caused the contents of working memory to be quite high.

In the ACT-R model, the values of ω ranged from 0.05 to 0.30. The decay rate of ω = 0.5, which is used in ACT-R, was not chosen because preliminary testing showed that this value caused an extremely high level of decay, which resulted in a large number of pronouns being unresolved. This extreme decay is most likely due to the fact that the ACT-R model may not be fully compatible with a neural network-type model. The decay rates in the Thompson model ranged from 9.90999 × 10⁻¹ to 1.0. Although a decay rate of ω = 1.0 would imply no decay, that is not actually the case: since the current activation is also based on neighbouring semantic nodes and the weights between the nodes, a certain amount of decay will still occur.

7.4 Default Memory Configuration

Initially, the contents of working memory were empty. This initial state of working memory was represented in Prolog by an empty list as the argument of the wm_workingMemory functor:

    wm_workingMemory([])

Long-term memory, represented as a neural network, initially contained nodes for all possible nouns that can be parsed by the system. Each node i had an initial activation A_i = 0.0 and bias B_i = 0.0. The links between nodes did not exist initially; rather, they were created as described in Chapter 6.

7.5 Testing Procedure

The ACT-R and Thompson activation models of the system were tested against all of the sentences from The Three Brothers corpus in sequence, using various rates of decay. The following items were tabulated during testing:

• the results of the anaphora resolution algorithm, tabulated against the expected results outlined in the tagged corpora of Appendix B
• the maximum capacity of working memory across all decay rates for each activation model
• the growth of working memory capacity over the course of parsing the corpus

7.6 Anaphora Resolution Results

The results of the anaphora resolution algorithm are outlined in Figures 7.1 and 7.2 for the ACT-R model and Figures 7.3 and 7.4 for the Thompson model.

[Figure 7.1: The ACT-R Model Resolution Results - 3 Brothers — percentages of correct, incorrect, and unresolved pronouns plotted against decay rates from 0.30 down to 0.05.]
[Figure 7.2: The ACT-R Model Resolution Results - Rumpelstiltskin — percentages of correct, incorrect, and unresolved pronouns plotted against decay rates from 0.30 down to 0.05.]

7.7 Working Memory Capacity Results

In addition to the results of the anaphora resolution algorithm, the behaviour of working memory was also observed for each decay rate. The working memory max capacity was the largest number of nouns that were observed to be in working memory at the end of each parsed sentence. Figures 7.5, 7.6, 7.7, and 7.8 outline the maximum working memory contents for the ACT-R model and the Thompson model, respectively. A comparison of the growth rate of working memory capacity using the optimal decay rate is outlined in Figures 7.9 and 7.10.

7.8 Discussion of Results

A baseline comparison between the ACT-R model and the Thompson model can be made by considering the decay rates that give an equivalent, non-zero, number of unresolved pronouns. For The Three Brothers corpus, given the decay rate of 0.09, the ACT-R model resolved 39.34% of the pronouns correctly. The Thompson model, given the decay rate of 0.995999, was able to resolve 54.10% of the pronouns correctly. Thus, in this comparison, the Thompson model achieved a higher success rate.

[Figure 7.3: The Thompson Model Resolution Results - 3 Brothers — percentages of correct, incorrect, and unresolved pronouns plotted against the Thompson model's decay rates.]

When the Rumpelstiltskin corpus is considered, the ACT-R model was able to correctly resolve 54.67% of the pronouns, given the decay rate of 0.17. The Thompson model, given the decay rate of 0.995999, was able to correctly resolve 42.67% of the pronouns. So, in this comparison, the ACT-R model achieved a higher success rate than the Thompson model.

When the overall range of results is examined, the Thompson model of activation achieved maximum success rates of 83.61% (The Three Brothers) and 86.67% (Rumpelstiltskin), while the ACT-R model of activation fell short with maximum success rates of 54.10% (The Three Brothers) and 56.00% (Rumpelstiltskin). In general, the Thompson model was able to resolve a higher number of pronouns over a larger range of decay rates. Although the Thompson model achieved a higher overall success rate, an unresolved rate of 0.0% is psychologically implausible based on the findings in [Levine et al 2000].
This difficulty can be attributed to the fact that the ACT-R model of activation is based on the activation history. Concepts appear to move in and out of working memory more fluidly using the Thompson model of activation. 7.9 Classification of Observed Error Types Throughout the testing of the implemented system, two types, or classes, of errors were observed. The first class of errors involved the incorrect resolution of a pronoun due to lack of information. The following sentence illustrates this error: The I son !1 that builds the best masterpiece will inherit [ his [1 house. 54 I - Ma x WM Capacity 25 I I ----- 20 I --- - I I Ill / 15 c I z0 :I '0 • 10 I L_o / / ; -- I -r - I I I 0 .30 0 .25 0.20 0 . 19 0. 18 0 .17 0 .16 0 . 15 0.14 0.13 0.12 0 .11 0 . 10 0 .09 0 .08 0 .07 0.06 0.05 Decay Rat e Figure 7.5: The ACT-R Model Working Memory Max Capacity- 3 Brothers In this example, the system resolved the pronoun his with the noun son, which was incorrect. The pronoun actually references the noun father, which was mentioned earlier in the corpus. This incorrect resolution occurred because the noun son was the concept that had the highest antecedent score, and the accompanying noun house was not considered by the anaphora resolution algorithm. The second class of errors that was observed involved the incorrect resolution of pronouns that reference events. The next example illustrates this error: The I fath er l1 thought that I this 17 was wonderful. In this example, the pronoun this was incorrectly resolved to reference the noun horse from a previous sentence, which was not correct. The pronoun this actually references an event from earlier in the corpus. Since resolution of events was not within the scope of this thesis, occurrences of this type of error were not included as results. 55 - Max WM Capacity 35 , - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - , 30 I I ~ - ~ ~ .: ~~ 20 r- - 0 ~ ~ / 5 ~ ~ f-----__j-+-------------11 ~ 0.30 0.25 0.20 0 .19 0 .18 0 .17 0 .16 0.15 0.14 0.13 0.12 0.11 0.10 0.09 0.08 0.07 0.06 0.05 Decay Rate L __ _ _ _ - - - ---- -- Figure 7.6: The ACT-R Model Working Memory Max Capacity- Rumpelstiltskin - Max WM Capacity 16 I 14 - · 12 - .. 10 - - - c z" 8 "" 6 t- 0 0 4 -- - -=-----/-/ ~ I ~ 2 f---·- I Decay Rate Figure 7.7: The Thompson Model Working Memory Max Capacity- 3 Brothers 56 - Max WM Capacity 20 ,--------------------------------, 18 1-- - 16 -·- 14 1-- - - - - - I ~ 12 z 10 ~ ~ ~ ~ ----- ~ ~ / 8 -- ~ t ~ F igure 7.8: The Thompson Model Working Memory Max Capacity- Rumpelstiltskin - - ACT- R WM Capacity I --- ·Thompson WM Capacity 16 . - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - , - - - - - - - - - - - - - - - - - - - ---c,,- - -r - - - ---1 14 / 12 ~ 10 ~ S ~ \-((:I -, I \ ~~~~ - - f I \ l_/t', __ - ,' ~ I T ~ ~ I - \ \ f \ - -. / I :~ t ~ / I II ~ ._____; II I \ / \ \ I \ I I 1 \ ~ ~ ~ ~ ____ - - ,_ j II I /\ I I ~~~ I \ ~ ,, ~ 1 / ,- T 1 ~ J ~ - - - - - - - - ----1 I I I I I I I I I I I ~~~ I ~~~~~~~ ~~~~y~~~~~~~~ Sentences Read Figure 7.9: Working Memory Contents Comparison - 3 Brothers 57 I - 20 -- ACT- R WM Capacity 18 f-- -- 16 r-- 14 f- · - - - - - - - - - -- - ----- --- - - I -- _ _ __ _ I I I 2 - 1- 1 0 / L_ 11. 
[Figure 7.10: Working Memory Contents Comparison - Rumpelstiltskin — ACT-R versus Thompson working memory capacity per sentence read.]

CHAPTER EIGHT
Conclusions and Future Work

It will be happened; it shall be going to be happening; it will be was an event that could will have been taken place in the future.
- ARNOLD J. RIMMER (Future Echoes)

8.1 Comparison with Related Work

Anaphora Resolution

Two of the biggest difficulties in making comparisons between anaphora resolution algorithms are the reluctance of authors to publish the results of their algorithms, and the difficulty of obtaining the corpora used for testing those algorithms. Over time, many corpora become difficult to obtain, thus making direct comparisons difficult. Early work such as [Sidner 1979], [Hirst 1981], and [Grosz and Sidner 1986], although popular in the fields of anaphora resolution and discourse analysis, fails to provide comprehensive results for their models. Table 8.1 shows the results obtained in this thesis compared to the work of other authors¹.

    Algorithm                     Accuracy    Corpus
    [Mitkov 1998]                 89.70%      Minolta Photocopier Manual and StyleWriter User's Guide
    [Hobbs 1986]                  91.70%      Early Civilization in China, Wheels, and Newsweek
    [Callaway and Lester 2002]    97.80%      Little Red Riding Hood
    [Ferrández et al 1998]        83.00%      TTU CCITT Handbook
    ACT-R                         55.05%      The Three Brothers and Rumpelstiltskin
    Thompson                      85.14%      The Three Brothers and Rumpelstiltskin

Table 8.1: Algorithm Comparison

¹ Mitkov, Callaway, Hobbs, and Ferrández all give results for pronominal references. It is unknown whether these results included pronominal references to events or just nouns.

Capacity of Working Memory

Figures 7.9 and 7.10 illustrated the results of the dynamic capacity of working memory when an optimal² decay rate was used. If a single concept is considered to be a chunk, the maximum capacity of the working memory model in this thesis appears to be much higher than the capacity proposed in [Miller 1956] for short-term memory. These differences are possibly an artifact of the differences between what is considered a chunk in the human mind and what is considered a chunk in the model presented in this thesis.

² The optimal decay rate was defined as the value that gave the highest anaphora resolution results.

8.2 Future Work

Inclusion of Additional Semantic Structures

Linking multiple semantic structures in the manner described in Chapter 4 could potentially increase the accuracy of anaphora resolution by providing additional contextual information. An excellent example is from The Three Brothers, where both the ACT-R model of activation and the Thompson model failed to resolve the antecedent in the sentence The son that builds the best masterpiece will inherit his house. Both models resolved the possessive pronoun his with the noun son, since son had the highest activation within working memory and the highest feature set score. Unfortunately, this resolution is incorrect: the pronoun his should resolve to the noun father, described in an earlier sentence.
If an anaphora algorithm had information relating to the father-son relationship available, and information relating to the fact that the father owned a house, the algorithm could use this information to give a higher score to father and resolve the antecedent correctly. Semantic representations, such as conceptual graphs, are effective in modelling this type of information. This modularization of modelling the behaviour of the human mind is supported by the psychological literature.

Noun Instances in Working Memory and Long-Term Memory

The models of long-term memory and working memory presented in this thesis only interact with single, generalized instances of nouns. For example, within the model of long-term memory there exists only a single occurrence of the noun man. The existence of only one occurrence of a noun is problematic when defining the correctness of an anaphora resolution algorithm when a corpus contains multiple men. A possible solution to this problem is the introduction of an additional memory model that is a hybrid of working memory and long-term memory. Event memory, which was introduced in Chapter 3, is a model that could be adapted to handle nouns in addition to handling events. Event memory could contain instances of concepts that are generated as a corpus is read. The instance of a concept would gradually decay until only the most general concept exists. Long-term memory would act as a repository for generalized concepts, analogous to generalized event memory. The hierarchy of generalized concepts and instances of concepts could be realized by using the conceptual hierarchies introduced in Chapter 2.

Chart Parsing

Chart parsing is a technique for bottom-up parsing that avoids parsing the same structure more than once. Parsed sub-phrases are stored in a database called a chart, which is consulted when any type of backtracking occurs. A chart parser can be used to increase the speed of parsing while also creating a parser that is more tolerant of ungrammatical sentences [Allen 1995], [Russel and Norvig 1995], [Brown 2000], [Thompson 2001a]. Potentially, a complete parse of a sentence would not be required for anaphora resolution. Parsing could be limited to just the noun phrase and verb phrase levels, and anaphora resolution could proceed from there.

8.3 Conclusion

This thesis demonstrated the viability of using a modular, two-level memory model to perform anaphora resolution. Two models of human memory were used in conjunction with an anaphora resolution algorithm to solve the problem of pronominal references. Two models of concept activation and decay were implemented and subsequently tested on corpora of text with varying decay rates. The two-level memory model and anaphora resolution algorithm achieved resolution accuracy rates of up to 54.10% (ACT-R) and 83.61% (Thompson) for a modified version of The Three Brothers corpus, and 56.00% (ACT-R) and 86.67% (Thompson) for a modified version of the Rumpelstiltskin corpus. Although these results fall a bit short of the results from other works (with the exception of [Ferrández et al 1998]), they are only a baseline for additional work. The model is intended to be an expandable model of human memory. It is theorized that adding additional semantic representations, and anaphora resolution algorithms, would increase the accuracy of the two-level memory model.

Bibliography

[Allen 1995] James Allen. 1995. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Redwood City, CA, 2nd edition.
[Alshawi et al 1989] H. Alshawi, D.M. Carter, J. van Eijck, R.C. Moore, D.B. Moran, F.C.N. Pereira, S.G. Pulman, and A.G. Smith. 1989. Research Programme in Natural Language Processing. Technical report, SRI International, Cambridge, UK.

[Altmann 1999] Gerry T.M. Altmann. 1999. Thematic Role Assignment in Context. Journal of Memory and Language, 41(1):124-145, July.

[Anderson and Matessa 1997] John R. Anderson and Michael Matessa. 1997. A Production System Theory of Serial Memory. Psychological Review, 104:728-748.

[Anderson et al 1998] John R. Anderson, Dan Bothell, Christian Lebiere, and Michael Matessa. 1998. An Integrated Theory of List Memory. Journal of Memory and Language, 38:341-380, May.

[Anderson et al 2001] John R. Anderson, Raluca Budiu, and Lynne M. Reder. 2001. A Theory of Sentence Memory as Part of a General Theory of Memory. Journal of Memory and Language, 45:337-367, October.

[Anderson 1976] John R. Anderson. 1976. Language, Memory, and Thought. Erlbaum, Hillsdale, NJ.

[Anderson 1983] John R. Anderson. 1983. The Architecture of Cognition. Harvard University Press, Cambridge, MA.

[Baddeley and Hitch 1974] Alan D. Baddeley and Graham Hitch. 1974. The Psychology of Learning and Motivation: Advances in Research and Theory, volume 8, chapter Working Memory, pages 47-89. Academic Press, New York, NY.

[Baddeley 1986] Alan D. Baddeley. 1986. Working Memory. Oxford Psychology Series. Oxford University Press, Oxford.

[Baddeley 1990] Alan D. Baddeley. 1990. Human Memory: Theory and Practice. Allyn & Bacon, Boston.

[Baddeley 2000] Alan Baddeley. 2000. The episodic buffer: a new component of working memory? Trends in Cognitive Sciences, 4(11):417-423, November.

[Belew 1987] R.K. Belew. 1987. A connectionist approach to conceptual information retrieval. In ICAIL '87: Proceedings of the 1st International Conference on Artificial Intelligence and Law, pages 116-126, New York, NY, USA. ACM Press.

[Bennet 1995] Paul Bennet. 1995. A Course in Generalized Phrase Structure Grammar. Studies in Computational Linguistics. UCL Press Limited, London, England.

[Berger et al 2004] Helmut Berger, Michael Dittenbach, and Dieter Merkl. 2004. An adaptive information retrieval system based on associative networks. In CRPIT '04: Proceedings of the First Asian-Pacific Conference on Conceptual Modelling, pages 27-36, Darlinghurst, Australia. Australian Computer Society, Inc.

[Brown et al 2001] Charles Grant Brown, Lynda Williams, David Kubert, Clifford James Thompson, Neil Hagen, Dan Wolczuk, Cameron Rowe, Kurt Bolko, Jason Hess, Doug Smith, David Fidler, Mike Sluszkiewicz, John Everman, Richard Little, Ho Shum, and Alex Yung. 2001. The Language Engine Project 1996-2001. University of Northern British Columbia, Prince George, BC.

[Brown 2000] Charles Grant Brown. 2000. Natural Language Processing Manuscripts. University of Northern British Columbia, Prince George, BC.

[Brown 2003] Charles Grant Brown. 2003. Lecture Notes in Computational Linguistics. University of Northern British Columbia, Prince George, BC.

[Bull 1960] W. Bull. 1960. Time, Tense, and the Verb: A Study in Theoretical and Applied Linguistics, with Particular Attention to Spanish. University of California Press, Berkeley, CA.

[Callaway and Lester 2002] Charles B. Callaway and James C. Lester. 2002. Pronominalization in General Discourse and Dialogue. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 88-95. Association for Computational Linguistics, July.
[Carter 1985] David Maclean Carter. 1985. A Shallow Approach to Anaphora Resolution. PhD Thesis 88, University of Cambridge, Cambridge, England.

[Carter 1990] David M. Carter. 1990. Control Issues in Anaphora Resolution. Journal of Semantics, 7:435-454.

[Chung and Moldovan 1993] Sang-Hwa Chung and Dan I. Moldovan. 1993. Modeling Semantic Networks on the Connection Machine. Journal of Parallel and Distributed Computing, 17.

[Collins and Quillian 1969] A. Collins and M.R. Quillian. 1969. Retrieval Time from Semantic Memory. Journal of Verbal Learning & Verbal Behavior, 8:240-247.

[Culce-Murcia and Larsen-Freeman 1999] Marianne Culce-Murcia and Diane Larsen-Freeman. 1999. The Grammar Book: An ESL/EFL Teacher's Course. Heinle & Heinle, Boston, MA, 2nd edition.

[Federmeier and Kutas 1999] Kara D. Federmeier and Marta Kutas. 1999. A Rose by Any Other Name: Long-Term Memory Structure and Sentence Processing. Journal of Memory and Language, 41:469-495.

[Ferrández et al 1998] A. Ferrández, M. Palomar, and L. Moreno. 1998. Anaphor resolution in unrestricted texts with partial parsing. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, pages 385-391, Morristown, NJ, USA. Association for Computational Linguistics.

[Fader 1983] Jerry A. Fodor. 1983. Modularity of Mind: An Essay on Faculty Psychology. MIT Press, Cambridge, MA.

[Gathercole and Baddeley 1993] Susan E. Gathercole and Alan D. Baddeley. 1993. Working Memory and Language. Essays in Cognitive Psychology. Lawrence Erlbaum Associates, Hove, United Kingdom.

[Geurts 1999] Bart Geurts. 1999. Presuppositions and Pronouns. Current Research in the Semantics/Pragmatics Interface. Elsevier, Amsterdam.

[Ginzburg and Cooper 2001] Jonathan Ginzburg and Robin Cooper. 2001. Resolving Ellipsis in Clarification. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 236-243.

[Grosz and Sidner 1986] Barbara J. Grosz and Candace L. Sidner. 1986. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3):175-204.

[Grosz 1977] Barbara J. Grosz. 1977. The Representation and Use of Focus in Dialogue Understanding. Technical report, SRI International.

[Group 2004] ACT-R Research Group. 2004. ACT-R Tutorial: Unit 1. http://act-r.psy.cmu.edu/tutorials, January.

[Haarmann et al 2003] Henk J. Haarmann, Eddy J. Davelaar, and Marius Usher. 2003. Individual Differences in Semantic Short-Term Memory Capacity and Reading Comprehension. Journal of Memory and Language, 48:320-345.

[Hardt 1997] Daniel Hardt. 1997. An Empirical Approach to VP Ellipsis. Computational Linguistics, 23(4):525-541, December.

[Hebb 1949] D.O. Hebb. 1949. The Organization of Behavior. Wiley, New York.

[Hirst 1981] Graeme Hirst. 1981. Discourse-oriented anaphora resolution in natural language understanding: a review. Computational Linguistics, 7(2):85-98.

[Hobbs 1986] Jerry R. Hobbs. 1986. Readings in Natural Language Processing, chapter Resolving Pronoun References, pages 339-352. Morgan Kaufmann Publishers, Los Altos, CA.

[Hunt 1973] Earl Hunt. 1973. Computer Models of Thought and Language, chapter The Memory We Must Have, pages 343-371. Freeman and Company, San Francisco, CA.

[Kay 1971] P. Kay. 1971. Taxonomy and semantic contrast. Language, 47:866-887.
[Kazuhiro et al 1992] Kimura Kazuhiro, Suzuoka Takashi, and Amano Sin-ya. 1992. Association-based natural language processing with neural networks. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 224-231, Morristown, NJ, USA. Association for Computational Linguistics.

[Koh and Clifton 2002] Sungryong Koh and Charles Clifton. 2002. Resolution of the Antecedent of a Plural Pronoun: Ontological Categories and Predicate Symmetry. Journal of Memory and Language, 46(4):830-844, May.

[Levine et al 2000] William H. Levine, Alexandria E. Guzman, and Celia M. Klin. 2000. When Anaphor Resolution Fails. Journal of Memory and Language, 43(4):594-617, November.

[Logie 1996] Robert H. Logie. 1996. The Seven Ages of Working Memory, pages 31-65. Counterpoints: Cognition, Memory, and Language. Oxford University Press, New York, NY.

[Luger and Stubblefield 1998] George F. Luger and William A. Stubblefield. 1998. Artificial Intelligence, chapter 14, pages 690-700. Addison Wesley Longman Inc., Reading, MA, 3rd edition.

[Ma and Isahara 2000] Qi Ma and Hitoshi Isahara. 2000. Semantic Networks Represented by Adaptive Associative Memories. Neurocomputing, 34:207-225, September.

[McCulloch and Pitts 1943] W.S. McCulloch and W. Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115-133.

[Miller 1956] G.A. Miller. 1956. The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review, 63:81-97.

[Mitkov 1998] Ruslan Mitkov. 1998. Robust pronoun resolution with limited knowledge. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, pages 869-875, Morristown, NJ, USA. Association for Computational Linguistics.

[Nash-Webber and Reiter 1977] Bonnie Nash-Webber and Raymond Reiter. 1977. Anaphora and Logical Form: On Formal Meaning Representations for Natural Language. In Proceedings of the 5th International Joint Conference on Artificial Intelligence, volume 1, pages 121-131.

[Ockerbloom 2006] John Mark Ockerbloom. 2006. Grimm's Fairy Tales. http://www.cs.cmu.edu/~spok/grimmtmp.

[Partee 1984] Barbara Partee. 1984. Nominal and Temporal Anaphora. Linguistics and Philosophy, 7:243-286.

[Piwek and Krahmer 2000] Paul Piwek and Emiel Krahmer. 2000. Presuppositions in Context: Constructing Bridges. In Emiel Krahmer and Paul Piwek, editors, Varieties of Anaphora, pages 87-96, Birmingham, England. ESSLLI.

[Quillian 1968] M.R. Quillian. 1968. Semantic Information Processing, chapter Semantic Memory, pages 216-270. MIT Press, Cambridge, MA.

[Reichenbach 1947] H. Reichenbach. 1947. Elements of Symbolic Logic. Macmillan, New York.

[Richardson 1996] John T.E. Richardson. 1996. Evolving Concepts of Working Memory, chapter 1, pages 3-30. Counterpoints: Cognition, Memory, and Language. Oxford University Press, New York, NY.

[Rosch et al 1976] E. Rosch, C.B. Mervis, W.D. Gray, D.M. Johnson, and P. Boyes-Braem. 1976. Basic objects in natural categories. Cognitive Psychology, 8:382-439.

[Russel and Norvig 1995] Stuart J. Russel and Peter Norvig. 1995. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence. Prentice Hall, Upper Saddle River, NJ.

[Schank 1986] Roger C. Schank. 1986. Readings in Natural Language Processing, chapter Language and Memory, pages 171-191. Morgan Kaufmann Publishers Inc., Los Altos, CA.
[Sidner 1979] Candace L. Sidner. 1979. Toward a Computational Theory of Definite Anaphora Comprehension in English. PhD Thesis 537, MIT Artificial Intelligence Laboratory, Cambridge, MA.

[Sidner 1983] Candace L. Sidner. 1983. Focusing in the Comprehension of Definite Anaphora, chapter 5, pages 267-330. M.I.T. Press, Cambridge, MA.

[Sowa 1976] John F. Sowa. 1976. Conceptual Graphs for a Data Base Interface. IBM Journal of Research and Development, 20:336-357.

[Sowa 1979] John F. Sowa. 1979. Semantics of conceptual graphs. In Proceedings of the 17th Annual Meeting of the Association for Computational Linguistics, pages 39-44, Morristown, NJ, USA. Association for Computational Linguistics.

[Sowa 1984] John F. Sowa. 1984. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA.

[Sowa 2000] John F. Sowa. 2000. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, Pacific Grove.

[Sterling and Shapiro 1999] Leon Sterling and Ehud Shapiro. 1999. The Art of Prolog. MIT Press, 3rd edition.

[Thompson and Hagen 2001] Clifford James Thompson and Neil Hagen. 2001. CPSC 496 Relative Clauses, Quantifiers and Determiners. University of Northern British Columbia, Prince George, BC.

[Thompson 2001a] Clifford James Thompson. 2001a. CPSC 674 Bottom-Up Parsing: Chart Parsing. University of Northern British Columbia, Prince George, BC.

[Thompson 2001b] Clifford James Thompson. 2001b. CPSC 674 Grammars & Syntactic Analysis. University of Northern British Columbia, Prince George, BC.

APPENDIX A
Corpora

The Three Brothers³

³ Adapted from http://www.cs.cmu.edu/~spok/grimmtmp/094.txt

There was a man who had three sons. He had nothing in the world. Each son wanted the house after his death. The father loved them. He did not know what he should do. He did not wish to sell the house. It had belonged to his forefathers. He conceived a plan and told his sons that they must learn a trade. The son that builds the best masterpiece will inherit his house. The sons were content with this. The first son was determined to be a blacksmith. The second son wanted to be a barber. The third son desired to be a fencing master. They set a time when they should come home. The brothers found skillful masters who taught them their trades. The blacksmith had to shoe horses that belonged to the king. He believed that he would inherit the house. The barber shaved only distinguished people. He believed that his father would give the house to him. The fencing master suffered many blows to his body but he grit his teeth. He thought that he would win the house. The brothers returned home to their father. They did not know when they would demonstrate their skills to their father. The brothers sat and contemplated what they could do. A hare ran across the field. The barber took his basin and soap. He lathered until the hare drew near. He soaped and shaved the hare's whiskers while he was running at his top speed. He did not cut his skin or a hair on his body. The father was delighted. A nobleman came in his coach and at full speed. The blacksmith ran towards the coach. He took four horseshoes off the horse while it was galloping and put new shoes on him. The father thought that this was wonderful. The third son asked to demonstrate his skills.
It began to rain and the son drew his sword. The sword flourished backwards and forwards above his head. No raindrops fell upon him. The rain fell harder and harder. He flourished his sword and remained dry. His father was amazed at this and gave his house to the third son. His brothers were satisfied with this. They decided to live together since they loved eachother. The brothers continued their trades and earned a good living. They lived happily until they grew old. One brother became sick and died. The brothers grieved intensely and they became ill and died. They were laid in the same grave because they loved eachother.

Rumpelstiltskin⁴

⁴ Adapted from http://www.cs.cmu.edu/~spok/grimmtmp/044.txt

There was a miller who was poor but had a beautiful daughter. The miller visited the king and told him that his daughter could spin straw into gold. The king said that this was an art and pleased him. He requested that the miller bring his daughter to the palace. The girl was brought to the king. He took her into a room that was filled with straw. She was given a spinning wheel and a reel. The king demanded that she complete the work by tomorrow or die. He locked the room and left the daughter. The poor daughter sat there and wept. She knew that she could not spin straw into gold. The door opened and a little man entered the room. He asked the girl why she was crying. She told him that she must spin straw into gold. The little man told her that he could spin the straw for a price. He asked the girl what she could give to him. The daughter offered a necklace to him. The man took the necklace and sat at the spinning wheel. He spun the straw into gold. The king returned in the morning and saw the gold. He was astonished and delighted. His heart filled with greed. The daughter was taken to a larger room that was filled with straw. He demanded that she complete the work by tomorrow. The girl sat in the large room and cried. The little man returned and told her that he could spin the straw into gold. He asked her what she could give for the task. The daughter gave a ring to the small man. He grabbed the ring and spun the straw into gold. The king returned and was amazed by the feat. He demanded that she spin more gold. The daughter was placed in a larger room. The king asked that she complete the task by the morning. He thought that she would be his wife when the task was completed. The manikin returned when the girl was alone. He asked what she would give for the task. She answered that she had nothing. The girl promised to give her first child when she becomes queen. She did not think that this would happen. The little man spun the straw into gold.
The messenger found the manikin's house and overheard his name. The queen was delighted. The manikin returned on the final day and ask for a name. She guessed Conrad and Harry. He said that she was incorrect. She guessed Rumpelstiltskin. The manikin became angry and was pulled into the earth. 74 APPENDIX B Tagged Corpora The Three Brothers There was a Iman h who had three Isons b· ~ had nothing in the world. Each son wanted the house after Ihis h death. The Ifather h loved Ithem b. IHe h did not know what Ihe h should do. IHe l1 did not wish to sell the Ihouse 6. [li1 had belonged to Ihis h I hconceived a plan and told Ihis hIsons l2 that Ithey ~ must learn a trade. forefathers. He The I son babe that builds the best masterpiece will inherit Ihis l1 house. The Isons bwere content with this. The first Ison ba was determined to be a I blacksmith ba· I The second Ison l2b wanted to be a barber b. The third Ison be desired to be a Ifencing master ~e IThey~ set a time when Ithey pshould come home. The Ibrothers b found skillful I te ~ who taught Ithem b Itheir k trades. The I blacksmith ba had to shoe horses that belonged to the king. ~ believed that Ihe l2a would inherit the house. The Ibarber bb shaved only distinguished people. IHe bb believed that Ihis bb father would give the house to Ihim bb· The I fencing master l2e suffered many blows to Ihis bebody but Ihe begrit Ihis beteeth. IHe bethought that Ihe bewould win the house. I breturned home to Itheir bfather. IThey~ did not know when I they~ would demonstrate Itheir bskills to Itheir bIfather h. The Ibrothers bsat and contemThe brothers plated what Ithey~ could do. A Ihare b ran across the field. The Ibarber bb took Ihis bb basin and soap . IHe bb lathered until the Ihare b drew near. IHe b soaped and saved the Ihare's b whiskers while Ihe bb was running at Ihis bb top speed. IHe bb did not cut Ihis b I bbody. The Ifather hwas delighted. skin or a hair on his A Inobleman b can in Ihis b coach and at full speed. The Iblacksmith ba ran towards the coach. ~ took four horseshoes off the I horse 17 while lli]7 was galloping and put new shoes on Ihim 17. The Ifather hthought that this was wonderful. 75 The Ithird son l2e asked to demonstrate I his be skills. It began to rain and the Ison l2e drew I his be sword. The sword flourished backwards and forwards above Ihis be head . No raindrops fell upon I him be· The rain fell harder and harder. ~e flourished ~ e I I I hIhouse bto sword and remained dry. His be father l1 was amazed at this and gave his the I third son be. I His 6e I brothers l2 were satisfied with this. IThey~ decided to line together since I they ~ loved eachother. living. ~ The I brothers 6 continued Itheir b trades and earned a good lived happily until! they~ grew old. One brother became sick and died. The I brothers 6 grieved intensely and I they ~ became ill and died. IThey !2 were laid in the same grave because I they ~ loved eachother. 76 Rumpelstiltskin There was a Imiller hwho was poor but had a beautiful/ hte ~ The Imiller l1 visited ~ and told Ihim 6that Ihis h/daughter /2 could spin straw into gold. The the I said that this was an art and pleased him /daughter pto the palace. The / girl/2 was brought to the / king ~ 6. IHe 6requested that the Imiller hbring Ihis h ~ took Iher b into a room that was filled with straw. I She b was given a spinning wheel and a reel. The/ complete the work by tomorrow or die. IHe 6locked the I I The poor /daughter /2 sat there ~ ~ demanded th t~ ~ and left the/ daughter 12· band wept. 
IShe b knew that Ishe b could not spin straw into gold. I The door opened and a little Iman k.entered the room. ~ asked the girl/2 why Ishe b was crying. IShe b told Ihim k that Ishe b must spin straw into gold. The little Iman k told I herb that ~ could spin the straw for a price. ~ asked the Igirl/2 what Ishe bcould give to Ihim k. The /daughter /2 offered a necklace to Ihim k. The Iman k took the necklace and sat at the spinning wheel. ~ spun the straw into gold. The / king ~ returned in the morning and saw the gold. delighted. IHis 6heart filled with greed . The/ IHe 6 was astonished and hte ~ was taken to a larger room that I bcomplete the work by tomorrow. was filled with straw. ~ demanded that she The I girl r sat in the large room and cried. The little Iman k returned and told Iherb I k asked Iher l2 what Ishe b could give for the that ~ could spin the straw into gold. He task. The/ daughter /2 gave a ring to the small I man k· ~ grabbed the ring and spun the straw into gold. The / ~ returned and was amazed by the feat. ~ demanded that Ishe b spin I I more gold. The daughter l2 was placed in the largest room. The / king ~ asked that she b complete the task by the morning. ~ thought that Ishe bwould be Ihis 61 wife bwhen I the task was completed. The manikin kreturned when the Igirll2 was alone. ~ asked I bwould give for the task. IShe banswered that ~ what she promised to give Iherb first child when Ishe l2 becomes Iqueen 77 had nothing. The Igirll2 p· IShe bdid not think that this would happen. The The I king tt ~ ~ spun the straw into gold. rreturned in the morning. ~ found what ~ wished. The Iking rtook the Igirll2 in marriage and I she b became I queen 12· ~ brought a beautiful child into the world. The Iqueen pgave no thought to the Imanikin k. ~e te e Iherb room and asked for Iherb child. ~ was surprised and offered riches to Ihim refused the offer. The Iqueen k. The Imanikin k pbegan to cry and the little Iman kfelt pity. ~ said that could keep ~ child but Ishe b must guess Ihis k name in three days. IShe b sent ~ bsearched for any name that might exist. The Imanikin kreturned the next day. The Iqueen l2 guessed Casper and Melchior and a 1messenger 15 across the country. IHe Balthazar and other names that Ishe b knew. ~ said that Ishe b was incorrect. IShe b sent a Imessenger r on the second day. The Iqueen pasked for uncommon names. IShe b guessed Shortribs and Sheepshanks and Laceleg but was incorrect. The Imessenger rfound the Imanikin's khouse and overheard Ihis kname. The Iqueen 12 I was delighted. The manikin kreturned on the final day and ask for a name. IShe bguessed Conrad and Harry. ~ said that Ishe b was incorrect. IShe b guessed Rumpelstiltskin. I The manikin k became angry and was pulled into the earth. 78