A DIVERSE USER MODEL
IN THE CONTEXT OF AN INTELLIGENT TUTORING SYSTEM
by
Nathan Keim
B.Sc., University of Northern British Columbia, 2000

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in
MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES
(COMPUTER SCIENCE)

© Nathan Keim, 2003
THE UNIVERSITY OF NORTHERN BRITISH COLUMBIA
March 2003

All rights reserved. This work may not be
reproduced in whole or in part, by photocopy
or other means, without permission of the author.


National Library
of Canada

Bibliothèque nationale
du Canada

Acquisitions and
Bibliographic Services

Acquisitions et
services bibliographiques

395 Wellington Street
Ottawa ON K1A 0N4
Canada

395, rue Wellington
Ottawa ON K1A0N4
Canada

The author has granted a non-exclusive
licence allowing the
National Library of Canada to
reproduce, loan, distribute or sell
copies of this thesis in microform,
paper or electronic formats.

L’auteur a accordé une licence non
exclusive permettant à la
Bibliothèque nationale du Canada de
reproduire, prêter, distribuer ou
vendre des copies de cette thèse sous
la forme de microfiche/film, de
reproduction sur papier ou sur format
électronique.

The author retains ownership of the
copyright in this thesis. Neither the
thesis nor substantial extracts from it
may be printed or otherwise
reproduced without the author’s
permission.

L’auteur conserve la propriété du
droit d’auteur qui protège cette thèse.
Ni la thèse ni des extraits substantiels
de celle-ci ne doivent être imprimés
ou autrement reproduits sans son
autorisation.

0-612-80654-5


APPROVAL
Name:

Nathan Keim

Degree:

Master of Science

Thesis Title:

A DIVERSE USER MODEL IN THE CONTEXT OF AN
INTELLIGENT TUTORING SYSTEM

Examining Committee:

Chair: Dr. Robert W. Tait
Dean of Graduate Studies
UNBC

Supervisor: Dr. Charles Brown
Associate Professor, Mathematics & Computer Science Program
UNBC

Committee Member: Dr. Han Li
Associate Professor, Psychology Program
UNBC

Committee Member: Dr. Liang Chen
Associate Professor, Mathematics & Computer Science Program
UNBC

External Examiner: Dr. Judith Lapadat
Associate Professor, Education Program

Date Approved:

Abstract

The purpose of this thesis is to implement a variety of tutoring strategies based on a complex user
model and to test the resulting program in a pilot study. The tutoring strategies and user model are
created using several ideas from the current literature, including the use of mal-rules. To test them, a
tutoring program was created that uses the tutoring strategies and user model to tutor subjects who are
currently learning English as their second language. The program is tested in a pilot study in which the
control group has the tutoring strategies and user model disabled. Both quantitative and qualitative
data are used to determine the effectiveness of the program. Overall, the results are inconclusive but
raise many questions for future research. Nevertheless, this study shows that it is possible to implement
several different tutoring strategies and a complex user model to create a tutoring system, and it
provides a starting point for similar research in the area of Intelligent Tutoring Systems.


TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments

Chapter One     Overview
                    Introduction
                    Goals/Achievements
                    Assumptions/Challenges

Chapter Two     Background and Previous Work

Chapter Three   Program Structure
                    The User Model
                    Tutoring Strategies
                    Feedback
                    Question Selection
                    Comparing Against Expected
                    Lesson Structure
                    User Interface
                    Word Check
                    Parsing
                    Mal-rules
                    Implementation

Chapter Four    Field Testing
                    Design
                    Subjects
                    Procedure
                    Instrumentation
                    Ethics

Chapter Five    Results of Field Testing
                    Qualitative Data
                    Quantitative Data

Chapter Six     Conclusions and Future Outlook
                    Discussion
                    Conclusion
                    Future Outlook

References

Appendix I
Appendix II
Appendix III
Appendix IV
Appendix V
Appendix VI
Appendix VII
Appendix VIII
Appendix IX

List of Tables

Table 1. User Response Scoring System
Table 2. Subject Background Information
Table 3. Pre-test and Post-test Data
Table 4. Time/Lesson Data and Ratio
Table 5. Motivation Score

List of Figures

Figure 1. An example of the layout of the user model.
Figure 2. An example of the tutoring strategies.
Figure 3. Feedback example.
Figure 4. Example of two correct answers.
Figure 5. Example of two correct answers.
Figure 6. Example of an insignificant mistake.
Figure 7. Analysis of the Correctness of an Answer
Figure 8. Example of a question containing "already".
Figure 9. Screen shot of the program
Figure 10. Example picture from the program
Figure 11. Example of DCG rules and lexicon entries.
Figure 12. Example of Multiple Mal-rule Generation
Figure 13. Mal-rule creation
Figure 14. A mal-rule on incorrect tense usage.

Acknowledgments

I would like to acknowledge:
My supervisor: Charles Brown
My committee members: Han Li
Liang Chen
External Reviewer

All of the people who helped me along the way:
Corrine Omand
Keely Hunter
Lennise Mann
Marta Tejero
Patryk Simon
Rozalynd Curry
Yukari Yamamoto
A big thank you to all of the people who agreed to be subjects.

Finally I would like to acknowledge UNBC for giving me this opportunity.

Chapter 1
Overview

Introduction
In the area of Natural Language Processing (NLP) and English-as-a-Second-Language (ESL)
tutoring systems, there has been a significant amount of work done in the past 20 years. Many
people have focused on improving parsing in both the syntactic (Labrie & Singh, 1991) and
semantic (Lemaire, 1999) areas of language. Others have looked at different tutoring strategies
and user models (Bull & Pain, 1995). Although the work in each of these areas is far from
complete, it is time to start looking at some of the interactions of the different user models and
tutoring strategies when they are combined.
One of the key components of many tutoring systems is the user model (or
learner model). The user model is the stored information that the program has learned about
the user. This could be as simple as the user’s name or as complex as an analysis of the
abilities of the user. Several tutoring programs have incorporated user models to increase the
ability of the tutoring program (Bull & Pain, 1995). There are many different methods for
using user models to drive the tutoring process of these programs (Bull & Pain, 1995;
Maciejewski & Leung, 1992). However, there is a lack of testing on the effects of combining
these methods.
I created an Intelligent Tutoring System (ITS) for teaching ESL that adapts to the
user’s responses to increase its ability to tutor effectively. In particular, I centered on the user
model and tutoring strategies that allow the program to tailor the tutoring to the individual
user. The purpose of the program is not to create a fully functioning English tutoring
program, but to create a working prototype that could be used to show the potential of the
user model. The goal is to show the potential of using several tutoring strategies (which are
described in detail in Chapter 3) based on an extensive user model, and how their combined
effect significantly increases the effectiveness of the tutoring program. This increase is measured
by comparing the program to a textbook-like setup (that is, computerized sequenced
exercises that lack the "intelligent" component of the program). It is not the goal of this
program to replace a person teaching English to second language students.
I aim to show that a detailed user model, combined with several
tutoring strategies, allows for a significant increase in the efficiency with which the user
learns from the tutoring system. Efficiency, in this case, includes how well and how fast users
learn the material when compared to how well and how fast they learn the material using a
textbook-like program (for example, my tutoring program with the user model and tutoring
strategies disabled). Also, such a program may be used to increase the motivation of the user,
which in turn increases the effectiveness of the tutoring system (Kalayar et al., 2001; Kinshuk
et al., 1998). It has already been shown that these strategies separately increase the success of
the tutoring program (Bailin & Philip, 1988; Bull, 1994; Moribiro, 1992), but little work has
been done on the effect of combining some of these strategies, on the effect that these
strategies have on motivation, or on testing whether the results are statistically significant.
It is important to note that the tutoring program created for this thesis is not a
complete tutoring program. It does not teach people the meaning of words and it does not
describe lessons in detail. This tutoring program focuses on the user-system interaction that
occurs during the independent practice stage of learning, following a lesson taught by a
second language teacher. The program is designed to try to be interactive with the learner
(the experimental subject) and provide dynamic feedback as a teacher would. Dynamic
feedback provides a more useful lesson than simply completing exercises in a textbook.
Therefore, this program would be most appropriately used as a tutoring aid, and must be
analyzed in this context.
The tutoring program tutors a small subsection of English for people learning
ESL. The area of focus is grammatical errors involving verb endings. To restrict the size of
the grammar the program has to include, the context of university is used (the words
commonly encountered outside of the classroom at university). This context is appropriate
because a large portion of the potential test population, which consists of people learning
English as a second language, is familiar with it. Using a subset of English and the context of
university, the tutoring program aids the ESL subjects as they learn English.
Several definitions are important to understanding this thesis. Some terms are described
when first used, but the following key definitions are used throughout. A user model is a
representation of the knowledge the program contains about the user. This can include
everything from the user’s name to the answers the user has entered in response to the tutor’s
questions. Tutoring, as I use it here, includes the selection of the questions to present to the
user, the analysis of the user’s response, and the feedback the user receives about their
answer. Tutoring strategies are the various forms of analysis of the user model along with the
resulting recommendation to the tutoring program on what to do next. An intelligent tutoring
system (ITS) is a program that tutors based partly upon an analysis of the data known about
the user, as well as by using tutoring strategies to determine how the user should progress. An
English-as-a-Second-Language subject (ESL subject) is someone whose first language, the
one that they learned first as a child, is not English and who is now studying English.
English, though it is called a second language, can be learned after several other languages
have been learned. Thus, the term "second language" is misleading. A mal-rule is an
incorrect grammar rule. Language coverage is a subset of all of the words and grammar rules
that are from a given language. These definitions can be found in Appendix IX. Appendix
IX also contains definitions of other common terms in the ITS/ESL field.

Goals/Achievements
The main goal of this thesis is to look at the effect of using a complex user model
and several tutoring strategies to increase the effectiveness of an ITS. The user model and
tutoring strategies were developed from a combination of work that has already been done in
this field, as well as some new findings in the ITS field. This user model and these tutoring
strategies are applied within a tutoring program to test their effectiveness. The assumption is
that the tutoring program significantly increases the subjects’ ability to learn the material
compared to when the program is tutoring without using the user model and the tutoring
strategies.
To accomplish the above, this research has several goals. It is important to look at
each of these goals and explore how successfully each one is met. The success of all of the
goals leads to the success of the overlying goal: to create a better tutoring system.
One of the main goals of the tutoring program is to use a user model to aid the
tutoring strategies of the program. The ITS gathers information about the user before the
tutoring session (called the user profile) and during the tutoring session (called the user
model). This user model is quite detailed and records a great deal of information about the
user. Several different types of information are recorded to observe their effects on the
tutoring strategies. This information is intended to increase the effectiveness of the tutoring system.
Another goal is to create tutoring strategies that take advantage of a complex user
model. There needs to be research that incorporates several different tutoring strategies and
examines what the effects are of combining these strategies. Most of these strategies have
been evaluated separately prior to this program (as discussed in Chapter 2), but are not often
tested working together. The goal is to show that combining these tutoring strategies is
possible and productive.
The program also incorporates mal-rules into the user model and the tutoring
strategies. Some of the mal-rules are predetermined and some are formed while the subjects
use the program. During the tutoring session, the mal-rules that are used by the subject are
saved in the user model. As mal-rules have not been widely used except in a few studies
(Brown, 2002; Sentence & Pain, 1995), their effect on the success of the program is
relatively unknown. Even though their individual effect cannot be determined through this
research, it is interesting to see the effect of combining the mal-rules with the other parts of
the user model and tutoring strategies.
The user model combined with the tutoring strategies should also increase the
subjects’ motivation level while using the program compared to when the user model and the
tutoring strategies are inactive. Increasing the subjects’ motivational level is useful because it
is likely to contribute to the subject’s persistence while using the program, as well as
enhancing the learning outcomes. The key to motivational learning is having interactive
lessons which engage the user and make them want to complete the task.
The program focuses on tutoring about a small subset of English. To ensure that
the subjects are familiar with most of the words used during the tutoring sessions, the
program concentrates on the context of university (words encountered at university outside of a
classroom setting). This makes the program easier to build and gives the subjects a greater
chance of understanding the questions.
The program is built based on the design in Chapter 3. The main goal of this
structure is to provide a framework for the tutoring strategies and user model, so the program
is able to be tested without the framework itself affecting the final results. This framework
also shows whether the user model and the tutoring strategies can be applied to a tutoring
program. As they use both established and new concepts, this is important to achieve.
As discussed in Chapter 2, one of the main gaps in ITS research is the
statistical testing of the effectiveness of tutoring programs.
Therefore, one of the main goals of this thesis is to examine whether there is a statistically
significant difference between using the program with the user model and tutoring strategies
in combination and using the program without them. This is tested by
performing a quantitative analysis to determine the success of the program. There is also a
qualitative component to this research that examines the effectiveness of the program. The
success of the program is determined by how much the user improves their English ability
while using the program, as well as on how fast they are able to learn the information.
To successfully test the tutoring program, the subjects’ increase in grammatical
performance needs to be measured. This is done through a comparison of scores between a
pre-test (before program use) and a post-test (after program use). The subjects’
performance is also gauged by keeping track of the amount of time that is taken to complete a
section in the program. This is important as the benefit of the program may not be more
knowledge retained, but that the subject is able to obtain the same level of knowledge faster.
T-tests are used to compare the experimental group against the control group to examine
the differences in pre-test and post-test levels, motivation scores, and how far the
subjects progress in the program within a given amount of time.
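
The group comparison described above can be sketched as follows. This is only a minimal illustration: it assumes Welch's unequal-variance form of the t-test (the thesis does not state which variant is used), the function name `welch_t` is hypothetical, and the score gains are invented for the example rather than taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's two-sample t-test (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical post-test minus pre-test gains, for illustration only.
experimental_gain = [4, 6, 5, 7, 3]
control_gain = [2, 3, 4, 1, 5]
t_stat, dof = welch_t(experimental_gain, control_gain)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The resulting statistic would then be compared against the critical value for the chosen significance level at the computed degrees of freedom. The same function could be applied to motivation scores or to the number of lessons completed in a fixed time.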

Assumptions/Challenges
Some assumptions were made in order to constrain the size of the tutoring
program. These assumptions and a discussion of their possible impact follows.
The first major assumption is that the subjects are at a competence level at
which they can understand general instructions in English, but lack proficiency in
the area being tutored. However, this lack of proficiency may result in some of the
instructions not being understood. This was dealt with as well as possible both while creating
the program and during the course of the testing. If necessary, the instructions were explained
in several different ways so that the subject could understand them. Conversely, subjects who
are too skilled at English cannot effectively test the program. Subjects being too skilled,
however, is not seen as a problem, as the program should recognize that the subjects are
skillful and tutor them appropriately (in this case, moving them quickly through the program).
This may cause the time of program use to be a very important factor when looking at the
effectiveness of the program.
The next assumption is that the program is able to ignore or work around small
errors in the subjects’ responses that are not related to the area being tutored. This is
important, as this is a key strategy when tutoring in second languages (Holland et al., 1993).
Initially, this does not seem like a large problem, until you consider the number of possible
correct and incorrect answers in natural language that the user may respond with. Due to the very
large size of such a set (since an unlimited number of errors can be created within a
language), it would be nearly impossible for a program of this size to handle all of
the potential "ignorable" mistakes that may arise. However, the program is
capable of dealing with most of these "ignorable" mistakes, and the assumption is made that
the ones that are missed do not affect the final outcome of the success of the program.
This thesis assumes that the user model increases, and does not decrease, the
tutoring ability of the program. This assumption is important when performing the
statistics on the data to see if the program performed significantly better when the user
model was in place. Therefore, the assumption is that the increase in tutoring ability should
be directly related to whether the user model is present or not. This is a standard assumption
in statistical analysis of studies like this one.
There is a chance that there are words used in the tutoring program that the
subjects do not know. This is addressed in two ways. First, the addition of pictures that
relate to the questions allows the subjects to make an educated guess as to what a word means.
Second, the subjects are allowed to use aids and dictionaries to find the meanings of words.
This is allowed, as a full tutoring program would have the capability to teach words to the
subjects. Since this tutoring system does not have this functionality, another method for
obtaining meanings for the words involved is used.
As can be seen by the above, this was quite a complicated project to put together.
However, even with all of the assumptions and challenges above, the results of this
experiment show some of the great potential that exists in the different user models and
tutoring strategies that are available today. This research provides many new questions for
future study.

Chapter 2
Background and Previous Work
ITS for Tutoring Second Languages

Introduction
To effectively create a tutoring program, one must first examine the field of
Intelligent Tutoring Systems (ITS) for tutoring second languages. This field is quite new and
has only been researched for the past 15 years. ITS uses a combination of linguistics,
programming, and tutoring to create a good tutor that helps facilitate learning in the subject.
There is agreement in the literature that ITS for second language learning yields better
learning outcomes than the average textbook, but it is unclear what methods should be used
to build such a system. This combination creates a paradigm that draws researchers to solve
the problem of making an effective ITS for second language learning.
This review summarizes the research and development literature on: the target
audience of ITS for second language learning, language coverage of existing tutoring
systems, parsing and robust parsing, tutoring strategies, error handling, user models, and
future outlooks of the field. Some of these areas contain extensive research, whereas others
are still quite new. Several methods that look at the background of ITS are not covered, but
most of the mainstream methods are covered below.
The goal of an ITS for second language learning is to teach the user a second
language. This is a very broad goal and most systems today restrict the scope of tutoring.
This could involve a system that focuses on job-related conversations, or on being able to get
directions. The goal of the system must be kept in mind when looking at all other aspects of
the system. Some general goals of language tutoring systems are: understanding text,
mastering the grammar rules of the language, producing texts, and conversing within the
language (Schwind, 1990).

Target Audience
The target audience of Intelligent Tutoring Systems (ITS) consists of people
interested or involved in learning a second language. This audience can potentially be
learning a variety of different languages. Some examples of languages included in intelligent
tutors are English (Sentence & Pain, 1995), French (Hagen, 1994), Chinese (Wang &
Garigliano, 1993), European Portuguese (Bull, 1994), German (Schwind, 1995), and Japanese
(Nagata, 1995). This is not a complete list, but gives a good sample of how large an audience
these intelligent tutors cover. Since people benefit most from using a tutoring program during
the first few years while learning a language, most of these tutoring programs target this
group of people.
Another factor that influences the target audience is the final goal of the system.
Is the goal of the system to make a tutoring program for engineers trying to read technical
literature in a different language (Maciejewski & Leung, 1992), or for learning Japanese
interpersonal expressions (Kai & Nakamura, 1995)? These end goals directly influence not
only the target audience, but the whole makeup of the system as well.

Language Coverage
The language coverage of language tutoring systems is as varied as the systems
themselves. Depending on the goal of the system, the language coverage can range from small
(as in Labrie and Singh (1991), where a few pages can contain the entire coverage) to very large
(as in Farghaly (1989), which includes the entire Webster’s Seventh Collegiate Dictionary). Most
often the coverage is based on one aspect of the language. Examples of this are articles in
English (Sentence and Pain, 1995) and clitic pronoun placement (Bull, 1994).
The coverage of languages such as English and Chinese seems to be more
extensive than that of languages like European Portuguese. I think this is due to the number of
people who speak the respective languages and the resources that are available for making
tutoring systems in those languages.
For the languages that have been looked at more extensively, the coverage includes
most aspects of the language. English, for example, has been covered quite well
syntactically by the literature. Semantic coverage is relatively small for all languages covered
in the literature. A more detailed list of language coverage can be found in Appendix II.

Parsing/Robust Parsing
A natural language processor (NLP) is not required to create a
language tutoring system. The most common language processors are ones that compare
the users’ input to pre-stored answers. An example of a processor using pre-stored
answers is CATACROC (Civil & Estella, 1992). These systems can be successful but often
lack flexibility.
One of the main advantages of tutoring systems with NLP is their flexibility
(Farghaly, 1989). Other advantages include, but are not limited to: being able to create a more
interactive tutor, being easier to simulate real-life situations, the program becoming more like a
native speaker that can relax requirements (e.g., spelling), exercises that are more communicative
and creative, being easier to provide more and better feedback, reinforcing good solutions and not
trivial fixes, only needing to design the lexicon and grammar once (Farghaly, 1989), and supporting
open-ended writing activities (Hagen, 1994). Natural language processors do have some
disadvantages. They are usually very expensive, they occasionally do not work, and they are
usually not very successful at dealing with semantics (Holland, 1993).
Parsing is the process of separating language into parts for easier understanding,
and its use in tutoring systems can be very diverse. There are many different parsers that can
be used successfully. Both bottom-up and top-down parsers are common. The most common
type of parser is the definite clause grammar (DCG), because it is one of the easiest to
implement. Issac and Fouquere (1995) made a system called AlexiA that uses a bottom-up
Tree Adjoining Grammar (TAG) parser. The choice of parser depends on the language the
parser is written in and the purpose of the tutor. The most common computer language used
is PROLOG; however, tutors can be written in other languages such as C (the TAG parser of
Issac & Fouquere, 1995). A tutor whose main purpose is to catch all of the user’s errors
may be better off using a chart parser, whereas a tutor focusing on tutoring strategies may
not need more than a DCG.
Following is a list of parsing mechanisms commonly used in language tutoring
systems, with some advantages and disadvantages of each. The advantages of a Definite Clause
Grammar (which is usually top-down and usually includes empty categories) are that it is one of
the easiest parsers to code, features can easily be added to it, and it is easy to convert into a visual
parse tree. Its disadvantages are that it is hard to catch failed parses, it is hard to always
obtain the best parse, especially as the first parse, and it can be fairly slow. A chart parser’s
advantages are that it can catch failed parses and is useful in catching user errors. Its
disadvantages are that it can use a lot of memory and can be quite slow. A TAG parser’s
(Issac & Fouquere, 1995) advantages are that it can use both morphological and syntax rules,
it is possible to use features, and it uses an associative network. Its disadvantages are that it has a
worst-case complexity of O(n^6), it may not be easy to implement, and a large language
coverage would be difficult.
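
To make the top-down, DCG-style mechanism concrete, the following is a minimal sketch written in Python rather than PROLOG. The grammar, the lexicon, and the names `GRAMMAR`, `LEXICON`, `parse`, `expand`, and `parse_sentence` are all assumptions made for illustration; they are not the rules used by the tutoring program, and this toy grammar has no empty categories or features.

```python
# Toy grammar: each non-terminal maps to a list of alternative expansions,
# mirroring DCG rules such as  s --> np, vp.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}
# Toy lexicon: word -> lexical category.
LEXICON = {
    "the": "Det", "a": "Det",
    "student": "N", "book": "N",
    "reads": "V",
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` matches words[pos:]."""
    if symbol not in GRAMMAR:  # lexical category such as Det, N, or V
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for expansion in GRAMMAR[symbol]:  # try alternatives in order, top-down
        yield from expand(symbol, expansion, words, pos, [])


def expand(symbol, expansion, words, pos, children):
    """Match the remaining symbols of one expansion, left to right."""
    if not expansion:
        yield (symbol, *children), pos
        return
    first, rest = expansion[0], expansion[1:]
    for child, next_pos in parse(first, words, pos):
        yield from expand(symbol, rest, words, next_pos, children + [child])


def parse_sentence(sentence):
    """Return every parse tree that consumes the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


print(parse_sentence("the student reads a book"))
print(parse_sentence("student the reads"))  # no parse: an empty list
```

The second call shows the weakness noted above: a failed parse returns nothing, with no indication of where or why the input went wrong.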
Following is a list of parsing approaches commonly used in language tutoring
systems, with some advantages and disadvantages of each. A morphological parser’s
advantage is that it can parse the meaning of a sentence. Its disadvantages are that it is not
easy to cover all of the cases, and it can be hard to code. A syntactic parser’s advantages are
that it is relatively easy to make and can cover a large portion or all of a language. Its
disadvantage is that it accepts syntactically correct sentences that do not make sense, for
example "Colorless green ideas sleep furiously" (Chomsky, 1954). A link parser’s (Brehony &
Ryan, 1994) advantage is that it catches both syntactic and semantic ideas. Its disadvantages
are that it is sensitive to punctuation errors, only the first linkage is used, and sentences that
contain stylistic errors still parse. A unification-based parser’s (e.g. Lexical Functional
Grammar) advantages are that it handles features well and its unique unification method makes
it easier to catch and keep errors that the user makes. Its disadvantage is that it has the same
problems as DCGs.
Robust parsing is a parsing technique that is able to handle errors in the subjects’
input. These errors could include anything from a spelling mistake to a missing article. It is
important to be able to parse errors that a subject makes. This makes it easier to use the
previous errors that the subject has made to help tutor them to avoid similar mistakes in the
future. One of the most common ways to allow errors to parse is through the use of mal-rules
(Brown, 2002). Mal-rules are either predetermined or inserted when the parser finds a new
error (by reading user input). This concept is still not very prominent in the current literature
of the field. There are a few programs that do use mal-rules (for example,
Sentence & Pain, 1995), but there is a disparity between the importance of mal-rules and
how often they appear to be used in language tutoring systems.
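
The idea of letting erroneous input parse through predetermined mal-rules can be sketched as below. This is an illustrative assumption, not the thesis's implementation: the two-word "pronoun verb" grammar, the lexicon, the rule names, and the function `parse_with_mal_rules` are all hypothetical, and only subject-verb agreement on verb endings is modelled.

```python
# Hypothetical lexicon: word -> (category, number feature).
LEXICON = {
    "he": ("Pron", "sg"), "they": ("Pron", "pl"),
    "walks": ("V", "sg"), "walk": ("V", "pl"),
}

# Each rule is (name, test on subject/verb number, is_mal_rule).
# The first rule is the correct grammar rule; the others are mal-rules that
# deliberately accept an incorrect form so the error can be recorded.
RULES = [
    ("agreement", lambda subj, verb: subj == verb, False),
    ("mal:sg-subject+pl-verb", lambda subj, verb: (subj, verb) == ("sg", "pl"), True),
    ("mal:pl-subject+sg-verb", lambda subj, verb: (subj, verb) == ("pl", "sg"), True),
]


def parse_with_mal_rules(sentence):
    """Parse a 'Pron V' sentence; return (parsed, names of mal-rules fired)."""
    words = sentence.lower().split()
    if len(words) != 2 or any(word not in LEXICON for word in words):
        return False, []
    (subj_cat, subj_num), (verb_cat, verb_num) = (LEXICON[word] for word in words)
    if (subj_cat, verb_cat) != ("Pron", "V"):
        return False, []
    for name, matches, is_mal in RULES:
        if matches(subj_num, verb_num):
            return True, ([name] if is_mal else [])
    return False, []


print(parse_with_mal_rules("he walks"))  # parses with the correct rule
print(parse_with_mal_rules("he walk"))   # parses through a mal-rule
```

The names of the mal-rules that fire are the kind of information that could be saved in the user model and used to choose later questions or feedback.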
Even with today’s advances, creating a parser for a language that has a precision of
90% or better is still a challenge (Chen et al., 2001). The natural conclusion is that a
parser that includes such things as mal-rules does not perform significantly better than 90%, as
the mal-rules depend on the success of the underlying parser. One way to deal with this problem
is to restrict the domain of the parsing (Chen et al., 2001). This increases the precision of the
parser, as it can be tailored more easily. Therefore, it is appropriate for a tutoring system to use
restricted domains to help ensure and increase precision when parsing the subjects’ responses.
Intelligent language tutoring systems that allow free-form input are still rare
(Tokuda, 2001). Researchers are currently working on several different methods, including
using mal-rules (Brown, 2002) and types of template-based matching (Chen, 2001). One of
the benefits of using mal-rules is that the parser can deal with both grammatical and
ungrammatical input (Heift, 1998). Future research in this area may prove to be quite useful
in allowing free-form input.
How to deal with user errors is currently being studied from many different
angles. Research shows an increasing number of ways to deal with incorrect user responses.
Some examples of dealing with user errors include introducing a template structure into the
tutoring system (Chen et al., 2001), and updating mal-rules based on the user’s input (Brown,
2002).

Tutoring Strategies
Tutoring strategies are a key component of any tutoring system. Appendix II
shows a list of tutoring strategies currently in use in the field. The tutoring strategies used
depend on the goal of the ITS. For example, the goal of Issac and Fouquere's (1995) Alexia
project was to teach the user lexical information. With that goal in mind, the user reads
through a scenario, asks questions about it, and writes a summary of the scenario. This
process may be an effective method for learning lexical information, but may not be as useful
in developing correct syntax.
Most tutoring strategies tend to revolve around asking a question, getting an
answer, analyzing the answer, and giving feedback. In Kai & Nakamura’s (1995) system the
researchers use this type of tutoring strategy. They give the user a question and then provide
feedback (which is discussed in further sections).
One of the most common tutoring strategies that applies to tutoring second
languages is negative transfer (or first language interference). Negative transfer happens
when the existing knowledge about a subject’s first language interferes with learning a new
language. Negative transfer is a major cause of error when tutoring a second language (Wang
& Garigliano, 1993). Knowing this allows tutoring system developers to tailor their programs
to be receptive to errors caused by negative transfer. For example, Wang & Garigliano (1993)
developed a program that contains 100 Chinese grammar rules and corresponding English
rules, allowing the system to easily catch negative transfer and tutor the user properly in
his/her error.

Another important tutoring strategy is the ability to ignore insignificant errors and
tutor only on the important ones. It is very frustrating for a user to receive ten errors when
only one is important to the initial learning of a language. The most common insignificant
errors are spelling mistakes. A spelling mistake can generate many errors, from "incorrect
word" to "missing noun." It is important to be able to pick out the important errors and look
past the trivial ones.
The amount of time the subject is allowed to use the program does not seem to
have been addressed in the literature. Some research does point out they have no time limit
on their exercises, like in Bailin and Thomson’s (1988) VERB CON and PARSER.
Tutoring strategies include the tasks that are given to the user. These tasks are
what motivate the user to interact with the system and stimulate learning. Some of these
possible tasks are sentence construction, translation, pronominalization, transformation of
sentences (e.g., past to present form), composition of sentences, and text understanding and
conversation (Schwind, 1990).
A key component of any tutoring strategy is the feedback that is given to the user.
Typical feedback includes identifying whether the user’s answer is correct or incorrect,
providing hints if the user’s answer is incorrect, pointing out errors, making suggestions, and
explaining answers. It is important for feedback to be clear, informative and complete. Telling
a subject simply that he/she is wrong is not very helpful. Instead, telling the subject why he/she is
wrong and how to correct the response is more useful. When to give hints, answers, and more
opportunities to get the right response varies from system to system. No set method exists for
determining how many hints, answers, or extra chances work best for student learning, other
than trial and error. The important thing is to have good feedback that promotes learning.

Error detecting/handling
Error detection and handling are a large part of a language tutor; however, error
detection causes some very difficult decisions and problems. Common types of errors include
spelling errors, syntactic errors (e.g., a missing NP), semantic errors (e.g., "The apple is over
their"), contextual errors (user does not answer the question asked), and constraint violations
(conflicting features). Some of these types of errors are easier to identify and correct than
others. Spelling errors can be found by comparing the input to the lexicon. Syntactic errors
are found by identifying ill-formed sentences or errors in the unification of some of the
features. An ill-formed sentence is one that cannot be formed by using the production rules
in the grammar (Schwind, 1995). A more extensive list of what syntactic structures have or
have not been covered is difficult to formulate since most of the literature is not very specific.
The literature that is specific about its syntax rules, for example Labrie and Singh’s Miniprof
(1991), is very limited. Semantic errors are hard to find and are usually handled either in very
limited contexts or through the use of features. Contextual errors are usually only found by
comparing the user’s input against pre-stored answers.
Most of the programs in the literature (with the exception of those that stay
within a very restricted domain) do not catch all of the errors that a user makes. They are
designed to focus on one type of error (which usually corresponds to the goal of the system).
Some examples are programs that cover tenses, articles, or interpersonal expressions. Most of
these systems easily identify these common errors. A gap in the literature exists in programs
that try to catch a wide range of errors, which seems to be possible only if you are able to
predict all of the errors that the user is going to make in advance. In the area of language,
predetermining user errors is a daunting task and one that requires extensive future research.
So what is the difference between robust parsing and error detection/handling?
With robust parsing, the parser takes the user’s input and parses it even if it is incorrect. The
flaw in robust parsing is that it may not be possible to determine what the error was. This means that
robust parsing can interpret the input despite existing errors, but it may not be known what
those errors are. Error detection/handling is the process of "catching" whatever error the user
has made so that it can be used to aid the tutoring strategies (which may have nothing to do
with parsing). It is possible that the error detection/handling process could be said to include
robust parsing, especially if, for example, the error is detected by finding out what mal-rules
were used. It is important to remember this difference, as most programs have error
detection, but not robust parsing.
When presented with a user’s response that contains several errors, the program may
misidentify why the user made the mistakes he/she
did. One way to deal with this problem is through the use of confidence factors (Brown,
2002). If each possible reason the user made the error can be assigned a confidence factor
based on information gathered from the user and other internal sources, the tutoring system
can use the confidence factor to determine the likely sources of the errors and the proper
responses to those errors.
Low level syntactic errors involve a missing or extra word, most commonly an
article or a preposition. High level syntactic errors involve incorrect groups of words, usually
resulting from the use of an incorrect grammar rule (mal-rule) (Schwind, 1995). Learners
make both types of errors. An effective way to deal with high level syntactic errors is through
the use of mal-rules. These mal-rules must be anticipated in order to be able to provide
useful feedback to the user (Schwind, 1995). A concern with using anticipated mal-rules
created from the input of a user is that the first time the user applies a new, unexpected mal-rule,
very little feedback is available. Error detection and handling continues to be one of the focal
points in researching language tutoring systems.
A problem that presents itself when trying to detect a user’s errors is that most
ways of detecting errors must predict ahead of time what errors will be made. This becomes a
problem since people who have different language backgrounds make different errors and
static predictions may not suffice (Heift, 1998). Therefore, a tutoring program that may work
for people whose first language is English may not work for people who have another first
language. To fix these language interference problems, some researchers have started to look
at using user models.

User Model
A user model is stored information about the characteristics of a user of a
program. User models also include the ability to use the stored knowledge about the user to
improve the performance of a program. The simplest user model is a program that asks for
the user’s name and then uses it to make the program appear more personable. More
complicated user models can include such things as the user’s linguistic background,
linguistic skill levels, and problem solving methods (Brown, 2002; Bull, 1994). User models
are becoming a key component to any tutoring system and have many potential benefits.
Not all user models are alike. They are different enough to have different
advantages and disadvantages. Some of the benefits are listed below (though these benefits
do not necessarily apply to all user models). Benefits of a user model include the ability to
examine the user model, support for self-assessment, promotion of reflection, interactive
diagnosis, student assessment by the teacher, and teacher training (Pavia, 1995).
There are two types of user models: static and dynamic. Static user models are
preset at the beginning of running the program (like asking the user’s name). Dynamic user
models change as the user uses the program (Sentence & Pain, 1995), and tend to store
mal-rules about the user to aid the tutoring strategies.
Most user models referred to in the literature have one main purpose: to judge the
level of understanding of the user. By judging skill level, the tutor is able to give questions to
the user that are appropriate to his/her skill level. This level of understanding is stored in
several different ways. Bull (1994) used a marker on a continuum based on the acquisition
order of clitic pronouns, with the range going from novice to expert. Another example is Bull
& Pain’s (1995) user model, in which both the user and the computer have confidence scores.
The computer scores the user based on his/her performance on the last five attempts. At the
same time, the user picks his/her own confidence level. The user model compares these two
values and initiates a dialogue between the computer and the user if they are too far apart.
Then the computer uses these scores to determine the order in which the exercises are
presented. Whichever method is chosen, the result is the same. If you know the level of
understanding of the user, you are better able to tutor him/her.
Some user models have the added feature of allowing the user to directly look at
what is stored within it, and may even permit the user to change what is in the user model or
challenge what the program has put there. But do users actually challenge user models? Bull
and Pain (1995) found that users do challenge the user model. This is very important as it
shows that an interactive user model is a possibility. An interactive user model promotes
reflection in the user which is likely to promote learning.
More user models that change as the learner uses them are being developed. A
new technique of using mal-rules in a user model is to have them update dynamically as the
learner uses the program (Brown, 2002). This means the program does not have to predict
ahead of time what the user’s errors might be, but can instead form mal-rules
that are appropriate to the user. This allows for more personalized feedback, and
partially fixes the problem of different language backgrounds as discussed in Heift (1998).
Significant progress in this area has been seen in the last few years. However, the
potential of user modeling may not yet have been reached.

User Interface
When developing a tutoring system for teaching a second language it is important
to consider the interface of such a program. The main problem is that the program may not
be in a language that is completely understood by the user. Lonfils & Vanparys (2001) have
developed some good rules to follow when setting up such an interface (these are ways to
design the icons, but these rules can be expanded to include anything the user interacts with):
keeping it simple, discriminating (do not have two things that look the same), giving
preference to native objects (as they would be more familiar to the user), not being too subtle
(keep associations obvious), keeping actions unique, being consistent, being compatible with
the user’s knowledge about the real world, and assigning clear meanings. This list may seem
like common sense, but it is important to formally follow such a list or the usability of the
program is decreased (Lonfils & Vanparys, 2001).

Other Considerations
There are some other considerations that have to be taken into account when
building a second language tutoring system. Since you are writing the program for users who
know a different language, it is very possible that their computer is also different and
compatibility may be an issue (Levison & Lessard, 1992).
Negative transfer is an important issue in the area of ITS. Wang & Garigliano
(1993) built a system sensitive to this concept and found that a significant number of errors
that users make are because of negative transfer. Therefore, any system that is tutoring a
second language should consider taking into account negative transfer.
Research is currently being done on the effect of allowing users access to their
own user models. Preliminary results suggest that this may be a promising area to look at in
future years (Morales et al., 2001).
Predicting user errors using mal-rules is also currently being studied. Fogarty et
al. (2001) tried to predict reading mistakes that children make. Through the use of a database
of over 70,000 oral reading mistakes, they were able to significantly increase the ability of
the tutoring program to detect errors.

Future Outlook
There have been many significant advances in ITS dealing with tutoring second
languages in the past 15 years. However, from a scientific standpoint this is not a very long
time. There is more work that can be done in this field. It may be possible to map the learner
cognitive model (Bonvalot, 1999) and, with this, be able to learn why a user is making a
mistake so that more appropriate tutoring strategies and feedback (Ghemri, 1991) may be
used. More studies on the effects of instructional variables on second language learning are
needed (Holland, 1993). Latent Semantic Analysis (LSA) is a corpus-based statistical
mechanism used in some new tutors (Lemaire, 1999) and can improve the interaction
between students and computer tutors (Wiemer-Hastings & Wiemer-Hastings, 1999). The
lack of knowledge about linguistics, teaching, and learning in the field has held back the
potential of some successful Intelligent Computer Assisted Language Learning (ICAL) systems; for
example, the LICE system created by Bowerman (1992) relied more on introspection due to
this lack of knowledge. There need to be more implementations of ITS dealing with tutoring
second languages. A lot of the current literature deals with what "should" or "could" be good
ways of making such systems. There are actually very few complete working systems in the
field, and those that do work tend to have used either very restricted domains or static
methods (like pre-stored answers) to allow their system to be practical and useable.
As can be seen by the list above, not only is there a significant amount of possible
future research, but the future research can occur in many different areas. This means that it
is important for researchers both to focus on improving each area and to work
on better methods of integrating the different areas of ITS.

Conclusion
In conclusion there has been much progress in the area of ITS dealing with
tutoring second languages. Many approaches have been tried to create a good tutoring
program. Although some good tutoring programs have been made, there is no dominant
program or method in the field. This is mainly because the field is so new that
there are still many things that have not been tried. There is a particular lack in the area of
statistical research done on the many different methods discussed above, and there is much to
be done in exploring the many possible benefits of user models in more depth. Combining
linguistics, programming, and tutoring has been a slow process, and finding better ways of
combining them is a large part of research that should be done. In every area covered by this
review of the literature, there is room for more research. From parsing and tutoring strategies
to error handling and user models, all of the literature suggests that not only do these areas
need to be looked at more, but more implementations of these ideas need to take place. Yet,
there has been an amazing amount of progress made in the last fifteen years and it is exciting
to see what comes up in the next fifteen.

Chapter 3
The structure of the program

Although the focus of this project is the user model and the tutoring strategies,
the structure of the entire program is vitally important to its success. This is because it is not
only the knowledge stored in the user model, but also the interaction of the tutoring strategies
with the user model that makes the program run effectively. The tutoring strategies would be
completely useless without a program to use them in. For this reason, the program
incorporates many aspects of the current state of research in the area of language tutoring
systems. Appendix IV contains a summary of the different parts of the tutoring program.
The program requires full sentence answers from the user. This will strengthen the user’s
grammatical skills as well as his/her ability with verb endings.

The User Model
One of the most important parts of the program is the user model. The structure
of the user model was determined by a thorough review of the literature and several different
types of data were chosen to focus on. These different types of data allow for a wide range of
tutoring strategies. The user model contains the information that the program has gathered
about the user. It includes which questions were attempted and how they were answered by
the user, the login name and password for the user, the amount of time the user used the
program, the words that the user used that were not in the lexicon, and all of the mal-rules
that the user used.
Included in the user model is the user profile. The user profile is the pre-information
gathered about the user. In this program, the user profile includes the user’s
name, password and primary language. This information is useful because it aids the program
in "remembering" the subject over several lessons.
The rest of the user model contains the bulk of the user’s information. This
information includes which questions the user attempted, answered incorrectly/correctly, or
skipped. This information also includes what mal-rules the subject used and which words
were unknown to the tutoring program that the subject entered. All of the subject’s responses
are also saved within the user model, as well as the corresponding mal-rules that are used on
the subjects’ answers. Finally, the amount of time the subject used the program is recorded.
An example of a user model can be found in Appendix V. Each section of the user model is
now discussed in greater detail.
The user model contains which questions the user answered correctly, incorrectly
or skipped, as well as the order in which they were answered. This is recorded through a
simplistic scoring system. A skipped question is different from an unattempted question in
that the user has not seen the question for an unattempted question but will have seen the
question for a skipped question. The scoring system is presented in Table 1.
Table 1. User Response Scoring System

The user’s response                         Corresponding score
Incorrect response then question skipped    -3
Incorrect response                          -2
Skipped question                            -1
Unattempted question                         0
Partially correct response                   1
Correct response                             2

The scoring system is a way to keep track of the user’s responses so that it is easy
to use the information. This scoring system is used by the tutoring strategies to help
determine the next course of action for the program. As the responses are saved in a
particular order in the user model, it is also easy to determine the ordering of the results of
the question. For example, to see if a user answered three questions correctly in a row, all the
program has to do is look for three questions in a row that have a score above zero.
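The check described above can be sketched as follows. This is an illustrative sketch only, not the program's actual code: the score values come from Table 1, while the names SCORES and has_streak are hypothetical.

```python
# Score values follow Table 1; the dictionary keys are hypothetical labels.
SCORES = {
    "incorrect_then_skipped": -3,
    "incorrect": -2,
    "skipped": -1,
    "unattempted": 0,
    "partially_correct": 1,
    "correct": 2,
}


def has_streak(scores, length=3):
    """Return True if `scores` contains `length` consecutive scores above zero."""
    run = 0
    for score in scores:
        # Extend the current run on a positive score, otherwise reset it.
        run = run + 1 if score > 0 else 0
        if run >= length:
            return True
    return False
```

Because the test is "above zero", a partially correct response (score 1) counts toward the streak in this sketch, as it does in the rule stated above.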
The user model also contains the words that the user entered that could not be
fixed/replaced with the word-check. This allows for the elimination of questions that are
skipped/answered incorrectly due to a lack of knowledge of the words in the lexicon. The
word-check also functions as a crude spell checker for words that do exist in the lexicon. If
the subject enters an incorrectly spelled word, he/she is presented with a list of words to
choose from to replace the misspelled word.
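A minimal sketch of such a word-check is given below. How the program chooses its candidate words is not stated here, so the sketch substitutes the similarity matching from Python's standard difflib module as a stand-in; the sample lexicon and the name word_check are hypothetical.

```python
import difflib

# Hypothetical sample lexicon, for illustration only.
LEXICON = ["UNBC", "closes", "every", "Christmas", "sleep", "tonight"]


def word_check(word, lexicon=LEXICON, max_suggestions=3):
    """Return (known, suggestions) for one word of the subject's response."""
    # Compare case-insensitively, but suggest words as spelled in the lexicon.
    lowered = {entry.lower(): entry for entry in lexicon}
    if word.lower() in lowered:
        return True, []
    matches = difflib.get_close_matches(word.lower(), list(lowered), n=max_suggestions)
    return False, [lowered[match] for match in matches]
```

A word for which the suggestion list comes back empty would be a candidate for the list of unknown words kept in the user model.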
The mal-rules that the subject has used in his/her answers appear in his/her user
model. This includes both predetermined and created mal-rules. These are stored in such a
way that which rules are used, how many times they are used, and what questions they are used
for all appear in the user model. This information is used by the program in applying the
tutoring strategies (see Appendix V).
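One way to hold this information in memory is sketched below, assuming a simple mapping from each mal-rule to the (lesson, question) pairs in which it was used. The class and method names are hypothetical and are not taken from the program.

```python
from collections import defaultdict


class MalRuleLog:
    """Records which mal-rules were used, how often, and on which questions."""

    def __init__(self):
        # rule name -> list of (lesson, question) pairs, in order of use
        self._uses = defaultdict(list)

    def record(self, rule, lesson, question):
        self._uses[rule].append((lesson, question))

    def count(self, rule):
        # .get() avoids creating an empty entry for a rule never recorded.
        return len(self._uses.get(rule, []))

    def questions(self, rule):
        return list(self._uses.get(rule, []))

    def seen_before(self, rule):
        return self.count(rule) > 0
```

A lookup such as seen_before is the kind of query a tutoring strategy would need in order to tell the subject that a similar mistake was made earlier.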
The times at which the subject starts and finishes using the program are recorded in the
user model. From this, it is possible to calculate for how long the subject used the program.
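The calculation can be sketched as follows, assuming timestamps in the form shown in Figure 1 (for example, "Sat Jan 25 12:26:04 GMT-08:00 2003"). The function names are hypothetical. The sketch discards the time zone token, which is only safe under the assumption that both timestamps were recorded in the same zone, and it relies on English month abbreviations.

```python
from datetime import datetime


def parse_timestamp(text):
    """Parse a timestamp such as 'Sat Jan 25 12:26:04 GMT-08:00 2003'.

    Assumption: the zone token is ignored, so both timestamps being
    compared must share the same zone.
    """
    _weekday, month, day, clock, _zone, year = text.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def session_seconds(start, end):
    """Return the number of whole seconds between two timestamps."""
    return int((parse_timestamp(end) - parse_timestamp(start)).total_seconds())
```

For the two timestamps shown in Figure 1, this gives 3273 seconds, or 54 minutes and 33 seconds.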
The method for creating and updating the user model is quite simple. When a
subject logs in, a file is created using the login ID of the subject. This login ID was randomly
assigned to the subjects. All of the user data is stored in the program in lists. If the subject
logs out or if the program ends, the data in these lists are written to the file. In this way, the
user model is preserved after the program ends and can even be reloaded in later sessions.
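The save and reload cycle can be sketched as below. The program's own file layout is line-based (see Figure 1); for brevity this sketch substitutes JSON, and the function names and dictionary keys are hypothetical.

```python
import json
from pathlib import Path


def empty_user_model():
    """Return a fresh model; the keys are illustrative assumptions."""
    return {"scores": [], "unknown_words": [], "mal_rules": [], "responses": []}


def save_user_model(directory, login_id, model):
    """Write the in-memory lists to a file named after the login ID."""
    path = Path(directory) / f"{login_id}.json"
    path.write_text(json.dumps(model, indent=2), encoding="utf-8")
    return path


def load_user_model(directory, login_id):
    """Reload a saved model, or start a new one for a first-time login."""
    path = Path(directory) / f"{login_id}.json"
    if not path.exists():
        return empty_user_model()
    return json.loads(path.read_text(encoding="utf-8"))
```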

This is quite an extensive user model: it not only allows the application of
tutoring strategies, but also has the potential to be used for many more sessions. An example
of the layout of the user model can be found in Figure 1, and a more extensive example can
be found in Appendix V.
Figure 1. An example of the layout of the user model.
John                                  <-- subject’s name
password                              <-- subject’s password
1                                     <-- current question
1                                     <-- current lesson
1                                     <-- question number
-0                                    <-- question result
mal_vbarll
1
Sat Jan 25 12:26:04 GMT-08:00 2003    <-- start time of last lesson
Sat Jan 25 13:20:37 GMT-08:00 2003    <-- end time of last lesson
word                                  <-- a word used not in the lexicon
1                                     <-- question number
1                                     <-- lesson number
mal_vbarll                            <-- mal-rules used
UNBC every Christmas.                 <-- subject’s response to question

Tutoring strategies
On its own, the user model would not be very useful. Its value is in its usefulness
in guiding the tutoring strategies. The tutoring strategies used in the program are not meant to
be extensive or exhaustive. Instead, the purpose of the tutoring strategies is to use all of the
information contained in the user model to create an effective tutoring system. Since the
information in the user model is quite diverse, several very different tutoring strategies have
been combined within the program. One of the goals of the program is to look at the
interaction of these tutoring strategies to see what effect they have when combined.
The tutoring strategies are derived from reviewing the current literature and by
suggestions from experts in the field of tutoring ESL. In particular, tutoring strategies that
use information contained in a user model were chosen. Instead of focusing on one tutoring
strategy, several are used to determine the effect of combining the different tutoring
strategies.
As the tutoring strategies may not always have a recommendation on which
question to ask next, it is important to have a preset structure in which to ask the questions.
The questions in the program have a preset or default ordering so that there is always a
question available to ask the user. This ordering is discussed in the lesson structure section.
The preset structure is the sole tutoring strategy used for the control group. Only the preset
structure is used for the control group, to more closely simulate the kind of noninteractive lesson that
a subject would receive using a textbook. However, the program gives a simple
correct/incorrect response to the subjects’ responses. The preset structure is also useful
initially for the experimental group, as when they are using the program at the beginning, the
user model does not contain much information for the tutoring strategies to use.
The other tutoring strategies are based on the user model. These are the tutoring
strategies that use the information contained in the user model to affect the question that the
subject is presented with. The process of selecting the next question to present to the user is
partially based on recommendations provided by the tutoring strategies. It is possible that
more than one question may be valid, and the question selector chooses which one to present
based on a ranking of the suggestions. Also based on data provided by the tutoring strategies,
the question selector may move on to the next lesson. An example of the tutoring strategies can
be seen in Figure 2. See Appendix III for a complete list of strategies. This summary includes
the ideas discussed above, as well as the feedback the tutoring strategies provide.

Figure 2. An example of the tutoring strategies.
If one wrong --> Provide feedback based on detected mistake.
If two wrong in a row --> Provide feedback based on detected mistake.
If three wrong in a row --> Provide hints for the subject at the same
    time that the questions are asked.
If five wrong in a row --> Ask the subject if he/she wants to start the
    lesson over.
If one right --> Give positive feedback.
If three right in a row --> Move to next section.
If the answer is wrong and the subject made a similar mistake before -->
    Tell the subject that he/she has made a similar mistake before and
    show the previous mistake to him/her.
Missing word detected --> Tell the subject that there may be a missing
    word in their response.
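
The rules in Figure 2 can be expressed as a small function over the subject's most recent results. This is a sketch under stated assumptions, not the program's code: the function names and action strings are hypothetical, and the thresholds are treated as cumulative (three wrong in a row yields both feedback and hints), which Figure 2 does not specify.

```python
def trailing_run(results, value):
    """Length of the run of `value` at the end of `results`."""
    run = 0
    for result in reversed(results):
        if result != value:
            break
        run += 1
    return run


def strategy_actions(results, similar_mistake_before=False, missing_word=False):
    """Return the actions triggered by the latest answer.

    `results` is a list of booleans, oldest first (True = right, False = wrong).
    """
    if not results:
        return []
    actions = []
    if results[-1]:
        actions.append("give positive feedback")
        if trailing_run(results, True) >= 3:
            actions.append("move to next section")
        return actions
    wrong = trailing_run(results, False)
    actions.append("provide feedback based on detected mistake")
    if wrong >= 3:
        actions.append("provide hints with the questions")
    if wrong >= 5:
        actions.append("offer to restart the lesson")
    if similar_mistake_before:
        actions.append("show the previous similar mistake")
    if missing_word:
        actions.append("warn about a possible missing word")
    return actions
```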

Feedback
Feedback is actually part of the tutoring strategies, but as it is such a large
component, it is discussed separately from the rest of the tutoring strategies. The feedback
that the control group sees is based solely on whether the subject’s answer is correct or not. A
correct response produces the feedback "Good Job." An incorrect response produces the
feedback "Incorrect. Try again, or skip the question." This is how much feedback a textbook
could provide with a key available. However, since the program can accept several versions
of correct answers, the ability to tell the subject whether he/she is correct or not actually is
more sophisticated than a textbook.

The feedback for the main experimental group is much more extensive. It
includes information about whether the subject’s answer is correct or incorrect. If the answer
is incorrect, it tells the user whether the tense that he/she used is correct or not, provides
information about whether the user has made a similar mistake before, tells if the answer may have
a grammar problem (which includes having a word missing), tells if there is a word missing
in the verb phrase, and suggests whether the subject should use hints or start the lesson over. For an example,
see Figure 3.
Figure 3. Feedback example.
Computer: Starting new lesson
This is a lesson on Simple Present (1)

In general, the simple present expresses events or situations that exist always,
usually, habitually; they exist now, have existed in the past, and probably will
exist in the future.
Computer: 1 What does UNBC do every Christmas? (close)
User: UNBC closed
Computer: Incorrect tense, try again
User: UNBC
Computer: Incorrect, grammar may be wrong. Try again
User: UNBC closes
Computer: Correct tense
Computer: Good Job!

There are too many different possible combinations of dialogue that can occur between the
subject and the program to provide examples for all combinations in this thesis. Appendix VI
provides an extensive sample dialogue that illustrates some of the potential interactions between
the subject and the program.

Question Selection
Question selection is the process by which the program determines what question
to ask the user next. The tutoring strategies provide data to the question selector. The
question selector takes these data and uses them to select the next question to present to the
user. The question selection is deterministic and uses a set ordering of the tutoring strategies
to determine the next step.
The potential next step can include doing a question over, doing a lesson over,
starting the lesson from the beginning, skipping the lesson, or selecting a question from the
current lesson. These potential next steps are expressed in Appendix III. After choosing
which question to ask the subject, the question selector prints the question to the screen and
allows the user to respond. If no tutoring strategies are activated, then the questions are
selected in their predetermined order.
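A deterministic selector of this kind can be sketched as follows. The priority order, the action names, and the function name are all hypothetical; the actual ordering used by the program is the one given in Appendix III.

```python
# Hypothetical priority order for illustration; a lower number wins.
PRIORITY = {
    "restart_lesson": 0,
    "skip_lesson": 1,
    "repeat_question": 2,
    "ask_question": 3,
}


def select_next(suggestions, preset_order, asked):
    """Pick the next step.

    suggestions: (action, question_id or None) pairs from the tutoring strategies.
    preset_order: default ordering of question ids for the current lesson.
    asked: set of question ids already presented.
    """
    if suggestions:
        # min() returns the first minimal item, so ties keep their given order
        # and the choice stays deterministic.
        return min(suggestions, key=lambda suggestion: PRIORITY[suggestion[0]])
    # No strategy fired: fall back to the predetermined order.
    for question_id in preset_order:
        if question_id not in asked:
            return ("ask_question", question_id)
    return ("next_lesson", None)
```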

Compare Against Expected
One of the challenging tasks when putting together a tutoring program is the
ability to analyze whether an answer is correct or not. In Second Language Learning tutoring
programs there are many ways to decide whether an answer is correct or not (Farghaly, 1989;
Holland et al., 1993). The simplest way is to have a predetermined correct answer that the
user’s response is compared against. The problem with this is that in Language Learning it is
possible to answer correctly in a way that is different from the expected result. For an
example of two correct answers, refer to Figure 4.
Figure 4. Example of two correct answers.
Question: What is Nathan going to do tonight? (sleep)
Expected Answer: Nathan is going to sleep tonight.
User’s Answer: Nathan is going to sleep.
As you can see, the user’s answer is just as correct as the expected answer. If you
checked if the user’s answer exactly matched the correct answer, you would find that the
user’s answer is wrong. In Figure 5, there is another example of two correct responses, this
time due to the use of a pronoun instead of a proper noun.
Figure 5. Example of two correct answers.
Question: What is Nathan going to do tonight? (sleep)
Expected Answer: Nathan is going to sleep tonight.
User’s Answer: He is going to sleep tonight.
In this case the user’s answer is also correct. It becomes a very difficult task to
create a language tutoring program that accepts most of the possible correct answers to a
question.
One key point to remember when checking to see if the user’s response is correct,
is that the program is tutoring a subset of English and should be able to overlook unrelated
mistakes (Holland, 1993). An example of such a mistake can be seen in Figure 6.

Figure 6. Example of an insignificant mistake.
Question: What is Nathan going to do tonight? (sleep)
Expected Answer: Nathan is going to sleep tonight.
User’s Answer: She is going to sleep tonight.
In this example Nathan is actually a male, and referring to him as a "she" is a
mistake. However, if you are trying to teach the user about verbs and verb endings, it may be
appropriate to let the he/she mistake slide. This is a common method used in many Second
Language Learning classrooms. So now the problem is not only to accept the many
variations of possible correct answers, but also to accept answers that may have unimportant
errors in them as well.
To solve the problem of deciding what to accept as a correct answer, a multi-step
process is used. First, the user’s answer is run through a spell checker. The spell checker does
not only make sure that the words are spelled correctly, but also makes sure that all the words
used by the subject exist in the current lexicon. If they do not exist in the lexicon, the user is
presented with a selection of words to choose from. The second step is to run the user’s
response through a grammar checker. The grammar check results are saved and used to help
determine if the user’s response is an acceptable answer. Third, the user’s response is
checked against one possible correct answer (referred to as the expected result). This process
includes checking to see if they have similar verb phrases. It is possible to take all this
information and decide if the user’s response is acceptable. Correctness of a response based
on a combination of results are shown in Figure 7.

Figure 7. Analysis of the Correctness of an Answer
If the user’s response is the same as the expected result
& grammar is correct --> answer is correct
If the verb phrase in the user’s response is the same as in the expected result
& grammar is correct --> answer is correct
If the grammar of the response is incorrect --> answer is incorrect

Extra feedback is provided to the user about the mistakes in the answer, so that he/she is
aware of his/her mistakes, even if the answer is accepted as correct.
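The rules of Figure 7 can be sketched as a small decision function. This is a minimal illustration in Python, not the Prolog code used by the tutoring program; the `Analysis` container and its field names are hypothetical, and the final fall-through case (correct grammar but a different verb phrase) is not covered by Figure 7, so treating it as incorrect is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Hypothetical container for the results of the earlier checking steps."""
    response: str       # the user's response after the word check
    expected: str       # the one expected answer for the question
    grammar_ok: bool    # outcome of the grammar check
    response_vp: str    # verb phrase extracted from the response
    expected_vp: str    # verb phrase extracted from the expected answer


def is_acceptable(a: Analysis) -> bool:
    """Apply the three rules of Figure 7."""
    if not a.grammar_ok:
        return False    # incorrect grammar: answer is incorrect
    if a.response == a.expected:
        return True     # same as the expected result and grammar correct
    if a.response_vp == a.expected_vp:
        return True     # same verb phrase and grammar correct
    # Assumption: Figure 7 does not cover this case; it is treated as incorrect.
    return False
```

With the answers of Figure 4, where both sentences share the verb phrase "is going to sleep", the function accepts the user's shorter answer even though it does not match the expected answer exactly.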
A number of phrases that are acceptable in everyday speech are not allowed by the
tutoring program. The most important of these choices concerns the perfect tenses, where the
subject is required to use "already" in his/her responses. There are several different ways of
answering the questions so that "already" would not be needed, but in this program the form
with "already" is required.
Figure 8. Example of a question containing "already".
Question: What has she already done? (sleep)
Answer: She has already slept.
Determining whether a subject’s answer is correct without predetermining
the correct answer is very difficult. However, allowing full-sentence answers is one of the
key components of a tutoring program that is trying to teach the user full-sentence grammar
structures.

Lesson Structure
It is necessary for the tutoring program to have a pre-made lesson plan. This
includes the different lessons, the questions within each lesson, and the words that are used in
all of the questions. The structure that is used for the tutoring program consists of twelve
lessons on verb endings, one for each of the following tenses: simple present, simple past, simple
future, present progressive, past progressive, future progressive, present perfect, past perfect,
future perfect, present perfect progressive, past perfect progressive, and future perfect
progressive.
Each lesson contains ten questions. The lessons and questions all center around
the context of school. The format of the questions was obtained from Betty Schrampfer’s
book "Understanding and Using English Grammar 3rd Ed" (1999), a format that is common
in language learning textbooks today.

User Interface
The user interface is the part of the program that the user interacts with. The user
interface appears as in Figure 9. It is a very simple interface to allow the experimenter to
easily teach the subjects how to use the program. This was important as the instructions are in
English and the subjects need to understand them well enough to use all of the options
available to them.
The user interface also includes a picture that corresponds with each question,
allowing the subjects to better understand the meaning of the question and what the expected
response is. These pictures are important as they are language independent and take away
some of the ambiguity of the questions.
The user interface closely follows the guidelines that were laid out by Lonfils &
Vanparys (2001). The interface is simple to avoid confusion and misunderstanding. I believe
that using the program is so easy that even subjects with very limited English skills can
understand how to use it. The program is set up consistently and the meanings of all of the
actions and buttons are quite clear.
Figure 9. Screen shot of the program

[Screen shot of the SLT program window. Controls: Login, Start, Admin Features, Word Check,
Lesson Help, Display Question, Hint, Skip, and Enter. The dialog text area reads:
Computer: Starting new lesson
This is a lesson on - Simple Present (1)
In general, the simple present expresses events or situations that exist always, usually,
habitually; they exist now, have existed in the past, and probably will exist in the future.
Computer: 1 What does unbc do every Christmas? (close)
The entry box, labelled "Please enter your responses below in full sentences", contains:
Unbc closes every Christmas.]

Figure 10. Example picture from the program

Word Check
The program contains a simple word check whose purpose is twofold. First, it
catches spelling mistakes that the user has made and offers several words from the lexicon
whose spelling is close to that of the misspelled word. Second, it
catches words that may be spelled correctly but are not in the lexicon. As this is a
small tutoring program and the lexicon is quite small, there is a chance that the subject may
use a word that the program does not understand. The program gives the subject the option of
using a different word. If no suitable word is found, the subject has the option of skipping the
question, and the word that caused the problem is saved in the user model for later analysis.
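The behaviour of the word check can be sketched as follows. This is an illustrative Python sketch rather than the program's own code: the sample lexicon is hypothetical, and the use of `difflib.get_close_matches` from the Python standard library to find similarly spelled words is an assumption, since the matching method is not specified here.

```python
import difflib

# Hypothetical sample of lexicon entries (the real lexicon has a few hundred words).
LEXICON = ["unbc", "close", "closes", "closed", "every", "christmas", "school"]


def word_check(response, lexicon=LEXICON, max_suggestions=3):
    """Return {unknown_word: [suggested lexicon words]} for a response.

    A word is reported whether it is misspelled or simply missing from the
    lexicon; an empty suggestion list means no similar word was found.
    """
    known = set(lexicon)
    problems = {}
    # Lower-case the response and strip sentence-final punctuation only.
    for word in response.lower().strip(".?!").split():
        if word not in known:
            problems[word] = difflib.get_close_matches(
                word, lexicon, n=max_suggestions
            )
    return problems
```

For the response "Unbc closses every Christmas." the sketch reports only "closses" and offers "closes" among the suggestions. A word with an empty suggestion list corresponds to the case where the subject may skip the question and the word is saved in the user model.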

Parsing the Subject’s Response
One of the features of the tutoring program is its ability to take answers in many
forms. A key component for doing this is the lexicon/parser. The lexicon consists of a few
hundred words that were picked based on the probable answer to the questions that exist in
the tutor. The parser is a simple DCG parser that is sufficient to parse most of the expected
answers that the user may give to the presented questions. The DCG parser also includes an
extra section for dealing with mal-rules. These mal-rules are used just like the other DCG
rules except that they are labeled as mal-rules. A simple parser such as a
DCG was chosen because of time constraints and because, for the size of the program,
the DCG parser is sufficient. However, more sophisticated parsers would very likely
work as well.
One of the key components of the tutoring program is the ability to parse the
user’s responses. This is important as it allows the program to accept more than one correct
answer, even if the answer was not predicted ahead of time. The parsing of the subject’s
response is done using a multi-step process. This includes checking the words of the
response, running the response through the DCG parser, checking the response for mal-rules,
and obtaining the verb phrase and tense of the response.
To validate the words of the response, all of the words are checked to see if they
exist in the lexicon. If a word does not exist in the lexicon, then the word is either out of the
context of the program or the word is spelled wrong. The subject is presented with a list of
words that he/she can choose from to replace a word if it does not exist in the lexicon. Once
the word is replaced, the subject’s response is changed and the response is passed on to the
next process.
The response is run through the DCG parser. The parser should recognize most
grammar formats associated with the context. The DCG parser returns a response concerning
whether the parse was successful or not, the feature list, and the parse tree (if the response
parses successfully).
Figure 11. Example of a few of the DCG rules and lexicon entries.
DCG Rules
sent(sent(NP,VP)) --> np(NP), vp(VP).
np(np(N_PROP)) --> n_prop(N_PROP).
vp(vp(VERB)) --> verb(VERB).
Lexicon entries
n_prop(pn('Nathan')) --> ['Nathan'].
verb(v(swims)) --> [swims].
Result
sent(np(pn('Nathan')),vp(v(swims)))
A DCG parser parses a sentence in a top-down way. It takes a sentence and
breaks it down into two subsections, each of which is broken down further in turn. Figure 11
illustrates the process of obtaining the resulting analysis. The rules shown are a subset of the
more complex grammar that the
program uses. The rules and lexical entries can parse the sentence "Nathan swims". The
sentence is parsed into a noun phrase and a verb phrase. The noun phrase is then parsed into a
proper noun and the verb phrase is parsed into a verb. This creates a tree structure that
contains the parse of the sentence "Nathan swims".
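For readers unfamiliar with Prolog, the top-down process described above can be imitated with a few Python functions, one per rule of Figure 11. This is only an illustrative sketch of the idea, not the Prolog DCG used by the program; the function names and the tuple representation of the parse tree are hypothetical.

```python
# Toy lexicon mirroring Figure 11: word -> (category, tree label)
LEXICON = {
    "Nathan": ("n_prop", "pn"),
    "swims": ("verb", "v"),
}


def parse_word(tokens, pos, category):
    """Match one token of the given category; return (tree, next position) or None."""
    if pos < len(tokens):
        cat, label = LEXICON.get(tokens[pos], (None, None))
        if cat == category:
            return (label, tokens[pos]), pos + 1
    return None


def parse_np(tokens, pos):
    """np --> n_prop"""
    found = parse_word(tokens, pos, "n_prop")
    if found is None:
        return None
    tree, pos = found
    return ("np", tree), pos


def parse_vp(tokens, pos):
    """vp --> verb"""
    found = parse_word(tokens, pos, "verb")
    if found is None:
        return None
    tree, pos = found
    return ("vp", tree), pos


def parse_sentence(tokens):
    """sent --> np, vp. Return the parse tree, or None if the parse fails."""
    np_result = parse_np(tokens, 0)
    if np_result is None:
        return None
    np_tree, pos = np_result
    vp_result = parse_vp(tokens, pos)
    if vp_result is None:
        return None
    vp_tree, pos = vp_result
    if pos != len(tokens):
        return None  # leftover words: the sentence is not covered by the grammar
    return ("sent", np_tree, vp_tree)
```

Calling `parse_sentence(["Nathan", "swims"])` yields a nested tuple with a noun phrase containing the proper noun and a verb phrase containing the verb, while a sentence the rules do not cover, such as `["swims", "Nathan"]`, returns `None`. In the tutoring program, a failed parse of this kind is what triggers the mal-rule steps.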

If the response is not parsed successfully when run through the main grammar
rules, it is run through a second set of rules, the mal-rules. These are the predetermined mal-rules that
are added to the program ahead of time. This parsing also returns the parse tree if the parse
was successful.
If the response still has not been successfully parsed, a mal-rule is created that
represents the grammar structure of the response. This new mal-rule is added to the other
mal-rules in the program. The mal-rule step of this process is quite new to the field of
tutoring systems and it is covered in more detail in the next section.
Next, the verb phrase is extracted from the subject’s response and compared to
that of the expected answer. This later provides information such as whether words are missing from the
subject’s response and whether to accept the response even if other parts of the sentence are
incorrect. The tense of the verb phrase is also obtained and compared to the expected result.
At the end of this process, all of the information that has been produced is passed
on to the rest of the tutoring program. Some of the information is saved in the user model,
and the rest is passed on to be compared against the expected results.

Mal-rules
Mal-rules can be a very effective tool when used within a user model. This thesis
uses mal-rules to keep track of incorrect uses of grammar rules. Mal-rules can be
predetermined or created as the program runs (Brown, 2002). In the tutoring program, two
types of mal-rules exist: those that are predetermined and those that are created as the subject
uses the program (called dynamic mal-rules). There were several challenges in making a
system that uses mal-rules.

The first challenge was to choose which mal-rules to predetermine. As the
purpose of the tutoring program is to tutor about verb endings, most of the predetermined
mal-rules have to do with incorrect verb endings. Therefore, most of the predetermined
mal-rules are at the feature level of the grammar. A few featureless mal-rules are added to make
sure they also work in this context. See the implementation section of this chapter for a more
in-depth look at the predetermined mal-rules.
The next challenge is how to create and use mal-rules that are created as the
program runs. There are several serious problems that can arise when creating mal-rules. It is
possible that one mistake can create more than one mal-rule. An example of multiple
mal-rule generation can be seen in Figure 12.
Figure 12. Example of Multiple Mal-rule Generation
Question: Was the orange cup full or empty?
Answer: The orange cup was.
Mal-rule 1: Sent --> Determiner, noun, noun, verb
Mal-rule 2: Sent --> Determiner, adjective, noun, verb
This is just a small sample of potential mal-rules that could be created. Multiple mal-rule
generation is a very serious problem but does not directly affect this thesis. The domain of
the program and the size of the lexicon (>300 words) mean that in almost all cases only one
mal-rule is created. However, if this program were expanded, multiple mal-rule generation
would have to be addressed.
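The source of multiple mal-rule generation in Figure 12 is lexical ambiguity: "orange" can be a noun or an adjective. A minimal Python sketch of the effect is given below. It is an illustration only; the `CATEGORIES` table is a hypothetical lexicon fragment, and enumerating every combination with `itertools.product` is an assumption about how candidate rules could be produced, not a description of the Prolog implementation.

```python
from itertools import product

# Hypothetical lexicon fragment: each word maps to every category it can take.
CATEGORIES = {
    "the": ["determiner"],
    "orange": ["noun", "adjective"],
    "cup": ["noun"],
    "was": ["verb"],
}


def candidate_mal_rules(sentence):
    """Return every category sequence that could describe an unparsed sentence."""
    words = sentence.lower().rstrip(".?!").split()
    options = [CATEGORIES[w] for w in words]
    # One candidate rule body per combination of word categories.
    return [list(combo) for combo in product(*options)]
```

For "The orange cup was." this returns the two rule bodies of Figure 12. The number of candidates is the product of the number of categories per word, which is why a larger lexicon with more ambiguous words would make the problem worse.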

Implementation
Understanding the structure of the tutoring program is important, but it is
equally important to know how the program is implemented. This is not only essential in
testing the program but also allows other researchers to duplicate the program results.
To implement this program, the programming languages Java and Prolog are
used. Java is used to create the interface, and Prolog is used to do the parsing, tutoring
strategies, and user model. The justification for using these two languages is that Java is widely
accepted as a useful language for the creation of interfaces, Prolog is a useful programming
language when it comes to natural language processing, and a component called Jasper
allows the two languages to be easily integrated with one another. The results of the tutoring
program could have been achieved by using other programming languages.
The program is run on a SunBlade 100 using a Unix operating system. This setup
allowed the program to run quickly, and almost no loading delays were
experienced. If the program were run on a significantly slower computer, the delays that might
occur would affect the performance of the program.
The user model saves the rules/mal-rules that the user has used, as well as
information about the success of the user’s session (as discussed in the mal-rule section
found above). The login name for the subject was a randomly assigned number to protect the
confidentiality of the subject. The password used was the word "real" with the purpose of
identifying the tests that were used in this study.
The user model is stored in the program as several lists of data. Each of these lists
corresponds to a part of the user model. When the user leaves the program, these lists are
saved to a file in a pre-set order. The format of this file can be seen in Appendix V.
To implement the tutoring strategies, each had to be programmed into the system
individually. As an example, consider how the tutoring strategy that checks whether the tense
of the response is correct was added. All of the questions have a potential correct answer. The tense
of this answer can be obtained by parsing it. This tense is then compared to the tense obtained
from parsing the subject’s response. If the tenses do not match, then the program knows that
the subject has used an incorrect tense in his/her response.
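The tense strategy amounts to a single comparison once both tenses have been obtained from the parser. The Python sketch below is illustrative only: the function name and the wording of the feedback message are hypothetical, and the tenses are assumed to arrive as plain strings such as "simple past".

```python
def tense_feedback(expected_tense, response_tense):
    """Return a feedback message when the tenses differ, otherwise None."""
    if response_tense == expected_tense:
        return None
    # Hypothetical wording; the program's actual feedback text is not shown here.
    return (
        f"Your answer uses the {response_tense} tense, "
        f"but the question calls for the {expected_tense} tense."
    )
```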
To create a mal-rule, first the subject’s input has to fail to parse with both the
normal grammar rules and the predetermined grammar rules. Then the grammar types of all
of the words in the subject’s input are obtained (noun, verb, etc.). These values are then
combined to form a new grammar rule, which is assigned a unique name. This new mal-rule is
then added to the list of dynamic mal-rules contained in the user model. This is a simplistic
way of creating mal-rules at the sentence level, but since the grammar involved in the
presented questions is very similar, this approach is sufficient. An example of mal-rule
creation can be seen in Figure 13.
Figure 13. Mal-rule creation
Flawed Answer: Lennise opened door.
Syntactic Analysis: proper noun, verb, noun [determiner: definite].
Error: No determiner to go with the noun.
Mal-rule created: mal_1(sent --> proper noun, verb, noun [determiner: indefinite]).
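The order of the steps (main grammar, then predetermined mal-rules, then creation of a dynamic mal-rule) can be sketched as follows. This Python sketch is an illustration under stated assumptions, not the Prolog implementation: the two parser arguments are hypothetical callables standing in for the DCG parser, the user model is represented as a plain dictionary, the sequential naming scheme is assumed, and reusing an existing rule with the same body is an assumption made to keep names unique.

```python
def analyse(words, parses_with_grammar, parses_with_mal_rules, categories, user_model):
    """Classify a response and record a dynamic mal-rule if nothing parses it.

    parses_with_grammar and parses_with_mal_rules are hypothetical callables
    that stand in for the DCG parser; categories maps a word to its grammar type.
    """
    if parses_with_grammar(words):
        return "grammatical"
    if parses_with_mal_rules(words):
        return "predetermined mal-rule"
    # Neither rule set parses the input: build a rule body from the word types.
    body = [categories[w] for w in words]
    dynamic = user_model.setdefault("dynamic_mal_rules", {})
    for name, existing in dynamic.items():
        if existing == body:
            return name  # assumption: an identical rule is reused, not duplicated
    name = f"mal_{len(dynamic) + 1}"  # assumption: sequential unique names
    dynamic[name] = body
    return name
```

Applied to the flawed answer of Figure 13, with both parsers failing, the sketch stores the body proper noun, verb, noun under a new name in the user model and returns that name.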

The predetermined mal-rules that were used all focused on verb phrases. The
main set of predetermined mal-rules was made up of mal-rules on incorrect tense usage. An example of
one of these mal-rules can be seen in Figure 14. The predetermined rules that exist in the
program are the different combinations of incorrect tenses and mal-rules on mistakes with
transitive and intransitive verbs. All of these mal-rules were predetermined because they
were presumed to be the most likely errors that the subjects would make.
Figure 14. A mal-rule on incorrect tense usage.
v_bar(mal_vbar7(V_AUX,V_BAR),[tense:error,vform:X,trans:Z]) -->
v_aux(V_AUX,[tense:past,vform:X,trans:Z]),
v_bar(V_BAR,[tense:present,vform:Y,trans:Z]).

All of the parts of the program are added as discussed in the description of the
structure of the program. Just creating the program described above provides useful
information about creating a successful tutoring program. However, once the program is
created, the next step is to test the program on human subjects to determine the effectiveness
of the program.

Chapter 4
Field Testing
Design
When a program is developed using a new approach it is important to test its
effectiveness. There are many methods to test a program, but only a few are used to test a
given program. To test this program several methods were used that gathered both
quantitative and qualitative data. The goal of this pilot study is to promote future research
and does not focus on providing conclusive results.
The data gathered from the subjects were the
results of the pre-test and post-test, the motivation survey, the amount of time the program was
used per lesson, the user models, the experimenter field notes, and the comments made by the
subjects. A t-test was performed on each of the pre-test and post-test scores, the motivation survey
scores, and the time-per-lesson scores to look for a
significant difference between the experimental group and the control group. Using t-tests
will result in the loss of the interaction between the different scores, but will provide some
useful information for the pilot study. To keep the results as simple as possible, ANOVA tests
were not used, as a more complex analysis is not the goal of the pilot study. The goal of
the pilot study is to provide direction for future research. It is not possible to derive
conclusive results from a small pilot study. The user models, the experimenter field notes,
and the comments made by the subjects were separated into data from the control group and
data from the experimental group and analyzed for patterns within the data.

Subjects
The subjects in the experiment come from several different backgrounds in
relation to their primary language. This means that first language interference may be
different for people depending on their language background. A potential solution to this is to
use only subjects with the same primary language background. However, this would cause
the results to apply only to people with that background. Instead, a random mix of
backgrounds was used, and the assumption was made that this does not affect the results because
the subjects were randomly assigned to the control group and the experimental
group. Thus, the effects of the confounding variable, first-language interference, are
minimized.
Subjects had to be found who were currently learning English as a second
language but were still making errors in the area of verb endings. Such people could be found
among students attending CNC (the College of New Caledonia) and UNBC
(the University of Northern British Columbia). The subjects were recruited by presenting
requests to classes at CNC and UNBC that have a high percentage of students who would be
appropriate for the experiment. Word of mouth was also used to attract potential subjects.
The testing of the program was done using people above age 18 who have
English as a second language. The subjects were required to be above age 18 to avoid dealing
with the extra ethics approval and safeguards of testing minors.
For the pilot study, twenty subjects were used. The subjects were required to be
available for two hours. Two hours was picked due to time and resource constraints.

Procedure

The human aids who were used to help administer the program to the subjects were
assembled to initially test the program. This not only taught the aids how to use the program,
but also helped to identify some of the bugs that existed. The aids did not have a critical role in
the actual testing; they were just used to help the experimenter when two groups of subjects
had conflicting test times. The experimenter was present for all of the experiments.
The twenty subjects were randomly divided into two ten-subject groups: the
control group, which used the program without the benefit of the user model, and the
experimental group, which used the program with the user model enabled. To randomly assign
the subjects, an equal number of odd and even numbers were randomly generated and then
assigned to each subject. Those with odd numbers were used as the control group, and those
with even numbers as the experimental group.
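The assignment procedure can be sketched in a few lines of Python. This is an illustration, not the procedure's original tooling: the function name is hypothetical, numbering from 100 simply mirrors the subject numbers that appear in Table 2, and shuffling a list of consecutive numbers is an assumption about how an equal number of odd and even numbers could be generated.

```python
import random


def assign_groups(subjects, seed=None):
    """Give half the subjects odd numbers (control) and half even (experimental)."""
    if len(subjects) % 2:
        raise ValueError("an even number of subjects is required")
    rng = random.Random(seed)
    # Consecutive numbers starting at an even value contain equally many
    # odd and even numbers; shuffling randomises who receives which.
    numbers = list(range(100, 100 + len(subjects)))
    rng.shuffle(numbers)
    assignment = dict(zip(subjects, numbers))
    control = [s for s, n in assignment.items() if n % 2 == 1]
    experimental = [s for s, n in assignment.items() if n % 2 == 0]
    return assignment, control, experimental
```

With twenty subjects this always produces two groups of ten, regardless of the random order.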
A booklet was created that contained the informed consent form, instructions,
motivation questionnaire, ability pre-test, and ability post-test. This booklet was handed to
each subject and each section was filled out at the appropriate time. The instructions were
read first, followed by reading and signing the Informed Consent Form. The signed forms were stored
separately from the rest of each subject’s data to ensure the subject’s confidentiality. A copy
was made so that each subject could keep a copy of his/her own consent form. The pre-test
was then written before the subjects used the program and was used to determine the subjects’
English skill level before using the program. A post-test was written after using the program
to determine their English skill level after using the program.
The subjects used the program for a period of half an hour. The subjects were
allowed to use a dictionary during the use of the program. After the thirty minutes they were
given the option of continuing to use the program or stopping. This was allowed so that those
who were not benefiting from the program did not have to use it for an extended amount of
time. All subjects were required to stop after a two-hour period.
During the time the subjects used the tutoring program, field notes were taken by
the experimenter. These subjective observations were gathered and analyzed for patterns.
Finally, the subjects filled out a questionnaire to judge the effect the program had
on increasing their motivation to learn English. This was a small questionnaire (containing 10
questions) using a Likert scale. It can be found in Appendix VII. The validity of this survey
has not been established, and it can therefore only be used to obtain a subjective measure of
the subjects’ motivational levels.
After the above was completed, the booklets were collected and the subjects’ user models
were saved. The subjects were debriefed and given the opportunity to ask any questions that
they still had about the experiment.

Instrumentation
The subjects used the tutoring program as described in Chapter 3. The control
group used the program with the user model and tutoring strategies disabled. The program
was run on a SunBlade 100 in room 5-164 at UNBC.

Ethics
To test the program, I needed to obtain approval for my test from the Ethics
Board at UNBC (as it uses human subjects). This involved submitting an Informed Consent
Form package to the Ethics Board. The experiment was approved. The testing took place over
a two-week period.

Chapter 5
Results of Field Testing
Qualitative Data
In this thesis there are several different sources of qualitative data: the
observations made by the experimenter, the comments made by the subjects, and the analysis
of the data collected in the user models. These data were examined for patterns and relevant
information. Many of these observations are anecdotal in nature due to a paucity of
supporting evidence.
When examining the effectiveness of a program, qualitative data are often used to
complement quantitative data. In the experiment, the experimenter observed the subjects as they
used the tutoring program and recorded these observations as
field notes. The notes were then separated into notes from observing
the control group and notes from observing the experimental group. These notes were then
examined for patterns and compared against each other. These observations are a valuable source
of information and must be considered when looking at the effectiveness of the program.
The subjects in both the experimental group and the control group seemed to
want to continue using the program. When given the option to stop using the program after
thirty minutes, almost all of the subjects chose to continue the lesson.
The experimental group seemed to have an advantage when it came to finding the
correct answer after entering the incorrect one. They appeared to use the feedback provided
and figured out what the correct answer was. The control group seemed to take more
attempts to find the answers to the questions that they did not know. This pattern was found
when examining the field notes from the experimenter. In the field notes it was reported that
the control group would often take several attempts to correct an incorrect answer, whereas
the experimental group often got the correct answer on the second attempt. This is supported
by the data in the user models that recorded the attempts of the subjects.
The tutoring based on the mal-rules appeared to work well. There were several
events where these tutoring strategies came into effect and aided the user in finding the
correct answer. This is inferred from the field notes of the experimenter.
In the course of testing the subjects, several bugs were discovered. The
experimenter and aids tried to reduce the impact of these bugs by directing the subjects
around them when they occurred. The bugs have been taken into account when analyzing the
results. The bugs can be found in Appendix VIII.
Another main source of anecdotal data comes from comments made by the
subjects. These were recorded and then separated into the comments provided by the
experimental group and the comments provided by the control group. The two sets of
comments were then examined for patterns and compared against each other.
The comments made by the experimental group were mostly centered on
expanding the program. There were comments about improving the pictures, adding more
English words in the program, and having a wider range of questions and lessons. Several
subjects commented that the program is a useful program for tutoring ESL.
Comments from the control group were very similar to those from the experimental group.
They also wanted improvements and expansions of the program. However, the control group
also complained that the program was boring, that it lacked examples, and that they had a
hard time understanding why they were wrong. Nevertheless, the control group reported that
they liked the program.
Comparing the comments made by the control group with those made by the
experimental group shows that there were commonalities in suggestions on how to improve the
program and that the program was generally liked. However, only the control group
commented that the program was boring, lacked examples, and did not explain why the
subject was wrong. This shows that there was a difference between the control group and the
experimental group’s opinion of the program, with the control group having additional
problems with the program.
The last source of data was the recorded user models themselves. Since the
data generated by these user models are extensive (over 150 pages), they are not reproduced in
the thesis. There are several aspects of the user models that need to be looked at, including
the results of the subjects’ answers, the results of the mal-rules, and the words that the
subjects used that were not replaced with words from the lexicon. This data was separated
into two groups, data from the experimental group and data from the control group. These
two groups of data were then analyzed for patterns and were compared against each other.
Looking at the mal-rules, 44 dynamic and 27 predetermined rules were
created/used by the twenty subjects. An examination of the user models indicates that the
mal-rules were successfully implemented. Both predetermined mal-rules and
mal-rules created during program use were used. Having said this, mal-rules were not used
heavily. The most used mal-rules were the predetermined ones concerning verb
agreement. This was expected since the program was testing verb endings. The most often
generated mal-rule involved the lack of a determiner in the subject’s response (the response
did not include a needed determiner). This could be an indication of first-language
interference, as a missing determiner is a common mistake of Asian people learning English
(55% of the subjects were of Asian descent). With these findings, further
studies that look only at mal-rules may prove to be an effective tool when it comes to
creating effective tutoring systems and for researching first-language interference.
Lastly, the lack of coverage of certain English words did not seem to be a
problem. Less than one word per subject was recorded in the user model. However, there
were a few comments made indicating that the English word coverage was too restrictive, and
this could be a potential future improvement to the program.

Quantitative Data
There is a lack of quantitative data used to evaluate systems in the field of ITS.
Therefore, a pilot study was done on this program. This pilot study was part of the field
testing.
Several tests were performed to analyze the quantitative data obtained from the
experiment. One-tailed t-tests were used to compare the experimental group and the
control group. Three variables were the focus of the t-tests: the
motivation of the subjects, the difference between the pre-test and post-test scores, and
the ratio of the time of program use to how far the subjects progressed
through the program. To keep the results as simple as possible, ANOVA tests were not used,
as a more complex analysis is not the goal of the pilot study. The pilot study’s goal is to
promote future research, not to produce conclusive results. All of the tests used an alpha level
of 0.05. To adjust for the multiple t-tests, a more stringent alpha could have been used. This
is a possible change for future research.
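One common way of choosing a more stringent alpha for several tests is the Bonferroni correction, which divides the overall alpha by the number of tests. It was not applied in this study; the Python sketch below only illustrates what such an adjustment would look like, and the function name is hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests
```

For the three t-tests used here, `bonferroni_alpha(0.05, 3)` gives a per-test alpha of roughly 0.0167.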

Some personal information was gathered from each of the subjects.
This data gives a good picture of the population that the subjects came from. This
information is listed in Table 2. There were 20 subjects, 10 male and 10 female: 6 speakers
of Chinese/Mandarin, 3 Japanese speakers, 2 Spanish speakers, 3 Korean speakers, and 6
speakers of various other languages.
Table 2. Subject Background Information
Subject number   First Language   Age   Sex
100              Chinese          29    Female
101              Japanese         26    Female
102              Chinese          33    Male
103              Swahili          19    Male
104              Japanese         21    Female
105              Chinese          40    Female
106              Farsi            30    Male
107              Czech            20    Male
108              Amharic          32    Male
109              Korean           22    Female
110              Chinese          29    Male
111              Spanish          35    Female
112              Spanish          37    Male
113              Slovak           31    Male
114              Chinese          26    Female
115              Mandarin         21    Male
116              Korean           49    Female
117              Korean           30    Female
118              Japanese         27    Female
119              Polish           23    Male
After the personal information was gathered, there was a small, written pre-test
to gauge each subject’s level of English ability. This questionnaire resulted in scores out of
10. These scores allowed the experimenter to get an idea of the subjects’ initial ability, and they
were compared to the scores from a similar questionnaire taken after using the tutoring program.
The results of this questionnaire can be found in Table 3.
It was possible for the subject to be "quite skilled" at English grammar. Being
"quite skilled" in this context refers to a subject that made no mistakes on the pre-test. These
subjects were not removed from the experiment as the pre-test was found to be too easy in
relation to the program. If someone was so skilled that no mistakes were made in using the
tutoring program, the data would not be included, as it would be impossible to determine if
the program would help someone who made no mistakes. All of the subjects in the
experiment made mistakes while using the program. The program does eventually ask
challenging questions that are more difficult than any of the questions on the pre-test.
After the subjects used the tutoring program, they were given a post-test similar
to the pre-test. The pre-test and post-test were identical in syntactic structure but used
different verbs. Different verbs were used to prevent subjects from obtaining the answers to
the post-test by memorizing cases within the program that they recognized from the
pre-test. The post-test, which was also scored out of 10, gave the experimenter an idea of the
subjects’ English skill level after using the tutoring program. The results of this test can also
be found in Table 3. The subjects with an even subject number were in the experimental
group and the ones with an odd subject number were in the control group. From the pre-test
and post-test, a gain score was obtained for each subject. This
score was the number correct on the post-test minus the number correct on the pre-test. The
gain score estimates how much each subject improved their English skills when dealing with
verb endings. These estimates can also be found in Table 3.
Table 3. Pre-test and Post-test Data

Subject number   Pre-test   Post-test   Difference
100              10         10           0
101              8          10           2
102              4          9            5
103              9          8           -1
104              10         10           0
105              10         10           0
106              10         10           0
107              10         10           0
108              6          6            0
109              10         10           0
110              7          10           3
111              9          7           -2
112              8          10           2
113              3          4            1
114              4          3           -1
115              10         10           0
116              1          1            0
117              3          3            0
118              8          10           2
119              9          9            0

A t-test was performed to compare the control group and the
experimental group on the difference between their pre-test and post-test
scores. The pre-test experimental group mean was 6.8 (SD = 3.05), and the pre-test control
group mean was 8.1 (SD = 2.77). The post-test experimental group mean was 7.9 (SD =
3.38), and the post-test control group mean was 8.1 (SD = 2.64). The experimental group
mean difference was 1.1 (SD = 1.85), and the control group mean difference was 0.0
(SD = 1.05). There was no significant difference between the mean difference score for the
experimental group and the control group (t(18) = 1.63, p > 0.05). Therefore, the program
did not appear to significantly increase
the English ability of the experimental group more than the control group.
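The reported statistic can be reproduced from the differences listed in Table 3. The following is a minimal sketch in Python, assuming the test was an independent two-sample t-test with pooled variance (an assumption that is consistent with the 18 degrees of freedom reported); the function name `pooled_t_test` is illustrative and is not part of the tutoring program.

```python
from math import sqrt
from statistics import mean, variance

# Differences (post-test minus pre-test) taken from Table 3.
# Even subject numbers are the experimental group, odd numbers the control group.
experimental = [0, 5, 0, 0, 0, 3, 2, -1, 0, 2]   # subjects 100, 102, ..., 118
control = [2, -1, 0, 0, 0, -2, 1, 0, 0, 0]       # subjects 101, 103, ..., 119


def pooled_t_test(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


t_gain, df_gain = pooled_t_test(experimental, control)
print(f"t({df_gain}) = {t_gain:.2f}")
```

The resulting statistic is below the two-tailed critical value of roughly 2.10 for 18 degrees of freedom at the 0.05 level, which agrees with the non-significant result stated above.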
The time that each subject used the program was recorded. The amount of time
used was compared to the lesson that the subject reached, which allows a significance test
comparing the experimental group and the control group. Table 4 gives a summary
of the data.
Table 4. Time/Lesson Data and Ratio

Subject number   Length of time used (minutes)   Lesson Reached   Time/Lesson Ratio
100              57                              12               4.75
101              63                              12               5.25
102              46                              9                5.11
103              47                              3                15.67
104              44                              12               3.67
105              58                              3                19.33
106              63                              12               5.25
107              71                              10               7.1
108              40                              8                5
109              35                              4                8.75
110              57                              12               4.75
111              47                              3                15.67
112              45                              7                5
113              36                              1                36
114              31                              3                11.33
115              32                              5                6.4
116              36                              3                12
117              53                              1                53
118              54                              12               4.5
119              31                              6                5.17

A t-test was performed to compare the amount of time each of the two groups spent
using the program relative to how far they proceeded in the program. The experimental
group mean time/lesson ratio was 6.14 (SD = 2.95), and the control group mean was 17.2
(SD = 15.7). There was a significant difference between the mean ratio for the experimental
group and the control group (t(18) = 2.2, p < 0.05). Therefore, the experimental group spent
significantly less time per lesson (a lower time/lesson ratio) than the control group.
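Because the two standard deviations differ considerably (2.95 versus 15.7), a reader may wish to check this comparison without assuming equal variances. The sketch below applies Welch's t-test to the ratios as they are listed in Table 4. It is offered as an alternative check, not as the procedure used in the study, and the name `welch_t_test` is illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Time/lesson ratios exactly as listed in Table 4.
experimental = [4.75, 5.11, 3.67, 5.25, 5.0, 4.75, 5.0, 11.33, 12.0, 4.5]   # even subject numbers
control = [5.25, 15.67, 19.33, 7.1, 8.75, 15.67, 36.0, 6.4, 53.0, 5.17]     # odd subject numbers


def welch_t_test(a, b):
    """Return (t, Welch-Satterthwaite degrees of freedom) for unequal variances."""
    se2_a = variance(a) / len(a)   # squared standard error of each group mean
    se2_b = variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


t_ratio, df_ratio = welch_t_test(experimental, control)
print(f"t = {t_ratio:.2f}, df = {df_ratio:.1f}")
```

With equal group sizes the t statistic is the same as in the pooled-variance test (about 2.2); only the degrees of freedom change, because Welch's formula estimates them from the two sample variances.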
After using the tutoring program, the subjects completed a questionnaire to ascertain the
effect the program had on their motivation. This was done using the
questionnaire found in Appendix VII. This questionnaire has not been evaluated for validity,
and therefore the conclusions that can be drawn from it are very limited. The items of this
questionnaire were based on a Likert scale. A score out of 50 was obtained, with a higher
score indicating a higher level of motivation. These scores are summarized in Table 5 and
are later used in determining the difference in motivational levels of the two groups.
Table 5. The Level of Motivation Score

Subject   Score
100       38
101       30
102       36
103       44
104       40
105       44
106       41
107       37
108       41
109       37
110       30
111       28
112       28
113       37
114       23
115       33
116       39
117       34
118       36
119       42

A t-test was performed on the questionnaire data to compare the
motivation levels of the two groups. The experimental group mean was 35.2 (SD = 6.16),
and the control group mean was 36.6 (SD = 5.54). There was no significant difference
between the mean score for the experimental group and the control group (t(18) = 0.53,
p > 0.05). Therefore, the program did not appear to significantly motivate the experimental
group more than the control group.

Chapter 6
Discussion
The effectiveness of the combination of the different tutoring strategies and the
complex user model proved to be inconclusive. However, several noteworthy results were
obtained.
The subjects were tested for a period of up to two hours. This
may not have been long enough to produce statistically significant
results. To deal with this possibility, trends in the user models of the subjects were taken into
consideration when determining the success of the program. Even non-significant effects
may hint at the possibility of significant results when using a longer testing period.
Since the subjects are both introduced to the tutoring program and are required to
use it within a two hour period, there is some frustration caused by using a program for the
first time. This may decrease the motivation score of all participants but should not
significantly affect the difference in motivation scores between the experimental group and
the control group.
There is the possibility that the wide range in skill levels of the subjects might
affect how well the program runs with each individual. Some may require a large amount of
tutoring, while others are tutored only on the occasional question. Therefore, the effect of
the tutoring on these different subjects may vary. This is not seen as a large problem when it
comes to the statistical results because the groups are randomly distributed. However, it is
something that must be kept in mind when considering what population the results are
relevant for.
Examining the comments made by the experimenter and the subjects suggests that
the feedback provided to the experimental group was useful. The control group commented
that there was a lack of feedback, while the experimental group did not. This information is
anecdotal, and future research is necessary to confirm these results.
One of the disadvantages of testing the combined effects of the different applied
tutoring strategies is that it is not possible to determine the individual effects of each of the
tutoring strategies. This is not seen as a large problem, as each of the tutoring strategies has
been shown to be effective in other studies (Bailin & Thomson, 1988; Nagata, 1995). The only
tutoring strategy that has not been extensively studied is mal-rules. Therefore, it is important
for follow-up studies to look at the effectiveness of mal-rules without the interference of the
other tutoring strategies.
Though the individual benefits of the mal-rules cannot be confirmed with this
study, the evidence points at the potential benefits of using mal-rules in tutoring systems.
The program successfully used mal-rules by recognizing when a subject used one, creating
one if no matching mal-rule existed previously, and using the existence of both types of
mal-rules to aid in the directing of the tutoring strategies.
The dynamic mal-rules that were used were based on the syntactic category of
the words (such as noun). To create dynamic mal-rules that use grammatical phrases
(such as a determiner and a noun), a bottom-up parser could be used. This is an area of
potential future research, and more analysis of the mal-rules may provide useful future
research as well.
The bugs in the program most likely did not affect the results of the experiment.
The bugs that the subjects pointed out were minor; for example, one could not press "enter"
on the keyboard instead of clicking the "enter" button in the program. Overall the program
was generally liked by the subjects and, therefore, it can be concluded that the bugs probably
did not interfere with program use.
Comparing the results to the background literature in this field shows that
many of the ideas expressed in the literature were supported in this thesis. Ideas
such as mal-rules (Brown, 2002), which have not been heavily tested, were successfully
implemented.
The time/lesson ratio test found a significant difference between the experimental
group and the control group. Subjects in the experimental group were able to progress
through the program much faster than those in the control group. This is an important
finding in validating the success of the tutoring program.
The pre-test and post-test analysis did not find a significant difference in English
ability between the two groups. After looking at the data, a few reasons seem to have played
an important role in this result. First and foremost, the pre-test and post-test were too
easy for the average skill level of the subjects. This caused the pre-test scores to be very high
in many cases, and therefore, the difference between the two was very small. The second
reason for the problems with the test was that one session is probably not an adequate amount
of time to see a significant difference in the English skill level of the subjects. This was
unavoidable with the time and budget constraints of this experiment, but is an area where
future research could provide a more significant result.
There are several more general reasons why the t-tests did not provide conclusive
results. First, the sample size was small (only 20 subjects). This was not a major problem
because the pilot study was intended to encourage and promote statistical studies and
future research. Second, there was a mix of Asian and European language backgrounds in the
study. This allows the results to be more generalizable, but adds more confounding variables
to the experiment. Third, the fact that the subjects had a wide range in English skill level may
have affected the results. Fourth, the effect of gender was not accounted for in the
experiment. Fifth, the pre-test and post-test were too easy in relation to the program. Sixth,
the validity of the motivation survey needed to be established. Due to the existence of these
problems, the results of this pilot study are inconclusive but open up many questions for
future research.

Conclusion
The overall results of the experiment are inconclusive. So, were the goals
reached? What does the success of this research mean? What contributions does this research
make to the field of ITS?
The goals that were stated at the beginning of this thesis were implemented
successfully, but the experimental results were inconclusive. The tutoring strategies, combined
with the user model, were implemented successfully. Mal-rules were successfully implemented and
used. Finally, a statistically significant difference was found between the experimental group
and the control group in the speed of progression through the program. Why
this difference existed is uncertain, but may be established in future studies.
What do these results mean? These results show us that it may be possible to
successfully combine several tutoring strategies and use a complex user model to create an
effective tutoring system. When looking at the meaning of an experiment such as this one, it
is important to examine the meaning of not only what worked, but also what did not work.
The biggest revelation that can be obtained is that it would be very difficult to test tutoring
successfully with such a diverse population. The only way such a test could be conducted is if
the population was greatly restricted, so that the questions could be set at an appropriate
level for the subjects. The problem with this is that the results would only be applicable to the
restricted population. Since almost any tutoring program would be exposed to a diverse
population of users, the appropriateness of such a restricted test is questionable.
It is important to remember that the success of the program currently only applies
to tutoring verb endings on a population with similar attributes to the one tested. Fortunately,
there was quite a wide range of subjects that were tested. The ages of the subjects were also
quite diverse. However, to conclusively say that the ideas and techniques used in this
program would work anywhere is premature. Instead, the success of this program instills in
us the belief that more research with these ideas and techniques is warranted and could lead
to more conclusive evidence.
The apparent success of implementing the mal-rules opens new avenues of
potential research, as the mal-rules were successfully implemented in a tutoring system. It
will be exciting to see more research in this area of Natural Language Processing.
It is important to discuss what contributions this research makes to the field of
ITS. The program was successfully implemented combining several tutoring strategies and a
complex user model to make a tutoring system. Mal-rules were successfully used in this
study. This helps open a whole new field in ITS and encourages more studies using
mal-rules. This research includes a pilot study using t-tests to help determine the
effectiveness of the program. These t-tests show that it is difficult to extract meaningful
results from an experiment such as this one. Most importantly, many questions have been
raised by the results of this research, and these questions can be used as platforms from
which more studies can be derived. These contributions will aid future research in the area of ITS.
Future Work
Some of the ideas used in this paper have been tested extensively, while others
have just been touched on. Therefore, the future work that is expressed here is presented in
three categories: improvements to the program, expanding the pilot study, and potential
research resulting from questions raised from this research.
The program worked in the end, but was by no means a finished product. There
are many aspects of the program that could be improved and/or expanded on. Some of these
potential areas of improvement are discussed below.
This program focused on a small part of English grammar. Seeing the program
work with other parts of English grammar would increase the validity of the results obtained
from this program. It would also allow the results to be applicable to a larger range of
English grammar, instead of just verb endings.
It would be possible to test a large number of different types of tutoring strategies
while using this program. The large amount of data collected in the user model could be used
in many different ways, especially the mal-rules, which could be used to examine everything
from first-language interference (Wang & Garigliano, 1993) to catching common mistakes
made by all second language learners.
If the population of the subjects was known ahead of time, it would be possible
for the instructions/ethics forms to be in the native tongue of the subject. This would allow
subjects with less English skill to participate in the experiment.
It would be beneficial to see the effect of subjects using the tutoring program
over a longer period of time. The session length, however, probably should not be longer,
keeping it around the same length of time as a normal tutoring session. Instead, the effects of
multiple sessions may provide interesting results. For this to be possible, the size of the
program would have to increase, and include more questions as well as a larger context.
The number of subjects for the experiment was relatively small (twenty
subjects). Testing the program with a larger sample, such as sixty subjects, would help
validate the effectiveness of the program.
It may provide interesting results if the program were used on a population that
had the same first language. This would reduce the applicability of the program over a
general population, but may provide greater insight into the effectiveness of the program due
to the removal of several confounding variables.
Using only one gender for the subjects may eliminate the confound that results from
having both males and females in the study. Ideally two experiments would be performed,
one with each gender.
The dynamic mal-rules were based on the syntactic category of the words. It
would be possible to create dynamic mal-rules that are made up of grammatical phrases by
using a bottom-up parser. This would be an important step in the evolution of the mal-rules.
It may be possible to use the mal-rules formed by a user to detect first language
interference. Because the program did successfully record the mal-rules that the
user made, and because certain mal-rules can be linked to first language interference
(Wang & Garigliano, 1993), it should be possible to combine the two. This would allow a
tutoring program to recognize first language interference and use that
information to better tutor the subject.
One of the changes that I would like to implement if there were time to do this
experiment over is to compare the program to a version that does not tell the subject if their
answer is correct. Because the control group was told whether their answers were correct, the
version they were using may have been superior to a textbook. Therefore, checking to
see if this did have a significant effect would be productive future research.
It would be interesting to see the effect of using a program like this on children.
As children learn differently than adults, the program might work differently on them.
Overall, this thesis has provided several answers, but has created many more
questions. These questions provide a good starting point for future research in the area of
Intelligent Tutoring Systems and User Models. This field is still very new in the area of
Computer Science and more research needs to be done.

References
Aiello, Luigia and Alessandro Micarelli. (1993). Computer Assisted Language Learning: A
Grammar Detector and Corrector. Proceedings of the Seventh International PEG
conference.
Bailin, Alan & Thomson, Philip. (1988). The use of Natural Language Processing in
Computer-Assisted Language Instruction. Computers and the Humanities. V22.
Bonvalot, Catherine. (1999). Student Modeling through Dialogue in Second Language
Learning Systems. AI-ED 99. 7-8.
Bowerman, Chris. (1992). Writing and the Computer: The Nature of the Problem and an
Intelligent Tutoring Systems Solution. V18(1-3).
Brehony, Tom & Ryan, Kevin. (1994). Francophone Stylistic Grammar Checking (FSGC)
Using Linked Grammars. Computer Assisted Language Learning. V7(3).
Brown, Charles. (2002). Inferring and Maintaining the Learner Model. CALL Special
Edition AI-ED 2001, V15, No. 4, October 2002.
Bull, Susan & Pain, Helen. (1995). "Did I say what I think I said and do you agree with
me?": Inspecting and Questioning a Student Model. Artificial Intelligence in
Education.
Bull, Susan. (1994). Student Modeling for second language acquisition. Computers and
Education. V23.
Bull, Susan et al. (1993). Collaboration and Reflection in the Construction of a Student
Model for Intelligent Computer Assisted Language Learning. Proceedings of the
Seventh International PEG conference.
Chanier, Thierry et al. (1992). Modeling Lexical Phrases Acquisition in L2*. Second
Language Acquisition Research.
Chanier, Thierry et al. (1995). Alexia: a computer based environment for French foreign
language lexical learning. Second Language Acquisition Research.
Chen, Liang., Tokuda, Naoyuki., & Xiao, Dahai. (2001). POST Parser-Based
Learners Model for Template-Based ILTS for Japanese-English Composition.
AI-ED 2001. pp 24-31.
Chomsky, N. (1957). Syntactic structures. The Hague, Mouton & Co.
Civil, Anna et al. (1992). CATACROC: Computer-Assisted Learning of Catalan. The
CALICO Journal V10(2).
Farghaly, Ali. (1989). A Model for Intelligent Computer Assisted Language Instruction.
Computers and the Humanities. V23.
Fogarty, James., Dabbish, Laura., Steck, David., & Mostow, Jack. (2001). Mining a
Database of Reading Mistakes: For what should an Automated Reading Tutor
Listen. Artificial Intelligence in Education. IOS Press, pp 422-433.
Ghemri, Lila. (1991). A Framework for Diagnosis and Remedial Feedback. SICS research
report R 91:18. Swedish Institute of Computer Science.
Hagen, Kirk. (1994). Unification-Based Parsing Application for Intelligent Foreign
Language Tutoring Systems. The CALICO Journal V12(2,3).
Heift, T. (1998). "Designed Intelligence: A Language Teacher Model". Unpublished Ph.D.
Dissertation. Simon Fraser University.
Holland, Melissa., Maisano, Richard., Alderks, Cathie., & Martin, Jeffery. (1993). Parsers in
Tutors: What Are They Good For. The CALICO Journal. V11, No. 1. pp 28-46.
Issac, Fabrice & Fouquere, Christophe. (1995). A Bottom-Up Tag Parser: Application To
Foreign Language Lexical Learning. L.I.P.N. - Institut Galilee.
Kai, Kyoko & Nakamura, Jun-ichi. (1995). An Intelligent Tutoring System for Japanese
Interpersonal Expressions. 7th World Conference on Artificial Intelligence in
Education (AI-ED 95), pp. 194-201.
Kalayar, Myat., Ikematsu, Hidenori., Hirashima, Tsukasa., & Takeeuchi, Akira. (2001).
Intelligent Tutoring System for Search Algorithms. Kyushu Institute of
Technology, Department of Artificial Intelligence.
Kinshuk, Reinhard Oppermann, Rossen Rashev and Helmut Simm. (1998). Interactive
Simulation Based Tutoring System with Intelligent Assistance for Medical
Education. Proceedings of ED-MEDIA / ED-TELECOM 98 (Eds. T. Ottmann
& I. Tomek), AACE, VA, pp 715-720.
Labrie, Gilles & Singh, L.P.S. (1991). Parsing, Error Diagnostics and Instruction in a
French Tutor. Calico Journal.
Lemaire, Benoit. (1999). Tutoring Systems Based on Latent Semantic Analysis. Artificial
Intelligence in Education. IOS Press. 527-534.
Levison, Michael & Lessard, Gregory. (1992). A System for Natural Language Sentence
Generation. Computers and the Humanities V26.
Lonfils, Colin., & Vanparys, Johan. (2001). How to Design User-Friendly CALL
Interfaces. Computer Assisted Language Learning. Vol. 14, No. 5, pp 405-417.
Maciejewski, Anthony & Leung, Nelson. (1992). The Nihongo Tutorial System: An
Intelligent Tutoring System for Technical Japanese Language Instruction. Calico
Journal.
Morales, Rafael., Pain, Helen., & Conlon, Tom. (2001). Effects of Inspecting Learner Models
on Learners Abilities. Artificial Intelligence in Education. IOS Press,
pp 434-445.
Morihiro, Koichiro et al. (1992). Toward a Model of Tutor’s Decision Making.
AI-TR-92-13.
Morihiro, Koichiro et al. (1992). Towards A Model of Tutor’s Decision Making. Artificial
Intelligence Research Group.
Nagata, Noriko. (1995). An Effective Application of Natural Language Processing in Second
Language Instruction. CALICO. V13(1).
Naoyuki Tokuda., & Liang Chen. (2001). An Online Tutoring System for Language
Translation. IEEE July-September 2001. pp 46-55.
Norman, David & Spohrer, James. (1996). Learner-Centered Education. Payman. Tuesday,
April 16.
Pavia, A et al. (1995). Externalizing Learner Models. Artificial Intelligence in Education.
Schrampfer, Betty. (1999). Understanding and Using English Grammar 3rd Ed. Prentice Hall
Regents. Upper Saddle River, New Jersey.
Schwind, Camilla. (1990). An Intelligent Language Tutoring System. International Journal on
Man-Machine Studies, V33, 557-579.
Schwind, Camilla. (1995). Error Analysis and Explanation in Knowledge Based Language
Tutoring. Computer Assisted Language Learning, V8, No. 4, pp 295-324.
Sentence, Sue & Pain, Helen. (1995). A generative learner model in the domain of second
language learning. Artificial Intelligence in Education.
Wang, Yang & Garigliano, Roberto. (1993). Negative Transfer and Intelligent Tutoring.
Proceedings of the Seventh International PEG conference.
Wiemer-Hastings, Peter & Wiemer-Hastings, Katja. (1999). Improving an intelligent
tutor’s comprehension of student with Latent Semantic Analysis. IOS Press,
pp 535-542.

Appendix I
Tutoring Strategies
Tutoring Strategies
• Presentation of deep knowledge
• Explanation of a correct answer
• Presentation of incorrectness of answer
• Presentation of verification process
• Suggestion of trace of solution process
• Presentation of portion where bug exists
• Presentation of an example conflicting with the student’s
• Presentation of some examples of common factors of interest
• Presentation of some examples of different factors of interest
• Presentation of a similar process
• Explanation at deep level
• Suggestion of verification operation
• Presentation of correct answer
• Explanation of comparison results of incorrect answer
• Suggestion of bug existence
• Explanation of incorrectness of knowledge
• Presentation of attributes of examples
• Presentation of intermediate solution
• Suggestion of intermediate goals
(Morihiro et al., 1992)
MetaCognitive Strategies
• organizational planning of strategies, self monitoring, self evaluation
Cognitive Strategies
• resourcing, note taking, grouping, summarization, deduction/induction, substitution,
translation, transfer, inferencing
Social Strategies
• cooperation, question for clarification
(Bull, 1994)

Appendix II

This is an example of the language coverage in the literature. This is not a complete list of
what has been covered in the field, but gives an idea of the kinds of topics that are covered.
Japanese
    Japanese interpersonal expressions (Kai & Nakamura, 1995)
    Japanese passive sentences (Nagata, 1995)
    Japanese technical writing (Maciejewski & Leung, 1992)
Chinese
    basic Chinese (100 Chinese grammar rules) (Wang & Garigliano, 1993)
French
    noun phrase
    verb phrase
    verb endings (basic)
    person, number of verb doesn't match subject
    "e" missing between "g" and "o"
    "g" is followed by "eons"
    "ne" is missing
    "pas" is missing
    "ne" is followed by a vowel
    "n'" is not followed by a vowel
    (Labrie & Singh, 1991)
    conjunctions
    reflexive binding
    displaced, missing and superfluous constituents
    (Hagen, 1994)
    25000 pre-stored errors centered on negative transfer (Brehony & Ryan, 1994)
English
    subject-verb discrepancies
    a-an articles with plural nouns
    exchange between the articles a and an
    use of articles with positive adjectives, or of the adjectives much-many with
    singular or plural substantives
    use of indefinite pronouns in affirmative, negative, and interrogative sentences
    defective verbs + infinitive
    use of verbs want, wish, use, and like with to + infinitive verb
    (Aiello, Sanctis & Micarelli, 1993)
    English restricted into the areas of Work Employment and Unemployment (Issac
    & Fouquere, 1995?)
    articles in English (Sentence & Pain, 1995)
Catalan
    basic Catalan in restricted domains (Civil et al, 1992)
German
    syntactic errors when user answers questions about spatial location (Holland,
    1993)
    grammar using 25 syntactic and 60 semantic features (Schwind, 1995)
European Portuguese
    pronoun placement (12 rules)

Appendix III Tutoring Strategies/Feedback
These are the tutoring strategies that the program applies in the decision process
on what question to present next, as well as what feedback to provide to the subject.
Tutoring strategies based on incorrect answers include:
If one wrong --> Provide feedback based on detected mistake.
If two wrong in a row --> Provide feedback based on detected mistake.
If three wrong in a row --> Provide hints for the subject at the same time that the
question is asked.
If five wrong in a row --> Ask the subject if he/she wants to start the lesson over.
Detected mistakes include:
Incorrect parse --> Inform the user that the structure of his/her response may be incorrect.
(may include the creation of mal-rules)
Incorrect tense --> Inform the user that the tense of his/her response is incorrect.
Missing word detected --> Tell the subject that he/she may have a missing word in the
response.
Incorrect answer --> Inform the user that he/she has not answered the question provided.
(occurs when the user does not use the verb provided)
If the answer is wrong and the subject made a similar mistake before --> Tell the subject
that he/she has made a similar mistake before and show the previous mistake.
Tutoring strategies based on correct answers include:
If one right --> Give positive feedback.
If three right in a row --> Move to next section.
If the value of the responses in the lesson is six or above --> Move to the next session.
If the verb phrase is correct but the sentence is wrong --> Accept the response as correct but
inform the user that only the verb phrase is correct.
Other strategies include:
If the response is to skip the question --> Skip the question.
If the response requires "already" but does not contain it --> Tell the subject that "already" is
needed in the response.

The hints that the tutoring can provide include:
The first time the hint button is pressed for a given question --> Give the subject the tense
of the verb expected.
The second time the hint button is pressed for a given question --> Show a similar example.
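
The streak-based rules and the hint rules listed in this appendix can be sketched as simple threshold checks. The Python below is an illustration only and not the program's actual code: the function names and returned strings are hypothetical, and it assumes that four wrong answers in a row are treated like three and that any hint press after the second repeats the similar example, since the appendix does not state either case.

```python
def action_for_wrong_streak(wrong_in_a_row: int) -> str:
    """Map a run of consecutive incorrect answers to a tutoring action."""
    if wrong_in_a_row >= 5:
        return "ask whether to start the lesson over"
    if wrong_in_a_row >= 3:   # assumption: four in a row behaves like three
        return "show hints together with the question"
    if wrong_in_a_row >= 1:   # one or two in a row
        return "give feedback on the detected mistake"
    return "no corrective action"


def action_for_right_streak(right_in_a_row: int) -> str:
    """Map a run of consecutive correct answers to a tutoring action."""
    if right_in_a_row >= 3:
        return "move to the next section"
    if right_in_a_row >= 1:
        return "give positive feedback"
    return "no action"


def hint_for_press(press_count: int) -> str:
    """First press gives the expected tense; later presses show a similar example."""
    return "expected verb tense" if press_count == 1 else "similar example"


for streak in (1, 2, 3, 5):
    print(streak, "wrong in a row:", action_for_wrong_streak(streak))
```

Checking the largest threshold first keeps each rule to a single comparison and makes it easy to add further thresholds later.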

Appendix IV
The next few pages are a summary of the structure of the tutoring system.

[Figure: EASL Tutoring Program Structure. Flowchart with the components Start Program,
Profile, Question Selection, Tutoring Strategies, User Feedback, User Model, Update User
Model, User Response, Parse User Response, Compare Against Expected, and Word Check; part
of the chart is labeled "The Tutor".]

Profile
Login user
  - Load profile if one exists for the particular user
If no profile present
  - Get profile data from user
    - e.g. name of user

Question Selection
Selecting next question based on suggestions by tutoring
strategies.
The questions will be set up based on the context selected
(only 1 context available in this program).
The questions will be broken down into subcategories to
allow the tutor to focus on particular areas one at a time.
  - example: past, present, and future tense

User Model
Profile data
Mal Rules
  - the mal rules used by the user
  - the mal rules created from the user's responses
Used Rules
  - the rules that the user has used
Question Results (right vs wrong) (ordering of correct & incorrect responses)
Incorrect Words
  - contains all the words that the spell check flagged as not in the
    lexicon and that were not replaced with ones that are.
The answers provided by the user.

Tutoring Strategies
Learning Structure
  - based on the rules used, mal rules used, the profile, and the question
    results, a recommendation for the next question to ask is produced

Motivation
  - worries about the motivation of the user
  - uses the results of which questions were correct/incorrect
    to influence what questions should be asked next

Word Check
Checks the words in the user's input by checking if each word is in
the lexicon
  - if it is not in the lexicon, present the user with the list of
    words starting with the same letter from the lexicon.
  - if the correct word is still not found, save the word in the user model
    and count the word as a missing word
  - based on the context the user is working in and not
    all of the English language (the lexicon contains only a
    few hundred words)
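
The word check described above can be sketched as a lexicon lookup that offers same-first-letter candidates for each unknown word. This Python sketch is illustrative rather than the program's own code: the function name `check_words`, the punctuation stripping, the lower-casing, and the sample lexicon are all assumptions.

```python
def check_words(sentence: str, lexicon: set) -> dict:
    """Return each word not found in the lexicon, mapped to the sorted list of
    lexicon words that start with the same letter."""
    suggestions = {}
    for raw in sentence.split():
        # Assumption: punctuation is stripped and matching ignores case.
        word = raw.strip(".,!?;:\"'").lower()
        if word and word not in lexicon:
            suggestions[word] = sorted(w for w in lexicon if w.startswith(word[0]))
    return suggestions


# Hypothetical miniature lexicon for a single context.
sample_lexicon = {"lennise", "studies", "stays", "at", "unbc"}
print(check_words("Lennise studdies at UNBC.", sample_lexicon))
```

A word for which the user accepts none of the candidates would then be stored in the user model as an unknown word, as the last two points above describe.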

Parser
Lexicon
  - a lexicon containing the words in the current context.
Parser
  - DCG parser (may change to chart parser if necessary)
  - Parse rules
  - Mal rules
    - there will be pre-existing mal-rules
    - some mal-rules may need to be created for the individual
      user and saved in the user model
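
One way to picture the creation of a per-user mal-rule is to key each rule on the sequence of syntactic categories in the response and to count how often that sequence recurs. The Python sketch below is hypothetical and does not reproduce the thesis parser: it assumes the category tags (such as `n_pro` and `verb`) have already been obtained from the lexicon, and it borrows the `mal_N(.(tag,...))` notation from the user model example in Appendix V purely for display.

```python
def to_prolog_list(tags):
    """Render a tag sequence in dotted-pair list notation, e.g. .(a,.(b,[]))."""
    term = "[]"
    for tag in reversed(tags):
        term = f".({tag},{term})"
    return term


def record_mal_rule(tags, mal_rules):
    """Create a mal-rule for this tag sequence if none exists, count its use,
    and return the rule as text. `mal_rules` maps term -> [name, use count]."""
    key = to_prolog_list(tags)
    if key not in mal_rules:
        # Assumption: new rules are numbered in the order they are created.
        mal_rules[key] = [f"mal_{len(mal_rules) + 1}", 0]
    mal_rules[key][1] += 1
    return f"{mal_rules[key][0]}({key})"


user_mal_rules = {}
print(record_mal_rule(["n_pro", "verb", "verb"], user_mal_rules))
print(record_mal_rule(["n_pro", "verb", "adv", "verb"], user_mal_rules))
```

Keeping a use count alongside each rule is what allows a tutor to notice that a subject has made a similar mistake before.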


Update User Model
Update the user model in these areas
- what type of rules were used to parse the user's response
- whether the user got the question right, wrong, or skipped
- add/create mal rules for the incorrect user response
- change context (subcontext) if requested by the tutoring strategies
- add any unknown words that the user used
- add the user's responses to use for hints later

Compare Answer Against Expected Result
Compare the user's response against expected results
- can have full/partial matches
Provides info to the tutoring strategies based on the matches
Provides whether the user is considered to have gotten the question
correct/incorrect
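The description does not say how full and partial matches are computed. As one possible reading, the sketch below compares the response and the expected answer word by word; the function names and the word-overlap criterion are assumptions, not the program's actual method.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text, drop final punctuation, and split it into words."""
    return text.lower().strip(".?!").split()


def compare(response: str, expected: str) -> Tuple[str, List[str]]:
    """Return the match type ('full', 'partial', or 'none') and the missing words."""
    resp, exp = tokenize(response), tokenize(expected)
    if resp == exp:
        return "full", []
    missing = [w for w in exp if w not in resp]
    matched = [w for w in exp if w in resp]
    return ("partial" if matched else "none"), missing


result = compare("Lennise will drink cup of coffee.",
                 "Lennise will drink a cup of coffee.")
```

A partial match with a non-empty list of missing words corresponds to feedback such as "You are missing a word (or two) in your response."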


User Feedback
Provide the user feedback on his/her response based on
- if he/she got the question right/wrong.
- modify response to be more personable based on user profile
If tutoring strategies requests
- provide feedback on correct/incorrect structure used
- provide a hint for the question and a request to try it
  again.
- provide correct answer
- ask if the user wants to start the current lesson over

User Response
Get the user's response to the question.


Appendix V
This is an example of a user model created by the tutoring program (edited for size).
John
password

<-- subject's name
<-- subject's password

4
1
0

<-- current question
<-- current lesson

0
1
2
1
3
1
4

<-- question number
<-- question result

8
9

10
mal_vbar10

1
mal_1(.(n_pro,.(verb,.(verb,[]))))
1
mal_2(.(n_pro,.(verb,.(adv,.(verb,[])))))

1

<-- mal-rule used
<-- number of times mal-rule used

mal_3(.(n_pro,.(adv,.(verb,.(verb,[])))))
1
mal_vbar11

1
Sat Jan 25 12:26:04 GMT-08:00 2003
Sat Jan 25 13:20:37 GMT-08:00 2003

<-- start time of last lesson
<-- end time of last lesson

something

<-- a word used that is not in the lexicon


1
1
mal_vbar11
UNBC every Christmas.
2
1
correct_rule
Lennise studies at UNBC.
3

1
correct_rule
Lennise eats every lunch hour.

<-- question number
<-- lesson number
<-- mal-rules used
<-- subject's response to question


Appendix VI
This is an example of some of the feedback that the tutoring program provides for the user.
Computer: 8 What does Lennise do every morning? (drink)
User: Lennise drinks.
Computer: Correct tense.
Computer: You are missing a word (or two) in your response.
Computer: Incorrect answer, try again.
User: Lennise drank a cup of coffee.
Computer: Incorrect tense, try again.
Computer: Giving hint.
Computer: The tense of the current lesson is ----- Simple Present(1)
Computer: Giving second hint.
Computer: An example question.
Computer: Question: What does it do in Alaska? (snow)
Computer: Answer: It snows.
Computer: Skipping question.
Computer: A possible answer was --> Lennise drinks a cup of coffee.
...(skipping ahead for sake of demonstration)
Computer: 6 what will Lennise do tomorrow? (drink)
User: Lennise will drink cup of coffee.
Computer: Correct tense.
Computer: You are missing a word (or two) in your response.
Computer: Incorrect answer, try again.
Computer: Skipping question.
Computer: A possible answer was --> Lennise will drink a cup of coffee.
...(skipping ahead for sake of demonstration)
Computer: Starting new lesson.
This is a lesson on
Present Progressive(4)
It is now 11:00. Tom went to sleep at 10:00 tonight, and he is still asleep. His sleep began in
the past, is in progress at the present time, and probably will continue.
Computer: NEW QUESTION
Computer: 1 What is Lennise doing right now? (study)
Computer: lesson 4 -6
Computer: Form: be + ing (present participle)

Computer: Meaning: the progressive tenses give the idea that an action is in progress
Computer: during a particular time. The tenses say that an action begins before, is in
Computer: progress during, and continues after another time or action.

Appendix VII
Questionnaire

                                                Strongly                                    Strongly
                                                Agree       Agree    Neutral    Disagree    Disagree

1) I enjoyed using the program.                    5          4         3          2           1

2) I found the program boring.                     5          4         3          2           1

3) I feel that using a program like this would     5          4         3          2           1
   be more useful than a textbook.

4) I found the program easy to use.                5          4         3          2           1

5) I think the questions were too hard for me.     5          4         3          2           1

6) Using a textbook would help me learn            5          4         3          2           1
   English better than the program

7) I would use a program like this to help me      5          4         3          2           1
   learn English.

8) I felt the questions were too easy for me.      5          4         3          2           1

9) I felt frustrated by the questions in the       5          4         3          2           1
   program.

10) I feel that the program motivated me to        5          4         3          2           1
    learn.


Feel free to add any comments you have below:
The following does not appear on the subjects' copy:
Reversed questions are 2, 5, 6, 8, and 9, meaning their Likert values are reversed when used to
obtain the final motivation score:
1->5, 2->4, 3->3, 4->2, 5->1
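The reversal above is equivalent to replacing a value v with 6 - v. A minimal Python sketch follows; the function names are hypothetical, and combining the adjusted values by summing them is an assumption, since the appendix does not state how the final motivation score is aggregated.

```python
from typing import Dict

# Question numbers whose Likert values are reversed (from the list above).
REVERSED = {2, 5, 6, 8, 9}


def adjust(question: int, value: int) -> int:
    """Return the Likert value, reversed (1<->5, 2<->4, 3->3) where required."""
    if not 1 <= value <= 5:
        raise ValueError("Likert values must be between 1 and 5")
    return 6 - value if question in REVERSED else value


def motivation_score(answers: Dict[int, int]) -> int:
    """Sum the adjusted values (summing is an assumption, not from the thesis)."""
    return sum(adjust(q, v) for q, v in answers.items())


# Hypothetical responses: question number -> circled value.
example = {1: 5, 2: 1, 9: 3}
total = motivation_score(example)
```

With this adjustment, a higher value always indicates a more favourable response, regardless of how the question was worded.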

Appendix VIII
These are the bugs that were found in the tutoring program after subject testing had begun.
• Question 2.8 was removed.
• Question 3.3 was removed.
• Question 9.2 was removed.
• An intransitive verb followed by a time sometimes caused an error in the feedback.
• If the tutoring strategies recommend moving to the next lesson on lesson 12 (the last
  lesson), the program does not end; instead it just fails to ask a new question.
• Picture 36's label should read "the plant", not "plant".
• The keyboard button "enter" could not be used by the subject after they entered a response
  to tell the computer to check their answer; the "enter" button on the program had to be
  used instead.


Appendix IX (definitions & abbreviations)
CALL = Computer Assisted Language Learning
CNC = College of New Caledonia (located in Prince George, British Columbia)
DCG = definite clause grammar
ESL = English as a Second Language
ICALL = Intelligent Computer Assisted Language Learning
ITS = Intelligent Tutoring System
NP = noun phrase
TAG = tree adjoining grammar
UNBC = University of Northern British Columbia (located in Prince George, British
Columbia)
Proficiency post-test: This is a short test used for determining the approximate English skill
level when dealing with verb endings. This test is taken after using the tutoring
program.
Proficiency pre-test: This is a short test used for determining the approximate English skill
level when dealing with verb endings. This test is taken before using the tutoring
program.
Bug (program bug): An error in a computer program that affects the running of the
program. It usually results from an unanticipated mistake in the program's code.
Intelligent tutoring system: An intelligent tutoring system is a program that tutors based
partly upon an analysis of the data known about the user, as well as using tutoring
strategies to determine how the user should progress.
Mal-rule: An incorrect grammar rule that the user uses (such as using a noun phrase without
a verb phrase, e.g. "the dog").
Natural Language Processing: Computer programs that model the human ability to analyze
a sentence and determine grammaticality on the basis of linguistic rules
(Farghaly, 1989).
Negative Transfer (First Language Interference): Negative transfer is when knowledge
about a subject's first language negatively influences his/her answers when dealing
with a second language.
Parser: A tool whose purposes include: testing the adequacy of grammars, translating source
text in a machine translation system, and analyzing input strings (Farghaly,
1989).