A DIVERSE USER MODEL IN THE CONTEXT OF AN INTELLIGENT TUTORING SYSTEM

by

Nathan Keim

B.Sc., University of Northern British Columbia, 2000

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in MATHEMATICAL, COMPUTER, AND PHYSICAL SCIENCES (COMPUTER SCIENCE)

© Nathan Keim, 2003

THE UNIVERSITY OF NORTHERN BRITISH COLUMBIA

March 2003

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author.

National Library of Canada, Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats. The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author’s permission.

0-612-80654-5

APPROVAL

Name: Nathan Keim
Degree: Master of Science
Thesis Title: A DIVERSE USER MODEL IN THE CONTEXT OF AN INTELLIGENT TUTORING SYSTEM
Examining Committee:
Chair: Dr. Robert W. Tait, Dean of Graduate Studies, UNBC
Supervisor: Dr. Charles Brown, Associate Professor, Mathematics & Computer Science Program, UNBC
Committee Member: Dr. Han Li, Associate Professor, Psychology Program, UNBC
Committee Member: Dr. Liang Chen, Associate Professor, Mathematics & Computer Science Program, UNBC
External Examiner: Dr. Judith Lapadat, Professor, Education Program
Date Approved:

Abstract

The purpose of this thesis is to implement a variety of tutoring strategies based on a complex user model and to test the resulting program in a pilot study. The tutoring strategies and user model draw on several ideas in the current literature, including the use of mal-rules. To test them, a tutoring program was created that uses the tutoring strategies and user model to tutor subjects who are currently learning English as a second language. The program is tested in a pilot study in which the control group uses the program with the tutoring strategies and user model disabled. Both quantitative and qualitative data are used to determine the effectiveness of the program. Overall, the results are inconclusive but raise many questions for future research. Nevertheless, this study shows that it is possible to implement several different tutoring strategies and a complex user model to create a tutoring system, and it provides a starting point for similar research in the area of Intelligent Tutoring Systems.
TABLE OF CONTENTS
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter One Overview
  Introduction
  Goals/Achievements
  Assumptions/Challenges
Chapter Two Background and Previous Work
Chapter Three Program Structure
  The User Model
  Tutoring Strategies
  Feedback
  Question Selection
  Comparing Against Expected
  Lesson Structure
  User Interface
  Word Check
  Parsing
  Mal-rules
  Implementation
Chapter Four Field Testing
  Design
  Subjects
  Procedure
  Instrumentation
  Ethics
Chapter Five Results of Field Testing
  Qualitative Data
  Quantitative Data
Chapter Six Conclusions and Future Outlook
  Discussion
  Conclusion
  Future Outlook
References
Appendix I, Appendix II, Appendix III, Appendix IV, Appendix V, Appendix VI, Appendix VII, Appendix VIII, Appendix IX

LIST OF TABLES
Table 1. User Response Scoring System
Table 2. Subject Background Information
Table 3. Pre-test and Post-test Data
Table 4. Time/Lesson Data and Ratio
Table 5. Motivation Score

LIST OF FIGURES
Figure 1. An example of the layout of the user model.
Figure 2. An example of the tutoring strategies.
Figure 3. Feedback example.
Figure 4. Example of two correct answers.
Figure 5. Example of two correct answers.
Figure 6. Example of an insignificant mistake.
Figure 7. Analysis of the Correctness of an Answer
Figure 8. Example of a question containing "already".
Figure 9. Screen shot of the program
Figure 10. Example picture from the program
Figure 11. Example of DCG rules and lexicon entries.
Figure 12. Example of Multiple Mal-rule Generation
Figure 13. Mal-rule creation
Figure 14. A mal-rule on incorrect tense usage.

Acknowledgments

I would like to acknowledge:
My supervisor: Charles Brown
My committee members: Han Li, Liang Chen
External Reviewer
All of the people who helped me along the way: Corrine Omand, Keely Hunter, Lennise Mann, Marta Tejero, Patryk Simon, Rozalynd Curry, Yukari Yamamoto
A big thank you to all of the people who agreed to be subjects. Finally, I would like to acknowledge UNBC for giving me this opportunity.

Chapter 1 Overview

Introduction

In the areas of Natural Language Processing (NLP) and English-as-a-Second-Language (ESL) tutoring systems, a significant amount of work has been done in the past 20 years. Many researchers have focused on improving parsing in both the syntactic (Labrie & Singh, 1991) and semantic (Lemaire, 1999) areas of language. Others have looked at different tutoring strategies and user models (Bull & Pain, 1995). Although the work in each of these areas is far from complete, it is time to start looking at how different user models and tutoring strategies interact when they are combined.

One of the key components of many tutoring systems is the user model (or learner model). The user model is the stored information that the program has learned about the user. This could be as simple as the user’s name or as complex as an analysis of the user’s abilities. Several tutoring programs have incorporated user models to increase their tutoring ability (Bull & Pain, 1995), and there are many different methods for using user models to drive the tutoring process (Bull & Pain, 1995; Maciejewski & Leung, 1992). However, there is a lack of testing on the effects of combining these methods. I created an Intelligent Tutoring System (ITS) for teaching ESL that adapts to the user’s responses to increase its ability to tutor effectively.
In particular, I focused on the user model and the tutoring strategies that allow the program to tailor the tutoring to the individual user. The purpose of the program is not to create a fully functioning English tutoring program, but to create a working prototype that can show the potential of the user model. The goal is to show the potential of using several tutoring strategies (described in detail in Chapter 3) based on an extensive user model, and how their combined effect can significantly increase the effectiveness of the tutoring program. This increase is measured by comparing the program to a textbook-like setup (that is, computerized sequenced exercises that lack the "intelligent" component of the program). It is not the goal of this program to replace a person teaching English to second language students.

I aim to show that a detailed user model, combined with several tutoring strategies, allows for a significant increase in the efficiency with which the user learns from the tutoring system. Efficiency, in this case, includes how well and how fast users learn the material compared with how well and how fast they learn it using a textbook-like program (here, my tutoring program with the user model and tutoring strategies disabled). Such a program may also increase the motivation of the user, which in turn increases the effectiveness of the tutoring system (Kalayar et al., 2001; Kinsbuk et al., 1998). It has already been shown that these strategies separately increase the success of a tutoring program (Bailin & Philip, 1988; Bull, 1994; Moribiro, 1992), but little work has been done on the effect of combining some of these strategies, on the effect these strategies have on motivation, or on testing whether the results are statistically significant.

It is important to note that the tutoring program created for this thesis is not a complete tutoring program.
It does not teach people the meaning of words, and it does not describe lessons in detail. This tutoring program focuses on the user-system interaction that occurs during the independent practice stage of learning, following a lesson taught by a second language teacher. The program is designed to be interactive with the learner (the experimental subject) and to provide dynamic feedback as a teacher would. Dynamic feedback provides a more useful lesson than simply completing exercises in a textbook. Therefore, this program would be most appropriately used as a tutoring aid, and it must be analyzed in this context.

The tutoring program tutors a small subsection of English for people learning ESL. The area of focus is grammatical errors involving verb endings. To restrict the size of the grammar the program has to include, the context of university is used (the words commonly encountered outside of the classroom at university). This context is appropriate because a large portion of the potential test population, people learning English as a second language, is familiar with it. Using a subset of English and the context of university, the tutoring program aids the ESL subjects as they learn English.

In this thesis, several definitions are important to understanding what is being discussed. Some terms are described when first used, but the following key definitions are used throughout this paper:

A user model is a representation of the knowledge the program contains about the user. This can include everything from the user’s name to the answers the user has entered in response to the tutor’s questions.

Tutoring, as I use it here, includes the selection of the questions to present to the user, the analysis of the user’s response, and the feedback the user receives about their answer.
Tutoring strategies are the various forms of analysis of the user model, together with the resulting recommendation to the tutoring program on what to do next.

An intelligent tutoring system (ITS) is a program that tutors based partly on an analysis of the data known about the user, and partly on tutoring strategies that determine how the user should progress.

An English-as-a-Second-Language subject (ESL subject) is someone whose first language, the one they learned first as a child, is not English and who is now studying English. Because English may be learned after several other languages, the term "second language" is somewhat misleading.

A mal-rule is an incorrect grammar rule.

Language coverage is the subset of all of the words and grammar rules of a given language that a system handles.

These definitions can be found in Appendix IX, which also contains definitions of other common terms in the ITS/ESL field.

Goals/Achievements

The main goal of this thesis is to look at the effect of using a complex user model and several tutoring strategies to increase the effectiveness of an ITS. The user model and tutoring strategies were developed from a combination of work already done in this field and some new findings in the ITS field. The user model and tutoring strategies are applied within a tutoring program to test their effectiveness. The hypothesis is that the tutoring program significantly increases the subjects’ ability to learn the material compared with the same program tutoring without the user model and the tutoring strategies.

To accomplish this, the research has several goals. It is important to look at each of these goals and explore how successfully each one is met. The success of all of the goals leads to the success of the overarching goal: to create a better tutoring system.
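To make the definitions of a user model and of tutoring strategies given earlier in this chapter more concrete, the following is a minimal Python sketch of a user model and of one tutoring strategy that reads it. The field names, the mal-rule label `missing_3sg_s`, and the two-error threshold are hypothetical choices for illustration; they are not taken from the program described in Chapter 3.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Hypothetical user model: what the program has learned about one user."""
    # Gathered before the session (the "user profile").
    name: str
    first_language: str
    # Gathered during the session.
    answers: list = field(default_factory=list)  # (question_id, answer, correct)
    mal_rules_used: Counter = field(default_factory=Counter)

    def record(self, question_id, answer, correct, mal_rule=None):
        """Store an answer and count the mal-rule (if any) that matched it."""
        self.answers.append((question_id, answer, correct))
        if mal_rule is not None:
            self.mal_rules_used[mal_rule] += 1


def recommend_next(model, threshold=2):
    """Toy tutoring strategy: review an error once it has recurred `threshold` times."""
    if model.mal_rules_used:
        rule, count = model.mal_rules_used.most_common(1)[0]
        if count >= threshold:
            return f"review:{rule}"
    return "advance"


model = UserModel("Subject 1", "Japanese")
model.record("q1", "She walk to class.", False, mal_rule="missing_3sg_s")
print(recommend_next(model))  # advance
model.record("q2", "He study at the library.", False, mal_rule="missing_3sg_s")
print(recommend_next(model))  # review:missing_3sg_s
```

Keeping the profile and the session data in one object lets a strategy combine both; a fuller version might, for instance, relate recurring errors to the first language recorded in the profile.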
One of the main goals of the tutoring program is to use a user model to aid the tutoring strategies of the program. The ITS gathers information about the user before the tutoring session (called the user profile) and during the tutoring session (called the user model). This user model is quite detailed and records a great deal of information about the user. Several different types of information are recorded to observe their effects on the tutoring strategies; this information is expected to increase the effectiveness of the tutoring system.

Another goal is to create tutoring strategies that take advantage of a complex user model. Research is needed that incorporates several different tutoring strategies and examines the effects of combining them. Most of these strategies have been evaluated separately before this program (as discussed in Chapter 2), but they are not often tested working together. The goal is to show that combining these tutoring strategies is possible and productive.

The program also incorporates mal-rules into the user model and the tutoring strategies. Some of the mal-rules are predetermined and some are formed while the subjects use the program. During the tutoring session, the mal-rules used by the subject are saved in the user model. As mal-rules have been used in only a few studies (Brown, 2002; Sentence & Pain, 1995), their effects on the success of a program are relatively unknown. Even though their individual effect cannot be determined through this research, it is interesting to see what effect combining the mal-rules with the other parts of the user model and tutoring strategies has.

The user model combined with the tutoring strategies should also increase the subjects’ motivation while using the program, compared with when the user model and the tutoring strategies are inactive.
Increasing the subjects’ motivation is useful because it is likely to contribute to their persistence while using the program, as well as to enhance learning outcomes. The key to motivating learners is interactive lessons that engage the user and make them want to complete the task.

The program focuses on tutoring a small subset of English. To ensure that the subjects are familiar with most of the words used during the tutoring sessions, the program concentrates on the context of university (words encountered at university outside a classroom setting). This makes the program easier to build and gives the subjects a greater chance of understanding the questions.

The program is built on the design described in Chapter 3. The main goal of this structure is to provide a framework for the tutoring strategies and user model, so that the program can be tested without the framework itself affecting the final results. This framework also shows whether the user model and the tutoring strategies can be applied to a tutoring program, which is important to establish because they combine established and new concepts.

As discussed in Chapter 2, one of the main areas lacking research in ITS is statistical testing of the effectiveness of tutoring programs. Therefore, one of the main goals of this thesis is to examine whether there is a statistically significant difference between using the program with the user model and tutoring strategies in combination and using it without them. This is tested with a quantitative analysis; a qualitative component of this research also examines the effectiveness of the program. The success of the program is determined by how much users improve their English ability while using the program and by how fast they are able to learn the material.
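The statistical comparison described above can be sketched as Student’s independent two-sample t-test. The Python below is a minimal sketch under stated assumptions: it pools the two groups’ variances (that is, it assumes equal variances) and is applied to post-test minus pre-test gains. The thesis does not state at this point which t-test variant or score transformation is used, and the numbers are hypothetical, not data from this study.

```python
import math
from statistics import mean, variance


def pooled_t_test(experimental, control):
    """Student's two-sample t statistic with pooled (equal) variances.

    Returns (t, degrees_of_freedom). A positive t means the experimental
    group's mean is higher than the control group's.
    """
    n1, n2 = len(experimental), len(control)
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(experimental)
                  + (n2 - 1) * variance(control)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(experimental) - mean(control)) / standard_error
    return t, df


# Hypothetical post-test minus pre-test gains, one value per subject.
experimental_gains = [4, 6, 5, 7]
control_gains = [3, 4, 2, 3]
t, df = pooled_t_test(experimental_gains, control_gains)
print(f"t = {t:.2f}, df = {df}")  # t = 3.27, df = 6
```

The p-value can then be read from a t-distribution table with `df` degrees of freedom, or computed with a library such as SciPy, whose `scipy.stats.ttest_ind(a, b, equal_var=True)` performs the same pooled test; if the groups’ variances differ noticeably, Welch’s version (`equal_var=False`) is the usual alternative.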
To test the tutoring program, the subjects’ gain in grammatical performance needs to be measured. This is done by comparing scores on a pre-test (before program use) and a post-test (after program use). The subjects’ performance is also gauged by recording the time taken to complete each section of the program. This is important because the benefit of the program may not be that more knowledge is retained, but that the subject reaches the same level of knowledge faster. T-tests will be used to compare the experimental group against the control group, examining differences in pre-test and post-test scores, motivation scores, and how far the subjects progress in the program within a given amount of time.

Assumptions/Challenges

Some assumptions were made in order to constrain the size of the tutoring program. These assumptions, and a discussion of their possible impact, follow.

The first major assumption is that the subjects are at a competence level at which they can understand general instructions in English, but lack proficiency in the area being tutored. However, this lack of proficiency may result in some instructions not being understood. This was dealt with as well as possible, both while creating the program and during testing: if necessary, the instructions were explained in several different ways so that the subject could understand them. Conversely, the subjects must not be too skilled at English for the program to be tested effectively. Highly skilled subjects are not seen as a problem, however, because the program should recognize their skill and tutor them appropriately (in this case, moving them quickly through the program). This may make the time spent using the program a very important factor when evaluating its effectiveness.
The next assumption is that the program is able to ignore or work around small errors in the subjects’ responses that are not related to the area being tutored. This is important because it is a key strategy when tutoring second languages (Holland et al., 1993). Initially this does not seem like a large problem, until one considers the number of possible correct and incorrect answers in natural language that the user may respond with. Because such a set is very large (an unlimited number of errors can be created within a language), it would be nearly impossible for a program of this size to handle all of the potential "ignorable" mistakes. However, the program can deal with most of these "ignorable" mistakes, and the assumption is made that the ones it misses do not affect the measured success of the program.

This thesis also assumes that the user model increases, and does not decrease, the tutoring ability of the program. This assumption matters when performing the statistics on the data to see whether the program performed significantly better with the user model in place: any increase in tutoring ability is assumed to be directly related to whether or not the user model is present. This is a standard assumption in the statistical analysis of studies like this one.

There is a chance that the tutoring program uses words the subjects do not know. This is addressed in two ways. First, pictures related to the questions allow the subjects to make an educated guess about what a word means. Second, the subjects are allowed to use aids and dictionaries to find the meanings of words. This is allowed because a full tutoring program would be able to teach words to the subjects; since this tutoring system lacks that functionality, another method for obtaining word meanings is used.
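The earlier assumption that the program can work around small errors unrelated to the area being tutored can be illustrated for one common case, spelling slips. One way to do this is to accept a misspelled word when it is within a small edit distance of a lexicon entry. The Python below is a minimal sketch of that idea, assuming Levenshtein distance, a threshold of one edit, and a tiny hypothetical lexicon; it is not the word-checking method of the program described in Chapter 3. Words already in the lexicon, including a wrong verb form such as "walk" for "walks", pass through unchanged, so the grammar can still tutor on the verb-ending error.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions), one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical lexicon for the university context.
LEXICON = {"the", "student", "walk", "walks", "to", "library", "study", "studies"}


def correct_spelling(word, max_distance=1):
    """Map a near-miss spelling to a lexicon word; otherwise return the word unchanged."""
    if word in LEXICON:
        return word  # known words, even a wrong verb form, are left for the grammar
    closest = min(LEXICON, key=lambda entry: (edit_distance(word, entry), entry))
    return closest if edit_distance(word, closest) <= max_distance else word


print([correct_spelling(w) for w in "the student walk to the libary".split()])
# ['the', 'student', 'walk', 'to', 'the', 'library']
```

The threshold is a trade-off: a larger one forgives more typing slips but risks silently turning a genuine grammatical error into a different, correct word.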
As can be seen from the above, this was quite a complicated project to put together. However, even with all of the assumptions and challenges above, the results of this experiment show some of the potential that exists in the user models and tutoring strategies available today, and the research raises many new questions for future study.

Chapter 2 Background and Previous Work

ITS for Tutoring Second Languages

Introduction

To create an effective tutoring program, one must first examine the field of Intelligent Tutoring Systems (ITS) for tutoring second languages. This field is quite new and has only been researched for the past 15 years. It combines linguistics, programming, and tutoring to create tutors that help facilitate learning. There is agreement in the literature that ITS for second language learning yields better learning outcomes than the average textbook, but it is unclear what methods should be used to build such a system. This combination draws researchers to the problem of making an effective ITS for second language learning.

This review summarizes the research and development literature on the target audience of ITS for second language learning, the language coverage of existing tutoring systems, parsing and robust parsing, tutoring strategies, error handling, user models, and future outlooks of the field. Some of these areas contain extensive research, whereas others are still quite new. Some methods concerning the background of ITS are not covered, but most of the mainstream methods are covered below.

The goal of an ITS for second language learning is to teach the user a second language. This is a very broad goal, and most systems today restrict the scope of tutoring, for example to job-related conversations or to asking for directions.
The goal of the system must be kept in mind when looking at all other aspects of the system. Some general goals of language tutoring systems are understanding text, mastering the grammar rules of the language, producing texts, and conversing in the language (Schwind, 1990).

Target Audience

The target audience of ITS for second languages consists of people interested or involved in learning a second language, and this audience may be learning a variety of languages. Some examples of languages taught by intelligent tutors are English (Sentence & Pain, 1995), French (Hagen, 1994), Chinese (Wang & Garigliano, 1993), European Portuguese (Bull, 1994), German (Schwind, 1995), and Japanese (Nagata, 1995). This is not a complete list, but it gives a good sample of how large an audience these intelligent tutors cover. Since people benefit most from a tutoring program during the first few years of learning a language, most of these tutoring programs target this group.

Another factor that influences the target audience is the final goal of the system. For example, a system may be built to help engineers read technical literature in another language (Maciejewski & Leung, 1992) or to teach Japanese interpersonal expressions (Kai & Nakamuri, 1995). These end goals directly influence not only the target audience but the whole makeup of the system.

Language Coverage

The language coverage of language tutoring systems is as varied as the systems themselves. Depending on the goal of the system, the coverage can be small (as in Labrie and Singh (1991), where a few pages contain the entire coverage) or very large (as in Farghali (1989), which includes the entire Webster’s Seventh Collegiate Dictionary). Most often the coverage is based on one aspect of the language. Examples are articles in English (Sentence & Pain, 1995) and clitic pronoun placement in European Portuguese (Bull, 1994).
The coverage of languages such as English and Chinese seems to be more extensive than that of languages such as European Portuguese. I think this is due to the number of people who speak each language and the resources available for building tutoring systems in those languages. For the languages that have been studied more extensively, coverage includes most aspects of the language; English, for example, has been covered quite well syntactically. Semantic coverage is relatively small for all languages covered in the literature. A more detailed list of language coverage can be found in Appendix II.

Parsing/Robust Parsing

It is not necessary to have a natural language processor (NLP) to create a language tutoring system. The most common language processors compare the user’s input against pre-stored answers; an example of a processor using pre-stored answers is CATACROC (Civil & Estella, 1992). These systems can be successful but often lack flexibility, and flexibility is one of the main advantages of tutoring systems with NLPs (Farghaly, 1989).

Having NLP in a language tutoring system has many advantages. These include, but are not limited to: a more interactive tutor; easier simulation of real-life situations; a program that behaves more like a native speaker and can relax requirements (e.g., spelling); exercises that are more communicative and creative; easier provision of more and better feedback; reinforcement of good solutions rather than trivial fixes; a lexicon and grammar that only need to be designed once (Farghaly, 1989); and support for open-ended writing activities (Hagan, 1994). Natural language processors do have some disadvantages: they are usually very expensive, they occasionally do not work, and they are usually not very successful at dealing with semantics (Holland, 1993).
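The pre-stored-answer approach can be sketched in a few lines of Python. This is a hypothetical illustration, not CATACROC’s implementation: the question identifier, the stored answers, and the normalization (case, surrounding whitespace, and end punctuation) are assumptions. It shows both why such systems work and why they lack flexibility: a correct answer the author did not anticipate is rejected just like an incorrect one.

```python
import string

# Hypothetical question with the answers its author anticipated.
STORED_ANSWERS = {
    "q1": {"she walks to the library", "she walks to the library every day"},
}


def normalize_sentence(sentence):
    """Lower-case and strip surrounding whitespace and punctuation before comparing."""
    return sentence.strip().strip(string.punctuation).strip().lower()


def check_answer(question_id, response):
    """Accept the response only if it matches one of the pre-stored answers."""
    return normalize_sentence(response) in STORED_ANSWERS[question_id]


print(check_answer("q1", "She walks to the library."))  # True
print(check_answer("q1", "She goes to the library."))   # False: correct, but not stored
print(check_answer("q1", "She walk to the library."))   # False, with no reason given
```

Note that the matcher can only say "match" or "no match"; explaining why an answer is wrong requires some analysis of the sentence itself, which is where parsing comes in.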
Parsing is the process of separating language into its parts for easier analysis, and its use in tutoring systems is very diverse. Many different parsers can be used successfully, and both bottom-up and top-down parsers are common. The most common type is a parser based on a definite clause grammar (DCG), because it is one of the easiest to implement. Issac and Fouquere (1995) built a system called Alexia that uses a bottom-up Tree Adjoining Grammar (TAG) parser. The choice of parser depends on the programming language the parser is written in and on the purpose of the tutor. The most common programming language used is PROLOG; however, tutors can be written in other languages such as C (e.g., the TAG parser of Issac & Fouquere, 1995). A tutor whose main purpose is to catch all of the user’s errors may be better off using a chart parser, whereas a tutor focusing on tutoring strategies may need nothing more than a DCG.

Following is a list of parsing mechanisms commonly used in language tutoring systems, with some advantages and disadvantages of each.

Definite Clause Grammar (usually top-down, often including empty categories): its advantages are that it is one of the easiest parsers to code, features can easily be added, and its output is easy to convert into a visual parse tree. Its disadvantages are that failed parses are hard to catch, the best parse is hard to obtain reliably (especially as the first parse), and it can be fairly slow.

Chart parser: its advantages are that it can catch failed parses and is useful for catching user errors. Its disadvantages are that it can use a lot of memory and can be quite slow.

TAG parser (Issac & Fouquere, 1995): its advantages are that it can use both morphological and syntactic rules, it can use features, and it uses an associative network. Its disadvantages are that its worst-case complexity is $O(n^6)$, it may not be easy to implement, and large language coverage would be difficult.
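DCGs are normally written directly in PROLOG, so the Python below only imitates how a DCG parses: rules are tried top-down and left to right, an agreement feature is passed from the subject to the verb, and generators provide the backtracking. The grammar and lexicon are hypothetical and tiny. The sketch also shows the DCG disadvantage listed above: an ungrammatical sentence simply fails to parse, with no indication of what went wrong.

```python
# Hypothetical lexicon: word -> (category, agreement feature).
LEXICON = {
    "she": ("pronoun", "3sg"), "they": ("pronoun", "pl"),
    "the": ("det", None),
    "student": ("noun", "3sg"), "students": ("noun", "pl"),
    "walks": ("verb", "3sg"), "walk": ("verb", "pl"),
}


def word_at(words, i):
    """(category, agreement) of words[i]; (None, None) if unknown or past the end."""
    if i < len(words):
        return LEXICON.get(words[i], (None, None))
    return (None, None)


def np(words, i):
    """np(Agr) --> pronoun(Agr) ; det, noun(Agr).  Yields (Agr, next position)."""
    category, agr = word_at(words, i)
    if category == "pronoun":
        yield agr, i + 1
    if category == "det":
        noun_category, noun_agr = word_at(words, i + 1)
        if noun_category == "noun":
            yield noun_agr, i + 2


def vp(words, i, agr):
    """vp(Agr) --> verb(Agr).  The verb must carry the subject's agreement."""
    category, verb_agr = word_at(words, i)
    if category == "verb" and verb_agr == agr:
        yield i + 1


def parse(sentence):
    """s --> np(Agr), vp(Agr).  True if some derivation consumes every word."""
    words = sentence.lower().split()
    for agr, j in np(words, 0):
        for k in vp(words, j, agr):
            if k == len(words):
                return True
    return False  # a failed parse carries no explanation of why


for sentence in ["She walks", "The students walk", "She walk"]:
    print(sentence, "->", parse(sentence))
# She walks -> True
# The students walk -> True
# She walk -> False
```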
Following is a list of parsing approaches commonly used in language tutoring systems, with some advantages and disadvantages of each.

Morphological parser: its advantage is that it can parse the meaning of a sentence. Its disadvantages are that it is not easy to cover all of the cases, and it can be hard to code.

Syntactic parser: its advantages are that it is relatively easy to build and can cover a large portion, or all, of a language. Its disadvantage is that it will accept syntactically correct sentences that do not make sense, for example "Colorless green ideas sleep furiously" (Chomsky, 1954).

Link parser (Brehony & Ryan, 1994): its advantage is that it captures both syntactic and semantic information. Its disadvantages are that it is sensitive to punctuation errors, only the first linkage is used, and sentences that contain stylistic errors still parse.

Unification-based parser (e.g., Lexical Functional Grammar): its advantages are that it handles features well, and its unification method makes it easier to catch and keep the errors that the user makes. Its disadvantage is that it has the same problems as DCGs.

Robust parsing is a parsing technique that can handle errors in the subjects’ input, ranging from a spelling mistake to a missing article. Being able to parse the errors a subject makes is important, because it makes it easier to use those previous errors to help the subject avoid similar mistakes in the future. One of the most common ways to allow errors to parse is through the use of mal-rules (Brown, 2002). Mal-rules are either predetermined or inserted when the parser finds a new error (by reading user input). This concept is still not well represented in the current literature. A few programs do use mal-rules (for example, Sentence & Pain, 1995), but there is a disparity between the importance of mal-rules and how rarely they appear to be used in language tutoring systems.

Even with today’s advances, creating a parser with a precision of 90% or better for a language is still a challenge (Chen et al., 2001). It follows that a tutor built on such a parser, even one that includes mal-rules, is unlikely to perform significantly better than 90%, because it depends on the success of the parser. One way to deal with this problem is to restrict the domain of the parsing (Chen et al., 2001), which increases precision because the parser can be tailored more easily. Therefore, it is appropriate for a tutoring system to use a restricted domain to help increase precision when parsing the subjects’ responses.

Intelligent language tutoring systems that allow free-form input are still rare (Tokuda, 2001). Researchers are currently working on several different methods, including mal-rules (Brown, 2002) and types of template-based matching (Chen, 2001). One benefit of using mal-rules is that the parser can deal with both grammatical and ungrammatical input (Heift, 1998). Future research in this area may prove quite useful in allowing free-form input.

How to deal with user errors is currently being studied from many different angles, and research shows an increasing number of ways to handle incorrect user responses. Examples include introducing a template structure into the tutoring system (Chen et al., 2001) and updating mal-rules based on the user’s input (Brown, 2002).

Tutoring Strategies

Tutoring strategies are a key component of any tutoring system. Appendix II shows a list of tutoring strategies currently in use in the field. The tutoring strategies used depend on the goal of the ITS. For example, the goal of Issac and Fouquere’s (1995) Alexia project was to teach the user lexical information.
With that goal in mind, the user reads through a scenario, asks questions about it, and writes a summary of it. This process may be an effective method for learning lexical information, but it may not be as useful for developing correct syntax. Most tutoring strategies revolve around asking a question, getting an answer, analyzing the answer, and giving feedback. Kai and Nakamura’s (1995) system uses this type of tutoring strategy: it gives the user a question and then provides feedback (feedback is discussed further below).

One of the most common issues that tutoring strategies for second languages must address is negative transfer (or first language interference). Negative transfer happens when existing knowledge of a subject’s first language interferes with learning a new language, and it is a major cause of errors when tutoring a second language (Wang & Garigliano, 1993). Knowing this allows developers to make their tutoring systems receptive to errors caused by negative transfer. For example, Wang and Garigliano (1993) developed a program containing 100 Chinese grammar rules and the corresponding English rules, allowing the system to easily catch negative transfer and tutor the user properly on the error.

Another important tutoring strategy is the ability to ignore insignificant errors and tutor only on the important ones. It is very frustrating for a user to receive ten error messages when only one error matters to the initial learning of a language. The most common insignificant errors are spelling mistakes, and a single spelling mistake can generate many errors, from "incorrect word" to "missing noun." It is important to pick out the important errors and look past the trivial ones.

The amount of time the subject is allowed to use the program does not seem to have been addressed in the literature.
Some research does point out that there is no time limit on the exercises, as in Bailin and Thomson's (1988) VERBCON and PARSER.

Tutoring strategies include the tasks that are given to the user. These tasks motivate the user to interact with the system and stimulate learning. Possible tasks include sentence construction, translation, pronominalization, transformation of sentences (e.g., from past to present form), composition of sentences, and text understanding and conversation (Schwind, 1990).

A key component of any tutoring strategy is the feedback given to the user. Typical feedback includes identifying whether the user's answer is correct or incorrect, providing hints if the answer is incorrect, pointing out errors, making suggestions, and explaining answers. Feedback should be clear, informative and complete. Simply telling a subject that he/she is wrong is not very helpful; telling the subject why he/she is wrong and how to correct the response is more useful. When to give hints, answers, and further opportunities to produce the right response varies from system to system. Other than trial and error, no established method exists for determining how many hints, answers, or extra chances work best for students. The important thing is to have good feedback that promotes learning.

Error detecting/handling

Error detection and handling are a large part of a language tutor; however, error detection involves some very difficult decisions and problems. Common types of errors include spelling errors, syntactic errors (e.g., missing NP), semantic errors (e.g., "The apple is over their"), contextual errors (the user does not answer the question asked), and constraint violations (conflicting features). Some of these types of errors are easier to identify and correct than others. Spelling errors can be found by comparing the input to the lexicon.
Syntactic errors are found by identifying ill-formed sentences or errors in the unification of features. An ill-formed sentence is one that cannot be generated by the production rules of the grammar (Schwind, 1995). A more extensive list of which syntactic structures have or have not been covered is difficult to compile, since most of the literature is not very specific, and the literature that is specific about its syntax rules, for example Labrie and Singh's Miniprof (1991), is very limited in scope. Semantic errors are hard to find and are usually handled either in very limited contexts or through the use of features. Contextual errors are usually found only by comparing the user's input against pre-stored answers.

Most of the programs in the literature (with the exception of those that stay within a very restricted domain) do not catch all of the errors that a user makes. They are designed to focus on one type of error, which usually corresponds to the goal of the system; examples are programs that cover tenses, articles, or interpersonal expressions. Most of these systems easily identify these common errors. There is a gap in the literature for programs that try to catch a wide range of errors, which seems to be possible only if all of the errors a user is going to make can be predicted in advance. In the area of language, predetermining user errors is a daunting task and one that requires extensive future research.

So what is the difference between robust parsing and error detection/handling? With robust parsing, the parser takes the user's input and parses it even if it is incorrect. The limitation of robust parsing is that it may not reveal what the error was: robust parsing can interpret the input despite existing errors, but those errors may remain unidentified.
Error detection/handling is the process of "catching" whatever error the user has made so that it can be used to aid the tutoring strategies (which may have nothing to do with parsing). The error detection/handling process could be said to include robust parsing, especially if, for example, the error is detected by finding out which mal-rules were used. This difference is important to remember, as most programs have error detection but not robust parsing.

When presented with a user's response that contains several errors, the program may misidentify why the user made them. One way to deal with this problem is through confidence factors (Brown, 2002). If each possible reason for an error is assigned a confidence factor based on information gathered from the user and other internal sources, the tutoring system can use these confidence factors to determine the likely sources of the errors and the proper responses to them.

Low level syntactic errors involve a missing or extra word, most commonly an article or a preposition. High level syntactic errors involve incorrect groups of words, usually resulting from the use of an incorrect grammar rule, or mal-rule (Schwind, 1995). Learners make both types of errors. An effective way to deal with high level syntactic errors is through the use of mal-rules, which must be anticipated in order to provide useful feedback to the user (Schwind, 1995). A concern with anticipated mal-rules created from users' input is that the first time a user applies a new, unexpected mal-rule, very little feedback is available.

Error detection and handling continue to be one of the focal points of research on language tutoring systems. A problem that arises when trying to detect a user's errors is that most detection methods must predict ahead of time which errors will be made.
This becomes a problem because people with different language backgrounds make different errors, and static predictions may not suffice (Heift, 1998). A tutoring program that works for people whose first language is English may therefore not work for people with another first language. To address these language interference problems, some researchers have started to use user models.

User Model

A user model is stored information about the characteristics of a user of a program, together with the ability to use that stored knowledge to improve the program's performance. The simplest user model is a program that asks for the user's name and then uses it to make the program appear more personable. More complicated user models can include such things as the user's linguistic background, linguistic skill levels, and problem solving methods (Brown, 2002; Bull, 1994). User models are becoming a key component of tutoring systems and have many potential benefits.

Not all user models are alike; they differ enough to have different advantages and disadvantages. Benefits of a user model include the ability of the user to examine the model, support for self-assessment, promotion of reflection, interactive diagnosis, assessment of students by the teacher, and teacher training (Pavia, 1995), although these benefits do not necessarily apply to every user model.

There are two types of user models: static and dynamic. Static user models are set at the beginning of running the program (like asking the user's name). Dynamic user models change as the user uses the program (Sentence & Pain, 1995) and tend to store mal-rules about the user to aid the tutoring strategies.

Most user models referred to in the literature have one main purpose: to judge the user's level of understanding. By judging skill level, the tutor is able to give the user questions that are appropriate to his/her skill level.
This level of understanding is stored in several different ways. Bull (1994) used a marker on a continuum based on the acquisition order of clitic pronouns, with the range going from novice to expert. Another example is Bull & Pain's (1995) user model, in which both the user and the computer provide confidence scores. The computer scores the user based on his/her performance on the last five attempts, while the user picks his/her own confidence level. The user model compares these two values and initiates a dialogue between the computer and the user if they are too far apart. The computer then uses these scores to determine the order in which the exercises are presented. Whichever method is chosen, the result is the same: if you know the user's level of understanding, you are better able to tutor him/her.

Some user models allow the user to look directly at what is stored within them, and may even permit the user to change what is in the model or challenge what the program has put there. But do users actually challenge user models? Bull and Pain (1995) found that they do. This is important, as it shows that an interactive user model is feasible. An interactive user model promotes reflection in the user, which is likely to promote learning.

More user models that change as the learner uses them are being developed. A new technique is to have the mal-rules in a user model update dynamically as the learner uses the program (Brown, 2002). This frees the program from having to predict the user's errors ahead of time and instead lets it form mal-rules that are appropriate to the user. The result is more personalized feedback and a partial fix for the problem of different language backgrounds discussed by Heift (1998). Significant progress in this area has been made in the last few years.
However, the potential of user modeling may not yet have been reached.

User Interface

When developing a tutoring system for teaching a second language, it is important to consider the program's interface. The main problem is that the program may not be in a language the user completely understands. Lonfils & Vanparys (2001) have developed useful rules for designing such an interface (the rules are written for icons, but they can be extended to anything the user interacts with): keep it simple; be discriminating (do not have two things that look the same); give preference to native objects (as they are more familiar to the user); do not be too subtle (keep associations obvious); keep actions unique; be consistent; be compatible with the user's knowledge of the real world; and assign clear meanings. This list may seem like common sense, but it is important to follow such a list formally, or the usability of the program decreases (Lonfils & Vanparys, 2001).

Other Considerations

There are some other considerations to take into account when building a second language tutoring system. Since the program is written for users who know a different language, their computers may also be different, and compatibility may be an issue (Levison & Lessard, 1992).

Negative transfer is an important issue in the area of ITS. Wang & Garigliano (1993) built a system sensitive to this concept and found that a significant number of the errors users make are caused by negative transfer. Therefore, any system that tutors a second language should consider taking negative transfer into account.

Research is currently being done on the effect of allowing users access to their own user models. Preliminary results suggest that this may be a promising area in future years (Morales et al., 2001). Predicting user errors using mal-rules is also currently being studied.
Fogarty et al. (2001) tried to predict the reading mistakes that children make. Using a database of over 70,000 oral reading mistakes, they were able to significantly increase the ability of their tutoring program to detect errors.

Future Outlook

There have been many significant advances in ITS for tutoring second languages in the past 15 years. However, from a scientific standpoint this is not a very long time, and there is more work to be done in this field. It may be possible to map the learner's cognitive model (Bonvalot, 1999) and thereby learn why a user is making a mistake, so that more appropriate tutoring strategies and feedback (Ghemri, 1991) can be used. More studies on the effects of instructional variables on second language learning are needed (Holland, 1993). Latent Semantic Analysis (LSA) is a corpus-based statistical mechanism used in some new tutors (Lemaire, 1999), and it can improve the interaction between students and computer tutors (Wiemer-Hastings & Wiemer-Hastings, 1999). A lack of knowledge about linguistics, teaching, and learning in the field has held back the potential of some otherwise successful Intelligent Computer Assisted Language Learning (ICALL) systems; for example, the LICE system created by Bowerman (1992) relied more on introspection because of this lack of knowledge.

There need to be more implementations of ITS for tutoring second languages. Much of the current literature deals with what "should" or "could" be good ways of building such systems. There are actually very few complete working systems in the field, and those that do work tend to use either very restricted domains or static methods (like pre-stored answers) to keep the system practical and usable. As the list above shows, not only is there a significant amount of possible future research, but that research can occur in many different areas.
This means that it is important for researchers both to improve each area and to work on better methods of integrating the different areas of ITS.

Conclusion

In conclusion, there has been much progress in ITS for tutoring second languages. Many approaches have been tried to create a good tutoring program. Although some good tutoring programs have been made, there is no dominant program or method in the field. This is mainly because the field is so new that many things have not yet been tried. There is a particular lack of statistical research on the many methods discussed above, and there is much to be done in exploring the possible benefits of user models in more depth. Combining linguistics, programming, and tutoring has been a slow process, and finding better ways of combining them is a large part of the research that should be done.

In every area covered by this review of the literature, there is room for more research. From parsing and tutoring strategies to error handling and user models, the literature suggests not only that these areas need to be examined further, but also that more implementations of these ideas need to take place. Still, an impressive amount of progress has been made in the last fifteen years, and it will be exciting to see what comes in the next fifteen.

Chapter 3

The structure of the program

Although the focus of this project is the user model and the tutoring strategies, the structure of the entire program is vital to its success. It is not only the knowledge stored in the user model, but also the interaction of the tutoring strategies with the user model, that makes the program run effectively. The tutoring strategies would be useless without a program to use them in.
For this reason, the program incorporates many aspects of the current state of research in language tutoring systems. Appendix IV contains a summary of the different parts of the tutoring program. The program requires full sentence answers from the user, which is intended to strengthen the user's grammatical skills as well as his/her command of verb endings.

The User Model

One of the most important parts of the program is the user model. Its structure was determined by a thorough review of the literature, and several different types of data were chosen to focus on. These different types of data allow for a wide range of tutoring strategies.

The user model contains the information that the program has gathered about the user. This includes which questions were attempted and how the user answered them, the user's login name and password, the amount of time the user spent with the program, the words the user entered that were not in the lexicon, and all of the mal-rules that the user used.

Included in the user model is the user profile, which is the information gathered about the user in advance. In this program, the user profile includes the user's name, password and primary language. This information is useful because it helps the program "remember" the subject over several lessons. The rest of the user model contains the bulk of the user's information: which questions the user attempted, answered correctly or incorrectly, or skipped; which mal-rules the subject used; and which words the subject entered that were unknown to the tutoring program. All of the subject's responses are also saved within the user model, together with the mal-rules that were used in parsing them. Finally, the amount of time the subject used the program is recorded. An example of a user model can be found in Appendix V.
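As a rough illustration of how the data listed above might be held in memory, the following Python sketch models the user profile and the dynamic part of the user model as dataclasses. It is a sketch under stated assumptions, not the program's actual code: Python is used only for illustration, and the class names, field names, mal-rule label and JSON file format are all hypothetical. The thesis's own file layout (Figure 1 and Appendix V) is not reproduced.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Attempt:
    """One response to one question (hypothetical field names)."""
    lesson: int
    question: int
    response: str
    mal_rules: list = field(default_factory=list)  # labels of mal-rules used in the parse
    score: int = 0                                  # score for this attempt (see Table 1)


@dataclass
class UserModel:
    # User profile: information gathered when the subject logs in.
    name: str
    password: str
    primary_language: str
    # Dynamic part: grows as the subject works through the lessons.
    attempts: list = field(default_factory=list)       # Attempt objects, in answer order
    unknown_words: list = field(default_factory=list)  # words not found in the lexicon
    start_time: str = ""
    end_time: str = ""

    def record(self, lesson, question, response, mal_rules=(), score=0):
        """Append one attempt, preserving the order in which questions were answered."""
        self.attempts.append(Attempt(lesson, question, response, list(mal_rules), score))

    def mal_rule_counts(self):
        """How many times each mal-rule has been used across all attempts."""
        counts = {}
        for attempt in self.attempts:
            for rule in attempt.mal_rules:
                counts[rule] = counts.get(rule, 0) + 1
        return counts

    def save(self, path):
        """Write the whole model to a JSON file (an assumed format, not the thesis's)."""
        Path(path).write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path):
        """Rebuild a model written by save(), e.g. at the start of a later session."""
        data = json.loads(Path(path).read_text())
        data["attempts"] = [Attempt(**a) for a in data["attempts"]]
        return cls(**data)
```

Keeping every attempt in one ordered list means the per-question results, the mal-rule usage counts and the saved responses can all be derived from the same record. This is one reasonable design; the thesis stores these pieces as separate lists.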
Each section of the user model is now discussed in greater detail.

The user model records which questions the user answered correctly, answered incorrectly, or skipped, as well as the order in which they were answered. This is recorded through a simple scoring system. A skipped question differs from an unattempted question in that the user has seen a skipped question but has not seen an unattempted one. The scoring system is presented in Table 1.

Table 1. User Response Scoring System

The user's response                          Corresponding score
Incorrect response, then question skipped    -3
Incorrect response                           -2
Skipped question                             -1
Unattempted question                          0
Partially correct response                    1
Correct response                              2

The scoring system keeps track of the user's responses in a form that is easy to use, and the tutoring strategies use it to help determine the program's next course of action. Because the responses are saved in order in the user model, it is also easy to determine the order of the results. For example, to see whether a user answered three questions correctly in a row, the program only has to look for three consecutive questions with a score above zero.

The user model also contains the words the user entered that could not be fixed or replaced by the word check. This allows questions that were skipped or answered incorrectly because of a lack of knowledge of the words in the lexicon to be excluded. The word check also functions as a crude spell checker for words that do exist in the lexicon: if the subject enters an incorrectly spelled word, he/she is presented with a list of words to choose from to replace it.

The mal-rules that the subject has used in his/her answers appear in his/her user model. This includes both predetermined and created mal-rules.
They are stored in such a way that the user model shows which rules were used, how many times each was used, and for which questions. This information is used by the program in applying the tutoring strategies (see Appendix V).

The times at which the subject starts and finishes using the program are recorded in the user model, from which it is possible to calculate how long the subject used the program.

The method for creating and updating the user model is quite simple. When a subject logs in, a file is created using the subject's login ID, which was randomly assigned to the subject. All of the user data is stored in the program in lists. If the subject logs out or the program ends, the data in these lists are written to the file. In this way, the user model is preserved after the program ends and can be reloaded in later sessions.

This is quite an extensive user model; it not only supports the tutoring strategies, but also has the potential to be used over many more sessions. An example of the layout of the user model can be found in Figure 1, and a more extensive example can be found in Appendix V.

Figure 1. An example of the layout of the user model.
John                                   <- subject's name
password                               <- subject's password
1                                      <- current question
1                                      <- current lesson
1                                      <- question number
-0                                     <- question result
mal_vbarll 1
Sat Jan 25 12:26:04 GMT-08:00 2003     <- start time of last lesson
Sat Jan 25 13:20:37 GMT-08:00 2003     <- end time of last lesson
word                                   <- a word used not in the lexicon
1                                      <- question number
1                                      <- lesson number
mal_vbarll                             <- mal-rules used
UNBC every Christmas.                  <- subject's response to question

Tutoring strategies

On its own, the user model would not be very useful; its value lies in guiding the tutoring strategies. The tutoring strategies used in the program are not meant to be extensive or exhaustive.
Instead, their purpose is to use all of the information contained in the user model to create an effective tutoring system. Since the information in the user model is quite diverse, several very different tutoring strategies have been combined within the program. One of the goals of the program is to examine the interaction of these tutoring strategies to see what effect they have when combined. The tutoring strategies were derived from a review of the current literature and from suggestions by experts in tutoring ESL. In particular, tutoring strategies that use information contained in a user model were chosen. Instead of focusing on one tutoring strategy, several are used to determine the effect of combining them.

Because the tutoring strategies may not always recommend which question to ask next, it is important to have a preset structure for asking the questions. The questions in the program have a preset, default ordering so that there is always a question available to ask the user. This ordering is discussed in the Lesson Structure section. The preset structure is the sole tutoring strategy used for the control group, to more closely simulate the kind of non-interactive lesson that a subject would receive from a textbook. However, the program does give the control group a simple correct/incorrect response to each answer. The preset structure is also useful for the experimental group at the start, since early on the user model does not contain much information for the tutoring strategies to use.

The other tutoring strategies are based on the user model: they use the information it contains to choose the question the subject is presented with.
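The scores of Table 1 make streak-based strategies straightforward to express. The hypothetical Python sketch below measures runs of right or wrong answers at the end of the score list and applies two of the streak rules the thesis lists in Figure 2 (three right in a row moves to the next section; five wrong in a row offers to restart the lesson). When no rule fires, it falls back to the preset question order, which is also all the control group ever uses. Counting a score above zero as "right" follows the text; counting only -2 and -3 as "wrong" is an assumption, as are all function names and return values.

```python
# Scores from Table 1.
INCORRECT_THEN_SKIPPED, INCORRECT, SKIPPED = -3, -2, -1
UNATTEMPTED, PARTIALLY_CORRECT, CORRECT = 0, 1, 2


def trailing_run(scores, predicate):
    """Length of the run of most recent scores that all satisfy predicate."""
    run = 0
    for score in reversed(scores):
        if not predicate(score):
            break
        run += 1
    return run


def recommend(scores):
    """Hypothetical strategy layer: return an action, or None if no rule fires."""
    right = trailing_run(scores, lambda s: s > 0)           # above zero counts as right
    wrong = trailing_run(scores, lambda s: s <= INCORRECT)  # assumption: -2 and -3 are wrong
    if right >= 3:
        return "next_section"
    if wrong >= 5:
        return "offer_restart"
    return None


def next_step(scores, preset_order, position, experimental=True):
    """Pick the next step; the control group always follows the preset order."""
    if experimental:
        action = recommend(scores)
        if action is not None:
            return action
    # No recommendation (or control group): fall back to the default ordering.
    if position < len(preset_order):
        return preset_order[position]
    return "end_of_lesson"
```

Because the preset ordering is checked last, a question is always available, which is the role the text assigns to the default structure.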
The process of selecting the next question to present to the user is partially based on recommendations from the tutoring strategies. More than one question may be valid, in which case the question selector chooses which one to present based on a ranking of the suggestions. Also based on data provided by the tutoring strategies, the question selector may move on to the next lesson. An example of the tutoring strategies can be seen in Figure 2; see Appendix III for a complete list of strategies. That summary includes the ideas discussed above, as well as the feedback the tutoring strategies provide.

Figure 2. An example of the tutoring strategies.
If one wrong -> Provide feedback based on detected mistake.
If two wrong in a row -> Provide feedback based on detected mistake.
If three wrong in a row -> Provide hints for the subject at the same time that the questions are asked.
If five wrong in a row -> Ask the subject if he/she wants to start the lesson over.
If one right -> Give positive feedback.
If three right in a row -> Move to next section.
If the answer is wrong and the subject made a similar mistake before -> Tell the subject that he/she has made a similar mistake before and show the previous mistake to him/her.
Missing word detected -> Tell the subject that there may be a missing word in their response.

Feedback

Feedback is actually part of the tutoring strategies, but as it is such a large component, it is discussed separately. The feedback that the control group sees is based solely on whether the subject's answer is correct. A correct response produces the feedback "Good Job." An incorrect response produces the feedback "Incorrect. Try again, or skip the question." This is as much feedback as a textbook with an answer key could provide.
However, since the program can accept several versions of a correct answer, its ability to tell the subject whether he/she is correct is actually more sophisticated than a textbook's.

The feedback for the experimental group is much more extensive. It tells the subject whether the answer is correct or incorrect. If the answer is incorrect, it tells the user whether the tense he/she used is correct, whether he/she has made a similar mistake before, whether the answer may have a grammar problem (including a missing word), whether a word is missing from the verb phrase, and whether the subject should use hints or start the lesson over. For an example, see Figure 3.

Figure 3. Feedback example.
Computer: Starting new lesson
This is a lesson on Simple Present(1)
In general, the simple present expresses events or situations that exist always, usually, habitually; they exist now, have existed in the past, and probably will exist in the future.
Computer: 1 What does UNBC do every Christmas? (close)
User: UNBC closed
Computer: Incorrect tense, try again
User: UNBC
Computer: Incorrect, grammar may be wrong. Try again
User: UNBC closes
Computer: Correct tense
Computer: Good Job!

There are too many possible combinations of dialogue between the subject and the program to give examples of all of them in this thesis. Appendix VI provides an extensive sample dialogue that illustrates some of the potential interactions between the subject and the program.

Question Selection

Question selection is the process by which the program determines which question to ask the user next. The tutoring strategies provide data to the question selector, which uses these data to select the next question to present to the user. The question selection is deterministic and uses a set ordering of the tutoring strategies to determine the next step.
The possible next steps include doing a question over, doing a lesson over, starting the lesson from the beginning, skipping the lesson, or selecting a question from the current lesson. These possible next steps are listed in Appendix III. After choosing which question to ask the subject, the question selector prints the question to the screen and allows the user to respond. If no tutoring strategies are activated, the questions are selected in their predetermined order.

Compare Against Expected

One of the challenging tasks in putting together a tutoring program is deciding whether an answer is correct. In Second Language Learning tutoring programs there are many ways to make this decision (Farghaly, 1989; Holland et al., 1993). The simplest way is to have a predetermined correct answer against which the user's response is compared. The problem with this is that in language learning it is possible to answer correctly in a way that differs from the expected result. For an example of two correct answers, refer to Figure 4.

Figure 4. Example of two correct answers.
Question: What is Nathan going to do tonight? (sleep)
Expected Answer: Nathan is going to sleep tonight.
User's Answer: Nathan is going to sleep.

The user's answer is just as correct as the expected answer. However, if you only checked whether the user's answer exactly matched the expected answer, you would conclude that the user's answer is wrong. Figure 5 shows another example of two correct responses, this time due to the use of a pronoun instead of a proper noun.

Figure 5. Example of two correct answers.
Question: What is Nathan going to do tonight? (sleep)
Expected Answer: Nathan is going to sleep tonight.
User's Answer: He is going to sleep tonight.

In this case the user's answer is also correct. Creating a language tutoring program that accepts most of the possible correct answers to a question is a very difficult task.
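To show why an exact-match check is too strict, the following hypothetical Python sketch compares answers in two ways: token for token, and by extracting the longest contiguous run of verb-phrase words from a tiny hand-written word list. The word list and the run-based extraction are crude stand-ins assumed for illustration only; the program itself relies on its lexicon and DCG parser.

```python
import re

# Hypothetical toy word list; the thesis's lexicon and grammar are not reproduced.
VERB_WORDS = {"is", "am", "are", "going", "to", "sleep", "slept", "has", "have",
              "already", "will", "close", "closes", "closed"}


def tokens(sentence):
    """Lower-case words with punctuation dropped."""
    return re.findall(r"[a-z']+", sentence.lower())


def exact_match(response, expected):
    return tokens(response) == tokens(expected)


def verb_phrase(sentence):
    """Longest contiguous run of verb-phrase words (a crude stand-in for a parse)."""
    best, current = [], []
    for tok in tokens(sentence):
        if tok in VERB_WORDS:
            current.append(tok)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []
    return best


def same_verb_phrase(response, expected):
    return verb_phrase(response) == verb_phrase(expected)
```

With this comparison, both answers in Figures 4 and 5 share the verb phrase "is going to sleep" with the expected answer and are accepted, while exact matching rejects the answer in Figure 4. Because the comparison ignores everything outside the verb phrase, it would still need to be combined with a grammar check.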
One key point to remember when checking whether the user's response is correct is that the program is tutoring a subset of English and should be able to overlook unrelated mistakes (Holland, 1993). An example of such a mistake can be seen in Figure 6.

Figure 6. Example of an insignificant mistake.
Question: What is Nathan going to do tonight? (sleep)
Expected Answer: Nathan is going to sleep tonight.
User's Answer: She is going to sleep tonight.

In this example Nathan is actually male, and referring to him as "she" is a mistake. However, if the goal is to teach the user about verbs and verb endings, it may be appropriate to let the he/she mistake slide. This is a common practice in many second language classrooms. So the problem is not only to accept the many variations of possible correct answers, but also to accept answers that contain unimportant errors.

To decide what to accept as a correct answer, a multi-step process is used. First, the user's answer is run through a spell checker, which not only makes sure that the words are spelled correctly but also makes sure that every word the subject used exists in the current lexicon. If a word does not, the user is presented with a selection of words to choose from. Second, the user's response is run through a grammar checker, and the results are saved and used to help determine whether the response is an acceptable answer. Third, the user's response is checked against one possible correct answer (referred to as the expected result), which includes checking whether the two have similar verb phrases. All of this information is then used to decide whether the user's response is acceptable. How the combination of results determines correctness is shown in Figure 7.

Figure 7. Analysis of the Correctness of an Answer
If the user's response is the same as the expected result & grammar is correct -> answer is correct
If the verb phrase in the user's response is the same as in the expected result & grammar is correct -> answer is correct
If the grammar of the response is incorrect -> answer is incorrect

Extra feedback is provided to the user about the mistakes in the answer, so that he/she is aware of them even if the answer is accepted as correct.

The tutoring program does not allow a number of phrasings that are acceptable in everyday speech. The most important of these is that in the perfect tenses, the subject is required to use "already" in his/her responses. There are several ways of answering the questions without "already", but this program requires the "already" form.

Figure 8. Example of a question containing "already".
Question: What has she already done? (sleep)
Answer: She has already slept.

Determining whether a subject's answer is correct without predetermining the correct answer is very difficult. However, allowing full sentence answers is one of the key components of a tutoring program that is trying to teach the user full-sentence grammar structures.

Lesson Structure

The tutoring program needs a pre-made lesson plan. This includes the different lessons, the questions within each lesson, and the words that are used in all of the questions. The tutoring program consists of twelve lessons on verb endings. The lessons on verb tenses are simple present, simple past, simple future, present progressive, past progressive, future progressive, present perfect, past perfect, future perfect, present perfect progressive, past perfect progressive, and future perfect progressive. Each lesson contains ten questions.
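The twelve lessons listed above are exactly the combinations of three times (present, past, future) with four aspects (simple, progressive, perfect, perfect progressive). The hypothetical Python sketch below generates the lesson names in the order the thesis lists them. The constant and function names are illustrative only and are not taken from the program.

```python
from itertools import product

# Aspects in the order the lessons are listed; each aspect runs through all three times.
ASPECTS = ["simple", "progressive", "perfect", "perfect progressive"]
TIMES = ["present", "past", "future"]
QUESTIONS_PER_LESSON = 10


def lesson_names():
    """Lesson names in the thesis's order: aspect is the outer loop, time the inner."""
    names = []
    for aspect, time in product(ASPECTS, TIMES):
        # English puts "simple" before the time word and the other aspects after it.
        names.append(f"simple {time}" if aspect == "simple" else f"{time} {aspect}")
    return names
```

Laying the lessons out this way makes the total size of the preset plan explicit: 12 lessons of 10 questions each, or 120 questions.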
The lessons and questions all center on the context of school. The format of the questions was taken from Betty Schrampfer's book "Understanding and Using English Grammar, 3rd Ed." (1999), a format which is common in language learning textbooks today.

User Interface

The user interface is the part of the program that the user interacts with, and it appears as in Figure 9. It is very simple, so that the experimenter can easily teach the subjects how to use the program. This was important because the instructions are in English and the subjects need to understand them well enough to use all of the options available to them. The user interface also includes a picture that corresponds to each question, allowing the subjects to better understand the meaning of the question and what response is expected. These pictures are important because they are language independent and remove some of the ambiguity of the questions.

The user interface closely follows the guidelines laid out by Lonfils & Vanparys (2001). The interface is simple to avoid confusion and misunderstanding. I believe that the program is easy enough to use that even subjects with very limited English skills can understand how to use it. The program is set up consistently, and the meanings of all of the actions and buttons are clear.

Figure 9. Screen shot of the program (SLT). [Screen shot: menu items Login, Start, Admin Features, Word Check, Lesson and Help; Display Question, Hint, Skip and Enter buttons; a dialog text area in which the computer starts a new lesson on the Simple Present ("In general, the simple present expresses events or situations that exist always, usually, habitually; they exist now, have existed in the past, and probably will exist in the future."), asks question 1, "What does unbc do every Christmas? (close)", and shows the response "Unbc closes every Christmas." entered under the prompt "Please enter your responses below in full sentences".]

Figure 10.
Example picture from the program

Word Check

The program contains a simple word check, whose purpose is twofold. First, it catches spelling mistakes that the user has made and suggests several words from the lexicon that are spelled similarly to the misspelled word. Second, it catches words that may be spelled correctly but are not in the lexicon. As this is a small tutoring program with a small lexicon, the subject may use a word that the program does not understand. The program gives the subject the option of using a different word. If no suitable word is found, the subject has the option of skipping the question, and the word that caused the problem is saved in the user model for later analysis.

Parsing the subjects' response

One of the features of the tutoring program is its ability to accept answers in many forms. A key component for doing this is the lexicon/parser. The lexicon consists of a few hundred words that were picked based on the probable answers to the questions in the tutor. The parser is a simple DCG (Definite Clause Grammar) parser that is sufficient to parse most of the answers the user is expected to give to the presented questions. The DCG parser also includes an extra section for dealing with mal-rules. These mal-rules are used just like the other DCG rules, except that they are labeled as mal-rules. A simple parser such as a DCG was chosen because of time constraints and because, for a program of this size, it is sufficient. However, more sophisticated parsers would very likely also work.

The ability to parse the user's responses is one of the key components of the tutoring program, because it allows the program to accept more than one correct answer, even if that answer was not predicted ahead of time. The parsing of the subjects' response is done using a multi-step process.
This includes checking the words of the response, running the response through the DCG parser, checking the response for mal-rules, and obtaining the verb phrase and tense of the response.

To validate the words of the response, each word is checked to see whether it exists in the lexicon. If a word does not exist in the lexicon, then it is either outside the context of the program or misspelled. The subject is then presented with a list of words that he/she can choose from to replace it. Once the word is replaced, the subject's response is updated and passed on to the next step.

The response is then run through the DCG parser, which should recognize most of the grammatical forms associated with the context. The DCG parser returns whether the parse was successful, the feature list, and the parse tree (if the response parses successfully).

Figure 11. Example of a few of the DCG rules and lexicon entries.
DCG Rules
sent(sent(NP,VP)) --> np(NP), vp(VP).
np(np(NBAR)) --> nbar(NBAR).
nbar(nbar(PN)) --> n_prop(PN).
vp(vp(VBAR)) --> vbar(VBAR).
vbar(vbar(V)) --> verb(V).
Lexicon entries
n_prop(pn('Nathan')) --> ['Nathan'].
verb(v(swims)) --> [swims].
Result
sent(np(nbar(pn('Nathan'))),vp(vbar(v(swims))))

A DCG parser parses a sentence top-down: it starts from the sentence rule and breaks the sentence into smaller constituents. Figure 11 illustrates the process of obtaining the resulting analysis, using a subset of the more complex grammar that the program uses. These rules and lexical entries can parse the sentence "Nathan swims". The sentence is parsed into a noun phrase and a verb phrase; the noun phrase is then parsed into a proper noun and the verb phrase into a verb (through the intermediate nbar and vbar levels). This creates a tree structure that contains the parse of the sentence "Nathan swims".

If the response is not parsed successfully by the main grammar rules, it is run through a second set of rules, the mal-rules.
These are the predetermined mal-rules that are added to the program ahead of time. This parse also returns the parse tree if it is successful. If the response still has not been parsed successfully, a mal-rule is created that represents the grammatical structure of the response, and this new mal-rule is added to the other mal-rules in the program. The mal-rule step of this process is quite new to the field of tutoring systems and is covered in more detail in the next section.

Next, the verb phrase is extracted from the subject's response and compared to the expected answer. This later provides information such as whether words are missing from the subject's response and whether to accept the response even if other parts of the sentence are incorrect. The tense of the verb phrase is also obtained and compared to that of the expected result. At the end of this process, all of the information that has been produced is passed on to the rest of the tutoring program: some of it is saved in the user model, and the rest is compared against the expected results.

Mal-rules

Mal-rules can be a very effective tool when used within a user model. This thesis uses mal-rules to keep track of incorrect uses of grammar rules. Mal-rules can be predetermined or created as the program runs (Brown, 2002). In the tutoring program both types exist: those that are predetermined and those that are created as the subject uses the program (called dynamic mal-rules). There were several challenges in building a system that uses mal-rules.

The first challenge was to choose which mal-rules to predetermine. As the purpose of the tutoring program is to tutor verb endings, most of the predetermined mal-rules deal with incorrect verb endings. Therefore, most of the predetermined mal-rules are at the feature level of the grammar. A few featureless mal-rules are added to make sure they also work in this context.
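The multi-step processing described in the previous section (word check, main grammar, predetermined mal-rules, and finally a new dynamic mal-rule) can be sketched as a cascade. This minimal Python sketch rests on clearly stated assumptions. Grammars are modelled as sets of part-of-speech sequences rather than Prolog DCG rules. Every word has a single category. The tiny lexicon and rule sets are hypothetical. Spelling suggestions come from the standard library's `difflib.get_close_matches`, which may behave differently from the program's own word check. Because the categories carry no features, the sketch cannot express the feature-level tense mal-rules that the program actually predetermines.

```python
import difflib
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: word -> part of speech.
LEXICON: Dict[str, str] = {
    "nathan": "pn", "lennise": "pn", "swims": "v", "opened": "v",
    "the": "det", "door": "n",
}
# Hypothetical rules, written as category sequences instead of DCG rules.
MAIN_RULES: Set[Tuple[str, ...]] = {("pn", "v"), ("pn", "v", "det", "n")}
PREDETERMINED_MAL_RULES: Set[Tuple[str, ...]] = {("det", "pn", "v")}


def word_check(words: List[str]) -> Dict[str, List[str]]:
    """Map each word missing from the lexicon to similarly spelled entries."""
    return {w: difflib.get_close_matches(w, list(LEXICON), n=3)
            for w in words if w not in LEXICON}


def analyse(sentence: str, dynamic_mal_rules: List[Tuple[str, ...]]) -> str:
    """Run the cascade and report which stage accepted the sentence."""
    words = sentence.lower().rstrip(".").split()
    unknown = word_check(words)
    if unknown:
        # The subject would pick replacements (or skip) before parsing.
        return f"word check: {unknown}"
    categories = tuple(LEXICON[w] for w in words)
    if categories in MAIN_RULES:
        return "parsed by main grammar"
    if categories in PREDETERMINED_MAL_RULES:
        return "matched predetermined mal-rule"
    if categories not in dynamic_mal_rules:
        # Store a new dynamic mal-rule in the (hypothetical) user model.
        dynamic_mal_rules.append(categories)
    return f"dynamic mal-rule {categories}"


user_model: List[Tuple[str, ...]] = []
print(analyse("Nathan swims.", user_model))
print(analyse("Lennise opened door.", user_model))
print(analyse("Nathan swimz.", user_model))
```

The ordering matters: a dynamic mal-rule is only created after both the main grammar and the predetermined mal-rules have failed, which mirrors the order of the steps described in the text.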
See the implementation section of this chapter for a more in-depth look at the predetermined mal-rules.

The next challenge is how to create and use mal-rules as the program runs. Several serious problems can arise when creating mal-rules. For example, one mistake can create more than one mal-rule, as shown in Figure 12. Here the word "orange" can be analyzed either as a noun or as an adjective, so the same flawed answer yields two different mal-rules.

Figure 12. Example of Multiple Mal-rule Generation
Question: Was the orange cup full or empty?
Answer: The orange cup was.
Mal-rule 1: Sent → Determiner, noun, noun, verb
Mal-rule 2: Sent → Determiner, adjective, noun, verb

This is just a small sample of the potential mal-rules that could be created. Multiple mal-rule generation is a very serious problem but does not directly affect this thesis: the domain of the program and the size of the lexicon (>300 words) mean that in almost all cases only one mal-rule is created. However, if this program were expanded, multiple mal-rule generation would have to be addressed.

Implementation

Understanding the structure of the tutoring program is important, but it is equally important to know how the program is implemented. This is not only essential for testing the program but also allows other researchers to duplicate the program's results. The program is implemented in Java and Prolog: Java is used to create the interface, and Prolog is used for the parsing, the tutoring strategies, and the user model. These two languages were chosen because Java is widely accepted as a useful language for creating interfaces, Prolog is well suited to natural language processing, and a component called Jasper allows the two languages to be easily integrated. The results of the tutoring program could also have been achieved using other programming languages. The program is run on a SunBlade 100 using a Unix operating system.
This setup allowed the program to run quickly, and almost no loading delays were experienced. If the program were run on a significantly slower computer, the resulting delays could affect the performance of the program.

The user model saves the rules/mal-rules that the user has used, as well as information about the success of the user's session (as discussed in the mal-rule section above). The login name for each subject was a randomly assigned number to protect the subject's confidentiality. The password used was the word "real", which identifies the sessions that were part of this study. The user model is stored in the program as several lists of data, each corresponding to a part of the user model. When the user leaves the program, these lists are saved to a file in a pre-set order. The format of this file can be seen in Appendix V.

To implement the tutoring strategies, each had to be programmed into the system individually. As an example, I will examine how the tutoring strategy that checks whether the tense of the response is correct was added. All of the questions have a potential correct answer, and the tense of this answer can be obtained by parsing it. This tense is then compared to the tense obtained from parsing the subject's response. If the tenses do not match, the program knows that the subject has used an incorrect tense in the response.

To create a mal-rule, the subject's input first has to fail to parse with both the normal grammar rules and the predetermined mal-rules. Then the grammatical categories of all of the words in the subject's input are obtained (noun, verb, etc.). These values are combined to form a new grammar rule, which is assigned a unique name. This new mal-rule is then added to the list of dynamic mal-rules contained in the user model.
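The creation step just described, together with the multiple-generation problem of Figure 12, can be illustrated with a short Python sketch. It assumes a hypothetical lexicon in which a word may have several categories (as "orange" does), and it numbers new rules `mal_1`, `mal_2`, ... in the spirit of Figure 13. The naming scheme, data layout and function name are illustrative and are not taken from the program. Every combination of categories yields one candidate rule. An unambiguous sentence therefore produces exactly one rule, whereas "The orange cup was." produces the two rules of Figure 12.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical lexicon: a word may belong to more than one category.
LEXICON: Dict[str, List[str]] = {
    "the": ["determiner"],
    "orange": ["noun", "adjective"],  # the ambiguity behind Figure 12
    "cup": ["noun"],
    "was": ["verb"],
}

MalRule = Tuple[str, Tuple[str, ...]]


def create_mal_rules(words: List[str], dynamic_rules: List[MalRule]) -> List[MalRule]:
    """Build one sentence-level mal-rule per reading of the input.

    Each rule is a unique name plus the sequence of word categories; the
    new rules are appended to dynamic_rules (standing in for the user model).
    """
    readings = product(*(LEXICON[w] for w in words))
    new_rules: List[MalRule] = []
    for categories in readings:
        name = f"mal_{len(dynamic_rules) + 1}"
        rule = (name, categories)
        dynamic_rules.append(rule)
        new_rules.append(rule)
    return new_rules


user_model: List[MalRule] = []
for name, cats in create_mal_rules(["the", "orange", "cup", "was"], user_model):
    print(name, "sent ->", ", ".join(cats))
```

With a small lexicon in which few words are ambiguous, the product usually has a single element, which is consistent with the observation that multiple mal-rule generation rarely arises in this program.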
This is a simplistic way of creating mal-rules at the sentence level, but because the grammar involved in the presented questions is very similar, this approach is sufficient. An example of mal-rule creation can be seen in Figure 13.

Figure 13. Mal-rule creation
Flawed Answer: Lennise opened door.
Syntactic Analysis: proper noun, verb, noun [determiner: definite].
Error: No determiner to go with the noun.
Mal-rule created: mal_1(sent → proper noun, verb, noun [determiner: indefinite]).

The predetermined mal-rules that were used all focused on verb phrases. The main set of predetermined mal-rules covered incorrect tense usage; an example of one of these can be seen in Figure 14. The predetermined rules in the program cover the different combinations of incorrect tenses, as well as mistakes with transitive and intransitive verbs. All of these mal-rules were predetermined because they were presumed to be the errors the subjects were most likely to make.

Figure 14. A mal-rule on incorrect tense usage.
v_bar(mal_vbar7(V_AUX,V_BAR),[tense:error,vform:X,trans:Z]) -->
    v_aux(V_AUX,[tense:past,vform:X,trans:Z]),
    v_bar(V_BAR,[tense:present,vform:Y,trans:Z]).

All of the parts of the program are added as discussed in the description of the structure of the program. Creating the program described above already provides useful information about building a successful tutoring program. However, once the program is created, the next step is to test it on human subjects to determine its effectiveness.

Chapter 4 Field Testing

Design

When a program is developed using a new approach, it is important to test its effectiveness. There are many methods for testing a program, but only a few are used for any given program. To test this program, several methods were used that gathered both quantitative and qualitative data.
The goal of this pilot study is to promote future research rather than to provide conclusive results. The methods used to gather data from the subjects were the pre-test and post-test results, a motivation survey, the amount of time the program was used per lesson, the user models, the experimenter's field notes, and the comments made by the subjects.

A t-test was performed on the pre-test and post-test scores, the motivation survey scores, and the time-per-lesson scores to look for a significant difference between the experimental group and the control group. Using separate t-tests loses any interaction between the different scores, but still provides useful information for a pilot study. ANOVA tests were not used, to keep the results as simple as possible, since a more complex analysis is not the goal of the pilot study: its goal is to provide direction for future research, and conclusive results cannot be derived from a small pilot study. The user models, the experimenter's field notes, and the comments made by the subjects were separated into data from the control group and data from the experimental group and analyzed for patterns.

Subjects

The subjects in the experiment come from several different primary-language backgrounds, which means that first-language interference may differ between subjects. A potential solution is to use only subjects with the same primary language; however, the results would then apply only to people with that background. Instead, a random mix of backgrounds was used, with the assumption that this does not affect the results because subjects were randomly assigned to the control group and the experimental group. Random assignment should thus balance the effects of the confounding variable, first-language interference, across the two groups.
Subjects had to be found who were currently learning English as a second language but were still making errors with verb endings. Students attending CNC (the College of New Caledonia) and UNBC (the University of Northern British Columbia) were a good source of such people. Subjects were recruited by presenting requests to classes at CNC and UNBC that had a high percentage of suitable students, and word of mouth was also used to attract potential subjects. The program was tested with people over the age of 18 who have English as a second language. Subjects were required to be over 18 to avoid the extra ethics approval and safeguards needed for testing minors. For the pilot study, twenty subjects were used. The subjects were required to be available for two hours; this length was chosen because of time and resource constraints.

Procedure

The human aids who helped administer the program to the subjects were first assembled to test the program. This not only taught the aids how to use the program, but also helped to identify some of the bugs that existed. The aids did not have a critical role in the actual testing; they only helped the experimenter when two groups of subjects had conflicting test times. The experimenter was present for all of the experiments.

The twenty subjects were randomly divided into two groups of ten: the control group used the program without the benefit of the user model, and the experimental group used the program with the user model enabled. To assign the subjects randomly, an equal number of odd and even numbers were randomly generated and assigned to the subjects. Those with odd numbers formed the control group, and those with even numbers the experimental group.

A booklet was created that contained the informed consent form, instructions, motivation questionnaire, ability pre-test, and ability post-test.
This booklet was handed to each subject, and each section was filled out at the appropriate time. The instructions were read first, followed by reading and signing the Informed Consent Form. The consent forms were stored separately from the rest of each subject's data to ensure the subjects' confidentiality, and a copy was made so that each subject could keep his/her own consent form. The pre-test was written before using the program to determine the subject's English skill level beforehand, and a post-test was written after using the program to measure the subject's English skill level afterwards.

The subjects used the program for half an hour and were allowed to use a dictionary while doing so. After the thirty minutes, they were given the option of continuing to use the program or stopping. This was allowed so that those who were not benefiting from the program did not have to use it for an extended time. All subjects were required to stop after two hours. While the subjects used the tutoring program, the experimenter took field notes; these subjective observations were gathered and analyzed for patterns.

Finally, the subjects filled out a questionnaire to judge the effect the program had on their motivation to learn English. This was a short questionnaire (containing 10 questions) using a Likert scale; it can be found in Appendix VII. The validity of this survey has not been established, so it can only be used to obtain a subjective measure of the subjects' motivation. After the above was completed, the booklets were collected and each subject's user model was saved. The subjects were debriefed and given the opportunity to ask any remaining questions about the experiment.

Instrumentation

The subjects used the tutoring program as described in Chapter 3.
The control group used the program with the user model and tutoring strategies disabled. The program was run on a SunBlade 100 in room 5-164 at UNBC.

Ethics

Because the test uses human subjects, I needed to obtain approval from the Ethics Board at UNBC. This involved submitting an Informed Consent Form package to the Ethics Board, and the experiment was approved. The testing took place over a two-week period.

Chapter 5 Results of Field Testing

Qualitative Data

In this thesis there are several sources of qualitative data: the observations made by the experimenter, the comments made by the subjects, and the analysis of the data collected in the user models. These data were examined for patterns and relevant information. Many of these observations are anecdotal in nature because of a lack of supporting evidence. When examining the effectiveness of a program, qualitative data is often used to complement quantitative data.

During the experiment, the experimenter recorded observations about the subjects as they used the tutoring program, in the form of field notes. These notes were separated into notes from observing the control group and notes from observing the experimental group, and were then examined for patterns and compared against each other. These observations are a valuable source of information and must be considered when judging the effectiveness of the program.

The subjects in both the experimental group and the control group seemed to want to continue using the program: when given the option to stop after thirty minutes, almost all of the subjects chose to continue. The experimental group seemed to have an advantage when it came to finding the correct answer after entering an incorrect one. They appeared to use the feedback provided to work out the correct answer.
The control group seemed to take more attempts to find the answers to the questions that they did not know. This pattern was found in the experimenter's field notes, which report that the control group often took several attempts to correct a wrong answer, whereas the experimental group often got the correct answer on the second attempt. This is supported by the attempt data recorded in the user models.

The tutoring based on the mal-rules appeared to work well. There were several occasions on which these tutoring strategies came into effect and helped the user find the correct answer; this is inferred from the experimenter's field notes.

In the course of testing the subjects, several bugs were discovered. The experimenter and aids tried to reduce the impact of these bugs by directing the subjects around them when they occurred. The bugs have been taken into account when analyzing the results, and are listed in Appendix VIII.

Another main source of anecdotal data is the comments made by the subjects. These comments were recorded, separated into comments from the experimental group and comments from the control group, and then examined for patterns and compared against each other.

The comments made by the experimental group mostly centered on expanding the program: improving the pictures, adding more English words to the program, and having a wider range of questions and lessons. Several subjects commented that the program is useful for tutoring ESL. Comments from the control group were very similar; they also wanted improvements and expansions of the program.
However, the control group also complained that the program was boring, that it lacked examples, and that they had a hard time understanding why they were wrong. Even so, the control group reported that they liked the program.

Comparing the comments made by the two groups shows that they shared suggestions on how to improve the program and that the program was generally liked. However, only the control group commented that the program was boring, lacked examples, and did not explain why the subject was wrong. This suggests a difference between the two groups' opinions of the program, with the control group having additional problems with it.

The last source of data was the recorded user models themselves. Since the data generated by these user models is extensive (over 150 pages), it is not reproduced in the thesis. Several aspects of the user models need to be examined, including the results of the subjects' answers, the results of the mal-rules, and the words that the subjects used that were not replaced with words from the lexicon. This data was separated into data from the experimental group and data from the control group, which were then analyzed for patterns and compared against each other.

Looking at the mal-rules, 44 dynamic mal-rules were created and 27 predetermined mal-rules were used by the twenty subjects. An examination of the user models indicates that the mal-rules were successfully implemented: both predetermined mal-rules and mal-rules created during program use were used. That said, mal-rules were not used extensively. The most used mal-rules were the predetermined ones concerning verb agreement, which was expected since the program was testing verb endings.
The most often generated mal-rule involved the lack of a determiner (the subject's response did not include a needed determiner). This could be an indication of first-language interference, as a missing determiner is a common mistake among Asian learners of English (55% of the subjects were of Asian descent). Given these findings, further studies that look only at mal-rules may prove valuable, both for creating effective tutoring systems and for researching first-language interference.

Lastly, the lack of coverage of certain English words did not seem to be a problem: on average, fewer than one unknown word per subject was recorded in the user model. However, a few comments indicated that the English word coverage was too restrictive, so this is a potential future improvement to the program.

Quantitative Data

There is a lack of quantitative data used to evaluate systems in the field of ITS. Therefore, a pilot study was done on this program as part of the field testing. Several tests were performed to analyze the quantitative data obtained from the experiment. A one-tailed t-test was used to compare the experimental group and the control group in each of the following tests. Three variables were the focus of the t-tests: the motivation of the subjects, the difference between the pre-test and post-test scores, and the ratio of the time spent using the program to how far the subjects progressed through it. To keep the results as simple as possible, ANOVA tests were not used, as a more complex analysis is not the goal of the pilot study; the pilot study's goal is to promote future research, not to produce conclusive results. All of the tests used an alpha level of 0.05. To adjust for the multiple t-tests, a more stringent alpha could have been used; this is a possible change for future research.
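One common way to obtain the more stringent alpha mentioned above is the Bonferroni correction, which divides the family-wise alpha by the number of tests. The thesis does not say which correction it would use, so Bonferroni is an assumption here. The minimal Python sketch below applies it to the three t-tests in this chapter; the function name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test alpha that keeps the family-wise error rate at or below family_alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Three t-tests: gain scores, time/lesson ratio, and motivation.
per_test_alpha = bonferroni_alpha(0.05, 3)
print(f"{per_test_alpha:.4f}")  # 0.0167
```

Under this correction, each one-tailed p-value would be compared against about 0.0167 instead of 0.05. Bonferroni is conservative; less conservative alternatives such as the Holm procedure exist.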
Some personal information was gathered from each subject. This data gives a good picture of the population the subjects came from, and is listed in Table 2. There were 20 subjects, 10 male and 10 female: 6 speakers of Chinese/Mandarin, 3 Japanese speakers, 2 Spanish speakers, 3 Korean speakers, and 6 speakers of various other languages.

Table 2. Subject Background Information
Subject number   First Language   Age   Sex
100              Chinese          29    Female
101              Japanese         26    Female
102              Chinese          33    Male
103              Swahili          19    Male
104              Japanese         21    Female
105              Chinese          40    Female
106              Farsi            30    Male
107              Czech            20    Male
108              Amharic          32    Male
109              Korean           22    Female
110              Chinese          29    Male
111              Spanish          35    Female
112              Spanish          37    Male
113              Slovak           31    Male
114              Chinese          26    Female
115              Mandarin         21    Male
116              Korean           49    Female
117              Korean           30    Female
118              Japanese         27    Female
119              Polish           23    Male

After the personal information was gathered, a short written pre-test, scored out of 10, was given to gauge each subject's level of English ability. These scores gave the experimenter an idea of each subject's initial ability and were compared to a similar test taken after using the tutoring program. The results of this test can be found in Table 3.

Some subjects were "quite skilled" at English grammar, meaning that they made no mistakes on the pre-test. These subjects were not removed from the experiment, because the pre-test was found to be too easy relative to the program. If a subject had been so skilled that he/she made no mistakes while using the tutoring program, his/her data would not have been included, as it would be impossible to determine whether the program helps someone who makes no mistakes. However, all of the subjects made mistakes while using the program, since the program eventually asks challenging questions that are more difficult than any of the questions on the pre-test.
After the subjects used the tutoring program, they were given a post-test similar to the pre-test. The pre-test and post-test are identical in syntactic structure but use different verbs, to prevent subjects from answering the post-test by memorizing cases within the program that they recognized from the pre-test. The post-test, which is also scored out of 10, gave the experimenter an idea of each subject's English skill level after using the tutoring program. The results of this test can also be found in Table 3. Subjects with an even subject number were in the experimental group, and those with an odd subject number were in the control group.

From the pre-test and post-test, a gain score was obtained for each subject: the number correct on the post-test minus the number correct on the pre-test. The gain score estimates how much each subject improved his/her English skills with verb endings. These gain scores are also listed in Table 3.

Table 3. Pre-test and Post-test Data
Subject number   Pre-test   Post-test   Difference (Post - Pre)
100              10         10           0
101              8          10           2
102              4          9            5
103              9          8           -1
104              10         10           0
105              10         10           0
106              10         10           0
107              10         10           0
108              6          6            0
109              10         10           0
110              7          10           3
111              9          7           -2
112              8          10           2
113              3          4            1
114              4          3           -1
115              10         10           0
116              1          1            0
117              3          3            0
118              8          10           2
119              9          9            0

A t-test was performed to compare the gain scores (post-test minus pre-test) of the control group and the experimental group. The pre-test mean was 6.8 (SD = 3.05) for the experimental group and 8.1 (SD = 2.77) for the control group. The post-test mean was 7.9 (SD = 3.38) for the experimental group and 8.1 (SD = 2.64) for the control group. The mean gain score was 1.1 (SD = 1.85) for the experimental group and 0.0 (SD = 1.05) for the control group.
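The t statistic reported next can be reproduced from the difference column of Table 3. The thesis does not state which form of t-test was used. The minimal Python sketch below assumes Student's pooled-variance two-sample test (df = n1 + n2 - 2 = 18), which matches the reported value, and uses only the standard library's `statistics` module. The function name `pooled_t` is hypothetical. The p-value still has to be read from a t table: the one-tailed critical value for df = 18 at alpha = 0.05 is about 1.734.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pooled_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Student's two-sample t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    # variance() is the sample variance (n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# Gain scores (post-test minus pre-test) from Table 3.
experimental = [0, 5, 0, 0, 0, 3, 2, -1, 0, 2]  # even subject numbers 100-118
control = [2, -1, 0, 0, 0, -2, 1, 0, 0, 0]      # odd subject numbers 101-119

t = pooled_t(experimental, control)
print(f"t({len(experimental) + len(control) - 2}) = {t:.2f}")
```

Applied to the Time/Lesson Ratio column of Table 4, the same function gives a t of about 2.2 in magnitude. Applied to the motivation scores in Table 5, it gives about 0.53 in magnitude. Both values are consistent with those reported in this chapter.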
There was no significant difference between the mean gain scores of the experimental group and the control group (t(18) = 1.63, p > 0.05). Therefore, the program did not appear to increase the English ability of the experimental group significantly more than that of the control group.

The time that each subject used the program was recorded and compared to the lesson that the subject reached. This allows a significance test comparing the experimental group and the control group. A summary of the data can be seen in Table 4.

Table 4. Time/Lesson Data and Ratio
Subject number   Length of time used (minutes)   Lesson Reached   Time/Lesson Ratio
100              57                              12               4.75
101              63                              12               5.25
102              46                              9                5.11
103              47                              3                15.67
104              44                              12               3.67
105              58                              3                19.33
106              63                              12               5.25
107              71                              10               7.1
108              40                              8                5
109              35                              4                8.75
110              57                              12               4.75
111              47                              3                15.67
112              45                              7                5
113              36                              1                36
114              31                              3                11.33
115              32                              5                6.4
116              36                              3                12
117              53                              1                53
118              54                              12               4.5
119              31                              6                5.17

A t-test was performed to compare the amount of time each of the two groups spent using the program relative to how far they progressed through it. The experimental group mean was 6.14 (SD = 2.95), and the control group mean was 17.2 (SD = 15.7). There was a significant difference between the mean ratios of the experimental group and the control group (t(18) = 2.2, p < 0.05). Therefore, the experimental group had a significantly lower time/lesson ratio (average time per lesson) than the control group; that is, subjects using the program with the user model enabled spent significantly less time per lesson.

After using the tutoring program, the subjects filled out a questionnaire to measure how much the program increased their motivation. This was done using the questionnaire found in Appendix VII. This questionnaire has not been evaluated for validity, so the conclusions that can be drawn from it are very limited. The questionnaire responses were based on a Likert scale.
A score out of 50 was obtained, with a higher score indicating a higher level of motivation. These scores are summarized in Table 5 and are later used to determine the difference in motivational levels between the two groups.

Table 5. The Level of Motivation Score
Subject   Score
100       38
101       30
102       36
103       44
104       40
105       44
106       41
107       37
108       41
109       37
110       30
111       28
112       28
113       37
114       23
115       33
116       39
117       34
118       36
119       42

A test was performed on the questionnaire data to compare the motivation levels of the two groups. The experimental group mean was 35.2 (SD = 6.16), and the control group mean was 36.6 (SD = 5.54). There was no significant difference between the mean score for the experimental group and the control group (t(18) = 0.53, p > 0.05). Therefore, the program did not appear to significantly motivate the experimental group more than the control group.

Chapter 6 Discussion

The effectiveness of the combination of the different tutoring strategies and the complex user model proved to be inconclusive. However, several noteworthy results were obtained. The subjects were tested for a period of up to two hours, which may not be long enough to produce statistically significant results. To deal with this possibility, trends in the user models of the subjects were taken into consideration when determining the success of the program. Even non-significant effects may hint at the possibility of significant results with a longer testing period.

Since the subjects were both introduced to the tutoring program and required to use it within a two-hour period, some frustration was caused by using a program for the first time. This may have decreased the motivation score of all participants, but it should not significantly affect the difference in motivation scores between the experimental group and the control group.
The wide range in skill levels among the subjects might affect how well the program works for each individual. Some may require a large amount of tutoring, while others are tutored only on the occasional question. Therefore, the effect of the tutoring on these different subjects may vary. This is not seen as a large problem for the statistical results because the subjects were randomly distributed between the groups. However, it must be kept in mind when considering which population the results are relevant to.

The comments made by the experimenter and the subjects suggest that the feedback provided to the experimental group was useful. The control group commented that there was a lack of feedback, while the experimental group did not. This information is anecdotal, and future research is needed to confirm these results.

One disadvantage of testing the combined effects of the different tutoring strategies is that the individual effect of each strategy cannot be determined. This is not seen as a large problem, as each of the tutoring strategies has been shown to be effective in other studies (Bailin & Thomson, 1988; Nagata, 1995). The only tutoring strategy that has not been extensively studied is mal-rules. Therefore, it is important for follow-up studies to examine the effectiveness of mal-rules without interference from the other tutoring strategies.

Though the individual benefits of the mal-rules cannot be confirmed with this study, the evidence points to the potential benefits of using mal-rules in tutoring systems. The program successfully used mal-rules in three ways. It recognized when a subject used one. It created one when no matching mal-rule existed. It used both the pre-existing and the dynamically created mal-rules to help direct the tutoring strategies. The dynamic mal-rules were based on the syntactic categories of the words (such as noun).
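The recognize-or-create behaviour described above can be sketched in Python. This is a simplified illustration, not the program's implementation, which used a DCG parser (Appendix IV). Here, each response is reduced to a flat sequence of word categories and compared as a whole pattern against correct rules and mal-rules. The lexicon, the rules, and the names `categories` and `diagnose` are hypothetical. The `n_pro`, `verb` and `adv` labels follow the user model shown in Appendix V, while `noun` is an assumed label.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> syntactic category.
LEXICON = {"lennise": "n_pro", "drinks": "verb", "drink": "verb",
           "is": "verb", "coffee": "noun", "always": "adv"}


def categories(sentence, lexicon=LEXICON):
    """Map a response to its word-category sequence (unknown words raise KeyError)."""
    words = sentence.lower().rstrip(".?!").split()
    return tuple(lexicon[w] for w in words)


def diagnose(tags, correct_rules, mal_rules, used):
    """Classify a category sequence as correct or as the use of a mal-rule.

    If neither a correct rule nor a known mal-rule matches, a new dynamic
    mal-rule is created from the sequence itself. `used` counts how often
    each mal-rule fired, like the counts kept in the Appendix V user model.
    """
    if tags in correct_rules:
        return "correct", None
    for name, pattern in mal_rules.items():
        if pattern == tags:
            used[name] += 1
            return "mal", name
    name = f"mal_{len(mal_rules) + 1}"  # new dynamic mal-rule, named like Appendix V
    mal_rules[name] = tags
    used[name] += 1
    return "mal", name


correct_rules = {("n_pro", "verb", "noun")}                # "Lennise drinks coffee."
mal_rules = {"mal_1": ("n_pro", "verb", "verb", "noun")}   # pre-existing mal-rule
used = Counter()
for response in ["Lennise drinks coffee.", "Lennise is drink coffee.",
                 "Lennise coffee drinks."]:
    print(response, diagnose(categories(response), correct_rules, mal_rules, used))
```

The third response matches nothing, so a dynamic mal-rule `mal_2` is created and stored for that user. Matching whole flat patterns is the simplest possible design choice. A parser-based approach would allow a mal-rule to cover only part of a sentence.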
To create dynamic mal-rules built from grammatical phrases (such as a determiner followed by a noun), a bottom-up parser could be used. This is an area of potential future research, and further analysis of the mal-rules may also prove fruitful.

The bugs in the program most likely did not affect the results of the experiment. The bugs that the subjects pointed out were minor. For example, the "Enter" key on the keyboard could not be used in place of the program's "enter" button. Overall, the subjects generally liked the program, and it can therefore be concluded that the bugs probably did not interfere with program use.

Comparing the results to the background literature in this field, many of the ideas expressed in the literature were supported by this thesis. Ideas such as mal-rules (Brown, 2002), which have not been heavily tested, were successfully implemented.

The time/lesson ratio test found a significant difference between the experimental group and the control group. Subjects in the experimental group were able to progress through the program much faster than the control group. This is an important finding in validating the success of the tutoring program.

The pre-test and post-test analysis did not find a significant difference in English ability between the two groups. After looking at the data, a few factors seem to have played an important role in this result. First and foremost, the pre-test and post-test were too easy for the average skill level of the subjects. This caused the pre-test scores to be very high in many cases: 7 of the 20 subjects scored a perfect 10 on the pre-test, which left them no room to improve. As a result, the difference between the two tests was very small. Second, one session is probably not an adequate amount of time to see a significant difference in the English skill level of the subjects.
This was unavoidable given the time and budget constraints of this experiment, but it is an area where future research could provide a more significant result.

There are several more general reasons why the t-tests did not provide conclusive results. First, the sample size was small (only 20 subjects). This was not a major problem because the purpose of the pilot study was to encourage and promote further statistical studies and future research. Second, the study included a mix of Asian and European language backgrounds. This makes the results more generalizable but adds more confounding variables to the experiment. Third, the wide range in English skill level among the subjects may have affected the results. Fourth, the effect of gender was not accounted for in the experiment. Fifth, the pre-test and post-test were too easy in relation to the program. Sixth, the validity of the motivation survey had not been established. Because of these problems, the results of this pilot study are inconclusive, but they open up many questions for future research.

Conclusion

The overall results of the experiment are inconclusive. So, were the goals reached? What does the success of this research mean? What contributions does this research make to the field of ITS?

The goals stated at the beginning of this paper were successfully implemented, although the experimental results were inconclusive. The tutoring strategies, combined with the user model, were implemented successfully. Mal-rules were successfully implemented and used. Finally, a statistically significant difference was found between the experimental group and the control group in the speed of progression through the program. Why this difference existed is uncertain, but it may be established in future studies.

What do these results mean?
These results show that it may be possible to successfully combine several tutoring strategies and use a complex user model to create an effective tutoring system. When interpreting an experiment such as this one, it is important to examine not only what worked but also what did not work. The biggest revelation is that it would be very difficult to tutor successfully with such a diverse population. The only way such a test could be conducted is if the population were greatly restricted, so that the questions could be set at an appropriate level for the subjects. The problem with this is that the results would only apply to the restricted population. Since almost any tutoring program would be exposed to a diverse population of users, the appropriateness of such a restricted test is questionable.

It is important to remember that the success of the program currently applies only to tutoring verb endings for a population with attributes similar to the one tested. Fortunately, quite a wide range of subjects was tested, and the ages of the subjects were also quite diverse. However, it is premature to say conclusively that the ideas and techniques used in this program would work anywhere. Instead, the success of this program suggests that more research with these ideas and techniques is warranted and could lead to more conclusive evidence. The apparent success of implementing mal-rules in a tutoring system opens new avenues of potential research. It will be exciting to see more research in this area of Natural Language Processing.

It is important to discuss what contributions this research makes to the field of ITS. The program was successfully implemented, combining several tutoring strategies and a complex user model to make a tutoring system. Mal-rules were successfully used in this study.
This helps open a new area of research in ITS and encourages more studies using mal-rules. This research includes a pilot study that uses t-tests to help determine the effectiveness of the program. These t-tests show that it is difficult to extract meaningful results from an experiment such as this one. Most importantly, the results of this research have raised many questions, and these questions can serve as platforms from which further studies can be derived. These contributions will aid future research in the area of ITS.

Future Work

Some of the ideas used in this paper have been tested extensively, while others have only been touched on. Therefore, the future work described here is presented in three categories: improvements to the program, expansion of the pilot study, and potential research resulting from questions raised by this research.

The program worked in the end, but it was by no means a finished product. Many aspects of the program could be improved or expanded on. Some of these potential areas of improvement are discussed below.

This program focused on a small part of English grammar. Seeing the program work with other parts of English grammar would increase the validity of the results obtained from this program. It would also allow the results to apply to a larger range of English grammar, instead of just verb endings.

This program could be used to test a large number of different tutoring strategies. The large amount of data collected in the user model could be used in many different ways. This is especially true of the mal-rules, which could be used to examine everything from first-language interference (Wang & Garigliano, 1993) to common mistakes made by all second language learners.

If the population of the subjects were known ahead of time, the instructions and ethics forms could be provided in the native tongue of the subject.
This would allow subjects with less English skill to participate in the experiment.

It would be beneficial to see the effect of subjects using the tutoring program over a longer period of time. The session length, however, probably should not increase; it should stay around the length of a normal tutoring session. Instead, the effects of multiple sessions may provide interesting results. For this to be possible, the program would have to grow to include more questions as well as a larger context.

The number of subjects in the experiment was relatively small (twenty subjects). Testing the program with sixty subjects would help validate its effectiveness.

It may provide interesting results if the program were used on a population that shared the same first language. This would reduce the applicability of the program to a general population. However, it may provide greater insight into the effectiveness of the program by removing several confounding variables. Using only one gender for the subjects may eliminate the confound that results from having both males and females in the study. Ideally, two experiments would be performed, one with each gender.

The dynamic mal-rules were based on the syntactic categories of the words. It would be possible to create dynamic mal-rules made up of grammatical phrases by using a bottom-up parser. This would be an important step in the evolution of the mal-rules.

It may be possible to use the mal-rules formed by a user to detect first language interference. The program successfully recorded the mal-rules that the user made, and certain mal-rules can be linked to first language interference (Wang & Garigliano, 1993), so it should be possible to combine the two. This would allow a tutoring program to recognize first language interference and use that information to better tutor the subject.
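The proposed phrase-level mal-rules can be illustrated with a small bottom-up step. The Python sketch below greedily rewrites a word-category sequence with a few phrase rules until none applies, so that a mal-rule could be recorded over phrases rather than single words. It is a naive deterministic reducer, not a full bottom-up parser: it does not keep alternative analyses the way a chart parser would. The phrase rules and all category labels other than `n_pro` are assumptions made for illustration.

```python
# Hypothetical phrase rules: (right-hand side, phrase label). The rules must not
# be cyclic, otherwise the reduction below would never stop.
PHRASE_RULES = [
    (("det", "adj", "noun"), "np"),
    (("det", "noun"), "np"),
    (("prep", "noun"), "pp"),
    (("prep", "np"), "pp"),
    (("n_pro",), "np"),
]


def reduce_bottom_up(tags, rules=PHRASE_RULES):
    """Repeatedly replace the first matching right-hand side with its phrase label."""
    seq = list(tags)
    while True:
        for rhs, lhs in rules:
            n = len(rhs)
            match = next((i for i in range(len(seq) - n + 1)
                          if tuple(seq[i:i + n]) == rhs), None)
            if match is not None:
                seq[match:match + n] = [lhs]
                break  # restart from the first rule on the shorter sequence
        else:
            return tuple(seq)  # no rule applies any more


# "Lennise drinks a cup of coffee." vs. a learner response missing the determiner.
correct = reduce_bottom_up(("n_pro", "verb", "det", "noun", "prep", "noun"))
learner = reduce_bottom_up(("n_pro", "verb", "noun", "prep", "noun"))
print(correct)  # ('np', 'verb', 'np', 'pp')
print(learner)  # ('np', 'verb', 'noun', 'pp')
```

The two reduced sequences differ only where a bare noun stands in place of a noun phrase. A mal-rule stored at this level would therefore describe the missing determiner directly. This is the kind of error in the Appendix VI response "Lennise will drink cup of coffee."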
If there were time to do this experiment over, one change I would like to make is to compare the program to a version that does not tell the subject whether their answer is correct. Because the control group was still told whether their answers were correct, the version they used may already have been superior to a textbook. Therefore, checking whether this had a significant effect would be productive future research.

It would be interesting to see the effect of using a program like this with children. As children learn differently than adults, the program might work differently for them.

Overall, this thesis has provided several answers but has created many more questions. These questions provide a good starting point for future research in the area of Intelligent Tutoring Systems and User Models. This field is still very new in Computer Science, and more research needs to be done.

References

Aiello, Luigia and Alessandro Micarelli. (1993). Computer Assisted Language Learning: A Grammar Detector and Corrector. Proceedings of the Seventh International PEG Conference.
Bailin, Alan & Thomson, Philip. (1988). The use of Natural Language Processing in Computer-Assisted Language Instruction. Computers and the Humanities, V22.
Bonvalot, Catherine. (1999). Student Modeling through Dialogue in Second Language Learning Systems. AI-ED 99, pp. 7-8.
Bowerman, Chris. (1992). Writing and the Computer: The Nature of the Problem and an Intelligent Tutoring Systems Solution.
Brehony, Tom & Ryan, Kevin. (1994). Francophone Stylistic Grammar Checking (FSGC) Using Linked Grammars. Computer Assisted Language Learning, V7(3).
Brown, Charles. (2002). Inferring and Maintaining the Learner Model. CALL Special Edition AI-ED 2001, V15, No. 4, October 2002.
Bull, Susan & Pain, Helen. (1995). "Did I say what I think I said and do you agree with me?": Inspecting and Questioning a Student Model. Artificial Intelligence in Education.
Bull, Susan. (1994). Student Modeling for second language acquisition. Computers and Education, V23.
Bull, Susan et al. (1993). Collaboration and Reflection in the Construction of a Student Model for Intelligent Computer Assisted Language Learning. Proceedings of the Seventh International PEG Conference.
Chanier, Thierry et al. (1992). Modeling Lexical Phrases Acquisition in L2. Second Language Acquisition Research.
Chanier, Thierry et al. (1995). Alexia: a computer based environment for French foreign language lexical learning. Second Language Acquisition Research.
Chen, Liang, Tokuda, Naoyuki, & Xiao, Dahai. (2001). POST Parser-Based Learners Model for Template-Based ILTS for Japanese-English Composition. AI-ED 2001, pp. 24-31.
Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton & Co.
Civil, Anna et al. (1992). CATACROC: Computer-Assisted Learning of Catalan. The CALICO Journal, V10(2).
Farghaly, Ali. (1989). A Model for Intelligent Computer Assisted Language Instruction. Computers and the Humanities, V23.
Fogarty, James, Dabbish, Laura, Steck, David, & Mostow, Jack. (2001). Mining a Database of Reading Mistakes: For what should an Automated Reading Tutor Listen? Artificial Intelligence in Education. IOS Press, pp. 422-433.
Ghemri, Lila. (1991). A Framework for Diagnosis and Remedial Feedback. SICS Research Report R91:18. Swedish Institute of Computer Science.
Hagen, Kirk. (1994). Unification-Based Parsing Application for Intelligent Foreign Language Tutoring Systems. The CALICO Journal, V12(2,3).
Heift, T. (1998). Designed Intelligence: A Language Teacher Model. Unpublished Ph.D. Dissertation. Simon Fraser University.
Holland, Melissa, Maisano, Richard, Alderks, Cathie, & Martin, Jeffery. (1993). Parsers in Tutors: What Are They Good For? The CALICO Journal, V11, No. 1, pp. 28-46.
Issac, Fabrice & Fouquere, Christophe. (1995). A Bottom-Up TAG Parser: Application to Foreign Language Lexical Learning. L.I.P.N., Institut Galilee.
Kai, Kyoko & Nakamura, Jun-ichi. (1995). An Intelligent Tutoring System for Japanese Interpersonal Expressions. 7th World Conference on Artificial Intelligence in Education (AI-ED 95), pp. 194-201.
Kalayar, Myat, Ikematsu, Hidenori, Hirashima, Tsukasa, & Takeuchi, Akira. (2001). Intelligent Tutoring System for Search Algorithms. Kyushu Institute of Technology, Department of Artificial Intelligence.
Kinshuk, Oppermann, Reinhard, Rashev, Rossen, & Simm, Helmut. (1998). Interactive Simulation Based Tutoring System with Intelligent Assistance for Medical Education. Proceedings of ED-MEDIA/ED-TELECOM 98 (Eds. T. Ottmann & I. Tomek), AACE, VA, pp. 715-720.
Labrie, Gilles & Singh, L.P.S. (1991). Parsing, Error Diagnostics and Instruction in a French Tutor. CALICO Journal.
Lemaire, Benoit. (1999). Tutoring Systems Based on Latent Semantic Analysis. Artificial Intelligence in Education. IOS Press, pp. 527-534.
Levison, Michael & Lessard, Gregory. (1992). A System for Natural Language Sentence Generation. Computers and the Humanities, V26.
Lonfils, Colin, & Vanparys, Johan. (2001). How to Design User-Friendly CALL Interfaces. Computer Assisted Language Learning, Vol. 14, No. 5, pp. 405-417.
Maciejewski, Anthony & Leung, Nelson. (1992). The Nihongo Tutorial System: An Intelligent Tutoring System for Technical Japanese Language Instruction. CALICO Journal.
Morales, Rafael, Pain, Helen, & Conlon, Tom. (2001). Effects of Inspecting Learner Models on Learners' Abilities. Artificial Intelligence in Education. IOS Press, pp. 434-445.
Morihiro, Koichiro et al. (1992). Toward a Model of Tutor's Decision Making. AI-TR-92-13.
Morihiro, Koichiro et al. (1992). Towards a Model of Tutor's Decision Making. Artificial Intelligence Research Group.
Nagata, Noriko. (1995). An Effective Application of Natural Language Processing in Second Language Instruction. CALICO, V13(1).
Norman, David & Spohrer, James. (1996). Learner-Centered Education. Payman. Tuesday, April 16.
Pavia, A. et al. (1995). Externalizing Learner Models. Artificial Intelligence in Education.
Schrampfer, Betty. (1999). Understanding and Using English Grammar, 3rd Ed. Prentice Hall Regents. Upper Saddle River, New Jersey.
Schwind, Camilla. (1990). An Intelligent Language Tutoring System. International Journal of Man-Machine Studies, V33, pp. 557-579.
Schwind, Camilla. (1995). Error Analysis and Explanation in Knowledge Based Language Tutoring. Computer Assisted Language Learning, V8, No. 4, pp. 295-324.
Sentence, Sue & Pain, Helen. (1995). A generative learner model in the domain of second language learning. Artificial Intelligence in Education.
Tokuda, Naoyuki & Chen, Liang. (2001). An Online Tutoring System for Language Translation. IEEE, July-September 2001, pp. 46-55.
Wang, Yang & Garigliano, Roberto. (1993). Negative Transfer and Intelligent Tutoring. Proceedings of the Seventh International PEG Conference.
Wiemer-Hastings, Peter & Wiemer-Hastings, Katja. (1999). Improving an intelligent tutor's comprehension of students with Latent Semantic Analysis. IOS Press, pp. 535-542.
Appendix I
Tutoring Strategies

Tutoring Strategies
• Presentation of deep knowledge
• Explanation of a correct answer
• Presentation of incorrectness of answer
• Presentation of verification process
• Suggestion of trace of solution process
• Presentation of portion where bug exists
• Presentation of an example conflicting with the student's
• Presentation of some examples of common factors of interest
• Presentation of some examples of different factors of interest
• Presentation of a similar process
• Explanation at deep level
• Suggestion of verification operation
• Presentation of correct answer
• Explanation of comparison results of incorrect answer
• Suggestion of bug existence
• Explanation of incorrectness of knowledge
• Presentation of attributes of examples
• Presentation of intermediate solution
• Suggestion of intermediate goals
(Morihiro, 1992)

Metacognitive Strategies
• organizational planning of strategies, self monitoring, self evaluation
Cognitive Strategies
• resourcing, note taking, grouping, summarization, deduction/induction, substitution, translation, transfer, inferencing
Social Strategies
• cooperation, question for clarification
(Bull, 1994)

Appendix II
This is an example of the language coverage in the literature.
This is not a complete list of what has been covered in the field, but it gives an idea of the kinds of topics that are covered.

Japanese
• Japanese interpersonal expressions (Kai & Nakamura, 1995)
• Japanese passive sentences (Nagata, 1995)
• Japanese technical writing (Maciejewski & Leung, 1992)
Chinese
• basic Chinese (100 Chinese grammar rules) (Wang & Garigliano, 1993)
French
• noun phrase
• verb phrase
• verb endings (basic)
• person, number of verb doesn't match subject
• "e" missing between "g" and "o"
• "g" is followed by "eons"
• "ne" is missing
• "pas" is missing
• "ne" is followed by a vowel
• "n'" is not followed by a vowel
(Labrie & Singh, 1991)
• conjunctions
• reflexive binding
• displaced, missing and superfluous constituents
(Hagen, 1994)
• 25000 pre-stored errors centered on negative transfer (Brehony & Ryan, 1994)
English
• subject-verb discrepancies
• a-an
• articles with plural nouns
• exchange between the articles a and an
• use of articles with positive adjectives, or of the adjectives much-many with singular or plural substantives
• use of indefinite pronouns in affirmative, negative, and interrogative sentences
• defective verbs + infinitive
• use of the verbs want, wish, use, and like with to + infinitive verb
(Aiello, Sanctis & Micarelli, 1993)
• English restricted to the areas of Work, Employment and Unemployment (Issac & Fouquere, 1995?)
• articles in English (Sentence & Pain, 1995)
Catalan
• basic Catalan in restricted domains (Civil et al., 1992)
German
• syntactic errors when the user answers questions about spatial location (Holland, 1993)
• grammar using 25 syntactic and 60 semantic features (Schwind, 1995)
European Portuguese
• pronoun placement (12 rules)

Appendix III
Tutoring Strategies/Feedback

These are the tutoring strategies that the program applies when deciding which question to present next and what feedback to provide to the subject.

Tutoring strategies based on incorrect answers include:
If one wrong -> Provide feedback based on the detected mistake.
If two wrong in a row -> Provide feedback based on the detected mistake.
If three wrong in a row -> Provide hints for the subject at the same time that the question is asked.
If five wrong in a row -> Ask the subject if he/she wants to start the lesson over.

Detected mistakes include:
Incorrect parse -> Inform the user that the structure of his/her response may be incorrect. (May include the creation of mal-rules.)
Incorrect tense -> Inform the user that the tense of his/her response is incorrect.
Missing word detected -> Tell the subject that he/she may have a missing word in the response.
Incorrect answer -> Inform the user that he/she has not answered the question provided. (Occurs when the user does not use the verb provided.)
If the answer is wrong and the subject made a similar mistake before -> Tell the subject that he/she has made a similar mistake before and show the previous mistake.

Tutoring strategies based on correct answers include:
If one right -> Give positive feedback.
If three right in a row -> Move to the next section.
If the value of the responses in the lesson is six or above -> Move to the next session.
If the verb phrase is correct but the sentence is wrong -> Accept the response as correct but inform the user that only the verb phrase is correct.

Other strategies include:
If the response is to skip the question -> Skip the question.
If the response requires "already" but does not contain it -> Tell the subject that "already" is needed in the response.

The hints that the tutor can provide include:
The first time the hint button is pressed for a given question -> Give the subject the tense of the verb expected.
The second time the hint button is pressed for a given question -> Show a similar example.

Appendix IV
The next few pages are a summary of the structure of the tutoring system.
Figure: ESL Tutoring Program Structure (components: Start Program, Profile, Question Selection, Tutoring Strategies, User Model, Update User Model, User Feedback, User Response, Word Check, Parse User Response, Compare Against Expected).

Profile
Login user
- Load the profile if one exists for the particular user
If no profile is present
- Get profile data from the user (e.g., name of user)

Question Selection
Selecting the next question based on suggestions by the tutoring strategies.
The questions will be set up based on the context selected (only one context is available in this program).
The questions will be broken down into subcategories to allow the tutor to focus on particular areas one at a time (for example: past, present, and future tense).

User Model
Profile data
Mal-rules
- the mal-rules used by the user
- the mal-rules created from the user's responses
Used rules
- the rules that the user has used
Question results (right vs. wrong) (ordering of correct and incorrect responses)
Incorrect words
- contains all the words that the spell check flagged as not in the lexicon and that were not replaced with ones that are
The answers provided by the user

Tutoring Strategies
Learning structure
- based on the rules used, mal-rules used, the profile, and the question results, a recommendation for the next question to ask is produced
Motivation
- keeps track of the motivation of the user
- uses the results of which questions were answered correctly or incorrectly to influence which questions should be asked next

Word Check
Checks the words in the user's input by checking whether each word is in the lexicon
- if a word is not in the lexicon, present the user with the list of words from the lexicon that start with the same letter
- if the correct word is still not found, save the word in the user model and count it as a missing word
- based on the context the user is working in and not on all of the English language (the lexicon contains only a few hundred words)

Parser Lexicon
- a lexicon containing the words in the current context
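The Word Check component above can be sketched in a few lines of Python. This sketch assumes the context lexicon is a set of lower-case words and that the subject's choice from the suggestion list is supplied by a callback. `tokenize`, `word_check` and the `choose` callback are hypothetical names that stand in for the program's own dialog.

```python
import re


def tokenize(sentence):
    """Lower-case the response and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())


def word_check(sentence, lexicon, choose, unknown_words):
    """Check each word of a response against the context lexicon.

    For a word not in the lexicon, choose(word, candidates) is offered the
    lexicon words starting with the same letter and returns a replacement or
    None. Words that stay unresolved are appended to unknown_words (the user
    model's list of words not in the lexicon) and kept as typed.
    """
    checked = []
    for word in tokenize(sentence):
        if word in lexicon:
            checked.append(word)
            continue
        candidates = sorted(w for w in lexicon if w[:1] == word[:1])
        replacement = choose(word, candidates)
        if replacement in lexicon:
            checked.append(replacement)
        else:
            unknown_words.append(word)
            checked.append(word)
    return checked


LEXICON = {"lennise", "drinks", "drink", "a", "cup", "of", "coffee"}


def pick_coffee(word, candidates):
    # Stands in for the subject picking a word from the suggested list.
    return "coffee" if word == "cofee" and "coffee" in candidates else None


unknown = []
print(word_check("Lennise drinks a cup of cofee.", LEXICON, pick_coffee, unknown))
print(word_check("Lennise drinks tea.", LEXICON, pick_coffee, unknown))
print(unknown)
```

Offering only words with the same first letter keeps the suggestion list short. This only works because the lexicon is restricted to the current context.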
Parser
- DCG parser (may change to a chart parser if necessary)
- Parse rules
- Mal-rules
- there will be pre-existing mal-rules
- some mal-rules may need to be created for the individual user and saved in the user model

Update User Model
Update the user model in these areas:
- what type of rules were used to parse the user's response
- whether the user got the question right, wrong, or skipped it
- add/create mal-rules for the incorrect user response
- change context (subcontext) if requested by the tutoring strategies
- add any unknown words that the user used
- add the user's responses for use in hints later

Compare Answer Against Expected Result
Compare the user's response against the expected results
- can have full/partial matches
Provides information to the tutoring strategies based on the matches
Provides whether the user is considered to have answered the question correctly or incorrectly

User Feedback
Provide the user with feedback on his/her response based on
- whether he/she got the question right or wrong
- modify responses to be more personable based on the user profile
If the tutoring strategies request it:
- provide feedback on the correct/incorrect structure used
- provide a hint for the question and a request to try it again
- provide the correct answer
- ask if the user wants to start the current lesson over

User Response
Get the user's response to the question.

Appendix V
This is an example of a user model created by the tutoring program (edited for size).

John   <- subject's name
password   <- subject's password
4 1 0   <- current question, current lesson
0 1 2 1 3 1 4   <- question number, question result
8 9 10
mal_vbar10 1
mal_1(.(n_pro,.(verb,.(verb,[])))) 1
mal_2(.(n_pro,.(verb,.(adv,.(verb,[]))))) 1   <- mal-rule used, number of times mal-rule used
mal_3(.(n_pro,.(adv,.(verb,.(verb,[]))))) 1
mal_vbar11 1
Sat Jan 25 12:26:04 GMT-08:00 2003   <- start time of last lesson
Sat Jan 25 13:20:37 GMT-08:00 2003   <- end time of last lesson
something   <- a word used that is not in the lexicon
1 1 mal_vbar11 UNBC every Christmas.   <- question number, lesson number, mal-rules used, subject's response to question
2 1 correct_rule Lennise studies at UNBC.
3 1 correct_rule Lennise eats every lunch hour.

Appendix VI
This is an example of some of the feedback that the tutoring program provides for the user.

Computer: 8 What does Lennise do every morning? (drink)
User: Lennise drinks.
Computer: Correct tense.
Computer: You are missing a word (or two) in your response.
Computer: Incorrect answer, try again.
User: Lennise drank a cup of coffee.
Computer: Incorrect tense, try again.
Computer: Giving hint.
Computer: The tense of the current lesson is ----- Simple Present(1)
Computer: Giving second hint.
Computer: An example question.
Computer: Question: What does it do in Alaska? (snow)
Computer: Answer: It snows.
Computer: Skipping question.
Computer: A possible answer was -> Lennise drinks a cup of coffee.
...(skipping ahead for sake of demonstration)
Computer: 6 what will Lennise do tomorrow? (drink)
User: Lennise will drink cup of coffee.
Computer: Correct tense.
Computer: You are missing a word (or two) in your response.
Computer: Incorrect answer, try again.
Computer: Skipping question.
Computer: A possible answer was -> Lennise will drink a cup of coffee.
...(skipping ahead for sake of demonstration)
Computer: Starting new lesson.
This is a lesson on Present Progressive(4)
It is now 11:00. Tom went to sleep at 10:00 tonight, and he is still asleep. His sleep began in the past, is in progress at the present time, and probably will continue.
Computer: NEW QUESTION
Computer: 1 What is Lennise doing right now? (study)
Computer: lesson 4 - 6
Computer: Form: be + ing (present participle)
Computer: Meaning: the progressive tenses give the idea that an action is in progress
Computer: during a particular time the tenses say that an action begins before, is in
Computer: progress during, and continues after another time or action.

Appendix VII
Questionnaire

Response scale: Strongly Agree (5), Agree (4), Neutral (3), Disagree (2), Strongly Disagree (1)
1) I enjoyed using the program. 5 4 3 2 1
2) I found the program boring. 5 4 3 2 1
3) I feel that using a program like this would be more useful than a textbook. 5 4 3 2 1
4) I found the program easy to use. 5 4 3 2 1
5) I think the questions were too hard for me. 5 4 3 2 1
6) Using a textbook would help me learn English better than the program. 5 4 3 2 1
7) I would use a program like this to help me learn English. 5 4 3 2 1
8) I felt the questions were too easy for me. 5 4 3 2 1
9) I felt frustrated by the questions in the program. 5 4 3 2 1
10) I feel that the program motivated me to learn. 5 4 3 2 1

Feel free to add any comments you have below:

The following does not appear on the subjects' copy:
Reversed questions are 2, 5, 6, 8 and 9. Their Likert values are reversed when used to obtain the final motivation score: 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1.

Appendix VIII
These are the bugs that were found in the tutoring program after subject testing had begun.
• Question 2.8 was removed.
• Question 3.3 was removed.
• Question 9.2 was removed.
• An intransitive verb followed by a time sometimes caused an error in the feedback.
• If the tutoring strategies recommend moving to the next lesson on lesson 12 (the last lesson), the program does not end; instead, it simply fails to ask a new question.
• Picture 36's label should read "the plant", not "plant".
• The keyboard's "Enter" key could not be used by the subject to tell the computer to check a response after entering it; the "enter" button in the program had to be used instead.

Appendix IX (definitions & abbreviations)
CALL = Computer Assisted Language Learning
CNC = College of New Caledonia (located in Prince George, British Columbia)
DCG = definite clause grammar
ESL = English as a Second Language
ICALL = Intelligent Computer Assisted Language Learning
ITS = Intelligent Tutoring System
NP = noun phrase
TAG = tree adjoining grammar
UNBC = University of Northern British Columbia (located in Prince George, British Columbia)

Proficiency post-test: A short test used to determine the approximate English skill level with verb endings. This test is taken after using the tutoring program.
Proficiency pre-test: A short test used to determine the approximate English skill level with verb endings. This test is taken before using the tutoring program.
Bug (program bug): An error in a computer program that affects how the program runs. It usually results from a situation that the programmer did not anticipate.
Intelligent tutoring system: A program that tutors based partly on an analysis of the data known about the user, and that uses tutoring strategies to determine how the user should progress.
Mal-rule: An incorrect grammar rule that the user applies (such as using a noun phrase without a verb phrase, e.g. "the dog").
Natural Language Processing: Computer programs that model the human ability to analyze a sentence and determine grammaticality on the basis of linguistic rules (Farghaly, 1989).
Negative Transfer (First Language Interference): Negative transfer occurs when knowledge of a subject's first language negatively influences his/her answers in a second language.
Parser: A tool whose purposes include testing the adequacy of grammars, translating source text in a machine translation system, and analyzing input strings (Farghaly, 1989).
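The Mal-rule and Parser definitions above, together with the rule counts and unknown words recorded in the user model file of Appendix V, can be illustrated with a minimal Python sketch. It is not the tutoring program's implementation, which was built on a definite clause grammar; the lexicon, the flat noun phrase and verb phrase patterns, and the rule name mal_np_only are all hypothetical. The sketch tries the correct rule S -> NP VP first, falls back to the mal-rule S -> NP (the "the dog" example), counts whichever rule matched, and records any words missing from the lexicon.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> part-of-speech tag); not the thesis lexicon.
LEXICON = {
    "the": "det", "a": "det",
    "dog": "noun", "cup": "noun",
    "lennise": "n_pro",
    "barks": "verb", "drinks": "verb",
}


def parse_np(tags, i):
    """Return the index just after a noun phrase starting at i, or None."""
    if i < len(tags) and tags[i] == "n_pro":          # NP -> proper noun
        return i + 1
    if i + 1 < len(tags) and tags[i] == "det" and tags[i + 1] == "noun":
        return i + 2                                  # NP -> det noun
    return None


def parse_vp(tags, i):
    """Return the index just after a verb phrase starting at i, or None."""
    if i < len(tags) and tags[i] == "verb":
        after_np = parse_np(tags, i + 1)              # VP -> verb NP
        return after_np if after_np is not None else i + 1  # VP -> verb
    return None


def classify(sentence, model):
    """Try the correct rule, then the mal-rule, and update the user model.

    Returns the name of the rule that matched, or None if neither did.
    """
    words = sentence.lower().strip(" .?!").split()
    unknown = [w for w in words if w not in LEXICON]
    model["unknown_words"].update(unknown)            # cf. "something" in Appendix V
    if unknown:
        return None
    tags = [LEXICON[w] for w in words]

    end_np = parse_np(tags, 0)
    if end_np is None:
        return None
    if parse_vp(tags, end_np) == len(tags):
        rule = "correct_rule"                         # S -> NP VP
    elif end_np == len(tags):
        rule = "mal_np_only"                          # hypothetical mal-rule: S -> NP
    else:
        return None
    model["rule_counts"][rule] += 1
    return rule


model = {"rule_counts": Counter(), "unknown_words": set()}
for response in ["The dog barks.", "the dog", "Lennise drinks a cup.", "Lennise sings."]:
    print(response, "->", classify(response, model))
print(dict(model["rule_counts"]), sorted(model["unknown_words"]))
```

Trying the correct rule before the mal-rule means a grammatical response is never reported as an error, and a response that matches neither rule is left unclassified rather than forced into a mal-rule.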