DEVELOPMENT OF A CHECKLIST FOR EVALUATING COHESION IN WRITING

By

Lynda Struthers

B.Sc. Speech-Language Pathology and Audiology, University of Alberta, 1989

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF EDUCATION in CURRICULUM AND INSTRUCTION

© Lynda Struthers, 2001

University of Northern British Columbia

June, 2001

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author.

APPROVAL

Name: Lynda Struthers
Degree: Master of Education
Thesis Title: DEVELOPMENT OF A CHECKLIST FOR EVALUATING COHESION IN WRITING

Examining Committee:

Chair: Dr. Gordon Martel, Assistant (Graduate Studies) to the Vice-President (Academic), Professor and Chair, History Program, UNBC

Supervisor: Dr. Judith Lapadat, Associate Professor, Education Program

Committee Member: Dr. Peter MacMillan, Assistant Professor, Education Program, UNBC

Committee Member: Dr. Jim Bell, Learning Skills Centre Co-ordinator, UNBC

External Examiner: Wendy Duke, MSc, Director, Columbia Speech & Language Services, Vancouver; Assistant Professor, School of Audiology and Speech Sciences, University of British Columbia

Date Approved:

ABSTRACT

This study describes the development and evaluation of a checklist intended for use in the assessment of cohesion in the writing of elementary school children. Assessment of this skill is important as cohesion impacts the readability and quality of written work. Currently available writing tests do not address this area or do so only in a limited fashion. The procedures that I used in evaluating the checklist included classical item analyses, as well as validity and reliability checks. Validity checks provided evidence for construct and discriminant validity. As well, the checklist was able to predict grade membership. Although internal consistency values were low, the level of interrater agreement was satisfactory. Discussion of the findings includes the limitations of this study, suggestions for modifications to the checklist, and future research recommendations.

TABLE OF CONTENTS

Abstract  ii
List of Tables  vii
Acknowledgments  viii

CHAPTER ONE: INTRODUCTION
  Problem  2
    Assessment Practices  2
    Why Assess Cohesion?  4
  Summary  5
CHAPTER TWO: LITERATURE REVIEW  7
  Cohesion  7
    The Concept of Cohesion  7
    Cohesion and writing ability  9
    Cohesive Devices Used by Children in Writing  11
    Methods of Evaluating Cohesion in Research  13
    Summary  15
  Writing Assessment  15
    Historical Perspective  16
    Commercially Available Tests  17
    Methods of Assessment Using Curriculum-Generated Writing  20
      Curriculum-Based Assessment  21
        Measuring writing fluency  22
        Measuring syntactic complexity  25
        Summary  26
      Categorical Assessment Tools  26
        Rating systems  27
        Checklists  28
    Considerations for Checklist Development  29
      Reliability of Checklists  30
      Establishing Validity of a Checklist  30
  Research Purpose  32
    Scope of the Proposed Research  32
    Contributions of this Research  32

CHAPTER THREE: METHOD  34
  Research Design  34
  Data Source  34
    Ethical Considerations  36
  Procedures  37
    Preliminary Development of the Instrument  37
      Initial Compilation  37
      Panel Reviews  37
      Preliminary Item Analysis  38
      Pilot Study of Interrater Agreement  38
        Training session  39
        Scoring session  39
      Revisions to Checklist Content and Process  40
        Modification of items  40
        Procedural modifications  41
    Evaluation Using a Large Scale Sample  41
      Item Analysis  42
      Interrater Study  43
      Validity Measures  44
        Writing fluency measures  45
        Syntactic complexity measures  45
  Summary  46

CHAPTER FOUR: RESULTS  47
  Preliminary Development of the Instrument  47
    Initial Compilation of Items  47
    Panel Reviews  47
    Preliminary Item Analysis  48
    Pilot Study for Interrater Agreement  51
    Revisions to Checklist Content and Process  52
  Evaluation Using a Large Scale Sample  53
    Item Analysis  53
    Checklist Reliability  57
      Interrater study  57
    Validity Measures  59
      Factor analysis  59
      Discriminant analysis  61
      Concurrent criterion-related validity  63
  Summary  65

CHAPTER FIVE: DISCUSSION  67
  Summary of the Checklist Item Development  67
  Findings for Reliability and Validity  71
    Reliability  71
    Validity  73
      Concurrent criterion-related validity  74
      Discriminant validity  75
    Discrimination ability of checklist scores  76
    Subscores versus total scores  78
  Contributions of this Research  81
  Limitations  82
  Implications for Future Research  83
    Proposed Changes to Checklist Content and Format  83
      Content  83
      Format  84
    Proposed Procedures for Checklist Evaluation  85
      Reliability  86
      Validity  86
  Implications for Practice  87

REFERENCES  89

APPENDICES
  Appendix A: Letter of Consent  98
  Appendix B: Procedures for Administering CBM Writing Probes  101
  Appendix C: Examples of Writing Samples  104
  Appendix D: Checklist 1.0  108
  Appendix E: Checklist 1.1  110
  Appendix F: Checklist 1.2  112
  Appendix G: Checklist 2.0 With Instruction Manual  114
  Appendix H: Checklist 2.1 With Modified Scoring Criteria  136

LIST OF TABLES

Table 1: Examples of Cohesive Markers  8
Table 2: Item Statistics from the Preliminary Item Analysis  49
Table 3: Mean Percentage Agreement Among Raters Across Items  52
Table 4: Item Statistics from the Analysis of Checklist 2.0  54
Table 5: Item Statistics from the Analysis of Checklist 2.1  55
Table 6: Item Statistics from the Item Analysis Run Using Subtests  56
Table 7: Correlations Between Rater Pairs Across Items  58
Table 8: Analysis of Variance for Between Rater Differences  58
Table 9: Proportion of Agreement Across Writing Samples for Each Item  59
Table 10: Variable Loadings for a Three Component Model  60
Table 11: Matrix Displaying Item-by-Item Correlations  61
Table 12: Statistics Describing Checklist Total Scores by Grade  62
Table 13: Statistics Describing Checklist Subscores by Grade  62
Table 14: Test of Equality of Group Means for Checklist Scores  63
Table 15: Means and Standard Deviations of TWW, WSC, MLTU, and SI by Grade  64
Table 16: Test of Equality of Group Means for Writing Measures and Grade  64
Table 17: Correlations Between Checklist Scores and Concurrent Measures  65

ACKNOWLEDGMENTS

I wish to thank Judith and Peter, who advised me and encouraged me in completing this research. I also wish to thank John, Saima and Willy, my unofficial team of advisors, whose support and knowledge were invaluable. I would also like to acknowledge the speech-language pathologists in School District 57, who participated in many aspects of this study. Finally, but most importantly, I'd like to dedicate this work to my son, Nolan, who shared the first five years of his life with my Masters Degree.

CHAPTER ONE: INTRODUCTION

Assessment may be used for a variety of educational purposes. When preparing for assessment of students, one needs to consider guidelines for best practices. Guidelines outlined in a paper entitled Principles for Fair Student Assessment Practices for Education in Canada (1993) indicate the necessity for test users to select approaches and instruments that are suitable to the purposes of assessment and the students being assessed. Black and Wiliam (1998) echo this sentiment in their discussion of assessment practices in education. Assessment may be used to support a number of educational decisions including selection, placement, and classification of students for acceptance and placement into programs which best suit a student's needs; diagnosis and remediation of particular areas of difficulty experienced by a student; feedback to students; motivation and guidance for learning; and program improvement (Sax, 1997).

As a speech-language pathologist working in an educational setting, I am particularly interested in using assessment for diagnostic purposes. The focus of this type of assessment is on the planning and monitoring of an intervention or instructional program suitable for skill development or remediation. In my professional role I am often concerned with evaluating the writing skills of children with language learning disabilities and difficulties. Children with these types of difficulties often demonstrate problems in communicating their ideas effectively in both spoken and written modes (Wiig & Semel, 1984; Singer, 1995). In reviewing writing samples of children with language learning problems, it is apparent that many of these children struggle with aspects of writing beyond spelling and grammar. As compared to normally developing peers, these difficulties include: a lower amount of written text produced (Graham, Harris, MacArthur & Schwartz, 1998; Silliman, Emerson & Wilkinson, 2000), limited diversity in vocabulary, decreased syntactic complexity,
Problem Assessment Practices Dagenais and Beadle (1984) reviewed several instruments that assessed writing. Their study included the examination of six achievement tests. These were the Comprehensive Test of Basic Skills, the California Achievement Tests, the Stanford Achievement Tests, the Iowa Test of Basic Skills. The SRA Achievement Series, and the Metropolitan Achievement Tests. These achievement tests focused on the evaluation of word usage, grammar and mechanical aspects of writing. Dagenais and Beadle also examined seven other tests of written language. These included the Test Of Written Language, the Test Of Adolescent Language. Sequential Test of Educational Progress. A Diagnostic System for Teaching Composition for Grades 1014 (DI-COMPl. the Diagnostic Evaluation of Writing Skills, the Test ofEvervdav Writing Skills, and the Woodcock-Johnson Psvchoeducational Test. Part 2 - Achievement for the Written Language Cluster. They found that many of these tests used multiple-choice formats or involved tasks that tested reading more than writing. Dagenais and Beadle indicated that all 3 of these measures involved contrived writing situations rather than naturalistic, authentic writing samples. A current search of writing assessment tools that I conducted revealed that many tests continue to focus on spelling, punctuation, capitalization, and grammar. Some of these instruments utilized compositional writing samples but still concentrated on scoring mechanical and grammatical aspects of writing. I was interested in finding tests that assessed discourse level structures like cohesion. Discourse level structures are those that reflect the structure and meaning of a text beyond the level of the sentence (Schiflfrin, 1994). Those tests that did examine discourse level structures did so in a limited way. These tests provided only one or two ratings for aspects of writing such as organization, sequence, or coherence. A more detailed summary of this search is provided in the next chapter. Dagenais and Beadle (1984) suggested that due to the limited scope of writing skills addressed in commercially available tests, some poor writers may perform adequately on these instruments without actually being able to write more than simple sentence structures or to effectively communicate their ideas in writing. Commercially available, standardized, norm-referenced achievement tests have their greatest utility in making comparisons between individuals for the purposes of determining ehgibility for programs and for predicting success (Sax, 1997). Silliman, Wilkinson, and Hoffman (1993) indicated that traditional approaches to assessment “failed to relate assessment procedures with instructional goals and procedures” (p. 59). They also indicated that these approaches were time consuming yet yielded limited amounts of information for programming purposes. Dagenais and Beadle (1984) indicated that achievement tests are useful in determining who is an “acceptable” writer and who is not. They indicated that none of the tools they examined were intended for in-depth diagnostic work. 4 In my review of commercially available tests, I noted that many state their purpose as identifying strengths and weaknesses in a student’s writing. However, Dagenais and Beadle (1984) state that the practice of using tests for the purpose of simply identifying deficits is both inefficient and unnecessary. 
Instead, they feel that testing should focus on identifying areas from which to develop teaching programs and compensatory strategies. King-Sears (1994) also criticizes traditional testing, indicating that it does not provide information for instructional programming. She states that "norm-referenced, standardized tests provide a snapshot of a student's performance within broad curricular areas, but are not sufficient for developing specific instructional plans when educators must write IEPs [Individual Educational Plans]" (p. 3). Black and Wiliam (1998) also advocate for assessments that provide information for differential treatment of difficulties. King-Sears (1994) calls for use of assessment materials that analyze errors and provide specific information about where and how to proceed with instruction. Similarly, Rousseau (1990) advocates for the use of error analysis in diagnostic assessments. As children with learning disabilities demonstrate difficulty connecting their ideas in writing (Singer, 1995), that is, with cohesion, this may be a useful area in which to focus an in-depth error analysis and diagnostic assessment.

Why Assess Cohesion?

There is a need to evaluate cohesion in writing, as, according to Hedberg and Fink (1996), "errors in cohesion interfere with the reader's efforts to understand the intent of the author" (p. 75). Several researchers have reported that cohesion is related to the overall quality or readability of written work (Lindeberg, 1984; Zamowski, 1981). Others have indicated a link between writing proficiency and the use of cohesive ties (Englert & Raphael, 1988; Greenberg, 1987; Hedberg & Fink, 1996; Singer, 1995), suggesting that poor writers or those with learning disabilities have difficulty using cohesive ties.

The English Language Arts Integrated Resource Package (IRP) developed by the British Columbia Ministry of Education (1996a) indicates in its prescribed learning outcomes areas that are directly related to cohesion in writing. For instance, the IRPs state that by Grade 4, children will use consistent verb tenses and correct pronoun references in writing and will organize their ideas into logical sequences. These aspects of writing create unity and therefore relate to cohesion. According to the IRP document, by Grade 5 students are expected to be able to revise and edit their own work for clarity. Again, clarity in writing is related to how well ideas, sentences and words are connected for the reader of a written piece, and therefore to cohesion. Nelson (1994) suggests that educators analyze the expectations of the curriculum and the abilities of students to develop interventions that narrow the gap between the two. If the goal of writing instruction is to develop writers who can effectively communicate to their readers, cohesion is an important skill and therefore worthy of evaluation.

Hedberg and Fink (1996) state that before intervention programs instructing the appropriate use of cohesive devices can be designed, information is required that describes the development of cohesion. Assessment of cohesion in children's writing therefore would not only serve to tell about how a student writes, but also about how cohesion develops.

Summary

For the purposes of my discussion here, current assessment practices are seen to be limited in two distinct ways. First, commercially available standardized assessment measures do not provide a good basis for the development and monitoring of intervention programs.
Second, most writing assessments do little to examine discourse level structures that 6 contribute to organization and unity in a written piece. In fact none could be found that addressed specific aspects of cohesion at all. Aspects of cohesion, on the other hand, are implicated in the curriculum as a learning outcome. It has also been argued that cohesion affects the readability of a written piece. Also, it has been argued that children with language learning problems demonstrate difficulty with cohesion in their writing. Given these limitations of currently used methods of assessment, and the relationship of cohesion to writing quality, I see a need for an assessment tool that can be used to evaluate cohesion in writing. This instrument should be useful for the development and monitoring of intervention plans and it should be usable with actual writing samples generated by children in the classroom. Development of such a tool would help professionals detect and describe difficulties “problem writers” have in structuring written text (Lindeberg, 1984). The remainder of this document is devoted to describing the first stages of the development of such an instrument. This chapter provided a brief introduction of the problem to be explored. This problem is elaborated in the next chapter. Chapter Two also provides a literature review into studies of cohesion, assessment of writing, and considerations resulting in the choice of a checklist format for a tool to evaluate cohesion in writing. Chapter Three describes the steps used in developing and evaluating the checklist. The results of these development and evaluation procedures are reported in Chapter Four. This chapter also includes reports of the instrument’s reliability and validity. Finally, the interpretations of these outcomes are discussed in Chapter Five. This discussion includes future research directions to continue the development of this cohesion checklist. CHAPTER TWO: LITERATURE REVIEW Three key areas are addressed in this review of the literature. The first consists of review of the elements of cohesion and studies of how cohesive devices are used, including in writing done by children. Another area of review focuses on writing evaluation in general. This portion of the review describes how cohesion and writing in general is typically assessed, taking into account both historical and current perspectives. The last area of review focuses on considerations for using a checklist in educational evaluation. Issues of reliability and validity o f assessment tools are also examined. Cohesion The Concept of Cohesion The explanation of cohesion provided by Halliday and Hasan (1976) is the most frequently cited in research studies examining the markers of cohesion in children’s written and spoken discourse (Crowhurst, 1981, 1987; Pellegrini, Galda & Rubin, 1984; Liles, 1985; Rutter & Raban, 1982; Smith, 1999; Zamowski, 1981). Halliday and Hasan describe five devices that are used to accomplish cohesion. Examples of each device are presented in Table 1. One device is called reference. This includes the use of pronouns, articles and demonstratives to refer to information within the text (anaphora). Substitution, another device, involves the utilization of a generic term in place of a redundant element. Another tool, ellipsis, involves the elimination of redundant information. A fourth tool, conjunction, is used to connect clauses and sentences and to organize text. 
Conjunctions may be additive, temporal, causal, adversative, or continuative. A final tool, called lexical cohesion, includes lexical reiteration and lexical collocation. Reiteration of a term may be accomplished by using the same word, a superordinate, a synonym, or a near-synonym. Collocation involves use of words that commonly occur together such as antonyms, complementary terms, and converses.

Table 1
Examples of Cohesive Markers

Type of Cohesion        Example
Reference
  - pronouns            The boy was cold. He was tired.
  - articles            I saw a doe. The doe started to chase me.
  - demonstratives      A lion stood still. That beast was wild.
Substitution            He always wanted a red bike. Finally he got one.
Ellipsis                I was going to go but (I) didn't (go).
Conjunction
  - additive            and, also, in addition, or
  - temporal            then, when, first, next, finally
  - causal              because, therefore, consequently, so
  - adversative         but, although
  - continuative        now
Lexical Reiteration
  - superordinate       dog - animal
  - synonym             dog - canine
  - near-synonym        dog - beast
Lexical Collocation
  - antonyms            up - down
  - complementaries     beach - sand
  - converses           ask - answer

Complementary terms are words that commonly occur together. Converses are words that suggest a response of one to the other. The degree of cohesion accomplished through lexical reiteration and collocation is a reflection of the semantic and physical proximity of the terms used in the text. The degree of cohesion is stronger where the distance is less.

The term coherence has also been used by some researchers to discuss aspects of writing related to cohesion. For our purposes here, coherence will refer to the overall semantic unity achieved in a piece of writing, whereas cohesion will refer to the linguistic devices used to obtain that unity. For instance, McCutchen and Perfetti (1982) discuss both topic coherence and local connectedness in their discussions of cohesion. According to these authors, topic coherence reflects the overall semantic unity and integrity of a piece, and local connectedness refers to the implicit and explicit connections between adjacent sentences. As these authors explain, topic coherence is necessary but does not in and of itself create coherence in writing. They state that it is difficult to describe overall global coherence without describing the devices used to establish connections between sentences. These local connections between sentences reflect the same cohesive devices described by Halliday and Hasan (1976).

Cohesion and writing ability. Cohesion has been shown to be related to the overall readability and quality of written language. Zamowski (1981) cited the relationship between inter-sentence cohesion and the readability of a written text in her argument about the importance of analyzing cohesion in children's writing. Rutter and Raban (1982) stated, "Failure to realize the implication of a cohesive tie, to recover its referent, implies loss of meaning and a break down in coherence for the recipient of the communication" (p. 65). This relationship between readability and cohesion is further supported by an exploratory study conducted by Lindeberg (1984) in which graded college level expository essays were analyzed for the use of cohesive ties. She found that the proportion of cohesive ties was greater in essays graded 8 or more out of 10 than in those that received grades lower than 6. She concluded that her findings supported the hypothesis that cohesive tightness could be considered a sign of quality in writing.
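To make the categories in Table 1 concrete, the sketch below tallies explicit conjunctive markers in a short passage and expresses them relative to the number of sentences, loosely in the spirit of the proportion-of-ties analyses described above. It is a hypothetical illustration only: the marker lists are lifted from Table 1, the word so is treated as causal regardless of how it is actually used, and reference, substitution, ellipsis, and lexical ties, which require a human judge, are not detected.

```python
# Illustrative sketch only: a toy tally of explicit conjunctive markers based on the
# categories in Table 1. It is NOT Halliday and Hasan's full procedure, nor Lindeberg's
# analysis; most cohesive devices cannot be identified without human judgment.
import re

CONJUNCTIONS = {
    "additive": {"and", "also", "or"},
    "temporal": {"then", "when", "first", "next", "finally"},
    "causal": {"because", "therefore", "consequently", "so"},
    "adversative": {"but", "although"},
    "continuative": {"now"},
}

def conjunction_profile(text: str) -> dict:
    """Count conjunctive markers per category and express the total per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    counts = {category: 0 for category in CONJUNCTIONS}
    for word in words:
        for category, markers in CONJUNCTIONS.items():
            if word in markers:
                counts[category] += 1
    total = sum(counts.values())
    return {"counts": counts, "ties_per_sentence": total / max(len(sentences), 1)}

sample = "We went to the lake. Then we swam because it was hot, but the water was cold."
print(conjunction_profile(sample))
# temporal, causal, and adversative markers are each counted once; 1.5 ties per sentence
```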
Not only is cohesion related to the overall readability of writing but it has also shown a link to writing proficiency. Singer (1995) reported that children in Grades 3, 5 and 7, without a history of language-learning difficulties, were "remarkably adept" at writing cohesively. Englert and Raphael (1988) indicated that children with language-learning impairments have difficulty detecting inconsistencies in their writing and recognizing how these inconsistencies confuse the reader. They concluded that such difficulties would be expressed as problems in coherent organization of ideas in written prose. In an investigation of cohesion in writing conducted by Hedberg and Fink (1996), normally developing children were compared with children with language-learning disabilities. Children with language-learning disabilities scored significantly lower than their normal peers on many of the variables examined in their writing. Included in these lower scores were demonstrations of less cohesive harmony and density than their peers. In a study of narrative and expository writing samples from children in Grades 2, 4, 6, and 8, McCutchen and Perfetti (1982) found increasingly higher percentages of inter-sentence cohesive ties used with increased grade.

The relationship between the readability of a written composition and cohesion that is cited in the literature highlights the importance of understanding and evaluating cohesion in writing. This argument is furthered by evidence that writing proficiency is related to the ability to write cohesively.

Cohesive Devices Used by Children in Writing

Although the research literature is not extensive in this area, some researchers have undertaken investigations of the types of cohesive devices used in the writing of children (Crowhurst, 1981, 1987; Hidi & Hildyard, 1983; McCutchen & Perfetti, 1982; Pellegrini et al., 1984; Rutter & Raban, 1982; Smith, 1999). These studies indicate that certain cohesive devices appear more frequently in the writing of children, whereas others appear infrequently or not at all. For instance, Crowhurst, Liles (1985), and Smith found that substitution and ellipsis were rare or absent in the writing of the elementary school-aged children studied. Crowhurst and Smith rarely encountered continuative conjunctions, and Liles noted a lack of comparative reference. In a small study conducted previously, I found that pronoun referencing; lexical cohesion; and causal, temporal and additive conjunctions were the most frequently used cohesive devices in the writing of children in Grades 3, 5 and 7. Crowhurst's studies of writing by students in Grades 6, 10 and 12 showed that the most common cohesive ties used were lexical cohesion, pronoun referencing, demonstratives, and use of the definite article. Liles investigated spoken rather than written narratives of children aged 7 years 6 months to 10 years 6 months and found a greater percentage of reference and conjunction than other cohesive devices.

Some interesting findings related to the uses of reference and lexical cohesion. For instance, while pronoun referencing appeared a predominant cohesive device used by children in several studies, in my study I noted errors in referencing pronouns in the writing samples of elementary school-aged children (Smith, 1999). I also noted that pronoun referencing and lexical cohesion were among the most sensitive to developmental variation. That is, these devices were used differently at different grade levels.
Rutter and Raban (1982) found differences in the way children of different ages used demonstratives. In their study of narrative writing, 10 year olds used a higher proportion and greater variety of demonstratives than did 6 year olds. They also found that the lexical device of collocation was used more frequently than superordinates for both age groups.

Studies of cohesion have similar findings with respect to the conjunctions that are commonly seen in the writing of children. In a study by Crowhurst (1981), the additive conjunctions and, also, and the adversative conjunction but were the most commonly used. So was the most commonly used causal conjunction (Crowhurst, 1981, 1987). Scott (1991a) reported that inter-sentence connections in the narratives of students are usually accomplished with so, and then. Crowhurst (1987) also noted use of temporal markers such as then, soon, later and next day. Similarly, I found the additive conjunction and to be used in the writing samples of all the children in my study, whereas so was the most common causal conjunction. I also found frequent uses of the additive conjunction also, the causal conjunctions if and because, and the temporal connectives then, when and before/after, in descending order of frequency of appearance. Overall, causal, temporal and additive conjunctions were the most commonly used (Smith, 1999). I also noted that the kinds of additive, causal and temporal conjunctions used change across grades, with older children using a greater variety.

Aspects of cohesion which provide unity across the text have not been as well studied. Perera (1984) indicated that use of consistent tense across the text provides important discourse connections. She indicated that younger writers quite often have difficulty with this aspect of writing. I found that topic coherence was primarily realized through lexical devices used across the text, and through the organization of the text, which was achieved through paragraph structure and sequencing of information (Smith, 1999).

The predominance of certain cohesive devices used in the writing of children suggests that these may be important areas to focus on in an analysis of cohesion. These include pronoun referencing, demonstratives, and use of the definite article the. Conjunctions in the additive, temporal, causal, and adversative categories also may be important to examine. Another area worthy of examination includes aspects of lexical cohesion with a focus on the use of collocation and superordinates. A final area to consider would be aspects of global cohesion such as paragraph structures and other organizational features.

Methods of Evaluating Cohesion in Research

Liles (1985) criticizes the simple classification of cohesive devices as done in the research cited thus far as insufficient for analyzing cohesion. She suggests that studies of cohesion should also address the issue of cohesive adequacy. In her study of cohesion in the oral narratives of primary school children, she utilized a process in which cohesive ties were identified, categorized, and then judged as complete, incomplete, or erroneous. The raters in her study identified cohesive markers by reading each sentence of the transcript in isolation. An element was considered cohesive if the reader had to search outside the sentence for its interpretation. The classification of ties was then made according to the definitions of Halliday and Hasan (1976).
A tie was considered complete if the referred information could be determined unambiguously. An incomplete tie involved interpretation that appeared to be based on information that was not provided in the text. An erroneous tie resulted when the listener or reader was guided to ambiguous or erroneous information. In the evaluation of conjunction use, conjunctions were judged to be either complete or erroneous as it was considered too difficult to judge the completeness of a conjunction. This procedure for identifying cohesive markers creates difficulty in measuring lexical cohesion. This type of procedure would not be sensitive to the use of complementary terms and converses. These types of markers contribute to cohesion by creating semantic relationships across a piece of writing. However, the interpretation of each element is not 14 dependent on the other. This method would only capture examples of lexical reiteration. It also fails to examine global aspects of cohesion as it does not examine elements like consistent use of tenses or overall organization of ideas. Another method of measuring cohesive adequacy was employed by Liles, Duffy, Merrit, and Purcell (1995). These researchers measured adequacy by dividing the number of complete ties by the number of ties in each linguistic category. Identification, categorization, and judgment of tie adequacy followed the procedure laid out by Liles (1985). Again this method omits analysis of aspects of lexical and global cohesion. In a study by Klecan-Aker and Lopez (1985), only reference and conjunction were measured. The authors failed to indicate why other areas of cohesion were not examined in their study. In their analysis, they described reference as either appropriate or inappropriate. Appropriate ties were those that were unambiguous. These authors indicated that one factor which reduces ambiguity is the close proximity of a reference to its referent. Conjunctions were measured by first being categorized as coordinating or subordinating. The number of each type was then counted for each writing sample. In my view, this approach does not seem to be measuring conjunctive cohesion as much as syntax. The classification of conjunctions as coordinating or subordinating reflects the syntactic complexity of the written piece but does not provide information as to the kinds of relationships between ideas, that is whether they are causal or temporal, for example. A method for analyzing cohesion in writing based on these and other pieces of research was described by Hughes, McGillivray, and Schmidek (1997). As in the methods described above, cohesive ties are identified when they refer to information somewhere else in the text. The ties are then judged and counted as suggested by Liles (1985) and Liles et al. (1995). Hughes et al.’s adaptation to these procedures lies in the method of preparing a 15 sample of writing for analysis. Their procedure involves dividing the writing sample into main clauses with subordinating clauses attached (T-units) rather than dividing it by sentence boundaries. As the remainder of the method does not vary significantly from those already described, the same criticisms as for the methods used by Liles and Liles et al. apply here. The methods o f analyzing cohesion described here were developed primarily for research purposes. Three considerations for evaluating cohesion have been highlighted in these methods. First, in evaluating cohesion, it may be important to consider whether ties are ambiguous or clear. 
Second, the proximity of ties should be considered in their evaluation. Third, establishing T-unit rather than sentence boundaries may be helpful in evaluating inter­ sentence cohesion. Summary This review of studies of cohesion highlights the need for a clinical instrument that could be used to evaluate cohesion in writing. Furthermore, these studies suggest content areas that might be included in such an instrument. Studies relating cohesion to the readability and quality of writing highlight the importance of cohesion as a subject for assessment. Research depicting how cohesion is used in the writing of children provides information about what types of devices to assess and provides considerations of methods that may be used in developing an assessment tool for cohesion. Writing Assessment Writing assessment procedures vary across instruments and across time. Evaluation of writing has followed trends in assessment practices that reflect changes in the social climate, trends in research, and shifts in educational practices. A few of these influences on writing assessment are discussed here, along with an exploration of current methods of writing assessment. 16 Historical Perspective Earlier in the twentieth century, assessment practices centered on standardized objective measures of learning. Several factors contributed to this approach. One was the development of the multiple-choice technology during the World Wars, to allow for inexpensive and efficient selection of soldiers (Calfee & Freedman, 1996). This coincided with an emphasis on accountability and behavioral approaches. Standardized, multiple choice testing meshed well with this purpose (Calfee & Freedman, 1996). In the mid 1950s there was also a move to make assessment in education more objective through the use of indirect measures that emphasized right versus wrong (Isaacson, 1991). The focus of writing evaluation at this point related to objectives-based education with an emphasis on spelling and grammar (Calfee & Freedman, 1996). This trend was facilitated by the development of machine scoring (Isaacson, 1991). Evaluation of writing using this approach, however, proved to be problematic. First, writing was not easily assessed by these methods. In addition, reliability o f standardized writing measures was difficult to achieve (Calfee & Freedman, 1996). Eventually, perspectives about writing and writing evaluation began to change. In the early 1970s, two projects, namely the Bay Area Writing Project and the National Writing Project, placed emphasis on the concept of “writing as a process” (Calfee & Freedman, 1996). In addition to this new perspective, the difficulty in assessing compositional writing skills through indirect measures sparked an interest in holistic scoring (Isaacson, 1991). Thus writing came to be seen as a complex process that required direct, holistic examination. Despite these changes in the viewpoints around writing and writing assessment, research and practice in areas of language intervention still operated from a behavioral viewpoint throughout the 1970s (Warren & Yoder, 1994). This finally began to change in the 17 1980s for several reasons. One of these reasons was a movement from the stricter behavioral point of view towards naturalistic contexts for learning (Warren & Yoder, 1994). That is, researchers began to see that skills trained in a strictly behavioral fashion did not generalize well. 
Another reason for the shift in assessment and intervention practices related to the whole language movement in the 1980s. This movement emphasized a more authentic curriculum in both reading and writing (Calfee & Freedman, 1996). The emphasis in writing with this movement included aspects o f writing such as purpose, voice, audience, and coherence (Calfee & Freedman, 1996). These changes to writing instruction were connected to a shift in how writing was assessed. The whole language theory meant that labels were not as important as descriptions, and learning was related to context (Gillam & McFadden, 1994). With the focus on context and the view of writing as a process, authentic assessment, performancebased testing, and the use of portfolios for evaluation began to emerge (Calfee & Freedman, 1996). Coinciding with these changes was a movement to revise holistic scoring procedures to include countable features, thus balancing the need for objectivity with the need to evaluate authentic writing tasks (Isaacson, 1991). The goal to balance objective measures of writing with the need to observe writing directly for evaluation purposes continues to challenge test developers. This can be seen in examining current methods and tools for assessing writing. Commercially Available Tests A search of currently available assessment tools was undertaken to analyze which aspects of writing are presently addressed in tests of writing. As the availability of assessment tools for direct review was limited, test indices were used to provide a fairly exhaustive review o f tools that may be available for the assessment of writing skills in elementary aged students. Review of tests was conducted using four main indices. These were Tests in Print V (Murphy, 18 Impara, & Plake, 1999), Psychological Assessment in Schools (Impara & Murphy, 1994), The Thirteenth Mental Measurements Yearbook (Impara & Plake, 1998) and the ETS Test Collection Catalogue Volume I: Achievement Tests and Measurement Devices. 2nd Edition (Educational Testing Service, 1993). A web search was also conducted using a test locator through ERIC. Whenever possible, a review of the actual test was conducted. The focus o f this search was on finding tests that evaluate written language in actual narrative writing samples of elementary school children. These limitations were placed on the search as they reflected the parameters set out for this study. As per the findings ofDagenais and Beadle (1984), many of the tests found that addressed writing at all focused on indirect writing measures (e.g., sentence completion, word and sentence writing, cloze activities) or on mechanical aspects of writing such as punctuation, spelling, and capitalization. Tests that did examine direct writing samples usually utilized holistic or analytic scoring procedures. Holistic scoring involves rating a whole written text with a single score. Analytic scoring involves rating several aspects of a written piece individually. These terms are defined in more detail in the upcoming section on rating systems. Sax (1997) criticizes holistic measures as being too subjective. Murray-Ward (1998) indicated that holistic scoring is useful in a general writing assessment but does not allow for diagnostic information. In her opinion, analytic scoring was more useful in examining aspects o f writing in a more isolated fashion and could be useful to assist students in improving particular aspects of their writing. 
Even when analytic scoring was used, it was often used to examine mechanical aspects of writing or provided only one or two ratings for discourse level structures such as organization, sequence or coherence. For example, the CTB Writing Assessment System, as described by Engelhard (1998), uses analytic scoring for content, organization, sentence construction, vocabulary/grammar and spelling/capitalization. Other tests that use scoring in 19 this way include the Test of Written Expression (TOWE) and the Test of Written Language (TOWL-3). The TOWE analytically scores attributes such as organization and structure, detail, spelling, punctuation, capitalization, usage (Murray-Ward, 1998). The TOWL-3 analytically scores Contextual Conventions, Contextual Language, and Story Construction (Hansen, 1998). Hansen (1998) criticizes the TOWL-3 for using judgments that are too subjective like “poor”, “average” and “good” to rate writing samples. Only three instruments were found that assessed coherence. These were the Writing Process Test (WPT), the Wechsler Individual Achievement Test (WIAT), and the Oral and Written Language Scales (OWLS). As described by Kimmel (1998), the WPT utilizes a two-phase procedure. In the first phase, the writer is required to plan and draft a composition. The second phase involves editing and revising the first draft. The draft is then scored with an extensive five-point rating system on various features of writing competence. These include aspects of writing such as purpose, audience, vocabulary, style, and mechanical aspects of writing such as punctuation and spelling. A rating is also provided for Organization/ Coherence. With this rating, coherence and organization refer to how well the writer adhered “to a discernible plan throughout the composition”(p. 1160). This does not appear to be related to the definitions of coherence and cohesion examined earlier in this chapter, however. The WIAT (Psychological Corporation, 1992) is administered by having a student write about a topic for 15 minutes. The composition is then scored using both holistic and analytic rating scales. One of the analytic ratings scores Organization, Unity, and Coherence on a four-point scale. Criteria for a rating of 4 is described as “Completely organized, with smooth flow from one idea to the next through the use of transitions and sequencing. Unity is strongly evident with no wandering from the primary theme or plan” (Psychological 20 Corporation, 1992, p. 74). A rating of 1 is given to samples described as “Lack of plan. May be incoherent” (Psychological Corporation, 1992, p. 74). This rating serves as one score of an analytic scoring system that examines six elements of writing. The WIAT also uses a six point holistic measure on the same sample of writing with the top score including criteria for unity and organization of the piece. The OWLS (Carrow-Woolfolk, 1996) consists of a variety of writing tasks including writing sentences and paragraphs. This test also examines coherence in writing. However, the test contains only two items that test coherence for elementary-aged children. Each of these items receives a score of either 1 or 0 for the presence or absence of coherence in a short writing sample. 
This test gives credit for coherence when each sentence is tied to the previous one, tenses are consistent, transitions like then, next, and so forth are used, and sentences are not "choppy." None of these instruments provided an in-depth analysis or definition of each of the linguistic tools used to achieve cohesion in writing. Each test analyzed the more global concept of coherence. Furthermore, in each case, only one or two item scores reflected this aspect of writing competence. While analytic scoring has been viewed as diagnostic in nature, it is my contention that it would be difficult to develop interventions aimed at improving skills in cohesion or coherence based on a single rating of the overall skill.

Methods of Assessment Using Curriculum-Generated Writing

Some methods of assessment involve using real writing samples generated through regular classroom assignments and applying a scoring system to them. When evaluating writing in this way, a variety of measurement procedures are possible. Silliman and Wilkinson (1994) discuss several options for the evaluation of language skills by observing their use in regular classroom tasks. For our purposes, this would translate to examination of curriculum-generated writing. Several methods used for this kind of assessment will be examined here, including curriculum-based assessment and categorical tools.

Curriculum-Based Assessment

Many methods of evaluating samples of written language are referred to as forms of curriculum-based assessment (CBA) or measurement (CBM). Poteet (1992a) defines CBA as "the process of determining students' instructional needs within a curriculum by directly assessing specific curriculum skills" (p. 11). CBM is a specific set of procedures for repeated measures of student progress on standardized tasks of written expression (School District #57, 1996). The term CBA implies an overall approach to assessment, while CBM refers to a particular set of measures. According to Choate and Miller (1992), CBA determines the expectations of the curriculum, the match between students and those expectations, and how to plan to adjust the curriculum to meet the needs of students. Nelson (1994) also describes a similar procedure, which she refers to as curriculum-based language assessment. She distinguishes this form of assessment from CBA in that CBA addresses whether or not the child has learned the curriculum, whereas curriculum-based language assessment determines whether or not a child has the language skills and strategies necessary for processing the language of the curriculum.

As the models of CBA are varied (Poteet, 1992a; King-Sears, 1994), so are the CBA methods of assessing writing. Despite these variances, some general guidelines are supported by several researchers. For instance, Nelson (1994) indicates the need to use the real context and content of the curriculum in assessment. King-Sears (1994) recommends that quantitative and qualitative measures used in CBA reflect the teaching objectives. Similarly, Choate and Miller (1992) describe the process of CBA as beginning with an extensive examination of the curriculum in question.

Another common guideline relates to the selection of writing samples for evaluation. Silliman and Wilkinson (1994) highlight the importance of using representative samples of a student's work. One recommendation is the use of portfolios which contain examples of a student's best writing as the basis for CBA (King-Sears, 1994).
Howell, Fox, and Morehead (1993) suggest the use of already existing writing samples as the basis of CBA, as prior knowledge and interest in a writing topic are critical for good writing to occur.

Several aspects of writing may be evaluated using CBA approaches. Isaacson (1991) describes procedures for measuring writing fluency, syntactic maturity, vocabulary, content, and writing conventions. Poteet (1992b) describes procedures for evaluating handwriting, spelling, mechanics, usage, and ideation. King-Sears (1994) also describes several CBA procedures for evaluating letter formation, spelling, and sentence and paragraph writing. One frequently cited measure used in CBM is writing fluency (Howell et al., 1993; Isaacson, 1991; King-Sears, 1994; Marston, 1989; School District #57, 1996).

Measuring writing fluency. Currently, writing fluency is one of several forms of curriculum-based measurement (CBM) being used in School District #57 (Prince George). Locally normed CBM is used to evaluate writing skills (School District #57, 1996) by having a student write for three minutes from a story starter. Scores derived from this instrument include the total words written (TWW) and the number of words spelled correctly (WSC).

Caution should be taken when evaluating written language in this way. While Howell et al. (1993) suggest that story starters are useful for generating writing samples when classroom-generated samples are unavailable, they caution that when the purpose of writing is not generated or perceived by the writer, the product may be compromised. The result may be a writing sample that is not representative of a writer's usual work.

Another caution for the use of the CBM writing probes described here relates to how they are used. These types of measures were not designed to be used as a substitution for other types of assessment (Canter & Marston, 1998; School District #57, 1996). It is stated by School District #57 that "Curriculum Based Measurement provides only one of several pieces of required information. By itself, it is insufficient information" (p. 2). This is particularly true as this method of writing evaluation focuses on speed and spelling while ignoring other areas of writing such as those suggested by Isaacson (1991), King-Sears (1994), and Poteet (1992b).

Several studies report the reliability of these CBM writing measures (Marston & Deno, 1981; School District #57, 1996; Tindal, Marston & Deno, 1983). These studies found inter-scorer reliability ranging from .90 to .98. Marston and Deno found split-half reliability ranging from .96 to .99. In a measure of internal consistency comparing each minute of the writing to the other minutes using Cronbach's alpha, reliability ranged from .70 to .87. These were interpreted as satisfactory values for internal consistency. Measures of stability and equivalence conducted in the norming project by School District #57 at three month intervals revealed median coefficients of .62 and .67. Two studies of comparability of forms found reliability coefficients of .73 and .95 for TWW (Marston & Deno, 1981; Tindal et al., 1983). In summary, Tindal et al. state that the findings of this research are that the procedures utilized in the CBM described here are generally reliable.
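To make the scoring of these writing-fluency probes concrete, the sketch below computes TWW and WSC for a short sample. It is a hypothetical illustration only: the tiny word list stands in for a real spelling reference, and actual CBM probes are administered and scored according to the district's procedures, not this code.

```python
# Illustrative sketch only: computing the two CBM fluency scores described above,
# total words written (TWW) and words spelled correctly (WSC), for a three-minute probe.
# KNOWN_WORDS is a stand-in for a real spelling reference, not an actual scoring rule.
import re

KNOWN_WORDS = {"the", "dog", "ran", "to", "his", "house", "and", "then", "he", "slept"}

def score_probe(sample: str) -> dict:
    words = re.findall(r"[A-Za-z']+", sample)
    tww = len(words)                                         # every word attempt counts
    wsc = sum(1 for w in words if w.lower() in KNOWN_WORDS)  # correctly spelled words only
    return {"TWW": tww, "WSC": wsc}

print(score_probe("The dog ran to his howse and then he slept"))
# {'TWW': 10, 'WSC': 9} -- "howse" counts toward TWW but not WSC
```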
CBM measures have also been shown to demonstrate criterion-related validity. In a compilation of research findings on the validity of CBM (Marston, 1989), correlations ranging from .45 to .92 were found when comparing TWW and WSC scores on CBM to scores obtained on other measures of writing, including standardized testing. In a study by Deno, Marston, and Mirkin (1982), CBM measures of WSC and TWW were compared to scores on the Test of Written Language (TOWL), the Word Usage subtest of the Stanford Achievement Test: Intermediate I, and the Developmental Sentence Scoring (DSS) System. Correlation coefficients ranged from .67 to .76 for WSC and .62 to .84 for TWW. The highest correlations were for DSS and the Written Language Quotient on the TOWL. Using a multitrait-multimethod analysis, Tindal and Nolet (1990) found high discriminant validity between CBM and the Stanford Achievement Test measures of writing. They reported a correlation of .88 between scores on two CBM writing probes and the writing score on the Stanford Achievement Test. Fewster (2000) found that CBM writing scores obtained in Grades 6 and 7 were correlated to teacher-assigned letter grades in Social Studies and English in Grades 8, 9 and 10. Correlations demonstrated significant (p < .005) small to medium effect sizes.

There is also some evidence favoring the discriminative validity of CBM writing measures. Tindal and Parker (1991) found that CBM measures of TWW and WSC were successful in detecting differences between children receiving specialized services in education in Grades 3 through 5, as compared to children not requiring this kind of support. This provides evidence that CBM scores can discriminate among students at different skill levels. Another recent study investigated the discrimination ability of CBM writing scores as well as their predictive validity. A discriminant analysis found significant differences between the CBM writing scores of children in each of the following groups: students in special education placements or receiving remedial support, students in regular education not receiving support, and honors students (Fewster, 2000).

Although these measures have been shown to demonstrate concurrent and predictive criterion-related validity and to discriminate among writers of varying abilities, the content validity of these measures is questionable. Content validity of a test is shown when the content is "drawn from the relevant environmental demands, that is, what the student is expected to do within the general education context" (King-Sears, 1994, p. 11). As writing from a story starter for three minutes is not a typical educational activity, it may be difficult to make generalizations about an individual's writing ability from performance on CBM writing samples to writing in general.

Measuring syntactic complexity. Another area for evaluation frequently suggested in the literature is that of syntactic complexity (Hughes et al., 1997; Isaacson, 1991; Rousseau, 1990). All three authors suggest the T-unit as a useful measure for scoring syntactic complexity in writing samples. One T-unit, or terminable unit as named by Hunt (1965), consists of a main clause and any attached subordinate clauses. Hunt's study is frequently cited in the literature, and the T-unit continues as a common basis for measuring syntactic complexity in writing (Scott, 1988). Hunt studied the writing of children in Grades 4, 8, and 12. The writing samples she used consisted of 1000 words. All students in her study were of average intelligence.
Her results indicated that T-unit length (mean length of T-unit or MLTU) and the ratio of clauses to T-units (subordination index or SI) were found to increase across the three grade levels. An analysis of variance using a factorial analysis showed that MLTU and SI were statistically significant (p < .01) for grade. In an extensive longitudinal study concerning the spoken and written language skills of 211 children in Kindergarten through Grade 12, Loban (1976) also found a general increase in MLTU across grades. His findings for Grades 4 , 8 and 12 were nearly identical to those of Hunt (1965). The growth in MLTU increased steadily in Grades 4 through 6 with some plateaus occurring in Grades 6 and 7. Loban (1976) also studied the degree of subordination which he expressed as the number of subordinate clauses per sentence. These data were subsequently converted to the 26 number of subordinate plus main clauses per sentence by Scott (1988) to allow for direct comparison to Hunt’s work in this area. Like MLTU, the growth in SI increased between Grades 3 and 12, and again, the data for Grades 4, 8, and 12 were similar to the findings of Hunt (1965). Loban’s study also made comparisons between low and high achieving students. His results showed higher MLTU scores for the higher group across all grades. More recent studies also have investigated the usefulness of T-unit analysis for examining syntactic complexity. For example, Klecan-Aker and Hendrick (1985) found statistically significant differences between the T-unit lengths in the oral language of students in Grades 6 and 9 (p < .05). There was not a statistically significant increase in the number of clauses per T-unit (SI) between the two grades. It should be noted that this study was conducted using oral language samples and while findings may not be generalizable to written language, it does provide more evidence that MLTU increases with growth in language development. Summary. While these measures of writing fluency and syntactic complexity provide countable information on samples of writing, and have been well studied to establish their reliability and validity, there are limitations to their use. Again, like many commercial writing assessment tools, they focus on word and sentence level aspects of writing without taking into account larger discourse related aspects of writing such as cohesion and organization. The next section explores a class of tools that may be used to examine a variety of aspects of written language. Categorical Assessment Tools Silliman and Wilkinson (1994) suggest that categorical tools are useful for coding behaviors and skills quantitatively in language assessment. Two of the systems they suggest 27 for accomplishing this include rating systems and checklist systems. Categorical tools will be examined from the perspective of their usefulness in evaluating cohesion in writing. Rating systems. Rating systems may be used in a variety of ways to evaluate writing. Three kinds of rating systems are typically employed in writing evaluation. These are holistic ratings, analytic ratings, and primary trait ratings. Analytic and holistic ratings of writing are used in some of the standardized tests already mentioned. These procedures may also be applied to scoring writing samples produced in the classroom. Dagenais and Beadle (1984) described these procedures of evaluation as being helpful for use in classrooms and for planning instructional programs. 
Holistic evaluations involve rating a writing sample on the basis of its overall presentation rather than its specific features (Miller, 1999). As seen previously, this type of rating does not allow for an in-depth analysis of specific areas of writing difficulty or success. Holistic evaluations require that the evaluator be trained in the use of the rating scale and are aimed at gaining an overall impression of a sample of writing. This impression can be gauged against pre-established criteria (Dagenais & Beadle, 1984), or a group of writing samples may be rated according to their relative standing in relation to other writing samples (Miller, 1999).

Another type of rating consists of primary trait scoring. It involves examining particular aspects of a piece of writing and rating them individually (Sax, 1997). This type of scoring requires the development of criteria that are specific to the writing purpose. For example, a primary trait rating scale used to score an argumentative piece of writing could rate the persuasiveness of the argument. Students are rated against this criterion rather than against one another (Miller, 1999). According to Miller, this type of rating is more difficult to develop than other types because of the specific nature of the scoring criteria. However, due to that level of specificity, it provides more diagnostic information than holistic scoring.

A third form of rating consists of analytic scoring. This type of scoring involves rating different aspects of the same writing sample individually. For example, an analytic procedure might rate story development, grammar, and spelling each on a 5-point rating scale, with an individual's score being the total of the three ratings. According to Miller (1999), while this method has the advantage of analyzing different areas of writing strength and weakness, it is time consuming and may be impractical for large scale assessments. Furthermore, as found in the review of standardized writing assessments, this type of rating provides only a single measure for each skill area examined. An example of an analytic rating scale is the writing reference set developed by the British Columbia Ministry of Education (1996b). This scale includes descriptions, or rubrics, to assist teachers in scoring writing samples on a seven-point scale for the features of Meaning, Style, Form, and Surface Features.

The rating systems described here generally are used to evaluate constructs globally and offer only a wide-perspective analysis (Silliman & Wilkinson, 1994) of writing skills. Furthermore, these types of evaluations tend to be subjective in their scoring and can be time consuming to complete (Sax, 1997). Consequently, it is my opinion that such tools would not be the best means of providing a diagnostic measure of the various cohesive devices that children use in their writing.

Checklists. A checklist is another categorical system that may be used in the evaluation of writing. When evaluating writing, checklists can help focus the evaluator's attention on relevant details for scoring (Sax, 1997) rather than on making overall judgments about a construct, as occurs when using holistic or analytic rating procedures. While Silliman and Wilkinson (1994) caution that checklists provide only a broad evaluation focus and may not be sensitive to small changes in communication behaviors, Sax indicates that checklists are useful in measuring complex behaviors that can be broken down into specific segments.
Rousseau (1990) suggests that simple checklists are useful for pinpointing errors for error analysis and for allowing repeated direct measures of progress in writing development. Although evaluation with this kind of tool does not provide a qualitative look at a given behavior, it does allow for easy and inexpensive administration and for comparison of a wide range of behaviors across a large number of students (Silliman & Wilkinson, 1994). It is the opinion of this author that the usefulness of a checklist for diagnostic purposes is probably related to the degree of specificity in the items. That is, broad items would allow for only broad assessment, whereas many specific items reflecting multiple aspects of a writing area could result in a fairly in-depth analysis of writing skills. Although a checklist will lose some information due to the absolute nature of rating only the presence or absence of aspects of cohesion, for example, this same trait increases the ease and objectivity of administration. Given the arguments for the usefulness of checklists for error analysis in diagnostic assessments, this method of evaluation seems the most viable for an analysis of the different markers of cohesion used in the writing of children. The other benefit of this form of assessment is its usefulness with a variety of curriculum-generated writing samples. The remainder of this discussion will focus on considerations for the development and evaluation of effective checklists.

Considerations for Checklist Development

One consideration in checklist construction involves a comprehensive analysis of the important aspects of a given behavior (Sax, 1997). According to Sax, construction of a checklist requires an in-depth knowledge of the skill to be evaluated. It is from this analysis and knowledge that the detailed content of the checklist is developed. Silliman and Wilkinson (1994) remind us that items on a checklist should be specific. Regardless of the purpose of an assessment or the type of evaluation tool to be examined, Tindal and Parker (1991) suggest several further considerations for test development and evaluation. According to these authors, tests should have a method of standardized administration and demonstrate reliable scoring. They should discriminate between students with varying skill levels and show at least low to moderate correlations with other acceptable methods of assessment. They also should be sensitive to improvements in student abilities. King-Sears (1994) advocates standardized procedures and content for measures to ensure integrity and to avoid compromising reliability and validity.

Reliability of Checklists

If measurements are to be reliable, scorer reliability is essential, as scorer reliability places an upper limit on the reliability of the overall measure (Sax, 1997). Tindal and Parker (1991) state that "Clear and standardized administration and inter-scorer reliability are necessary for others to unambiguously interpret the results" (p. 211). One factor that affects the reliability of ratings on checklists is ambiguity in the definitions of the trait to be measured. Other factors include differences among raters related to training in the use of the instrument, and the tendency of individual raters to score leniently or too severely on a consistent basis (Sax, 1997). It follows, then, that in developing a checklist, item clarity, rater training, and interrater agreement should be areas of focus.
Establishing Validity of a Checklist

There are several ways of viewing the validity of an evaluation tool. Content validity is established when the skills outlined in the instrument's items correspond to the skills one is claiming to assess. As mentioned earlier, the development of a checklist involves a comprehensive analysis of the skills to be assessed. One approach to establishing content validity is to review studies of how cohesion is used in writing and to develop checklist items directly from the findings of research.

Another form of validity, called criterion-related validity, comes from concurrent writing assessments. That is, concurrent criterion-related validity can be established by correlating the scores on an instrument with other related measures (Sax, 1997). As no other tests examining cohesion have been found, concurrent validity would have to be established by comparing scores on a cohesion checklist to scores on other assessments of writing skill. If cohesion is related to writing proficiency, as indicated in the literature, then it should demonstrate positive correlations with other measures of writing proficiency.

Another form of validity, called construct validity, indicates the extent to which an assessment tool measures the theoretical construct it claims to evaluate. This form of validity includes many lines of evidence (Moss, 1995). Sax (1997) outlines several avenues that support an argument for construct validity. One line of evidence is the justification that the construct in question has educational relevance and importance. Another is that the construct can be measured. Convergent validity provides a further line of evidence; this form draws on multiple sources of evidence for the construct, established through criterion-related and content-related validity arguments. A further argument for construct validity comes from evidence of discriminant validity, that is, evidence showing what the construct is not related to. Thus, establishing the validity of an instrument involves multiple lines of evidence that can be gathered by examining relationships between different measures and by examining the literature for explanations of the construct in question.

Research Purpose

As discussed in the foregoing chapters, there is a need for a tool to evaluate cohesion in the writing of school-aged children. Such a tool would be useful for tracking the development of cohesion in students' writing and for devising and monitoring written language intervention plans for students with writing difficulties. The purposes of the current research are two-fold:

1. To develop a checklist that can be used to evaluate cohesion in the writing of elementary school children.

2. To evaluate the reliability and validity of the checklist.

Scope of the Proposed Research

For the purposes of this study, cohesion will be examined in only one writing genre, narrative writing. The primary goal is the development of the items on the checklist, with attention focused on creating a reliable and valid instrument. The development of scoring norms, considerations for developmental and cultural differences, and genre differences are topics for future research. Although the generalizability of this tool will likely be limited by the nature of the writing samples used in its development, this research should provide a good starting place for the development of a tool that can later be extended to a wider variety of writers and types of writing.
Contributions of this Research

Given the limitations of currently used methods of assessment, and the impact of cohesion on writing quality, the research done here will contribute in three main ways. The first contribution is to the body of literature in the area of writing assessment. Hedberg and Fink (1996) indicate that research on the writing of children with disabilities, and on story writing in general, is sparse; this study will provide information that informs this area. Another area to be informed is the body of research regarding cohesion. As this author found, information on cohesion use in writing is sparse in the research literature. Findings from this study will provide further information about cohesion in the narrative writing of elementary school-aged children. The final area of contribution is to practitioners conducting writing assessments. Development of such a tool would help professionals detect and describe the difficulties problem writers have in structuring written text (Lindeberg, 1984). Such a tool could then be used to plan and monitor interventions aimed at improving the use of cohesive devices in writing.

CHAPTER THREE: METHOD

Research Design

The current study was designed to develop and evaluate a checklist for measuring cohesion in writing. The development process involved several steps adapted from procedures outlined by Crocker and Algina (1986). The first steps they suggested in constructing a test included identifying the primary purpose of the test, identifying the behaviors that represent the construct to be examined, preparing a set of test specifications, constructing an initial item pool, having items reviewed by knowledgeable panels followed by revisions as necessary, and preliminary item testing followed by revisions as necessary. These steps constituted the preliminary development of the instrument. The remainder of the steps they suggested included testing the items on a large sample that represents the population for whom the instrument is intended; determining the statistical properties of the item scores and eliminating items that do not perform as expected; conducting reliability and validity studies on the final form of the test; and developing guidelines for the administration, scoring, and interpretation of the instrument. These final steps formed the second part of this study, the large scale evaluation of the instrument. This chapter describes these steps in detail, along with the data source used to conduct the item analyses and the reliability and validity checks, and covers both the statistical and the qualitative procedures used.

Data Source

The data source used in this study consisted of 342 archival CBM writing samples from children in School District #57 in Grades 4, 5, 6, and 7. The samples were gathered from three elementary schools and represented each school's entire Grade 4, 5, 6, and 7 population. A fourth school also provided samples, but the set from this school was incomplete. As it was not certain that the samples represented the entire population of students in all four grades at this school, this set of writing samples was not included with the 342 used in the large scale study. However, I chose 20 writing samples from this fourth school to use in the preliminary item analysis. As indicated in the previous chapter, the CBM writing samples used here are obtained by having children write for three minutes from a story starter.
The samples are then scored for the total number of words written (TWW) and the number of words spelled correctly (WSC). The samples are gathered on a routine basis by many schools in School District #57 as one way of monitoring student progress. The three story starters are presented in Appendix B. Several examples of writing that reflect the range present in these samples are displayed in Appendix C.

I chose this kind of writing sample for use in this study for three reasons. The first relates to the use of CBM in School District #57, where I work. Part of my aim in this study was to develop an instrument that would have practical use for myself and my colleagues. Because CBM use is prevalent in this district, an instrument able to evaluate cohesion in CBM writing samples would enable practitioners to capitalize on a resource already being used in the district. Furthermore, and more importantly, an instrument that worked with these writing samples would allow practitioners to extend the purpose and value of CBM beyond measures of fluency and spelling. Another reason for choosing these samples again relates to the practical utility of the instrument being developed. If the cohesion checklist was able to measure differences in short writing samples, it would likely have utility for longer samples as well. The reverse, however, might not be true: a checklist that could measure differences in the use of cohesive devices on longer samples might not be as sensitive to differences in shorter samples. The final reason for choosing these samples is their similarity. In order to evaluate my checklist, I wanted to be sure that the variability in checklist scores reflected the performance of the checklist items. Therefore, as much as possible, I eliminated variability caused by sources such as differences in genre, audience, the amount and type of instruction given for the writing task, or the amounts of supported editing and re-writing. CBM samples are generated with a standardized procedure and are available in large quantities across the elementary grades. The procedures for administering CBM writing probes are presented in Appendix B.

The schools from which the writing samples were gathered were chosen on the basis of the availability of complete school sets of writing samples. To ensure variability in the data source, I used writing from each school's entire Grade 4, 5, 6, and 7 population, including writing by children with special needs, English as a Second Language/Dialect (ESL/D) designations, and learning disabilities (LD). This constituted a convenience sample. As this research focused on developing checklist items rather than on making generalizations about a population of students, random sampling was not necessary. All identifying information, such as student and school names, was removed from the writing samples collected. I gave each writing sample an identification number and coded each one for grade, gender, and special learning designation. Special learning designations included ESL/D, LD, Special Learning Resource (SLR), which refers to children with IQs lower than 75, and Other, which included children with behavior difficulties and hearing impairments. I also included a code to indicate which of the three story starters was used, and I recorded scores for TWW and WSC.
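As an illustration of the coding just described, the following sketch shows one way an archival probe might be recorded and scored for TWW and WSC. The field names, the toy word list, and the scoring function are hypothetical conveniences for illustration, not the district's actual CBM materials or procedure.

# Hypothetical record for one archival probe, with a toy stand-in for a spelling check.
KNOWN_WORDS = {"the", "dog", "ran", "down", "road", "fast"}   # invented word list

def score_probe(text):
    words = text.lower().split()
    tww = len(words)                                   # total words written (TWW)
    wsc = sum(1 for w in words if w in KNOWN_WORDS)    # words spelled correctly (WSC)
    return tww, wsc

sample = {"id": 101, "grade": 5, "gender": "F", "designation": None, "starter": 2}
sample["tww"], sample["wsc"] = score_probe("the dog ran doun the road fast")
print(sample)   # TWW = 7, WSC = 6 ("doun" is not in the word list)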
Ethical Considerations

As the data used in this study were archival and contained no identifying marks, it was not necessary to obtain the consent of individuals. As this research focused on the evaluation of an assessment tool rather than of students, there was no perceived harm to individuals. Norm Monroe, Director of School Services, provided written consent to conduct this study using writing samples from Prince George School District #57 (see Appendix A).

Procedures

Preliminary Development of the Instrument

Initial Compilation

I carried out the first steps, identifying the primary purpose of the test and the behaviors that represent the construct to be examined (in this case, cohesion), through a review of the literature. These purposes and behaviors are described in Chapter Two. I developed each item on the instrument, as well as the table of specifications, from the compilation of research findings on cohesion outlined in the literature review, based largely on the definitions of cohesion developed by Halliday and Hasan (1976). The first steps I used in evaluating this instrument consisted of panel reviews, a preliminary item analysis, and a pilot interrater study.

Panel Reviews

The first step in revising the checklist involved two panel reviews. The first panel consisted of myself and three teachers with experience in testing who had taken graduate courses in measurement and evaluation in education. This evaluation focused mainly on the structural aspects of the checklist. This included determining whether the items were free from technical flaws such as errors in spelling and grammar; determining their accuracy, appropriateness, and relevance to the test specifications; judging the level of readability; and examining item bias and ambiguity. A second panel consisted of myself and five other school speech-language pathologists. With this panel, discussion focused on how well the items on the instrument reflected the concept of cohesion. This group also supplied feedback on the clarity of items and examples, as well as on the layout of the instruction manual and the ease of scoring. Following feedback from these two panels, I made a number of minor revisions to the wording of items and to the format of the instrument. These changes are discussed in more detail in Chapter Four.

Preliminary Item Analysis

In the next phase of development, I scored 20 writing samples using the cohesion checklist and performed a preliminary classical item analysis on the results. The 20 samples I selected for this portion of the study were not used in the large scale testing. This group of 20 samples included five from each of Grades 4 through 7. I selected each sample by "eyeballing" the overall length and legibility, choosing the first five I found that represented a mid-range length for the grade. I then used the results from this scoring for an item analysis using ITEMAN (1994). Item analysis is a procedure for examining the statistical performance of items on an assessment instrument. The statistical results help to determine which items are too easy or too difficult, and how well items discriminate between high and low scorers. ITEMAN is a classical item analysis software program that calculates standard item statistics and summary statistics. I used the results from this analysis to "red flag" items that could show up as problems during the next phase of the preliminary development; these flagged items might be the ones that emerge as ambiguous in the interrater study. At this point I made some changes to the checklist items, and a second version of the checklist was created. These changes are described in Chapter Four.
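A minimal sketch of the kind of "red flag" screen described above is given below. The per-item proportion-correct values are invented, and the cut-offs follow the conventional values reported in Chapter Four (items above .85 treated as too easy, items at or below .49 as difficult; Sax, 1997); in the actual study, ITEMAN's own output was examined rather than a hand-written screen.

# Flag items whose difficulty (proportion correct) falls outside conventional bounds.
prop_correct = {1: 0.65, 2: 0.35, 3: 0.90, 4: 0.00, 5: 0.25}   # invented pilot values

flags = {item: ("too easy" if p > 0.85 else "too difficult" if p <= 0.49 else "ok")
         for item, p in prop_correct.items()}
print(flags)   # flagged items would then be watched closely in the interrater pilot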
Pilot Study of Interrater Agreement In the pilot study of interrater agreement, I and 12 volunteers scored ten writing samples for comparison o f agreement among raters. Each rater scored the same ten writing samples. I selected these samples, by using a random numbers chart, from the 342 samples 39 collected for the study . The volunteers consisted of three school speech and language pathologists, seven learning assistance or remedial teachers, one classroom teacher, and one school psychologist. This phase of the study was conducted over a day-long session. Training session. The first half of the day consisted of a training session. Training consisted of an overview of the notion of cohesion, followed by a thorough review of the checklist and scoring practice. During the checklist review, I led the group of volunteers through an item-by item examination of the checklist, using examples to illustrate appropriate scoring o f each item. The next step in training focused on practice with scoring writing samples. All participants scored the first two practice samples in small groups to allow raters the chance to discuss and clarify their scoring choices. They then scored the next two examples individually. In all cases of scoring practice, the group reconvened to compare scores and discuss differences. The raters completed two additional practice samples, as they expressed uncertainty in their scoring and there was still some disagreement on some of the items. By the end of the scoring practice, at least 11 out of 13 raters agreed on the scoring of each item on a given example. After this final practice, scoring of writing samples for the pilot interrater portion of the project commenced. Scoring session. For the scoring portion of the session, I provided each rater with a bundle of eight writing samples. When scoring was completed on these, I supplied another bundle o f eight. Each bundle included five copies of probes that would be scored by the whole group. The other 3 samples were different for all. I included the extra writing samples and used a random arrangement of interrater writing samples within the bundles to reduce the opportunity for raters to compare scores as it was unlikely that any two raters would be scoring the same writing sample at the same time. Also, this made the raters blind to which samples would be used for interrater comparison. Each rater scored 10 interrater probes in all. 40 Data from one rater was eliminated from the study. This rater was only able to complete five interrater probes and noted when handing them in that she had miscopied the probe identification numbers. Therefore it could not be determined which sample went with which completed checklist. One other rater returned only eight of the ten samples and another rater returned only nine. Using the completed checklists from the 12 raters, I calculated the proportion of agreement for each writing sample on an item-by-item basis. I then calculated the mean proportion of agreement across all 10 interrater writing samples for each item. I also compared total scores on the checklists. Revisions to Checklist Content and Process Modification of items. The preliminary item analysis and pilot interrater studies were instrumental in eliminating major problems inherent in the checklist, both in content and scoring procedure, prior to carrying out the large scale study. 
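The agreement figures described above can be illustrated with the following sketch, which computes, for one item, the proportion of raters giving the most common (modal) score on each writing sample and then averages those proportions across samples. The 0/1 scores shown are invented.

# Proportion of agreement for one checklist item across raters, averaged over samples.
from statistics import mean

def item_agreement(scores_by_sample):
    props = []
    for rater_scores in scores_by_sample:          # one inner list per writing sample
        ones = sum(rater_scores)
        props.append(max(ones, len(rater_scores) - ones) / len(rater_scores))
    return mean(props)

item_scores = [            # one 0/1 entry per rater (12 raters), invented values
    [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
]
print(f"Mean proportion of agreement for this item: {item_agreement(item_scores):.2f}")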
Revisions to Checklist Content and Process

Modification of items. The preliminary item analysis and pilot interrater studies were instrumental in eliminating major problems inherent in the checklist, both in content and in scoring procedure, prior to carrying out the large scale study. It is noteworthy that discussions with and among the group during the training portion of the interrater pilot study led to many valuable suggestions about improvements that could be made to reduce ambiguity in the checklist items and scoring instructions. These suggestions included specific examples that would clarify when and when not to give credit on individual items. Each volunteer wrote notes and comments on their instruction manuals to assist them in scoring; I collected these notes at the end of the session as additional input to consider when evaluating and editing items on the checklist. I used the results of the preliminary item analysis in combination with the data on the proportion of agreement between raters on each item to delete and modify checklist items prior to the large scale study. While quantitative results are emphasized in this report, the process of deciding how and where to change items also involved qualitative processes. Specifically, I used a reiterative approach: examining the literature on cohesion studies, examining items that performed poorly in the item analysis, examining notes taken during the rater training session, and examining items that performed poorly in terms of the proportion of interrater agreement. This process resulted in another revision of the checklist.

Procedural modifications. Not only did the pilot portion of this study lead to changes in checklist items before the final analysis, but it also raised questions about whether all writing samples could be adequately scored using the checklist. When looking at interrater agreement, it was clear that overall agreement was much lower on some writing samples than on others. I examined these samples for qualities that might suggest criteria for inclusion in the study. This examination resulted in the following criteria for a sample's inclusion. First, the sample had to be readable; that is, the handwriting and spelling patterns had to be such that the words could be deciphered. This did not mean that spelling errors could not be present, but the words had to be decipherable. I chose a limit of two unreadable words as the cut-off criterion for inclusion. Second, the sample had to contain at least two sentences. The rationale for this criterion was that the checklist was meant to analyze inter-sentence connections; if sentence boundaries were not present, cohesion could not be scored. I removed writing samples not meeting these criteria from subsequent phases of the study.
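A minimal sketch of this two-part inclusion screen is shown below, assuming the unreadable-word and sentence counts have been tallied by hand for each sample and that the two-unreadable-word limit is inclusive. The sample records are invented.

# Apply the two inclusion criteria described above to hand-tallied sample records.
def meets_criteria(sample):
    return sample["unreadable_words"] <= 2 and sample["sentences"] >= 2

samples = [
    {"id": 14, "unreadable_words": 0, "sentences": 4},
    {"id": 88, "unreadable_words": 5, "sentences": 3},   # excluded: not readable
    {"id": 91, "unreadable_words": 1, "sentences": 1},   # excluded: only one sentence
]
included = [s for s in samples if meets_criteria(s)]
print([s["id"] for s in included])   # -> [14]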
Evaluation Using a Large Scale Sample

The next part of the study involved scoring 312 of the 342 collected writing samples using the third revision of the checklist, which contained 25 items. Thirty of the writing samples had been eliminated from this part of the study because they did not meet the criteria for inclusion. The samples used were generated by seventy Grade 4 students, sixty-seven Grade 5 students, eighty-four Grade 6 students, and eighty-nine Grade 7 students. Two writing samples missing the code for grade were retained in the study but were eliminated from analyses conducted by grade. Of the remaining writing samples, 30 were written by students designated ESL/D, 7 by students designated SLR, 21 by students designated LD, and 9 by students designated as Other. I scored all samples in this portion of the study for cohesion, the number of T-units, mean T-unit length (MLTU), and the degree of subordination (SI). I then used this scoring of the writing samples to perform several analyses on the checklist items. After additional revisions of the checklist items, I conducted another interrater study. I also used the scores obtained in this portion of the study to establish concurrent criterion-related validity.

Item Analysis

An item analysis was run using ITEMAN (1994). I chose a classical item analysis for this study because it provides statistics that reflect the performance of items, including information about their discrimination. A Rasch analysis would also provide information appropriate for analyzing item performance, but it does not provide information about the discrimination ability of items; this method assumes that all items discriminate equally (Sax, 1997). Rasch scales also assume that the latent trait being measured is unidimensional (Sax, 1997). As this instrument was a new creation, I did not know whether these assumptions could be met. As well as providing statistics that reflected item performance, the classical analysis provided a measure of internal consistency for the overall checklist. I deleted or combined items that performed poorly on this analysis and ran the analysis a second time to determine the discrimination of items on this latest version of the checklist, which now consisted of 13 items. I also ran a third item analysis on these 13 items to determine how they performed as parts of subtests rather than as parts of the total checklist.

Interrater Study

Another step in the large scale study was to establish the reliability of the instrument through an examination of interrater agreement and reliability. Tinsley and Weiss (1975) indicate that it is important to gather indices of both interrater agreement and interrater reliability. Agreement is established by examining the extent to which different raters give the same scores on an instrument, and it can be reported in terms of the proportion of agreement. Interrater reliability, on the other hand, indicates the degree to which scores from different raters are proportional to one another; this is usually reported in terms of analysis of variance indices and correlational values (Tinsley & Weiss, 1975). I completed this portion of the study with three other raters, with my own scores taken from the large scale study. In this interrater study all four raters were school speech-language pathologists. This professional group shares a common background with respect to the clinical measurement of children's language, and it represents the practitioners most likely to use such an instrument. For this interrater study, I selected probes written from only one story starter. The samples were also marked for sentence boundaries to eliminate the need for judgment in this regard, as not all raters were familiar with the procedures for doing so. By taking these steps, it was easier to interpret the variability among raters' scores as reflecting problems with the checklist rather than some other variable. To ensure adequate variability in the scores of the writing samples used in this interrater study, I chose samples that represented the full range of checklist scores, from 1 through 12. Each rater received copies of the checklist and manual, as well as two practice items, in advance of the training session. The training session was scheduled for a half day. During this time, I focused on training the group to recognize examples and non-examples of checklist items in relation to the rules for scoring outlined in the manual.
Several practice samples were then scored to establish the raters' understanding of the scoring rules. At the conclusion of the training session, I presented the raters with a package of 12 probes to score independently. Once the completed checklists were collected, I examined the results for the proportion of agreement among raters on an item-by-item basis. I then calculated a one-way analysis of variance (ANOVA) and Levene's F statistic to determine whether any significant differences existed between raters. I also calculated correlations for agreement between rater pairs.

Validity Measures

In this portion of the study I compiled several sources of evidence for the validity of the checklist. The content validity of this instrument was supported by the review of the literature presented in Chapter Two. Chapter Two also contains arguments for the educational relevance and importance of measuring the construct of cohesion; this argument supports the construct validity of the instrument. To provide further evidence of construct validity, I conducted a factor analysis on the items. I also conducted a discriminant analysis to determine the predictive value of checklist scores for grade membership. The final area of validity that I investigated in this study was concurrent criterion-related validity. As no other measures of cohesion were available, I turned to other measures of writing that relate to writing proficiency. I used measures of writing fluency and measures of syntactic complexity calculated on the same writing samples from which I obtained my cohesion scores. Fluency measures consisted of total words written (TWW) and words spelled correctly (WSC). Syntactic measures consisted of the mean length of T-unit (MLTU) and the subordination index (SI). These terms were defined in Chapter Two.

Writing fluency measures. I chose these scores as concurrent criteria because the reported findings (see Chapter Two) indicate that CBM measures are reliable and relate to other overall measures of writing. Performance on these instruments has been shown to relate to other measures of writing development and to discriminate between students with learning problems and those without. It was therefore expected that CBM scores would provide a reasonable criterion of writing proficiency. As cohesion has been shown to relate to overall writing proficiency, scores on a checklist of cohesion should also relate to a variety of measures that relate to writing proficiency. If scores of TWW and WSC are related to overall writing ability, as implied in the literature reviewed in Chapter Two, then they should relate to measures of cohesion.

Syntactic complexity measures. Syntactic complexity is a relevant measure of writing proficiency because research has long shown that syntactic complexity and elaboration are clear measures of writing development (Hunt, 1965; Loban, 1976). Furthermore, studies of "problem writers" show that their products contain sentences that are less complex (Ratner & Harris, 1994), especially with regard to the degree of subordination (Scott, 1991b). Many other researchers have also found differences between problem writers and "good writers" in both sentence length and syntactic complexity (Anderson, 1982; Poplin, Gray, Larsen, Banikowski & Mehring, 1980; Scott, 1991b; Singer, 1995). Therefore, it can be argued that poor writers are likely to score poorly on measures of syntactic complexity, while good writers should score better.
Consequently, measures of MLTU and SI are expected to show positive correlations with other measures of writing ability. Assuming that scores on a measure of cohesion are related to writing ability, correlations between MLTU, SI, and scores of cohesion are expected. Prior to using measures of TWW, WSC, MLTU, and SI for concurrent validity purposes, I examined their statistical performance with this data source to further determine their effectiveness as criterion measures. This included calculating the means and standard deviations of each of these scores. I then ran a discriminant analysis to determine the ability of these measures to predict grade. Finally, I calculated correlations between checklist scores and scores of TWW, WSC, MLTU, and SI to determine the concurrent criterion-related validity of the checklist scores.

Summary

The development and evaluation of a checklist to assess cohesion in writing involved many steps, utilizing 312 CBM writing samples as the data source. The preliminary development of the checklist involved the compilation of checklist items, panel reviews, a preliminary item analysis, and a pilot interrater agreement study. The evaluation of the checklist involved classical item analysis; reliability checks for internal consistency and interrater agreement; and validity checks including a factor analysis, a discriminant analysis, and a concurrent criterion-related validity study.

CHAPTER FOUR: RESULTS

This chapter presents the results of the checklist development and evaluation procedures. The first section describes the outcome of the preliminary development of the instrument, including the results of the panel reviews, the preliminary item analysis, and the pilot study for interrater agreement. The next section reports the results of the large scale sample evaluation of the checklist. This section includes the outcomes of three item analyses as well as the results of validity and reliability checks. The results of a second interrater study are also included here.

Preliminary Development of the Instrument

Initial Compilation of Items

Appendix D contains the preliminary draft of the checklist that I compiled based on the research literature. I will call this version Checklist 1.0. This was the form of the checklist presented to the two panels for review.

Panel Reviews

The majority of comments from the first panel review focused on ambiguity in the wording of the items, overlap of item content, and the weighting of items relative to the table of specifications. The second panel focused its attention on the ease of scoring and the clarity of examples and instructions. I considered the feedback from both panel reviews in conjunction with the aim of my research and that of the instrument. As a result of these reviews, I made adjustments to some of the items. These included minor changes in the wording of some items and a switch in the order of presentation of two items to clarify the scoring procedure. The panels also flagged some additional items as potentially problematic, but I left these unchanged pending the outcome of the item analysis. I also changed the format of the manual based on suggestions made by the panels. These changes included the addition of a brief explanation of cohesion in the introduction to the manual and the inclusion of a section defining key terms. I also developed a scoring companion that provided point-form scoring explanations and examples for quick reference.
The panel reviews concluded with a second version of the checklist, called Checklist 1.1 (see Appendix E for a copy of this version). Checklist 1.1 was used in the preliminary item analysis.

Preliminary Item Analysis

Table 2 shows the results of the classical item analysis conducted on Checklist 1.1 using the 20 scored writing samples. Item analysis is defined as the computation and examination of the statistical properties of responses to individual test items to permit the selection of items on an instrument (Crocker & Algina, 1986). The ITEMAN (1994) analysis program used in this study generates three main item statistics. The first, proportion correct, indicates how many writing samples received credit on each checklist item; this indicates the difficulty of the item. Values above .85 indicate easy items, while values at or below .49 indicate difficult ones (Sax, 1997). Items that are either too easy or too difficult are not desirable to retain on an assessment instrument because they provide little information about the individuals being evaluated. That is, if everyone scores the same on an item, the item does not discriminate among writers.

Table 2
Item Statistics from the Preliminary Item Analysis (proportion correct, discrimination index D, and point biserial rpb for checklist items 1 through 27; * p < .05)

Another item statistic, the discrimination index, indicates "the extent to which items differentiate between those persons with highest and lowest scores on the Total test" (Sax, 1997, p. 240). Positive discrimination values show that more examinees who scored high on the instrument received credit for that item than examinees who scored low. Values between 0.31 and 1.0 indicate good discrimination, values between 0.10 and 0.30 indicate fair discrimination, and values of 0.09 and below indicate poor discrimination (Sax, 1997). A third statistic calculated by this program is the point biserial, which shows how well the score on an item relates to overall performance on the instrument (Crocker & Algina, 1986). The same approach used in examining any correlation applies to the examination of point biserial correlations: both the size of the relationship and its statistical significance require evaluation.
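The three statistics just described can be illustrated with the following sketch, which computes them from a 0/1 item-score matrix. It is an illustration of the underlying formulas (using the common upper/lower 27% grouping for the discrimination index), not a reproduction of ITEMAN's implementation, and the data are randomly generated.

# Classical item statistics from a 0/1 score matrix (rows = samples, columns = items).
import numpy as np

scores = np.random.default_rng(1).integers(0, 2, size=(20, 27))   # toy data
totals = scores.sum(axis=1)

def item_stats(j, upper=0.27):
    p = scores[:, j].mean()                          # proportion correct (difficulty)
    k = max(1, int(round(upper * len(totals))))      # size of the top/bottom groups
    order = np.argsort(totals)
    low, high = order[:k], order[-k:]
    d = scores[high, j].mean() - scores[low, j].mean()   # discrimination index
    r_pb = np.corrcoef(scores[:, j], totals)[0, 1]       # point biserial (item-total r, uncorrected)
    return p, d, r_pb

print([round(x, 2) for x in item_stats(0)])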
This analysis showed that 11 of the 27 items discriminated poorly. Of these, 6 items discriminated poorly because none of the writing samples received credit for them. Of the remaining 5 poorly discriminating items, 2 (items 13 and 20) also had low proportions correct. The remaining 3 items (items 19, 23, and 24) were considered problematic due to nonsignificant point biserial correlations (rpb(18) < .38, p > .10) and low or negative discrimination values that could not be explained by low proportions correct. The majority of the items demonstrated point biserial values that were not statistically significant (p > .05). I attributed this largely to the small sample size (n = 20) used in this portion of the study. As the results were used only as a preliminary indicator of potentially problematic items, I did not give great weight to the level of statistical significance at this stage.

Although these results were interpreted cautiously because of the small sample size, they prompted me to make some changes to the checklist. These changes reflected the results of the preliminary item analysis as well as notes from the panel reviews. In particular, I changed items 24 through 27 substantially. Checklists 1.0 and 1.1 contained three items that evaluated the sophistication of organization achieved through the use of paragraph or paragraph-like structures; after the preliminary item analysis, these items were collapsed into a single item, with the content of the revised item reflecting the kinds of paragraph structures more likely to be found in a narrative writing sample. In addition, I added two more items to the checklist that evaluated lexical cohesion in a more specific manner. These changes resulted in a new version of the instrument, Checklist 1.2. The reader is referred to Appendix F for this version of the checklist.

Pilot Study for Interrater Agreement

In this interrater study, 12 raters scored 10 interrater writing samples using Checklist 1.2. The interrater samples were randomly chosen from the data source of 342 writing samples collected. Table 3 displays the mean proportion of agreement among raters for each item across writing samples. I arrived at these figures by calculating, for each writing sample, the proportion of raters giving the same score (one or zero) on a given item, and then averaging across the ten samples. The table shows a large degree of variability in the scoring of items, as indicated by the varied means and large standard deviations of these proportions. Agreement ranged from 69.02 percent to 100 percent. I interpreted this variability in scoring as reflecting ambiguities in some of the checklist's items. I considered items with less than 85% agreement among the raters to be problematic. This proportion of agreement was chosen as a cut-off because it indicated that a mean of 11 out of 13 raters agreed on the scoring of that item across writing samples. As this indicates a clear majority of raters agreeing, it was taken as an indicator that the checklist item was not ambiguous.

Table 3
Mean Percentage Agreement Among Raters Across Items

Item No.   M (SD)            Item No.   M (SD)            Item No.   M (SD)
1          83.18 (12.43)     10         96.59 (5.90)      19         95.76 (4.48)
2          72.36 (13.20)     11         98.33 (3.51)      20         95.83 (10.58)
3          87.20 (17.85)     12         100.00 (0.00)     21         97.42 (4.15)
4          75.98 (18.58)     13         96.59 (8.11)      22         93.11 (11.41)
5          81.52 (17.91)     14         95.76 (10.61)     23         92.42 (12.08)
6          69.02 (11.31)     15         83.18 (17.83)     24         93.26 (6.60)
7          91.36 (14.48)     16         94.77 (7.36)      25         83.03 (14.71)
8          93.11 (12.69)     17         91.09 (10.61)     26         96.59 (5.90)
9          99.17 (2.63)      18         84.45 (13.28)     27         95.00 (10.54)

Note. Item No. = checklist item number; M = mean; SD = standard deviation.

The results of this interrater study indicated that items 2, 4, and 6 had particularly low levels of agreement. Items 1, 5, 15, 18, and 25 were also flagged as problematic by these results.

Revisions to Checklist Content and Process

I made several changes to the checklist as a result of the preliminary item analysis and pilot interrater studies.
I considered the results from these analyses in conjunction with feedback from the panels and raters, as well as with the findings of the research reviewed in the literature. The changes I made included combining, deleting, and expanding some items. This was done to reduce ambiguities, to eliminate redundancies, and to reduce the number of items that were either too difficult or too easy. Other changes included adding definitions and examples to the instruction manual and scoring companion. I also made some changes to the format of the checklist, such as dividing it into subtest sections for ease of scoring and improved appearance. In addition, I added a section to the manual describing the method of preparing a writing sample for scoring. This included criteria for the writing samples with which the checklist may be used and instructions to use T-units rather than sentences as the basis for analysis. The result of these combined findings was another version of the checklist containing 25 items, called Checklist 2.0 (see Appendix G).

Evaluation Using a Large Scale Sample

This section presents the results generated after I scored, using Checklist 2.0, all 312 writing samples that met the criteria for inclusion in the study. Of the 30 samples that did not meet the criteria for inclusion, thirteen were from Grade 4, nine from Grade 5, five from Grade 6, and three from Grade 7. Twenty-three of the excluded samples were generated by children with special designations: students designated ESL/D wrote 6 of the excluded samples, students designated SLR wrote 4, students designated LD wrote 11, and students designated as Other wrote 2. Several samples of the writing used in this portion of the study are included in Appendix C; these include good and poor examples of writing as well as typical samples for each grade. The information reported in this section includes three classical item analyses using the ITEMAN (1994) software. In addition, I report the data that provide evidence for checklist validity and reliability. The results of the final interrater study are also included in this section.

Item Analysis

The mean total score on the 25-item checklist was 7.69, with a standard deviation of 2.20. The minimum score was 2 and the maximum was 16, with a median of 8. Results of the item analysis are displayed in Table 4. All items had positive discrimination indices. Items with poor discrimination indices and low point biserial correlations were considered for revision. I eliminated two items (21 and 25) from the checklist due to poor discrimination. I removed another two items (22 and 23) because they were too easy. Item 24 was also deleted, as it was the only other item remaining on that subtest. I combined the remaining items with low proportions correct with other items of related content.

Table 4
Item Statistics from the Analysis of Checklist 2.0 (proportion correct, discrimination index D, and point biserial rpb for checklist items 1 through 25; * p < .05, ** p < .01, *** p < .001)
Taking into consideration the deleted and combined checklist items, the scoring results were adjusted to produce another composition of items. This composition formed Checklist 2.1, and I ran another item analysis to evaluate it. This form of the checklist consisted of 13 items divided across three subtests, called Reference, Conjunction, and Lexical Cohesion. This version of the checklist is displayed in Appendix H. On this classical item analysis, the mean total score on the checklist was 4.92, with a standard deviation of 1.92. Mean scores (SD) for the individual subtests were 2.16 (1.24), 1.99 (1.18), and 0.77 (0.62) for Reference, Conjunction, and Lexical Cohesion respectively. The minimum score for Checklist 2.1 was 0, while the maximum was 12, with a median of 5. The individual item statistics are displayed in Table 5. The new combination of items improved the proportion correct on almost all items of the checklist.

Table 5
Item Statistics from the Item Analysis of Checklist 2.1

Item No.   Prop. Corr.   D     rpb          Item No.   Prop. Corr.   D     rpb
1          .66           .36   .29***       8          .13           .14   .22***
2          .51           .46   .45***       9          .57           .29   .32***
3          .20           .20   .26***       10         .37           .34   .36***
4          .55           .50   .47***       11         .29           .21   .29***
5          .25           .25   .28***       12         .14           .21   .25***
6          .52           .24   .28***       13         .62           .37   .25***
7          .12           .18   .22***

Note. Item No. = item number; Prop. Corr. = proportion correct; D = discrimination index; rpb = point biserial. *** p < .001.

The discrimination index on all but two items now fell between .20 and .50. Point biserial correlations for all items ranged from .22 to .45 (significant at p < .001). A final item analysis was conducted on the 13-item form (Checklist 2.1) to compare individual item performance to the other items in the same subtest rather than to the total test. These results are displayed in Table 6. On this run, the discrimination indices for all but two items ranged from .21 to .93, and nine of the items now had a discrimination index over .40. Point biserial correlations showed similar improvement, with values now ranging from .25 to .82 (significant at p < .001). The performance of almost all checklist items was improved by evaluating them as components of subtests rather than of the total checklist. The only exceptions were items 3 and 7, which showed little or no improvement with this analysis.

Table 6
Item Statistics from the Item Analysis Run Using Subtests

Subtest   Item No.   Sub Item   Prop. Corr.   D     rpb
REF       1          1-1        .66           .72   .64
REF       2          1-2        .51           .88   .72
REF       3          1-3        .20           .21   .25
REF       4          1-4        .55           .44   .44
REF       5          1-5        .25           .54   .59
CON       6          2-1        .52           .63   .54
CON       7          2-2        .12           .17   .31
CON       8          2-3        .13           .18   .27
CON       9          2-4        .57           .56   .50
CON       10         2-5        .37           .60   .54
CON       11         2-6        .29           .51   .49
LEX       12         3-1        .14           .22   .62
LEX       13         3-2        .62           .93   .82

Note. REF = Reference subtest; CON = Conjunction subtest; LEX = Lexical Cohesion subtest; Item No. = checklist item number; Sub Item = item number within the subtest; Prop. Corr. = proportion correct; D = discrimination index; rpb = point biserial.

Scale intercorrelations were also calculated. Correlations between the Reference subscore and the Conjunction and Lexical Cohesion subscores were r = .03 and r = .10 respectively, and the correlation between the Conjunction and Lexical Cohesion subscores was r = .06. These values indicated no relationship among the three subscales of the checklist. Pearson's r was also calculated between the subtest scores and the total test score. Correlations of the subtest scores with the total test score were r = .70, .67, and .42 for the Reference, Conjunction, and Lexical Cohesion subscores respectively. All of these correlations were significant (p < .001).
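The scale-level correlations reported above can be reproduced on any set of subscores with a few lines of code; the sketch below assumes the per-sample Reference, Conjunction, and Lexical Cohesion subscores are available as equal-length lists (the values shown are invented, not the study data).

# Subscale intercorrelations and subscore-to-total correlations.
from scipy.stats import pearsonr

ref = [3, 2, 4, 1, 2, 3, 2, 4, 3, 1]
con = [2, 2, 3, 1, 2, 2, 1, 3, 2, 2]
lex = [1, 0, 1, 0, 1, 1, 0, 2, 1, 0]
total = [r + c + l for r, c, l in zip(ref, con, lex)]

for name, sub in (("REF", ref), ("CON", con), ("LEX", lex)):
    r, p = pearsonr(sub, total)
    print(f"{name} vs Total: r = {r:.2f} (p = {p:.3f})")
r, p = pearsonr(ref, con)
print(f"REF vs CON: r = {r:.2f} (p = {p:.3f})")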
Checklist Reliability

The internal consistency of the overall checklist was α = 0.32, with a standard error of measurement (SEM) of 1.58. The internal consistency values for the subtests were α = .39 for Reference, α = .22 for Conjunction, and α = .10 for Lexical Cohesion.

Interrater study. This section reports the results of the interrater study conducted with the 13-item Checklist 2.1, using myself and three other raters. The proportion of agreement for total scores was not reported because there were only four raters; proportions would therefore only reflect agreement levels of 0, 25, 50, 75, and 100 percent, and I felt these increments were too large to be discerning. I did, however, calculate the proportion of agreement on an item-by-item basis across writing samples to determine on which items raters most frequently disagreed. To establish interrater reliability, I calculated correlations between pairs of raters and ran a one-way analysis of variance (ANOVA). Means and standard deviations were calculated for the 12 interrater checklist scores generated by each rater. The results showed mean scores of 5.75 (SD = 2.13), 5.83 (SD = 2.14), 4.75 (SD = 1.85), and 6.00 (SD = 2.67) for raters one, two, three, and four respectively; I was rater number four. While the mean scores and spread of scores were similar for raters one, two, and four, rater three's scores were slightly lower, with less variability, indicating more stringent marking by this rater.

Table 7 displays the Pearson product-moment correlations calculated between pairs of raters. The correlation coefficients between raters ranged from .70 to .91 and were significant at p < .05. The three lowest correlations all involved rater three; coefficients between raters one, two, and four ranged from .86 to .91.

Table 7
Correlations Between Rater Pairs Across Items

Rater   1   2            3            4
1       -   .86 (.000)   .75 (.005)   .88 (.000)
2       -   -            .70 (.011)   .91 (.000)
3       -   -            -            .71 (.009)

Note. p values in parentheses.

Levene's test of homogeneity of variance among raters' scores generated an F statistic of .507 (3, 44) at a significance level of p = .68, revealing that the assumption of homogeneity was met. As can be seen in Table 8, a one-way ANOVA revealed no significant difference between raters.

Table 8
Analysis of Variance for Between-Rater Differences

Source          Sum of Squares   df   Mean Square   F      p
Among Raters    11.500           3    3.833         .479   .699
Within Raters   352.167          44   8.004
Total           363.667          47

The examination of the proportion of agreement on individual items across samples showed which items may still be ambiguous and therefore causing lower agreement. As can be seen in Table 9, agreement among raters across checklist items ranged from 75.00 to 97.92 percent. I considered percentages below 80 on a single item to be problematic. I selected this criterion because it indicated that a mean of more than three out of four judges agreed on the score of that item across all 12 checklists. Two items (items 5 and 13) showed less than 80% agreement.

Table 9
Proportion of Agreement Across Writing Samples for Each Item

Item No.   M (SD)           Item No.   M (SD)           Item No.   M (SD)
1          87.50 (16.85)    6          87.50 (19.94)    11         95.83 (9.73)
2          83.33 (12.31)    7          97.92 (7.22)     12         93.75 (15.54)
3          85.42 (22.51)    8          93.75 (15.54)    13         75.00 (21.32)
4          85.42 (16.71)    9          87.50 (19.94)
5          79.17 (20.87)    10         95.83 (9.73)

Note. M = mean; SD = standard deviation.
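The reliability checks reported in this section (pairwise correlations, Levene's test, and the one-way ANOVA) can be computed as in the following sketch, which assumes each rater's twelve total scores are available as a list. The scores shown are invented, and scipy is used here only as a stand-in for the statistical package actually used in the study.

# Pairwise Pearson correlations, Levene's test, and one-way ANOVA across raters.
from itertools import combinations
from scipy.stats import pearsonr, levene, f_oneway

raters = {
    1: [5, 7, 3, 6, 8, 4, 5, 6, 7, 4, 6, 8],
    2: [5, 8, 3, 6, 7, 4, 6, 6, 7, 4, 6, 8],
    3: [4, 6, 2, 5, 7, 3, 5, 5, 6, 3, 5, 6],
    4: [6, 8, 3, 7, 8, 4, 6, 7, 7, 5, 6, 9],
}

for a, b in combinations(raters, 2):
    r, p = pearsonr(raters[a], raters[b])
    print(f"raters {a}-{b}: r = {r:.2f} (p = {p:.3f})")

print("Levene:", levene(*raters.values()))
print("ANOVA:", f_oneway(*raters.values()))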
Validity Measures

The validity measures reported here include the results of a factor analysis and a discriminant analysis, as well as the results of the concurrent criterion-related validity study.

Factor analysis. I attempted a principal component analysis using the 13 items from Checklist 2.1. This was done to determine whether the checklist items loaded onto components reflecting the instrument's subtests. A principal component extraction followed by Varimax rotation generated six factors with eigenvalues greater than 1; this model accounted for only 59.83% of the total variance. A three-component solution with Oblimin rotation and Kaiser normalization was also attempted, but this solution accounted for only 35.30% of the total variance. The three-factor model accounted for all variables, with loadings ranging from -.301 to .808. The loadings for each component are displayed in Table 10.

Table 10
Variable Loadings for a Three Component Model (loadings of the 13 checklist items on the three extracted components)

To determine why the factor analysis accounted for a limited amount of variance, I examined the item-by-item correlations, which are displayed in Table 11. Tabachnick and Fidell (1996) indicate that in order for a factor analysis to work, the correlation matrix should include "sizable" correlations, which they define as values greater than .30. As can be seen by examining Table 11, very few inter-item correlations were present, and only two values exceeded .30. All reported r values were significant (p < .05).

Table 11
Matrix Displaying Item-by-Item Correlations

Item No.   1      2      3      4      5      6      7      8      9      10     11     12
2          .45
3          -      -
4          -      -      -
5          .19    .44    -.12   -
6          -      -      -      -      -.19
7          -      .12    -      .13    .11    -
8          -      -      -      .11    -      -      .10
9          -      -      -      -      -      .18    -      -
10         -      -      .13    .13    -      -      -      -      -
11         -      -      -      -      -.12   -      -      -      -      .21
12         -      -      -      .10    -      -      -      -      -      -      -
13         -      -      -      .17    -      -      -      -      -      -      -      -

Note. - = r values both nonsignificant and less than .10.

Discriminant analysis. I also ran a discriminant analysis to determine whether checklist scores could predict grade membership or special learning designation. I first calculated descriptive statistics for each grade's scores on the checklist. As can be seen in Table 12, although there was a general growth trend across grades in the total checklist score, there was little difference in the scores for Grades 4, 5, and 6. Table 13 shows more scattered patterns of growth in the subtest scores; only Lexical Cohesion shows a steady improvement in scores across grades.

Table 12
Statistics Describing Checklist Total Scores by Grade

Grade   n    Range   Median   M      SEM
4       71   0-9     4        4.46   0.22
5       67   0-10    5        4.76   0.20
6       84   1-9     5        4.67   0.21
7       89   1-12    6        5.64   0.23

Note. M = mean; SEM = standard error of measurement.

Table 13
Statistics Describing Checklist Subscores by Grade

Subtest   Grade   M      SD     SEM
REF       4       2.16   1.35   0.16
REF       5       1.97   1.18   0.14
REF       6       2.18   1.28   0.14
REF       7       2.30   1.18   0.13
CON       4       1.73   1.10   0.13
CON       5       2.00   1.10   0.14
CON       6       1.70   1.17   0.13
CON       7       2.19   1.30   0.14
LEX       4       0.58   0.58   0.07
LEX       5       0.69   0.61   0.07
LEX       6       0.70   0.62   0.07
LEX       7       1.10   0.69   0.07

Note. REF = Reference subtest; CON = Conjunction subtest; LEX = Lexical Cohesion subtest; M = mean; SD = standard deviation; SEM = standard error of measurement.

Despite the small differences between grades on the total checklist score, results from the discriminant analysis indicated that checklist total scores predicted grade. Similarly, the checklist subscores of Lexical Cohesion and Conjunction also predicted grade; Reference, however, did not (see Table 14). Scores on the checklist did not predict any special learning designation, likely because of the small numbers represented in these categories.

Table 14
Tests of Equality of Group Means for Checklist Scores

Subtest       Wilks' Lambda   F       df1   df2   p
REF           0.991           0.917   3     306   .433
CON           0.959           4.411   3     306   .005
LEX           0.911           9.992   3     306   .000
Total Score   0.938           6.698   3     306   .000
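For readers who wish to replicate the grade-prediction analysis, the sketch below uses scikit-learn's LinearDiscriminantAnalysis as a stand-in for the statistical package used in this study; it shows only the classification side of the analysis and does not produce the Wilks' Lambda tests reported in Table 14. All data are simulated, and the upward drift built into the toy subscores is an assumption made purely for illustration.

# Toy discriminant analysis predicting grade from the three subscores.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
grades = np.repeat([4, 5, 6, 7], 25)
# simulated (REF, CON, LEX) subscores that drift slightly upward with grade
X = rng.normal(loc=np.column_stack([np.full(100, 2.1),
                                    1.5 + 0.2 * (grades - 4),
                                    0.5 + 0.2 * (grades - 4)]), scale=1.0)

lda = LinearDiscriminantAnalysis()
lda.fit(X, grades)
print("classification accuracy on the simulated data:", lda.score(X, grades))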
Concurrent criterion-related validity. I used the measures of writing fluency (WSC and TWW) and syntactic complexity (MLTU and SI) as indicators of concurrent criterion-related validity. These measures were calculated and recorded for each writing sample. I took two steps in determining the adequacy of these measures for detecting improvements in writing with grade within this data set. First, I calculated the means and standard deviations for each measure by grade. The results are presented in Table 15. The data show that measures of SI, MLTU, WSC, and TWW increased with grade.

Table 15
Means and Standard Deviations of TWW, WSC, MLTU and SI by Grade

Writing    Grade 4         Grade 5         Grade 6         Grade 7
Measure    M (SD)          M (SD)          M (SD)          M (SD)
WSC        34.24 (13.46)   41.16 (14.53)   45.80 (15.47)   57.62 (14.47)
TWW        37.00 (13.33)   43.54 (14.49)   48.83 (15.22)   59.58 (13.76)
MLTU       7.88 (2.52)     7.94 (2.38)     8.87 (3.03)     9.53 (3.06)
SI         1.18 (0.26)     1.19 (0.27)     1.25 (0.38)     1.23 (0.21)

Note. M = mean; SD = standard deviation.

Next, I used the results of a discriminant analysis to determine if scores of WSC, TWW, MLTU, and SI were able to predict grade. A Wilks' Lambda test of equality of group means was conducted with all four scores. The results are presented in Table 16.

Table 16
Tests of Equality of Group Means for Writing Measures and Grade

Writing Measure   Wilks' Lambda   F        df1   df2   p
TWW               .733            37.195   3     306   .000
WSC               .729            38.007   3     306   .000
MLTU              .942            5.976    3     293   .001
SI                .989            1.098    3     293   .172

Note. WSC = words spelled correctly; TWW = total words written; MLTU = mean length of T-unit; SI = subordination index.

Results showed that MLTU, TWW, and WSC were able to predict grade membership, while SI was not.

I then calculated correlations between the scores on the checklist and the writing fluency measures of TWW and WSC and the syntactic complexity measures of MLTU and SI using Pearson product moment correlations. As can be seen by examining Table 17, the relationships between the fluency measures and the total checklist scores, Conjunction subscores, and Lexical Cohesion subscores were significant (p < .001). These relationships had medium effect sizes. The relationships between MLTU and the Total Test scores and Reference subtest scores were also significant (p < .05). Though these relationships were small, according to Cohen (1992) correlations between .1 and .3 denote a small but not trivial effect. SI showed no significant correlation to any of the subtest scores.

Table 17
Correlations Between Checklist Scores and Concurrent Measures

Score   WSC            TWW            MLTU           SI
REF     -.014 (.808)   -.005 (.929)   .203 (.000)    .041 (.477)
CON     .390 (.000)    .393 (.000)    -.065 (.259)   .034 (.558)
LEX     .335 (.000)    .336 (.000)    .173 (.003)    .040 (.487)
Total   .338 (.000)    .346 (.000)    .146 (.012)    .061 (.296)

Note. Total = total checklist score. p values in parentheses.
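The concurrent criterion-related comparisons above reduce to a table of Pearson correlations between checklist scores and the fluency and complexity measures, read against Cohen's (1992) effect-size benchmarks. A minimal sketch follows, assuming the per-sample scores sit in a data frame with the illustrative column names used below; these names are an assumption, not the study's actual files.

```python
# Hedged sketch: correlating checklist scores with TWW, WSC, MLTU and SI and
# labelling each coefficient with Cohen's (1992) effect-size bands.
import pandas as pd
from scipy import stats

def effect_size_label(r: float) -> str:
    r = abs(r)
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "trivial"

def concurrent_validity(df: pd.DataFrame) -> pd.DataFrame:
    """Correlate each checklist score with each criterion measure (cf. Table 17)."""
    rows = []
    for score in ["REF", "CON", "LEX", "Total"]:
        for measure in ["WSC", "TWW", "MLTU", "SI"]:
            r, p = stats.pearsonr(df[score], df[measure])
            rows.append({"score": score, "measure": measure,
                         "r": round(r, 3), "p": round(p, 3),
                         "effect": effect_size_label(r)})
    return pd.DataFrame(rows)
```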
Summary

This chapter described the results of the steps taken in developing and evaluating an instrument for assessing cohesion in the writing of children. In the preliminary developmental stage of this study, findings from qualitative analyses of the checklist were combined with data demonstrating the statistical performance of items and a pilot interrater study. I used these findings to make changes to the checklist before attempting a large sample field test of the instrument.

In the second portion of this study, the large scale sample evaluation of the checklist, I performed three more item analyses to examine the statistical performance of the Checklist 2.0 and 2.1 versions, as well as to examine the relationship of items to subtests. I also gathered data reflecting the reliability of the instrument. These included measures of internal consistency and interrater reliability and agreement. Additionally, I gathered data to contribute evidence for the validity of the instrument. These included a factor analysis, a discriminant analysis, and a concurrent criterion-related validity study. Evaluation of and implications for these results are discussed in Chapter Five.

CHAPTER FIVE: DISCUSSION

The process used in this study for the development of an instrument to measure cohesion followed that which was outlined by Crocker and Algina (1986) and reported in Chapter Three. The remainder of this paper focuses on discussion of the outcomes of this process. The first topic to be discussed is that of the changes that were made to the checklist through the course of this investigation. The next area to be addressed will be the interpretation of the results of the reliability and validity studies. The validity section will include a discussion and interpretation of the concurrent criterion-related findings, the interpretation of subtest scores as it pertains to construct validity, and the discrimination ability of the checklist's scores. This chapter will conclude with a discussion of the limitations of this study and implications for future research and practice. The section discussing the implications for future research contains recommendations for further modifications to the content and form of Checklist 2.1 for prospective development of the instrument. This section also includes some suggestions for future reliability and validity studies.

Summary of the Checklist Item Development

The preliminary form of this checklist (Checklist 1.0) consisted of 27 items grouped under the four subheadings of Reference, Conjunction, Lexical Cohesion, and Global Cohesion. Through the process of the preliminary development and large scale evaluation, the checklist underwent several revisions, with the final form consisting of only 13 items and 3 subtests (Checklist 2.1). The changes made consisted of eliminating the subsection on Global Cohesion, combining items in the Conjunction subsection, and deleting and combining Lexical Cohesion items.

Global Cohesion was no longer part of Checklist 2.1 even though aspects of global cohesion play a role in developing a coherent and cohesive piece of writing (McCutchen & Perfetti, 1982; Smith, 1999). Several factors contributed to the decision to eliminate this subsection of the test. The results of the item analysis indicated that almost all writing samples in this study received credit on the items for sequential organization and consistent use of tenses. Conversely, very few writing samples received credit for the use of paragraphs. Given the length of the writing samples and the timed nature in which they were administered, the lack of paragraph structures was not surprising. Many of the samples were not long enough to warrant more than one paragraph. For those that were, with only one sheet of paper on which to write and three minutes to complete the writing task, time and space constraints may have played a role in reducing the tendency to use paragraphs.
Similarly, the limited length of the writing samples may have reduced the likelihood of errors with consistent tense use. Due to their high or low proportions of credit, the items regarding sequential organization, consistency of tenses, and use of paragraphs were considered too easy or too difficult. Items that are too easy or too difficult are not effective in discriminating among students (Sax, 1997). As most writing samples scored the same on these items, the information was not felt to be helpful in determining differences between the abilities of individual students to use cohesive devices. Subsequently, they were removed from the checklist. The remaining item regarding implied causal relationships was eliminated despite its adequate performance on the item analysis, as it did not seem logical to retain one item in a subtest.

These Global Cohesion items may have been more important if different or longer writing samples had been used or if a different genre of writing had been attempted. McCutchen and Perfetti (1982) indicate that even the most immature writers have knowledge of the narrative form and that knowledge of the text form contributes to the overall coherence achieved in writing. Perera (1984) concurs that when children begin to write coherent texts, they have more success with chronologically ordered texts. As these overall organizational patterns have not been found to present problems in children's narrative writing, it is not surprising that the items from this portion of the checklist did not reveal many differences between writers. It may be that these items reflecting global cohesion would be more useful in discriminating between writers if the text form were not as familiar. This aspect of cohesion may be more informative when examining expository, persuasive, or descriptive pieces of writing.

The Conjunction subsection was another area that was changed substantially through the course of this study. Interestingly, the results of the item analysis supported the findings of other researchers regarding the common occurrence of the conjunctions 'and,' 'then,' and 'so' in children's writing. These were clearly the most frequently occurring conjunctions, as indicated by the proportion of writing samples receiving credit for these items. In fact, most other forms of temporal, additive, and causal conjunctions occurred rarely. Consequently, I made the decision to collapse items for individual conjunctions into larger category groupings. These combinations resulted in improvements in the discrimination of most items. Two items, however, remained problematic. These items reflect the use of subordinating conjunctions used to show temporal connections between sentences and clauses. The poor discrimination indices on these items, however, are likely due to the relatively low proportion of use of such conjunctions by the writers in this sample. These items were still considered valuable to the checklist, as their deletion would result in a loss of information regarding the use of temporal conjunctions, which have been noted to be common in the writing of children (Crowhurst, 1987; Perera, 1984; Smith, 1999). Furthermore, removal of these items would result in a gap in the kinds of conjunctive cohesion measured by this instrument. These items may be more valuable in evaluating cohesion in more typical curricular writing as well as in different writing genres.
Perera (1984), for instance, found that children did not use subordinating conjunctions as much in story writing as they did in other kinds of writing tasks.

The final area of modification occurred in the subsection of Lexical Cohesion. Converses and antonyms rarely appeared in the writing samples, so the item reflecting these types of cohesion was deleted. Superordinates, synonyms, and near-synonyms also appeared relatively infrequently. However, it was found that by combining the two items regarding this form of cohesion, the discrimination index was improved with limited loss of information. The item now reflected whether or not a student was using this form of cohesion, but did not reflect the 'closeness' of the ties. As the writing samples used here were quite short, the distance between ties was rarely great. As physical proximity impacts the degree of cohesive bond formed (Halliday & Hasan, 1976), in a longer sample this combined item might be problematic.

All other items on the checklist functioned in an acceptable manner as judged from the item analysis and subsequently were left unchanged in Checklist 2.1. It should be noted, however, that two problems became apparent during the scoring process. These problems primarily affected the scoring of the Reference subsection. One problem involved item 5, which reflected the use of cohesive devices on a sentence-by-sentence basis. In most cases, these items were not problematic. However, the scoring of this item was impacted by shifts in the story. It was the observation of this researcher that when there was a shift in story events, an anaphoric reference between the last sentence of one segment and the first sentence of the next was not warranted. Therefore, writing samples that contained shifts in time, place, or speaker may have been inadvertently penalized on this item.

The other problem related to the use of stories told in first person voice. As this instrument focused on endophoric forms of reference (reference within the text) and the referent for 'I' is exophoric (external to the text), repetitions of the first person pronoun 'I' were not given credit as forms of reference. Stories containing only first person pronouns would not receive credit on the first two checklist items and would consequently receive lower scores. It was rare, however, to find writing samples that contained no examples of third person pronouns. These problems do not affect the scoring of individual writing samples but may impact the use of the checklist for cross-student comparisons. Caution should be taken when comparing texts that are written in first and third person voices, or in comparing stories with dialogue or other shifts in events with stories that do not contain such elements.

Findings for Reliability and Validity

Interpretation of the findings of this study leads to many important considerations and conclusions beyond the statistical performance of checklist items. Evaluation of an assessment instrument does not end with the analysis of the items. Reliability and validity of an instrument also require examination. These areas are addressed in the next section.

Reliability

I assessed reliability by examining internal consistency and interrater reliability and agreement. The alpha levels provided an indication of the internal consistency of the instrument. Although an alpha of .32 is low, this may be a function of the checklist length. Sax (1997) indicates that reliability increases with the number of items on a test.
Therefore, a reliability measure of .32, given that there were only 13 items on the checklist, may be considered reasonable. Similarly, the alpha levels for each of the subtests were not very high. Again, each subtest consisted of only a small number of items. The Conjunction subscore showed particularly low consistency among its items, as reflected by its low alpha (.22) and large SEM (1.05). As mentioned earlier, two items on the Conjunction subtest were still performing poorly on the item analysis, and this may have affected the internal consistency of this subtest. Lexical Cohesion also demonstrated a low alpha, but this figure is difficult to interpret given that there were only two items on this last section of the instrument. The issue of checklist length may have been further compounded by the limited length of the writing samples.

Several factors may have impacted the internal consistency of this instrument by reducing the variability of the scores. A reduction in the variability of the sample will result in a reduction in the reliability findings (Sax, 1997). One way variability in scoring is reduced is with increased item difficulty, as many writing samples will receive the same score on a difficult item (Sax, 1997). According to the item analysis results, 7 of the 13 checklist items are considered difficult. With so many items showing up as difficult, the overall variability of scores may have been compromised. Furthermore, an examination of the differences in overall test and subtest scores revealed that mean scores did not vary greatly across grades. Similarly, the small number of items on each subtest further reduced the scoring variability.

The internal consistency of the 13-item Checklist 2.1, as found in this study, was weak. Reliability could be improved by including more easy items and increasing the number of items overall. Increasing the number of items on which the checklist is scored could also be accomplished by scoring more than one writing sample and pooling the results, rather than increasing the number of checklist items. Furthermore, as checklist scores reflected the samples that were used to test the instrument, internal consistency scores may be different with different writing samples.

Further evidence regarding the checklist's reliability was provided through examination of interrater reliability and agreement. Review of the literature did not provide absolute guidelines as to how much interrater reliability or agreement is considered adequate. I considered the overall levels of interrater reliability attained for Checklist 2.1 adequate due to the lack of significant differences among raters as determined by the one-way ANOVA and the correlations between rater pairs. However, it is important to note that the process of training raters for this interrater study was paramount in establishing agreement. The amount of interrater reliability and agreement required depends on the uses of the instrument. Higher levels than were attained here may be desired if an examiner wished to compare checklist scores to those obtained by another examiner.

In addition to providing information about the reliability of the measure, checks of interrater agreement also provided some insight as to which checklist items could still be considered ambiguous. Two items (item 5 and item 13) demonstrated noticeably lower levels of agreement than others and may warrant some further editing for the purposes of clarification.
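As an illustration of the point that reliability depends on test length, the sketch below computes coefficient alpha from a samples-by-items matrix of 0/1 scores and then applies the Spearman-Brown prophecy formula to project the reliability of a lengthened checklist. Both formulas are standard; the doubled-length projection of roughly .49 from an alpha of .32 is a back-of-the-envelope illustration, not a result of this study.

```python
# Hedged sketch: coefficient alpha for dichotomous items and a Spearman-Brown
# projection of reliability for a lengthened instrument.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: samples x items matrix of 0/1 item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability if the test were length_factor times as long."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a 13-item checklist with alpha = .32 (hypothetical projection):
print(round(spearman_brown(0.32, 2.0), 2))  # about .49
```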
Validity

Studies of an instrument's validity provide evidence that the instrument is measuring the construct it claims to measure. This may be accomplished in several ways. Evidence for criterion-related validity demonstrates that the scores on an instrument correlate with some related external criteria. Evidence for discriminant validity demonstrates the difference between what the test measures and different constructs. A factor analysis can also support construct validity by providing evidence for the relationships between items that reflect a single construct. Demonstration that the construct is important and can be measured also supports arguments for validity.

Concurrent criterion-related validity. Prior to conducting the validity study, I examined the performance of the measures of TWW, WSC, MLTU, and SI to determine their adequacy as measures of writing proficiency in this data set. Evidence showing the growth and discrimination ability of TWW, WSC, and MLTU scores suggested that these measures reflected growth in writing ability across grades. This provided further evidence, beyond what was reported in the literature review, as to the relationship of these scores to developmental growth in writing proficiency. I then correlated these scores with scores from the checklist in an attempt to provide one source of evidence for concurrent criterion-related validity. Cohen (1992) states that correlations above .1 and below .3 demonstrate a small but non-trivial effect. Correlations between .3 and .5 demonstrate medium effect sizes. These descriptors apply to the practical significance of a correlation value. Given Cohen's definition of the practical significance of correlations, the two scores of TWW and WSC were found to show medium effect size correlations to the Total Test score on the checklist, while MLTU showed a small size correlation to the Total Test score. No relationship was found with SI.

The failure of SI to show relationships to any of the other writing measures resulted primarily from the small variation in these scores across grades. The difference between Grades 4 and 7 on this measure was only .05 clauses per T-unit. Similarly, this measure did not predict grade membership. SI growth as measured in previous studies has been shown to increase across grades, but with some fluctuations in the growth pattern (Scott, 1988), as was also found here. While Hunt (1965) found a more noticeable pattern of growth in SI, that study compared groupings of students four grades apart (Grades 4, 8, and 12). Even Klecan-Aker and Lopez's (1985) study of SI differences between students in grades three years apart (Grades 6 and 9) found no statistical difference between the scores of the two groups. This lack of variability between grades would give this measure very little discriminating power, and make it difficult or impossible to detect any relationship it may have to other growth measures.

Similarly, limited variability in MLTU scores may also have impacted the size of their relationship to cohesion scores. MLTU and SI have generally been calculated from much larger writing samples. Hunt (1965), for example, used writing samples 1000 words long to calculate MLTU and SI. Although MLTU showed growth with increased grade and SI showed a small growth pattern as well, it is possible that these values would have been more precise had the writing samples from which they were calculated been longer.
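For readers unfamiliar with the two syntactic measures discussed above, MLTU is the mean number of words per T-unit and SI is the mean number of clauses per T-unit. The arithmetic is sketched below, assuming the sample has already been segmented into T-units by hand and each T-unit's word and clause counts recorded; the segmentation itself is the labour-intensive step and is not automated here.

```python
# Hedged sketch of MLTU and SI, given hand-segmented T-units.
from dataclasses import dataclass

@dataclass
class TUnit:
    words: int    # number of words in the T-unit
    clauses: int  # main clause plus any subordinate clauses

def mltu(t_units: list[TUnit]) -> float:
    """Mean length of T-unit, in words."""
    return sum(t.words for t in t_units) / len(t_units)

def subordination_index(t_units: list[TUnit]) -> float:
    """Clauses per T-unit; 1.0 indicates no subordination at all."""
    return sum(t.clauses for t in t_units) / len(t_units)

# Example: three T-units of 8, 12 and 7 words containing 1, 2 and 1 clauses.
sample = [TUnit(8, 1), TUnit(12, 2), TUnit(7, 1)]
print(round(mltu(sample), 2))                 # 9.0
print(round(subordination_index(sample), 2))  # 1.33
```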
While the Total Test score showed relationships to other measures of writing, these values were not large and therefore are not strong indicators of concurrent criterion-related validity. However, the relationships between MLTU, TWW, and WSC and the cohesion score may provide evidence for another kind of validity.

Discriminant validity. Discriminant validity is indicated by evidence showing how the construct in question differs from other constructs. The size of the correlations between the scores on the cohesion checklist and measures of writing fluency and syntactic complexity provided evidence of discriminant validity. It was argued earlier that measures of cohesion should show some relationship to measures of writing fluency and syntactic complexity, as they all reflect skills related to writing proficiency. But while all measures may reflect writing ability, they each reflect different aspects of that ability.

Several factors could explain diverse performance on different measures of writing. These differences relate to the underlying skills involved in various aspects of writing. For instance, while researchers have called attention to the difficulties students with language-learning disabilities have with cohesion, syntax, and other general areas of writing, not all children with writing difficulties have an underlying language problem. Children with nonverbal learning disabilities and those with dysgraphia may have problems that are associated with the motor aspects of writing (Richards, 1999; Thompson, 1997). These children may demonstrate difficulties with writing speed and letter formation, which may impair scores of writing fluency but may not impact the ability to write coherently or in complex sentence forms. Additionally, it has been my professional observation that children having language impairments that primarily impact the pragmatic aspects of language may have significant difficulty with cohesion but not have difficulty with writing fluency or written syntax. In each of these scenarios, cohesion scores, syntax scores, and writing fluency scores would not be closely related.

The modest size of the relationships found between TWW, WSC, MLTU, and the Total checklist scores provided evidence of discriminant validity. That is, the cohesion checklist did not measure the same skills as writing fluency or syntax measures. If it did, we would expect higher correlations. The medium and small effect sizes found here suggest that cohesion scores are related to measures of writing fluency and syntactic complexity, as all three measures relate to writing proficiency, but are not measures of the same underlying construct. In fact, it was this opinion that prompted this study. If mechanical skills in writing were highly reflective of discourse-level skills such as the use of cohesive devices, there would be no need to measure cohesion separately, as measurements of mechanical skills would be ready-made indicators of cohesion. These results show that cohesion is a separate writing skill that can be measured.

Discrimination ability of checklist scores. The ability of an assessment tool to detect differences between students of differing abilities provides further evidence for validity. The total checklist score was able to predict grade membership. However, it was not able to
One explanation for this lack of predictability could be the small numbers of writing samples generated by children with special learning designations in this data source. Of the 312 writing samples used, there were only 7 samples generated from children designated SLR, 30 samples from children designated ESL/D, 21 samples from children designated LD, and only 9 from children designated as Other. With such small numbers represented in these groups, it could not be established whether performance on this instrument detected differences between these students and their normally achieving peers. Sampling across four grades also made this kind of detection difficult. For example, there may be minimal differences between a high performing student in Grade 4 and a student in Grade 7 with a special learning designation. I did not perform discriminant analyses by grade as the number of writing samples generated by children with special learning designations in each grade was too small. Although the Total Test scores were able to predict grade membership, differences existed among subtests in their ability to do so. For instance, the Reference subscore could not be shown to predict grade membership at all. One explanation for this could be that the scores on this section of the checklist were not sensitive to incremental developmental growth as would be expected from grade to grade. Perrera (1984) has suggested that “pronominal reference is used early and extensively in children’s writing” (p. 241). If this is true, then items of reference may have difficulty predicting grade differences in the upper elementary years simply because children have already developed their use of this form of reference. However, Perrera also stated that children have ongoing difficulties with pronominal agreement and using pronouns in an unambiguous way but she did not indicate at what developmental stages these errors diminish. 78 Another explanation for this lack of predictability could be related to the problem encountered when scoring Reference items across shifts in story events. It is possible that some older students used Reference items with less error and ambiguity but lost credit due to dialogue use or shifts in story events. This could result in older and younger writers with similar Reference scores from credit on different items. This explanation may partially account for the lack of substantial differences among grades on the mean Reference scores. Another explanation for the inability of reference scores to predict grade may reflect the “all” or “none” scoring criteria applied in this section. Older students with only one error on an item would receive the same item score as younger students with multiple errors, yet it may be argued that these two children may have differing levels of ability in this area. While the Reference subscore showed poor discrimination among grades, the Conjunction and Lexical Cohesion subscores did not. Their ability to predict grade membership suggested that there is some relationship between performance on these two subtests and developmental writing ability. Subscores versus total scores. Evidence for construct validity also comes from findings that support the underlying theorized construct in question, in this case the components of cohesion reflected in the subtests of the instrument. The subsections of the checklist were based on the concepts outlined by Halliday and Hasan (1976). 
Results of the item analyses showed a difference in the performance of the items as parts of the total checklist when compared to the item performance as parts of the subtests. This finding seems to support the relationship between the checklist subsections and the underlying constructs of referential, conjunctive, and lexical cohesion described by Halliday and Hasan. This relationship is evidenced not only by the improvement in item scores when analyzed as part of the subtests, but also by the lack of relationship between the three subsets of items.

Further evidence regarding the differences in subtest performance is indicated by variation in how the subtests related to other measures of writing. While the checklist Total Test score showed correlations to three other writing measures, individual subtests of the instrument varied in their relationship to these same three measures. For instance, the Reference subscore was shown to have a small effect size correlation to MLTU. Conversely, it showed no relationship to the writing fluency measures, while the Lexical Cohesion and Conjunction subscores did. This suggests to me that the length of a writing sample has no influence on whether or not a child successfully or unsuccessfully used devices of referential cohesion. On the other hand, the relationship between writing fluency measures and Conjunction and Lexical Cohesion may be a function of the length of the writing samples. It may be that the longer the writing samples were, the more variety there was in the vocabulary and the kinds of conjunctions used. Another explanation is that Conjunction and Lexical subscores are, in fact, generally related to overall writing proficiency, as are TWW and WSC. The Conjunction subscore was the only one that did not correlate to MLTU. This may have resulted from higher use of coordinating conjunctions. The use of more varied coordinating conjunctions and fewer subordinating ones could result in high cohesion scores coupled with lower MLTU, as MLTU generally increases with increased subordination. The Lexical Cohesion subscore was the only one to show correlations to all three measures of TWW, WSC, and MLTU. These findings, though mixed, do provide evidence supporting the argument that facility with cohesion is related to proficiency with several different underlying skills related to the three areas of cohesion described by Halliday and Hasan (1976).

Because of the findings regarding the differential performance of checklist subtests, I expected that a factor analysis would provide a solution of three components underlying the checklist items. However, the factor analysis was unable to account for a large portion of the variance in checklist scores and did not show substantial variable loadings on each component. Loadings greater than .30 in absolute value are generally considered significant (Academic Computing and Instructional Technology Services, 1995), but Stevens (1996) indicates that components require a minimum of four loadings greater than .60, or a minimum of three loadings greater than .80, to be reliable. There should be a minimum of three observed variables for each factor and, ideally, each variable should load significantly on a single factor (Academic Computing and Instructional Technology Services, 1995). One explanation for the failure of the factor analysis may relate to the small inter-correlations between checklist items.
The reason for the poor correlations found between checklist items may lie in the dichotomous nature of the variables used on this instrument. Gorsuch (1983) explains:

When data are noncontinuous, it is possible for several individuals to receive exactly the same score on one variable. However, if these same individuals do not receive the same score on another variable, the two variables cannot correlate perfectly even if the underlying relationship is perfect. The reduction in correlation occurs most often with dichotomous variables because a great number of individuals receive the same score. (pp. 291-292)

Another explanation for the difficulty with interpreting the factor analysis relates to the quality of the data. Error in the data can strongly influence the results of a factor analysis; therefore, the instrument used for a factor analysis needs to be reliable (Academic Computing and Instructional Technology Services, 1995; Tabachnick & Fidell, 1996). As the internal consistency results on this instrument were not strong, this may have impacted the outcome of this analysis.

Despite the difficulty with interpretation of the factor analysis, the findings of the item analysis seem to support the subtest divisions of the checklist. The ITEMAN analysis is suited to dichotomous variables. Additionally, point biserial correlations are based on correlations between a dichotomous variable (item score) and a continuous variable (subtest or total score). Given the improved point biserial correlations with the checklist subscores and the differential performance of subtests, interpretation of checklist scores may be better served by examining performance on each of the subsections individually. These findings may also suggest that cohesion is not a single construct, as was first expected here, but may be made up of several unrelated or semi-related latent skills or abilities. Consequently, subtest scores may need to be considered separately.

Contributions of this Research

This study formed the initial stages of developing a checklist to use in evaluating cohesion in writing. Through this research, the items on the checklist have been revised to reduce ambiguities and improve their performance on a classical item analysis. The final interrater study showed adequate agreement among raters. It has been shown that the checklist's total score is able to predict grade membership, thus showing its sensitivity to differences in the writing of children of different grade levels. Additionally, the checklist subscores of Conjunction and Lexical Cohesion were able to predict grade membership. As well, checklist scores demonstrated discriminant validity in their relationships to other measures of writing proficiency. There is also evidence to suggest that subtests be scored independently.

While further development of the checklist is still warranted, this research has contributed to the field in three main ways. First, the study done here provided the groundwork for further development of an instrument to measure cohesion in writing. Second, information on writing development and evaluation is sparse in the literature. This study will add to that body of knowledge through its examination of evaluating cohesion in writing. Third, the number of studies on cohesion in the writing of school-aged children is also limited. This study of cohesion contributes to that body of knowledge.
Limitations

As this study warranted comparison of the performance of checklist items across many writing samples, a single type of writing sample was used. I felt that the type of writing samples chosen should be as homogeneous as possible to make comparisons between writing sample scores more clear. I chose CBM samples as they met this criterion for homogeneity and were available in large quantities across grades. However, CBM writing samples are short and administered under time constraints. No proof-reading or editing is allowed. The performance of items on this testing of the checklist was indeed limited by the constraints under which these writing samples were generated. It is expected that item analyses conducted with untimed, edited narrative writing samples would have different results.

Another necessary limitation of the sample chosen was the genre used. Many studies have shown that the types of cohesive devices used in writing are related to the genre of the written text (Crowhurst, 1981, 1987; Hidi & Hildyard, 1983; McCutchen & Perfetti, 1982; Pellegrini et al., 1984). A single genre was used as it would be difficult to interpret an item analysis based on comparing different kinds of writing samples. That is, because different devices are genre specific, variability in cohesion scores found with mixed writing samples may have reflected differences not related to proficiency with the use of cohesive devices. By using only one genre, analysis of checklist items was made easier, but items that may have been more important in detecting differences in other writing genres were lost. The checklist developed here, consequently, may only be useful in evaluating cohesion in narrative writing samples.

The sample of students who generated the writing also impacted the outcomes of this study. As the samples were generated from only three schools, there were only 67 samples generated from children designated with special learning needs or learning disabilities. With such small numbers represented in this group, it could not be established whether performance on this instrument detected differences between these students and their normally achieving peers. Sampling across four grades also made this kind of detection difficult. Consequently, there is not enough information to determine at this point whether such an instrument would be useful in detecting differences between different groups of students in the same grade. This ability would be crucial for its value as a diagnostic tool; that is, in its ability to show differences between a target student and same-grade peers, and to detect growth in a single student over short periods of time.

Implications for Future Research

The findings here reflect the first stages in the development of a checklist for evaluating cohesion in writing. Further development of the instrument is warranted before its value as an educational tool can be determined. In addition to the suggestions made throughout this discussion, some further suggestions for future research are explored here.

Proposed Changes to Checklist Content and Format

Content. There are still some items on the checklist that may benefit from further modifications. These include items that showed poor interrater agreement and items on the Reference subsection which were presenting problems when scoring stories with shifts in events. For instance,
Item 5 could be reworded to say "Except in topic sentences, each sentence is connected to the one preceding it by at least one form of reference." Improvements could also be made to the Reference section by setting criteria other than "all" for credit on an item. For example, criteria for credit could include an allowable number or proportion of violations per designated number of T-units. Establishing appropriate ratios would require testing samples of writing with this subtest and determining which proportions reflected the best discrimination between groups of learners.

Item 13, which addressed the use of complementary lexical items, was also presenting difficulties with rater agreement. The scoring guide could include more explicit scoring instructions for this item, such as a systematic way of detecting examples of collocation. This could include a procedure like underlining all the nouns and verbs and examining them to find word pairs that meet the definition of collocation.

As the internal consistency of the checklist may be better with more items, items that had been combined in order to improve the item analysis using the very short three-minute narrative writing samples may be separated into a greater number of discrete items to be tested with longer samples or writing of other genres. In particular, the Conjunction items may be separated into more discrete items, and the subtest of Global Cohesion could be reintroduced.

Format. As the length of the sample seemed to impact scores on certain subtests more than others, some guideline reflecting the length of the sample to be evaluated may be prudent. This guideline could form a minimum requirement for length. Additionally, where an examiner wished to use the checklist to evaluate longer samples of writing, only a portion reflecting the length requirement need be scored. For example, the examiner could score the first 50 T-units. This would not only help to control for differences in scores caused by length, but would also make the task of scoring more manageable. Another way to control for size would be to score the Conjunction and Lexical Cohesion subtests on the basis of a proportion rather than an absolute score.

Proposed Procedures for Checklist Evaluation

As the genre, degree of editing, and audience all affect the types of cohesive devices used, it is recommended that the checklist be evaluated with a variety of writing samples, including more typical curricular narrative samples. Furthermore, by testing the checklist with other forms of writing samples, it could be established to what extent the validity and reliability problems encountered in this study were related to the checklist and to what extent they could be accounted for by the writing samples used in this study. I have included some suggestions for future analysis of the checklist's performance.

One suggestion is to use writing samples from the Foundation Skills Assessment administered provincially to Grades 4 and 7. The advantage of this choice is that these samples are administered in a standardized way, and their scores could be used to establish concurrent criterion-related validity. Furthermore, by examining the checklist performance within large numbers in each grade, it may be possible to establish whether or not the checklist can detect differences between same-grade peers of differing ability. One limitation of this choice is that it would not be possible to examine performance on the checklist across grades.
Another suggestion for further evaluation of this instrument is the District Writing 5 Exam. This exam is administered annually to all Grade 5 students in School District #57. Each sample is rated on a 4-point holistic rating scale. The advantage of this choice is that these samples are administered in a standardized way. As each sample is only rated on a four-point scale, these scores are not diverse enough to serve as criteria for concurrent validity. However, a discriminant analysis could determine whether checklist scores would be predictors of the holistic rating. This would provide evidence for the checklist's ability to detect differences between same-grade peers, which is paramount for its use as a diagnostic device. One limitation of this choice is that results could not be generalized beyond Grade 5.

Another source of writing samples on which to test the instrument is writing portfolios. The challenge with this choice would be selecting samples that are comparable on the basis of genre, degree of editing, instruction, and audience. The advantage of this selection would be the authenticity of samples and the opportunity to determine how the checklist detects cohesion in "best" samples of writing. The instrument also could be used to explore cohesion in "draft" and "published" versions of the same writing included in the portfolios, thus providing pedagogical guidance. Use of portfolios would also allow for testing across grades, as well as providing an indicator of the writing development of individual students over the school year. The use of such portfolio-based writing samples would provide an important indicator of practical or clinical validity.

Reliability. It would be valuable to determine if longer writing samples, or those produced without time constraints, resulted in larger values for internal consistency. Furthermore, evidence of the stability of cohesion scores across time through test-retest procedures, and of the equivalence of cohesion scores across writing samples of similar genre and instructional approach, would be valuable in the development of this tool.

Validity. Further assessment of the validity of this instrument is also warranted. In order for this tool to be used diagnostically, for instance, there needs to be evidence showing the checklist's ability to predict special learning designations. This may be best accomplished by using a disproportional stratified random sample that represents large proportions, and therefore large samples, of children with these designations. This type of sampling is useful when comparisons among groups are of interest (Palys, 1997). An important consideration would be to use widely accepted criteria for the identification of specific designations of special needs.

Another area to be addressed is concurrent criterion-related validity. The argument for the concurrent validity of this instrument may be strengthened by a comparison of scores from the cohesion checklist to holistic ratings of readability of the same samples, as this aspect of writing is expected to be more related to cohesion (Lindeberg, 1984; Rutter & Raban, 1982; Zarnowski, 1981) than the measures used here. This may include procedures such as comparing the checklist scores to teachers' ratings of quality or to the analytic scoring rubrics used in the Writing Reference Set (British Columbia Ministry of Education, 1996b).
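The disproportional stratified random sample suggested in the Validity paragraph above could be drawn along the following lines. This is a minimal sketch under stated assumptions: the roster data frame, its designation column, and the quota sizes are placeholders for illustration, not recommendations from this study.

```python
# Hedged sketch: drawing a disproportional stratified random sample so that
# children with special learning designations are deliberately over-represented.
import pandas as pd

def stratified_sample(roster: pd.DataFrame, quotas: dict[str, int],
                      seed: int = 42) -> pd.DataFrame:
    """Sample a fixed number of students from each designation stratum."""
    parts = []
    for designation, n in quotas.items():
        stratum = roster[roster["designation"] == designation]
        parts.append(stratum.sample(n=min(n, len(stratum)), random_state=seed))
    return pd.concat(parts, ignore_index=True)

# Placeholder quotas that over-sample the smaller designation groups.
quotas = {"none": 120, "LD": 40, "ESL/D": 40, "SLR": 30, "Other": 30}
```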
Once it could be demonstrated that the checklist was able to detect differences between writers of the same age with differing abilities, the validity of this tool as a diagnostic instrument that can be used to establish and monitor progress on intervention goals would still need to be determined. This would involve pre- and post-treatment measures to determine if the checklist was sensitive to changes in the use of cohesion over time.

Implications for Practice

The initial impetus for this research was to create an instrument that could be used by speech-language pathologists to detect and define problems with cohesion in authentic writing samples. That is, the instrument, in its completed form, would assist in first detecting which children were having difficulty in using cohesive devices in writing when compared directly to their peers on the same writing task. Second, I wanted an instrument that could reveal which aspects of cohesion were lacking or problematic in a child's writing. Third, I wanted the instrument to be able to detect differences in an individual's use of cohesive devices with intervention.

Although the checklist developed here is not yet ready for these uses, it still may have application as a reference guide for observing children's writing. In this way it could assist an examiner in describing what types of cohesion the child is using. In addition to this use, the results of this study can inform professionals who work with children on their writing skills in two main ways. First, the results of this study suggest that different kinds of cohesion may benefit from separate evaluation. Proficiency in the use of referential cohesion may develop quite differently, and reflect a different type of skill, than that seen with the use of lexical devices or conjunctions. Second, writing is a complex process requiring facility with a number of different skills. Assessment in any single area will not tell us much about a writer's overall writing ability. Writing ability constitutes more than a single latent variable; therefore, assessment across a variety of skills is necessary to get an adequate picture of a writer's abilities, disabilities, strengths, and weaknesses. Referential cohesion, conjunction, and lexical cohesion are only small parts of a complex process or skill.

REFERENCES

Academic Computing and Instructional Technology Services. (1995). Factor analysis using SAS PROC FACTOR [On-line]. Available: http://www.utexas.edu/cc/docs/stat53.html

Anderson, P. L. (1982). A preliminary study of syntax in the written expression of learning disabled children. Journal of Learning Disabilities, 15(6), 359-362.

Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-73.

British Columbia Ministry of Education (1996a). English language arts K to 7 integrated resource package. Victoria, BC: Author.

British Columbia Ministry of Education (1996b). Evaluating writing across the curriculum: Using the writing reference set to support learning. Victoria, BC: Author.

Calfee, R. C. & Freedman, S. W. (1996). Classroom writing portfolios: Old, new, borrowed, blue. In R. C. Calfee & P. Perfumo (Eds.), Writing portfolios in the classroom: Policy and practice, promise and peril (pp. 3-26). Mahwah, NJ: Lawrence Erlbaum Associates.

Canter, A. & Marston, D. (1998). Helping children at home and school: Handouts from your school psychologist. Bethesda, MD: National Association of School Psychologists.

Carrow-Woolfolk, E. (1996).
Oral and Written Language Scales. Circle Pines, MN: American Guidance Service.

Choate, J. S. & Miller, L. J. (1992). Curricular assessment and programming. In J. S. Choate, B. E. Enright, L. J. Miller, J. A. Poteet, & T. A. Rakes (Eds.), Curriculum-based assessment and programming (2nd ed., pp. 43-77). Needham Heights, MA: Allyn and Bacon.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.

Crocker, L. & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart & Winston.

Crowhurst, M. (1981). Cohesion in argumentative prose written by sixth-, tenth-, and twelfth-graders. Paper presented at the Annual Meeting of the American Educational Research Association, Los Angeles. (ERIC Document Reproduction Service No. ED 202 023)

Crowhurst, M. (1987). Cohesion in argument and narration at three grade levels. Research in the Teaching of English, 21(2), 185-197.

Dagenais, D. J. & Beadle, K. R. (1984). Written language: When and where to begin. Topics in Language Disorders, 4(2), 59-85.

Deno, S. L., Marston, D., & Mirkin, P. (1982). Valid measurement procedures for continuous evaluation of written expression. Exceptional Children, 48(4), 368-371.

Educational Testing Service (1993). The ETS collection catalog: Achievement tests and measurement devices (2nd ed.). Phoenix, AZ: Oryx Press.

Engelhard, G. (1998). Review of the CTB Writing Assessment System. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 329-331). Lincoln, NE: Buros Institute of Mental Measurements of the University of Nebraska.

Englert, C. S. & Raphael, T. E. (1988). Constructing well-formed prose: Process, structure and metacognitive knowledge. Exceptional Children, 54(6), 513-520.

Fewster, S. (2000). School-based evidence for validity of curriculum-based measurement norms. Unpublished master's thesis, University of Northern British Columbia, Prince George, British Columbia, Canada.

Gillam, R. & McFadden, T. U. (1994). Redefining assessment as a holistic discovery process. Journal of Childhood Communication Disorders, 16(1), 36-40.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Graham, S., Harris, K. R., MacArthur, C., & Schwartz, S. (1998). Writing instruction. In B. Y. L. Wong (Ed.), Learning about learning disabilities (2nd ed., pp. 391-423). San Diego: Academic Press.

Greenberg, K. L. (1987). Defining, teaching and testing basic writing competence. Topics in Language Disorders, 7(4), 31-41.

Halliday, M. A. K. & Hasan, R. (1976). Cohesion in English. London: Longman Group.

Hansen, J. B. (1998). Review of the Test of Written Language - Third Edition. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 1070-1072). Lincoln, NE: Buros Institute of Mental Measurements of the University of Nebraska.

Hedberg, N. L. & Fink, R. J. (1996). Cohesive harmony in the written stories of elementary children. Reading and Writing: An Interdisciplinary Journal, 8, 73-86.

Hidi, S. E. & Hildyard, A. (1983). The comparison of oral and written productions in two discourse types. Discourse Processes, 6, 91-105.

Howell, K. W., Fox, S. L. & Morehead, M. K. (1993). Curriculum-based evaluation: Teaching and decision making (2nd ed.). Belmont, CA: Wadsworth.

Hughes, D., McGillivray, L. & Schmidek, M. (1997). Guide to narrative language: Procedures for assessment. Eau Claire, WI: Thinking Publications.

Hunt, K. W. (1965). Grammatical structures written at three grade levels.
Champaign, IL: National Council of Teachers of English.

Impara, J. C. & Murphy, L. L. (Eds.). (1994). Psychological assessment in schools. Lincoln, NE: University of Nebraska Press.

Impara, J. C. & Plake, B. S. (Eds.). (1998). The thirteenth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurements of the University of Nebraska.

Isaacson, S. (1991). Assessing written language skills. In C. S. Simon (Ed.), Communication skills and classroom success: Assessment and therapy methodologies for language learning disabled students (pp. 224-237). Eau Claire, WI: Thinking Publications.

ITEMAN (Version 3.50) [Computer software]. (1994). St. Paul, MN: Assessment Systems Corporation.

Kimmel, E. W. (1998). Review of the Writing Process Test. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 1160-1161). Lincoln, NE: Buros Institute of Mental Measurements of the University of Nebraska.

King-Sears, M. E. (1994). Curriculum-based assessment in special education. San Diego: Singular Publishing Group.

Klecan-Aker, J. S. & Hendrick, D. L. (1985). A study of the syntactic language skills of normal school-age children. Language, Speech, and Hearing Services in Schools, 16(3), 187-198.

Klecan-Aker, J. S. & Lopez, B. (1985). A comparison of T-units and cohesive ties used by first and third grade children. Language and Speech, 28(3), 307-315.

Liles, B. Z. (1985). Cohesion in the narratives of normal and language-disordered children. Journal of Speech and Hearing Research, 28, 123-133.

Liles, B. Z., Duffy, R. J., Merritt, D. D. & Purcell, S. L. (1995). Measurement of narrative discourse ability in children with language disorders. Journal of Speech and Hearing Research, 38, 415-425.

Lindeberg, A. C. (1984). Cohesion and coherence in short expository essays. Proceedings from the Nordic Conference for English Studies.

Loban, W. (1976). Language development: Kindergarten through grade twelve. Urbana, IL: National Council of Teachers of English.

Marston, D. B. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York: Guilford Press.

Marston, D. B. & Deno, S. (1981). The reliability of simple direct measures of written expression (Research Report No. 50). Minneapolis: University of Minnesota Institute for Research on Learning Disabilities.

Miller, L. E. (1999). Evaluating English writing at the high school and college level. Unpublished paper. (Available from Lois E. Miller, P.O. Box 101, Nass Camp, BC, V0J 3J0).

McCutchen, D. & Perfetti, C. A. (1982). Coherence and connectedness in the development of discourse production. Text, 2, 113-139.

Moss, P. A. (1995). Themes and variations in validity theory. Educational Measurement: Issues and Practice, 15(3), 5-13.

Murphy, L. L., Impara, J. C. & Plake, B. S. (Eds.). (1999). Tests in print: An index to tests, test reviews, and the literature on specific tests. Lincoln, NE: Buros Institute of Mental Measurements of the University of Nebraska.

Murray-Ward, M. (1998). Review of the Test of Written Expression. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 1067-1068). Lincoln, NE: Buros Institute of Mental Measurements of the University of Nebraska.

Nelson, N. W. (1994). Curriculum-based language assessment and intervention across the grades. In G. P. Wallach & K. G. Butler (Eds.),
Language learning disabilities in school-aged children and adolescents: Some principles and applications (pp. 104-131). Toronto: Maxwell Macmillan Canada.

Palys, T. (1997). Research decisions: Quantitative and qualitative perspectives. Toronto: Harcourt Brace.

Pellegrini, A. D., Galda, L. & Rubin, D. (1984). Context in text: The development of oral and written language in two genres. Child Development, 55, 1549-1555.

Perera, K. (1984). Children's writing and reading. Oxford: Basil Blackwell.

Poplin, M., Gray, R., Larsen, S., Banikowski, A. & Mehring, T. (1980). A comparison of components of written-expression abilities in learning disabled and non-learning disabled students at three grade levels. Learning Disabilities Quarterly, 3(4), 46-53.

Poteet, J. A. (1992a). Educational assessment. In J. S. Choate, B. E. Enright, L. J. Miller, J. A. Poteet, & T. A. Rakes (Eds.), Curriculum-based assessment and programming (2nd ed., pp. 1-21). Needham Heights, MA: Allyn and Bacon.

Poteet, J. A. (1992b). Written expression. In J. S. Choate, B. E. Enright, L. J. Miller, J. A. Poteet, & T. A. Rakes (Eds.), Curriculum-based assessment and programming (2nd ed., pp. 231-271). Needham Heights, MA: Allyn and Bacon.

Principles for fair student assessment practices for education in Canada. (1993). Edmonton, AB: Joint Advisory Committee. (Available from the Joint Advisory Committee, Centre for Research in Applied Measurement and Evaluation, 3-104 Education Building North, University of Alberta, Edmonton, AB, T5G 2G5).

Psychological Corporation (1992). Wechsler Individual Achievement Test. San Antonio, TX: Harcourt Brace and Company.

Ratner, V. L. & Harris, L. R. (1994). Understanding language disorders: The impact on learning. Eau Claire, WI: Thinking Publications.

Richards, R. G. (1999). The source for dyslexia and dysgraphia. East Moline, IL: LinguiSystems.

Rousseau, M. K. (1990). Errors in written language. In R. A. Gable & J. M. Hendrickson (Eds.), Assessing students with special needs (pp. 89-101). London: Longman Group.

Rutter, P. & Raban, B. (1982). The development of cohesion in children's writing: A preliminary investigation. First Language, 3(7), 63-75.

Sax, G. (1997). Principles of educational and psychological measurement and evaluation (4th ed.). Belmont, CA: Wadsworth Publishing.

Schiffrin, D. (1994). Approaches to discourse. Cambridge: Blackwell.

School District #57 (1996). Guidebook for the use of curriculum based measurement in School District #57. Prince George, BC: School District #57.

Scott, C. (1988). Spoken and written syntax. In M. A. Nippold (Ed.), Later language development: Ages 9 through 19 (pp. 49-95). Boston: College-Hill Press.

Scott, C. (1991a). Learning to write: Context, form and process. In A. G. Kamhi & H. W. Catts (Eds.), Reading disabilities: A developmental language perspective (pp. 261-302). Boston: Allyn & Bacon.

Scott, C. (1991b). Problem writers: Nature, assessment and intervention. In A. G. Kamhi & H. W. Catts (Eds.), Reading disabilities: A developmental language perspective (pp. 303-344). Boston: Allyn & Bacon.

Silliman, E. R., Jimerson, T. L. & Wilkinson, L. C. (2000). A dynamic systems approach to writing assessment in students with language learning problems. Topics in Language Disorders, 20(4), 45-64.

Silliman, E. R. & Wilkinson, L. C. (1994). Observation is more than looking. In G. P. Wallach & K. G. Butler (Eds.), Language learning disabilities in school-aged children and adolescents: Some principles and applications (pp. 145-173).
Silliman, E. R., Wilkinson, L. C. & Hoffman, L. P. (1993). Documenting authentic progress in language and literacy learning: Collaborative assessment in the classrooms. Topics in Language Disorders, 14(1), 58-71.
Singer, B. D. (1995). Written language development and disorders: Selected principles, patterns and intervention possibilities. Topics in Language Disorders, 16(1), 83-96.
Smith, L. (1999). An exploration of cohesion in narrative and expository writing in the mid-elementary years. Unpublished paper. (Available from Lynda Struthers [Smith], 267 Claxton Cres., Prince George, BC, V2M 5X6).
Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Tabachnick, B. S. & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins College Press.
Thompson, S. (1997). The source for nonverbal learning disorders. East Moline, IL: LinguiSystems.
Tindal, G., Marston, D. & Deno, S. L. (1983). The reliability of direct and repeated measurement (Research Report No. 109). Minneapolis, MN: Institute for Research on Learning Disabilities.
Tindal, G. & Nolet, V. (1990). The construct validity of curriculum-based measurements of achievement: A multitrait-multimethod analysis. Paper presented at the American Educational Research Association, Boston, April 16-20. (ERIC Document Reproduction Service No. ED 325 506)
Tindal, G. & Parker, R. (1991). Identifying measures for evaluating written expression. Learning Disabilities Research and Practice, 6, 211-218.
Tinsley, H. E. & Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgements. Journal of Counseling Psychology, 22(4), 358-376.
Warren, S. F. & Yoder, P. J. (1994). Communication and language intervention: Why a constructivist approach is insufficient. The Journal of Special Education, 28(3), 248-258.
Wiig, E. & Semel, E. (1984). Language assessment and intervention for the learning disabled (2nd ed.). Columbus, OH: Charles E. Merrill.
Zarnowski, M. (1981). A child's composition: How does it hold together? Language Arts, 58(2), 316-319.

APPENDIX A
Letter of Consent

(Principal's Name)
(School's Name and Address)

March 10, 2000

Dear ,

I am currently working as a speech-language pathologist on Area Support Team 4. I am also currently working on completing graduate studies in Education at the University of Northern British Columbia (UNBC). This letter contains an outline of my thesis and requests your assistance in the completion of this research. Norm Monroe, Director of School Services, has given his approval for this thesis project. He will be kept apprised of the details of this project as it is carried out. The results will be of interest to support teachers and school district specialists involved in student assessment.

Research Problem to be Addressed
Review of the literature and my own experience indicate that there are qualitative differences between the writing of children with disabilities and those without. These qualitative differences are not just a reflection of the misuse of writing conventions such as spelling, punctuation and grammar, but extend into the ability of writers to communicate their ideas effectively to the reader. The research indicates that problem writers have difficulty with cohesion (e.g. Hedberg & Fink, 1996). Cohesion involves the use of linguistic devices that serve to link ideas and sentences together creating a unified text.
This area of writing has been shown to reflect the readability of a text (e.g. Crowhurst, 1980; Hedberg & Fink, 1996), but very few assessment tools are available that evaluate writing in this way. My research, therefore, involves the development of an assessment instrument that can be used to assess cohesion in the writing samples of school-aged children for the purpose of planning and monitoring interventions.

Method
The method I will use involves a field test of the instrument using approximately 300 to 400 CBM writing samples from students in School District #57, in Grades 4 through 7. I am requesting that some of these samples be from your school. The samples from your school should consist of all students in each grade in a single testing period. The time of year in which the samples were collected is not significant, though I ask that all samples provided come from the same testing period. The samples need not be current, and the names of the students and schools who generated them will remain anonymous. The only identifying information that is requested is the grade and gender of the writer, as well as any special designation that the student might have (SLR, ESL/ESD, LD). Photocopies or original samples are welcome. Originals will be returned at the completion of the project. Photocopies will be destroyed. The collected samples will each be rated using the cohesion assessment tool. The results of this rating will then be used to test for sensitivity of individual test items, and the instrument's reliability and validity.

Ethical Considerations
The samples used in this research will have been generated for educational rather than research purposes. As the purpose of examining the samples is to evaluate my assessment tool rather than students, there is no perceived harm to individuals. Furthermore, as the samples will have all identifying information other than grade, gender, and special designation removed, the confidentiality of the school and student will be protected.

Summary
This thesis has grown out of an interest in developing assessment tools that are functional, and useful for developing and monitoring goals in educational intervention plans. The completion of this project will depend on your assistance in supplying CBM writing probes from your school as specified above. Please feel free to contact Norm Monroe or myself if you see any difficulty with the project as it is presented here. Thank you in advance for your assistance and support.

Lynda Smith
Area Support Team #4
562-3780
e-mail: Lynda_Smith@fc.schdist57.bc.ca

Norm Monroe
Director of School Services
561-6800 ext. 311
norm_monroe@fc.schdist57.bc.ca

References
Crowhurst, M. (1980). The effect of syntactic complexity on writing quality: A review of the research. Unpublished paper. ERIC document ED 202 024.
Hedberg, N. L. & Fink, R. J. (1996). Cohesive harmony in the written stories of elementary children. Reading and Writing: An Interdisciplinary Journal, 8, 73-86.

APPENDIX B
Procedures for Administering CBM Writing Probes

The directions for administration of the CBM writing probes as outlined in the Guidebook for the Use of Curriculum Based Measurement in School District #57 (School District #57, 1996) are listed here.

Materials
1. Story starter.
2. Stop watch.

Directions
1. Select an appropriate story starter.
2. Provide the student with a pencil and a sheet of lined paper.
3. Say these specific directions to the students: "You are going to write a story.
First I will read a sentence, and then you will write a story about what happens next. You will have one minute to think about what you write, and three minutes to write your story. Remember to do your best work. If you don’t know how to spell a word you should guess. Are there any questions? (Pause.) Put your pencils down and listen. For the next minute, think about... (insert story starter).” 4. After reading the story starter, begin your stopwatch and allow i minute for students to “think.” (Monitor students so that they do not begin writing.) After 30 seconds say: “You should be thinking about... (insert story starter).” 5. At the end of 1 minute say: “Now begin writing.” Restart your stop watch. 6. Monitor students’ attention to the task. Encourage students to work only if they are looking around and talking. 7. After 90 seconds say: “You should be writing about (insert story starter).” 103 8. At the end o f 3 minutes say: “Stop. Put your pencils down. The three story starters used to generate the writing samples used in this study included: 1. Yesterday, a monkey climbed through the school and... 2 .1 was walking along a path when all of a sudden... 3. The cat climbed the telephone pole and... 104 APPENDIX C Examples of Writing Samples 105 Included here are examples of the writing samples used in this study. These include examples of typical, best and worst writing samples produced by each grade. The typical examples shown here were chosen as their scores for fluency, syntactic complexity and cohesion reflected the mean for each score for a given grade. The worst and best examples were selected from those with the highest and lowest cohesion scores for each grade. Each sample is re-written here with the original spelling and punctuation used by the writer. Grade 4 Typical. Yesterday a monkey climbed through the window at school and... the monkey came and pulled all the teachers hair off so she was bald and the monkey tore every ones paper and broke the chairs and desks and distrode the hole classroom and he did that... Worst. Yesterday a monkey climbed through the window at school and... eat my banna I was so mad I got a 22 and shont him 100 they staid at me and... Best. I was walking along a path when all of a sudden... a dog jumped on me and pushed me down and then it liked me. I was scared it would bite me. But it didn’t, so I got up and took the dog home and showed it to my mom. She said... Grade 5 Typical. Yesterday a monkey climbed through the window at school and... ate all our math books. After school our teacher, Mrs. White, took the monkey to the zoo and of course we got an F on Math. Today we were working on SS and a lion jumped through the window and ate our SS books. Worst. The cat climbed the telephone pole and... stepped on to the thin wire. I have to get my baby she thought. Everyone around here were screaming, “here kitty kitty, come down for there, she’ll jump,”. She just... 106 Best. I was walking along a path when all of a sudden... a frog jumped out and I know how I hate frogs so I lept behind a bush until the frog left. When I left the bush I thought I was safe but just when I turned the comer there was over a million frogs. I had no way to get away and I think I squashed 11 frogs. When I finally made it over, I ran home and swore... Grade 6 Typical. I was walking along a path when all of a sudden... I saw a poor hurt baby panther that had a thorn threw its paw. I went close but not too close. Then its mother came and started to growl at me. 
I backed away and the baby limped closer towards me. I was frozen stiff. Worst. Yesterday a monkey climbed through the window at school and... grabbed a experiment and smashed it. It blew the room up. The monkey grabbed more things and smashed them There was these toxic chemicals that burnt the school down. All the kids were screaming. The monkey wasn’t smart enough to move and died. The fireman came a put the fire out. They told the kids they... Best. Yesterday a monkey climbed through the window at school and... it started throwing everything it could at people. First it through the encyclopedias then the dictionaries then the text books. When the zoo keepers came it through chairs and pencils, anything it could find at the zookeepers but finally they caught it but the zookeepers all had bumps and bruises. All this took... Grade 7 Typical. Yesterday a monkey climbed through the window at school and... started jumping around crazily. The kids thought that if was really cool but the teacher jumped up, started screaming and ran down the hall. The monkey started swinging from the roof and soon 107 dropped onto my desk, at first it frightened me but then I realized that he did not want to hurt me so I calmed down. Soon the principal... Worst. I was walking along a path when all of a sudden... the school bully came. “Oh no” I pannicked. “Help” He stopped me in my tracks. “Lunch mony ”, he demanded I quickly thought up a lie, no he knows I have money. “uh,um No” I whispered “No!” He boomed “No, did you say No.” Best. I was walking along a path when all of a sudden... I was in a jungle wear lions, tigers, and bears and many other wild animals live when a lion chased me. I started to freak out and screamed but I thought that wouldn’t do much so I ran and ran until I saw the lion was not behind me anymore and when I stopped I saw some strawberries so I ate them and when they were gone I got tired. So I fell asleep. When I woke up I was in tarzans little treehouse up really high. So then I saw... 108 APPENDIX D Checklist 1.0 109 Cohesive marker 1. All pronouns refer to some previously mentioned noun. 2. All pronouns have a referent in the previous sentence. 3. All demonstratives (eg. this, these, that, those) have a clear referent in the previous sentence. 4. All nouns appearing with the article 'the' have a previous referent in the text. 5. Referents for nouns used with 'the' that are not present in the text can be inferred from world knowledge. 6. Each sentence is connected to the one preceding it by at least one anaphoric reference. 7. The written passage is sequentially organized. 8. And' is used to connect sentences and/or clauses. 9. Also' is used to connect sentences and/or clauses. 10. Other coordinating conjunctions are used to connect sentence and/or clauses. 11. Then' is used to connect sentences and/or clauses. 12, ‘When’ is used to connect clauses. 13.‘Before' and 'after' are used to connect sentences and/or clauses. 14 .‘First, second,' etc. are used to connect sentences and/or clauses. 15. Other temporal conjunctions are used to connect sentences and/or clauses. 16. Consistent tenses are used throughout. 17. Shifts in time are marked with temporal terms (eg. the next day) other than conjunctions. 18. Causal relationships are implied. 19. So' is used to connect sentences and/or clauses. 20. Because' is used to connect clauses. 21. Other causal conjunctions are used to connect sentences and/or clauses (eg. consequently, therefore, etc.). 22. Adversative conjunctions (eg. 
but) are used to connect clauses. 23. Super-ordinates, synonyms or near-synonyms are used for the same referent in adjacent sentences. 24. Complementary terms, converses or antonyms appear in adjacent sentences. 25. The text is divided into paragraphs. 26. Paragraphs have topic sentences. 27. Explicit transitions between paragraphs are present. Total Cohesion Score YES 1 NO 0 110 APPENDIX E Checklist 1.1 Ill Cohesive marker 1. All pronouns refer to some previously mentioned noun. 2. All pronouns have a referent in the previous sentence. 3. All demonstratives (eg. this, these, that, those) have a clear referent in the previous sentence. 4 . Referents for nouns used with 'the' that are not present in the text can be inferred from world knowledge. 5. All nouns appearing with the article 'the' have a previous referent in the text. 6. Each sentence is connected to the one preceding it by at least one anaphoric reference. 7. The written passage is sequentially organized. 8. And' is used to connect sentences and/or clauses. 9. ‘Also’ is used to connect sentences and/or clauses. 10. Other coordinating conjunctions are used to connect sentence and/or clauses. 11. 'Then' is used to connect sentences and/or clauses. 12.‘When’ is used to connect clauses. 13. ‘Before' and 'after' are used to connect sentences and/or clauses. 14.‘First, second,' etc. are used to connect sentences and/or clauses. 15. Other temporal conjunctions are used to connect sentences and/or clauses. 16. Consistent tenses are used throughout. 17. Shifts in time are marked with temporal terms (eg. the next day) other than conjunctions. 18. Causal relationships are implied. 19. ‘So' is used to connect sentences and/or clauses. 20. Because' is used to connect clauses. 21. Other causal conjunctions are used to connect sentences and/or clauses (eg. consequently, therefore, etc.). 22. Adversative conjunctions (eg. but) are used to connect clauses. 23. Super-ordinates, synonyms or near-synonyms are used for the same referent in adjacent sentences. 24. Complementary terms, converses or antonyms appear in adjacent sentences. 25. The text is divided into paragraphs. 26. Paragraphs have topic sentences. 27. Explicit transitions between paragraphs are present. Total Cohesion Score YES 1 NO 0 112 APPENDIX F Checklist 1.2 113 Cohesive marker 1. All pronouns refer to some previously mentioned noun. 2. All pronouns have a referent in the previous sentence. 3. All demonstratives (eg. this, these, that, those) have a clear referent in the previous sentence. 4. Referents for nouns used with 'the' that are not present in the text can be inferred from world knowledge. 5. All other referents for nouns used with the article 'the' have a previous referent in the text. 6. Each sentence is connected to the one preceding it by at least one anaphoric reference. 7. The written passage is sequentially organized. 8. 'And' is used to connect sentences and/or clauses. 9. Also' is used to connect sentences and/or clauses. 10. Other coordinating conjunctions are used to connect sentence and/or clauses (eg. or, another, as well as, etc.). 11. 'Then' is used to connect sentences and/or clauses. 12. ‘When’ is used to connect clauses. 13.'Before' and 'after' are used to connect sentences and/or clauses. 14.‘First, next,' etc. are used to connect sentences and/or clauses. 15. Other temporal conjunctions are used to connect sentences and/or clauses. 16. Consistent tenses are used throughout. 17. Shifts in time are marked with temporal terms (eg. 
the next day) other than conjunctions. 18. Causal relationships are implied. 19. So' is used to connect sentences and/or clauses. 20. Because' is used to connect clauses. 21. Other causal conjunctions are used to connect sentences and/or clauses (eg. consequently, therefore, etc.). 22. Adversative conjunctions (eg. but) are used to connect clauses. 23. Super-ordinates, synonyms or near-synonyms are used for the same referent in adjacent sentences. 24. Super-ordinates, synonyms or near-synonyms are used for the same referent across the text. 25. Complementary terms appear in adjacent sentences. 26. Converses or antonyms appear in adjacent sentences. 27. A new paragraph is used when there is a shift in story events. Total Cohesion Score YES 1 NO 0 114 APPENDIX G Checklist 2.0 With Instruction Manual 115 COHESION CHECKLIST Developed by Lynda Smith April 14, 2000 Revisions. August, 2000 *Not to be copied without written consent of the author. 116 Table of Contents Background...................................................................................................................... 3 Definition of Key Terms.................................................................................................. 4 Scoring Instructions.........................................................................................................6 Table of Specifications...................................................................................................14 References...................................................................................................................... 15 Checklist........................................................................................................................ 17 Scoring Summary Sheet............................................................................................... 18 Scoring Companion......................................................................................................19 117 Background This checklist was designed to evaluate the linguistic elements used to achieve cohesion in the writing of elementary school-aged children. Cohesion consists of the ties that link sentences and ideas together to form a unified, single text (Halliday & Hasan, 1976), that is comprehensible to the reader (Hedberg & Fink, 1996; Lindberg, 1984; Zamowski, 1981). Without it, writing would consist of a series of unrelated sentences or ideas. The content of the checklist originates from several studies of cohesion (Crowhurst, 1981, 1987; Halliday et al., 1976; Liles, 1985; McCutchen and Perfetti, 1982; Scott, 1991; Smith, 1999). The aspects o f cohesion examined through the items on this checklist include reference, conjunction, lexical cohesion and overall global structures. Reference includes the use of pronouns, articles and demonstratives to refer to information within the text. Conjunction is used to connect clauses and sentences and to organize text. The conjunctions evaluated here include additive (eg. and), temporal (eg. then), causal (eg. because), and adversative (eg. but) forms. Lexical cohesion is accomplished through reiteration of a term using the same word, a superordinate, a synonym or near­ synonym, or collocation which involves use of words that commonly occur together such as antonyms, complementary terms and converses. The degree of cohesion accomplished through lexical reiteration and collocation is a reflection of how close the words are in meaning and the distance between them within the written text. 
The degree of cohesion is stronger where the distance is less. Global structures that affect cohesion include consistencies used across a text, such as tense marking, and overall organization of a piece, such as temporal organization, causal relationships and paragraph structure.

Definition of Key Terms
adversative - marking an opposing or contrary relationship.
antonyms - words that mean the opposite of each other. An example would be 'hot' and 'cold'. These word pairs affect cohesion because of their strong semantic relationship.
clause (clausal) - a clause consists of a subject and verb. Clauses may be independent, in which case they can stand alone as a sentence. They may also be subordinating, in which case they need to be attached to an independent clause by a subordinating conjunction to complete the thought. Subordinating clauses consist of those which begin with conjunctions such as 'because', 'when', 'until', or 'although'. Independent clauses may be joined to other independent clauses using coordinating conjunctions such as 'and', 'or', or 'but'. 'So' and 'then' are often also treated as coordinating conjunctions in narrative analysis (Hughes, McGillivray & Schmidek, 1997).
complementary terms - words that often appear together and thus complement one another. Such word pairs consist of terms that have associated meanings such as 'joke' and 'laugh,' or 'lake' and 'beach'. Such terms are important to cohesion due to their strong semantic relationship.
converses - these consist of word pairs that reflect a relationship of response of one term to the other. Examples of converses include 'lead' and 'follow' or 'throw' and 'catch'. Closely related to antonyms, converses are important to cohesion due to their strong semantic relationship.
lexical - relating to words or the semantic relationship between words. This reflects word meaning.
near-synonym - words that are used to refer to the same thing, but may not have identical meanings when used out of context. Examples of near-synonyms include 'lion' and 'beast,' or 'cave' and 'shelter'. Such uses of near-synonyms are important to cohesion due to their strong semantic relationship and common reference.
reiteration - mentioning a person/place/thing/idea more than once in the same written text through direct repetition of a word, or replacement with a word that refers to the same thing.
sentence - a sentence consists of an independent clause containing a subject and a verb and any attached subordinate clauses. For the purposes of this evaluation, a sentence need not be signaled by mechanical conventions such as capital letters and periods. The boundaries of the sentence are determined by the subject/verb parameters mentioned above.
super-ordinate - a categorical label that can be used to replace a more specific term. For example, 'animal' is the super-ordinate of 'dog'.
synonym - a different word used to mean the same thing. An example is 'car' and 'automobile'.

Scoring Instructions
The following descriptions provide the criteria for scoring the corresponding items on the checklist. The sum of the scores for all 25 items will provide the total cohesion score. Composite scores will be derived from each section. Before scoring the writing sample for cohesion, first mark the boundaries between sentences (see above definition) to clarify the beginning and ending of independent clauses. For more information on dividing samples in this manner see Hughes et al. (1997).
Read through the entire sample once to familiarize yourself with the content before going through the items on the checklist. Ignore missing words as though they were purposely omitted. For example, if a child missed an article before a noun, do not treat it as a possible credit for 'the' on the checklist. Similarly, do not treat missing words as examples of errors. In this manner, if a child misses an article but uses 'the' correctly in every other instance, he or she would receive credit for the item. Treat incomplete thoughts as independent clauses or sentences. When scoring items for reference, a score of one is achieved by demonstrating use as described on the checklist in all cases. For conjunction and lexical cohesion, only one example need be present in the sample to receive credit. Also, conjunctions must be used to join sentences or clauses and will not be given credit when used elliptically (eg. Someday I'll go to the moon. I don't know when.). Two conjunctions used together (eg. and then...) receive two credits.

Item #  Scoring Criteria

Reference
1. Score 1 if every pronoun used refers unambiguously to a noun previously mentioned in the text. Score 0 if any pronoun has more than one possible referent, or no referent mentioned in the text. First appearances of the first and second person pronouns 'I' and 'you' are treated as nouns; subsequent uses will be treated as pronouns. Disregard uses of 'it' that are used to establish setting (eg. It was a warm sunny day.). Also score 0 if no pronouns are used in the sample.
2. Score 1 if every pronoun refers unambiguously to a noun or pronoun in the previous sentence or clause. Score 0 if any referent is not unambiguously contained in the previous or same sentence. Apply the rules for uses of 'I', 'you' and 'it' established in item #1.
3. Score 1 if every demonstrative (this, that, these and those) is used with an object/person/place/idea that has been mentioned in the previous sentence. This previous mention of a noun need not be an exact repetition of the same word but must refer to the same thing. Score 0 if the object/person/place/idea was mentioned somewhere other than the previous sentence or not at all. Also score 0 if no demonstratives are used.
4. Score 1 if every occurrence of 'the' is used with a noun that has an unambiguous referent. It should be clear to the reader to which specific person, place, thing or idea the writer is referring. Occurrences of the word 'the' may be used next to a noun that has a previous mention in the text. This mention may include reference to the same object/person/place/idea using a different word. For example, the following use of 'the' would qualify for a score of 1: I saw a dog running toward me. The beast looked mean. 'The' may also be used to refer to a special case so that the referent can be inferred from the content of the text or world knowledge. 'The' may also be used to introduce setting. The following examples would also qualify for a score of 1: I saw the Prime Minister on TV. We live on the earth. I walked into a store and asked to speak to the manager. The day was warm and sunny. Score 0 if the referent for a noun used with 'the' cannot be unambiguously inferred from context, world knowledge, or previous mention in the text. Also score 0 if 'the' is not used.
5. Score 1 if every sentence contains a reference through the use of pronouns, demonstratives or the definite article 'the' to the sentence directly preceding it.
In this case, to qualify for credit, uses of 'the' must refer to something specifically mentioned in the previous sentence. Score 0 if any sentences in the text do not refer directly to elements of the sentence preceding it.

Conjunction
6. Score 1 if the conjunction 'and' is used to join any two independent clauses. Score 0 if 'and' is not present in the written text or if it is only present to create compound subjects or verb phrases. For example, the following uses of 'and' would not receive credit on this item: The boy and girl were running. The ball was red and black. The children were laughing and playing.
7. Score 1 if the conjunction 'also' is used to join any two independent clauses. Score 0 if 'also' is not present in the written text or if it is only present to create compound subjects or verbs. For example, the following use of 'also' would not receive credit: The boy and also the girl were hungry.
8. Score 1 if there is any indication of additive conjunctions being used to join any two independent clauses. Examples of additive conjunctions include "another, or, in addition/additionally, as well as, etc." A semi-colon may also be used in this fashion and, if used correctly, would score 1. Score 0 if there are no other additive conjunctions used besides 'and' and 'also'.
9. Score 1 if the term 'then' is used to join or relate any two independent clauses. Score 0 if 'then' is not present in the written text.
10. Score 1 if the term 'when' is used to join or relate any two clauses. Score 0 if 'when' is not present in the written text.
11. Score 1 if the term(s) 'before' or 'after' are used to join or relate any two clauses. It is not necessary for both of these terms to be present to receive credit. Either one will warrant a score of 1. Score 0 if these terms are not present in the writing sample.
12. Score 1 if any subordinating temporal conjunctions other than the ones mentioned above are used to join or relate any two clauses. These may include terms like "until, while, as, ...". Score 0 if no other subordinating temporal conjunctions are present in the written passage. Caution: use of the word 'as' must reflect a temporal rather than causal meaning to receive credit (eg. I gazed at the horizon as the moon was setting.).
13. Score 1 if adverbs or adverbial phrases are used to mark shifts in time or the sequence of events. These might include temporal terms like "first, next, finally...", their adverbial derivatives (eg. firstly), or phrases such as "all of a sudden, the next day, later on, the following week, etc.". Sequential markers like 'first, second, last, etc.' need not be presented in a series to receive credit. The following examples would receive a score of 1: First of all, the boy ate his hotdog. He ate some candy next. OR First the boy was frightened. Then he got mad. Score 0 if no such adverbs or phrases appear in the text.
14. Score 1 if the conjunction 'so' is used to join any two independent clauses. Score 0 if 'so' is not present in the written text.
15. Score 1 if the conjunction 'because' is used to join any two clauses. Score 0 if 'because' is not present in the written text.
16. Score 1 if any other causal conjunctions are used to join clauses. Examples of causal conjunctions include "consequently, therefore, etc." Score 0 if there are no other causal conjunctions used other than 'so' and 'because'.
17. Score 1 if conjunctions showing an adversative relationship (eg. but, however, although) are used to join or relate any two clauses.
Score 0 if no such conjunctions appear.

Lexical Cohesion
18. Score 1 if any referent is reiterated in an adjacent sentence through the use of super-ordinate, synonym, or near-synonym terms. An example of superordinates might be word pairs like 'dog' and 'animal'. Synonyms involve the use of word pairs that mean the same thing like 'dog' and 'canine'. Near-synonyms consist of word pairs with similar meanings that refer to the same thing. For example, the following pair of sentences contains an example of a near-synonym: "He held a knife in his hand. He waved the blade wildly." The referent of the term must be clear to the reader. Score 0 if reiteration consists only of repetition of the same word from sentence to sentence or if it does not occur at all.
19. Score 1 if any referent is reiterated through use of super-ordinate, synonym, or near-synonym terms across the text. In this case, credit is given for such terms not occurring in adjacent sentences. Score 0 if reiteration across the text consists only of repetition of the same word or if there are no examples of super-ordinates, synonyms, or near-synonyms across the text.
20. Score 1 if any word pairs with complementary semantic links appear in neighboring sentences. Complementary terms include words that commonly occur together like 'boy-girl' or 'play-fun'. Such terms may also reflect topic maintenance by referring to things that commonly occur together. The following sentence pairs reflect this type of semantic connection: He fired the gun. A bullet grazed my ear. The UFO landed. Aliens appeared. Score 0 if complementary word pairs do not appear in neighboring sentences anywhere in the passage.
21. Score 1 if any word pairs with semantic links such as converses or antonyms appear in neighboring sentences. Converses include items that suggest a response of one to the other. These might include terms such as 'order-obey' or 'listen-tell'. An example would be: He spoke. I listened. Score 0 if no such lexical pairs appear in neighboring sentences anywhere in the passage.

Global Organization
22. Score 1 if the written text has a general sequential order (ie. things that happened first are mentioned first, etc.). Score 0 if the text consists of randomly ordered ideas or does not flow in a temporal sequence.
23. Score 1 if one tense (eg. past, present, or perfect) is used consistently throughout the passage. A score of 1 would also apply if shifts in tense occur in passages of dialogue. If a passage were written in past tense with a quotation written in another tense, it would still receive a score of 1. For example: The girl ran down the hall. She shouted to the people standing there, "Will you help me?" Score 0 if tenses are used inconsistently in the writing sample.
24. Score 1 if any event in the passage is causally linked to another event mentioned in the previous sentence. The two events need not be explicitly linked to receive credit. For example, the sentences "I was hungry. I went inside to get something to eat." show an implicit causal connection. The sentences "I was running. I stopped." do not imply or demonstrate a causal connection. A score of 0 applies when no causal links between adjacent sentences are detected.
25. Score 1 if a new paragraph is used when there is a shift in the story's events. These might include introduction of a new speaker, a new location or a new time. The paragraph need not be indented but should be marked by a new line of writing.
Score 0 if there is only one paragraph in the sample or if new paragraphs are not introduced with changes in speakers, location or time.

Table 1A indicates the breakdown of cohesive devices examined by this instrument. Each item from the checklist is listed in the table next to the cohesion category(ies) it represents.

Table 1A
Table of Test Specifications (type of cohesive device and corresponding item numbers)
Referential Cohesion: 1, 2, 3, 4, 5
Conjunction (additive): 6, 7, 8
Conjunction (temporal): 9, 10, 11, 12, 13
Conjunction (causal): 14, 15, 16
Conjunction (adversative): 17
Lexical Cohesion: 3, 4, 18, 19, 20, 21
Global Organization: 22, 23, 24, 25

References
Crowhurst, M. (1981). Cohesion in argumentative prose written by sixth-, tenth-, and twelfth-graders. Paper presented at the Annual Meeting of the American Educational Research Association, Los Angeles.
Crowhurst, M. (1987). Cohesion in argument and narration at three grade levels. Research in the Teaching of English, 21(2), 185-197.
Halliday, M. A. K. & Hasan, R. (1976). Cohesion in English. London: Longman Group.
Hedberg, N. L. & Fink, R. J. (1996). Cohesive harmony in the written stories of elementary children. Reading and Writing: An Interdisciplinary Journal, 8, 73-86.
Hughes, D., McGillivray, L. & Schmidek, M. (1997). Guide to narrative language: Procedures for assessment. Eau Claire, WI: Thinking Publications.
Liles, B. Z. (1985). Cohesion in the narratives of normal and language-disordered children. Journal of Speech and Hearing Research, 28, 123-133.
Lindeberg, A. C. (1984). Cohesion and coherence in short expository essays. Proceedings from the Nordic Conference for English Studies.
McCutchen, D. & Perfetti, C. A. (1982). Coherence and connectedness in the development of discourse production. Text, 2, 113-139.
Scott, C. (1991a). Learning to write: Context, form and process. In A. G. Kamhi & H. W. Catts (Eds.), Reading disabilities: A developmental language perspective (pp. 261-302). Boston: Allyn & Bacon.
Smith, L. (1999). An exploration of cohesion in narrative and expository writing in the mid-elementary years. Unpublished paper. (Available from Lynda Struthers, 267 Claxton Cres., Prince George, BC, V2M 5X6).
Zarnowski, M. (1981). A child's composition: How does it hold together? Language Arts, 58(2), 316-319.

Cohesive marker (each item is scored YES = 1 or NO = 0)
1. All pronouns refer to some previously mentioned noun.
2. All pronouns have a referent in the previous sentence or clause.
3. All demonstratives (eg. this, these, that, those, here, there) have a clear referent in the previous text.
4. Referents for nouns used with 'the' have an unambiguous previous mention in the text or can be inferred from world knowledge.
5. Each sentence is connected to the one preceding it by at least one form of reference.
REFERENCE SUB-SCORE
6. 'And' is used to connect independent clauses.
7. 'Also' is used to connect independent clauses.
8. Other coordinating conjunctions are used to connect independent clauses (eg. or, another, as well as, etc.).
9. 'Then' is used to connect independent clauses.
10. 'When' is used to connect clauses.
11. 'Before' or 'after' is used to connect clauses.
12. Other subordinating temporal conjunctions are used to connect clauses.
13. Adverb or adverbial phrases are used to mark sequence or shifts in time (eg. first, last, all of a sudden, etc.).
14. 'So' is used to connect sentences and/or clauses.
15. 'Because' is used to connect clauses.
16.
Other causal conjunctions are used to connect sentences and/or clauses (eg. consequently, therefore, etc.). 17. Adversative conjunctions (eg. but) are used to connect clauses. CONJUNCTION SUB-SCORE 18. Super-ordinates, synonyms or near-synonyms are used for the same referent in adjacent sentences. 19. Super-ordinates, synonyms or near-synonyms are used for the same referent across the text. 20. Complementary terms appear in adjacent sentences. 21. Converses or antonyms appear in adjacent sentences. LEXICAL COHESION SUB-SCORE 22. The written passage is sequentially organized. 23. Consistent tense is used throughout. 24. A causal relationship exists between two adjacent sentences . 25. A new paragraph is used when there is a shift in story events. GLOBAL ORGANIZATION SUB-SCORE Total Cohesion Score 131 Scoring Summary Sheet Student’s Name; Examiner’s Name: Date o f Writing Sample: Grade: Referential Cohesion Conjunction Lexical Cohesion Global Organization Total Cohesion Score 132 Scoring Companion Reference Item 1 2 Scoring Criteria Examnle Every pronoun refers to a previously mentioned noun, ‘I’ or 'you'. Every pronoun refers to a noun or pronoun in the previous or same sentence. the referent for every pronoun is clear 3 Every noun used with a demonstrative has mention in the previous sentence. 4 ‘The’ is used with a noun with an unambiguous referent previously mentioned or understood from world knowledge. 5 Every sentence contains a reference to the previous sentence Non-examole a pronoun with no referent or the referent is unclear The bov was walking fast. referent not in the He was headed home. previous sentence or clause OR The man was afraid so he The sirl ate some chips. ran. The chips tasted very good. Then she drank her OR She walked. She laughed. peg?. OR OR She cried Tears rolled The girl waved her hand. down her cheeks. uses this, that, those or The dogs came running these before a noun at us. We were scared. A dog ccme running at Those dogs were mean. us. That beast was mean. OR Use of a demonstrative with a noun with no previous mention. the Prime Minister ‘the’ used with the first the manager mention of a noun that is not a special case: The day was warm and the cat sunny. the officer A dog chased me up the or ‘the’ used when it is street. I ran as fast as I not clear to the reader could. The dog looked which specific mean. person/place/thing/idea is being referred to reference to previous At least one sentence is not connected to the sentence through the correct use ofpronouns, previous one by use of pronouns, demonstratives, or the definite article ‘the ’ demonstratives, or the definite article ‘the ' (use of ‘the ’ must indicate something mentioned in the previous 133 Conjunction 6 Use of ‘and’ to connect independent clauses 7 I was late. I also was hungry. OR / ate some cheese and I also ate some crackers. or, another, in addition, Use of additive additionally, as well as, conjunctions other than furthermore, besides, nor ‘and’ and ‘also’ etc. Use of ‘then’ I went to the store. Then I went home. When I got home I ate Use of ‘when’ lunch. Use of ‘before’ or ‘after’ I did my homework before I went outside. to connect clauses OR After I ate lunch. I went home. 8 9 10 11 12 13 14 The children were playing and the boys were running. OR The dog was mean. And I was afraid. Use of ‘also’ to connect independent clauses Use of other subordinating temporal conjunctions Adverbs or adverbial phrases used to mark shifts in time or sequence Use of ‘so’ ............................ 
........... ....... until, while, as, sinceftime), etc. first, second, third, next, last, finally, suddenly, later, all o f a sudden, as soon as, the next day, later on, the following week, after that, etc I was hungry. So I ate something. OR / was late m I hurried Ao/we. The boy and girl were running. The ball was red and black. The children are laughing and playing. OR ‘and’ not used The boy and also the girl were hungry. OR ‘also’ not used no other additive conjunctions no use of ‘then’ no use of ‘when’ no use of ‘before’ or ‘after’ OR ‘before’ and ‘after’ not connecting clauses I was there before. no other temporal conjunctions than those listed in items 11-14 no adverbs or phrases used marking shifts in time ‘so’ used in a way that conveys degree rather than causation / was m hungry, I could gaf a Aorjg. OR ‘so’ is not used 134 15 Use of ‘because’ 16 Use of other causal conjunctions 17 Use of adversative conjunctions Because I was sad. I went ‘because’ is not used home. OR ‘because’ does not OR I laughed because I was connect two ideas so happy. I went home because. OR / row. / w a; m a Ai/rry. no other causal cowggwgmt/y, conjunctions used other jfMcg etc. than ‘so’ and ‘because’ OR none used at all but, however, although, no adversative yet, instead, except, conjunctions used though, etc. Lexical Cohesion 18 Uses reiteration of a referent in adjacent sentences at least one time (applies to nouns only) 19 Uses reiteration of a referent at least one time across the text (applies to nouns only) superordinates I saw a dog. The animal was huge. OR synonyms I saw a dog. The mutt was huge. OR near-synonyms I saw a dog. The beast was huge. *note- these items will be signaled by the use of the definite article ‘the’ or a demonstrative. use of a synonym, near­ synonym, or super­ ordinate as described above but not in adjacent sentences repetition of a word / saw a dog. The dog was huge. OR superordinates, synonyms, or near­ synonyms are used but not in adjacent sentences. OR No superordinates, synonyms, or near­ synonyms are used repetition of the same word OR reiteration only in adjacent sentences OR no use of synonyms, near­ synonyms, or super­ ordinates 135 20 Use of complementary terms in adjacent sentences 21 Use of converses or antonyms in adjacent sentences The sun fired. A shot rang out. OR went to t/K AeocA. sow/ was /rot. speak-listen, ask-answer, order-obey, throw-catch, act-react, etc. You tell the story. I will listen. OR She climbed up. I slid down. such terms are not used in adjacent sentences sequence of events makes sense sequence of events does not make sense use of a single tense throughout the passage OR tense changes only in dialogue I was hungry. I went inside to eat. OR / was crying because I lost my ball. a new line is started with each new speaker OR where there are shifts in time and place The next day... Inside the house... inconsistent use of tense throughout the passage or tense mixing such terms are not used in adjacent sentences Global Organization 22 23 Events are mentioned in the order in which they occur. Consistent tense use 24 Causal link between any two events in adjacent sentences 25 New paragraphs used with new speakers, new time and new location. one event may be related to but does not cause the other. / was running. I stopped. no paragraph or no new line with a new speaker, time or location. 
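The Scoring Summary Sheet above reduces a completed Checklist 2.0 protocol to four sub-scores (Reference, items 1-5; Conjunction, items 6-17; Lexical Cohesion, items 18-21; Global Organization, items 22-25) and a total that is simply the sum of the 25 dichotomous items. The sketch below is not part of the instrument; it only illustrates that tallying, under the assumption that an examiner's item scores are recorded as 0/1 values, and the function and variable names are illustrative.

# Illustrative tally for Checklist 2.0: 25 dichotomous items (1 = yes, 0 = no)
# grouped into the four sub-scores named on the Scoring Summary Sheet.
SUBSCALES = {
    "Reference": range(1, 6),             # items 1-5
    "Conjunction": range(6, 18),          # items 6-17
    "Lexical Cohesion": range(18, 22),    # items 18-21
    "Global Organization": range(22, 26), # items 22-25
}

def summarize(item_scores):
    """item_scores maps item number (1-25) to 0 or 1; returns (sub-scores, total)."""
    if set(item_scores) != set(range(1, 26)):
        raise ValueError("Expected scores for items 1 through 25.")
    if any(score not in (0, 1) for score in item_scores.values()):
        raise ValueError("Each item is scored dichotomously (0 = no, 1 = yes).")
    subscores = {name: sum(item_scores[i] for i in items)
                 for name, items in SUBSCALES.items()}
    return subscores, sum(subscores.values())

# Example: a hypothetical protocol with credit on items 1-6, 9, 13, 18, 22 and 23.
scores = {i: 0 for i in range(1, 26)}
for credited in (1, 2, 3, 4, 5, 6, 9, 13, 18, 22, 23):
    scores[credited] = 1
subscores, total = summarize(scores)
print(subscores)  # {'Reference': 5, 'Conjunction': 3, 'Lexical Cohesion': 1, 'Global Organization': 2}
print(total)      # 11, the total cohesion score out of 25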
136 APPENDIX H Checklist 2.1 With Modified Scoring Criteria 137 Cohesion Checklist Student’s Name: Grade: Examiner’s Name: School: Date of Writing Sample: Cohesive marker 1. All 3rd person pronouns refer unambiguously to some previously mentioned noun. 2, All 3rd person pronouns have an unambiguous referent in the previous sentence or same sentence. 3. All demonstratives (eg. this, these, that, those, here, there) have a clear referent in the previous text. 4. Referents for nouns used with 'the' have an unambiguous previous mention in the text or can be inferred from world knowledge. 5. Each sentence is connected to the one preceding it by at least one form of reference. YES 1 NO 0 1 0 1 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 REFERENCE SUB-SCORE 6. Additive conjunctions are used to join independent clauses (e.g. and, also, or, another, as well as, etc.). 7. ‘When’ is used to connect clauses. 8. Other subordinating temporal conjunctions are used to connect clauses (e.g. before, after, until, while, as, etc.). 9. Adverb or adverbial phrases are used to mark sequence or shifts in time (e.g. then, next, first, last, all of a sudden, etc.). 10. Causal conjunctions are used to connect sentences and/or clauses (e.g. so, because, consequently, therefore, etc.). 11. Adversative conjunctions (eg. but) are used to connect clauses. CONJUNCTION SUB-SCORE 12. Super-ordinates, synonyms or near-synonyms are used for the same referent in adjacent sentences or across the text. 13. Complementary terms appear in adjacent sentences. LEXICAL COHESION SUB-SCORE Total Cohesion Score 138 Item # Scoring Criteria Reference 1. Score 1 if every 3rd person pronoun used refers unambiguously to a noun previously mentioned in the text. Score 0 if any pronoun has more than one possible referent, if different pronouns are used to refer to the same referent, or if an incorrect pronoun is used. Also score 0 if no referent for the pronoun is mentioned in the text. The following examples would result in a score of 0; The dog went home. It drank from its bowl. Then he ate. The girl dropped the ball. He tried to pick it up. John went to see Bill. He was in the school. Disregard uses of first and second person pronouns T and ‘you’. Disregard uses of ‘it’ that are used to establish setting (eg. It was a warm sunny day.). Also score 0 if no third person pronouns are used in the sample. 2. Score 1 if every third person pronoun refers unambiguously to a noun or pronoun in the same or previous sentence. At least one pronoun reference must cross a sentence or clause boundary to receive credit on this item. Score 0 if any referent is not unambiguously contained in the previous or same sentence. Also score 0 if all referents are contained in the same clause as their pronouns. Apply the rules for uses o f I’, ‘you’ and ‘it’ established in item #1. 3. Score 1 if every demonstrative (this, that, these, those, here and there) is used with or replaces an object/person/place/idea that has been previously mentioned in the text. This previous mention of a noun need not be an exact repetition of the same word but must refer to the same thing. Score 0 if the object/person/ place/idea was not mentioned in the previous sentence or if the referent is ambiguous. Disregard uses of 139 ‘there’ to establish setting (eg. There were 12 people in the garden.). Also score 0 if no demonstratives are used. 4. Score 1 if every occurrence of ‘the’ is used with a noun that has an unambiguous referent. 
It should be clear to the reader to which specific person, place, thing or idea the writer is referring. At least one occurrence of the word ‘the’ should be used next to a noun that has a previous mention in the text. This mention may include reference to the same object/person/place/idea using a different word. For example, the following use of ‘the’ would qualify for a score of 1: I saw a dog running toward me. The beast looked mean. ‘The’ may also be used to refer to a special case so that the referent can be inferred from the content of the text or world knowledge. ‘The’ may also be used to introduce setting. The following examples would also qualify for a score of 1: I saw the Prime Minister on TV. We live on the earth. I walked into a store and asked to speak to the manager. The day was warm and sunny. Score 0 if the referent for a noun used with ‘the’ cannot be unambiguously inferred from context, world knowledge, or previous mention in the text. Also score 0 if ‘the’ is never used to refer to a referent previously mentioned in the text or if ‘the’ is never used. 5. Score 1 if every sentence contains a reference through the use of pronouns, demonstratives or the definite article ‘the’ to the sentence directly preceding it. In this case, to qualify for credit, uses of ‘the’ must refer to something specifically 140 mentioned in the previous sentence. Score 0 if any sentences in the text do not contain a direct reference to elements of the sentence preceding it. Conjunction 6. Score 1 if an additive conjunction is used to join any two independent clauses. Examples o f additive conjunctions include “and, also, another, or, in addition/additionally, as well as, etc.” A semi-colon may also be used in this fashion and, if used correctly, would score 1. Score 0 if additive conjunctions are not present in the written text or if they are only present to create compound subjects or verb phrases. For example, the following uses o f ‘and’ would not receive credit on this item: The boy and girl were running. The ball was red and black. The children were laughing and playing. 7. Score 1 if the term ‘when’ is used to join or relate any two clauses. Score 0 if ‘when’ is not present in the written text or is used in a way that does not connect clauses. 8. Score 1 if subordinating temporal conjunctions other than ‘when’ are used to join or relate any two clauses. These may include terms like “before, after, until, while, as,...”. Score 0 if these terms are not present in the writing sample or are not used to connect clauses. Caution: use o f the word ‘as’ must reflect a temporal rather than causal meaning to receive credit (eg. I gazed at the horizon as the moon was setting.). 9. Score 1 if adverbs or adverbial phrases are used to mark shifts in time or the sequence of events. These might include temporal terms like “then, next, first, last, ...”, their adverbial derivatives (eg. firstly,), or phrases such as “all of a sudden, the next day, later on, the following week, etc.”. Sequential markers like “first, second, last, 141 etc.” need not be presented in a series to receive credit. The following examples would receive a score of 1: First o f all, the boy ate his hotdog. He ate some candy next. OR First the boy was frightened. He got mad and went home. Score 0 if no such adverbs or phrases appear in the text. 10. Score 1 if any causal conjunctions are used to join sentences or clauses. Examples of causal conjunctions include “so, because, consequently, therefore, etc.” . 
Score 0 if there are no other causal conjunctions used or if they are used in a way that does not connect sentences or clauses. 11. Score 1 if conjunctions showing an adversative relationship (eg. but, however, although) are used to join or relate any two clauses . Score 0 if no such conjunctions appear or if they are used in a way that does not connect clauses. Lexical Cohesion 12. Score 1 if any referent is reiterated anywhere in the text through the use of super­ ordinate, synonym, or near-synonym terms. An example of superordinates might be word pairs like ‘dog’ and ‘animal’. Synonyms involve the use of word pairs that mean the same thing like ‘dog’ and ‘canine’. Near-synonyms consist of word pairs with similar meanings that refer to the same thing. For example the following pairs of sentences contain an example of a near-synonym: “He held a knife in his hand. He waved the blade wildly.” The referent of the term must be clear to the reader. Score 0 if reiteration consists only of repetition of the same word from sentence to sentence or if it does not occur at all. 142 13. Score 1 if any word pairs with complementary semantic links appear in neighboring sentences. Complementary terms include words that commonly occur together like ‘boy-girl’ or ‘play-fun’. Such terms may also reflect topic maintenance by referring to things that commonly occur together. The following sentence pairs reflect this type of semantic connection; He fired the gun. A bullet grazed my ear. The UFO landed. Aliens appeared. Score 0 if complementary word pairs do not appear in neighboring sentences anywhere in the passage. 143 Scoring Companion Reference 1 Every 3rd person pronoun refers to a previously mentioned noun. 2 Every 3rd person pronoun refers to a noun or pronoun in the previous or same sentence. 3 Every noun used with a demonstrative has mention in the previous sentence. 4 ‘The’ is used with a noun with an unambiguous referent previously mentioned or understood from world knowledge. 5 Every sentence contains a reference to the previous sentence Example uses he, she, they, it, him, Ais, gfc. the referent for every pronoun is clear a pronoun with no referent, pronoun mismatch, or unclear referent disregard I, you and it used for setting The bov was walking fast. referent not in the He was headed home. previous sentence or clause OR The man was afraid so he The sirl ate some chips. The chips tasted very ran. good. Then she drank her OR She walked. She laughed OR OR She cried. Tears rolled no pronouns used down her cheeks. The dogs came running uses this, that, those, at us. We were scared. these, there, or here Those dogs were mean. before a noun or to refer to a noun OR A dog came running at use of a demonstrative us. That beast was mean. with or for a noun with no previous mention or I walked in the room. It was dark in there. no demonstratives used ‘the’ used with the first the Prime Minister mention of a noun that is the manager not a special case: The day was warm and the cat, the officer sunny. A dog chased me up the OR ‘the’ used with an un­ street. I ran as fast as I could The dog looked clear referent or never used with a referent from mean. 
the text At least one sentence is reference to previous not connected to the sentence through the previous one by use of correct use ofpronouns, pronouns, demonstratives, or the demonstratives, or the definite article ‘the ’ definite article 'the ’ 144 Conjunction 6 Use of additive conjunctions to connect independent clauses 7 Use of ‘when’ 8 Use of other subordinating temporal conjunctions 9 Adverbs or adverbial phrases used to mark shifts in time or sequence 10 Use of causal conjunctions to join sentences or clauses 11 Use of adversative conjunctions Example uses and, also, or, another, in addition, additionally, as well as, furthermore, besides, nor etc. The children were playing and the hoys were running. OR The dog was mean. And I was afraid. When I got home I ate lunch. uses before, after, until, while, as, since (time), etc. to join clauses second, third, next, last, finally, suddenly, later, all of a sudden, as soon as, the next day, later on, the following week, after that, etc uses so, because, therefore, consequently, since (cause), etc. I was hungry. So I ate something. OR Since I was late, I hurried home. but, however, although, yet, instead, except, though, etc. Non-example The boy and girl were running. The ball was red and black. The children are laughing and playing. OR additive conjunctions are not used no use of ‘when’ OR ‘When’ is not used to join clauses. “I ’m coming over ” she said “When? I asked. no subordinating temporal conjunctions (besides ‘when’) used to join clauses no adverbs or phrases used marking shifts in time ‘so’ used in a way that conveys degree rather than causation I was m hungry, I could eat a horse. OR conjunction does not connect two ideas I went home because. OR no causal conjunctions used no adversative conjunctions used 145 Lexical Cohesion Item Scoring C riteria 12 Uses reiteration of a referent at least one time anywhere in the text (applies to nouns only) 13 Use of complementary terms in adjacent sentences Example superordinates / saw a dog. The animal was /mgg. OR synonyms I saw a dog. The mutt was huge. OR near-synonyms I saw a dog. The beast was huge. *note- these items will be signaled by the use of the definite article ‘the’ or a demonstrative. The gun fired. A shot rang out. OR We went to the beach. The sand was hot. Non-example repetition of a word / saw a dog. The dog was Awge. OR no superordinates, synonyms, or near­ synonyms are used such terms are not used in adjacent sentences or not at all.
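Because every item on the checklist is scored dichotomously, agreement between two raters can be summarized item by item as the proportion of writing samples on which their scores match, in the spirit of the agreement indices for categorical judgements discussed by Tinsley & Weiss (1975). The sketch below is only an illustration of that proportion-of-agreement calculation; it is not part of the checklist materials, and it assumes each rater's scores for a single item are held in a parallel list of 0/1 values.

# Illustrative proportion-of-agreement computation for one dichotomous checklist item
# scored by two raters across the same set of writing samples.
def proportion_agreement(rater_a, rater_b):
    """Both arguments are equal-length lists of 0/1 item scores, one entry per writing sample."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Raters must score the same, non-empty set of samples.")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

# Example: two raters agree on 9 of 10 samples for a single item.
rater_1 = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
rater_2 = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
print(round(proportion_agreement(rater_1, rater_2), 2))  # 0.9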