The Validity Of DIBELS As An Indicator Of Early Literacy Achievement

Theodore Zarowny
B. Ed., University of Alberta, 1985

Thesis Submitted in Partial Fulfillment Of The Requirements For The Degree Of Master Of Education in Curriculum And Instruction

The University of Northern British Columbia
December, 2007

© Theodore Zarowny, 2007

Abstract

This study examines the concurrent validity of the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) using Kindergarten and Grade 1 teacher-assigned year-end scores and the reading Curriculum Based Measurement (WRC) as the criterion variables. The study also establishes DIBELS benchmarks, or cut-off points, for Kindergarten and Grade 1 using a Northern British Columbia sample. A correlational analysis between the criterion and predictive variables did not confirm concurrent validity of the various DIBELS measures at the Kindergarten level; this was due to a lack of variability in the criterion measure. However, significant correlation coefficients were produced between Grade 1 DIBELS measures and WRC. From the benchmarks, a set of risk factors was established. These risk factors provide educators with an indication of the degree to which a student may be at risk in their early literacy development. This study supports the use of DIBELS as part of a teacher's assessment regime.
TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter One: Introduction
Chapter Two: Literature Review
  Early Literacy Skills: Phonological Awareness
  Early Literacy Skills: Alphabetic Principle
  Early Literacy Skills: Reading Fluency
  Dynamic Indicators of Basic Early Literacy Skills
  Dynamic Indicators of Basic Early Literacy Skills in School District No. 57
  Purpose of the Study
  Research Questions
Chapter Three: Method
  Sample
    Subject Selection
  Instruments
    Predictive Variables: DIBELS
    Criterion Variable: Language Arts Scores
    Reading CBM
  Procedure
    Data Collection
    Data Cleaning and Screening
      Sample Size
      Collapsing Language Arts Categories
  Data Analysis
Chapter Four: Results
  Descriptive Statistics
  Concurrent Validity
  Patterns in Convergent Validity
  Benchmarks
Chapter Five: Discussion
  Concurrent Validity
  Benchmarks
  Limitations
  Implications
    Implications for Education
    Implications for Future Research
References
Appendix A: Sample DIBELS Measure
Appendix B: Sample Reading CBM
Appendix C: Letter of Approval From School District No. 57
Appendix D: Graphs Used to Establish Benchmarks

LIST OF TABLES

Table 1: Descriptive Statistics For Kindergarten and Grade 1
Table 2: DIBELS Measures Correlations For Kindergarten and Grade 1
Table 3: Biserial Correlations for Kindergarten and Grade 1 with Language Arts Marks
Table 4: Benchmarks for Kindergarten and Grade 1
Table 5: Frequency of Scores Below the Benchmark
Table 6: Risk Factor for the Number of Tests Scored Below the Benchmark
Table 7: Pearson Chi-Square Values Between Students Meeting Expectations in Language Arts and Students Meeting Expectations on DIBELS Measures

LIST OF FIGURES

Figure 1: Frequency for kindergarten NWF & Language Arts final using a score category of 5
Figure 2: Sample benchmark identification process using grade 1 NWF scores (category = 9)
Figure 3: Frequency distribution for kindergarten ISF & Language Arts final using a score category of 2.5

ACKNOWLEDGMENTS

My heartfelt thanks go to Dr. Peter MacMillan for his amazing patience, direction, and ongoing support throughout all the phases of this document. From the start, Dr. MacMillan was responsive and always made himself available for feedback, discussions, and problem-solving sessions. I must acknowledge my parents who, despite life's circumstances denying them a formal education, instilled in me the value of education, perseverance, and life-long learning. Throughout this process, my wife and best friend Crystal has been unwavering in her support and patience. She believed in my aspirations from day one, and remained the constant through to the completion of this work. It is to her I owe my greatest thanks.

CHAPTER 1: INTRODUCTION

Educators throughout School District No. 57 (Prince George) and throughout the province of British Columbia are increasingly expected to use valid data in making educational decisions, a process that has aptly become known as data-driven decision-making. Data-driven decision-making involves taking what exists and measuring it in some way, usually resulting in numerical data. Based upon those data, objectives are identified and appropriate strategies are chosen to increase, maintain, or lessen the observed behavior. The reasons for the recent trend in data-driven decision-making in education are many.
Popham (2004) identifies public accountability as one of the drivers behind data-driven decision-making. Pressure has been placed on education systems worldwide, and "accountability systems are imposed from outside because the public has doubts about that profession's quality of service." He adds that "once issued, demands for accountability rarely disappear" (p. 1). The need to address education's public accountability concern is evident in British Columbia. School districts across the province are expected to submit to the Ministry of Education annual Accountability Contracts. These contracts include a District Plan for Student Success that identifies how the District will address and maintain standards in several broad-based goal areas such as technology, numeracy, aboriginal education, social responsibility, and grade-to-grade transitions. Each goal has a series of more specific objectives. For example, an objective of the numeracy goal may be to increase student mathematical problem-solving skills. Accompanying each objective is a series of strategies that identify specific actions to be taken, timelines, and resources needed to reach that objective. The expectation, of course, is that the goals and objectives are measurable - objectives for which data can be collected in order to show the current status, and to monitor and observe the effectiveness of the selected intervention strategies.

The provincial Foundation Skills Assessment (FSA) for grades 4, 7 and 10 (Ministry of Education, 2005a), Student Satisfaction Surveys (Ministry of Education, 2005c), recently-introduced Grade 10 provincial examinations (Ministry of Education, 2005b) and Parent Satisfaction Surveys (Ministry of Education, 2005c) are further examples of the province's reliance on data for accountability purposes. (FSA scores are not calculated into a student's final course grade. In 2005, Grade 10 FSAs were replaced with provincial final examinations in Math and English, which constitute a percentage of a student's final course grade.) A district's Plan for Student Success must reflect data collected by the district as well as the goals and objectives in each school's School Plan for Student Success (SPSS). Each school in British Columbia is required to submit an SPSS. Selected goals, objectives, and strategies within the SPSS are also data-driven. Schools use a variety of measures or performance indicators. Some of these are Ministry-based, such as the Foundation Skills Assessment (Ministry of Education, 2005a) and Satisfaction Surveys (Ministry of Education, 2005c). Others are school-based assessments such as reading running records, performance standard rubrics, standardized tests, and teacher-made rubrics.

A second reason for the increased use of data-driven decision-making is related to the first: data-driven decision-making is increasingly necessary for economic reasons. Financial resources for education are stressed and, as Fullan, Bertani and Quinn (2004) state, "last year's success makes possible next year's new money" (p. 45). School jurisdictions and governments use high-stakes assessments as evidence that the financial investment in education is paying off. In their School Plans for Student Success, schools make resource priorities and allocations based on goals that are decided upon using available data. School District No. 57 makes similar decisions based on the data collected by schools.
Special Education funding, a scarce resource, also relies upon reliable and valid data for the identification of students with special needs. Funding to supplement special education services in schools requires assessments that are able to identify at-risk learners and monitor their progress.

Besides public and financial accountability, educational researchers identify another reason, and perhaps the most important reason, for the increasing use of and reliance on data-driven decision-making based on meaningful assessment: to enhance student learning. Popham (2004) makes the case that carefully selected assessment tools that provide accurate data are essential for reducing the achievement gap - the discrepancy between the skills of high-performing and low-performing students. He states, "anyone who is working to reduce achievement gaps must become assessment literate - at least with respect to the qualities of achievement tests that will or won't reveal genuine differences between what upper-income and lower-income students learn" (p. 2). Similarly, Fullan et al. (2004) point out that education systems need a "collective moral purpose [that] makes explicit the goal of raising the bar and closing the gap for all individuals and schools" (p. 43). Further, after an extensive review of over 250 studies and journal articles, Black and Wiliam (1998a) found that formative assessment practices in the classroom produced substantial learning gains, especially for low-achieving students and students with learning disabilities. Black and Wiliam (1998b) use "the general term assessment to refer to all those activities undertaken by teachers - and by their students in assessing themselves - that provide information to be used as feedback to modify teaching and learning activities. Such assessment becomes formative assessment when the evidence is actually used to adapt the teaching to meet student needs" (p. 2). Indeed, the work of such researchers as Black and Wiliam, along with Clarke, Owens and Sutton (2006), is contributing to the growing "assessment for learning" reform occurring in various school jurisdictions across Canada.

Such attempts to use data are evident in School District No. 57 (Prince George). Kindergarten and Grade 1 educators in School District No. 57 use the University of Oregon's Dynamic Indicators of Basic Early Literacy Skills (Dynamic Indicators of Basic Early Literacy Skills, 2000-2004) as a key tool for assessing early literacy skills. Each measure included in the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) is comprised of probes, short one-minute to two-minute tests that each assess one skill. The probes are used in the fall, winter and spring and provide educators with data on five indicators of literacy: letter naming fluency, initial sound fluency, phonemic segmentation fluency, nonsense word fluency, and oral reading fluency. Students who score considerably lower than their peers are considered at risk, and intervention strategies are administered by teachers to increase student skills, thereby reducing the gap. As Kaminsky and Good (1998) point out, effective early literacy intervention has encountered two difficulties. One difficulty is the question of what literacy skills should be taught. The second difficulty is that while certain instructional strategies improve group outcomes, some individual students often remain deficient and may require individualized intervention strategies.
Kaminsky and Good suggest that although many educators rely upon a "teach-and-hope" approach whereby early literacy instruction is based on beliefs about reading acquisition and instructional strategies, what is really required are effective assessment strategies that identify specific areas of concern for individual students and direct the use of specific instructional strategies which address the deficits identified by the assessment. Kaminsky and Good argue that the use of DIBELS helps fill those specific needs.

There is an ever-increasing reliance upon, and demand for, assessment data. Public accountability, economic accountability, and an increased understanding of the link between assessment and student achievement all create a need for valid assessment techniques. The Dynamic Indicators of Basic Early Literacy Skills (DIBELS) is one assessment that school jurisdictions across North America are using to meet that demand.

CHAPTER 2: LITERATURE REVIEW

Early Literacy Skills: Phonological Awareness

Three questions should be considered when deciding upon effective assessment practices, whether it be assessment for learning or assessment of learning. First, what should be assessed? One would want to be certain that the data that are obtained accurately reflect what one wants to measure. In the case of early reading, therefore, the question becomes, "What are the aspects of early literacy that are meaningful indicators of reading ability?" A second consideration is whether or not the skill or concept being assessed is teachable. Again, in the case of early literacy, if one assesses a student on a certain aspect of reading, can that aspect be affected through instruction? And third, what tool should be used to make that assessment? Is the tool reliable and valid under the conditions in which it is used?

In deciding what should be assessed in early literacy, one key ability is phonological awareness. However, the term phonological awareness is not entirely clear. Stanovich (1994) defines "phonological awareness" as the ability to deal explicitly and segmentally with sound units smaller than the syllable, and includes a spectrum of possible activities. For example, Snow, Burns, and Griffin (1998) distinguish phonemic segmentation as one of the more precise phonological abilities in that spectrum. Similarly, Adams (1990) identifies five levels of what are identified as "phonemic awareness" activities, and Chard and Dickson (1999) provide a similar spectrum that ranges from less complex initial rhyming and rhyming songs to the most complex activity - phonemic segmentation. Goswami and Bryant (1992) also make the case that phonological awareness involves a range of activities.

Stanovich (1991), Mann (1993), Kaminsky and Good (1996), and Liberman et al. (1989) make the case that phonemic awareness, or the lack thereof, in early readers is a moderate to strong predictor of future reading ability. Liberman et al. (1989) refer to several studies done in the 1970s and 1980s that demonstrate not only a correlation between phonological awareness and reading ability, but a causal relationship as well. A causal relationship would mean that improving the phonological abilities of readers would increase their reading abilities. Phonological processing weaknesses are characteristic of a broad spectrum of students who have difficulty acquiring early word recognition skills. The failure to acquire adequate early word recognition skills has dire consequences.
Students who are not correctly assessed and remediated lag further behind in the development of critical early word reading skills. These students ... will receive less practice in reading than will other children, they will miss opportunities to develop reading comprehension strategies, and they will acquire negative attitudes about reading itself. Furthermore, if children do not acquire good word reading skills early in elementary school, they will be cut off from the rich knowledge sources available in print, and this may be particularly unfortunate for children who are already weak in general verbal knowledge and ability. (Torgesen, 2000, p. 4)

Liberman and Shankweiler (1987) surveyed the research and make the case that being a skilled reader, either as an adult or child, is related to one's phonological understanding of words. Weak readers, they argue, have difficulty identifying the phonemic structure of words, and this weakness in phonological awareness has other implications. For instance, weaknesses in phonological awareness and processing lead to difficulties in developing word recognition strategies. Another weakness demonstrated by readers weak in phonemic awareness involves the ability to name things; poor readers often have difficulty accessing the correct words in their lexicon, and this causes them to misname objects - a problem the researchers say results from not being able to access the phonological properties of the words (Liberman & Shankweiler, 1987; Fowler, 2004).

Liberman and Shankweiler (1987) make another connection between phonemic awareness and reading. They conclude that short-term memory is essential for reading comprehension. Because the nature of short-term memory allows its capacity to be reached quickly, the faster the rate at which information can be put into it, the more that can be remembered. Weak readers have comprehension difficulties as the rate at which they can put words into short-term memory is decreased by their slow word recognition. Liberman and Shankweiler (1987) conclude that "reading comprehension difficulties may reflect processing limitations originating in the phonology, and not necessarily absence or malformation of the higher level structures of the sentence grammar" (p. 219). Recent studies (e.g., Gray & McCutcheon, 2006), however, question the degree to which phonology plays a role in reading comprehension. They confirm that phonological processing indeed plays an important role in word recognition; however, uncertainty remains as to whether a different kind of phonological processing is involved in comprehension, or whether the contribution of phonology is of a different degree.

Of all the phonological activities, phonemic awareness appears to be the most difficult one to develop. Phonemic awareness involves manipulating the individual sounds, or phonemes, that comprise a word. Liberman and Shankweiler (1987) explain that a word is formed by a certain phonological structure, or its combination of individual phonemes. They provide the example of the syllable /ba/. The consonant /b/ has a "buh" sound, and the vowel /a/ has the "ah" sound. Yet we do not produce the sound "buh-ah" when we pronounce that syllable. Instead, because of the coarticulation of speech sounds, we produce the sound /ba/. Understanding the phonological structure of a word means being able to break a word down to its individual phonemes - this is phonemic awareness.
While phonemic awareness has a primary role in early literacy, Torgesen (2000) argues that phonemic awareness is a particularly difficult reading skill to master, but should, nonetheless, be taught, as "difficulties in learning to 'sound out' words, or to use phonetic cues to help decipher them, limit the ability of children to read independently and accurately throughout first grade, and these difficulties usually extend into late elementary school and adulthood" (p. 3). Studies such as Uhry (2002) suggest that even reading strategies such as finger-point reading require phonemic awareness. Kaminsky and Good (1998) add that "by ensuring that all children have adequate phonological awareness skills when they enter first grade, we may be able to mitigate the effects of many differences in the initial skills required for successful reading instruction and prevent many cases of reading failure" (p. 115). Perfetti (1999) further argues that the relationship between phonemic awareness and reading is a reciprocal one "in which phonemic ability is first promoted through literacy acquisition and which then enables further gains in literacy" (p. 52).

While the research shows that phonological awareness and, more specifically, phonemic awareness is indeed an important early literacy skill, another consideration is whether phonemic awareness is something that can be taught, or whether it is a product of biology and genetics and therefore difficult to affect. This is an important consideration, as the implications for education would differ greatly if phonemic awareness were not affected by instruction. Hence the question: despite its integral role in early literacy, can phonemic awareness be taught?

To begin answering this question, one might assume that the ability to segment words into phonemes is related to one's analytic ability, which would imply a deficit in the cognitive domain if phonemic awareness abilities were weak. However, as Liberman and Shankweiler (1987) point out, weak readers are just as competent as strong readers with non-linguistic analysis tasks, but are weak with phonemic segmentation, suggesting that the ability to segment words into phonemes is not a deficit in the analytical domain, but in the linguistic domain. Moreover, illiterate adults and adults who had reading instruction but remain poor readers are unable to perform well on tasks that require phonemic awareness (Liberman et al., 1989). Such information suggests that phonological awareness is not something that develops spontaneously, but rather a skill that requires direct instruction.

The literature also tells us that, despite its important function in early literacy, young children have difficulty identifying phonemic segments (Adams, 1990; Chard & Dickson, 1999; Liberman & Shankweiler, 1987; Perfetti, 1999; Treiman, 1992). For instance, Treiman notes that early readers have greater success with larger sound constructs such as intrasyllabic units. Intrasyllabic units are word components that are larger than a single phoneme, but smaller than a syllable. An example of an intrasyllabic unit from the word "children" would be /en/, which is composed of two phonemes (/e/ and /n/) but is smaller than the final syllable /ren/. Perfetti (1999) agrees that phonemes have an "invisibility" to them and that there is "little outside of literacy contexts that can serve to draw the existence of phonemes" (p. 51).
He maintains, however, that phonemic awareness in the pre-reading stage and at the Grade 1 level is essential for further gains in literacy, and that "good literacy instruction makes phonemes more visible..." (p. 52). However, will instruction increase a reader's phonemic processing abilities? Much evidence exists that phonological awareness can be taught. Liberman and Shankweiler (1987), Liberman et al. (1989) and Torgesen (2000) cite several studies showing that training in phonological awareness helps early learners to read. Byrne (1991) argues that phonological awareness needs to be taught - that it does not happen by learning to read holistically. Based on this discussion, therefore, phonological awareness and, more specifically, phonemic awareness is a skill that, when assessed, can provide important information to teachers as they provide reading instruction to their students.

Early Literacy Skills: Alphabetic Principle

Having phonemic awareness skills alone is not enough to learn how to read. In other words, phonemic awareness is a necessary but not sufficient attribute of a skilled reader (Phillips & Torgesen, 2006). Learning to read involves using phonological awareness to match the phonological structure of words to their alphabetical transcriptions (Byrne, 1992; Gough, 1992; Liberman et al., 1989; Perfetti, 1999). In other words, learning to read alphabetic writing systems includes the awareness that the individual letters that make up words merely represent sounds of spoken language. Understanding that written words are made up of individual letters, and applying letter-sound knowledge to these letters, is necessary for children to analyze words and to read reliably and accurately. This is called alphabetic understanding (Adams, 1990).

The alphabetic principle has two parts (Big Ideas In Beginning Reading, 2004). First, alphabetic understanding is the knowledge that words are composed of letters that represent sounds or phonemes. The second part of the alphabetic principle is phonological recoding. This skill involves understanding the systematic relationships between letters and phonemes (letter-sound correspondence) to pronounce or spell an unknown printed string. Phonological recoding consists of three tasks, the first of which is regular word reading (beginning decoding). Regular word reading requires that the word be read from left to right, that sounds for all letters can be generated, and that sounds can be blended into recognizable words. A second phonological recoding task is reading irregular words. Irregular words are those like "laugh" and "their" which do not conform to regular decoding rules. Irregular words are more difficult to read because the sounds of the letters are unique to that word or a few words, or the student has not yet learned the letter-sound correspondences in the word (Carnine, Silbert & Kame'enui, 1997). The third phonological recoding task is advanced word analysis.

The system that maps print to speech - the alphabetic principle - requires a strength in phonological awareness, and Liberman et al. (1989) state that "although both reading and speech require some degree of mastery of language, reading requires, in addition, a mastery of the alphabetic principle" (p. 5). According to Moats (1999), children who lack understanding of the alphabetic principle face many difficulties.
Such children have difficulty understanding that words are composed of letters, cannot associate a letter of the alphabet with its corresponding phoneme or sound, and will have difficulty identifying a word based on a sequence of letter-sound correspondences (e.g., that "ba" is made up of the two letter-sound correspondences /b/ and /a/). Other difficulties will be experienced when blending letter-sound correspondences to identify decodable words, using knowledge of letter-sound correspondences to identify words in which letters represent their most common sound, identifying and manipulating letter-sound correspondences within words, and reading pseudowords (e.g., "nug") with reasonable speed. Mastering the alphabetic principle along with phonological processing abilities is central to learning to read and should be a primary focus in early reading instruction. Perfetti (1999) argues that reading education should have a general goal of "making sure they learn the alphabetic principle, something that requires some attention to fostering students' phonemic awareness" (p. 54).

As with phonological awareness, whether or not instruction in alphabetic awareness is possible and has an effect on reading ability is an important consideration. Studies such as Foorman, Fletcher, Schatschneider and Mehta (1998) demonstrate that when given direct instruction in the alphabetic principle, student word recognition skills significantly improve.

Early Literacy Skills: Reading Fluency

A third area of the literature argues that another key indicator of early literacy is reading fluency. Collins, Good, Knutson, Shinn and Tilly (1992) define reading fluency as "the speed and accuracy with which a student reads words" (p. 460). Pikulski and Chard (2005) offer a more recent definition: Reading fluency refers to efficient, effective word recognition skills that permit a reader to construct the meaning of text. Fluency is manifested in accurate, rapid, expressive oral reading and is applied during, and makes possible, silent reading comprehension (p. 510).

Research shows that reading fluency measures are effective indicators of reading achievement (Deno, 1985; Good & Jefferson, 1998; Shinn, 1989). The criterion-related validity of oral reading CBMs in relation to standardized tests and teacher judgments has been identified by Shinn et al. (1992). Collins et al. (1992) also conclude that "oral reading fluency fits current theoretical models of reading well and can be validated as a measure of general reading achievement, including comprehension" (p. 476).

Phonological awareness, knowledge of the alphabetic principle, and reading fluency, therefore, are three aspects of early literacy that play an important role in reading ability. They are measurable, and can be affected by appropriate interventions. There is a recognition that effective assessment tools are required for appropriate intervention, and that improvements to current assessment techniques are necessary (Ysseldyke & Christensen, 1988). D'Angiulli and Siegel (2003) make a similar argument when they conclude that measures such as the WISC-R do not necessarily identify students with reading disabilities, and that measures other than IQ are needed to accurately identify students with learning disabilities. Good, Simmons, and Smith (1998) identify the procedures needed for an effective assessment tool.
The assessment must identify students who are deficient in early literacy skills, provide ongoing feedback, evaluate the efficacy of interventions, accurately identify students with serious learning problems, and evaluate the overall effectiveness of intervention strategies.

Dynamic Indicators of Basic Early Literacy Skills

The Dynamic Indicators of Basic Early Literacy Skills (DIBELS) is a set of brief standardized measures designed for monitoring progress and for early identification of children with reading difficulties (DIBELS, 2004). Designed by the University of Oregon to be short fluency measures, DIBELS provide assessment information on a set of early literacy skills that are identified in the literature as being directly related to later reading success: phonological awareness, the alphabetic principle and reading fluency. The Phoneme Segmentation Fluency (PSF) measure provides assessment data for phonological awareness, while Initial Sound Fluency (ISF), Letter Naming Fluency (LNF) and Nonsense Word Fluency (NWF) provide indicators of a child's ability with the alphabetic principle. Finally, the Oral Reading Fluency (ORF) measure is the indicator of reading fluency.

Recent research shows that DIBELS measures are a valid indicator of a child's progress towards the acquisition of early literacy skills. For instance, Kaminsky and Good (1996) found evidence of criterion-related validity in the areas of phonological awareness, vocabulary development, and letter-naming fluency. Results indicate a significant correlation between Letter Naming Fluency (LNF) and Phoneme Segmentation Fluency (PSF) and the criterion measures, which included the Stanford Diagnostic Reading Test and the McCarthy Scales of Children's Abilities. Coefficients ranged between .58 and .90 for LNF and between .63 and .73 for PSF. Weaker but still significant relationships were identified between the criterion variable and Picture Naming Fluency, another DIBELS measure.

Since the Kaminsky and Good (1996) study, replications have been conducted using the various DIBELS tasks and a variety of criterion measures, with similar results. For instance, Barger (2003) found a strong relationship between the DIBELS Oral Reading Fluency (ORF) measure and the North Carolina End of Grade Test. In his study, Barger took the median score of three ORF readings of 38 Grade 1 students from North Carolina. A strong correlation of .73 was found to exist between the North Carolina End of Grade Test and DIBELS ORF. Similar results were found by Buck and Torgesen (2003) when 1,102 third grade students were administered the ORF measure and the results were then correlated with the Florida Comprehensive Assessment Test Sunshine State Standards. A coefficient of .70 was obtained between the reading comprehension component and the ORF scores. Other studies (Shaw & Shaw, 2002; Vander Meer, Lentz, & Stollar, 2005; Wilson, 2005) provide similar results.

Strong correlations between standardized achievement levels and DIBELS, and between teacher ratings of achievement and DIBELS, were also found by Elliot, Lee, and Tollefson (2001). A modified battery of DIBELS measures was used that included fluency measures (LNF and Sound Naming Fluency) and ability measures (Initial Phoneme Ability and Phoneme Segmentation Ability). The ability measures relate to the DIBELS subtests of Initial Sound Fluency and Phoneme Segmentation Fluency; they were renamed with minor alterations for the purpose of their particular study.
Using 75 Kindergarten students from one district in the mid-western United States, the researchers found high correlations between DIBELS and five criterion measures: the Woodcock-Johnson Psycho-Educational Achievement Battery - Revised Broad Reading and Skills, the Test of Phonological Awareness, the Developing Skills Checklist, the Kaufman Brief Intelligence Test, and a Teacher Rating Questionnaire (TRQ). The TRQ was a five-point teacher rating scale with which teachers rated student reading at the end of the year. The scale ranged from "well below average" to "well above average." Concurrent validity between the DIBELS measures and the achievement measures was found, as correlations ranged from .68 to .75. In the same study, Elliot et al. (2001) also confirmed the predictive ability of DIBELS using a hierarchical regression analysis. In one analysis, significant standardized beta weights were reported for the Woodcock-Johnson Broad Reading and Skill and for the Teacher Rating Questionnaire. Their results indicate significant correlations ranging from .56 to .70.

Hintze, Ryan, and Stoner (2003) also conclude that DIBELS has strong concurrent validity. The researchers used a standardized test, the Comprehensive Test of Phonological Processing (CTOPP), as the criterion measure. In their study, 86 Kindergarten students from Massachusetts were given three of the DIBELS measures: LNF, ISF and PSF; these measures were correlated with the CTOPP. The CTOPP is also composed of various measures, and while some correlations between the subtests of both measures were low (.08, for instance), the correlations between DIBELS and the composite scores were significant, with 8 of the 9 possible correlations ranging from .20 to .60.

The literature also supports the reliability of DIBELS. Kaminsky and Good (1996) established significant alternate-form reliability coefficients. The coefficient for Phoneme Segmentation Fluency was .88, and .93 for Letter Naming Fluency. These results were mirrored by Elliot et al. (2001), who used interrater reliability, test-retest, and equivalent forms to establish reliability estimates. The reliability coefficients for Letter Naming Fluency, Sound Naming Fluency, Initial Phoneme Ability, and Phonemic Segmentation Ability ranged from .64 to .94, while the values for the composite scores ranged between .89 and .91.

Despite being able to establish significant relationships between DIBELS and other measures, Kaminsky and Good (1998) recognize the inherent difficulties in assessing the early literacy performance of young readers. While only a year separates a Kindergarten student and a Grade 1 student, rapid changes and growth add to the variability in performance over that year. Kaminsky and Good contend that DIBELS measures are not as predictive of future performance as a CBM may be with older students, and that "more data points may be needed to obtain the same amount of confidence in a performance estimate" than with older students (p. 121).

Dynamic Indicators of Basic Early Literacy Skills in School District No. 57

As jurisdictions were required to develop District Plans for Student Success, School District No. 57 began searching for an early literacy assessment, and DIBELS was seen as a viable choice. School District No. 57 (Prince George) adopted the use of DIBELS as its early literacy assessment tool to serve two primary functions. First, DIBELS provides Kindergarten and Grade 1 teachers in School District No. 57 with assessments to assist in the identification of students who have early literacy skill deficits.
Such knowledge guides teachers as they choose appropriate instructional and intervention strategies to correct deficits that are identified by the assessment. Besides acting as a feedback mechanism for educators, DIBELS would help track early literacy trends in the schools and in the district. From 2003 to 2004, School District No. 57 developed a data matrix - a table and timeline of the types of data each school in the district is expected to collect in order to facilitate school-based decision-making. Once collected, student scores from the variety of measures are entered into a district-wide data bank. As educators use assessment data such as DIBELS in developing intervention plans, information is required about how the target student compares to other students in the local population (Kaminsky & Good, 1998). Therefore, in the winter of 2002, local district DIBELS norms were established for School District No. 57. All District analyses were performed under the supervision of Dr. Peter MacMillan, a professor at the University of Northern British Columbia. By the winter of 2005, all elementary schools were expected to use DIBELS as their primary early literacy assessment. Each January, Kindergarten and Grade 1 DIBELS scores are collected by the elementary schools in the District. These data are used by District staff in the development of the District Plan for Student Success, by School Planning Councils for their respective School Plans for Student Success, and by classroom and learning assistance teachers as they plan their instruction. The creation of goals, the making of recommendations, and the implementation of interventions and support programs from the District to the classroom level are, therefore, based partially on DIBELS data.

Purpose of the Study

The DIBELS research field lacks research on two levels: research with a Canadian sample, and research using teacher-assigned grades as the criterion variable. Most of the literature uses a standardized measure to determine the validity of the DIBELS measures. In the early primary grades, year-end assessments are not produced by standardized tests; instead, a rich assortment of records, anecdotal notes, and assessments contributes to the teacher-assigned year-end grade for students. How DIBELS relates to teacher-assigned grades is an area that lacks research. Similarly, studies using a Canadian sample, and more specifically a northern British Columbia sample, are also lacking. Socioeconomic, cultural, and ethnic factors create a unique population and diversity. The District Data Summary 2001/02 - 2005/06: 057 Prince George (Ministry of Education, 2007b) provides some insight into the demographics of School District No. 57 (Prince George). On many levels, the population in the District is economically disadvantaged when compared to the province. Educational attainment levels, for instance, indicate that in 2001, 10% of the District population held a university degree, compared with 17% at the provincial level. Unemployment rates in the District were higher than the provincial average, as was the percentage of individuals on income assistance, while the percentage of single-parent families exceeded the provincial average. The District is becoming increasingly culturally diverse.
Almost 2% of households speak Punjabi or Chinese as the primary language in the home, and 7% of the students are identified as English as a Second Language students. Students of aboriginal ancestry comprise 20% of the student population in the District. DIBELS has been developed using an American sample; using a different sample would broaden the base of validity research and provide some data useful for School District No. 57.

Besides contributing to DIBELS validation research, this thesis will also produce DIBELS benchmarks for School District No. 57. A benchmark is a reference or measurement point that can be used for comparison purposes. In the case of DIBELS, a series of benchmarks for each of the measures has been created by the University of Oregon Center for Teaching and Learning (2004). Along with each benchmark is a risk indicator. For instance, a score below the benchmark of 4 for Kindergarten ISF at the beginning of the year indicates the child is at risk with this aspect of reading, while a score above the benchmark of 8 indicates low risk. Currently, School District No. 57 is using the University of Oregon's benchmarks for identifying DIBELS cut-off points in the identification of at-risk readers. The District is interested in providing locally-based benchmarks (S. Fewster, personal communication, February, 2004). The University of Oregon benchmarks are based on a population and demographic different from those in School District No. 57. Establishing benchmarks using a local sample can be beneficial in the identification and intervention processes for young at-risk readers in the District.
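To make the benchmark and risk-indicator idea concrete, the following sketch classifies a beginning-of-year Kindergarten ISF score against the University of Oregon cut-offs cited above. It is an illustration only: the function name and the "some risk" label for scores falling between the two cut-offs are illustrative assumptions rather than part of the published DIBELS materials.

```python
def isf_risk_category(score, at_risk_cut=4, low_risk_cut=8):
    """Classify a beginning-of-year Kindergarten ISF score.

    Cut-offs follow the University of Oregon benchmarks cited in the
    text; the "some risk" label for in-between scores is assumed here
    for illustration.
    """
    if score < at_risk_cut:
        return "at risk"
    if score > low_risk_cut:
        return "low risk"
    return "some risk"

print(isf_risk_category(2))    # "at risk"
print(isf_risk_category(10))   # "low risk"
```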
In January, 2004, all teachers of Kindergarten and Grade 1 across the district were required to administer an early literacy assessment to students. Kindergarten teachers had a choice of using either DIB ELS as the main performance indicator of early literacy skills or a University of British Columbia early literacy screen. Only scores from those schools who used DIBELS were used in the study. The use of an early literacy screen was optional for Grade 1 teachers. Those who did administer one used DIBELS. Moreover, while the district recommended that Grade 1 reading CBMs be administered, the use of these measures was a school-based decision. Therefore, the sample size was reduced by those schools not administering Grade 1 CBMs. All Kindergarten and Grade 1 students were assigned a performance level mark for Language Arts (LA) on the year-end reports. This study used only those schools who could report on all the variables in each of the years. Other exclusions included special needs students who met the criteria in the Guidebook for the Use of Curriculum Based Measurement in School District #57 (Prince George School 22 District No. 57, 1996). Consequently, students who were visually impaired, hearing impaired, multiply disabled, mentally disabled, or had English as a Second Language were not analyzed in this study. Instruments Predictive Variables DIBELS The DIBELS measures assess the following areas of early literacy: phonological awareness, alphabetic principle, and fluency with connected text (DIBELS, 2004). Phonological awareness, the ability to hear and manipulate sounds in words, is measured by two tests. First, Initial Sounds Fluency (ISF) considers a student's ability to identify and produce the initial sound of a given word. The examiner presents and orally names four pictures (see Appendix A for samples of all the Grade 1 DIBELS measures). The child is then asked to point to the picture that begins with the sound produced by the examiner. In the second part of the ISF phonological awareness assessment, the examiner asks the child to produce the beginning sound for a word that is presented orally and matches one of the pictures. The length of time it takes the child to produce the initial sound for the twelve given words is recorded, and a calculation is made to convert the time into an Initial Sound Fluency score. The second phonological awareness test is the Phoneme Segmentation Fluency (PSF) test. The PSF measure allows the examiner to assess a student's ability to fluently segment words into their individual phonemes. A phoneme is the spoken sound that makes up a word. For instance, the word "big" has three phonemes: "/b/ l\l /g/". In the PSF test, the examiner orally presents words comprised of three to four phonemes, and the child is required to orally produce 23 the individual phonemes for each word. After a student response, more words are given. The correct number of phonemes identified in one minute is recorded. Alphabetic principle includes alphabetic understanding, the notion diat words are composed of letters that represents sounds, and includes phonological recoding which involves saying an unknown word by using the relationship between letters and their corresponding phonemes. DIB ELS uses a Letter Naming Fluency (LNF) test which measures the number of letter names a child can correctly identify in one minute. The test consists of several rows of randomly arranged upper and lower cased letters. 
The student is given one minute to correctly identify as many letters as possible. Another measure of alphabetic principle is taken with the Nonsense Word Fluency (NWF). This test assesses students' knowledge of letter-sound correspondences as well their ability to blend letters together to form unfamiliar "nonsense" words (e.g., "tig") The child is given a random list of words with varying forms of vowelconsonant words (e.g., "ib") and consonant-vowel-consonant words (e.g.,"neg"). The child can respond by saying the nonsense word as a whole, or by verbally saying each letter sound such as /n/e/g/. The final score is the number of correct letter sounds identified in one minute. Higher scores are given when the child can phonologically state the word instead of stating the letter sounds in isolation. Fluency through connected text, the third area of early literacy, is measured though a test called the Oral Reading Fluency (ORF) test. With this measure, the child reads a selection of text at a designated level. The number of words the student reads correctly in one minute becomes the ORF score. The various DIB ELS measures are used at different times of the year: fall, winter, and spring. In this study, the spring measures will be used because such testing is typically done in 24 May and, therefore, close enough to mid June when teachers begin writing their final reports. The only exception is the Kindergarten ISF assessment. This study uses the winter assessment since there is no spring ISF test. The three Kindergarten DIB ELS measures, therefore, are Initial Sound Fluency, Letter Naming Fluency, and Phoneme Segmentation Fluency. The three Grade 1 measures are Phoneme Segmentation Fluency, Nonsense Word Fluency, and Oral Reading Fluency. Criterion Variable School Based Performance Levels Primary school teachers provide formal reports on student progress three times throughout the year. Two of those reports are given to parents during the school year, and the final June report reflects student progress and achievement to the end of the grade. The reports must comment on the "student's school progress with reference to the expected development for students in a similar age range" (Ministry of Education, 2004). During the year, teachers assign a numerical value to a student's reading status on progress report cards. This mark corresponds to the comments necessary for reporting to parents the student's development in relation to students of a similar range age. At the time the data for this study were collected, teachers could report student progress by using one of the four scores and accompanying comment for each learner outcome identified on the report card: 1 not yet meeting widely held expectations 2 minimally meeting widely held expectations 3 fully meeting widely held expectations 4 exceeding expectations 25 On each report card, separate performance levels are assigned for reading and writing. For instance, writing may receive a performance level of 2, while reading may receive a performance level of 3. At the end of the year, however, reading and writing are combined to create a single Language Arts performance level which is reported to the Ministry and identified in the student's Permanent Record Card as a verbal comment, not as a number. 
The Language Arts scores used in this study represent three possible categories or comments that were available to teachers for reporting student achievement on the Permanent Record Card: not yet meeting expectations, meeting expectations, or exceeding expectations. Although the final Language Arts mark used in this study is a reflection of the combined reading, writing, listening, and speaking levels, informal discussions with primary teachers in the Prince George School District revealed that reading was identified as the determining skill when providing a final Language Arts performance level for students for the Permanent Record Card. A study of concurrent validity requires that the criterion variable be accepted as valid. Teacher assigned grades that are reported to the Ministry via the Permanent Record Card are considered a valid measure in the British Columbia public school system. From Kindergarten to Grade 11 the Ministry uses teacher-assigned grades for students' Permanent Record Card, and even in cases where Provincial Final Exams occur, fifty percent of students' final grades are based on teacher-assigned grades. While teacher-assigned final grades are not b ased on standardized tests at the elementary level, these grades are still considered valid. First, the Ministry of Education collects and accepts teacher assigned grades, not standardized test scores, as valid indicators of student achievement; the Ministry does not require any justification for the teacher-assigned scores. Secondly, teacher grades become valid when considering the implications associated with them, especially at the kindergarten and Grade 1 level. Identifying a 26 student as "not yet meeting expectations" when reporting to parents often initiates a series of interventions including conferences, more in-depth assessments, school-based interventions, and possibly referrals to other agencies. Other educators rely upon the assigned grades of previous teachers as well when making educational decisions. When developing classroom composition, for instance, or when anticipating learning assistance demands, school planners will use teacher reporting to guide their decisions. Similarly, when a student transfers to another school, the teacher assigned grades provide important information that assists the receiving school in deciding how to best accommodate that student's needs. Reading CBM Measures According to Jenkins, Deno, and Mirkin (1979), a curriculum based measure (CBM) is described as an assessment that is directly related to a student's curricula, is short in duration, is capable of having multiple forms, is low cost, and can indicate progress over time. Many CBM measures exist and are used to assess variety of skills associated with mathematics, writing, and reading. The CBM of interest in this study is the reading CBM. A reading CBM is a one minute oral reading of a passage from a classroom text that measures reading fluency. The number of words read correctly in a minute is used as the measure. It identifies the number of words read correctly. To avoid confusion over the type of CBM being discussed, and because the reading CBM measures words read correctly, this study uses the acronym WRC when referring to the reading CBM. Although a rather simple assessment, the WRC measure has proven to be a useful and valuable assessment tool. 
Since the initial CBM validity study (Deno, Mirkin & Marston,1982), an abundance of validity studies have been conducted (see Marston, 1989) and have shown that 27 the Reading CBM is a valid indicator of reading ability. Using such criterion measures as the Stanford Diagnostic Reading Test, and the Woodcock Reading Mastery Test along with a numerous other reading assessments, the researchers found correlation coefficients ranging from .73 to .91. Locally, School District No. 57 (Prince George) has recognized the validity of WRC assessments (Fewster, 2000). This assessment tool has been used to provide a "consistent standard for decision-making across schools" (p. 2). Probes that measure WRC and are used by School District No. 57 were originally normed in 1995 and renormed again in 2003. The readings that received minor modifications in 2003 are comprised of short passages taken from reading anthologies used in the schools. Five levels of reading fluency are identified: well below average, below average, average, above average, and well above average. The cut score for these levels are simply the percentile scores corresponding to Pio, P25, P50, P75 and P90 .Samples of the reading probes can be found in Appendix B. Procedure Data Collection In order to facilitate data management, School District No. 57 provided schools with a data management tool called the Performance Standard Files (Prince George School District No. 57 , 2000) that enables electronic storage and retrieval of school performance indicators. Included on these data bases are WRC scores, and DIBELS scores. June report card marks are stored on a different system called Turbo School (Wong, 1985). This data management system stores a variety of ministry required student information including core subject performance levels. 28 The administration of WRC and DIBELS tests varied between schools. In some cases, trained Learning Assistance / Support Teachers administered the tests, while in other cases, classroom teachers trained in the administration of DIBELS were involved. In order to gather the necessary data from the 2003-2004 school year for this study, the School District was provided with a brief proposal outlining the purpose and method of the study. The school district granted approval, (see Appendix C) and the school principals were informed of the study by Ms. Bonnie Chappel, the Director of School Services. Principals from each participating school arranged for the necessary data to be forwarded. In most cases the data was forwarded electronically, but in some cases, hard copies of the data were forwarded and the data was changed into electronic form. A step-by-step procedure for data retrieval and instructions on forwarding the data were provided. Because some schools did not use DIBELS as their early literacy assessment in Kindergarten in 2004, the data from only those schools which used DIBELS are included in this study. Data Cleaning and Screening Sample Size The data for the study were gathered, but required sorting and matching to the students' Personal Education Numbers (PENs) as the scores that were imported from the Performance Standard Files (Prince George School District No.57, 2000) could not be exported with the corresponding PENs. Even though PENs are a required piece of data that must be entered into that particular data base, a technical incompatibility made impossible an electronic transfer of that piece of data. 
The students' names, PENs, and matching final report marks were therefore imported from each participating school's Turbo School database in a separate file. Some arrived as hard copies while others were electronic. Using student names and PENs, the scores from the various measures were matched to allow for an analysis using SPSS (Statistical Program for the Social Sciences, not to be confused with the School Plan for Student Success) for Windows, version 13.0. Student names were deleted once the matching was completed.

The sorting eliminated many cases, especially at the Grade 1 level. The Kindergarten sample size was large, approximately 500 for each measure, due mainly to a District requirement that schools collect DIBELS scores at the Kindergarten level for its early literacy monitoring purposes. The Grade 1 sample size, in contrast, was smaller, and numbers varied greatly between the different DIBELS measures. The potential Grade 1 sample was 991 students based upon the available final Language Arts report marks, but this number was reduced (see Table 1). Several factors combined to reduce the sample size. First, only those students who had at least one score on any of the assessments were included, reducing the sample of students with Language Arts (LA) scores to 673. Further, not all schools used the DIBELS and WRC measures at the Grade 1 level, and in one case a school had the data in a form that could not be used in this study. The largest sample for the Grade 1 criterion measures was the WRC assessment, with 712 scores, while the number of final LA scores was smaller. This discrepancy may be explained by students moving out of the district before the end of the school year, resulting in a WRC score but no final LA score.

Another factor that reduced the sample size was the manner in which some students and their scores were identified. The Performance Standards File used by School District No. 57 schools requires both a student name and a PEN before data can be saved. In numerous cases, "fake" PENs such as "11111" or "3333" were entered instead of the actual PEN. Fake PENs are used in instances where a student is tested before a PEN is available; for example, a student can arrive at a school before his or her Permanent Record arrives from the previous school. Consequently, many LA scores could not be used because they could not be matched to the fake PENs that accompanied the DIBELS and WRC scores. The final n for Grade 1 is found in Table 1.

Collapsing Language Arts Categories

Besides the sorting and matching task, the Language Arts scores were coded to assist with the analysis. The first task involved turning LA marks from the verbal comments "not yet meeting expectations" and "meeting/exceeding expectations" into numbers. A system that reflects performance levels on the report cards was used: a "1" was assigned to the "not yet meeting expectations" category, and a "2" was assigned to the "meeting/exceeding expectations" category. However, three schools differentiated the second category and identified students as either "meeting expectations" or "exceeding expectations." Because this differentiation occurred in only three of the schools, students who were given "exceeding expectations" final grades were also assigned a "2."

A final task in the data preparation involved assigning numerical scores to students on Individual Education Plans (IEPs). In keeping with Ministry regulations, Language Arts grades are not given to students on IEPs. Those on IEPs were assigned a "1" for the purposes of the study, since students are provided with IEPs in designated subject areas because they are unable to meet the intended learning outcomes for the grade and are thus not yet meeting expectations.
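The matching and recoding steps described above can be summarized in a short sketch. All file names and column labels below (`PEN`, `la_comment`, `on_iep`, `student_name`) are hypothetical stand-ins; the actual Performance Standards and Turbo School exports varied by school.

```python
import pandas as pd

# Hypothetical exports: one file of DIBELS/WRC scores, one of June marks.
scores = pd.read_csv("performance_standards_export.csv")
marks = pd.read_csv("turbo_school_export.csv")

# Drop records carrying placeholder ("fake") PENs such as "11111" or
# "3333", which cannot be matched to a report mark.
fake_pens = {"11111", "3333"}
scores = scores[~scores["PEN"].astype(str).isin(fake_pens)]

# Match score records to final marks on the Personal Education Number.
merged = scores.merge(marks, on="PEN", how="inner")

# Collapse the Language Arts comments to a two-point scale:
# 1 = not yet meeting expectations; 2 = meeting/exceeding expectations.
la_codes = {
    "not yet meeting expectations": 1,
    "meeting expectations": 2,
    "exceeding expectations": 2,  # used by only three schools
}
merged["LA"] = merged["la_comment"].str.lower().map(la_codes)

# Students on an IEP receive no LA grade; code them as "1" (see text).
merged.loc[merged["on_iep"], "LA"] = 1

# Names are no longer needed once matching is complete.
merged = merged.drop(columns=["student_name"])
```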
Data Analysis

Descriptive statistics were first calculated for each of the variables at the Kindergarten and Grade 1 levels. Means, standard deviations, medians, and skewness were calculated for the dependent variable (final LA marks) and for the covariates, which included all DIBELS measures and WRC scores. A measure of concurrent validity of DIBELS was created by calculating Pearson's product-moment correlation coefficients between the 2003 Kindergarten final LA marks and the spring DIBELS subtest scores. A second test was done using the 2003 Grade 1 final LA marks and the spring WRC and DIBELS measures.

A second analysis was completed to address the use of a two-point scale in this study. Using a two-point system for the Language Arts scores created an artificial dichotomy. The two-point scale used for Language Arts is not a true dichotomy (such as being either an elementary or a high school student); rather, the dichotomy created by making the Language Arts mark a two-point system is what Glass and Hopkins (1996) call an "artifact of a crude measurement" (p. 135). In other words, underlying the "meeting/not yet meeting" dichotomy is a normal distribution that has artificially been altered to create a dichotomous distribution. Using other measurement techniques, it might be possible to produce a more normal distribution of final Language Arts marks. Therefore, in order to approximate the product-moment correlation between the hypothetically normally distributed Language Arts marks and the independent variables, a biserial correlation between the variables was calculated using the standard formula given by Glass and Hopkins (1996, p. 137):

$$r_{\mathrm{bis}} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y} \cdot \frac{pq}{u}$$

where $\bar{Y}_1$ and $\bar{Y}_0$ are the means of the continuous measure (here, a DIBELS or WRC score) for the students meeting and not yet meeting expectations, $s_Y$ is the standard deviation of the continuous measure for all students, $p$ and $q$ are the proportions of students in the two groups, and $u$ is the ordinate of the unit normal distribution at the point dividing areas $p$ and $q$.

Setting benchmarks for the Kindergarten and Grade 1 DIBELS measures involved a contrasting groups method similar to that identified by Nedelsky (in Crocker & Algina, 1986). This method requires that two groups with differing proficiency levels be identified and tested. The score distribution of each group is plotted on the same graph, and the intersection point of the two curves becomes the cut-off point, or benchmark. As will be seen in the next chapter, the high skewness of most of the data in this study required an alteration of this procedure.

The benchmarks led to two risk-factor calculations. A risk factor is a percentage that identifies the likelihood of an event occurring. While some students who scored below the benchmark on a particular DIBELS or WRC measure were identified as not meeting Language Arts expectations, others were identified as meeting expectations; in other words, there were cases where a student scored below the benchmark on a measure but received a passing grade in Language Arts. Therefore, a risk factor was generated. The first risk factor is the percentage of all students who scored below the benchmark on a test who were also identified as not yet meeting expectations in Language Arts. For example, a benchmark with a risk factor of 25% indicates that a student who scores below the benchmark has a one-in-four chance of receiving a "not yet meeting expectations" in Language Arts. These results are identified in Table 5.
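As a concrete illustration of this first risk-factor calculation, the sketch below computes the percentage of students below a measure's benchmark who also received a "1" in Language Arts. The data frame and column names (`NWF`, `LA`) are hypothetical stand-ins, not the study's actual files.

```python
import pandas as pd

def risk_factor(df: pd.DataFrame, measure: str, benchmark: float) -> float:
    """Percentage of students scoring below `benchmark` on `measure`
    who were also coded "1" (not yet meeting expectations) in LA."""
    below = df[df[measure] < benchmark]
    if below.empty:
        return float("nan")  # no students fell below the benchmark
    return 100.0 * (below["LA"] == 1).mean()

# Hypothetical example using the Grade 1 NWF cut-off of 35 identified in
# Chapter 4. A returned value of 25.0 would correspond to a one-in-four
# chance of a "not yet meeting expectations" LA mark.
grade1 = pd.DataFrame({"NWF": [12, 28, 40, 55, 31, 70],
                       "LA":  [1,  1,  2,  2,  2,  2]})
print(risk_factor(grade1, "NWF", 35))  # -> 66.7 (2 of the 3 below the cut-off)
```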
The creation of benchmarks in this study led to another analysis and a subsequent, second set of risk factors. As with the individual DIBELS and WRC assessments, some students who scored below the benchmarks on all of the measures were still identified as meeting expectations. Upon closer examination of the data, it became evident that every student identified as not yet meeting expectations had scored below the benchmark on at least two of the measures; yet other students who scored below the benchmarks on at least two of the assessments were identified as meeting expectations in Language Arts. Therefore, a second risk factor was generated by calculating the percentage of all students who scored below the benchmark on a certain number of tests and who were also identified as not yet meeting expectations. These risk factors can be found in Table 6.

A chi-square test was then used to analyze the relationship between the students' Language Arts scores and their scores on the DIBELS measures. Student DIBELS scores were transformed into nominal categories representing scores above and below the School District's benchmark. These dichotomized scores were then cross-tabulated with the Language Arts scores. The resulting chi-square values are found in Table 7.

CHAPTER 4: RESULTS

Concurrent Validity

Descriptive Statistics

The descriptive statistics for both Kindergarten and Grade 1 revealed a similar pattern: the Language Arts marks for both groups are highly negatively skewed. As indicated in Table 1, skewness for the Kindergarten Language Arts (LA) mark distribution was -3.66, while for the Grade 1 Language Arts marks it was -3.06. A high negative skew indicates that cases cluster at the upper end of the scale, with a long tail toward lower scores. The high negative skew shows, therefore, that a large proportion of the Language Arts marks report students as meeting or exceeding expectations in Grade 1, and even more so in Kindergarten. Mean scores above 1.94 (out of 2) and small standard deviations (below .27) also show that the vast majority of students scored a 2, or "meeting expectations," in Language Arts.

The WRC and DIBELS measures, on the other hand, were less skewed in their distributions. The most highly skewed Kindergarten distributions were the Phoneme Segmentation Fluency (PSF) scores (skew = 1.70) and the Initial Sound Fluency (ISF) scores (skew = 1.11). Not only was the skewness considerably less than that of the LA scores, it was positive rather than negative. The positive skew of these measures indicates that while some students performed well, the scores tended towards the lower ranges, the opposite of the LA scores. Similar results were produced with the Grade 1 data. The scores with the highest skew were the reading fluency scores, Words Read Correctly (WRC) and Oral Reading Fluency (ORF). Again, as with the Kindergarten data, the skewness was less pronounced than in the LA results and in the opposite direction.

In Table 1, the frequency (f) refers to the percentage of students "meeting expectations" in the case of the LA marks, while the frequency for the DIBELS measures is based on the University of Oregon benchmarks. The frequency of students meeting the LA expectations was high at both the Kindergarten level (94%) and the Grade 1 level (92%). The disproportionate distribution of the LA scores produced the skew noted earlier in this chapter.
Table 1
Descriptive Statistics for Kindergarten and Grade 1

Kindergarten
  Test   n     f     Mean    Median   SD      Skew
  LA     503   94%    1.94    2.00     0.24   -3.66
  PSF    451   55%   15.45   13.00    15.11    1.70
  NWF    462   63%   14.61   10.00    10.74    0.82
  ISF    513   60%   12.38    8.00    12.61    1.11
  LNF    463   55%   22.44   18.00    16.99    0.51

Grade 1
  Test   n     f     Mean    Median   SD      Skew
  LA     673   92%    1.92    2.00     0.27   -3.06
  PSF    202   95%   36.96   39.00    17.93   -0.09
  NWF    168   83%   55.10   51.00    28.19    0.66
  ORF    178   71%   37.98   32.00    27.05    1.54
  WRC    712   85%   46.16   37.00    32.28    0.90

In the case of the DIBELS measures, the frequency of students scoring above the benchmarks is lower in Kindergarten than in Grade 1. At the Kindergarten level, between 55% and 63% of the students met or exceeded the benchmarks; by Grade 1, between 71% and 95% of the students scored at or above the benchmark on the different measures. At the Kindergarten level, PSF and LNF had the smallest frequency of success, as 55% of the scores were at or above the benchmark. In contrast, the measure with the highest frequency of success, at 95%, was PSF at the Grade 1 level.

Patterns in Convergent Validity

The correlation coefficients in Table 2 indicate no significant relationship between the DIBELS measures and final Kindergarten LA grades. The correlation coefficient values between the predictive and criterion variables suggest a trivial effect size between the DIBELS measures and the final Language Arts scores for Kindergarten. Effect size is a judgment of the strength of the relationship between two variables; the magnitude of the effect size of Pearson's r is said to be trivial if |r| < .1, small if |r| >= .1, medium if |r| >= .3, and large if |r| >= .5 (Cohen, 1988). All effect sizes between Kindergarten LA scores and the DIBELS measures were less than .1, and none of the correlation coefficients was statistically significant at the p < .05 or p < .01 levels. The strongest correlation coefficient for Kindergarten was -.09, between LA and ISF scores. Not only is this coefficient trivial by Cohen's guidelines, it is negative, implying (trivially) that higher ISF scores were associated with lower LA marks. The highest positive correlation, at .02, was between the final LA mark and Letter Naming Fluency (LNF), but it too was non-significant. Such results might be expected given the high degree of negative skewness in the criterion variable and the more normal, though positively skewed, distributions among the predictive variables. The n ranged from 451 to 513 for the DIBELS measures, and n = 503 for Language Arts.

Table 2
DIBELS Measures Correlations for Kindergarten and Grade 1

Kindergarten
          LA      PSF     NWF     ISF
  PSF    -.03
  NWF             .53**
  ISF    -.09    -.02     .11*
  LNF     .02     .47**   .38**   .03

Grade 1
          LA      PSF     NWF     ORF
  PSF     .27*
  NWF     .31*    .36*
  ORF     .31*    .33*    .79*
  WRC     .30*    .31*    .72*    .87*

* Correlation is significant at p < .05 (2-tailed). ** Correlation is significant at p < .01 (2-tailed).

Some significant correlations were found among the DIBELS measures themselves, however. The strongest correlations were between Nonsense Word Fluency (NWF) and the other DIBELS measures, ranging from r = .53 (NWF and PSF) to r = .11 (NWF and ISF). The weakest correlations were between ISF and the other DIBELS measures. Interestingly, PSF produced two of the highest r values (.53 and .47) as well as the lowest value (-.024, with ISF). Moreover, the results show no significant relationship between PSF and ISF (r = -.024) despite both being indicators of phonemic awareness.
That the ISF scores represented winter scores while the other DIBELS measures represented spring scores may have had an impact on these results.

The Grade 1 results in Table 2 produced higher correlations between the criterion and predictive variables. Despite the skewed LA results, a medium effect size was obtained between the criterion and predictive variables. A fairly consistent relationship between LA and the predictive variables appears to exist, as the r values between the measures were quite similar. The strongest relationships were between Language Arts and both the NWF and ORF measures (r = .31), while the weakest was with the PSF measure (r = .27, p < .05).

At the Grade 1 level, the relationship between the DIBELS measures and the WRC measure was high. As might be expected given the similarity of the tests, a high correlation was found between ORF and WRC scores (r = .87), and the correlations between these two measures and NWF were nearly identical. The n was 673 for Language Arts and 712 for WRC, while n ranged from 161 to 202 for the DIBELS measures. In contrast to the Kindergarten results, PSF produced the lowest r values with the other measures, with coefficients ranging from .31 to .36. However, similar to the Kindergarten results, the DIBELS measure that produced the highest correlations was NWF (.72, .36, and .79).

A biserial correlation was calculated because of the dichotomy created by reporting the LA scores on a two-point (meeting expectations / not yet meeting expectations) scale. Calculating a biserial correlation increased the magnitude of the r values for both the Grade 1 and Kindergarten correlations (see Table 3). A biserial correlation approximates the product-moment correlation by taking into account that the LA scores would typically resemble a normal distribution. The biserial correlations suggest a significant relationship of medium magnitude between the Grade 1 LA scores and the independent variables, with coefficients ranging from .48 to .56. On the other hand, while strengthened, the correlations between the Kindergarten final LA marks and the independent variables remained non-significant and trivial, ranging from -.06 to .03.

Table 3
Biserial Correlations for Kindergarten and Grade 1 with Language Arts Marks

  Test   Kindergarten   Grade 1
  LNF        .03           -
  ISF       -.02           -
  PSF       -.06          .48
  NWF        .02          .56
  ORF         -           .56
  WRC         -           .54

Benchmarks

Establishing benchmarks for the Kindergarten and Grade 1 DIBELS measures was challenging. Following the procedure identified by Nedelsky (in Crocker & Algina, 1986), the frequency distribution of each predictive variable was plotted. One frequency line represents students who were identified as not yet meeting expectations in LA; these students received a "1" for their final LA score. The second line represents students who received a "2" in LA, those who were meeting expectations. The Nedelsky procedure would typically produce two symmetrical curves, with the cut-off point at the intersection where one curve begins declining and the other begins rising. Such curves were not produced with the data in this study. Because of the skewed nature of most of the distributions, establishing an intersection of the two curves was problematic. Figure 1 exemplifies the nature of the difficulty; note that the intersection occurs at the highest point on each curve.
[Figure 1. Frequency for Kindergarten NWF and final Language Arts mark using a score category of 9. Two frequency curves are plotted against NWF score bins (0-8 through 153-161): LA Final = 1 (not meeting expectations) and LA Final = 2 (meeting expectations).]

The skewness of the distributions, and the resulting difficulty in finding an appropriate intersection point, meant redefining the notion of a benchmark or cut-off score. Instead of a line that establishes a definitive score distinguishing at-risk from not-at-risk readers, this study identifies a benchmark score indicating that students at or below it carry a certain degree of risk of not meeting expectations. In other words, this study establishes a cut-off such that students who receive that score or less have a certain degree of risk of being identified as not yet meeting expectations in reading.

In order to identify such a score, instead of choosing a specific cut-off point at the intersection of the two curves, a point was chosen that approximates an intersection point (or a point as close as possible) where the distribution curve of students receiving a "1" (not yet meeting expectations) for a final LA mark began a noticeable decline and the distribution curve of students receiving a "2" (meeting expectations) began a noticeable rise. A perpendicular line was drawn through these points, and the score corresponding to that line became the benchmark. Figure 2 provides an example of the process; in this particular case, the cut-off score for Grade 1 NWF is 35.

[Figure 2. Sample benchmark identification process using Grade 1 NWF scores (category = 9), showing the LA Final = 1 and LA Final = 2 frequency curves and the cut-off score.]
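To show how this modified contrasting-groups procedure can be operationalized, the sketch below plots the two group frequency polygons for a measure; the cut-off is then read off where the "1" curve turns down and the "2" curve turns up. The bin width (the "score category"), variable names, and simulated data are illustrative; the study's actual graphs were produced and inspected by hand.

```python
import numpy as np
import matplotlib.pyplot as plt

def contrasting_groups_plot(scores_not_meeting, scores_meeting, bin_width, measure):
    """Plot frequency polygons for the two LA groups on one graph.

    `scores_not_meeting` and `scores_meeting` are NumPy arrays of scores
    for students coded "1" and "2" in Language Arts, respectively.
    """
    top = max(scores_not_meeting.max(), scores_meeting.max())
    bins = np.arange(0, top + 2 * bin_width, bin_width)
    centers = (bins[:-1] + bins[1:]) / 2
    f1, _ = np.histogram(scores_not_meeting, bins=bins)
    f2, _ = np.histogram(scores_meeting, bins=bins)
    plt.plot(centers, f1, "--", label="LA Final: 1 (not meeting expectations)")
    plt.plot(centers, f2, "-", label="LA Final: 2 (meeting expectations)")
    plt.xlabel(f"{measure} score (bin width = {bin_width})")
    plt.ylabel("Frequency")
    plt.legend()
    plt.show()

# Example call with simulated, positively skewed Grade 1 NWF scores.
rng = np.random.default_rng(57)
contrasting_groups_plot(rng.gamma(2.0, 10.0, 40),   # "1" group clusters low
                        rng.gamma(4.0, 15.0, 160),  # "2" group spreads higher
                        bin_width=9, measure="NWF")
```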
[The remaining pages of the source copy reproduce Appendix A samples (a Letter Naming Fluency probe, Nonsense Word Fluency examiner directions, and an oral reading passage) and Appendix D benchmark graphs: benchmark for Kindergarten NWF (score category = 5); Figure 6, benchmark for Kindergarten LNF (score category = 5); and Figure 7, benchmark for Kindergarten PSF (score category = 5). Each Appendix D graph plots the LA Final = 1 and LA Final = 2 frequency curves with the cut-off score marked.]