FIRST YEAR RESULTS OF
THE STUDENT ACHIEVEMENT GUARANTEE
IN EDUCATION PROGRAM
Submitted by the SAGE Evaluation Team
Center for Urban Initiatives and Research
University of Wisconsin - Milwaukee
For further information contact Alex Molnar, Center for Urban Initiatives and Research, University of Wisconsin - Milwaukee, P.O. Box 413, Milwaukee, WI 53201, (414)229-5916
The Student Achievement Guarantee in Education (SAGE) evaluation is being conducted under contract to the Department of Public Instruction by the Center for Urban Initiatives and Research (CUIR) at the University of Wisconsin - Milwaukee. This is the first of five annual evaluation reports.
The purpose of the SAGE evaluation is to determine the effectiveness of the Student Achievement Guarantee in Education (SAGE) program in promoting academic achievement of students in grades K-3 in schools serving low-income children. The SAGE program was enacted by the Wisconsin legislature in 1995, with implementation in kindergarten and first grade beginning in the 1996-1997 school year. The SAGE statute [s. 118.43] requires participating schools to
(a) reduce class size to 15 in grades kindergarten and one in 1996-97, grades kindergarten to two in 1997-98, and grades kindergarten through three in 1998-99 to 2000-2001;
(b) keep the schools open from early in the morning to late in the day and collaborate with community organizations to provide educational, recreational, community, and social services (i.e., the “lighted schoolhouse”);
(c) provide a rigorous academic curriculum to improve academic achievement; and
(d) establish staff development and accountability mechanisms.
During 1996-97, the SAGE program was implemented in 30 schools located in 21 school districts throughout the state, as shown in Table 1. Over the course of the year it involved 3,614 students and 220 teachers in 190 kindergarten and first grade classrooms. The gender, race, and other characteristics of students in participating schools are displayed in Table 2.
Schools reduced class size in several ways. The SAGE legislation defines class size as "the number of pupils assigned to a regular classroom teacher." In practice, reduced class size has been interpreted as a 15:1 student-teacher ratio, implemented in the following ways:
A Regular classroom refers to a classroom with 1 teacher. Most regular classrooms have 15 or fewer students, but a few exceed 15.
A 2 Teacher Team classroom is a class where two teachers work collaboratively to teach as many as 30 students.
A Shared Space classroom is a classroom that has been fitted with a temporary wall which creates two teaching spaces, each with 1 teacher and about 15 students.
A Floating Teacher classroom is a room consisting of 1 teacher and about 30 students, except during reading, language arts and mathematics instruction when another teacher joins the class to reduce the ratio to 15:1.
Two other types of classroom organization were also utilized in the SAGE program, but to a limited extent. They are the Split Day classroom consisting of 15 students and 2 teachers, one who teaches in the morning and one who teaches in the afternoon, and the 3 Teacher Team classroom where there are 37 students taught collaboratively by 3 teachers.
The types of classrooms and the enrollments in each are displayed in Table 3. In sum, SAGE classes range in number of students from 9 to 37. A few SAGE classrooms exceed the 15:1 student-teacher ratio, but only by a few students.
The average SAGE classroom contains 17.4 students.
The SAGE Evaluation
The SAGE evaluation plan for 1996-97 is described below, including its purpose, design, instrumentation, and data collection plan.
The main purpose of the SAGE evaluation is to determine if the SAGE program of 15:1 student-teacher ratios, rigorous curriculum, lighted schoolhouse, and staff development is of benefit to students in promoting academic achievement. The main questions that guided the evaluation effort for 1996-97 are the following:
1. What differences exist in student achievement between SAGE schools and comparison schools?
2. How was each of the four SAGE elements implemented?
a) 15:1 student-teacher ratio (type of classroom, teaching methods, student behavior)
b) Rigorous curriculum (congruence with national standards)
c) Lighted-schoolhouse (type and extent of before and after school programs)
d) Staff development (type and extent of program)
The first question focuses on the product of the SAGE program, i.e., student achievement. The second question focuses on the process of the SAGE program, i.e., what happened in SAGE classrooms and schools that may, over time, help explain achievement variations and suggest future actions for teachers and administrators seeking to enhance student performance.
A two-part formative evaluation is used to determine the effectiveness of SAGE. The first part focuses on reduced student-teacher ratio, the main variable of the SAGE evaluation, through a quasi-experimental, comparative change design. The comparative change design utilizes a treatment group (30 SAGE schools) that has implemented the 15:1 student-teacher ratio and a comparison group (16 non-SAGE schools) that is as similar as possible to the treatment group except for the reduced student-teacher ratio. Changes in achievement over time, as measured by a standardized achievement pre-test (baseline) and repeated standardized achievement post-tests, are compared between the groups.
To carry out this design, 16 comparison schools were identified. The selection of comparison schools was constrained by practical considerations. Originally, the evaluation research design called for "matched pairs"; that is, one comparison school for each SAGE school. However, because of limited incentives to encourage potential comparison schools to participate in the evaluation, the matched pairs design was changed to a "matched group" design, which compares SAGE schools as a group to comparison schools as a group. Furthermore, the evaluation team intended to draw comparison schools from among all elementary schools in the state, but the lack of incentives foreclosed this strategy as well. Instead, comparison schools were selected from school districts participating in the SAGE program, for which cooperation with the evaluation was a condition of participating in the program. Reliance on SAGE districts for comparison schools resulted in underrepresentation of rural schools in the comparison group, since rural districts have limited numbers of elementary schools from which to choose. Moreover, some of the rural schools in the comparison group have class sizes only marginally above the 15:1 ratio required in SAGE schools.
The specific method of identifying schools for the comparison group was to minimize the Squared Euclidean Distance1 between the following variables (Z-scored) for each school: percent scoring above standard on the Wisconsin Third Grade Reading Test; percent Asian, Native American, African American, and Hispanic; percent low income; and total enrollment in grades K-3. Squared Euclidean Distances were computed for all SAGE schools and schools within SAGE districts. The first step was to check similarities among participating SAGE schools; a relatively homogeneous group of SAGE schools requires only a single matching school. Several relatively homogeneous groups of varying sizes were identified. The "best" matches were determined first by one non-SAGE match per SAGE group and second by pairwise matches for SAGE schools that did not fit a group. "Best" was defined as a combination of quantitative, research design, and practical considerations. Because some SAGE schools do not resemble any other schools in SAGE districts, particularly in racial composition, Squared Euclidean Distances were recomputed, omitting the variable rendering schools so dissimilar or substituting the variable percent white in place of all other racial variables.
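The matching computation described above can be sketched as follows. The school names and values below are invented for illustration, not actual SAGE data; the procedure — Z-score each variable across all schools, then pick the comparison candidate minimizing the squared Euclidean distance to each SAGE school — mirrors the method described in the text.

```python
from statistics import mean, pstdev

# Toy records: (pct above standard on reading test, pct minority,
# pct low income, K-3 enrollment). Illustrative values only.
sage_schools = {"SAGE-A": [42.0, 65.0, 71.0, 210],
                "SAGE-B": [55.0, 20.0, 48.0, 160]}
candidates = {"Comp-1": [44.0, 60.0, 69.0, 200],
              "Comp-2": [58.0, 18.0, 45.0, 150],
              "Comp-3": [30.0, 90.0, 85.0, 400]}

def z_score_columns(rows):
    """Z-score each variable (column) across all schools pooled together."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard zero spread
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def best_matches(sage, cands):
    """For each SAGE school, pick the candidate minimizing
    squared Euclidean distance in Z-scored space."""
    names = list(sage) + list(cands)
    zrows = z_score_columns([*sage.values(), *cands.values()])
    z = dict(zip(names, zrows))
    out = {}
    for s in sage:
        dists = {c: sum((a - b) ** 2 for a, b in zip(z[s], z[c]))
                 for c in cands}
        out[s] = min(dists, key=dists.get)
    return out

print(best_matches(sage_schools, candidates))
```

Z-scoring first keeps the large-scale enrollment variable from dominating the percentage variables in the distance.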
Difference of means tests between SAGE schools, as a group, and comparison schools, as a group, showed no statistically significant differences on any of the variables at the .05 level.2 Similar tests on the student demographic data collected by the SAGE Evaluation Team likewise showed no statistically significant differences on any of the variables at the .05 level. However, when these data were subjected to tests at the individual level of analysis, as shown in the composite profile of SAGE and comparison schools in Table 4, the large increase in N yielded several statistically significant differences.
The largest difference is the percentage of Native Americans in the SAGE versus the comparison group. The overrepresentation of African Americans in the comparison group reflects the high proportion of Milwaukee schools in the comparison group, nearly 44 percent of all comparison group schools. Comparison group students are somewhat better off economically than SAGE students. Further, the comparison group has fewer Exceptional Education Needs (EEN) students and fewer English as a Second Language (ESL) students.
During the course of the 1996-97 school year records were compiled on 5613 students. Many students withdrew from SAGE and comparison schools during the year, while others enrolled. Those students who remained in their schools for the entire year are labeled "ongoing." As Table 5 shows, enrollment in comparison schools was slightly more stable than in SAGE schools. Moreover, in both SAGE and comparison schools, the number of students withdrawing exceeded the number of students enrolling during the year. Thus the number of ongoing plus newly enrolled students recorded during spring data collection totals 5038, distributed across schools and grades as shown in Table 6.
In addition to the comparative change design, the nature of the four SAGE program elements is being examined to explain variation in achievement among SAGE schools, classes, and students. This is accomplished through a repeated measures, reflexive controls design. SAGE schools, classes, and students are compared to themselves over time, as measured at the beginning of the treatment through an achievement pretest and other baseline measures and after the treatment begins through repeated achievement tests and other indicators.
Data Collection Instruments
To provide information about the process and product of the SAGE program for 1996-97, a number of instruments were created and administered as part of the evaluation3. These instruments are the following:
1. Student Profiles. This instrument, completed in October and May, provided demographic and other data on each SAGE school and comparison school student.
2. Classroom Organization Profile. Completed in October, this instrument was used to record how SAGE schools attained a 15:1 student-teacher ratio.
3. Principal Interviews. These end-of-year interviews elicited principals’ descriptions and perceptions of effects of their schools’ rigorous curriculum, lighted-schoolhouse activities, and staff development program, as well as an overall evaluation of the SAGE program.
4. Teacher Questionnaire. Administered in May, this instrument obtained teachers’ descriptions and judgments of the effects of SAGE on teaching, curriculum, family involvement, and professional development. It also was used to assess overall satisfaction with SAGE.
5. Teacher Activity Log. This instrument, administered in October, January, and May, required teachers to record classroom events concerning time use, grouping, content, and student learning activities for a typical day.
6. Student Participation Questionnaire. In both October and May teachers used this instrument to assess each student's level of participation in classroom activities.
7. Classroom Observations. A group of first-grade classrooms representing the various types of 15:1 student-teacher ratios and a range of geographic areas was selected for qualitative observations in October and May to provide descriptions of classroom events.
8. Teacher Interviews. Although in-depth teacher interviews were not part of the original SAGE Evaluation Design, they were added because it became apparent that teachers had important stories to tell about their SAGE classroom experiences. The interviews, held in May, dealt with teachers' perceptions of the effects of SAGE on their teaching and on student learning. The observed teachers served as the interview sample.
9. Comprehensive Test of Basic Skills (CTBS). The Comprehensive Test of Basic Skills (CTBS) Complete Battery, Terra Nova edition, Level 10 was administered to first-grade students in the 30 SAGE schools and the 16 comparison schools in October, 1996 and May, 1997. The purpose of the October administration of the CTBS was to obtain baseline measures of achievement for SAGE schools and comparison schools. The complete battery includes sub-tests in reading, language arts, and mathematics. The CTBS was chosen as an achievement measure because it is derived from an Item Response Theory (IRT) model which allows comparison of performance across time. Moreover, it is one of a few instruments that attempts to minimize items biased against minorities and educationally disadvantaged students. Kindergarten students were not tested because of (a) concerns over the reliability and validity of standardized test results for kindergarten-aged children, and (b) the view expressed by many kindergarten teachers that standardized tests would have a traumatizing effect on their students. The effects of SAGE on kindergarten students will be determined when they are tested as first grade students the following year.
The methods of data collection by type of school and grade are listed in Table 7.
The instruments identified in the SAGE Evaluation Design that were not used in this report, or were used only in a limited way, are the Baseline Data Questionnaire, School Implementation Plan, Teacher Profile, and Teacher Development Plan. These instruments and program requirements were either not completed, as in the case of the Teacher Development Plan, or were useful only in part, as in the case of the Baseline Data Questionnaire.
The remainder of this report provides the results of the evaluation of the 1996-97 SAGE program. Part II presents data about the effects of SAGE on student achievement in reading, language arts and mathematics. Part III describes what went on in SAGE classrooms. Part IV addresses rigorous curriculum, staff development and lighted schoolhouse programs.
II. EFFECTS OF THE SAGE PROGRAM ON STUDENT ACHIEVEMENT
The effects of the SAGE program on student achievement were evaluated by several methods. Analyses were conducted at both the individual-level and class-level of analysis. SAGE effects were assessed with both bivariate and multivariate statistical tests. Results are reported first for individual-level analyses, then for class-level analyses.
The number of first grade students for whom valid test scores are available is substantially less than the total number of students. First, the evaluation team presented schools with the option of allowing EEN and ESL students, for whom the test may be inappropriate, to take the test anyway. These scores were invalidated based on a "Nonvalid/Missing Test Report," developed by the evaluation team and completed by all first grade classes. Second, given withdrawals and enrollments during the school year, a number of students had valid pre-test scores but no post-test scores, and vice versa. Third, some students took the reading and language arts components of the CTBS, or the mathematics component, but not both. Total scores are unavailable for these students. Finally, some students were absent for all of the pre-test, the post-test, or both. The number of valid test scores for the 1996-97 school year is presented in Table 8.
Pre-Test (Baseline) Results of Standardized Testing
Table 9 provides descriptive statistics on the scale scores from the pre-test, or baseline results. Scale scores can be used to measure student performance across all grade levels. Given the longitudinal nature of the SAGE evaluation, scale scores will serve as the primary measure of student achievement.4 To place the pre-test scale scores in context, national percentiles are also provided in Table 9. For example, the mean or average total scale score of 517.07 corresponds to a national percentile rank of 38.90. That is, the average first grade student in the SAGE evaluation scored as well on the CTBS as about 39 percent of students taking the test nationwide.
Since the SAGE program was created in response to lower levels of achievement among low-income students, this below-average (below the 50th percentile) performance on the baseline CTBS was expected.
The results from difference of means tests between SAGE and comparison student scale scores from the October CTBS are reported in Tables 10 through 13. Comparison school students scored slightly higher than SAGE school students on reading, language arts, and total score, and slightly lower in mathematics. However, none of the differences is statistically significant at the .05 threshold; we fail to reject the null hypothesis of no difference between SAGE and comparison school students on the pre-test. Since SAGE and comparison students are virtually equal in achievement at the beginning of the SAGE program, any subsequent differences in achievement tests that favor SAGE students may be more confidently attributed to the student-teacher ratio of 15:1 in the SAGE program.
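A difference of means test of the kind summarized in Tables 10 through 13 can be sketched as below. The scores are fabricated for illustration, and a normal approximation stands in for the t distribution when computing the p-value, which is a reasonable shortcut at the sample sizes involved in the SAGE data.

```python
from statistics import NormalDist, mean, variance

def welch_test(a, b):
    """Welch's two-sample test for a difference of means.
    Returns (t, two_sided_p); the p-value uses a normal
    approximation in place of the t distribution."""
    na, nb = len(a), len(b)
    se = (variance(a) / na + variance(b) / nb) ** 0.5
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Illustrative scale scores, not actual CTBS results.
sage_scores = [512, 520, 505, 530, 518, 509, 525, 515]
comp_scores = [515, 522, 508, 528, 519, 511, 526, 517]

t, p = welch_test(sage_scores, comp_scores)
print(f"t = {t:.3f}, p = {p:.3f}")
```

With a p-value above the .05 threshold, the null hypothesis of no difference between the groups is not rejected, which is the pre-test result the text reports.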
Post-Test Results of Standardized Testing
As noted above, student populations varied in SAGE and comparison schools due to withdrawals and within-year enrollments. The post-test results are based only on those students who remained in their schools for the entire 1996-1997 school year (88.2 percent of SAGE students and 87.0 percent of comparison school students took both pre- and post-tests).
Although the CTBS allows measurement of performance over time, with younger children different test levels can result in content-related invalidation. For example, when one attempts to compare students on level 10 and level 11 of the CTBS, the scores are comparable from a measurement point of view, but the contents of the two tests are not totally congruent. For this reason level 10 was used as both a pre-test and a post-test measure. However, as a consequence of the decision to administer level 10 of the CTBS for both pre-test and post-test, a substantial number of students achieved perfect scores on the sub-tests of the CTBS.
Perfect scores introduce an element of uncertainty into comparative analyses; once a student reaches an achievement ceiling the extent to which a student might have achieved is unknown. This "restricted range" issue is, on balance, more problematic than the issue of content-related validity. Therefore, beginning in 1997-98 level 10 of the CTBS will be administered to first grade students in the fall, and level 11 will be administered in the spring.
As shown in Table 14, perfect scores are particularly prevalent in the Language Arts and Reading sub-tests. The ceiling effect on the CTBS Language Arts sub-test is portrayed graphically in figure 1. A perfect score in language arts equals 620, the point on the graph where a straight line appears on the post-test and, to a far lesser extent, on the pre-test.
Figure 1. The Ceiling Effect on the CTBS Language Arts Sub-Test
[Figure omitted: distributions of Language Arts pre-test and post-test scale scores (200-700); the post-test distribution spikes at the perfect score of 620.]
As Table 14 shows, students in SAGE schools disproportionately achieved perfect scores.
Thus estimates of the effect of the SAGE program on student achievement are likely to be conservative. One approach to mitigating the ceiling effect is to conduct analyses first with all cases, and then to truncate the sample and repeat the analyses with those cases performing at or below the 75th percentile on the pre-test. Students who achieved perfect scores on the post-test are predominantly those who scored in the top quartile on the pre-test, and whose change scores from pre-test to post-test are restricted. In language arts, for example, the average change in score from pre-test to post-test is 49.67. However, the average change score for students who scored in the top quartile on the pre-test is 22.59, whereas the average change score for the other 75 percent of students is 58.73. Thus the statistical tests of the effects of the SAGE program that follow are presented first for all cases, then for those cases scoring at or below the 75th percentile on pre-tests.
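The truncation strategy amounts to the following. The paired (pre, post) scores are invented for illustration, with high pre-test scorers bumping against the 620 language arts ceiling; the percentile rule used here is a simple index-based approximation, not necessarily the exact rule used by the evaluation team.

```python
def mean_change(pairs):
    """Average post-test minus pre-test change score."""
    return sum(post - pre for pre, post in pairs) / len(pairs)

def truncate_at_quartile(pairs):
    """Keep only cases at or below the 75th percentile of the pre-test
    (simple index-based percentile rule, for illustration)."""
    pres = sorted(pre for pre, _ in pairs)
    cutoff = pres[int(0.75 * (len(pres) - 1))]
    return [(pre, post) for pre, post in pairs if pre <= cutoff]

# Invented (pre, post) scale scores; top pre-test scorers hit the 620 ceiling.
pairs = [(480, 545), (500, 560), (520, 575), (540, 590),
         (560, 605), (580, 618), (600, 620), (615, 620)]

print(mean_change(pairs))                        # restricted by the ceiling
print(mean_change(truncate_at_quartile(pairs)))  # larger once top quartile removed
```

As in the language arts figures quoted in the text, removing the top pre-test quartile raises the average change score, because the dropped cases are exactly those whose gains the ceiling truncated.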
Difference of Means Tests
Tables 15 through 22 show the results of difference of means tests for each of the CTBS sub-tests and total scores. Unlike the difference of means tests for the CTBS pre-test, which showed no statistically significant differences between SAGE and comparison school students, statistically significant differences are found in favor of SAGE students for each sub-test, and for total scale scores on the post-test. These statistically significant differences are observed whether all students are analyzed, or the top scoring quartiles on the pre-test are excluded.
The largest difference in means is found on the mathematics sub-test, followed by language arts, and then reading. The largest gain in SAGE student scores from pre-test to post-test, relative to comparison school students, was in language arts, as shown in Table 23. The smallest relative gain for SAGE students from pre-test to post-test was on the reading sub-test. Finally, the expectation that observed differences between all SAGE and comparison school students would be understated due to a ceiling effect was not borne out in all of the bivariate analyses. When the top scoring quartile on pre-tests was withheld from the analyses, the differences between SAGE and comparison school students on the language arts and mathematics sub-tests were actually smaller than when all students were included. To reiterate, however, comparison school students are better off than SAGE students in terms of family income and face fewer potential impediments to educational achievement. It is therefore necessary to statistically control for some of these differences through multivariate analyses.
The effect of the SAGE program on student achievement, controlling for other factors, was tested through a series of ordinary least squares regression models for each sub-test and for total scale scores. Control variables were entered into the models in blocks, with the SAGE/comparison student variable entered into the models last.
The first block of control variables included 1) student score on the pre-test; 2) eligibility for subsidized lunch as an indicator of family income, coded 0 if the student is ineligible, 1 if the student is eligible for reduced price lunch, and 2 if the student is eligible for free lunch (this variable is assumed to be interval level); and 3) school attendance, measured as the number of days absent, as reported by teachers in May 1997.
The second block of control variables included dummy variables for race/ethnicity, coded 1 if a student was of a certain race/ethnicity, and 0 if not. Dummy variables were included for African American, Asian, Hispanic, Native American, and White. A residual category, Other, is included in the constant term in the regression equations.
Finally, a dummy variable for SAGE or comparison school student was entered on the third block. This variable is coded 1 if a student is from a SAGE school, 0 if a student is from a comparison school.
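The regression setup can be sketched as follows. This is a minimal illustration with invented, noise-free data and only three of the predictors named above (pre-test score, lunch eligibility, and the SAGE dummy); the actual models also include attendance and the race/ethnicity dummies, and blockwise entry simply means fitting the model once per block with the SAGE dummy entered last.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved by Gaussian elimination with partial pivoting.
    Adequate for a handful of predictors."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         for r in range(k)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c]
                              for c in range(r + 1, k))) / A[r][r]
    return beta

# Invented students: (pre-test score, lunch code 0/1/2, SAGE dummy 0/1).
data = [(500, 0, 1), (510, 1, 1), (480, 2, 1), (520, 0, 1),
        (495, 0, 0), (515, 1, 0), (485, 2, 0), (525, 0, 0)]
X = [[1.0, pre, lunch, sage] for pre, lunch, sage in data]
# Post-test generated without noise, so OLS recovers the coefficients exactly.
y = [50 + 0.9 * pre - 5 * lunch + 10 * sage for pre, lunch, sage in data]

beta = ols(X, y)
print(beta)  # approximately [50, 0.9, -5, 10]
```

The last coefficient is the estimated SAGE effect controlling for the other predictors, which is the quantity of interest in the models described above.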
Some limitations of the data should be noted here. First, some of the racial/ethnic variables create complications. The variable "Asian" is a gross indicator which fails to distinguish among various Asian sub-groups. For example, we are unable to distinguish Hmong students, who tend to be more disadvantaged, from other Asian sub-groups. Native Americans are only minimally represented among comparison school first grade classes (as few as 8 in one analysis). And many Hispanic students are limited in English proficiency and did not take the CTBS (including one entire first grade class in a Milwaukee Public School). Similarly, many exceptional education students did not take the CTBS, or completed the test but had their scores invalidated. Whether a particular student took the CTBS, or had his or her scores invalidated, was left to the discretion of the teacher. Thus variables for exceptional education needs and limited English proficiency were not included in the regression models.