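**Table 5.** Number of SAGE Classrooms by Type, Grade, and School Year

| Grade | Regular 96-97 | Regular 97-98 | 2-Teacher Team 96-97 | 2-Teacher Team 97-98 | Floating Teacher 96-97 | Floating Teacher 97-98 | Shared Space 96-97 | Shared Space 97-98 | Split Day 96-97 | Split Day 97-98 | 3-Teacher Team 96-97 | 3-Teacher Team 97-98 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kindergarten | 50 | 89 | 24 | 22 | 3 | 2 | 2 | 4 | 0 | 0 | 1 | 0 |
| Grade 1 | 61 | 84 | 18 | 23 | 7 | 2 | 8 | 8 | 2 | 0 | 0 | 1 |
| Grade 2 | NA | 82 | NA | 21 | NA | 3 | NA | 6 | NA | 0 | NA | 1 |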
**Data Collection Instruments**
To provide information about the processes and product of the SAGE program for 1996-97 and 1997-98, the evaluation used a number of instruments.¹ A description of the test and non-test instruments used in 1996-97 and 1997-98 follows. The data collection instruments and the plan for their use throughout the evaluation are displayed in Tables 6 and 7.
1. *Comprehensive Test of Basic Skills (CTBS)*. The CTBS Complete Battery, Terra Nova edition, Level 10, was administered to first-grade students in SAGE schools and comparison schools in October 1996 and May 1997. In 1997-98, Level 10 was administered to first-grade students in October and Level 11 in May, and Level 12 was administered to second-grade students. The purpose of the October first-grade administration of the CTBS was to obtain baseline measures of achievement for SAGE schools and comparison schools. The Complete Battery includes sub-tests in reading, language arts, and mathematics. The CTBS was chosen as an achievement measure because it is derived from an Item Response Theory (IRT) model that allows performance to be compared across time. Moreover, it is one of the few instruments that attempt to minimize items biased against minorities and educationally disadvantaged students. Kindergarten students were not tested because of (1) concerns over the reliability and validity of standardized test results for kindergarten-aged children and (2) the view, expressed by many kindergarten teachers, that standardized tests would have a traumatizing effect on their students. The effects of SAGE on kindergarten students will be determined when they are tested as first-grade students the following year.

¹ See the *Evaluation Design Plan for the Student Achievement Guarantee in Education (SAGE) Program*, August 13, 1996, for complete details.
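**Table 6.** Cohort CTBS Testing by Grade Level 1996-01

| Grade | 1996-97 | 1997-98 | 1998-99 | 1999-00 | 2000-01 |
|---|---|---|---|---|---|
| K | K | K | K | K | K |
| Cohort entering grade 1 | Cohort 1 | Cohort 2 | Cohort 3 | | |
| 1 | 1 (fall & spring) | 1 (fall & spring) | 1 (fall & spring) | 1 | 1 |
| 2 | | 2 (spring) | 2 (spring) | 2 (spring) | 2 |
| 3 | | | 3 (spring) | 3 (spring) | 3 (spring) |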
2. *Student Profiles*. Completed in October and May, this instrument provided demographic and other data on each SAGE school and comparison school student.
3. *Classroom Organization Profile*. Completed in October, this instrument was used to
record how SAGE schools attained a 15:1 student-teacher ratio.
4. *Principal Interviews*. These end-of-year interviews elicited principals' descriptions and perceptions of the effects of their schools' rigorous curriculum, lighted-schoolhouse activities, and staff development program, as well as their overall evaluation of the SAGE program.
5. *Teacher Questionnaire*. Administered in May, this instrument obtained teachers'
descriptions and judgments of the effects of SAGE on teaching, curriculum, family
involvement, and professional development. It also was used to assess overall
satisfaction with SAGE.
6. *Teacher Activity Log*. Three times during the year, teachers used this instrument to record classroom events concerning time use, grouping, content, and student learning activities for a typical day.
7. *Student Participation Questionnaire*. In both October and May, teachers used this
instrument to assess each student's level of participation in classroom activities.
8. *Classroom Observations*. A group of first-grade and second-grade classrooms, representing the various ways of attaining the 15:1 student-teacher ratio and a range of geographic areas, was selected for qualitative observations to provide descriptions of classroom events.
9. *Teacher Interviews*. Although in-depth teacher interviews were not part of the
original SAGE evaluation design, they were added in 1997 because it became
apparent that teachers had important stories to tell about their SAGE classroom
experiences. The interviews dealt with teachers' perceptions of the effects of SAGE
on their teaching and on student learning.
**Table 7.** SAGE Non-Test Data Collection by Grade Level, 1996-01

| Instrument (Timing) | 1996-97 | 1997-98 | 1998-99 | 1999-2000 | 2000-2001 |
|---|---|---|---|---|---|
| Student Participation Questionnaire (Fall, Spring) | K, 1 | K, 1, 2 | K, 1, 2, 3 | K, 1, 2, 3 | K, 1, 2, 3 |
| Teacher Questionnaire (Spring) | K, 1 | K, 1, 2 | K, 1, 2, 3 | K, 1, 2, 3 | K, 1, 2, 3 |
| Teacher Log (Fall, Winter, Spring) | K, 1 | K, 1, 2 | | | |
| Classroom Observation (Fall, Spring) | 1 (Selected) | 1, 2 (Selected) | | | |
| Teacher Interview (Spring) | 1 (Selected) | 1, 2 (Selected) | | | |
| Principal Interview (Spring) | K, 1 | K, 1, 2 | | | |
| School Case Study (Continuous) | | | 1, 2, 3 (Selected) | 1, 2, 3 (Selected) | 1, 2, 3 (Selected) |
| Principal Questionnaire (Spring) | | | K, 1, 2, 3 | K, 1, 2, 3 | K, 1, 2, 3 |
ANALYSES OF STUDENT ACHIEVEMENT OUTCOMES 1997-98
**Methods Introduction**
Statistics Utilized
The 1997-98 SAGE evaluation design utilizes descriptive statistics and multivariate
inferential statistics, including linear regression and hierarchical linear modeling. Descriptive
statistics, including means and standard deviations, are incorporated into this report to provide a simpler, general analysis that non-technical readers can use as a basis for interpreting the findings. Regression analyses at the individual level, specifically ordinary least squares regression models, are used frequently in this 1997-98 report. Regression models allow “control” variables to be entered in blocks before the variable of interest; here the “SAGE/Comparison” variable is entered last, isolating its effect from those of the other variables. Finally, hierarchical linear modeling is pertinent to the SAGE evaluation because it focuses on the classroom effects of SAGE; that is, these analyses specifically assess effects at the classroom level rather than effects on individual students within the classroom. The classroom effects examined by this approach are of primary importance to the SAGE evaluation.
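To show what a classroom-level analysis estimates, a generic two-level random-intercept model is written out below. This is a textbook form assumed for illustration, not the exact specification used in the SAGE analyses; the student-level covariates (pre-test score, SES, attendance) follow the class-level description of the 1996-97 findings, and all symbols are introduced here.

$$
\begin{aligned}
\text{Level 1 (student } i \text{ in class } j\text{):}\quad Y_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{Pre}_{ij} + \beta_{2j}\,\mathrm{SES}_{ij} + \beta_{3j}\,\mathrm{Absent}_{ij} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2 (class } j\text{):}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{SAGE}_j + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00}) \\
\beta_{qj} &= \gamma_{q0}, \qquad q = 1, 2, 3
\end{aligned}
$$

In this form, $\gamma_{01}$ is the estimated difference in adjusted classroom mean post-test scores between SAGE and comparison classrooms, and $u_{0j}$ captures classroom-to-classroom variation that an individual-level regression would fold into the student residual.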
The 1996-97 Report
In its 1996-97 evaluation, the SAGE evaluation team also utilized descriptive statistics
and multivariate analyses, including linear regression and hierarchical linear modeling.
However, there are two essential differences between the 1997-98 quantitative evaluation and the
1996-97 quantitative evaluation. First, the 1996-97 report included national percentile scores as
well as normal curve equivalent scores. National percentile scores are not reported in the 1997-98 summary because percentile ranks do not form an equal-interval scale, which makes their use in regression analysis potentially misleading. Instead, normal curve equivalents are included in the descriptive sections of the current report to help clarify the analytical results.
Normal curve equivalents are not reported among the inferential analyses because those results would be redundant with the analyses that use scale scores. Second, sections of the 1996-97 report presented analyses that excluded the top-scoring quartile, because the post-test given to 1996-97 first graders proved too easy and in essence created a ceiling effect for top-scoring students at this grade level. This problem was corrected in the 1997-98 testing by administering an appropriate post-test level (Level 11); because there was no ceiling effect, these analyses are not necessary.
General Findings 1996-97
Some general findings from the 1996-97 quantitative analysis show that first-grade students in SAGE schools scored higher on the CTBS Complete Battery, Terra Nova Level 10, than first-grade students in comparison schools. As a group, when adjusted for pre-test scores,
SAGE students scored significantly higher on the post-test in the areas of reading, language arts,
and mathematics as well as total score. At the individual level of analysis, after controlling for
pre-test score, SES, attendance, and race, SAGE first-grade students scored statistically
significantly higher than comparison school students on the CTBS post-test in the areas of
language arts and mathematics as well as total score. At the class level of analysis, SAGE
classrooms scored significantly higher in language arts, mathematics, and reading as well as total
score after adjusting for individual pre-test results, SES, and attendance.
Score Metrics 1997-98
A brief discussion of the metrics reported in the 1997-98 SAGE evaluation is warranted.
The SAGE report presents the findings using two metrics: scale scores and normal curve equivalents. A scale score provides a common yardstick by which performance on a specific task or trait may reasonably be compared, subject to subject or group to group. The primary reason scale scores are used in the SAGE quantitative analysis is to anchor the scores from test level to test level (Level 10, Level 11, etc.) so that year-to-year results can be compared.
When comparing the scores to those of other individuals (or groups) to obtain meaning,
we make a norm-referenced interpretation, and for this purpose normal curve equivalents are useful. A norm-referenced interpretation involves comparing a person’s score with those of some relevant group of people. The normal curve equivalent scale ranges from 1 to 99 and thus provides a comparative index of the performance of an individual or group relative to the reference group. In this case, the reference group is the Terra Nova norm group (for norm-referencing population data, see CTB/McGraw-Hill, 1991). Normal curve equivalents are generally not good indicators of longitudinal progress, however. For example, a group's average could remain at 50 from pre-test to post-test, and a reader might erroneously conclude that no gain was made. In fact, the group in this example did gain, but by no more than the reference group, so its relative standing, and therefore its score, remained constant.
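Normal curve equivalents are a normalized transformation of national percentile ranks, NCE = 50 + 21.06z, where z is the standard normal deviate for the percentile rank. The minimal sketch below (Python standard library only; the function name `percentile_to_nce` is hypothetical) converts percentile ranks to NCEs. It also illustrates why percentile ranks are not an equal-interval scale: the 10-point step from PR 50 to PR 60 spans about 5 NCE units, while the 9-point step from PR 90 to PR 99 spans about 22.

```python
from statistics import NormalDist

NCE_SD = 21.06  # the NCE scale has mean 50 and standard deviation 21.06

def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100)  # standard normal deviate
    return 50 + NCE_SD * z

# Equal percentile steps are unequal distances on the NCE (equal-interval) scale.
for pr in (1, 50, 60, 90, 99):
    print(f"PR {pr:>2} -> NCE {percentile_to_nce(pr):5.1f}")
```

By construction, NCEs of 1, 50, and 99 coincide with percentile ranks of 1, 50, and 99; between those points the two scales diverge.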
Structure of 1997-98 Report
The descriptive analyses utilize both scale scores and normal curve equivalents. The
inferential analyses (regressions and hierarchical linear models) utilize only scale scores. For the
inferential tests, a significance level of .05 was used and significant results are denoted by an
asterisk (*). SAGE versus comparison analyses are divided into two major sections: (1) First-
Grade Results and (2) Second-Grade Results. The following are delineated within each of these
sections: (1) descriptive statistics (pre-test and post-test), (2) ordinary least squares regressions,
(3) analyses of the scores of African-American students, and (4) hierarchical linear modeling.
In addition, the quantitative section includes “within SAGE” analyses for first-grade
students. SAGE student achievement is examined in relation to teacher experience, student
participation, proximity to curriculum, and class organization.
**SAGE School/Classroom vs. Comparison School/Classroom Analyses**
First-Grade Results 1997-98
Descriptive Statistics
Valid Test Scores. The number of first-grade students for whom valid test scores are available is substantially less than the total number of students. There are four main explanations for this. First, the evaluation team gave schools the option of allowing EEN and ESL students to take the test, even though the test may be inappropriate for these students. These scores were invalidated based on a “Nonvalid/Missing Test Report,” developed by the evaluation team and completed for all first-grade classes. Second, because of withdrawals and enrollments during the school year, a number of students had valid pre-test scores but no post-test scores, and vice versa. Third, some students took the reading and language arts components of the CTBS, or the mathematics component, but not both; consequently, total scores are unavailable for these students. Finally, some students did not complete the pre-test, the post-test, or both. The numbers of valid test scores for the 1997-98 school year are presented in Table 8.
**Table 8.** Number of 1997-98 First-Grade Students with Valid Test Scores

| Sub-test | Fall 1997 Pre-Test: Total | Pre-Test: SAGE | Pre-Test: Comparison | Spring 1998 Post-Test: Total | Post-Test: SAGE | Post-Test: Comparison |
|---|---|---|---|---|---|---|
| Reading | 2246 | 1383 | 863 | 2162 | 1318 | 844 |
| Language Arts | 2245 | 1383 | 862 | 2163 | 1319 | 844 |
| Mathematics | 2239 | 1382 | 857 | 2175 | 1334 | 841 |
| Total | 2211 | 1367 | 844 | 2140 | 1310 | 829 |
Pre-Test (Baseline) Results. Table 9 provides descriptive statistics from the pre-test (baseline) results. Both scale scores and normal curve equivalents are presented. Given the longitudinal nature of the SAGE evaluation, scale scores serve as the primary measure of student achievement.
**Table 9.** Combined SAGE and Comparison Population Descriptive Statistics on CTBS PRE-TEST Results for 1997-98 First-Grade Students

| Sub-test | Scale Score Mean | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|
| Reading | 533.99 | 36.31 | 44.47 | 19.86 |
| Language Arts | 529.84 | 43.62 | 43.73 | 21.34 |
| Mathematics | 492.58 | 41.04 | 43.28 | 19.11 |
| Total | 519.20 | 34.59 | 43.31 | 19.11 |
Difference of Means Test. The results from difference of means tests between SAGE and
comparison student scale scores from the Fall 1997 CTBS Level 10 Pre-Test are reported in
Tables 10-13. Comparison school students scored slightly higher than SAGE school students on
the reading sub-test, mathematics sub-test, and total scale, and slightly lower on the language arts
sub-test. However, none of these differences is statistically significant at the .05 level. We fail
to reject the null hypothesis that there is no difference between SAGE and comparison school
students on the pre-test. Because SAGE and comparison students were essentially equal in achievement at the beginning of the SAGE program, any post-test differences favoring SAGE students can be attributed with more confidence to the 15:1 student-teacher ratio in SAGE classrooms.
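Tables 10-17 report each group's N, mean, and standard deviation but not the test statistic itself, and the report does not say which form of the difference-of-means test was used. As one plausible choice (an assumption, not the evaluation's documented method), the sketch below computes Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom from those published summary values. Function names are hypothetical, and because the published means and standard deviations are rounded, the recomputed statistics are approximate. For the reading pre-test (Table 11) it gives t ≈ -1.09, consistent with the reported non-significance; for the language arts post-test (Table 14) it gives t ≈ 5.9.

```python
from math import sqrt
from statistics import NormalDist

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t (group 2 minus group 1) and Welch-Satterthwaite df from summary stats."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each mean
    t = (mean2 - mean1) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def two_sided_p_normal(t):
    # With well over 1,000 degrees of freedom the t distribution is very close to
    # the standard normal, so a normal approximation is used for the p-value.
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Group 1 = comparison schools, group 2 = SAGE schools (positive t favors SAGE).
t_pre, df_pre = welch_t(863, 535.06, 36.18, 1383, 533.35, 36.43)    # Table 11, reading pre-test
t_post, df_post = welch_t(844, 573.98, 46.84, 1319, 586.02, 45.33)  # Table 14, language arts post-test
print(f"reading pre-test:        t = {t_pre:.2f}, df = {df_pre:.0f}, p ~ {two_sided_p_normal(t_pre):.3f}")
print(f"language arts post-test: t = {t_post:.2f}, df = {df_post:.0f}, p ~ {two_sided_p_normal(t_post):.2g}")
```

A pooled-variance t-test would give very similar results here, since the two groups' standard deviations are close and both samples are large.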
**Table 10.** Difference of Means Test on Language Arts CTBS FALL PRE-TEST for 1997-98 First-Grade Students

| | N | Scale Score Mean | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|---|
| Comparison Schools | 862 | 528.97 | 43.39 | 43.25 | 21.13 |
| SAGE Schools | 1383 | 530.50 | 43.78 | 44.08 | 21.48 |

*Significant at .05 level
**Table 11.** Difference of Means Test on Reading CTBS FALL PRE-TEST for 1997-98 First-Grade Students

| | N | Scale Score Mean | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|---|
| Comparison Schools | 863 | 535.06 | 36.18 | 45.21 | 19.10 |
| SAGE Schools | 1383 | 533.35 | 36.43 | 44.02 | 20.33 |

*Significant at .05 level
**Table 12.** Difference of Means Test on Mathematics CTBS FALL PRE-TEST for 1997-98 First-Grade Students

| | N | Scale Score Mean | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|---|
| Comparison Schools | 857 | 493.02 | 38.38 | 43.36 | 18.15 |
| SAGE Schools | 1382 | 492.34 | 42.51 | 43.25 | 19.66 |

*Significant at .05 level
**Table 13.** Difference of Means Test on Total CTBS FALL PRE-TEST for 1997-98 First-Grade Students

| | N | Scale Score Mean | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|---|
| Comparison Schools | 844 | 519.51 | 33.35 | 43.47 | 18.34 |
| SAGE Schools | 1367 | 519.06 | 35.34 | 43.25 | 19.56 |

*Significant at .05 level
As noted above, student populations in SAGE and comparison schools varied because of withdrawals and within-year enrollments. The post-test results are based only on those first-grade students who remained in their schools for the entire 1997-98 school year. Because the CTBS allows performance to be measured over time, pre-test and post-test scale scores are comparable from a measurement standpoint even though different test levels were used: the CTBS Complete Battery, Terra Nova Level 10, was administered to first-grade students in the fall, and the CTBS Complete Battery, Terra Nova Level 11, was administered to first graders in the spring.
Results of the difference of means tests between SAGE and comparison schools on the CTBS Level 11 post-test are presented in Tables 14-17. Unlike the Level 10 pre-test, on which no statistically significant differences were found between SAGE and comparison students, the post-test shows statistically significant differences in favor of SAGE students on each sub-test and on the total scale score.
**Table 14.** Difference of Means Test on Language Arts CTBS SPRING POST-TEST for 1997-98 First-Grade Students

| | N | Scale Score Mean* | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|---|
| Comparison Schools | 844 | 573.98 | 46.84 | 50.07 | 21.53 |
| SAGE Schools | 1319 | 586.02 | 45.33 | 55.78 | 21.17 |

*Significant at .05 level
**Table 15.** Difference of Means Test on Reading CTBS SPRING POST-TEST for 1997-98 First-Grade Students

| | N | Scale Score Mean* | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|---|
| Comparison Schools | 844 | 570.80 | 45.52 | 47.81 | 21.87 |
| SAGE Schools | 1318 | 580.33 | 41.33 | 52.50 | 20.77 |

*Significant at .05 level
**Table 16.** Difference of Means Test on Mathematics CTBS SPRING POST-TEST for 1997-98 First-Grade Students

| | N | Scale Score Mean* | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|---|
| Comparison Schools | 841 | 525.14 | 42.53 | 45.21 | 19.90 |
| SAGE Schools | 1334 | 538.63 | 40.09 | 51.72 | 19.24 |

*Significant at .05 level
**Table 17.** Difference of Means Test on Total CTBS SPRING POST-TEST for 1997-98 First-Grade Students

| | N | Scale Score Mean* | Scale Score Std. Dev. | NCE Mean | NCE Std. Dev. |
|---|---|---|---|---|---|
| Comparison Schools | 829 | 556.87 | 38.83 | 47.54 | 21.01 |
| SAGE Schools | 1310 | 568.63 | 36.66 | 53.91 | 20.17 |

*Significant at .05 level
As Table 18 shows, relative to comparison school students, SAGE students' largest scale-score gain from pre-test to post-test was on the mathematics sub-test, and their smallest relative gain was on the language arts sub-test.
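The ranking of relative gains can be checked directly from the scale-score gains published in Table 18. The sketch below (names hypothetical) takes each "Gain Difference" to be the SAGE gain minus the comparison gain. Recomputing from the rounded gains gives 8.58 for language arts where the table shows 8.57, a difference attributable to rounding of the underlying unrounded gains.

```python
# Scale-score gains from Table 18: sub-test -> (SAGE gain, comparison gain)
SCALE_GAINS = {
    "Language Arts": (52.69, 44.11),
    "Reading": (45.32, 34.99),
    "Mathematics": (43.64, 32.44),
    "Total": (47.26, 37.73),
}

def relative_gains(gains):
    """SAGE gain minus comparison gain for each sub-test (Table 18's 'Gain Difference')."""
    return {name: round(sage - comp, 2) for name, (sage, comp) in gains.items()}

diffs = relative_gains(SCALE_GAINS)
subtests = {name: d for name, d in diffs.items() if name != "Total"}  # rank sub-tests only
print(diffs)
print("largest:", max(subtests, key=subtests.get), "| smallest:", min(subtests, key=subtests.get))
```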
**Table 18.** Change in Mean Score from PRE-TEST to POST-TEST for 1997-98 First-Grade Students

| Sub-test | Scale: SAGE Gain | Scale: Comparison Gain | Scale: Gain Difference | NCE: SAGE Gain | NCE: Comparison Gain | NCE: Gain Difference |
|---|---|---|---|---|---|---|
| Language Arts | 52.69 | 44.11 | 8.57* | 10.33 | 6.40 | 3.93 |
| Reading | 45.32 | 34.99 | 10.33* | 7.54 | 2.04 | 5.51 |
| Mathematics | 43.64 | 32.44 | 11.20* | 7.30 | 1.91 | 5.39 |
| Total | 47.26 | 37.73 | 9.53* | 9.36 | 4.11 | 5.25 |

*significant at .05 level
Regression Analysis
Regression Models. The effect of the SAGE program on student achievement, controlling
for other factors, was tested through a series of ordinary least squares regression models for each
sub-test and for total scale scores. Control variables were entered into the models in blocks, with
the SAGE/comparison student variable entered into the models last.
The first block of control variables included student score on the pre-test and school
attendance, measured as number of days absent, as reported by teachers in Spring 1998. The
second block of control variables included dummy variables for race/ethnicity, coded 1 if a student was of a certain race/ethnicity and 0 if not. Dummy variables were included for African Americans and whites; the residual category, “other,” serves as the reference group and is included in the constant term of the regression equations. Eligibility for subsidized lunch, as an indicator of family income, is also included in the second block of control variables. This variable is coded 0 if the student is ineligible, 1 if the student is eligible for reduced-price lunch, and 2 if the student is eligible for free lunch (this variable is assumed to be interval level). In the third and final block, a dummy variable for SAGE or comparison school membership was entered. This variable is coded 0 if a student is from a comparison school and 1 if a student is from a SAGE school.
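To make the blockwise procedure concrete, the sketch below fits nested ordinary least squares models in pure Python, adding the three blocks in the order described above and reporting R² after each block. The variable names, the simulated data, and its coefficients are all hypothetical; this is not the evaluation team's code or data. Lunch eligibility enters as a single 0/1/2 predictor (the interval-level assumption), race enters as two dummies with "other" as the omitted reference group, and the SAGE coefficient is the last entry of the final model's coefficient vector.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X)b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def r_squared(X, y, b):
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical variable names for the three blocks described in the text.
BLOCKS = [
    ["pretest", "days_absent"],              # block 1
    ["african_american", "white", "lunch"],  # block 2; "other" is the omitted group
    ["sage"],                                # block 3: 1 = SAGE school, 0 = comparison
]

def fit_blocks(students, y):
    """Fit nested models, adding one block at a time; return (predictors, b, R^2)."""
    results, predictors = [], []
    for block in BLOCKS:
        predictors = predictors + block
        X = [[1.0] + [float(s[p]) for p in predictors] for s in students]  # 1.0 = constant
        b = ols(X, y)
        results.append((predictors, b, r_squared(X, y, b)))
    return results

def simulate(n, seed=0, noise_sd=10.0):
    """Invented data-generating process, for illustration only."""
    rng = random.Random(seed)
    students, y = [], []
    for _ in range(n):
        race = rng.choice(["african_american", "white", "other"])
        s = {
            "pretest": rng.gauss(520, 35),
            "days_absent": rng.randint(0, 20),
            "african_american": int(race == "african_american"),
            "white": int(race == "white"),
            "lunch": rng.choice([0, 1, 2]),  # 0 ineligible, 1 reduced price, 2 free
            "sage": rng.randint(0, 1),
        }
        post = (100 + 0.85 * s["pretest"] - 0.5 * s["days_absent"]
                - 3.0 * s["lunch"] + 10.0 * s["sage"] + rng.gauss(0, noise_sd))
        students.append(s)
        y.append(post)
    return students, y

students, y = simulate(500, seed=1)
results = fit_blocks(students, y)
for predictors, b, r2 in results:
    print(f"{len(predictors)} predictors: R^2 = {r2:.3f}")
print(f"SAGE coefficient (b): {results[-1][1][-1]:.2f}")
```

In practice a statistical package, or a QR-based least-squares solver, is preferable to forming the normal equations, which are less numerically stable; the pure-Python version only keeps the example dependency-free.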
Regression Results. Results of the regression analyses are presented in Tables 19-22.
For all analyses, membership in a SAGE school emerges as a significant predictor of student
achievement on the post-test, while controlling for pre-test scores, family income, school
attendance, and race/ethnicity. The magnitude of the effect of SAGE on student achievement, as
denoted by the “b” coefficient, varies depending on the CTBS sub-test.
The largest effects of SAGE are found on the language sub-test, while the smallest effects of SAGE are found on the reading sub-test. When all cases are analyzed, the goodness-of-fit of the models (as denoted by the adjusted R-square statistic) ranges from .270 (reading sub-scale score) to .550 (total scale score). This means that when predicting the reading score and the total score, the variables included in the model explain 27% and 55% of the variance, respectively. Most of the variance in the post-test scores is, of course, explained by the pre-test scores.
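Adjusted R² penalizes R² for the number of predictors k relative to the number of cases n: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). With the six predictors in the final models (pre-test, days absent, the two race/ethnicity dummies, lunch eligibility, and SAGE membership) and samples in the low thousands, the penalty factor is tiny; for a hypothetical n = 2,000 it is 1999/1993 ≈ 1.003, so an unadjusted R² of .550 would shrink only to about .549. The helper below is a minimal sketch with a hypothetical name.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for an OLS model with n cases and k predictors (plus an intercept)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values: unadjusted R^2 of .55, 2,000 students, 6 predictors.
print(round(adjusted_r2(0.55, 2000, 6), 4))
```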
Explained Variance in Achievement Scores. Attendance (as represented by “days
absent”) emerges as a consistent and statistically significant predictor of performance on all sub-tests
and on the total scale score. “Family Income” and “Race” show some relatively large effects (as
denoted by the b coefficients), but the effects are highly variable and are only sometimes
statistically significant (race is discussed further below). Membership in SAGE schools has a
consistently positive, statistically significant effect on achievement on the CTBS.