Table 5. Number of SAGE Classrooms by Type, Grade, and School Year

               Regular       2-Teacher     Floating      Shared        Split         3-Teacher
                             Team          Teacher       Space         Day           Team
               96–97  97–98  96–97  97–98  96–97  97–98  96–97  97–98  96–97  97–98  96–97  97–98
Kindergarten   50     89     24     22     3      2      2      4      0      0      1      0
Grade 1        61     84     18     23     7      2      8      8      2      0      0      1
Grade 2        NA     82     NA     21     NA     3      NA     6      NA     0      NA     1

Data Collection Instruments
To provide information about the processes and product of the SAGE program for 1996–97
and 1997–98, a number of instruments were used as part of the evaluation.1 A description of
the test and nontest instruments used in 1996–97 and 1997–98 follows. The data collection
instruments and the plan for their use throughout the evaluation are displayed in Tables 6 and 7.
1. Comprehensive Test of Basic Skills (CTBS). The Comprehensive Test of Basic Skills
(CTBS) Complete Battery, Terra Nova edition, Level 10, was administered to first-grade
students in SAGE schools and comparison schools in October 1996 and May
1997. In 1997–98, Level 10 was administered in October and Level 11 in May to first-grade
students, and Level 12 to second-grade students. The purpose of the first-grade
October administration of the CTBS was to obtain baseline measures of achievement
for SAGE schools and comparison schools. The complete battery includes subtests
in reading, language arts, and mathematics. The CTBS was chosen as an
achievement measure because it is derived from an Item Response Theory (IRT)
model that allows comparison of performance across time. Moreover, it is one of a
few instruments that attempt to minimize items biased against minorities and
educationally disadvantaged students. Kindergarten students were not tested because
of (1) concerns over the reliability and validity of standardized test results for
kindergarten-aged children and (2) the view expressed by many kindergarten teachers
that standardized tests would have a traumatizing effect on their students. The effects
of SAGE on kindergarten students will be determined when they are tested as first-grade
students the following year.

1 See the Evaluation Design Plan for the Student Achievement Guarantee in Education (SAGE) Program, August 13,
1996, for complete details.
Table 6. Cohort CTBS Testing by Grade Level, 1996–01

1996–97            1997–98            1998–99            1999–00     2000–01
K                  K                  K                  K           K
Cohort 1           Cohort 2           Cohort 3
1 (fall & spring)  1 (fall & spring)  1 (fall & spring)  1           1
                   2 (spring)         2 (spring)         2 (spring)  2
                                      3 (spring)         3 (spring)  3 (spring)

2. Student Profiles. This instrument, completed in October and May, provided
demographic and other data on each SAGE school and comparison school student.
3. Classroom Organization Profile. Completed in October, this instrument was used to
record how SAGE schools attained a 15:1 student-teacher ratio.
4. Principal Interviews. These end-of-year interviews elicited principals' descriptions
and perceptions of effects of their schools' rigorous curriculum, lighted-schoolhouse
activities, and staff development program, as well as an overall evaluation of the
SAGE program.
5. Teacher Questionnaire. Administered in May, this instrument obtained teachers'
descriptions and judgments of the effects of SAGE on teaching, curriculum, family
involvement, and professional development. It also was used to assess overall
satisfaction with SAGE.
6. Teacher Activity Log. This instrument required teachers to record classroom events
concerning time use, grouping, content, and student learning activities for a typical
day three times during the year.
7. Student Participation Questionnaire. In both October and May, teachers used this
instrument to assess each student's level of participation in classroom activities.
8. Classroom Observations. A group of first-grade and second-grade classrooms
representing the various types of 15:1 student-teacher ratios and a range of
geographic areas was selected for qualitative observations to provide descriptions of
classroom events.
9. Teacher Interviews. Although in-depth teacher interviews were not part of the
original SAGE evaluation design, they were added in 1997 because it became
apparent that teachers had important stories to tell about their SAGE classroom
experiences. The interviews dealt with teachers' perceptions of the effects of SAGE
on their teaching and on student learning.
Table 7. SAGE Non-Test Data Collection by Grade Level, 1996–01

Instrument               1996–97       1997–98       1998–99       1999–2000     2000–2001
(schedule)

Student Participation    K, 1          K, 1, 2       K, 1, 2, 3    K, 1, 2, 3    K, 1, 2, 3
Questionnaire
(Fall, Spring)

Teacher Questionnaire    K, 1          K, 1, 2       K, 1, 2, 3    K, 1, 2, 3    K, 1, 2, 3
(Spring)

Teacher Log              K, 1          K, 1, 2
(Fall, Winter, Spring)

Classroom Observation    1             1, 2
(Fall, Spring)           (Selected)    (Selected)

Teacher Interview        1             1, 2
(Spring)                 (Selected)    (Selected)

Principal Interview      K, 1          K, 1, 2
(Spring)

School Case Study                                    1, 2, 3       1, 2, 3       1, 2, 3
(Continuous)                                         (Selected)    (Selected)    (Selected)

Principal Questionnaire                              K, 1, 2, 3    K, 1, 2, 3    K, 1, 2, 3
(Spring)

ANALYSES OF STUDENT ACHIEVEMENT OUTCOMES 1997–98
Methods Introduction
Statistics Utilized
The 1997–98 SAGE evaluation design utilizes descriptive statistics and multivariate
inferential statistics, including linear regression and hierarchical linear modeling. Descriptive
statistics, including means and standard deviations, are incorporated into this report to provide a
less complicated, general analysis that the nontechnical reader can use as a basis for interpreting
the findings. Regression analyses (at the individual level), specifically ordinary least
squares regression models, are employed frequently in this 1997–98 report. Regression models
enable “control” variables to be entered in blocks, with the variable of interest (the
“SAGE/Comparison” variable) entered last, thus isolating its effect from those of the other variables.
Finally, hierarchical linear modeling is pertinent to the SAGE evaluation because this technique
focuses on the class-level effects of SAGE; that is, these analyses specifically assess classroom
effects rather than those of individuals within the classroom. The classroom effects examined by
this approach are of primary importance to the SAGE evaluation.
The 1996–97 Report
In its 1996–97 evaluation, the SAGE evaluation team also utilized descriptive statistics
and multivariate analyses, including linear regression and hierarchical linear modeling.
However, there are two essential differences between the 1997–98 quantitative evaluation and the
1996–97 quantitative evaluation. First, the 1996–97 report included national percentile scores as
well as normal curve equivalent scores. National percentile scores are not reported in the 1997–98
summary because the use of national percentile scores in regression analysis is potentially
misleading: the percentile scale does not have equal intervals. Instead, normal curve equivalents
are included in the descriptive sections of the current report to help clarify the analytical results.
Normal curve equivalents are not reported among the inferential analyses because the results of
such analyses would be redundant with those of the analyses utilizing the scale scores. Second,
sections of the 1996–97 report presented analyses that excluded the top-scoring
quartile because the posttest given to 1996–97 first graders proved to be too easy, which in
essence created a test ceiling effect for top-scoring students at this grade level. This
problem was corrected in the 1997–98 testing by using an appropriate posttest level. Because there
was no ceiling effect, these analyses are not needed in the current report.
General Findings 1996–97
Some general findings from the 1996–97 quantitative analysis show that first-grade
students in SAGE schools scored higher on the CTBS Complete Battery, Terra Nova Level 10,
than first-grade students in comparison schools. As a group, when adjusted for pretest scores,
SAGE students scored significantly higher on the posttest in the areas of reading, language arts,
and mathematics as well as total score. At the individual level of analysis, after controlling for
pretest score, SES, attendance, and race, SAGE first-grade students scored statistically
significantly higher than comparison school students on the CTBS posttest in the areas of
language arts and mathematics as well as total score. At the class level of analysis, SAGE
classrooms scored significantly higher in language arts, mathematics, and reading as well as total
score after adjusting for individual pretest results, SES, and attendance.
Score Metrics 1997–98
A brief discussion of the metrics reported in the 1997–98 SAGE evaluation is warranted.
The SAGE report presents the findings using two metrics, scaled scores and normal curve
equivalents. A scaled score provides a common yardstick by which scores on a specific task or
trait may be compared reasonably, subject to subject or group to group. The primary reason scaled
scores are used in the SAGE quantitative analysis is to anchor the scores from test level to test
level (Level 10, 11, etc.) so that year-to-year results can be compared.
When comparing scores to those of other individuals (or groups) to obtain meaning,
we make a norm-referenced interpretation, that is, we compare a person's score with those of
some relevant group of people. Normal curve equivalents are useful for this purpose.
The normal curve equivalent scale ranges from 1 to 99 and thus
provides a comparative index of the performance of an individual or group relative to the reference
group. In this case, the reference group is the Terra Nova norm reference group (for
norm-referencing population data, see CTB/McGraw-Hill, 1991). Normal curve equivalents are
generally not good indicators of longitudinal progress, however. With these scores, the group
average could remain at, for example, 50 across pretest and posttest, and the reader might erroneously
conclude that no gain was made. In fact, the group in this example gained as much as the
reference group did, and thus its score remained constant.
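
As an illustration of the metric only (not part of the SAGE analyses), the following minimal sketch shows the standard relationship between national percentile ranks and normal curve equivalents: an NCE is a normalized score with a mean of 50 and a standard deviation of about 21.06, chosen so that NCEs of 1, 50, and 99 coincide with percentile ranks of 1, 50, and 99. The function name is hypothetical.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # conventional standard deviation of the NCE scale


def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a national percentile rank (between 0 and 100, exclusive) to an NCE."""
    z = NormalDist().inv_cdf(percentile_rank / 100.0)  # normal deviate for the rank
    return NCE_MEAN + NCE_SD * z


# Equal steps in percentile rank are not equal steps in NCE, which is why
# percentile ranks are unsuitable as a regression outcome.
for rank in (1, 10, 50, 90, 99):
    print(rank, round(percentile_to_nce(rank), 1))
```

A group whose average NCE stays at 50 from pretest to posttest has kept pace with the norm group; its scale scores can still have risen.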
Structure of 1997–98 Report
The descriptive analyses utilize both scale scores and normal curve equivalents. The
inferential analyses (regressions and hierarchical linear models) utilize only scale scores. For the
inferential tests, a significance level of .05 was used and significant results are denoted by an
asterisk (*). SAGE versus comparison analyses are divided into two major sections: (1) First-Grade
Results and (2) Second-Grade Results. The following are delineated within each of these
sections: (1) descriptive statistics (pretest and posttest), (2) ordinary least squares regressions,
(3) analyses of the scores of African-American students, and (4) hierarchical linear modeling.
In addition, the quantitative section includes “within SAGE” analyses for first-grade
students. SAGE student achievement is examined in relation to teacher experience, student
participation, proximity to curriculum, and class organization.
SAGE School/Classroom vs. Comparison School/Classroom Analyses
First-Grade Results 1997–98
Descriptive Statistics
Valid Test Scores. The number of first-grade students for whom valid test scores are
available is substantially smaller than the total number of students. There are four main explanations
for this. First, the evaluation team presented schools with the option of allowing EEN and ESL
students to take the test, even though the test may be inappropriate for these students. These
scores were invalidated based on a “Nonvalid/Missing Test Report,” developed by the evaluation
team and completed for all first-grade classes. Second, given withdrawals and enrollments
during the school year, a number of students had valid pretest scores but no posttest scores, and
vice versa. Third, some students took the reading and language arts components of the CTBS, or
the mathematics component, but not both. Consequently, total scores are unavailable for these
students. Finally, some of the students did not complete the pretest, the posttest, or both the pre-
and posttests. The numbers of valid test scores for the 1997–98 school year are presented in
Table 8.
Table 8. Number of 1997–98 First-Grade Students with Valid Test Scores

                 Fall 1997 Pre-Test               Spring 1998 Post-Test
                 Total   SAGE    Comparison       Total   SAGE    Comparison
Reading          2246    1383    863              2162    1318    844
Language Arts    2245    1383    862              2163    1319    844
Mathematics      2239    1382    857              2175    1334    841
Total            2211    1367    844              2140    1310    829

Pre-Test (Baseline) Results. Table 9 provides descriptive statistics from the pretest
(baseline) results. Both scale scores and normal curve equivalents are presented. Given the
longitudinal nature of the SAGE evaluation, scale scores serve as the primary measure of student
achievement.
Table 9. Combined SAGE and Comparison Population Descriptive Statistics on CTBS PRETEST
Results for 1997–98 First-Grade Students

                 SCALE SCORES            NORMAL CURVE EQUIVALENTS
                 MEAN      STANDARD      MEAN      STANDARD
                           DEVIATION               DEVIATION
Reading          533.99    36.31         44.47     19.86
Language Arts    529.84    43.62         43.73     21.34
Mathematics      492.58    41.04         43.28     19.11
Total            519.20    34.59         43.31     19.11

Difference of Means Test. The results from difference of means tests between SAGE and
comparison student scale scores from the Fall 1997 CTBS Level 10 Pre-Test are reported in
Tables 10–13. Comparison school students scored slightly higher than SAGE school students on
the reading subtest, mathematics subtest, and total scale, and slightly lower on the language arts
subtest. However, none of these differences is statistically significant at the .05 level. We fail
to reject the null hypothesis that there is no difference between SAGE and comparison school
students on the pretest. Because SAGE and comparison students were essentially equal in
achievement at the beginning of the SAGE program, any differences in the posttest scores
favoring SAGE students may be attributed with more confidence to the student-teacher ratio of 15:1
in the SAGE classroom.
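
The report does not state which form of the difference of means test was used. As a hedged sketch, the code below assumes an unequal-variance (Welch) statistic with a large-sample normal approximation for the p-value, and applies it to the language arts pretest summary statistics from Table 10. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def difference_of_means(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, two-sided p) for group 1 minus group 2 from summary statistics.

    Assumption: Welch standard error; the p-value uses the normal
    approximation, which is reasonable for samples in the hundreds.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Language arts pretest scale scores (Table 10): SAGE versus comparison schools.
t_pre, p_pre = difference_of_means(1383, 530.50, 43.78, 862, 528.97, 43.39)
print(f"t = {t_pre:.2f}, p = {p_pre:.2f}")
```

Under these assumptions the pretest difference falls well short of the .05 threshold, in line with the text above.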
Table 10. Difference of Means Test on Language CTBS FALL PRETEST for 1997–98 First-Grade
Students

                              SCALE SCORES            NORMAL CURVE EQUIVALENTS
                      N       MEAN      STANDARD      MEAN      STANDARD
                                        DEVIATION               DEVIATION
Comparison Schools    862     528.97    43.39         43.25     21.13
SAGE Schools          1383    530.50    43.78         44.08     21.48
*Significant at .05 level

Table 11. Difference of Means Test on Reading CTBS FALL PRETEST for 1997–98 First-Grade
Students

                              SCALE SCORES            NORMAL CURVE EQUIVALENTS
                      N       MEAN      STANDARD      MEAN      STANDARD
                                        DEVIATION               DEVIATION
Comparison Schools    863     535.06    36.18         45.21     19.10
SAGE Schools          1383    533.35    36.43         44.02     20.33
*Significant at .05 level

Table 12. Difference of Means Test on Mathematics CTBS FALL PRETEST for 1997–98 First-Grade
Students

                              SCALE SCORES            NORMAL CURVE EQUIVALENTS
                      N       MEAN      STANDARD      MEAN      STANDARD
                                        DEVIATION               DEVIATION
Comparison Schools    857     493.02    38.38         43.36     18.15
SAGE Schools          1382    492.34    42.51         43.25     19.66
*Significant at .05 level

Table 13. Difference of Means Test on Total CTBS FALL PRETEST for 1997–98 First-Grade
Students

                              SCALE SCORES            NORMAL CURVE EQUIVALENTS
                      N       MEAN      STANDARD      MEAN      STANDARD
                                        DEVIATION               DEVIATION
Comparison Schools    844     519.51    33.35         43.47     18.34
SAGE Schools          1367    519.06    35.34         43.25     19.56
*Significant at .05 level

As noted above, student populations varied in SAGE and comparison schools due to
withdrawals and within-year enrollments. The posttest results are based only on those first-grade
students who remained in their schools for the entire 1997–98 school year. The CTBS allows
for measurement of performance over time, and therefore pretest and posttest scores are
comparable from a measurement standpoint. The CTBS Complete Battery, Terra Nova Level 10,
was administered to first-grade students in the fall, and the CTBS Complete Battery, Terra Nova
Level 11, was administered to first graders in the spring.
Results of the difference of means test between SAGE and comparison schools on the
CTBS Level 11 posttest are presented in Tables 14–17. Unlike the difference of means tests for
the CTBS Level 10 pretest, which showed no statistically significant differences between SAGE
and comparison students, statistically significant differences are found in favor of SAGE
students for each subtest and for total scale scores on the posttest.
Table 14. Difference of Means Test on Language CTBS SPRING POSTTEST for 1997–98
First-Grade Students

                              SCALE SCORES            NORMAL CURVE EQUIVALENTS
                      N       MEAN*     STANDARD      MEAN      STANDARD
                                        DEVIATION               DEVIATION
Comparison Schools    844     573.98    46.84         50.07     21.53
SAGE Schools          1319    586.02    45.33         55.78     21.17
*Significant at .05 level

Table 15. Difference of Means Test on Reading CTBS SPRING POSTTEST for 1997–98
First-Grade Students

                              SCALE SCORES            NORMAL CURVE EQUIVALENTS
                      N       MEAN*     STANDARD      MEAN      STANDARD
                                        DEVIATION               DEVIATION
Comparison Schools    844     570.80    45.52         47.81     21.87
SAGE Schools          1318    580.33    41.33         52.50     20.77
*Significant at .05 level

Table 16. Difference of Means Test on Mathematics CTBS SPRING POSTTEST for 1997–98
First-Grade Students

                              SCALE SCORES            NORMAL CURVE EQUIVALENTS
                      N       MEAN*     STANDARD      MEAN      STANDARD
                                        DEVIATION               DEVIATION
Comparison Schools    841     525.14    42.53         45.21     19.90
SAGE Schools          1334    538.63    40.09         51.72     19.24
*Significant at .05 level

Table 17. Difference of Means Test on Total CTBS SPRING POSTTEST for 1997–98 First-Grade
Students

                              SCALE SCORES            NORMAL CURVE EQUIVALENTS
                      N       MEAN*     STANDARD      MEAN      STANDARD
                                        DEVIATION               DEVIATION
Comparison Schools    829     556.87    38.83         47.54     21.01
SAGE Schools          1310    568.63    36.66         53.91     20.17
*Significant at .05 level

The largest gain in SAGE student scores from pretest to posttest, relative to
comparison school students, was on the mathematics subtest, as shown in Table 18. The smallest
relative gain for SAGE students from pretest to posttest was on the language arts subtest.
Table 18. Change in Mean Score from PRETEST to POSTTEST for 1997–98 First-Grade
Students

                 Scale Scores                          Normal Curve Equivalents
                 SAGE      Comparison  Gain            SAGE      Comparison  Gain
                 Gain      Gain        Difference      Gain      Gain        Difference
Language Arts    52.69     44.11       8.57*           10.33     6.40        3.93
Reading          45.32     34.99       10.33*          7.54      2.04        5.51
Mathematics      43.64     32.44       11.20*          7.30      1.91        5.39
Total            47.26     37.73       9.53*           9.36      4.11        5.25
*Significant at .05 level
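
The gain differences in Table 18 are the SAGE gain minus the comparison gain. The short sketch below recomputes them from the rounded scale-score gains in the table; because the published figures were presumably computed from unrounded means, a recomputed value can differ from the table in the last digit. The variable names are illustrative only.

```python
# Scale-score gains from Table 18: (SAGE gain, comparison gain).
gains = {
    "Language Arts": (52.69, 44.11),
    "Reading": (45.32, 34.99),
    "Mathematics": (43.64, 32.44),
    "Total": (47.26, 37.73),
}

# Gain difference = SAGE gain minus comparison gain.
gain_difference = {name: round(sage - comparison, 2)
                   for name, (sage, comparison) in gains.items()}

# Rank the three subtests only; "Total" is a composite of them.
subtests = {name: diff for name, diff in gain_difference.items() if name != "Total"}
largest = max(subtests, key=subtests.get)
smallest = min(subtests, key=subtests.get)

print(gain_difference)
print("largest relative gain:", largest)
print("smallest relative gain:", smallest)
```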
Regression Analysis
Regression Models. The effect of the SAGE program on student achievement, controlling
for other factors, was tested through a series of ordinary least squares regression models for each
subtest and for total scale scores. Control variables were entered into the models in blocks, with
the SAGE/comparison student variable entered into the models last.
The first block of control variables included student score on the pretest and school
attendance, measured as number of days absent, as reported by teachers in Spring 1998. The
second block of control variables included dummy variables for race/ethnicity, coded 1 if a
student was of a certain race/ethnicity and 0 if not. Dummy variables were included for African
Americans and whites. A residual category, “other,” is included in the constant term in the
regression equations. Eligibility for subsidized lunch, as an indicator of family income, is also
included in the second block of control variables. This variable is coded 0 if the student is ineligible,
1 if the student is eligible for reduced-price lunch, and 2 if the student is eligible for free lunch (this
variable is assumed to be interval level). In the third and final block, a dummy variable for SAGE or
comparison school student was entered. This variable is coded 0 if a student is
from a comparison school and 1 if a student is from a SAGE school.
Regression Results. Results of the regression analyses are presented in Tables 19–22.
For all analyses, membership in a SAGE school emerges as a significant predictor of student
achievement on the posttest, while controlling for pretest scores, family income, school
attendance, and race/ethnicity. The magnitude of the effect of SAGE on student achievement, as
denoted by the “b” coefficient, varies depending on the CTBS subtest.
The largest effects of SAGE are found on the language subtest, while the smallest
effects of SAGE are found on the reading subtest. When all cases are analyzed, the goodness of fit
of the models (as denoted by the adjusted R square statistic) ranges from .270 (reading subscale
score) to .550 (total scale score). This means that when predicting the reading score and
total score, the variables included in the model explain 27% and 55% of the variance,
respectively. Most of the variance in the posttest scores is, of course, explained by the pretest
scores.
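
For readers unfamiliar with the statistic, adjusted R square discounts the ordinary R square for the number of predictors in the model. The sketch below uses the standard formula; the sample size and predictor count in the example call are hypothetical and are not taken from Tables 19–22.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R square for a model with k predictors (excluding the
    constant) fitted to n cases: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: six predictors, 2,000 cases, R square of .552.
print(round(adjusted_r_squared(0.552, n=2000, k=6), 3))
```

With samples in the thousands and only a handful of predictors, the adjustment is small, so the adjusted and unadjusted statistics are nearly identical.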
Explained Variance in Achievement Scores. Attendance (as represented by “days
absent”) emerges as a consistent and statistically significant predictor of performance on all subtests
and total scale score. “Family Income” and “Race” show some relatively large effects (as
denoted by the b coefficients), but the effects are highly variable and are only sometimes
statistically significant (race is discussed further below). Membership in SAGE schools has a
consistently positive, statistically significant effect on achievement on the CTBS.
