Tree Study
Technical Quality Information
Contributed by: New Standards (NS)

4th Grade 1997 Research Pilot

During Spring 1997, over 3,000 students from eight states participated in the New Standards Science Reference 4th grade exam pilot. Each question on the exam was designed to measure specific characteristics of a good science education as described in the New Standards Performance Standards. These standards have been organized into measurement clusters, each with specific characteristics, to ensure the development of a truly standards-referenced examination. Scoring guides, also focused on the standards, were developed for each question, enabling a detailed analysis of student strengths and weaknesses.

Special Studies

In addition, two special studies were conducted. The first study examined the impact of item placement and sequence. It showed that students seemed to perform better when a set consisted of conceptually linked constructed-response items than when a set contained scaffolded multiple-choice items followed by constructed-response items.

The other study was designed to examine the impact of a "good" science program. During the administration of the science examination, teachers were surveyed about the characteristics of their classes and the instructional strategies they used. Preliminary descriptive statistics were computed for these survey responses, and further investigations are planned.

Pilot Analysis

An analysis of the 4th grade pilot was conducted in June 1997. Approximately 1,500 student responses were used to evaluate the examination format and tasks. The analysis of the pilot data included:

  • tabulations of student responses by task, cluster, and examination totals in order to examine score distributions and the percentages of blank and off-topic papers;
  • factor analyses to examine the validity of the three standards clusters;
  • examination of the "read behind" scores to measure the consistency of readers in applying the scoring guides;
  • calculations of alpha reliability coefficients for the cluster and total scores of the exam components.
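
The last two checks in the list above can be illustrated with a short Python sketch. The source does not say which formulas or software were used, so this assumes the standard Cronbach's alpha formula and a simple exact-agreement rate between a first reader and a read-behind reader. The function names and the sample scores are hypothetical and are not pilot data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student score lists.

    scores[s][i] is the score of student s on task i (hypothetical layout).
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two tasks and two students")
    # Variance of each task's scores across students.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def exact_agreement(first_reads, read_behinds):
    """Share of papers on which both readers assigned the same score."""
    if len(first_reads) != len(read_behinds) or not first_reads:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(a == b for a, b in zip(first_reads, read_behinds))
    return matches / len(first_reads)


if __name__ == "__main__":
    # Illustrative data only: three students scored on three tasks.
    task_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
    print(cronbach_alpha(task_scores))

    # Illustrative data only: four papers scored by two readers.
    print(exact_agreement([3, 2, 4, 1], [3, 2, 3, 1]))
```

In the illustrative data the three tasks rank the students identically, so alpha comes out at about 1.0, and the two readers agree on three of four papers, giving 0.75. Real exam data would normally yield lower values for both.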


The results of the pilot revealed that the Reference Exam does, indeed, measure the New Standards Science Standards in a consistent and reliable way. The exam format, questions, and activities were engaging to students of varying backgrounds, language proficiency, and ability. The exam elicited a full range of student responses and performance.

In addition to providing information valuable to the exam-development process, the pilot yielded several "testing insights" that relate to the format and sequencing of exam items. Sample tasks and activities in this release package have been refined to reflect these insights. We have included the sample tasks and activities in the hope that they will prove informative to classroom teachers as well as policy makers.

Development Process:

Content Review: Content experts from AAAS and science teachers
Sensitivity Review: AAAS Standing Review Committee







Administration Information:

Demographics of Students Tested

Socio-economic status
Limited-English proficient


Number of Students Tested

For each task, approximately 100-500 students per grade level were tested.


Information is forthcoming in the Year 2 Report for the Technical Quality Review Panel.

Reliability and Additional Analyses:

Interrater Reliability: Completed by Technical Quality Review Panel (forthcoming)
Generalizability Studies: Completed by Technical Quality Review Panel (forthcoming)
Disaggregated by Subgroups: Completed by Technical Quality Review Board (forthcoming)
Difficulty Analyses (forthcoming)


©1997-2005 SRI International. All rights reserved.