PALS Guide: Rubrics and Scoring

Background Information about Rubrics:

Rubrics, scoring guidelines, and criteria are all terms that refer to the guides used to score performance assessments in a reliable, fair, and valid manner. We have selected the term "rubric" to refer to the scoring guidelines used on the PALS web site. When designing performance assessments, the selection of targets, description of the assessment tasks, and development of the rubric are all interrelated. Without a rubric, a performance assessment task becomes an instructional activity. Rubrics should include:

dimensions of key behaviors
examples of the behaviors
scales (i.e., checklists, numerical, or qualitative)
standards of excellence for specified performance levels

Clear dimensions of performance assessments define performance in terms of the behaviors that students will actually demonstrate and that judges will rate. For example, a dimension of performance can be stated as follows: "Students' capability of using measurement tools will be demonstrated by plotting the levels of two variables on a two-dimensional graph using a graphing calculator." Do not make broad statements, such as: "Students will show an understanding of graphing calculators." Dimensions can also be clarified by the use of questions. For example, fluency in writing can be assessed with questions such as: "Does the student use pre-writing strategies (e.g., drawing, listing, clustering)?" and "Does the student have spelling problems that block the flow of ideas?" These questions focus the teacher's or rater's attention on the dimensions of writing fluency to observe during the writing assessment: the use of pre-writing strategies such as drawing, listing, and clustering, and whether spelling difficulties interrupt the flow of ideas. The dimensions of performance should be defined so that scorers, teachers, students, and other stakeholders understand them in the same way. Some performance assessment tasks are complex and may require performances on several dimensions to determine whether students have acquired the desired content and skills.

Rubrics can use different types of scales to document student performance. For example, a simple checklist can document the presence or absence of a variety of behaviors. Numerical scales, such as those ranging from "1" to "10," can be used to assign ratings that differentiate among levels of performance. A third type of scale, qualitative, assigns words to levels of performance, such as "inadequate" for the lowest levels and "excellent" for the highest. A variety of descriptive terms can be used, depending on the content and skills being assessed. For example, a qualitative scale can rate the degree of organization in a student project (e.g., "well organized" to "disorganized") or its level of originality (e.g., "highly creative" to "little evidence of new or original thought").

For each level of performance specified in the rubric, specific behaviors and examples of performance should be provided. For example, the Electrical Circuits and Switches rubric has four levels of performance (criteria), ranging from "1" (minimal performance) to "4" (excellent performance), and it specifies the student behaviors needed to achieve each level. To attain Level 1 (Criterion 1), a student "provides a complete circuit." To attain Level 2, a student must "provide a complete working circuit and switch or provide a complete working circuit and modified switch or provide a complete working circuit and short circuiting switch." To attain Level 3, a student must make a "clear drawing of a modified switch (Switch or main parts must be labeled!)." To attain Level 4, a student must provide a "clear description of how a modified switch works." To attain Level 3 or 4, the student must also show that they accomplished Levels 1 and 2. Note that Level 2 incorporates Level 1, because a complete working circuit is also a complete circuit.
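The cumulative logic of the Electrical Circuits and Switches rubric can be expressed as a small decision procedure. The Python sketch below is hypothetical: the `CircuitWork` record, its field names, and the `score` function are illustrative assumptions, not part of PALS. It assumes a rater has already judged each requirement as met or not met, that Levels 3 and 4 each require Levels 1 and 2 (as stated above) but not each other, and that the student receives the highest level attained. A return value of 0 is an added convention for work that does not meet Level 1.

```python
from dataclasses import dataclass


@dataclass
class CircuitWork:
    """Hypothetical record of a rater's judgments about one student's work."""
    complete_circuit: bool   # Level 1: provides a complete circuit
    working_switch: bool     # Level 2: complete working circuit with a switch,
                             # modified switch, or short-circuiting switch
    labeled_drawing: bool    # Level 3: clear drawing of a modified switch, parts labeled
    clear_description: bool  # Level 4: clear description of how a modified switch works


def score(work: CircuitWork) -> int:
    """Return the highest level attained (assumed convention: 0 if Level 1 is not met)."""
    if not work.complete_circuit:
        return 0
    if not work.working_switch:
        return 1
    # Levels 1 and 2 are met, so Levels 3 and 4 are now reachable.
    if work.clear_description:
        return 4
    if work.labeled_drawing:
        return 3
    return 2


# Hypothetical student: working circuit with a modified switch and a labeled
# drawing, but no description of how the switch works.
print(score(CircuitWork(True, True, True, False)))  # 3
```

Because Levels 3 and 4 are checked only after Levels 1 and 2 pass, a drawing or description cannot raise the score of a student whose circuit does not work.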

Technically sound rubrics are:

Continuous: The change in quality from score point to score point must be "equal": the degree of difference between a 5 and 4 should be the same as between a 2 and 1.

Parallel: Similar language should be used to describe each level of performance (e.g., low skill level, moderate skill level, and high skill level), as opposed to non-parallel constructions (e.g., low skill level, understands how to perform some of the task, excellent performance).

Coherent: The rubric must focus on the same achievement target throughout, although each level of the rubric will specify different degrees of attainment of that target. For example, if the purpose of the performance assessment is to measure organization in writing, then each point on the rubric must be related to different degrees of organization, not factual accuracy or creativity.

Highly Descriptive: Highly descriptive evaluative language ("excellent," "poor") and comparative language ("better than," "worse than") should be used to clarify each level of performance, helping teachers and raters recognize the salient and distinctive features of each level. Such language also communicates performance expectations to students, parents, and other stakeholders.

Valid: The rubric permits valid inferences about performance to the degree that what is scored is central to performance, not what is merely easy to see or score, and not factors other than the achievements being measured. The proposed differences in levels of performance should a) reflect the key components of student performance, b) describe qualitative, not quantitative, differences in performance, and c) not confuse merely correlative behaviors with authentic indicators of achievement (e.g., the clarity and quality of the information presented should be a criterion in judging speaking effectiveness, not whether the speaker used note cards while speaking). Valid rubrics reduce the likelihood of biased judgments of students' work by focusing raters' attention on the achievement being measured rather than on students' gender, race, age, appearance, ethnic heritage, or prior academic record.

Reliable: In traditional assessments, such as multiple-choice tests, where a student selects a response from among several options, the reliability of the score has to do primarily with the stability of the test score from one testing occasion to another in the absence of intervening growth or instruction. Establishing the reliability of a rubric for a performance assessment, however, is more complex. A reliable performance assessment rubric enables:

Several judges rating a student's performance on a specific task to assign the same score or rating to the student's performance.
Each judge to rate the student's performance on a specific task at about the same level on several occasions in the absence of intervening growth or instruction.
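Both kinds of consistency can be checked with a simple index such as exact agreement: the proportion of performances that receive identical scores. The Python sketch below is a minimal illustration under our own assumptions: scores are integers on the same rubric scale, every set of ratings covers the same performances in the same order, and the function names and sample scores are hypothetical. Exact agreement is only one possible index; it does not correct for agreement that would occur by chance.

```python
from itertools import combinations


def exact_agreement(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Proportion of performances that received the same score in both ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and cover the same performances")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def mean_pairwise_agreement(ratings_by_judge: list[list[int]]) -> float:
    """Average exact agreement over every pair of judges (needs at least two judges)."""
    pairs = list(combinations(ratings_by_judge, 2))
    if not pairs:
        raise ValueError("at least two judges are required")
    return sum(exact_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores from three judges for the same four student performances.
judges = [
    [4, 3, 2, 4],
    [4, 3, 3, 4],
    [4, 2, 2, 4],
]
print(f"{mean_pairwise_agreement(judges):.2f}")  # 0.67
```

For consistency within one judge across occasions, the same `exact_agreement` function can compare that judge's ratings from the first occasion with the ratings from the second.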

Rubrics can be generic or task-specific. A generic rubric can be used for multiple tasks, while a task-specific rubric is only appropriate for a particular task. The scoring guidelines within generic or task-specific rubrics may be analytic or holistic. Holistic scoring is typically based on a four- to six-point scale indicating specified performance levels that reflect an overall impression of student work. In contrast, analytic scoring provides separate scores, usually on a four- to six-point scale, for multiple dimensions of each student’s work. Analytic scoring therefore allows for more specific and detailed feedback than holistic scoring. On the PALS Web site you can find examples of the following:

Holistic, Generic

See "Some of Its Parts"

Holistic, Task-specific

See "Electrical Circuits and Switches"

Analytic, Task-specific

See "Car Wash" (four- to six-point scale for each criterion)

See "Blue" (checklist