[Presented at the annual meeting of the American Educational Research Association in San Diego, CA, April 1998.]
Introduction
Understanding the central role that performance assessment plays in standards-based reform, educators are seeking ways to use these assessments to test student learning. Education agencies need pools of performance tasks to use as components in their student assessment programs and in evaluations of state and federally funded programs (McLaughlin & Shepard, 1995). Reform projects need standards-based assessment, too, as do teachers who are trying to implement reforms. Experience indicates, however, that the level of effort and cost of developing performance assessment tasks, scoring rubrics, and rater training materials, and then ensuring their technical quality, is very high (CPRE, 1995; Quellmalz, 1984). Innovative approaches are needed for sharing exemplary assessment resources, collaborating on the development of new ones, and understanding how standards-based performance assessment can advance educational reform at all levels of the educational system. To meet this need, SRI International and the Council of Chief State School Officers (CCSSO) are developing Performance Assessment Links in Science (PALS), a greatly needed, specialized type of electronic library: an on-line, standards-based, interactive resource bank of science performance assessments. We are also studying models of effective use of these resources.
In this paper, we will discuss the role performance assessments
can play in education reforms, suggest how technology expands
the ways in which educators can access and use standards-based
assessments and engage in collaborative professional development,
describe the features of the on-line assessment resource bank,
describe the results of prototype testing of the system, and raise
some issues involved in developing and expanding an on-line assessment
resource library.
The Role of Performance Assessment in School Reform
Reform-minded programs have set out to develop new, alternative
forms of student assessment that call for students to construct
rather than select responses. Performance assessments are generally
valued for testing students' deep understanding of subject-matter
concepts and inquiry strategies, for making students' thinking
visible, and for measuring their skills in communicating about
their knowledge. In addition, performance assessments can present
authentic, real-world problems that help students to show they
can apply academic knowledge to practical situations. On the other
hand, performance assessments are time-consuming and costly to
develop, logistically demanding, and of questionable utility if
not developed and scored according to sound measurement methods.
The Role of Technology in Assessment Reform
Technology offers a powerful strategy for increasing the ease
with which educators can access and use standards-based student
assessment. Technology can be used to efficiently archive numerous
assessments for ready browsing. Currently, some sets of performance
assessment tasks are distributed on CD-ROM, and there are plans
to place released reference test items on the Internet, although
these resources are not yet coordinated or easily accessible to
assessment programs and schools. Electronic networks can go beyond
storage by supporting the growth of a community of colleagues
and leveraging expertise, regardless of geographic location and
institutional base. Networks can offer templates and guidelines
and support collaborative development and on-line conversations
about the alignment of tasks with standards, quality of tasks
and student work, on-line training and scoring, and on-line standards-setting
sessions. Technology can also support simulated investigations
and data collection and analysis of student responses. Rapid
advances in technology promise to revolutionize
assessment practices and education reform (Kozma & Schank,
1998; Quellmalz, in press).
The Role of Technology in Professional Development
New professional development models are needed to provide teachers
with greater opportunity to access, discuss, incorporate, and
co-construct assessment resources and other reform-based materials
(Little, 1993). Current efforts find it difficult to reach many
teachers and to maintain discourse, and teachers have difficulty
developing and testing new ideas and adjusting their strategies
back at their schools (Carey & Frechtling, 1997; Corcoran,
1995). Participation in a professional community is a valuable
component of teacher professional development (Lieberman, 1996),
yet, unlike professionals in other fields who routinely collaborate
with fellow practitioners, teachers are isolated from other adults
all day. Technology can help provide mechanisms for teachers to
overcome their isolation and make more effective use of the time
they spend on their own professional growth. A few projects are
beginning to offer the kinds of tools, communication channels,
and contextual supports central to professional development and
successful collaborative work (Lave & Wenger, 1991; Levin,
Waugh, Brown, & Clift, 1994; Schlager & Schank, 1997).
For example, TAPPED IN (Schlager & Schank, 1997; www.tappedin.org)
supports a community of over 1,000 education professionals who,
on any given day, can attend live activities hosted by various
professional development organizations, jointly browse and contribute
resources, or expand their circle of colleagues by participating
in real-time or bulletin-board discussion groups. Such research,
along with our pilot study results, supports our plans for integrating
more collaboration support into PALS.
Performance Assessment Links in Science (PALS)
PALS will provide an on-line assessment resource library designed
to serve two purposes and user groups: (1) the accountability
requirements of state education agencies and specially funded
programs, and (2) the professional development needs of classroom
teachers. SRI International and CCSSO are building on a 1-year
NSF planning grant as we embark on a 3-year implementation
project with two primary goals:
(1) To develop a two-tiered on-line performance assessment resource
library composed of performance assessment tasks for elementary,
middle, and secondary levels from multiple sources, such as state
assessment programs and collaboratives. One tier will be a password-protected,
secure Accountability Pool of science performance assessments
for use by state assessment programs and systemic reform programs
(e.g., Systemic Initiatives). The second tier will be for use
by teachers and professional development organizations. The Professional
Development Pool will provide performance-based science assessments
that have been used successfully in large-scale (state or national)
assessment programs and have been released.
(2) To identify, study, and evaluate the effectiveness of policies,
implementation models, and technical quality requirements for
the use of the two tiers of PALS. Policy issues may address questions
about the appropriate screening criteria for including performance
assessment resources in the pools, procedures for access and use
of the resources, and cost-effective alternatives for operating,
sustaining, and scaling up the numbers of resources and users.
Figure 1 portrays our vision of how PALS can support the accessibility and use of science performance assessment resources. In our design for PALS, assessment programs such as reference exams (e.g., TIMSS, the National Assessment of Educational Progress, the New Standards Project), state and other mandated testing programs, and districts can contribute standards-based science assessment tasks with documented technical quality to the PALS on-line resource library. The Accountability Pool will be composed of password-protected, secure tasks accessible only to approved assessment program staff. Assessment programs can thus share their resources and enjoy a much larger pool of science performance assessments to use or adapt for their testing administrations. The PALS resource library can provide large, continually updated collections that can support efficient search, selection, and printing. On-line rater training and scoring can vastly reduce assessment costs.
The on-line Professional Development Pool contains resources
that are of documented technical quality and have been released
for access by teachers and professional development groups. Preservice
and in-service programs, for example, can reach teachers in geographically
distributed and remote locales, resulting in great savings of
travel and materials expenses. On-line guidelines and templates
can support classroom use of science performance assessments.
Teachers may administer the science performance tasks as part
of their classroom assessments, adapt them, or use them as models
for developing similar investigations. They also can engage in
on-line rating sessions and conversations about how their students'
work meets local and national science standards.
PALS System Components. The nature of performance
assessments requires a highly sophisticated and versatile system
for storing, retrieving, and using or adapting them. Because the
components of performance assessments are both more numerous and
more complex than those of multiple-choice items, the performance
assessment electronic library must include more features than
a simple, static bank of multiple-choice items.
Standards measured. Each science performance assessment task in the PALS prototype Accountability Pool is linked to both the SCASS science framework and the National Science Education Standards the task is intended to measure. Since performance tasks tend to be complex and are typically designed to measure multiple outcomes, the system indexes tasks to multiple standards. We will ask each source (state or consortium) submitting science performance assessments to use a Standards Coding Format SRI is developing to identify the standards each science performance task is intended to test and the related National Science Education Standards.
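To make the coding concrete, the sketch below (in Python, with hypothetical task and standard identifiers; this is an illustration, not the actual Standards Coding Format) shows one way a many-to-many index between tasks and standards might be represented and queried. Because each task can be coded against several standards, a search on any one standard retrieves every task linked to it.

    # Minimal sketch (all identifiers hypothetical): a many-to-many index
    # linking performance tasks to the standards they are intended to measure.

    from collections import defaultdict

    # A Standards Coding Format entry might reduce to
    # (task_id, framework, standard_id) triples.
    CODINGS = [
        ("task-08-density", "NSES", "B.1"),   # physical science standard
        ("task-08-density", "SCASS", "2.3"),  # hypothetical SCASS framework code
        ("task-08-ecology", "NSES", "C.4"),   # life science standard
        ("task-08-ecology", "NSES", "A.1"),   # science as inquiry
    ]

    def build_index(codings):
        """Index standards -> tasks and tasks -> standards for fast lookup."""
        by_standard = defaultdict(set)
        by_task = defaultdict(set)
        for task_id, framework, standard_id in codings:
            key = (framework, standard_id)
            by_standard[key].add(task_id)
            by_task[task_id].add(key)
        return by_standard, by_task

    by_standard, by_task = build_index(CODINGS)
    print(by_standard[("NSES", "C.4")])  # tasks measuring NSES standard C.4
    print(by_task["task-08-density"])    # all standards coded for one task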
Tasks/Events. The PALS prototype includes 10 Grade 8 performance events. In the implementation project, we will include performance assessment resources for the elementary, middle, and secondary levels developed by multiple sources (e.g., state assessment programs, assessment collaboratives, and reference programs), so that users will have access to a variety of assessment approaches and task formats. Some of the assessment resources will be "balanced" forms that present a linked set of test response modes, e.g., hands-on investigations, related multiple-choice questions, short-answer formats, and extended writing.
Assessment planning charts. To help users design one or more administration forms that cover important science standards, the on-line system will provide assessment planning charts (Stiggins, 1994; Stiggins, Rubel, & Quellmalz, 1986). The PALS prototype automatically produces an assessment planning chart to display coverage of standards by the performance tasks tentatively selected by state assessment staff.
Rubrics and student exemplars. The science performance assessment resource library includes the scoring rubrics designed to judge the quality of student responses to a task or set of tasks. To bring meaning to the scoring rubrics, the library will include exemplars spanning the range of scored student work; these serve the same purpose as the samplers of scored, annotated student work that state assessment programs typically publish for schools. The PALS prototype includes the rubrics and student work developed and used by the SCASS project.
Training and scoring packets. Rater training and scoring procedures and materials are not routinely published by assessment programs. The PALS prototype pilot-tested on-line rater training and scoring for two of the Grade 8 SCASS tasks. The Rater Training Packet procedures for establishing and maintaining rater agreement follow proven methodologies (Herman, Aschbacher, & Winters, 1992; Quellmalz, 1984). We are developing specifications for preparing on-line rater training and scoring materials so that each agency wishing to take advantage of the on-line training capability of the PALS system can convert stand-up training instruction and explanations to written form, assemble training packets, and test their effectiveness.
Technical quality reports. An essential component of the PALS on-line pools is the documentation
of the technical quality of each task in the library. The PALS
prototype contains the technical quality indicators provided to
SCASS partner states for the SCASS field-tested science tasks.
Since the resource library will be stocked only with performance
tasks that have survived a systematic test development process,
the tasks and rubrics will also have been subjected to content
validity and bias reviews. One task of the PALS Steering Committee
is to establish the technical quality procedures and statistics
that must be documented for tasks to be placed in either of the
pools.
The 10 performance events and 2
training packets in the PALS prototype are implemented in over
2,000 HTML, image, database, and program files, and the implementation
project will be at least an order of magnitude larger. Designing
such a large-scale site requires thoughtful planning and refinement.
The organizational framework, underlying database, screen interactions,
and specific user interface components were informed by current
research on information architecture and Web interface and site
design (Pirolli, Schank, Hearst, & Diehl, 1996; Rosenfeld
& Morville, 1998; Sano, 1996) and tested via storyboards and
a pilot study of the prototype. Figure 2 shows the opening screen
of the PALS Web site.
Approximately 100 NSES and SCASS
standards are implemented and mapped to each other and to the performance
events. From the NSES and SCASS search pages (see Figure 3), users
can view an assessment chart by selecting standards of interest
and pressing the "Show Assessment Chart" button. The system
then queries the database to find the tasks relevant to the standards,
and generates a chart of all tasks that are intended to meet the
selected standards. The number in each cell indicates how well
a given task meets a given standard. Below the chart, the tasks
are listed again, and users can select a subset of the tasks from
this list to narrow the chart (see Figure 4). Users can also access
tasks and generate assessment charts via the Tasks search page.
Here users can read brief task descriptions, follow a link to
a particular task, or select tasks of interest and display an
assessment chart of the standards that the task intends to meet.
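The sketch below (with invented tasks, standards, and ratings; not the production database) illustrates the chart-generation step: given the selected standards and tasks, look up the alignment strength for each cell and render one row per task, with a blank cell where a task does not address a standard.

    # Illustrative sketch (hypothetical data and names): generating an
    # assessment planning chart from selected standards. Each cell holds an
    # alignment rating showing how well a task meets a standard.

    ALIGNMENT = {  # (task_id, standard_id) -> alignment strength (1-3)
        ("density", "B.1"): 3,
        ("density", "A.1"): 1,
        ("ecology", "C.4"): 3,
        ("ecology", "A.1"): 2,
    }

    def assessment_chart(task_ids, standard_ids):
        """Return one row per task: the rating for each selected standard."""
        return [
            (task, [ALIGNMENT.get((task, std), 0) for std in standard_ids])
            for task in task_ids
        ]

    standards = ["A.1", "B.1", "C.4"]
    for task, row in assessment_chart(["density", "ecology"], standards):
        cells = "  ".join(str(r) if r else "." for r in row)
        print(f"{task:10s} {cells}")
    # density    1  3  .
    # ecology    2  .  3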
A user who clicks on a task for viewing is queried for a login name and password (unless it is a public, released task). The opening screen for the task is then presented (see Figure 5). This page contains a short description of the task and buttons that link to the task's components: administration procedures, rubric, directions to the student, technical quality data, and examples of student work. Figures 6 and 7 show examples of technical quality data and student work. Each task includes at least two examples of student work for each score point, generally ranging from 4 to 0.
The two performance event training
packets each consist of about 300 samples of student work, segmented
into introduction sets, calibration sets, training sets, and recalibration
sets. Student work samples scored by ACT-trained raters were coded,
scanned, and assigned to one of the sets. The underlying training
packet program presents items in the packets for scoring (see
Figure 8), checks scoring success against the benchmark scores,
presents feedback on the user's performance (see Figure 9), and
saves users' scores for each item. Login names and passwords are
required to access the training packets.
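The core training-packet loop can be sketched as follows (a minimal sketch with invented sample identifiers and scores): present each piece of student work, compare the rater's score with the benchmark assigned by the ACT-trained raters, give immediate feedback, and save the record.

    calibration_set = [  # (sample_id, benchmark_score) pairs; scores run 0-4
        ("S-017", 3),
        ("S-042", 1),
        ("S-105", 4),
    ]

    def run_calibration(samples, get_rater_score):
        """Score each sample against its benchmark and report agreement."""
        record, exact = [], 0
        for sample_id, benchmark in samples:
            score = get_rater_score(sample_id)  # in PALS, from a web form
            record.append((sample_id, score, benchmark))
            if score == benchmark:
                exact += 1
                print(f"{sample_id}: {score} matches the benchmark.")
            else:
                print(f"{sample_id}: you gave {score}; the benchmark is {benchmark}.")
        print(f"Exact agreement: {exact}/{len(samples)}")
        return record

    # Canned responses stand in for a live rater in this example.
    canned = {"S-017": 3, "S-042": 2, "S-105": 4}
    run_calibration(calibration_set, canned.get)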
Finally, on-line surveys were implemented
for both the search/select and training portions of the prototype
to help us obtain feedback and improve the system. The search
and select survey is accessible via a Feedback button (on every
page, except during rater training), and the on-line training
survey is automatically presented to users at the end of their
training session. All survey responses are automatically e-mailed
to us when submitted.
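A minimal sketch of that hand-off, assuming a local mail server and using hypothetical addresses and field names:

    import smtplib
    from email.message import EmailMessage

    def mail_survey(fields, to_addr="pals-feedback@example.org"):
        """Format submitted survey fields and e-mail them to the project team."""
        msg = EmailMessage()
        msg["Subject"] = "PALS survey response"
        msg["From"] = "pals-web@example.org"
        msg["To"] = to_addr
        msg.set_content("\n".join(f"{k}: {v}" for k, v in fields.items()))
        with smtplib.SMTP("localhost") as server:  # assumes a local mail server
            server.send_message(msg)

    mail_survey({"ease_of_use": "very easy", "comments": "Great resource."})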
Pilot Study Findings
In general, the pilot subjects thought that the information provided for each performance task (e.g., the components) was very useful. One subject commented, "Yes, the performance events were great!!! All information needed for administrational procedures, teachers, students, and the rubrics were easy to follow and were thoroughly explained"; another said, "Yes, you have included everything we would need to run the assessment: time, materials, notes for teachers, sample rubrics and anchor papers."
Additional comments were generally very positive. For example, one pilot subject said, "Being an education student with hopes of one day being an educational administrator this website opens up the doors for countless ways of improving the state and local assessment programs. Also being from a state where great educational reforms are taking place an educational tool such as this can only enhance the possibilities of gaining valid and authentic assessment tasks."
The pilot tests also provided evidence that the two functions (search and select, and on-line rater training) generally worked well. For example, when asked to rate the ease and usefulness of various functions of the prototype in the on-line survey, 71% of all subject ratings were in the very useful/easy category. Ratings for the on-line rater training were lower (27% very useful/easy and 53% somewhat useful/easy). Most negative comments related to the logic behind the benchmark scores given by the trained raters; a few related to difficulty reading student work, likely aggravated by low screen resolution. From a technical standpoint, pilot subjects found the system generally easy to use.
From our experiences during the development process and the feedback from various advisors and users, we have learned a number of lessons about how to proceed as we implement PALS on a larger scale. This section summarizes our conclusions as they relate to the system architecture.
Support for Using and Adapting Tasks. To better help assessment personnel and teachers discover how best to use the resource pools for their purposes, adapt tasks, or use the tasks as models for developing similar investigations, we plan to add templates, guidelines, and "frequently asked questions" (FAQ) information to the PALS system. Using templates, teachers could then add their own adapted versions of tasks (e.g., for a different grade level) to an "Adapted Tasks" segment of the resource bank for others to use, if the tasks are accompanied by acceptable technical quality data. We will also investigate collaborative filtering techniques using a system such as GroupLens (Konstan, Miller, Maltz, Herlocker, Gordon & Riedl, 1997) that would allow users to rate or "vote on" the value of each task or its alignment with the standards, and use this information, in addition to other information optionally specified by the user (e.g., grade level, interests) and/or traffic patterns, to recommend or rank tasks for users. This technique is used by several leading-edge Web-based services, such as Amazon Books (www.amazon.com), and could become increasingly useful as the resource bank grows.
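As a hedged illustration of the collaborative-filtering idea (not the GroupLens implementation itself), the sketch below predicts how a user might rate a task by weighting other users' ratings by their similarity to that user; all data and names are invented.

    import math

    RATINGS = {  # user -> {task: rating on a 1-5 scale}
        "ana":  {"density": 5, "ecology": 4, "circuits": 2},
        "ben":  {"density": 4, "ecology": 5},
        "cara": {"density": 1, "circuits": 5},
    }

    def similarity(a, b):
        """Cosine similarity over the tasks two users have both rated."""
        shared = set(a) & set(b)
        if not shared:
            return 0.0
        dot = sum(a[t] * b[t] for t in shared)
        na = math.sqrt(sum(a[t] ** 2 for t in shared))
        nb = math.sqrt(sum(b[t] ** 2 for t in shared))
        return dot / (na * nb)

    def predict(user, task):
        """Similarity-weighted average of other users' ratings for a task."""
        num = den = 0.0
        for other, ratings in RATINGS.items():
            if other == user or task not in ratings:
                continue
            w = similarity(RATINGS[user], ratings)
            num += w * ratings[task]
            den += w
        return num / den if den else None

    print(predict("ben", "circuits"))  # ben's predicted rating for "circuits"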
Communication Support. The frustrations expressed by pilot subjects using the training packet dealt mainly with the benchmark scores, in particular not understanding or agreeing with them. The next most common frustration was not being able to read student work. Notice that these frustrations are mostly independent of the technology: teachers going through training sessions face-to-face have similar frustrations with rubrics (and with reading student work), but they are able to talk about them with other teachers and training leaders to resolve them. More support for communication could help alleviate such frustrations and provide useful feedback to the trainers. For example, synchronous communication mechanisms (e.g., chat, audio) could support real-time rater training and discussion of issues related to scoring procedures, rubrics, and administration procedures. Asynchronous mechanisms (e.g., bulletin boards, listservs) could provide a means for teachers to exchange ideas for adapting events, resolve questions about rubrics or administration procedures, and discuss policy issues.
Advanced Data Collection and
Security Mechanisms. We implemented simple feedback and tracking
measures in the prototype, such as on-line user surveys and a
counter to tally the number of visits to the site. Additional
mechanisms for automatically collecting usage statistics for each
performance event and its components, compiling rater training
results and errors, etc., would further aid analysis and provide
useful feedback for our developers and funders. As the PALS community
grows, we may also need better security mechanisms for managing
users and passwords (e.g., automatic password generation and expiration)
to help streamline the management of the system.
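One possible approach (a sketch under assumed policies, not the PALS implementation) is to generate random passwords centrally and stamp each with an expiration date:

    import secrets
    import string
    from datetime import datetime, timedelta, timezone

    ALPHABET = string.ascii_letters + string.digits
    LIFETIME = timedelta(days=90)  # assumed expiration policy

    def issue_password(length=10):
        """Generate a random password and the timestamp at which it expires."""
        password = "".join(secrets.choice(ALPHABET) for _ in range(length))
        return password, datetime.now(timezone.utc) + LIFETIME

    def is_expired(expires_at):
        return datetime.now(timezone.utc) >= expires_at

    pw, expires = issue_password()
    print(pw, "expires on", expires.date())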
Expansion of the PALS System
Plans and Issues Related to the Collection. Over the next 3 years, we intend to enlarge the collection of science assessments to encompass elementary, middle, and secondary levels and to represent the approaches designed by multiple developers (collaboratives, state programs, reference exams). We aim to include tasks that vary in length and format and that provide depth of inquiry and coverage of the science standards.
The project will address a number of issues. One will be the procedures for identifying the science standards that tasks are designed to test and the nature of the groups that will make the alignment judgments. Another issue relates to policies for accessing the resources that have been developed with funds from different sources and that represent assessment materials being distributed by various nonprofit and for-profit organizations. A third issue relates to the criteria and procedures for reviewing tasks submitted for inclusion in the bank. The technical review criteria and guidelines for use recommended by the PALS Steering Committee will shape this process. Finally, we must develop strategies for operating, maintaining, updating, and expanding the assessment resources.
Building on the Collaborative Environment. PALS uses technology to efficiently archive and catalogue numerous assessments for ready browsing by teachers, and goes beyond mere storage of assessments to support cross-links with standards, tasks, scoring criteria, and annotated student work. These resources can be reused and shared by assessment programs, allowing them to have access to a large pool of performance assessments to use or adapt for their testing administrations. Teachers can administer the tasks as part of their classroom assessments, adapt them or use them as models for developing their own investigations, and contribute their adaptations to the resource bank for other teachers to use.
As an on-line library alone, PALS
is a valuable set of resources. However, the growing body of professional
development and collaboration research (mentioned earlier) and
the results of our pilot study suggest many benefits of integrating
more collaboration support into PALS. We plan to integrate TAPPED
IN (Schlager & Schank, 1997) into a future version of PALS
to help the community members leverage expertise, regardless of
geographic location and institutional base. TAPPED IN could be
used to support collaborative development and on-line conversations
about the alignment of tasks with standards, the quality of the
assessment tasks and student work, rater training and scoring,
and standards setting. We believe that, by taking advantage of
new models of professional development that include innovative
digital technologies, PALS will provide excellent professional
development opportunities for teachers.
References
Carey, N., & Frechtling, J. (1997, March). Best practice in action: Followup survey on teacher enhancement programs. Washington, DC: National Science Foundation.
Consortium for Policy Research in Education (CPRE). (1995). Tracking student achievement in science and math: The promise of state assessment programs. New Brunswick, NJ: CPRE Policy Briefs.
Corcoran, T. B. (1995). Transforming professional development for teachers: A guide for state policymakers. Washington, DC: National Governors' Association.
Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: ASCD.
Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L.R., & Riedl, J. (1997). GroupLens: Applying collaborative filtering to Usenet News. Communications of the ACM, 40(3), 77-87.
Kozma, R., & Schank, P. (1998). Connecting with the twenty-first century: Technology in support of education reform. In D. Palumbo & C. Dede (Eds.), Association for Supervision and Curriculum Development 1998 yearbook: Learning with technology. Alexandria, VA: Association for Supervision and Curriculum Development.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, UK: Cambridge University Press.
Levin, J., Waugh, M., Brown, D., & Clift, R. (1994). Teaching teleapprenticeships: A new organizational framework for improving teacher education using electronic networks. Journal of Machine-Mediated Learning, 4(2&3), 149-161.
Lieberman, A. (1996, November). Creating intentional learning communities. Educational Leadership, 51-55.
Little, J. (1993). Teachers' professional development in a climate of educational reform. Educational Evaluation and Policy Analysis, 15(2), 129-151.
McLaughlin, M. W., & Shepard, L. A. (1995). Improving education through standards-based reform: A report by the National Academy of Education Panel on Standards-Based Education Reform. Washington, DC: National Academy of Education.
Pirolli, P., Schank, P., Hearst, M., & Diehl, C. (1996). Scatter/Gather browsing communicates the topic structure of a very large text collection. Human Factors in Computing Systems CHI '96, pp. 213-220. New York, NY: Association for Computing Machinery.
Quellmalz, E. S. (1984). Designing writing assessments: Balancing fairness, utility, and cost. Educational Evaluation and Policy Analysis, 6, 63-72.
Quellmalz, E. S. (in press). The role of technology in advancing performance standards in science and mathematics learning. In K. Comfort (Ed.), This year in school science. Washington, DC: AAAS.
Rosenfeld, L., & Morville, P. (1998). Information architecture for the World Wide Web. Sebastopol, CA: O'Reilly & Associates.
Sano, D. (1996). Designing large-scale web sites: A visual design methodology. New York: John Wiley & Sons.
Schlager, M., & Schank, P. (1997). TAPPED IN: A new on-line community concept for the next generation of Internet technology. In R. Hall, N. Miyake, & N. Enyedy (Eds.), Proceedings of the Second International Conference on Computer Support for Collaborative Learning (pp. 231-240). Toronto, Canada: University of Toronto Press.
Stiggins, R. J. (1994). Student-centered classroom assessment. New York: Macmillan College Publishing Company.
Stiggins, R. J., Rubel, E., &
Quellmalz, E. S. (1986). Measuring thinking skills in the
classroom. Washington, DC: NEA Professional Library.