[Presented at the annual meeting of the American Educational Research Association in San Diego, CA, April 1998.]
Introduction
Understanding the central role that performance assessment plays in standards-based reform, educators are seeking ways to use these assessments to test student learning. Education agencies need pools of performance tasks to use as components in their student assessment programs and in evaluations of state and federally funded programs (McLaughlin & Shepard, 1995). Reform projects need standards-based assessment, too, as do teachers who are trying to implement reforms. Experience indicates, however, that the level of effort and cost of developing performance assessment tasks, scoring rubrics, and rater training materials, and then ensuring their technical quality, is very high (CPRE, 1995; Quellmalz, 1984). Innovative approaches are needed for sharing exemplary assessment resources, collaborating on the development of new ones, and understanding how standards-based performance assessment can advance educational reform at all levels of the educational system. To meet this need, SRI International and the Council of Chief State School Officers (CCSSO) are developing Performance Assessment Links in Science (PALS), a greatly needed, specialized type of electronic library: an on-line, standards-based, interactive resource bank of science performance assessments. We are also studying models of effective use of these resources.
In this paper, we will discuss the role performance assessments
can play in education reforms, suggest how technology expands
the ways in which educators can access and use standards-based
assessments and engage in collaborative professional development,
describe the features of the on-line assessment resource bank,
describe the results of prototype testing of the system, and raise
some issues involved in developing and expanding an on-line assessment
resource library.
The Role of Performance Assessment in School Reform
Reform-minded programs have set out to develop new, alternative
forms of student assessment that call for students to construct
rather than select responses. Performance assessments are generally
valued for testing students' deep understanding of subject-matter
concepts and inquiry strategies, for making students' thinking
visible, and for measuring their skills in communicating about
their knowledge. In addition, performance assessments can present
authentic, real-world problems that help students to show they
can apply academic knowledge to practical situations. On the other
hand, performance assessments are time-consuming and costly to
develop, logistically demanding, and of questionable utility if
not developed and scored according to sound measurement methods.
The Role of Technology in Assessment Reform
Technology offers a powerful strategy for increasing the ease
with which educators can access and use standards-based student
assessment. Technology can be used to efficiently archive numerous
assessments for ready browsing. Currently, some sets of performance
assessment tasks are distributed on CD-ROM, and there are plans
to place released reference test items on the Internet, although
these resources are not yet coordinated or easily accessible to
assessment programs and schools. Electronic networks can go beyond
storage by supporting the growth of a community of colleagues
and leveraging expertise, regardless of geographic location and
institutional base. Networks can offer templates and guidelines
and support collaborative development and on-line conversations
about the alignment of tasks with standards, quality of tasks
and student work, on-line training and scoring, and on-line standards-setting
sessions. Technology can also support simulated investigations
and data collection and analysis of student responses. Rapid
advances in technology promise to revolutionize
assessment practices and education reform (Kozma & Schank,
1998; Quellmalz, in press).
The Role of Technology in Professional Development
New professional development models are needed to provide teachers
with greater opportunity to access, discuss, incorporate, and
co-construct assessment resources and other reform-based materials
(Little, 1993). Current efforts find it difficult to reach many
teachers and to maintain discourse, and teachers have difficulty
developing and testing new ideas and adjusting their strategies
back at their schools (Carey & Frechtling, 1997; Corcoran,
1995). Participation in a professional community is a valuable
component of teacher professional development (Lieberman, 1996),
yet, unlike professionals in other fields who routinely collaborate
with fellow practitioners, teachers are isolated from other adults
all day. Technology can help provide mechanisms for teachers to
overcome their isolation and make more effective use of the time
they spend on their own professional growth. A few projects are
beginning to offer the kinds of tools, communication channels,
and contextual supports central to professional development and
successful collaborative work (Lave & Wenger, 1991; Levin,
Waugh, Brown, & Clift, 1994; Schlager & Schank, 1997).
For example, TAPPED IN (Schlager & Schank, 1997; www.tappedin.org)
supports a community of over 1,000 education professionals who,
on any given day, can attend live activities hosted by various
professional development organizations, jointly browse and contribute
resources, or expand their circle of colleagues by participating
in real-time or bulletin-board discussion groups. Such research,
along with our pilot study results, supports our plans for integrating
more collaboration support into PALS.
Performance Assessment Links in Science (PALS)
PALS will provide an on-line assessment resource library designed
to serve two purposes and user groups: (1) the accountability
requirements of state education agencies and specially funded
programs, and (2) the professional development needs of classroom
teachers. SRI International and CCSSO are building on a 1-year
NSF planning grant as we embark on a 3-year implementation
project with two primary goals:
(1) To develop a two-tiered on-line performance assessment resource
library composed of performance assessment tasks for elementary,
middle, and secondary levels from multiple sources, such as state
assessment programs and collaboratives. One tier will be a password-protected,
secure Accountability Pool of science performance assessments
for use by state assessment programs and systemic reform programs
(e.g., Systemic Initiatives). The second tier will be for use
by teachers and professional development organizations. The Professional
Development Pool will provide performance-based science assessments
that have been used successfully in large-scale (state or national)
assessment programs and have been released.
(2) To identify, study, and evaluate the effectiveness of policies,
implementation models, and technical quality requirements for
the use of the two tiers of PALS. Policy issues may address questions
about the appropriate screening criteria for including performance
assessment resources in the pools, procedures for access and use
of the resources, and cost-effective alternatives for operating,
sustaining, and scaling up the numbers of resources and users.
Figure 1 portrays our vision of how PALS can support the accessibility and use of science performance assessment resources. In our design for PALS, assessment programs such as reference exams (e.g., TIMSS, the National Assessment of Educational Progress, the New Standards Project), state and other mandated testing programs, and districts can contribute standards-based science assessment tasks with documented technical quality to the PALS on-line resource library. The Accountability Pool will be composed of password-protected, secure tasks accessible only to approved assessment program staff. Assessment programs can thus share their resources and enjoy a much larger pool of science performance assessments to use or adapt for their testing administrations. The PALS resource library can provide large, continually updated collections that can support efficient search, selection, and printing. On-line rater training and scoring can vastly reduce assessment costs.
The on-line Professional Development Pool contains resources
that are of documented technical quality and have been released
for access by teachers and professional development groups. Preservice
and in-service programs, for example, can reach teachers in geographically
distributed and remote locales, resulting in great savings of
travel and materials expenses. On-line guidelines and templates
can support classroom use of science performance assessments.
Teachers may administer the science performance tasks as part
of their classroom assessments, adapt them, or use them as models
for developing similar investigations. They also can engage in
on-line rating sessions and conversations about how their students'
work meets local and national science standards.
PALS System Components. The nature of performance
assessments requires a highly sophisticated and versatile system
for storing, retrieving, and using or adapting them. Because the
components of performance assessments are both more numerous and
more complex than those of multiple-choice items, the performance
assessment electronic library must include more features than
a simple, static bank of multiple-choice items.
Standards measured. Each science performance assessment task in the PALS prototype Accountability Pool is linked to both the SCASS science framework and the National Science Education Standards the task is intended to measure. Since performance tasks tend to be complex and are typically designed to measure multiple outcomes, the system indexes tasks to multiple standards. We will ask each source (state or consortium) submitting science performance assessments to use a Standards Coding Format SRI is developing to identify the standards each science performance task is intended to test and the related National Science Education Standards.
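To make the coding concrete, the sketch below (in Python, with hypothetical task and standard identifiers; this is an illustration, not the actual Standards Coding Format) shows one way a many-to-many index between tasks and standards might be represented and queried. Because each task can be coded against several standards, a search on any one standard retrieves every task linked to it.

    # Minimal sketch (all identifiers hypothetical): a many-to-many index
    # linking performance tasks to the standards they are intended to measure.

    from collections import defaultdict

    # A Standards Coding Format entry might reduce to
    # (task_id, framework, standard_id) triples.
    CODINGS = [
        ("task-08-density", "NSES", "B.1"),   # physical science standard
        ("task-08-density", "SCASS", "2.3"),  # hypothetical SCASS framework code
        ("task-08-ecology", "NSES", "C.4"),   # life science standard
        ("task-08-ecology", "NSES", "A.1"),   # science as inquiry
    ]

    def build_index(codings):
        """Index standards -> tasks and tasks -> standards for fast lookup."""
        by_standard = defaultdict(set)
        by_task = defaultdict(set)
        for task_id, framework, standard_id in codings:
            key = (framework, standard_id)
            by_standard[key].add(task_id)
            by_task[task_id].add(key)
        return by_standard, by_task

    by_standard, by_task = build_index(CODINGS)
    print(by_standard[("NSES", "C.4")])  # tasks measuring NSES standard C.4
    print(by_task["task-08-density"])    # all standards coded for one task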
Tasks/Events. The PALS prototype includes 10 Grade 8 performance events. In the implementation project, we will include performance assessment resources for the elementary, middle, and secondary levels developed by multiple sources (e.g., state assessment programs, assessment collaboratives, and reference programs), so that users will have access to a variety of assessment approaches and task formats. Some of the assessment resources will be "balanced" forms that present a linked set of test response modes, e.g., hands-on investigations, related multiple-choice questions, short-answer formats, and extended writing.
Assessment planning charts. To help users design one or more administration forms that cover important science standards, the on-line system will provide assessment planning charts (Stiggins, 1994; Stiggins, Rubel, & Quellmalz, 1986). The PALS prototype automatically produces an assessment planning chart to display coverage of standards by the performance tasks tentatively selected by state assessment staff.
Rubrics and student exemplars. The science performance assessment resource library includes the scoring rubrics designed to judge the quality of student responses to a task or set of tasks. To bring meaning to the scoring rubrics, the library will include exemplars spanning the range of scored student work; these serve the same purpose as the samplers of scored, annotated student work that state assessment programs typically publish for schools. The PALS prototype includes the rubrics and student work developed and used by the SCASS project.
Training and scoring packets. Rater training and scoring procedures and materials are not routinely published by assessment programs. The PALS prototype pilot-tested on-line rater training and scoring for two of the Grade 8 SCASS tasks. The Rater Training Packet procedures for establishing and maintaining rater agreement follow proven methodologies (Herman, Aschbacher, & Winters, 1992; Quellmalz, 1984). We are developing specifications for preparing on-line rater training and scoring materials so that each agency wishing to take advantage of the on-line training capability of the PALS system can convert stand-up training instruction and explanations to written form, assemble training packets, and test their effectiveness.
Technical quality reports. An essential component of the PALS on-line pools is the documentation
of the technical quality of each task in the library. The PALS
prototype contains the technical quality indicators provided to
SCASS partner states for the SCASS field-tested science tasks.
Since the resource library will be stocked only with performance
tasks that have survived a systematic test development process,
the tasks and rubrics will also have been subjected to content
validity and bias reviews. One task of the PALS Steering Committee
is to establish the technical quality procedures and statistics
that must be documented for tasks to be placed in either of the
pools.
The 10 performance events and 2
training packets in the PALS prototype are implemented in over
2,000 HTML, image, database, and program files, and the implementation
project will be at least an order of magnitude larger. Designing
such a large-scale site requires thoughtful planning and refinement.
The organizational framework, underlying database, screen interactions,
and specific user interface components were informed by current
research on information architecture and Web interface and site
design (Pirolli, Schank, Hearst, & Diehl, 1996; Rosenfeld
& Morville, 1998; Sano, 1996) and tested via storyboards and
a pilot study of the prototype. Figure 2 shows the opening screen
of the PALS Web site.
Approximately 100 NSES and SCASS
standards are implemented and mapped to each other and to the performance
events. From the NSES and SCASS search pages (see Figure 3), users
can view an assessment chart by selecting standards of interest
and pressing the "Show Assessment Chart" button. The system
then queries the database to find the tasks relevant to the standards,
and generates a chart of all tasks that are intended to meet the
selected standards. The number in each cell indicates how well
a given task meets a given standard. Below the chart, the tasks
are listed again, and users can select a subset of the tasks from
this list to narrow the chart (see Figure 4). Users can also access
tasks and generate assessment charts via the Tasks search page.
Here users can read brief task descriptions, follow a link to
a particular task, or select tasks of interest and display an
assessment chart of the standards that the task intends to meet.
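The sketch below (with invented tasks, standards, and ratings; not the production database) illustrates the chart-generation step: given the selected standards and tasks, look up the alignment strength for each cell and render one row per task, with a blank cell where a task does not address a standard.

    # Illustrative sketch (hypothetical data and names): generating an
    # assessment planning chart from selected standards. Each cell holds an
    # alignment rating showing how well a task meets a standard.

    ALIGNMENT = {  # (task_id, standard_id) -> alignment strength (1-3)
        ("density", "B.1"): 3,
        ("density", "A.1"): 1,
        ("ecology", "C.4"): 3,
        ("ecology", "A.1"): 2,
    }

    def assessment_chart(task_ids, standard_ids):
        """Return one row per task: the rating for each selected standard."""
        return [
            (task, [ALIGNMENT.get((task, std), 0) for std in standard_ids])
            for task in task_ids
        ]

    standards = ["A.1", "B.1", "C.4"]
    for task, row in assessment_chart(["density", "ecology"], standards):
        cells = "  ".join(str(r) if r else "." for r in row)
        print(f"{task:10s} {cells}")
    # density    1  3  .
    # ecology    2  .  3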
A user who clicks on a task for viewing is queried for a login name and password (unless it is a public, released task). The opening screen for the task is then presented (see Figure 5). This page contains a short description of the task and buttons that link to the task's components: administration procedures, rubric, directions to the student, technical quality data, and examples of student work. Figures 6 and 7 show examples of technical quality data and student work. Each task includes at least two examples of student work for each score point, generally ranging from 4 to 0.
The two performance event training
packets each consist of about 300 samples of student work, segmented
into introduction sets, calibration sets, training sets, and recalibration
sets. Student work samples scored by ACT-trained raters were coded,
scanned, and assigned to one of the sets. The underlying training
packet program presents items in the packets for scoring (see
Figure 8), checks scoring success against the benchmark scores,
presents feedback on the user's performance (see Figure 9), and
saves users' scores for each item. Login names and passwords are
required to access the training packets.
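The core training-packet loop can be sketched as follows (a minimal sketch with invented sample identifiers and scores): present each piece of student work, compare the rater's score with the benchmark assigned by the ACT-trained raters, give immediate feedback, and save the record.

    calibration_set = [  # (sample_id, benchmark_score) pairs; scores run 0-4
        ("S-017", 3),
        ("S-042", 1),
        ("S-105", 4),
    ]

    def run_calibration(samples, get_rater_score):
        """Score each sample against its benchmark and report agreement."""
        record, exact = [], 0
        for sample_id, benchmark in samples:
            score = get_rater_score(sample_id)  # in PALS, from a web form
            record.append((sample_id, score, benchmark))
            if score == benchmark:
                exact += 1
                print(f"{sample_id}: {score} matches the benchmark.")
            else:
                print(f"{sample_id}: you gave {score}; the benchmark is {benchmark}.")
        print(f"Exact agreement: {exact}/{len(samples)}")
        return record

    # Canned responses stand in for a live rater in this example.
    canned = {"S-017": 3, "S-042": 2, "S-105": 4}
    run_calibration(calibration_set, canned.get)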
Finally, on-line surveys were implemented
for both the search/select and training portions of the prototype
to help us obtain feedback and improve the system. The search
and select survey is accessible via a Feedback button (on every
page, except during rater training), and the on-line training
survey is automatically presented to users at the end of their
training session. All survey responses are automatically e-mailed
to us when submitted.
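A minimal sketch of that hand-off, assuming a local mail server and using hypothetical addresses and field names:

    import smtplib
    from email.message import EmailMessage

    def mail_survey(fields, to_addr="pals-feedback@example.org"):
        """Format submitted survey fields and e-mail them to the project team."""
        msg = EmailMessage()
        msg["Subject"] = "PALS survey response"
        msg["From"] = "pals-web@example.org"
        msg["To"] = to_addr
        msg.set_content("\n".join(f"{k}: {v}" for k, v in fields.items()))
        with smtplib.SMTP("localhost") as server:  # assumes a local mail server
            server.send_message(msg)

    mail_survey({"ease_of_use": "very easy", "comments": "Great resource."})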
Pilot Study Findings
In general, the pilot subjects thought that the information provided for each performance task (e.g., the components) was very useful. One subject commented, "Yes, the performance events were great!!! All information needed for administrational procedures, teachers, students, and the rubrics were easy to follow and were thoroughly explained"; another said, "Yes, you have included everything we would need to run the assessment: time, materials, notes for teachers, sample rubrics and anchor papers."
Additional comments were generally very positive. For example, one pilot subject said, "Being an education student with hopes of one day being an educational administrator this website opens up the doors for countless ways of improving the state and local assessment programs. Also being from a state where great educational reforms are taking place an educational tool such as this can only enhance the possibilities of gaining valid and authentic assessment tasks."
The pilot tests also provided evidence that the two functions (search and select, and on-line rater training) generally worked well. For example, when asked to rate the ease and usefulness of various functions of the prototype in the on-line survey, 71% of all subject ratings were in the very useful/easy category. Ratings for the on-line rater training were lower (27% very useful/easy and 53% somewhat useful/easy). Most negative comments related to the logic behind the benchmark scores given by the trained raters; a few related to difficulty reading student work, likely aggravated by low screen resolution. From a technical standpoint, pilot subjects found the system generally easy to use.
From our experiences during the development process and the feedback from various advisors and users, we have learned a number of lessons about how to proceed as we implement PALS on a larger scale. This section summarizes our conclusions as they relate to the system architecture.
Support for Using and Adapting Tasks. To better help assessment personnel and teachers discover how best to use the resource pools for their purposes, adapt tasks, or use the tasks as models for developing similar investigations, we plan to add templates, guidelines, and "frequently asked questions" (FAQ) information to the PALS system. Using templates, teachers could then add their own adapted versions of tasks (e.g., for a different grade level) to an "Adapted Tasks" segment of the resource bank for others to use, if the tasks are accompanied by acceptable technical quality data. We will also investigate collaborative filtering techniques using a system such as GroupLens (Konstan, Miller, Maltz, Herlocker, Gordon & Riedl, 1997) that would allow users to rate or "vote on" the value of each task or its alignment with the standards, and use this information, in addition to other information optionally specified by the user (e.g., grade level, interests) and/or traffic patterns, to recommend or rank tasks for users. This technique is used by several leading-edge Web-based services, such as Amazon Books (www.amazon.com), and could become increasingly useful as the resource bank grows.
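As a hedged illustration of the collaborative-filtering idea (not the GroupLens implementation itself), the sketch below predicts how a user might rate a task by weighting other users' ratings by their similarity to that user; all data and names are invented.

    import math

    RATINGS = {  # user -> {task: rating on a 1-5 scale}
        "ana":  {"density": 5, "ecology": 4, "circuits": 2},
        "ben":  {"density": 4, "ecology": 5},
        "cara": {"density": 1, "circuits": 5},
    }

    def similarity(a, b):
        """Cosine similarity over the tasks two users have both rated."""
        shared = set(a) & set(b)
        if not shared:
            return 0.0
        dot = sum(a[t] * b[t] for t in shared)
        na = math.sqrt(sum(a[t] ** 2 for t in shared))
        nb = math.sqrt(sum(b[t] ** 2 for t in shared))
        return dot / (na * nb)

    def predict(user, task):
        """Similarity-weighted average of other users' ratings for a task."""
        num = den = 0.0
        for other, ratings in RATINGS.items():
            if other == user or task not in ratings:
                continue
            w = similarity(RATINGS[user], ratings)
            num += w * ratings[task]
            den += w
        return num / den if den else None

    print(predict("ben", "circuits"))  # ben's predicted rating for "circuits"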
Communication Support. The frustrations expressed by pilot subjects using the training packet dealt mainly with the benchmark scores, in particular not understanding or agreeing with them. The next most common frustration was not being able to read student work. Notice that these frustrations are mostly independent of the technology: teachers going through training sessions face-to-face have similar frustrations with rubrics (and with reading student work), but they are able to talk about them with other teachers and training leaders to resolve them. More support for communication could help alleviate such frustrations and provide useful feedback to the trainers. For example, synchronous communication mechanisms (e.g., chat, audio) could support real-time rater training and discussion of issues related to scoring procedures, rubrics, and administration procedures. Asynchronous mechanisms (e.g., bulletin boards, listservs) could provide a means for teachers to exchange ideas for adapting events, resolve questions about rubrics or administration procedures, and discuss policy issues.
Advanced Data Collection and
Security Mechanisms. We implemented simple feedback and tracking
measures in the prototype, such as on-line user surveys and a
counter to tally the number of visits to the site. Additional
mechanisms for automatically collecting usage statistics for each
performance event and its components, compiling rater training
results and errors, etc., would further aid analysis and provide
useful feedback for our developers and funders. As the PALS community
grows, we may also need better security mechanisms for managing
users and passwords (e.g., automatic password generation and expiration)
to help streamline the management of the system.
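One possible approach (a sketch under assumed policies, not the PALS implementation) is to generate random passwords centrally and stamp each with an expiration date:

    import secrets
    import string
    from datetime import datetime, timedelta, timezone

    ALPHABET = string.ascii_letters + string.digits
    LIFETIME = timedelta(days=90)  # assumed expiration policy

    def issue_password(length=10):
        """Generate a random password and the timestamp at which it expires."""
        password = "".join(secrets.choice(ALPHABET) for _ in range(length))
        return password, datetime.now(timezone.utc) + LIFETIME

    def is_expired(expires_at):
        return datetime.now(timezone.utc) >= expires_at

    pw, expires = issue_password()
    print(pw, "expires on", expires.date())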
Expansion of the PALS System
Plans and Issues Related to the Collection. Over the next 3 years, we intend to enlarge the collection of science assessments to encompass elementary, middle, and secondary levels and to represent the approaches designed by multiple developers (collaboratives, state programs, reference exams). We aim to include tasks that vary in length and format and that provide depth of inquiry and coverage of the science standards.
The project will address a number of issues. One will be the procedures for identifying the science standards that tasks are designed to test and the nature of the groups that will make the alignment judgments. Another issue relates to policies for accessing the resources that have been developed with funds from different sources and that represent assessment materials being distributed by various nonprofit and for-profit organizations. A third issue relates to the criteria and procedures for reviewing tasks submitted for inclusion in the bank. The technical review criteria and guidelines for use recommended by the PALS Steering Committee will shape this process. Finally, we must develop strategies for operating, maintaining, updating, and expanding the assessment resources.
Building on the Collaborative Environment. PALS uses technology to efficiently archive and catalogue numerous assessments for ready browsing by teachers, and goes beyond mere storage of assessments to support cross-links with standards, tasks, scoring criteria, and annotated student work. These resources can be reused and shared by assessment programs, allowing them to have access to a large pool of performance assessments to use or adapt for their testing administrations. Teachers can administer the tasks as part of their classroom assessments, adapt them or use them as models for developing their own investigations, and contribute their adaptations to the resource bank for other teachers to use.
As an on-line library alone, PALS
is a valuable set of resources. However, the growing body of professional
development and collaboration research (mentioned earlier) and
the results of our pilot study suggest many benefits of integrating
more collaboration support into PALS. We plan to integrate TAPPED
IN (Schlager & Schank, 1997) into a future version of PALS
to help the community members leverage expertise, regardless of
geographic location and institutional base. TAPPED IN could be
used to support collaborative development and on-line conversations
about the alignment of tasks with standards, the quality of the
assessment tasks and student work, rater training and scoring,
and standards setting. We believe that, by taking advantage of
new models of professional development that include innovative
digital technologies, PALS will provide excellent professional
development opportunities for teachers.
References
Carey, N., & Frechtling, J. (1997, March). Best practice in action: Followup survey on teacher enhancement programs. Washington, DC: National Science Foundation.
Consortium for Policy Research in Education (CPRE). (1995). Tracking student achievement in science and math: The promise of state assessment programs. New Brunswick, NJ: CPRE Policy Briefs.
Corcoran, T. B. (1995). Transforming professional development for teachers: A guide for state policymakers. Washington, DC: National Governors' Association.
Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: ASCD.
Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L.R., & Riedl, J. (1997). GroupLens: Applying collaborative filtering to Usenet News. Communications of the ACM, 40(3), 77-87.
Kozma, R., & Schank, P. (1998). Connecting with the twenty-first century: Technology in support of education reform. In D. Palumbo & C. Dede (Eds.), Association for Supervision and Curriculum Development 1998 yearbook: Learning with technology. Alexandria, VA: Association for Supervision and Curriculum Development.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, UK: Cambridge University Press.
Levin, J., Waugh, M., Brown, D., & Clift, R. (1994). Teaching teleapprenticeships: A new organizational framework for improving teacher education using electronic networks. Journal of Machine-Mediated Learning, 4(2&3), 149-161.
Lieberman, A. (1996, November). Creating intentional learning communities. Educational Leadership, 51-55.
Little, J. (1993). Teachers' professional development in a climate of educational reform. Educational Evaluation and Policy Analysis, 15(2), 129-151.
McLaughlin, M. W., & Shepard, L. A. (1995). Improving education through standards-based reform: A report by the National Academy of Education Panel on Standards-Based Education Reform. Washington, DC: National Academy of Education.
Pirolli, P., Schank, P., Hearst, M., & Diehl, C. (1996). Scatter/Gather browsing communicates the topic structure of a very large text collection. Human Factors in Computing Systems CHI '96, pp. 213-220. New York, NY: Association for Computing Machinery.
Quellmalz, E. S. (1984). Designing writing assessments: Balancing fairness, utility, and cost. Educational Evaluation and Policy Analysis, 6, 63-72.
Quellmalz, E. S. (in press). The role of technology in advancing performance standards in science and mathematics learning. In K. Comfort (Ed.), This year in school science. Washington, DC: AAAS.
Rosenfeld, L., & Morville, P. (1998). Information architecture for the World Wide Web. Sebastopol, CA: O'Reilly & Associates.
Sano, D. (1996). Designing large-scale web sites: A visual design methodology. New York: John Wiley & Sons.
Schlager, M., & Schank, P. (1997). TAPPED IN: A new on-line community concept for the next generation of Internet technology. In R. Hall, N. Miyake, & N. Enyedy (Eds.), Proceedings of the Second International Conference on Computer Support for Collaborative Learning (pp. 231-240). Toronto, Canada: University of Toronto Press.
Stiggins, R. J. (1994). Student-centered classroom assessment. New York: Macmillan College Publishing Company.
Stiggins, R. J., Rubel, E., &
Quellmalz, E. S. (1986). Measuring thinking skills in the
classroom. Washington, DC: NEA Professional Library.