

Week 9: Authentic Assessment


Theoretical Considerations in the Development of “Authentic Assessment”




Authentic assessment can capture a wide range of learning outcomes, including:

·        processes

·        content skills

·        applied learning

·        social skills

·        problem solving

·        products







For an assessment procedure to be sound, it should be:

·        objective

·        valid and reliable

·        doable/efficient

·        desirable in its consequences

·        able to translate to a grade













Developing a sound performance assessment requires you to:

·        define what “good” is

·        have a clear “task analysis” of your objective

·        develop a sound instrument for assessment

·        develop a clear and effective process to gather data



Steps in Developing a Performance Assessment



1. Define the performance

·        integrate the desired performance with course instructional objectives.

·        “operationalize” the task and clearly define the concept of a “quality” performance.

·        determine if the performance should be naturally occurring or manufactured.


2. Select the most appropriate form of assessment scale

*        checklist (i.e., performance is characterized by can/can’t type outcomes)

*        combination of primary traits (i.e., performance is characterized by a number of component parts, defining its critical attributes)

*        rubric scale (i.e., performance is characterized by a hierarchical progression of quality and complexity as the performance is mastered)


3. Create the assessment criteria and scale to represent it

·        develop a framework that any and all performances can be placed within for clear and reliable scoring.

·        determine the type of score/feedback most suitable.

*        holistic (i.e., one score representing the complex elements of the performance)

*        primary traits (i.e., a series of scores relating to each component of the performance)

*        narrative (i.e., anecdotal feedback relating to salient aspects of the performance)

·        communicate assessment criteria with participants, and/or have the participants take part in the development of the criteria.


4. Prepare for sampling and technical considerations

·        how much of a student’s work is necessary to represent the whole?

·        how can the procedure best be carried out efficiently?


5. Address issues of reliability and bias

3 Conceptual Scale Types for Use in Performance Assessment





Checklist

YES/did           NO/didn’t

_____             _____             task 1

_____             _____             task 2

_____             _____             task 3

_____             _____             etc...


Best for performances that are defined by did-or-didn’t, there-or-not-there characteristics.  These tasks need to be observably evident and cannot require interpretation.






Primary Trait Scale

                  Trait A           Trait B           Trait C

Level 3

Level 2

Level 1


Best for performances and products that have a complex series of traits.  If the definition of a “good . . .” cannot be reduced to one holistic scale, separate traits must be determined, and this scale type is necessary.





















This scale is best for assessing performances and products that require an interpretation of quality and can be reduced to progressive levels of caliber.  The scale should represent clear and concrete behaviors defining distinct levels.

Conceptual Design for Holistic Rubric Scale

        Level 4

                             Level 3

                                                Level 2

                                                                        Level 1





Rules for rubric construction:

1.      each level should be stated in positive, behavioral terms

2.      each progressive level should be inclusive of the last

3.      each level should be clear and distinct from the last

4.      each line should represent specific defining behavior(s)

5.      avoid negative behaviors unless absolutely necessary

6.      the number of levels should reflect the nature of the task

7.      label levels according to the needs of the student group

Which Type of Rubric is Best? Exploring Various Structural Options for Performance Assessment Scale Design

(Forum Journal of Teacher Education, V12, n2 2002)



            In recent years, the field of education has incorporated an increasing amount of performance assessment.  As a result, there has been a proliferation and legitimization of the use of assessment scales often called scoring rubrics.  Training in rubric design has become common, and even many parents are becoming familiar with the practice.  This growing enthusiasm is not surprising; the use of well-designed performance assessment procedures opens many new assessment possibilities to today’s teacher.

The benefits of rubrics to students can be significant.  Quality rubrics can provide students with clear targets (Stiggins, 1994; Huffman, 1998).  They can help students become more self-directed and reflective (Luft, 1998), and feel a greater sense of ownership for their learning (Branch, 1998).  Maybe most importantly, given that rubrics have the capacity to capture complex performances, they provide the opportunity to assess many more student outcomes than traditional objective methods – outcomes that are in many cases very relevant, authentic and which involve real world applications.

            Yet all performance assessment scales (or rubrics) are not created equal.  A quality scale must, first, incorporate the best design option for the given task and, second, be constructed “soundly.”  There are a few basic principles to consider when constructing or choosing a pre-made scale.  This article may be helpful to teachers who want to be able to develop just the right scale for the situation, and who want to be confident that their rubrics are having the educational benefits that they desire.


An Operational Definition of Soundness

Assessment in a very real and material way defines success for our students.  This is especially true for rubrics.  The soundness of our rubrics will affect how instructional, fair, and motivational they are.  For an assessment procedure to be considered sound, it must possess high degrees of validity and reliability, must be able to be carried out efficiently, and must have a generally positive academic and psychological effect on students.

            Validity deals generally with the degree to which any assessment method suits the job.  The question validity asks is, “Does this form of assessment capture the most relevant, essential, and inclusive set of outcomes for a particular learning performance?”  For this reason, rubrics are often, in theory, the most valid way to assess complex performances when a reliable qualitative measurement is required.  However, there are many forms of rubrics, and they all function differently and produce different results.

Issues of reliability seem to be the primary focus of the academic community’s examination of rubric usage (Crehan, 1998; Popham, 1997), yet reliability is just one aspect of holistic soundness (Myford, 1996; Shindler, 1999; Stiggins, 1994).  Reliability generally deals with how well any assessment method can obtain similar results over separate applications, from one performance to the next, and from one student to the next.  Greater reliability is usually generated by greater specificity of content: usually, the more concrete, precise, and observable the language, the more reliable the rubric will be.  However, the effort to gain specificity can lead to an overemphasis on quantification, which in the extreme can cause problems with validity.  Many times the most important and essential qualities of a performance do not lend themselves to quantification, and the conception of the quality whole can be lost in a list of amounts.


            Given the already unreasonable amount of work required of teachers, practices that are not efficient will not be sustained for long.  Teachers are simply too burdened by too many needs.  Constructing rubrics and then using them to assess individual performances does take time, but the time spent constructing a quality rubric usually pays for itself many times over in time saved clarifying and re-teaching the desired performance tasks.  Still, the teacher must ask the question, “When I am doing my performance assessment, what am I not doing?”  The cost must be worth the benefit.  For that reason, regardless of grade level, having students help develop and then use the rubric for peer assessment can make assessment very cost-effective in the long run.  It not only relieves the teacher of being the sole instrument of evaluation; having students assess themselves and one another throughout the development of a performance, product, or skill has benefits beyond mere efficiency.

            Any assessment procedure that claims to be educationally sound must not only be valid, reliable, and efficient, but must also have an overall positive influence on the student.  We have traditionally viewed assessment as “measuring what went on during the learning.”  In this view, assessment is a value-free abstraction.  The reality is that every assessment practice has an effect.  It either improves or detracts from students’ sense of motivation, control, worth, and belonging within the group.  Assessment, in a very real way, defines the epistemological reality of the classroom.  It tells us what knowledge is and what is important in our learning.  Assessment can empower or erode each student’s basic psychology of achievement.

            Rubrics as a practice can be viewed as neither homogeneously sound nor unsound.  In one case, an assessment procedure using a rubric could rate high in all four of these areas of soundness; in another, it could fail on all fronts.  These four areas of soundness will be examined within the context of a discussion of rubric design and construction considerations.


General Guidelines for Scale Development:

            To begin the process of developing a performance assessment scale, it is best to start with the desired outcome or learning objective.  These can come from state or district learning outcomes, or the teacher’s own curriculum.  Given what we want our students to learn, what task would best reflect that learning?  Quite often teachers are not ambitious enough at this point.  With the right tool, there is very little that we could not assess pretty soundly.  It might be useful to ask oneself the question, “What is the most authentic and meaningful way that I could see my students learn ____, or show they have learned ___ ?”  This learning outcome can take the form of a project, lab, product, report, presentation, piece of writing, or the process of getting to an outcome.  The performance task can be done individually or in groups, and the assessment can be applied to individuals or to groups.

            To a great extent, the soundness of any performance assessment will be predicated on how well it can be “operationalized.”  If a task, product, or process can be broken down into a well-defined set of parts, it can be assessed.  If the assessor cannot define the “good performance” before assigning the work or beginning to assess that work, the assessment will be unsound.  If the qualities of a good performance cannot be clearly outlined to students before they begin the work, the assessment will not be useful to their learning, it will not be perceived as fair, and it will not be reliable across multiple performances.  It will be no better than a subjectively determined mark, the kind we remember at the top of many of our papers over the years, the kind that was of little help to our learning, and that most of us experienced as something of a personal gift or punishment.

So let’s say the authentic task we chose to demonstrate the essential learning was some kind of project.  We would need to define very clearly each and every quality, component, and quantity that would need to be included in a fully successful performance.  Again, the more specific our language, the more our students can concentrate on the content and the less they must guess at what we want.  The less students have to guess, the more they are in control of their learning, and consequently the more motivated they will be, especially the typically low-performing ones.  Moreover, I have to ask myself, “If I cannot tell them specifically what I am looking for in the assignment, why did I assign it?”

            At this definition stage of the process it can be effective on many levels to bring the students in on the rubric construction.  It can be as easy as asking the class, “What needs to be included in a quality _____?” or “What should a good _____ have in it?”  Student involvement in the process engenders a sense of ownership, which leads to a greater investment of care and effort.  The process also reinforces in each student’s mind a definition of the concept of a “quality performance.”  Having this advanced organizer introduced early in the performance construction process can be a substantial learning and motivational tool.

            The next consideration is the choice of format or type of scale design.  This is the step where we too often make casual choices that keep us from achieving the soundest results.  A design poorly matched to the task can perform awkwardly at best and be counterproductive at worst.  If we ultimately need to attach a reliable, justifiable, informative grade to the performance we are assessing, then we must use some form of scale: a checklist, a holistic rubric, or a primary trait rubric.  It should be noted that not all performance tasks need to be formally graded, but those that are should be graded with a well-designed, sound scale.  As mentioned earlier, if we do not give a task a grade, we make an implicit statement that it represents less important learning than that which we do grade.



Three Scale Design Type Options:

Checklists, Holistic Rubrics and Primary Trait Rubrics



            The simplest and most easily interpreted scale design is the checklist.  Checklists are best for performances that are defined by “did or didn’t” or “there or not there” characteristics.  When the performance is defined by a series of procedural steps, or by a set of concrete components that need to be included, and/or when the list of possible behaviors is vast and cannot easily be reduced to a theme, then a checklist is usually the most appropriate scale choice.


Figure A: A Generic Checklist Structure

YES/did           NO/didn’t

_____              _____              task/component 1

_____              _____              task/component 2

_____              _____              task/component 3

_____              _____              etc...

_____              _____              total score


            Developing a checklist can be as straightforward as the figure above suggests, but checklists can also pose dilemmas.  For instance, if the tasks and/or components are all listed uniformly and given the same value, this may obscure their relative importance.  If certain tasks are more important to the overall quality of the performance, then it is necessary either to assign differential point values or to use another type of scale.
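To make the differential-weighting idea concrete, here is a minimal Python sketch.  The checklist items and their point values are invented for illustration; the point is only that each did/didn’t item can carry its own weight, so more important components count more toward the total.

```python
# Hypothetical checklist with differential point values: each item maps
# to the points awarded when that task/component is present in the work.
CHECKLIST = {
    "title page included": 1,
    "all sources cited": 3,        # weighted more heavily: central to quality
    "procedure steps in order": 2,
}

def score_checklist(observed: dict[str, bool]) -> tuple[int, int]:
    """Return (points earned, points possible) for a did/didn't checklist."""
    possible = sum(CHECKLIST.values())
    earned = sum(points for item, points in CHECKLIST.items()
                 if observed.get(item, False))
    return earned, possible

earned, possible = score_checklist({
    "title page included": True,
    "all sources cited": True,
    "procedure steps in order": False,
})
print(f"{earned}/{possible}")  # prints 4/6
```

If a uniform checklist is all the task requires, every value in the dictionary is simply 1 and the score reduces to a count.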

Checklists are very popular due to their ease of construction, but they are limited.  The fundamental limitation is that by nature they cannot contain items that infer grades of quality.  For example, suppose one of our desired outcomes were creativity.  Since creativity does not exist as an absolute and material reality, we could not judge that any performance was either entirely creative or entirely not creative.  In that case, we would need to break down creativity into concrete, observable sub-components, drop it from our checklist, or use another type of scale design.



A holistic rubric functions to capture a complex performance and then express it in ascending grades of attainment.  This type of scale is best for assessing performances and products that require an interpretation of quality and represent a “whole performance,” which is essentially greater than the sum of its parts.


The 3 Structural Designs for Holistic Rubric Scales

Holistic rubrics assume that all qualities in a product or performance can be reduced to a single score.  That score reflects the level of the quality of that performance or product on an ascending scale – from lowest to highest.  Given these assumptions, the following 3 structures can be used in the design of a holistic rubric.


Figure 2A: Option A: Proportion of Desired Qualities

Level 5          All 5

Level 4          4 of 5

Level 3          3 of 5

Level 2          2 of 5

Level 1          1 of 5


The assessment scale outlines a number of essential qualities or traits that need to be in a “fully successful” product/performance.  If all of the qualities are shown in the work, the performance receives the highest possible score.  If some are missing, the score reflects the number missing; the score reflects the qualities evident.

Advantages: It is the most cut-and-dried format.  It is easy to use and easy for students to understand.

Disadvantages: It cannot discriminate between the quality of the various traits.  In most cases, you might as well use a checklist.






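Option A reduces scoring to a simple count of the qualities evident in the work.  A minimal Python sketch, assuming a hypothetical list of five essential qualities for a piece of writing:

```python
# Option A: the holistic level is simply the number of essential
# qualities evident in the performance (out of 5, as in Figure 2A).
# The quality names below are invented placeholders.
ESSENTIAL_QUALITIES = ["thesis", "evidence", "organization", "voice", "mechanics"]

def holistic_level(evident: set[str]) -> int:
    """Score = how many of the essential qualities the work shows."""
    return sum(1 for q in ESSENTIAL_QUALITIES if q in evident)

print(holistic_level({"thesis", "evidence", "organization"}))  # prints 3
```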


Figure 2B: Option B: Good News / Bad News

level 4          + + + + +          All desired traits present, none of the undesired

level 3          + + + + - -        Mostly desired traits, few undesired

level 2          + + - - -          Some desired traits, many undesired

level 1          + - - - - -        Mostly undesired traits


This assessment scale outlines all likely outcomes and then arranges them into a scale in which the desired outcomes sit toward the top and the undesired ones toward the bottom.  If a product/performance has only positive traits, it receives the highest score.  If the performance possesses some of the prescribed negative traits, it gets a lower score.  The chart above depicts this structure: all good, mostly good, mostly bad, all bad.

Advantages: It includes the traits that characterize unwanted components of the performance.

Disadvantages: Why include the unwanted aspects of a performance?  It can reinforce negative behaviors that you are trying to un-teach.





Figure 2C: Option C: Each Level Inclusive of the Last

Level 5          Includes the qualities from levels 1-4, plus more

Level 4          Includes the qualities from levels 1, 2, and 3, plus more

Level 3          Includes the qualities from levels 1 and 2, plus more

Level 2          Includes the qualities in Level 1, plus more

Level 1          Evidence of minimal quality


The assessment outlines all of the traits present in a quality performance and then places them in order of ascending importance.  Only the positive qualities of a “good” product are included.

Advantages: Gives students a clear image of how to progress up the levels of quality, and creates a psychological mindset for success (moving up).

Disadvantages: Ordering the desired traits can be problematic; a “good” performance may violate a lower-level requirement.  It does not include what you don’t want included.





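Option C’s inclusive-levels logic amounts to finding the highest level whose requirements, and those of every level below it, are all satisfied.  A sketch under that assumption, with invented requirement descriptors:

```python
# Option C: levels are cumulative, so a performance earns the highest
# level for which it satisfies every requirement at and below that level.
# The requirement descriptors are hypothetical placeholders.
LEVEL_REQUIREMENTS = {
    1: {"minimal evidence of effort"},
    2: {"all sections attempted"},
    3: {"claims supported by sources"},
    4: {"analysis goes beyond the sources"},
}

def inclusive_level(satisfied: set[str]) -> int:
    level = 0
    for lvl in sorted(LEVEL_REQUIREMENTS):
        if LEVEL_REQUIREMENTS[lvl] <= satisfied:  # all requirements at this level met
            level = lvl
        else:
            break  # a gap at this level caps the score, since levels are inclusive
    return level

print(inclusive_level({"minimal evidence of effort",
                       "all sections attempted",
                       "claims supported by sources"}))  # prints 3
```

The `break` is what encodes the “ordering problem” noted in the disadvantages: a performance that skips a lower-level requirement is capped there, however strong its upper-level qualities.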

A primary trait scale is best for performances, processes, and products that have a complex series of traits and/or components.  If the definition of a “good performance” cannot be reduced to one holistic entity, then a scale that contains separate traits must be used.  The key question is, “Is it possible for a student to do very well on one aspect of the performance and very poorly on another?”  If so, a holistic rubric may be technically impossible to construct, and it will lack the specificity of feedback that a separate-trait scale can provide.

            When constructing a primary trait rubric, it is useful to think of each of the separate traits as its own holistic scale.  Therefore, any of the 3 types of thinking regarding holistic scales (e.g., proportion of components, wanted/unwanted content, or ascending quality) could be applied to a particular trait.  Yet, as always, soundness requires concrete, specific, and observable language, and a well-tailored design.

            A generic example of a primary trait scale is provided in Figure 3 below.  In this example, the hypothetical designer chose organization, content, and presentation as the categorical “traits” judged to be the fundamental and essential areas that would thoroughly define the qualities of a successful whole.


Figure 3: A Generic Structural Design for a Primary Trait Rubric

level 4
    Organization: concrete, specific qualities defining all the desired aspects of a well-organized performance.
    Content: concrete, specific qualities defining all the desired content necessary for an excellent performance.
    Presentation: concrete, specific qualities defining all the desired aspects of a fully successful presentation.

level 3
    Each trait: specific qualities that define a level below level 4 and greater than level 2.

level 2
    Each trait: level 1 plus additional specific components or qualities.

level 1
    Each trait: concrete specifics defining a minimum effort.

level 0





Given this design, there is flexibility in assigning relative importance to each trait.  Separate traits can be given differing weights and thus point values.  For example, in the scale above, organization may be worth 4, 3, 2, 1, or 0 points depending on the level achieved, whereas content may be given the corresponding values of 8, 6, 4, 2, and 0.  This weighting would tell students that content, in this case, counted twice as much as organization.
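The weighting described above can be expressed as a per-trait multiplier: the level achieved on a trait (0-4) times that trait’s weight.  A hedged Python sketch mirroring the example, where content carries twice the weight of organization (the presentation weight is an added assumption):

```python
# Primary trait scoring with differential weights. Organization at
# weight 1 yields 4,3,2,1,0 points across levels 4-0; content at
# weight 2 yields the corresponding 8,6,4,2,0.
WEIGHTS = {"organization": 1, "content": 2, "presentation": 1}

def weighted_score(levels: dict[str, int]) -> int:
    """levels maps each trait to the rubric level achieved (0-4)."""
    return sum(levels[trait] * weight for trait, weight in WEIGHTS.items())

# Organization at level 3 -> 3 pts; content at level 3 -> 6 pts;
# presentation at level 4 -> 4 pts.
print(weighted_score({"organization": 3, "content": 3, "presentation": 4}))  # prints 13
```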

            Each of the scale designs outlined above could be combined and/or modified to suit the occasion.  Yet, in most cases, our assessment needs lead us to some form of one of them.  It bears repeating that our task (the manner in which we feel students could best demonstrate their learning) should drive our choice.  This is why beginning with a clear idea of our objective and then sufficiently operationalizing the performance task are such critical steps in the process.  To those new to rubric construction, this process may seem too complicated to attempt, but it gets easier each time you try it.  Many teachers find that working with a partner can promote both confidence and soundness.


Use of Rubrics:

While the popularity of rubrics is primarily due to their ability to provide a fair and reliable means of measuring complex end-product outcomes, they have additional benefits that bear mentioning.  First, as stated earlier, they are as much teacher as test.  A soundly designed rubric is not only an accurate tool for assessing the outcome; it can guide and motivate the learner along the way.

Second, rubrics can be designed to assess process.  It could be said that only a rubric can do so.  Any system that attempts to judge the quality of a learner’s effort, progress, incremental growth, affect, behavior, peer interactions, or developmental stage requires some form of rubric.  While there has always been a well-justified apprehension about assessing process and behavior, these are powerful areas, and assessing them soundly can have powerful benefits.  When asked to list the outcomes we want our students to obtain from their education, the ones most critical for a good life, most of us mention quite a few in the domain of processes and behaviors.  Operationalizing processes can certainly be difficult, but when we are able to, we can achieve benefits that no other educational process can.  What one finds when one assesses process and behavior (or anything, for that matter) is that if you assess it, you get more of it, and of better quality.  For example, teachers who have a sound system for assessing the quality of cooperation find they have more cooperative students.

The other benefit of assessing process is psychological, and its fruits can often only be seen over time.  When we assess outcomes that are 100% within the control of students (i.e., choices, behavior, application to the process), it develops in their minds a cause-and-effect relationship between what they put into a task and how they are rewarded.  This is not generally true of traditional methods of assessment.  When students begin to trust that relationship between effort and outcome, the enhanced sense of internal locus of control is very motivating, and they develop the habit of being responsible for their own learning.



            Assessment is such a critical factor in the instructional design process.  What we assess and attach a grade to in a very real and material way defines success in our classes.  Well-constructed performance assessment rubrics can provide the capacity to assess more meaningful and authentic outcomes.  If we develop sound rubrics and procedures for our assessment, we can profoundly affect student achievement.





References


Arter, J. (1993) Designing Scoring Rubrics for Performance Assessment: Getting to the Heart of the Matter.  Paper presented at the Annual Meeting of the American Educational Research Association, Atlanta, GA, April 12-16.


Branch, M., Grafelman, B., Hurelbrink, K. (1998) Increasing Student Ownership and Responsibility through Collaborative Assessment Process.  Unpublished Report (ERIC Reproduction service number ED424284).


Crehan, K., Hudson, R. (1998) A Comparison of Two Scoring Strategies for Performance Assessment.  Paper presented at the National Council on Measurement in Education, San Diego. April 14-16.


Goodrich, H. (1997) Understanding Rubrics. Educational Leadership, 54 (4) pp. 14-17


Huffman, E. (1998) Authentic Rubrics.  Art Education, 51 (1) pp. 64-68.


Jensen, K. (1995)  Effective Rubric Design: Making the Most of this Powerful Tool. Science Teacher, 62 (5) pp. 72-75.


Myford, C. (1996) Constructing Scoring Rubrics: Using “Facets” to Study Design Features of Descriptive Rating Scales. Paper Presented at the Annual Meeting of the American Educational Research Association, New York, April 8-12.


Popham, W. (1997) What’s Wrong and What’s Right – With Rubrics?  Educational Leadership, 55 (3) pp. 72-75.


Taggart, G., Phifer, S., Nixon, J., & Wood, M. (Eds.) (1998) Rubrics: A Handbook for Construction and Use.  Technomic Publishing, Lancaster, PA.



Wiggins, G. (1999) Educative Assessment: Designing Assessments to Inform and Improve Student Performance.  Jossey-Bass, San Francisco.