A conundrum in the learning and development field: the amount of talk about evaluation of training vastly outweighs actual efforts to conduct evaluation. This is due to a number of factors, among them difficulty in establishing useful measures, challenges in collecting data, and the reality that the trainers, designers, and training department, moving on to the next thing, may just not get it done. This month, let’s look at several approaches to carrying out evaluation of training, including evaluation of e-Learning.
In 1959, Donald Kirkpatrick published a taxonomy (not a model, not a theory) of criteria for evaluating instruction that is widely regarded as the standard for evaluating training. Most often referred to as “Levels,” the Kirkpatrick taxonomy classifies types of evaluation as:
Type (Level) 1: Learner satisfaction;
Type (Level) 2: Learner demonstration of understanding;
Type (Level) 3: Learner demonstration of skills or behaviors on the job; and
Type (Level) 4: Impact of those new behaviors or skills on the job.
Jack Phillips later added a fifth level, Return on Investment (ROI) of training, purporting to offer calculations for demonstrating cost effectiveness of the training intervention.
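The ROI calculation usually associated with Phillips is, in its commonly cited form, net program benefits divided by program costs, expressed as a percentage; the related benefit-cost ratio is simply benefits divided by costs. The short sketch below illustrates that arithmetic only. The function names and the dollar figures are hypothetical, chosen for illustration, and the hard part in practice (putting a credible monetary value on the benefits) is assumed to have been done already.

```python
def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """Monetary benefits attributed to the training, per unit of cost."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return benefits / costs


def roi_percent(benefits: float, costs: float) -> float:
    """Net benefits (benefits minus costs) as a percentage of costs."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs * 100


# Hypothetical figures, not taken from any real program.
program_benefits = 150_000.0  # assumed monetary value of improved results
program_costs = 100_000.0     # assumed fully loaded cost of the training

print(benefit_cost_ratio(program_benefits, program_costs))  # 1.5
print(roi_percent(program_benefits, program_costs))         # 50.0
```

The arithmetic is trivial; the figures fed into it are where the difficulty, and most of the argument, lies.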
Those who’ve attempted to employ the taxonomy have no doubt noticed some challenges in using it. For one, it invites evaluation after the fact, focusing on terminal outcomes while gathering little data that will inform training program improvement efforts. (Discovering after training that customer service complaints have not decreased only tells us that the customer service training program didn’t “work”; it tells us little about how to improve it.)
The linearity and causality implied within the taxonomy (for instance, the assumption that passing a test at Level 2 will result in improved job performance at Level 3) mask the reality of transferring training into measurable results. Many factors enable or hinder the transfer of training to on-the-job behavior change, including support from supervisors, rewards for improved performance, culture of the work unit, issues with procedures and paperwork, and political concerns. Learners work within a system, and the Kirkpatrick taxonomy essentially attempts to isolate training efforts from the systems, context, and culture in which the worker operates. Brinkerhoff, discussed below, describes this as evaluating the wedding rather than the marriage.
It’s easy to understand why the Kirkpatrick taxonomy is appealing, and at face value appears straightforward to employ. The reality? Beyond checking for learner understanding, measurement becomes much harder. The truth? Despite all the talk about training evaluation and ROI, hardly anyone does anything beyond Level 2, and almost no one does anything at Levels 4 or 5.
To be fair, Kirkpatrick himself has pointed out some of the problems with the taxonomy, and suggested that in seeking to apply it, the training field has perhaps put the cart before the horse. He advises working backwards through his four levels more as a design, rather than an evaluation, strategy. That is: What business results are you after? What on-the-job behavior/performance change will this require? How can we be confident that learners, sent back to the work site, are equipped to perform as desired? And finally, how can we deliver the instruction in a way that is appealing and engaging?
Robert Brinkerhoff takes a systems view of evaluation of training, believing it should focus on sustained performance rather than attempting to isolate the training effort: “Achieving performance results from training is a whole-organization challenge. It cannot be accomplished by the training function alone. … Virtually all evaluation models are construed conceptually as if training were the object of the evaluation.” (Brinkerhoff, http://aetcnec.ucsf.edu/evaluation/Brinkerhoff.impactassess1.pdf, p. 87.)
Brinkerhoff’s “Success Case Method” (SCM) helps to identify both positive results and the organizational factors that supported or hindered the training effort. Unlike the activities around Kirkpatrick’s levels, the Brinkerhoff SCM helps to tell us how to “fix” training that may not be as effective as hoped. At the risk of oversimplifying his approach, he suggests we can learn best from the outliers, those who have been most and least successful in applying new learning to work. The method asks evaluators to:
Identify individuals or teams that have been most successful in using some new capability or method provided through the training;
Document the nature of the success; and
Compare to instances of nonsuccess.
It’s important to note that this is not just a compilation of success stories or anecdotes, but data that you can confirm via documented results and feedback from other stakeholders. The Brinkerhoff approach can be especially useful in teasing out data related to “soft skills” programs like communication skills.
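As a rough illustration of the first step only, the sketch below picks the most and least successful participants from a set of scores. It assumes you already have one numeric score per participant (for example, from a short follow-up survey on how much each person has applied the training); the names, scores, and the function itself are hypothetical. The real work of the method, documenting and confirming each case, happens afterward and is not something code can do.

```python
def pick_outliers(scores: dict[str, float], n: int = 2) -> tuple[list[str], list[str]]:
    """Return (most_successful, least_successful) participants by score.

    n is capped at half the group so the two lists never overlap.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    n = min(n, len(ranked) // 2)
    most = ranked[:n]
    least = ranked[-n:] if n else []
    return most, least


# Hypothetical survey scores (0-10) for applying the training on the job.
survey_scores = {
    "Ana": 9.1, "Ben": 4.0, "Chao": 7.5,
    "Dee": 2.2, "Eli": 8.8, "Fay": 5.0,
}

most, least = pick_outliers(survey_scores, n=2)
print(most)   # ['Ana', 'Eli']
print(least)  # ['Ben', 'Dee']
```

Both groups would then be interviewed: the first to document and verify what worked, the second to surface what got in the way.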
Like Brinkerhoff, Stufflebeam focuses less on proving past performance and more on improving future efforts. He offers a framework for evaluation that views training as part of a system. Evaluators can employ the Stufflebeam model even with programs in progress, thus serving as a means of formative as well as summative evaluation. The framework poses questions in seven areas:
Context: Why create the program? Who was it intended to serve? What is the history? Who are the champions? Who are the detractors? What are the political issues?
Inputs: Are funds, staffing, and other resources adequate to support the desired outcomes?
Process: What organizational or cultural factors are supporting or hindering delivery of the training? What feedback or support are trainers and designers given? Is the program implemented and offered as you intended and designed?
Product: Are you achieving the outcomes? Are those outcomes congruent with the stated needs? Are enrollments and completions in line with targets? What happens once the learner returns to the worksite? Did the product serve the intended people? What do stakeholders, including the learners, say about the outcomes? Compare outcomes data back to needs assessment data.
Effectiveness: What is the outcome? Do learners receive promotions because of improved performance following training? Are error rates or accidents down? Are sales up? Who are the beneficiaries?
Sustainability: What is the program’s place in the organization’s mission — how deeply is it embedded in operations? Which program successes can and should we sustain?
Transportability: Can we replicate the program? Can other work units or organizations replicate it? Can it serve as a model for others or for future programs?
Recap of evaluation
Most instructional designers and training practitioners agree: We spend far more time on other phases of design and delivery than we do on evaluation. We often treat evaluation as an afterthought, focus on measures that offer little real information, or, when the effort looks difficult, just don’t do evaluation at all. In looking at evaluation strategies, choose those that will get you what you need. Are you trying to prove results, or drive improvement? And above all, remember: some evaluation is better than none. Choose what you’re really likely to use.
Want More
Kirkpatrick (overview) http://www.e-learningguru.com/articles/art2_8.htm
Kirkpatrick (overview-pdf file) http://www.stfrancis.edu/content/assessment/Kirkpatrick_1.pdf
Brinkerhoff model/diagrams http://www.kenblanchard.com/img/pub/newsletter_brinkerhoff.pdf
Stufflebeam: Printable checklist: http://www.wmich.edu/evalctr/archive_checklists/cippchecklist_mar07.pdf
Application of the Stufflebeam model at http://www.wmich.edu/evalctr/archive_checklists/cippchecklist.htm
Stufflebeam, D. (2007) Evaluation theory, models, & application. San Francisco: Jossey-Bass. Available from Amazon (and elsewhere) http://www.amazon.com/Evaluation-Applications-Research-Methods-Sciences/dp/0787977659/ref=sr_1_1?s=gateway&ie=UTF8&qid=1285512029&sr=8-1
The intent of my monthly “Nuts and Bolts” column is to help out those who may have found their way to training, e-Learning, and instructional design through less-than-formal means. The idea is to provide, in about a thousand words, an introduction (not necessarily an endorsement) to something that might be new for that reader: an overview of a learning theory, a suggestion about an approach to instructional design, a do-this-not-that example. Those wanting more in-depth information are encouraged to check the links to supplemental reading I offer in the “Want More” section at the end of each column. Also — I’m always up for suggestions about future columns. Think “Nuts and Bolts.”