Can we measure e-Learning 2.0 just like we measure e-Learning 1.0?
Because most of us have been doing a horrendous job of measuring our learning interventions, some of you may consider the question above a complete joke. Ignoring our common inadequacy, and focusing instead on current best practices, the question is a valid one. For example, can we rely on a Kirkpatrick-like analysis? Will it be sufficient for us to capture learner reactions (Kirkpatrick Level 1), assess learning (Level 2), measure on-the-job performance (Level 3), and/or determine results (Level 4)? Let’s examine these one at a time.
Asking learners for their reactions
Asking learners for their reactions is always a double-edged sword. While it is possible to gather good information, the information can often be biased and untrustworthy. For example, there is evidence that learner reactions don’t correlate with learning results or on-the-job performance (Alliger et al., 1997). In other words, learners can give the learning high ratings and not learn anything.
Learners have a reputation for being overly optimistic about their ability to remember information. Further compounding this type of bias in e-Learning 2.0 interventions are cases where bad information may propagate. If learners can’t easily validate the information on their own — and we’ve discussed above how difficult it is to verify some information — how will we be able to get accurate feedback from them?
Many Web 2.0 websites enable visitors to rate products, services, other people’s comments, and so on. E-Learning 2.0 technologies can offer similar opportunities to rate information. Such rating systems have the advantages and disadvantages inherent in getting other people’s opinions. Sometimes there is wisdom in crowds; sometimes there is mediocrity, a herd mentality, or some corruption of the process.
In traditional learning interventions, learner reactions can rise or drop for reasons that have little to do with learning outcomes. A motivating speaker can get high marks, even with bad content or poor learning support. Well-appointed conference rooms can engender higher smile-sheet ratings than old training rooms. E-Learning courses with irrelevant gaming or video can produce better learner reactions than e-Learning courses without such high production values. E-Learning 2.0 interventions will suffer from the same biases, but may also be open to new-technology bias (i.e., learners rate it higher just because it’s a snazzy new technology), at least initially.
Learners may sometimes automatically discount traditional learning interventions because they “come down from corporate.” It is also true, in comparison, that learners may inherently trust learning interventions because they assume someone vetted the interventions. E-Learning 2.0 interventions, on the other hand, may encourage learners to be better consumers of information. That is, they may perhaps encourage learners to be simultaneously more open to learning, and more skeptical about the veracity of the information. Such openness and skepticism will depend on many factors, including the culture of the organization, previous learning interventions, and experience with the new e-Learning 2.0 technologies.
The point here — from the standpoint of evaluation — is that learner reactions will depend on many factors. Moreover, we may want to ask the learners how much they trust the information, how open they are to the new ideas in the learning, and so on.
E-Learning 2.0 may be able to produce a different kind of learning milieu than traditional LMS-course-dominated paradigms. If we can realize the promise, e-Learning 2.0 may (a) enable our employees to take a more active role in their own learning, (b) change the dynamics of the learning enterprise to one that is more peer-to-peer and less top-down, and (c) be more aligned with young learners’ tendencies to be self-directed compilers of multi-sourced information.
These potentialities call out for the need to evaluate learners’ reactions. For e-Learning 2.0 to produce these learner-driven effects, the interventions will have to motivate and engage learners. We ought to aim our antennae toward their perceptions and usage patterns.
The bottom line on gathering learner reactions is that we need to do it, but we can’t rely on it as the sole metric for evaluating e-Learning 2.0.
Measuring learning
In the last few years, after reviewing learning research for almost 20 years, I realized that there are two fundamentally different learning approaches: (a) building understanding, and (b) supporting long-term retrieval. These overlap, so it may not be immediately obvious why we should make a distinction — but one of the most important reasons for the separation is in learning measurement. Sometimes we can teach people something so that they understand it, but they can’t remember it later. For most learning situations, our learning designs should support both understanding and retrieval. Our learning evaluations should enable us to assess them both as well.
You will recall Table 1, in which I surmised that e-Learning 2.0 is not particularly good at helping support long-term retrieval (i.e., “remembering”) — at least in its current instantiations. This includes blogs, wikis, community discussion boards, etc. Thus, in evaluating e-Learning 2.0, we might expect to find that people often forget what they’ve learned after a time. It is certainly worth a test, especially as the evangelists are likely, in the early days of e-Learning 2.0, to see it as a global fix — one all-purpose cleaner, turning whites whiter, colors brighter, and grays more gray.
The whole emphasis of e-Learning 2.0 makes measuring learning difficult. In traditional learning interventions, we know in advance what we want learners to remember. We specify instructional objectives, and base our evaluations on them. In fact, I’ve often recommended to my clients that we create separate evaluation objectives, and measure those directly. But how, in an e-Learning 2.0 implementation, do we know what to measure?
Suppose Jane develops some content for a wiki, but someone else overwrites it in two days. Do we test others on Jane’s information? And what about those who never saw Jane’s contribution? Do we need to track who saw what? Or, do we only measure people on information that has met the test of time (i.e., on a wiki), or that receives high ratings, or that others link to (that is, on a blog)? What if a learner reads three different discussion board posts, all recommending different solutions? Or, reads four blog posts, three of which present bad information and one that provides good information? Does measuring learning require an authority to determine what is most valuable to measure? If so, is this extra resource worth it? Who gets to act as the authority?
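The "who saw what" question could, in principle, be answered with a simple revision log. Here is a minimal Python sketch — all names are hypothetical, not any real wiki platform's API — that preserves the exact text each learner read, even after the page is overwritten:

```python
# Hypothetical sketch: track which wiki revision each learner viewed, so a
# later assessment can target the content they actually read rather than
# whatever the page says today. Names here are illustrative only.

views = {}      # learner -> id of the revision they last saw
revisions = {}  # revision id -> page content at that revision

def save_revision(rev_id, content):
    revisions[rev_id] = content

def record_view(learner, rev_id):
    views[learner] = rev_id

def content_seen_by(learner):
    """Return the text the learner actually read, or None if they never viewed the page."""
    return revisions.get(views.get(learner))

save_revision("r1", "Jane's original advice")
record_view("bob", "r1")
save_revision("r2", "Someone else's overwrite")  # Jane's text replaced two days later
print(content_seen_by("bob"))  # Bob is still assessed on Jane's version
```

Real wikis already store revision histories; the extra cost is logging views per learner, which is exactly the "extra resource" trade-off raised above.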
Can we design e-Learning 2.0 systems to automatically trigger learning measurement? For example, the title of a blog post might be, “How to turn off automatic updates in Microsoft Vista.” A day after someone accesses the information, Microsoft could check to see if automatic updates were turned off. Scary — or powerful, I’m brainstorming here. Or, perhaps we could write the blog-post title to direct a question to the learner. “Hey, a week ago you accessed information on how to turn off automatic updates. Here are three choices. Which one is correct?” Or, perhaps e-Learning 2.0 authoring tools could provide a “learning-evaluation layer” that enables such future evaluations. These could be useful either for individual learners who want to remember or challenge themselves, or for program developers who want to see how well one of their e-Learning 2.0 interventions is supporting learning.
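To make the "learning-evaluation layer" idea concrete, here is a minimal Python sketch — with entirely hypothetical names, since no such authoring-tool feature exists — of a component that records when a learner accesses a post and surfaces a delayed follow-up question once a retention interval has passed:

```python
import datetime

# Hypothetical "learning-evaluation layer": record content accesses, then
# surface a delayed follow-up question after a configurable retention delay.
# EvaluationLayer, register_question, etc. are illustrative names only.

class EvaluationLayer:
    def __init__(self, delay_days=7):
        self.delay = datetime.timedelta(days=delay_days)
        self.questions = {}   # post_id -> follow-up question text
        self.accesses = []    # (learner, post_id, accessed_at) tuples

    def register_question(self, post_id, question):
        self.questions[post_id] = question

    def record_access(self, learner, post_id, accessed_at):
        self.accesses.append((learner, post_id, accessed_at))

    def due_evaluations(self, now):
        """Return (learner, question) pairs whose retention delay has elapsed."""
        due = []
        for learner, post_id, accessed_at in self.accesses:
            if post_id in self.questions and now - accessed_at >= self.delay:
                due.append((learner, self.questions[post_id]))
        return due

layer = EvaluationLayer(delay_days=7)
layer.register_question(
    "vista-auto-updates",
    "A week ago you read how to turn off automatic updates. Which of these steps is correct?")
layer.record_access("jane", "vista-auto-updates", datetime.datetime(2008, 3, 1))
print(layer.due_evaluations(datetime.datetime(2008, 3, 10)))
```

The same component could serve both audiences mentioned above: individual learners who want to challenge themselves, and program developers aggregating the answers to see how well an intervention supports retention.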
Is it just too complicated?
Is it just too complicated to measure learning for e-Learning 2.0 technologies? The paragraphs above certainly make the task seem daunting. There is good news, however. We just need to go back to the basics of measurement, and ask ourselves what we were hoping to accomplish in the first place.
In measuring learning, we can ask, “What do we want our learners to understand and remember? What decisions do we want them to be able to make?” If we don’t have answers to these questions in advance, we can create a list post-hoc (that is, after the fact). Or, we can make a decision to forgo learning measurement and measure on-the-job performance. It could be that our hope for our e-Learning 2.0 technology has nothing to do with remembering, in and of itself, and everything to do with immediate utilization of the learning message for on-the-job performance.
In addition, we might not be interested in either remembering or on-the-job performance in the short term. We might instead be interested in using e-Learning 2.0 technologies to spur future on-the-job learning. Kirkpatrick’s 4-level model does not cover this outcome, so I’ll add it after covering all four levels of Kirkpatrick.
To reiterate, in thinking about measuring our e-Learning 2.0 interventions, first we have to decide what our intervention will support:
- Long-term retrieval
- Future on-the-job learning
- On-the-job performance
- Organizational results
Then we ought to devise a measurement regime to measure the outcomes we hope for — and the factors on the causal pathway to those outcomes.
In measuring understanding, we can ask learners questions immediately at the end of learning. Memorization questions are not a recommended choice in most instances, as they are a poor proxy for real world remembering (Shrock and Coscarelli, 2008). It is better to use scenario-based decision-making or simulations, if not real-world tests of performance. To measure long-term retrieval, we have to delay our tests of understanding for a period — usually a week or more is the preferred delay (Thalheimer, 2007). I will cover measuring future on-the-job learning later.
Measuring on-the-job performance
If I am correct that the current state of e-Learning 2.0 technology (when not blended to support e-Learning 1.0) is best for small chunks of information that learners can use immediately on-the-job, then measuring on-the-job performance (i.e., related to the specific learning chunk) seems like the ideal evaluation tool with stand-alone e-Learning 2.0.
Where blended e-Learning 2.0 interventions support an e-Learning 1.0 design, measuring on-the-job performance is also beneficial. However, measurement of learning and retrieval should augment on-the-job performance measurement, to capture the full causal pathway from learning to performance. People can’t use what they don’t understand or can’t remember, so it’s important to measure understanding and remembering in addition to on-the-job performance. That way, you can trace any performance problems back to understanding or remembering, to see whether the learner achieved these prerequisites. If not, the program can be redesigned to better support them. If learners do understand and remember, then you can look for the cause of any failure to implement in the workplace environment.
Figure 2 outlines the causal flow of how learning produces workplace performance and results. Those in the education field not focused on producing future on-the-job performance can substitute any future performance situation for the “on-the-job” situation depicted. The diagram illustrates how some learning-landscape outcomes are prerequisite to other outcomes.
- Learning events must generate understanding.
- Understanding must translate into remembering, or future on-the-job learning.
- Learning-driven application requires either remembering or on-the-job learning.
- Results, whether they be organizational results (like increased sales) or learner results (like more efficacy or a higher salary), arise largely from on-the-job application.
Note that Figure 2 is a little oversimplified. For example, (a) learners may get some of their results without applying what they know (such as by increasing their confidence), (b) applying learning produces more learning as people get real-world feedback, and (c) learning events need not be the “formal learning” events the diagram may suggest. Just remember, as I said before, that the diagram is meant to highlight the fact that some outcomes are prerequisites to other outcomes. This is a critical concept in evaluation design.
In measuring traditional Learning 1.0, we measure the targeted work behavior. If we train our learners to do performance reviews, we see how good they are at doing on-the-job performance reviews (Wick, Pollock, Jefferson, & Flanagan, 2006). Where we target e-Learning 2.0 interventions toward specific behaviors, we can measure using traditional methods. Where the intervention did not target performance behaviors in advance, we might identify post-hoc performances, though such post-hoc analysis is open to bias.
For example, suppose we create an e-Learning 2.0 system for use by high-potential assistant managers. We find that 95% of them use the system, and join a discussion to brainstorm ways to improve the way they delegate tasks to the people they work with. A robust 75% of their direct reports say that they prefer the new delegation methods and feel more productive. In 80% of the assistant-manager work shifts, performance improved. This seems like a great result, but what if a comparison group used a different technique and got even better results — and spent 50% less time “learning” the technique? The point is that on-the-job performance results require some comparison.
Comparing results to randomly assigned control groups is ideal, but often logistically difficult. Comparing results post-intervention to pre-intervention (or over time as the intervention is used) provides another comparison method. Combining these strategies, by comparing a group of people who use our e-Learning 2.0 interventions now, to those who use them 6 months from now, is also an alternative. I highly recommend you call in a measurement professional to help you.
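Whichever comparison strategy you choose, the analysis ultimately comes down to contrasting outcome numbers between groups. As a minimal illustration — with made-up scores, not real data — here is a Python sketch that compares an intervention group to a comparison group via the mean difference and Cohen's d, a standard effect-size measure; a measurement professional would add a proper significance test on top of this:

```python
from statistics import mean, stdev
from math import sqrt

# Illustrative (invented) performance scores: one group used the e-Learning 2.0
# intervention, the other did not. We report the raw mean difference and
# Cohen's d (mean difference divided by the pooled standard deviation).

intervention = [72, 78, 81, 69, 75, 80, 77, 74]
comparison   = [70, 71, 68, 73, 69, 72, 66, 70]

def cohens_d(a, b):
    # Pooled standard deviation across the two groups
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

print(round(mean(intervention) - mean(comparison), 2))  # raw mean difference: 5.88
print(round(cohens_d(intervention, comparison), 2))     # standardized effect size
```

The same arithmetic works for pre/post comparisons or staggered-rollout designs; only the group definitions change.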
Measuring results (including ROI)
One could write a whole book on measuring organizational results, but in the interest of time, I’m going to keep this short and sweet. If metrics are available, we can look at things like sales results, customer service ratings, secret shopper ratings, turnover, accidents, lawsuits, employee survey results, multi-rater feedback, manufacturing defect rates, and ROI.
Because many factors — not just our learning and development efforts — influence these results, determining the effect of our e-Learning 2.0 interventions requires the use of randomly assigned learners and statistical techniques. These procedures isolate the impact of our intervention from the impact of other organizational influences (that is, such things as the economy, other initiatives, morale, and so on). Again, unless you know what you’re doing, you ought to call in a measurement expert.
Finally, it is beneficial, if not a full-fledged moral imperative, to look at the results for our learners as well. Does the learning benefit them? Are they more competent? Do they feel more empowered? Do their reputations improve? Do they receive promotions more quickly? Are they able to thrive in the job market better? We might focus on lots of angles. Again, because learner engagement is likely to be a prime driver of e-Learning 2.0 outcomes, examining learner results may be especially important.
Interestingly, when we give “learner results” a place in our models (and I’ve included learner outcomes in my learning landscape models for several years now), we come back to learner reaction data, at least in part. Might our learners have a pretty good sense of whether the learning interventions are benefiting them? Wow. What would Donald Kirkpatrick say if he knew I was mashing Level 1 and Level 4 together just a bit?
Actually, Don is a very thoughtful fellow. He’d probably just say, “Hey, a result is a result. Whether it’s learner outcomes or ROI, it fits in Level 4. There’s no reason we can’t use reaction sheets for both Level 1 and Level 4 data gathering.” At least that’s what I’m guessing he’d say.