Evaluating E-Learning 2.0: Getting Our Heads Around the Complexity

by Will Thalheimer

August 18, 2008


E-Learning 2.0 technology offers great promise, but only those who are getting the quickest, most robust feedback will be able to maximize that promise. It takes good evaluation design to produce that sort of feedback.

E-Learning 2.0 interventions are coming to a workplace near you. Maybe they surround you already. Whether you’re an expert or a virgin, it’s time to think about evaluation.

Whether you think that e-Learning 2.0 is a force for good or evil, sometime in the near future you will likely have a responsibility to determine the effectiveness of e-Learning 2.0 interventions. (See Sidebar 1 for a definition of “e-Learning 2.0.”)

Sidebar 1 Some handy definitions

E-Learning 2.0: The idea of learning through digital connections and peer collaboration, enhanced by technologies driving Web 2.0. Users/Learners are empowered to search, create, and collaborate, in order to fulfill intrinsic needs to learn new information.
Vetting: An investigative process of examination, fact-checking, and evaluation.

Web 2.0: The stage of the World Wide Web where the Internet has become a platform for users to create, upload, and share content with others, versus simply downloading content.

By evaluating our results, we can refine and improve what we’re doing, or discard what’s not working. We can also give a coherent answer when someone in management (or our clients) asks us to prove that this e-Learning 2.0 stuff works. E-Learning 2.0 technology offers great promise, but only those who are getting the quickest, most robust feedback will be able to maximize that promise. It takes good evaluation design to produce that sort of feedback.

In last year’s Guild report, Measuring Success (Wexler, Schlenker, and others, 2007), I outlined 18 reasons (see Sidebar 2) that we might measure learning. These included giving learners grades or credentials, helping learners learn, comparing one learning intervention to another, and so on. (Please see the References at the end of this article for all citations.)

For the purpose of this article, I’m only going to focus on enabling you to:

  1. Determine what level of benefit (or harm) your e-Learning 2.0 interventions produce.
  2. Use evaluation results to improve your e-Learning 2.0 interventions.
  3. Compare the effectiveness of your e-Learning 2.0 intervention to some other learning intervention.

Sidebar 2 Why Do We Measure Learning? Eighteen Reasons

(From The eLearning Guild Measuring Success Report, 2007, pp. 118-119.)
To support the learners in learning and performance
1. To encourage learners to study.
2. To give learners feedback on their learning progress.
3. To help learners better understand the concepts being taught, by giving them tests of understanding and follow-up feedback.
4. To provide learners with additional retrieval practice (to support long-term retrieval).
5. To give successful assessment-takers a sense of accomplishment, a sense of being special, and/or a feeling of being in a privileged group.
6. To increase the likelihood that the learning is implemented.
To support certification, credentialing, or compliance
7. To assign learners grades, or give them a passing score.
8. To enable learners to earn credentials.
9. To document legal or regulatory compliance.
To provide learning professionals (i.e., instructors/developers) with information
10. To provide instructors with feedback on learning.
11. To provide instructional designers/developers with feedback.
12. To diagnose future learning needs.
To provide additional information
13. To provide learners’ managers with feedback and information.
14. To provide other organizational stakeholders with information.
15. To examine the organizational impacts of learning.
16. To compare one learning intervention to an alternative one.
17. To calculate return-on-investment of the learning program.
18. To collect data to sell or market the learning program.


This article will NOT cover how to decide whether to implement e-Learning 2.0 strategies in your organization. Rather, this article should help you think through the many issues and complexities involved in evaluating e-Learning 2.0 interventions. At the end, I outline a short list of the most critical things we should be doing as we evaluate e-Learning 2.0. As you will see, getting started with a few simple imperatives may be the best strategy.

Beware of seduction

In thinking about how to evaluate e-Learning 2.0, the first thing to remember is that EVERY new learning technology brings with it hordes of booming evangelists rhapsodizing utopian visions. These visions may or may not be true or realistic, yet they may seduce us beyond all rationality or evidence. Programmed instruction, 16mm movies, and filmstrips were the first such seductive technologies. (Many Learning Solutions readers are not old enough to remember them, of course.) Later, radio, television, and computer-based training were magical technologies that many believed would completely transform the learning landscape.

E-Learning 2.0 is no different. Some tout it as the key to unlocking the unlimited promise of informal learning. Some supporters present it as a way to democratize organizations. Others promote it as a way to empower employees to help each other rise to their fullest potential. Because these visions can be so enticing, we have to make an extra effort to be objective. We have to shield ourselves from temptation by investing in evaluation — and in doing evaluation right.

Grassroots content development

E-Learning 2.0 differs from most traditional learning methodologies in allowing, even encouraging, everybody to contribute to creating learning messages. The term “learning messages” refers to the learning points that a learning event conveys. I prefer this term to “content” or “learning materials.” The reality is that learning only occurs when the learning materials convey learning messages, and learners attend to and receive the learning messages. Too many of us design instruction as if the creation of learning materials guarantees that learning will take place.

Traditionally, a central authority created learning messages. Experts vetted the messages before learners saw them. In the workplace, the training department typically created learning messages, and management, legal, and subject-matter experts vetted them. Only then were they ready for presentation to employees. In education, the writers compiled learning messages from textbooks and journal articles, and from individual experts, including professors, teachers, and curriculum specialists. Whether training or education, everyone assumed that someone had vetted the learning messages to validate them for learners.

E-Learning 2.0 offers a different model, enabling “grassroots” creation of learning messages. Experts and people closest to the issue may create such messages. However, an authoritative editorial function does not necessarily vet the messages. Individuals at the grassroots level can create information in e-Learning 2.0, and vet it prior to release, or others at the grassroots level may vet the information after the fact. Finally, institutional agents monitoring the material may check the information, instead of, or in addition to, grassroots verification. Recent data from Guild Research (August 2008) illustrates the various ways companies deal with user-generated content (see Figure 1):


Figure 1 Policies for dealing with user-generated content vary widely across organizations of all sizes.

There are two sets of employees involved in e-Learning 2.0. There are those who learn from the content (“learners”) and those who create the content (“creators”). Because of this, we need to evaluate e-Learning 2.0’s effects on both groups of people. Of course, one person can play both roles, depending on the issue that’s in play.

Because e-Learning 2.0 produces learning messages that arise from non-vetted sources, one aspect of evaluation that may appear to differ from traditional evaluation involves assessing the truth or completeness of the learning messages. Of course, far too many of us assume that traditional training and education courses provide good information. For example, many of us in the United States learned of our first President’s legendary honesty in a story that told of him chopping down a cherry tree, and then telling the truth about it. The story is almost certainly a fabrication, because cherry trees did not grow in the area near his family’s farm (Wilford, 2008). The bottom line is that content matters for both e-Learning 1.0 and e-Learning 2.0.

It is easy to verify some information, and it is difficult to verify other information. For example, if I learn from a Microsoft PowerPoint users group how to do something in PowerPoint, I can test out the solution rather quickly. I can verify for myself how well that information solved my problem. I may not be able to tell whether a better approach exists. I may not be able to say whether the author could have conveyed the same approach in a better manner. But, I can, at least, verify that the information is generally good.

On the other hand, suppose I go to a blog to read about leadership techniques. One blog entry tells me that as a leader I should encourage my team to push for innovation and change. Over a month or two I try several recommended techniques, and my team appears to be coming up with more ideas. At the same time, my team uses a lot of time deciding which ideas are best, my boss doesn’t like a lot of the ideas, and my team morale seems to be plummeting. It is hard for me to verify the benefits of implementing the blog-post ideas, because they seem to have an effect on so many factors. Also, I’ve long forgotten which blog I got the idea from, so I have no way of providing feedback.

To complicate things more, our focus tends to be on intentional learning. Verifying learning is even harder when we’re learning without intention or conscious effort. For example, a blog post might say something like,

“I read about this new technique on Stephanie’s blog. We ought to incorporate her idea starting at the senior management level. Here’s the idea…blah, blah….If only we used this, I think people would start getting fired up again.”

The main learning point of the blog post is about the new technique (i.e., in the “blah, blah” above), but we might also learn some other things from this blog post. They include: (a) Stephanie’s blog is a trusted go-to source, (b) our senior management isn’t performing well enough, (c) we are a company with a morale or productivity problem, and (d) we are a company in trouble. Because readers will process these learnings with little conscious effort, they are even less likely to examine them with a critical eye than they would be with consciously considered content. In other words, learners won’t even know that they might want to verify these nuggets. They’ll just accept them.

Learning supports

As far as I can tell, most e-Learning 2.0 technologies present information to learners with only the thinnest facilitating learning support, if any. Learners do not receive support in the form of intentional repetitions, worked examples, retrieval practice, tests for understanding, intentional spacing, or augmenting visuals. There is, though, one key difference between e-Learning 1.0 and e-Learning 2.0 content creations today.

E-Learning 1.0 content tends to come from people who have at least some expertise in learning design and presentation, and a lot of learner-empathy. E-Learning 2.0 content creation may have an advantage in being created by peers. However, it may not provide all the learning supports that would help learners (a) understand the content, (b) remember the content, and (c) apply the content to their jobs.

Given the current state of e-Learning 2.0 technologies, Table 1 summarizes my view of the best fit for e-Learning 1.0 and e-Learning 2.0 technologies. As you can see, where learners need extra supports (e.g., to spur long-term remembering and/or implementation), e-Learning 2.0, as it is currently deployed, may not provide the best fit.


Table 1 What are the “best fits” for e-Learning 1.0 and for e-Learning 2.0?

                                 Support for Remembering and Implementation
Information that Needs
to be Learned:                   Support Not Critical     Support Critical
Small Chunks of Information      e-Learning 2.0           e-Learning 1.0
Complex System of Information    e-Learning 1.0           e-Learning 1.0

Let me offer two caveats to this depiction. First, experts in a domain may not need learning supports for remembering as much as novices do. Experts are likely to have a rich web of knowledge structures in place that enables them to integrate and remember information better than novices. Novices have no such knowledge structures (or inadequate structures) in which to integrate the new information. Second, if people use an e-Learning 2.0 system extensively on a particular topic, the spaced repetitions and retrieval practice (when generating content) can be so powerful that the effect will mimic the benefits of a well-designed e-Learning 1.0 intervention.

When remembering or implementation is critical, e-Learning 1.0 (if well designed) seems a better choice. Most current e-Learning 2.0 interactions don’t support remembering or implementation. Also, given that e-Learning 2.0 technologies are not typically set up to consider sequencing of learning material, e-Learning 1.0 methods seem best when conveying lots of information or complicated topics.

In the areas in which e-Learning 1.0 can provide better learning support, it might not be fair to compare our e-Learning 2.0 interventions to well-designed e-Learning 1.0 interventions. On the other hand, if we are using e-Learning 2.0 technologies to replace e-Learning 1.0 technologies, comparing results seems desirable.

Designers can use e-Learning 2.0 on its own, not as a replacement for e-Learning 1.0 but as a separate tool to improve learning and performance. In these cases, we don’t evaluate it against e-Learning 1.0 technology. We compare it to the default situation without the e-Learning 2.0 technology.

Of course, the distinctions I’ve drawn are too pure. We can certainly use an e-Learning 2.0 intervention to support an e-Learning 1.0 effort (a blended approach). For example, a trainer might add blogging as a requirement for a course on merchandising techniques. When blending e-Learning 2.0 into an e-Learning 1.0 intervention, it makes sense to determine whether adding the e-Learning 2.0 methodology supports the goals of the course. In other words, when e-Learning 2.0 augments e-Learning 1.0, our highest priority must be to verify the intended e-Learning 1.0 outcomes.

We can’t focus solely on these e-Learning 1.0 outcomes, however. We also have to analyze e-Learning 2.0 methodologies separately to determine their effects, both positive and negative. On the positive side, an e-Learning 2.0 technology such as a wiki may enable our learners to do a better job of learning on their own about merchandising after the course is over. If we only followed traditional measurement practices, we might never think to measure our learners’ ability to learn on-the-job after the formal training is over. On the negative side, we need to evaluate e-Learning 2.0 separately to determine if it has hurt learning or utilized too many valuable resources.

First do no harm

Because doctors work in situations of uncertainty, they take an oath to “First Do No Harm.” We ought to do the same, especially when it comes to new learning technologies. The first question we should ask in evaluating e-Learning 2.0 is whether it is in fact doing any harm.

“Harm?” you might ask incredulously. How can learning be harmful? Learning can be harmful in a number of ways. Here is a short list:

  1. Learners can learn bad information.
  2. Learners can spend time learning low-priority information.
  3. Learners can learn the right information but learn it inadequately.
  4. Learners can learn the right information but learn it inefficiently.
  5. Learners can learn at the wrong time, hurting their on-the-job performance.
  6. Learners can learn good information that interferes with other good information.
  7. Learners can waste productive time learning.
  8. Learners can learn something, but forget it before it is useful.
  9. Previous inappropriate learning can harm learners’ on-the-job learning.
  10. Content creators may utilize productive time to create learning messages.
  11. Content creators may reinforce their own incorrect understandings.
  12. And so on.

“Wow. That’s a long list,” you might be thinking. Being practical about evaluation, we probably don’t want to separately examine each of these potential repercussions. Fortunately, we can boil the list down to two essential points. We need to recognize that people may (a) develop inadequate knowledge and skills because of our e-Learning 2.0 interventions, and (b) waste time as a learner or creator in the e-Learning 2.0 enterprise. We ought to evaluate these possibilities where possible.


Can we measure e-Learning 2.0 just like we measure e-Learning 1.0?

Because most of us have been doing a horrendous job in measuring our learning interventions, some of you may consider the question above a complete joke. Ignoring our common inadequacy, and focusing instead on current best-practices, the question is a valid one. For example, can we rely on a Kirkpatrick-like analysis? Will it be sufficient for us to capture learner reactions (Kirkpatrick Level 1), assess learning (Level 2), measure on-the-job performance (Level 3), and/or determine results (Level 4)? Let’s examine these one at a time.

Asking learners for their reactions

Asking learners for their reactions is always a double-edged sword. While it is possible to gather good information, the information can often be biased and untrustworthy. For example, there is evidence that learner reactions don’t correlate with learning results or on-the-job performance (Alliger et al., 1997). In other words, learners can give the learning high ratings and not learn anything.

Learners have a reputation for being overly optimistic about their ability to remember information. Doubly compounding this type of bias in e-Learning 2.0 interventions are cases where bad information may propagate. If learners can’t easily validate the information on their own — and we’ve discussed above how very difficult it is to verify some information — how will we be able to get accurate feedback from them?

Many Web 2.0 Websites enable visitors to rate products, services, or other people’s comments, and so on. E-Learning 2.0 technologies can offer similar opportunities to rate information. Such rating systems have the advantages and disadvantages inherent in getting other people’s opinions. Sometimes there is wisdom in crowds; sometimes there is mediocrity, a herd mentality, or some corruption of the process.

In traditional learning interventions, learner reactions can rise or drop for reasons that have little to do with learning outcomes. A motivating speaker can get high marks, even with bad content or poor learning support. Well-appointed conference rooms can engender higher smile-sheet ratings than old training rooms. E-Learning courses with irrelevant gaming or video can produce better learner reactions than e-Learning courses without such high production values. E-Learning 2.0 interventions will suffer from the same biases, but may also be open to new-technology bias (i.e., learners rate it higher just because it’s a snazzy new technology), at least initially.

Learners may sometimes automatically discount traditional learning interventions because they “come down from corporate.” It is also true, in comparison, that learners may inherently trust learning interventions because they assume someone vetted the interventions. E-Learning 2.0 interventions, on the other hand, may encourage learners to be better consumers of information. That is, they may perhaps encourage learners to be simultaneously more open to learning, and more skeptical about the veracity of the information. Such openness and skepticism will depend on many factors, including the culture of the organization, previous learning interventions, and experience with the new e-Learning 2.0 technologies.

The point here — from the standpoint of evaluation — is that learner reactions will depend on many factors. Moreover, we may want to ask the learners how much they trust the information, how open they are to the new ideas in the learning, and so on.

E-Learning 2.0 may be able to produce a different kind of learning milieu than traditional LMS-course-dominated paradigms. If we can realize the promise, e-Learning 2.0 may (a) enable our employees to take a more active role in their own learning, (b) change the dynamics of the learning enterprise to one that is more peer-to-peer and less top-down, and (c) be more aligned with young learners’ tendencies to be self-directed compilers of multi-sourced information.

These potentialities call out for the need to evaluate learners’ reactions. For e-Learning 2.0 to produce these learner-driven effects, the interventions will have to motivate and engage learners. We ought to aim our antennae toward their perceptions and usage patterns.

The bottom line on gathering learner reactions is that we need to do it, but we can’t rely on it as the sole metric for evaluating e-Learning 2.0.

Measuring learning

In the last few years, after reviewing learning research for almost 20 years, I realized that there were two fundamentally different learning approaches: (a) building understanding, and (b) supporting long-term retrieval. These overlap, so it may not be immediately obvious why we should make a distinction, but one of the most important reasons for the separation is in learning measurement. Sometimes we can teach people something so that they understand it, but can’t remember it later. For most learning situations, our learning designs should support both understanding and retrieval. Our learning evaluations should enable us to assess them both as well.

You will recall Table 1, in which I surmised that e-Learning 2.0 is not particularly good at helping support long-term retrieval (i.e., “remembering”) — at least in its current instantiations. This includes blogs, wikis, community discussion boards, etc. Thus, in evaluating e-Learning 2.0, we might expect to find that people often forget what they’ve learned after a time. It is certainly worth a test, especially as the evangelists are likely, in the early days of e-Learning 2.0, to see it as a global fix — one all-purpose cleaner, turning whites whiter, colors brighter, and grays more gray.

The whole emphasis of e-Learning 2.0 makes measuring learning difficult. In traditional learning interventions, we know in advance what we want learners to remember. We specify instructional objectives, and base our evaluations on them. In fact, I’ve often recommended to my clients that we create separate evaluation objectives, and measure those directly. But how, in an e-Learning 2.0 implementation, do we know what to measure?

Suppose Jane develops some content for a wiki, but someone else overwrites it in two days. Do we test others on Jane’s information? Or, what if they never saw Jane’s contribution? Do we need to track who saw what? Or, do we only measure people on information that has met the test of time (i.e., on a wiki), or that receives high ratings, or that others link to (that is, on a blog)? What if a learner reads three different discussion board posts, all recommending different solutions? Or, reads four blog posts, three of which present bad information and one that provides good information? Does measuring learning require an authority to determine what is most valuable to measure? If so, is this extra resource worth it? Who gets to act as the authority?

Can we design e-Learning 2.0 systems to automatically trigger learning measurement? For example, the title of a blog post might be, “How to turn off automatic updates in Microsoft Vista.” A day after someone accesses the information, Microsoft could check to see if automatic updates were turned off. Scary or powerful? I’m brainstorming here. Or, perhaps we could write the blog-post title to direct a question to the learner. “Hey, a week ago you accessed information on how to turn off automatic updates. Here are three choices. Which one is correct?” Or, perhaps e-Learning 2.0 authoring tools could provide a “learning-evaluation layer” that enables such future evaluations. These could be useful either for individual learners who want to remember or challenge themselves, or for program developers who want to see how well one of their e-Learning 2.0 interventions is supporting learning.
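
As a rough illustration of what such a “learning-evaluation layer” might do, here is a minimal Python sketch. Everything in it is hypothetical: the `FollowUpQuestion` structure, the function names, and the seven-day default are assumptions made for illustration, not features of any existing authoring tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FollowUpQuestion:
    """A delayed multiple-choice check tied to one piece of content (hypothetical)."""
    learner: str
    content_title: str
    due_on: date
    prompt: str
    choices: list
    correct_index: int


def schedule_follow_up(learner, content_title, accessed_on, choices,
                       correct_index, delay_days=7):
    """Build a question to present `delay_days` after the content was accessed."""
    prompt = (f"{delay_days} days ago you accessed information on "
              f"{content_title}. Which choice is correct?")
    return FollowUpQuestion(
        learner=learner,
        content_title=content_title,
        due_on=accessed_on + timedelta(days=delay_days),
        prompt=prompt,
        choices=list(choices),
        correct_index=correct_index,
    )


def due_questions(questions, today):
    """Return the questions whose due date has arrived."""
    return [q for q in questions if q.due_on <= today]


def is_correct(question, chosen_index):
    """Score a single response against the stored answer key."""
    return chosen_index == question.correct_index
```

Note that someone still has to supply the choices and the answer key, which brings back the question of who acts as the authority.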

Is it just too complicated?

Is it just too complicated to measure learning for e-Learning 2.0 technologies? The paragraphs above certainly make the task seem daunting. There is good news, however. We just need to go back to the basics of measurement, and ask ourselves what we were hoping to accomplish in the first place.

In measuring learning, we can ask, “What do we want our learners to understand and remember? What decisions do we want them to be able to make?” If we don’t have answers to these questions in advance, we can create a list post-hoc (that is, after the fact). Or, we can make a decision to forgo learning measurement and measure on-the-job performance. It could be that our hope for our e-Learning 2.0 technology has nothing to do with remembering, in and of itself, and everything to do with immediate utilization of the learning message for on-the-job performance.

In addition, we might not be interested either in remembering or in on-the-job performance, in the short term. We might instead be interested in using e-Learning 2.0 technologies to spur future on-the-job learning. Kirkpatrick’s 4-level model does not cover this competence, so I’ll add it after covering all four levels of Kirkpatrick.

To reiterate, in thinking about measuring our e-Learning 2.0 interventions, first we have to decide what our intervention will support:

  1. Understanding
  2. Long-term retrieval
  3. Future on-the-job learning
  4. On-the-job performance
  5. Organizational results

Then we ought to devise a measurement regime to measure the outcomes we hope for — and the factors on the causal pathway to those outcomes.

In measuring understanding, we can ask learners questions immediately at the end of learning. Memorization questions are not a recommended choice in most instances, as they are a poor proxy for real world remembering (Shrock and Coscarelli, 2008). It is better to use scenario-based decision-making or simulations, if not real-world tests of performance. To measure long-term retrieval, we have to delay our tests of understanding for a period — usually a week or more is the preferred delay (Thalheimer, 2007). I will cover measuring future on-the-job learning later.
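
The distinction between an immediate test and a delayed test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the answers are simple right/wrong values from scenario-based questions, the function names are invented here, and the seven-day threshold simply encodes the “week or more” guideline mentioned above.

```python
PREFERRED_MIN_DELAY_DAYS = 7  # "a week or more", per the guideline above


def proportion_correct(answers):
    """Return the share of correct answers; `answers` is a list of booleans."""
    if not answers:
        raise ValueError("no answers recorded")
    return sum(answers) / len(answers)


def summarize(immediate_answers, delayed_answers, delay_days):
    """Compare an end-of-learning test with a delayed test of the same material."""
    understanding = proportion_correct(immediate_answers)
    retrieval = proportion_correct(delayed_answers)
    return {
        "understanding": understanding,          # measured right after learning
        "long_term_retrieval": retrieval,        # measured after the delay
        "drop": understanding - retrieval,       # how much was forgotten
        "delay_is_long_enough": delay_days >= PREFERRED_MIN_DELAY_DAYS,
    }
```

A large drop between the two scores would suggest that learners understood the material but did not get enough support for remembering it.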

Measuring on-the-job performance

If I am correct that the current state of e-Learning 2.0 technology (when not blended to support e-Learning 1.0) is best for small chunks of information that learners can use immediately on-the-job, then measuring on-the-job performance (i.e., related to the specific learning chunk) seems like the ideal evaluation tool with stand-alone e-Learning 2.0.

Where blended e-Learning 2.0 interventions support an e-Learning 1.0 design, measuring on-the-job performance is also beneficial. However, measurement of learning and retrieval should augment on-the-job performance measurement, to capture the full causal pathway from learning to performance. People can’t use what they don’t understand or can’t remember, so it’s important to measure both understanding and remembering, in addition to on-the-job performance. In that way, you can trace any performance problems that arise back to understanding or remembering, to see if the learner achieved these prerequisites. If not, program redesign can address them. If learners are able to understand and remember, then the cause of any failure to implement is likely to lie in the workplace environment.
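
The tracing logic described above can be written as a short decision procedure. The following Python sketch is a simplification for illustration only: it assumes each prerequisite has already been measured and reduced to a yes/no value, and the function name and return strings are invented here.

```python
def trace_performance_problem(understood, remembered, applied_on_job):
    """Walk back along the causal pathway to find where it first breaks.

    Each argument is a boolean summarizing a separate measurement.
    """
    if applied_on_job:
        return "no performance problem to trace"
    if not understood:
        return "redesign the program to build understanding"
    if not remembered:
        return "redesign the program to support remembering"
    # Learners understood and remembered, yet did not apply the learning.
    return "look for the cause in the workplace environment"
```

The order of the checks matters: understanding is tested before remembering because learners can’t remember what they never understood.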


Figure 2 Causal flow from learning to results


Figure 2 outlines the causal flow of how learning produces workplace performance and results. Those in the education field not focused on producing future on-the-job performance can substitute any future performance situation for the “on-the-job” situation depicted. The diagram illustrates how some learning-landscape outcomes are prerequisite to other outcomes.

  1. Learning events must generate understanding.
  2. Understanding must translate into remembering, or future on-the-job learning.
  3. Learning-driven application requires either remembering or on-the-job learning.
  4. Results, whether they be organizational results (like increased sales) or learner results (like more efficacy or a higher salary), arise largely from on-the-job application.

Note that Figure 2 is a little oversimplified. For example, (a) learners may get some of their results without applying what they know (such as by increasing their confidence), (b) applying learning produces more learning as people get real-world feedback, and (c) learning events need not be “formal learning” events, as the diagram may suggest. Just remember, as I said before, that the diagram is meant to highlight the fact that some outcomes are prerequisites to other outcomes. This is a critical concept in evaluation design.

In measuring traditional e-Learning 1.0, we measure the targeted work behavior. If we train our learners to do performance reviews, we see how good they are at doing on-the-job performance reviews (Wick, Pollock, Jefferson, & Flanagan, 2006). Where we target e-Learning 2.0 interventions toward specific behaviors, we can measure using traditional methods. Where the intervention did not target performance behaviors in advance, we might identify post-hoc performances, though such post-hoc analysis is open to bias.

For example, suppose we create an e-Learning 2.0 system for use by high-potential assistant managers. We find that 95% of them use the system, and join a discussion to brainstorm ways to improve the way they delegate tasks to the people they work with. A robust 75% of their direct reports say that they prefer the new delegation methods and feel more productive. In 80% of the assistant-manager work shifts, performance has improved. This seems like a great result, but what if a comparison group utilized a different technique and got even better results — and they spent 50% less time “learning” the technique? The point is that on-the-job performance results require some comparison.

Comparing results to randomly assigned control groups is ideal, but often logistically difficult. Comparing results post-intervention to pre-intervention (or over time as the intervention is used) provides another comparison method. Combining these strategies, by comparing a group of people who use our e-Learning 2.0 interventions now, to those who use them 6 months from now, is also an alternative. I highly recommend you call in a measurement professional to help you.
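
One simple way to combine a pre/post comparison with a comparison group is a difference-in-differences calculation. That technique is a choice made for this illustration, not something the strategies above prescribe. The Python sketch below subtracts the comparison group’s change from the intervention group’s change; the function names are assumptions, and the calculation says nothing about statistical significance.

```python
from statistics import mean


def mean_change(pre_scores, post_scores):
    """Average performance after the period minus average performance before it."""
    return mean(post_scores) - mean(pre_scores)


def difference_in_differences(group_pre, group_post,
                              comparison_pre, comparison_post):
    """Change in the intervention group beyond the change in the comparison group."""
    return (mean_change(group_pre, group_post)
            - mean_change(comparison_pre, comparison_post))
```

If both groups improved by about the same amount, the result is near zero, which suggests that something other than the intervention drove the improvement. A measurement professional can add the significance testing and design checks this sketch omits.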

Measuring results (including ROI)

One could write a whole book on measuring organizational results, but in the interest of time, I’m going to keep this short and sweet. If metrics are available, we can look at things like sales results, customer service ratings, secret shopper ratings, turnover, accidents, lawsuits, employee survey results, multi-rater feedback, manufacturing defect rates, and ROI.

Because many factors — not just our learning and development efforts — influence these results, determining the effect of our e-Learning 2.0 interventions requires the use of randomly assigned learners and statistical techniques. These procedures isolate the impact of our intervention from the impact of other organizational influences (that is, such things as the economy, other initiatives, morale, and so on). Again, unless you know what you’re doing, you ought to call in a measurement expert.
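As a rough illustration of what random assignment plus a statistical technique can look like, the sketch below splits a roster at random and then runs a simple permutation test on an outcome metric. The permutation test is one option I have chosen for the example, not the only appropriate method. The function names, employee labels, and scores are all hypothetical.

```python
import random
from statistics import mean

def randomly_assign(people, seed=None):
    """Shuffle the roster and split it in half: (intervention group, control group)."""
    rng = random.Random(seed)
    shuffled = list(people)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def permutation_p_value(treated, control, rounds=5000, seed=0):
    """Share of random relabelings whose gap in means is at least the observed gap."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_large = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if gap >= observed:
            at_least_as_large += 1
    return at_least_as_large / rounds

# Hypothetical roster and outcome scores, for illustration only.
roster = ["emp%02d" % i for i in range(1, 13)]
intervention_group, control_group = randomly_assign(roster, seed=42)

intervention_scores = [9, 9, 9, 9, 9, 9]
control_scores = [1, 1, 1, 1, 1, 1]
p_value = permutation_p_value(intervention_scores, control_scores)
print(len(intervention_group), len(control_group), p_value)
```

A small p-value suggests that the gap between the groups would rarely arise from the luck of the assignment alone. Because the groups were formed at random, other organizational influences should affect both groups about equally.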

Finally, it is beneficial, if not a full-fledged moral imperative, to look at the results for our learners as well. Does the learning benefit them? Are they more competent? Do they feel more empowered? Do their reputations improve? Do they receive promotions more quickly? Are they able to thrive in the job market better? We might focus on lots of angles. Again, because learner engagement is likely to be a prime driver of e-Learning 2.0 outcomes, examining learner results may be especially important.

Interestingly, when we give “learner results” a place in our models (and I’ve included learner outcomes in my learning landscape models for several years now), we come back to learner reaction data, at least in part. Might our learners have a pretty good sense of whether the learning interventions are benefiting them? Wow. What would Donald Kirkpatrick say if he knew I was mashing Level 1 and Level 4 together just a bit?

Actually, Don is a very thoughtful fellow. He’d probably just say, “Hey, a result is a result. Whether it’s learner outcomes or ROI, it fits in Level 4. There’s no reason we can’t use reaction sheets for both Level 1 and Level 4 data gathering.” At least that’s what I’m guessing he’d say.


Other issues to consider: What the Kirkpatrick Model leaves out

Measurement affects behavior, so we have to be aware that measuring e-Learning 2.0 might affect its success. First, there is the tradeoff between precision and workability. The more extensive our measurement instruments, the more precise they are, but the more likely they are to adversely impact the learning and measurement processes. Novices in the measurement game often hurt their cause by creating measurement instruments that take too much time for people to complete. As time requirements increase, fewer and fewer people complete the measurement instruments with attention and care. And, because some types of people tend to drop out earlier than others, increasing time requirements increases the bias in the sample of people who actually respond.
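The dropout point can be shown with simple arithmetic. The sketch below assumes two hypothetical types of respondents who abandon a survey at different rates as it gets longer, and it computes the average rating we would expect to see among those who finish. The linear dropout rule and every number here are assumptions made up for the example.

```python
def expected_survey_mean(groups, minutes):
    """Expected average rating among people who finish a survey of the given length.

    Each group is (headcount, true_average_rating, dropout_per_minute), where
    dropout_per_minute is the assumed share of the group lost for each minute.
    """
    finishers = 0.0
    rating_total = 0.0
    for headcount, rating, dropout_per_minute in groups:
        completion_rate = max(0.0, 1.0 - dropout_per_minute * minutes)
        finishers += headcount * completion_rate
        rating_total += headcount * completion_rate * rating
    return rating_total / finishers if finishers else float("nan")

# Invented numbers: enthusiasts tolerate long surveys, skeptics drop out quickly.
groups = [
    (50, 8.0, 0.01),  # enthusiasts
    (50, 4.0, 0.04),  # skeptics
]
true_mean = expected_survey_mean(groups, minutes=0)     # everyone responds
short_survey = expected_survey_mean(groups, minutes=5)
long_survey = expected_survey_mean(groups, minutes=20)
print(true_mean, short_survey, long_survey)
```

Under these assumptions the true average rating is 6.0, yet the 20-minute survey reports roughly 7.2, because most of the skeptics have dropped out before finishing.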

Other concerns are also in play. Measuring or monitoring people may actually change their behavior. When an e-Learning 2.0 system feels unmonitored, people are likely to feel free to be themselves. There is a certain power and motivation that comes from feeling that one is acting of one’s own volition. As a sense of monitoring, oversight, or “doing it to look good” rises (Dweck, 1986, 2006), some people may disengage completely. Others may engage with little enthusiasm, or in a manner so personally protective that it loses value or minimizes the opportunity for relationship-building.

To minimize these issues, design e-Learning 2.0 measurement as much as possible to balance precision and workability, taking care to limit the amount of perceived time required to complete evaluation instruments. Alternatively, where possible, we want to design assessment tools to feel like a part of the interaction, not as an add-on that requires extra effort. If possible (and if saying so is true) assessments should be framed as beneficial to the user, designed to improve their experience, and to enable them to focus their time on what is most valuable to them.

Measuring the effect on future learning

Learning interventions don’t just help people perform in the future (for example, on the job); they also can help people learn more in the future (again, on the job) (see Bransford and Schwartz, 1999; Schwartz and Martin, 2004). When people learn basic techniques in a software program, they may be better able to learn advanced techniques while using the program (that is, depending on the design of the original training). When people begin to learn some of the fine taste distinctions in wine, they are better able to learn even finer distinctions as they continue to sample wines (unless of course they are sampling too much wine all at once). When people in a leadership-development class learn that managers can hurt productivity by telling people what to do, they may not only learn that fact. They may also begin to see other ways that they hurt productivity. For example, by learning that “telling is unmotivating” in the training class, Sam may be more likely to notice that Sally stays more motivated when he first asks her to evaluate her own performance, before he steps in to provide feedback. He doesn’t learn this in the training class, but the training class helps him notice this later on the job.

For most traditional training programs, we almost never directly measure whether they bolster future learning, and most training interventions are not designed specifically to aid future learning. E-Learning 2.0 interventions, because they prompt users to generate content, may be especially helpful in supporting future learning, both for the creators of the learning messages and for the learners. Of course, there are some counter-arguments as well.

Here’s my thinking on this: Writing instructional messages forces creators to think deeply about the topic they are writing about. It also forces them to consider how a topic looks to a novice. It prompts them to reflect on the context in which the learner will utilize the learning. All these processes are likely to help the creator of the learning to deepen and reinforce what they have learned — enabling future learning. Of course, some creators may become so effective in creating learning messages, that they may become too narrow in their own thinking, not opening up to new ideas from others.

For the learners, e-Learning 2.0 may support future learning by creating a rich network of resources that they can rely on in the future to learn more and different concepts. Of course, some learners may forgo their own learning if these resources are available, stunting their ability to further their own learning in the future.

How do we measure the ability of our learning interventions to improve future learning? This is obviously a very tricky business. We could just measure on-the-job performance, and ignore the causal pathway through future learning. Alternatively, to measure future learning we could provide people with problems to solve or cases to analyze, and see how fast they learn from working on those problems or cases. We could track people’s promotions or job responsibilities, assuming that on-the-job learning is required for advancement. We could measure people’s learning through self-assessment or multi-rater feedback from colleagues. We could also decide that future learning is just too difficult to measure, given the current state of expertise about how to go about it.


Measuring e-Learning 2.0 is fraught with complexities, but we absolutely have to figure out a way to do it, and do it well. In this article, I’ve tried to give you some things to think about as you begin to plan how you might measure e-Learning 2.0 interventions.

Certainly, there are no easy recipes to follow.

In the Guild’s latest survey, respondents saw evaluation as one of the biggest areas of need for e-Learning 2.0. They felt strongly that it was important to evaluate e-Learning 2.0. Look at the second item in Figure 3.


Figure 3 Not knowing how to measure interactions was the second most-often cited barrier to adoption of e-Learning 2.0

Most respondents seemed to be ready to rely on user reactions, an inadequate strategy if utilized alone. In Figure 4, you can see the responses to the survey question on evaluating e-Learning 2.0. Over 75% of respondents were heading down the path of measuring learner reactions, with far fewer using other metrics. Respondents could choose more than one item, so hopefully they will use learner reactions along with other corroborating evidence.


Figure 4 Most respondents favored measuring learner reactions to e-Learning 2.0 as the basis for evaluating success.


Despite the complexities of measuring e-Learning 2.0, I can offer the following recommendations:

  1. Because e-Learning 2.0 is already on the fad upswing, we ought to be especially careful about assuming its benefits. In other words, we ought to measure it early and often, at least at first until our implementations prove to be beneficial investments.
  2. Because there are two sets of employees involved in e-Learning 2.0, those who learn from the content (“learners”) and those who create the content (“creators”), we need to evaluate the effects of e-Learning 2.0 on both groups of people.
  3. We have to determine if the created content is valid. Your content may not need to be 100% perfect, but you do need to know if the content is valid enough for its intended purposes.
  4. Measuring only the most obvious “learning content” may miss important aspects of the information that e-Learning 2.0 messages communicate.
  5. For situations in which e-Learning 1.0 is better positioned to provide necessary learning supports than e-Learning 2.0 (e.g., when long-term remembering is required), it might not be fair to compare our e-Learning 2.0 interventions to well-designed e-Learning 1.0 interventions. On the other hand, if we are using e-Learning 2.0 technologies to replace e-Learning 1.0 technologies, comparing results seems desirable.
  6. When we blend e-Learning 2.0 to support an e-Learning 1.0 intervention, we must focus first on whether the e-Learning 2.0 methodology supports the e-Learning 1.0 intended outcomes. We must also look at whether the e-Learning 2.0 methodology creates separate benefits or damage.
  7. Because e-Learning 2.0 can create harm, part of our measurement mission ought to be to determine whether people are developing inadequate knowledge or skills and/or wasting time as learners and creators.
  8. Asking people for their reactions to learning can provide some valuable knowledge, but is often fraught with bias. Therefore, we cannot consider asking for reactions to our e-Learning 2.0 interventions a sufficient measurement design.
  9. In thinking about measuring our e-Learning 2.0 interventions, we first have to decide what we designed our intervention to support: (a) Understanding, (b) Long-term Retrieval, (c) Future On-the-job Learning, (d) On-the-job Performance, (e) Organizational Results. Then we ought to devise a measurement regime to measure the outcomes we hope for — as well as the factors on the causal pathway to those outcomes.
    1. On-the-job performance is a necessary component for organizational results; therefore, we must measure it and its prerequisites.
    2. Understanding and remembering are necessary components for learning-based performance improvement. Therefore, it is critical that we track them to help diagnose the cause of on-the-job performance failures.

Making it simple

While the in-depth thinking represented in this article may be helpful in providing you with rich mental models of how to think about measuring e-Learning 2.0 (and that was my intent), some of you will probably just want a simple heuristic about what to do. In lieu of a detailed conversation, here goes:


  1. Don’t just ask users for their feedback or rely on usage data.
  2. Don’t look only at benefits — consider potential harm too.
  3. Don’t look only at the learners — consider the creators too.


  1. Pilot test your e-Learning 2.0 intervention in a small way before full deployment. This will make it feasible to invest in gathering the requisite data.
  2. Measure your users compared to those who are not using the e-Learning 2.0 intervention (after having used random assignment to groups), or compare results over time, or both.
  3. Use multiple measurement methods to gather corroborating evidence.


Alliger, G. M., Tannenbaum, S. I., Bennett, W. Jr., Traver, H., and Shotland, A. (1997). A meta-analysis of the relations among training criteria. Personnel Psychology, 50, 341-358.

Bransford, J. D., and Schwartz, D. L. (1999). Rethinking transfer: A simple proposal with multiple implications. Review of Research in Education, 24, 61-100.

Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41, 1040-1048.

Dweck, C. S. (2006). Mindset: The new psychology of success. New York: Random House.

Schwartz, D. L., and Martin, T. (2004). Inventing to prepare for future learning: The hidden efficiency of encouraging original student production in statistics instruction. Cognition and Instruction, 22(2), 129-184.

Shrock, S., and Coscarelli, W. (2008). Criterion-Referenced Test Development, Third Edition. San Francisco: Pfeiffer. 

Thalheimer, W. (2007, April). Measuring learning results: Creating fair and valid assessments by considering findings from fundamental learning research. Retrieved August 7, 2008, from

Wick, C., Pollock, R., Jefferson, A., and Flanagan, R. (2006). The Six Disciplines of Breakthrough Learning: How to Turn Training and Development into Business Results. San Francisco: Pfeiffer.

Wexler, S., Schlenker, B., Coscarelli, B., Martinez, M., Ong, J., Pollock, R., Rossett, A., Shrock, S., and Thalheimer, W. (2007, October). Measuring success: Aligning learning success with business success. Retrieved from:

Wilford, J. N. (2008). Washington’s Boyhood Home Is Found. Retrieved from

(Author’s Note) I’d like to thank the many Guild members in attendance at my espresso café roundtable at the Guild’s most recent Learning-Management Colloquium, who helped me see the unsettling complexities that are involved in evaluating e-Learning 2.0 interventions. I would also like to thank Steve Wexler, Bill Brandon, Jane Hart, and Mark Oehlert who provided trenchant commentary on a first draft — enabling significant improvement in this article — just like I might hope from a well-designed e-Learning 2.0 community.
