Misleading Data Visualizations Can Confuse, Deceive Learners

Data visualizations can be essential tools for exploring and communicating complicated information, or they can obfuscate, distort, or misrepresent data. A misleading visualization might be intentional, created by someone with an agenda to promote. It might also result from errors, from a creator who does not understand the data or the visualization process, or from letting engaging or even beautiful visual design get in the way of clear communication. Whatever the reason, misleading data visualizations have no place in eLearning; the guidelines below can help designers avoid confusing or misinforming their learners.

The primary ways that a visualization can mislead learners are:

  • Hiding relevant data
  • Presenting too much data
  • Distorting the presentation of data
  • Describing the data inaccurately in annotations, title, or within the visualization itself

Let’s examine each of these.

Hiding relevant data

Hiding relevant data or highlighting a particularly beneficial or positive data point can lead learners to focus on a small fraction of the data story—at the expense of accurate understanding of the bigger picture. Any individual parameter or statistic can reveal interesting or useful information. But taken out of context, it can also be misleading.

For example, look at Figure 1, a data visualization based on Pew Research Center’s 2018 social media use survey. An infographic could zoom in on the line for Facebook, making much of the fact that 68 percent of American adults use Facebook.

Pew Research data show that YouTube and Facebook are used by the largest number of American adults, but that Instagram shows the strongest growth in user numbers, with Twitter close behind. Use of Facebook and LinkedIn are flat since the 2016 survey. Data for Snapchat, YouTube, and WhatsApp are available only for 2018.

Figure 1: Pew Research data visualization shows use of different social media platforms between 2012 and 2018

An eLearning or marketing strategy might be built around Facebook, with the architects believing that Facebook exposure is the golden ticket to reaching more members of their audience.

But a deeper look at the data shows that Facebook use has been flat since Pew's previous survey in 2016. It also shows a notable increase in Instagram use over the same period, from 28 percent of American adults to 35 percent: a rise of seven percentage points, or 25 percent in relative terms. The increase is primarily among younger users. A company seeking to appeal to these learners might want to consider a multiple-platform strategy or focus its efforts on the up-and-coming platforms. Thus, a data visualization like Figure 1, which presents more complete data, might lead eLearning designers to a different approach.
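The distinction between a percentage-point change and a relative percentage change is easy to blur in chart annotations. A minimal sketch, using the Pew figures cited above, shows why Instagram's growth can honestly be described as "25 percent" even though its share rose only seven points:

```python
# Pew survey shares of American adults using each platform, 2016 vs. 2018
facebook_2016, facebook_2018 = 0.68, 0.68
instagram_2016, instagram_2018 = 0.28, 0.35

# Absolute change: difference in percentage points
instagram_abs = (instagram_2018 - instagram_2016) * 100

# Relative change: growth measured against the starting share
instagram_rel = (instagram_2018 - instagram_2016) / instagram_2016 * 100

facebook_abs = (facebook_2018 - facebook_2016) * 100  # flat: 0 points

print(f"Instagram: +{instagram_abs:.0f} points, +{instagram_rel:.0f}% relative")
print(f"Facebook:  +{facebook_abs:.0f} points")
```

An annotation that says "up 25 percent" without specifying relative change can leave readers thinking a quarter of all adults joined the platform; labeling the unit prevents that misreading.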

Presenting too much data

Sometimes, showing the big picture can make it hard to identify salient data or stories.

In Figure 2, the sheer number of lines makes it hard to focus on any one data point or trend. If the designer wanted to obscure some bad news, burying it in a massive amount of information could accomplish that—but it also makes the data visualization essentially worthless.

A data visualization with multiple overlapping lines or data points can obscure the information rather than communicate it.

Figure 2: The number of lines in this data visualization makes it hard to isolate any one fact or trend

In other cases, the trends that appear when an entire data set is visualized are the opposite of trends that appear when subsets of that data are studied separately. This phenomenon, known as Simpson’s Paradox, is explained in Cathy O’Neil’s Weapons of Math Destruction using a national report on school performance as an example. The report, A Nation at Risk, which was the basis of wide-ranging public policy, stated that nationwide, high schoolers’ SAT scores had declined.

Examination of the data revealed that, while this was true in a big-picture sense, the period covered an era of tremendous growth in the number and range of students taking the exam: universities were admitting more minority and lower-income students, vastly increasing the number of those students sitting for the test. When each cohort of students, analyzed by income group, was examined separately, the data actually showed increases in the average scores of every group.
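Simpson's Paradox is easy to demonstrate with a small weighted-average calculation. The cohort sizes and scores below are invented for illustration; they are not the actual data behind A Nation at Risk:

```python
# Each cohort maps to (number of test takers, average score) in two periods.
# Invented numbers for illustration -- not the real SAT data.
earlier = {"higher-income": (800, 1050), "lower-income": (200, 850)}
later   = {"higher-income": (600, 1060), "lower-income": (600, 870)}

def overall_average(cohorts):
    """Weighted average score across all cohorts."""
    total_students = sum(n for n, _ in cohorts.values())
    total_points = sum(n * avg for n, avg in cohorts.values())
    return total_points / total_students

# Every cohort's average rose...
for group in earlier:
    assert later[group][1] > earlier[group][1]

# ...yet the overall average fell, because the mix of test takers shifted
# toward the lower-scoring (here, lower-income) cohort.
print(overall_average(earlier))  # 1010.0
print(overall_average(later))    # 965.0
```

The aggregate trend reverses purely because the composition of the population changed, which is exactly why a single big-picture chart can tell the opposite story from a set of per-cohort charts.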

When learners will need both a big-picture and a detailed visualization of data, the designer should consider creating a series of data visualizations. News media often do this with large data stories, showing, for example, a national map with broad representations of data by region or state, followed by a series of more narrowly focused visualizations that highlight important trends, outliers, or other information.

Distorting data

Showing too little or too much data, or emphasizing selected data, could simply be an error that results from choosing the wrong format for the data visualization or from not fully understanding the data. These errors can be unintentional, though some presentations distort the data in ways that appear intentional or agenda-driven. Examples include using different scales when graphing different variables, or starting the Y axis at a non-zero point, which exaggerates differences in values.

In “Graphics, Lies, Misleading Visuals,” data journalist Alberto Cairo, who holds the Knight Chair in Visual Journalism at the School of Communication of the University of Miami, uses several examples from political campaign ads or media coverage. This type of distortion can also be found in consumer advertising, marketing and PR materials, and elsewhere. Figure 3 illustrates how something as simple as truncating the Y axis makes a significant difference in how readers will understand a data visualization.

Two charts showing the same data look very different if the presentation is distorted by truncating the Y axis; in one version, the Y axis begins at zero, while in the other it begins at the value 45.

Figure 3: Both charts show 48 No votes and 52 Yes votes, but the top figure, whose Y axis starts at 45, appears to show a much larger difference between the vote totals
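The distortion in Figure 3 can be quantified with what Edward Tufte calls the "lie factor": the ratio of the effect shown in the graphic to the effect in the data. A sketch using the vote totals from the figure:

```python
yes_votes, no_votes = 52, 48

def bar_ratio(y_axis_start):
    """Ratio of the taller bar's drawn height to the shorter bar's,
    when the Y axis begins at y_axis_start."""
    return (yes_votes - y_axis_start) / (no_votes - y_axis_start)

honest = bar_ratio(0)      # 52/48, about 1.08 -- bars look nearly equal
truncated = bar_ratio(45)  # 7/3, about 2.33 -- Yes bar looks over twice as tall

lie_factor = truncated / honest  # roughly 2.15
print(f"honest ratio {honest:.2f}, truncated ratio {truncated:.2f}, "
      f"lie factor {lie_factor:.2f}")
```

A four-point margin becomes, visually, a bar more than twice the height of its neighbor; the data never changed, only the baseline did.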

An equally misleading presentation is a "strategic" choice of where an axis begins and ends, or the use of uneven intervals on the X or Y axis. Examples Cairo cites include presenting only six months of unemployment data in an economy where seasonal highs and lows are a known factor, and switching, mid-chart, from a yearly interval to a monthly interval when presenting information on rate increases. The latter example could hide the size of an increase by presenting it in smaller chunks, graphed next to larger annual increases. The unemployment example could appear to show a large drop (or rise) in unemployment while actually reflecting an expected annual cycle.

Describing data inaccurately

A particularly unethical way to mislead using data visualizations is to mislabel data or use accompanying text that “explains” it inaccurately.

A county-by-county map of 2016 presidential election results shows a vast sea of red, which accurately reflects President Trump’s sweep of the American Heartland but does not accurately reflect actual vote counts (nor is it intended to).

Figure 4: This map, beloved by President Trump, accurately depicts the county-by-county results of the 2016 presidential election. It does not, as many have claimed, reflect the total number of votes in each red or blue square

The county-by-county map in Figure 4 is an accurate data visualization showing results of every US county for the 2016 presidential election. But when used, as shown in Figure 5, as a representation of voters or “citizens,” it is being described misleadingly.

Conflating “counties” with “citizens” is a misleading way to present data.

Figure 5: The book cover for Citizens for Trump

Using the map to imply a representation of “citizens” mischaracterizes the data map of county election results, conflating them with numbers of votes. Each county in the heartland states represents far fewer voters (though vastly more physical space) than the densely populated—and primarily blue—counties clustered along the coasts. While the map itself is accurate, it does not reflect citizen support for either candidate.

A related way to mislead with data visualizations is to present data that appear to show correlations—and imply or explicitly state that there is a causal relationship between them. Data do not show causes, Cairo often reminds students. Data sets provide information that can lead to questions. Further investigation of those questions might turn up a correlation—or it might not. Websites and books devoted to spurious correlations prove the axiom “correlation does not equal causation,” yet well-intentioned (as well as nefarious) designers are prone to the often-fallacious assumption that one trend in the data set somehow caused another.
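A short sketch shows how easily two causally unrelated quantities correlate when both merely trend upward over time. The two series below are invented for illustration; only the Pearson formula itself is standard:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two invented, causally unrelated series that both happen to grow each year.
streaming_subscribers = [100, 110, 118, 131, 139, 150]  # millions (made up)
avocado_consumption   = [2.1, 2.3, 2.2, 2.6, 2.8, 2.9]  # kg per person (made up)

r = pearson_r(streaming_subscribers, avocado_consumption)
print(f"r = {r:.2f}")  # strong correlation, zero causal relationship
```

Any two series that share a time trend will score high on this measure, which is why a scatterplot or trend chart of such pairs can look persuasive while proving nothing about causation.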

Don’t avoid using data visualizations in eLearning

The many ways that data visualizations can go wrong is not an argument for avoiding them. Data visualizations can enhance eLearning and make complex information clear and instantly accessible to many learners. Choosing an appropriate format for data visualizations in eLearning and applying sound visual design principles can go a long way toward helping designers avoid misleading data visualizations. Register now for The eLearning Guild’s Data & Analytics Summit, August 22 & 23, 2018, and learn more about using data to enhance eLearning!
