The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines


“You don’t need to be an expert to deceive someone, though you might need some expertise to reliably recognize when you’re being deceived.”

When my co-instructor and I start our quarterly lesson on deceptive visualizations for the data visualization course we teach at the University of Washington, he emphasizes the point above to our students. With the advent of modern technology, making pretty and convincing claims about data is easier than ever. Anyone can make something that seems passable but contains oversights that render it inaccurate or even harmful. Moreover, there are also malicious actors who actively try to deceive you, and who have studied some of the best ways to do it.

I often start this lecture with a bit of a quip, looking seriously at my students and asking two questions:

  1. “Is it a good thing if someone is gaslighting you?”
  2. After the general murmur of confusion, followed by agreement that gaslighting is indeed bad, I ask the second question: “What’s the best way to ensure nobody ever gaslights you?”

The students generally ponder that second question a bit longer, before chuckling and realizing the answer: learn how to gaslight. Not so you can take advantage of others, but so you can prevent others from taking advantage of you.

The same applies in the realm of misinformation and disinformation. People who want to mislead with data are empowered by a host of tools, from high-speed internet to social media to, most recently, generative AI and large language models. To protect yourself from being misled, you need to learn their tricks.

In this article, I’ve taken the key ideas from my data visualization course’s unit on deception, which draws on Alberto Cairo’s excellent book on the subject, and broadened them into some general principles about deception and data. My hope is that you read it, internalize it, and take it with you to arm yourself against the onslaught of lies perpetuated by ill-intentioned people armed with data.

Humans Cannot Interpret Area

At least, not as well as we interpret other visual cues. Let’s illustrate this with an example. Say we have an extremely simple numerical data set; it’s one-dimensional and consists of just two values: 50 and 100. One way to represent this visually is via the length of bars, as follows:

This is true to the underlying data. Length is a one-dimensional quantity, and we’ve doubled it in order to indicate a doubling of value. But what happens if we want to represent the same data with circles? Well, circles aren’t really defined by a length or width. One option is to double the radius:

Hmm. The first circle has a radius of 100 pixels, and the second has a radius of 50 pixels, so this is technically correct if we wanted to double the radius. However, because of the way area is calculated (πr²), doubling the radius quadruples the area rather than doubling it. So what if we tried doubling the area instead, since that seems more accurate? Here’s a revised version:

Now we have a different problem. The larger circle is mathematically twice the area of the smaller one, but it no longer looks that way. In other words, even though it’s an accurate comparison of a doubled quantity, human eyes have difficulty perceiving it.

The problem here is trying to use area as a visual marker in the first place. It’s not necessarily wrong, but it is confusing. We’re increasing a one-dimensional value, but area is a two-dimensional quantity. To the human eye, it will always be difficult to interpret accurately, especially when compared with a more natural visual representation like bars.
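The arithmetic behind the two circle versions is easy to check. The short Python sketch below assumes, as in the example, that values map to pixels one-to-one (a radius of 50 px for 50 and 100 px for 100); the helper names are illustrative, not taken from any library.

```python
import math


def circle_area(radius: float) -> float:
    """Area of a circle with the given radius: pi * r^2."""
    return math.pi * radius ** 2


def radius_for_area_ratio(base_radius: float, ratio: float) -> float:
    """Radius whose circle has `ratio` times the area of a circle of `base_radius`."""
    # Area grows with the square of the radius, so the radius must grow
    # with the square root of the desired area ratio.
    return base_radius * math.sqrt(ratio)


small, large = 50, 100  # the two data values

# Version 1: use the values directly as radii.
print(circle_area(large) / circle_area(small))  # 4.0 (area quadrupled, not doubled)

# Version 2: keep the area proportional to the value instead.
r = radius_for_area_ratio(small, large / small)
print(round(r, 1))  # 70.7
print(round(circle_area(r) / circle_area(small), 6))  # 2.0
```

So a circle that correctly encodes “twice as much” is only about 41% wider than the smaller one, which is why it doesn’t look twice as large.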

Now, this may not seem like a big deal, but let’s take a look at what happens when you extend this to a real data set. Below, I’ve pasted two images of charts I made in Altair (a Python-based visualization package). Each chart shows the maximum temperature (in Celsius) during the first week of 2012 in Seattle, USA. The first one uses bar lengths to make the comparison, and the second uses circle areas.

Which one makes it easier to see the differences? The legend helps in the second, but if we’re being honest, it’s a lost cause. It is far easier to make precise comparisons with the bars, even in a setting with such limited data.
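One way to see why the circles fare worse: if each circle’s area is proportional to its value, the radius grows only with the square root of the value, so differences between circles shrink. The sketch below assumes the circle chart scales area in proportion to the value, and it uses a pair of hypothetical values, 10 and 12 (placeholders, not the actual Seattle temperatures).

```python
import math


def encode(values):
    """Return (bar_length, circle_radius) for each value, scaled so the max maps to 1.0.

    Bar length is proportional to the value. Circle area is proportional to
    the value, which makes the radius proportional to the value's square root.
    """
    top = max(values)
    return [(v / top, math.sqrt(v / top)) for v in values]


# Hypothetical values: the second is 20% larger than the first.
(bar_a, rad_a), (bar_b, rad_b) = encode([10.0, 12.0])
print(f"bar length ratio:    {bar_b / bar_a:.3f}")  # 1.200
print(f"circle radius ratio: {rad_b / rad_a:.3f}")  # 1.095
```

A 20% difference in the data stays a 20% difference in bar length, but becomes less than a 10% difference in circle width, which is hard to spot without reading the legend.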

Remember that the purpose of a visualization is to clarify data: to make hidden trends easier to see for the average person. To achieve this goal, it’s best to use visual cues that simplify the process of making that distinction.

Beware Political Headlines (In Any Direction)

There’s a small trick question I sometimes ask my students on a homework assignment around the fourth week of class. The assignment mostly involves generating visualizations in Python, but for the last question, I give them a chart I actually generated, accompanied by a single question:

Question: There’s one thing egregiously wrong with the chart above, an unforgivable error in data visualization. What is it?

Most think it has something to do with the axes, marks, or some other visual aspect, often suggesting improvements like filling in the circles or making the axis labels more informative. Those are fine suggestions, but not the most pressing.

The most serious flaw in the chart above is something it lacks: a title. A title is crucial to an effective data visualization. Without it, how are we supposed to know what this visualization is even about? As it stands, we can only ascertain that it must vaguely have something to do with carbon dioxide levels across a span of years. That isn’t much.

Many people, feeling this requirement is too stringent, argue that a visualization is often meant to be understood in context, as part of a larger article, press release, or other accompanying piece of text. Unfortunately, this line of thinking is far too idealistic. In reality, a visualization must stand alone, because it will often be the only thing people look at, and when a chart blows up on social media, the only thing that gets shared widely. Consequently, it must have a title to explain itself.

Of course, the title of this very subsection tells you to be wary of such headlines. That’s true. While titles are important, they’re a double-edged sword. Since visualization designers know viewers will pay attention to the title, ill-meaning ones can also use it to sway people in less-than-accurate directions. Let’s look at an example:

The above is an image shared by the White House’s public Twitter account in 2017. The image is also referenced by Alberto Cairo in his book, which emphasizes many of the points I’ll now make.

First things first. The term “chain migration,” referring to what is formally known as family-based migration (where an immigrant may sponsor family members to come to the USA), has been criticized by many who argue that it’s needlessly aggressive and makes legal immigrants sound threatening for no reason.

Of course, politics is by its very nature divisive, and it is possible for any side to make a heated argument. The primary issue here is actually a data-related one: specifically, what the use of the word “chain” implies in the context of the chart shared with the tweet. “Chain” migration seems to indicate that people can immigrate one after the other, in a seemingly infinite stream, uninhibited and unperturbed by the distance of family relations. The reality, of course, is that a single immigrant can mostly just sponsor immediate family members, and even that takes quite a bit of time. But when one reads the phrase “chain migration” and then immediately looks at a seemingly sensible chart depicting it, it is easy to believe that an individual can actually spawn additional immigrants at a base-3 exponential growth rate.

This is the problem with any form of political headline: it makes it far too easy to conceal dishonest, inaccurate work in the actual data processing, analysis, and visualization.

There’s no data underlying the chart above. None. Zero. It is entirely arbitrary, and that isn’t okay for a chart that’s purposefully made to look as if it is showing something meaningful and quantitative.
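To make the “base-3” point concrete, here is the arithmetic the chart’s imagery invites, and nothing more. It is a toy model that assumes, purely for illustration, that every arrival sponsors exactly three more in the next generation; it is not based on any immigration data, which is precisely the problem with the chart.

```python
def implied_arrivals(generations: int, sponsored_each: int = 3) -> list[int]:
    """New arrivals per generation if every person sponsored `sponsored_each` more.

    A toy model of what the chart implies, not a description of real immigration rules.
    """
    return [sponsored_each ** g for g in range(generations + 1)]


per_generation = implied_arrivals(5)
print(per_generation)       # [1, 3, 9, 27, 81, 243]
print(sum(per_generation))  # 364
```

That runaway curve is what the imagery suggests; nothing in the chart supports it.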

As a fun little rabbit hole that highlights the dangers of political headlining in data, take a look at FloorCharts, a Twitter account that posts the most absurd graphics shown on the floor of the U.S. Congress.

Don’t Use 3D. Please.

I’ll end this article on a slightly lighter topic, but still an important one. Under no circumstances (none at all) should you ever use a 3D chart. And if you’re in the viewer’s shoes (that is, if you’re looking at a 3D pie chart made by someone else), don’t trust it.

The reason for this is simple, and connects back to what I discussed with circles and rectangles: a third dimension distorts the truth behind what are often one-dimensional measures. Area was already hard to interpret; how well do you really think the human eye does with volume?
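The dimensional problem compounds. If a mark’s length, area, or volume is supposed to double, each of its sides only needs to grow by a factor of 2, the square root of 2, or the cube root of 2, respectively. The small helper below (an illustrative function, not part of any library) computes that linear growth factor.

```python
def linear_scale_for(value_ratio: float, dimensions: int) -> float:
    """Factor by which each side of a mark must grow so that its length (1D),
    area (2D), or volume (3D) grows by `value_ratio`."""
    return value_ratio ** (1 / dimensions)


for dims, name in [(1, "length"), (2, "area"), (3, "volume")]:
    print(f"{name:>6}: x{linear_scale_for(2, dims):.2f}")
# length: x2.00
#   area: x1.41
# volume: x1.26
```

A doubled value drawn as a volume grows only about 26% along any edge, so the difference is easy to underestimate.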

Here’s a 3D pie chart I generated with random numbers:

Now, here is the very same pie chart, but in two dimensions:

Notice how the blue isn’t quite as dominant as the 3D version seems to suggest, and that the red and orange are closer to one another in size than originally portrayed. I also removed the percentage labels intentionally (technically bad practice) to emphasize how, even with the labels present in the first chart, our eyes automatically pay more attention to the more drastic visual differences. If you’re reading this article with an analytical eye, perhaps you think it doesn’t make much of a difference. But the fact is, you’ll often see such charts in the news or on social media, and a quick glance is all they’ll ever get.

It is crucial to be sure that the story told by that quick glance is a truthful one.
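Part of the distortion in a 3D pie is simple geometry: the slices nearest the viewer also show their thick outer edge, so they get extra ink that has nothing to do with the data. The sketch below estimates each slice’s share of the visible area under a deliberately simplified model (an assumption, not how any particular charting tool renders): an orthographic view from a fixed elevation, a unit-radius pie of a given thickness, and no perspective, which in a real rendering would enlarge the front slices further. The function and parameter names are hypothetical.

```python
import math


def visible_pie_shares(shares, elevation_deg=30.0, depth=0.25, start_deg=0.0):
    """Fraction of the total visible area each slice occupies in a tilted 3D pie.

    Simplified model (an assumption): orthographic view from `elevation_deg`
    above the pie's plane, unit radius, slab thickness `depth`, and `shares`
    summing to 1. Angles run counterclockwise from `start_deg`; the rim
    between 180 and 360 degrees faces the viewer, so only that part of the
    side wall is visible.
    """
    elev = math.radians(elevation_deg)
    angle = math.radians(start_deg % 360)
    # Front-facing stretches of the rim, covering one extra turn for wrap-around.
    front = [(math.pi, 2 * math.pi), (3 * math.pi, 4 * math.pi)]
    visible = []
    for share in shares:
        a, b = angle, angle + 2 * math.pi * share
        # Top face: the sector is squashed vertically by sin(elev),
        # which keeps it proportional to the slice's true share.
        top = 0.5 * (b - a) * math.sin(elev)
        # Side wall: projected width is the change in cos(theta) over the
        # visible arc; projected height is depth * cos(elev).
        wall = 0.0
        for lo, hi in front:
            x0, x1 = max(a, lo), min(b, hi)
            if x1 > x0:
                wall += depth * math.cos(elev) * (math.cos(x1) - math.cos(x0))
        visible.append(top + wall)
        angle = b
    total = sum(visible)
    return [v / total for v in visible]


shares = [0.5, 0.5]
for true, seen in zip(shares, visible_pie_shares(shares)):
    print(f"{true:.0%} of the data, {seen:.0%} of the visible area")
# 50% of the data, 39% of the visible area
# 50% of the data, 61% of the visible area
```

In this model, a perfectly even split reads as roughly 39/61 simply because one half faces the viewer, and rotating the pie (via `start_deg`) changes which slices benefit.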

Final Thoughts

Data science is often touted as the perfect synthesis of statistics, computing, and society, a way to obtain and share deep and meaningful insights about an information-heavy world. That is true, but as the capacity to widely share such insights expands, so must our general ability to interpret them accurately. In light of that, I hope you have found this primer helpful.

Stay tuned for Part 2, in which I’ll discuss a few deceptive techniques that are a bit more involved, including base proportions, (un)trustworthy statistical measures, and measures of correlation.

In the meantime, try not to get deceived.
