Data Visualization Explained (Part 3): The Role of Color

How many colors do you see in the image below?

Most people see four: white, green, and two different shades of pinkish-red. In fact, those two shades are exactly the same; there are only three colors in the image.

This popular optical illusion illustrates a crucial fact to consider when designing data visualizations: poorly chosen color combinations can trick the human eye. For a complete treatment of color, I would need to delve into the physiological details of the human eye and how we actually “see” color.

However, seeing as this isn’t an optometry article, I’ll instead focus on the fundamentals of color usage that you need to build clear data visualizations.

The Difference Between Color Hue and Color Value

When I introduced visual encoding channels in the previous article, I presented two different channels related to color: hue and value. Let us discuss these formally.

Color hue is what you generally think of when you hear the word “color.” Red, green, blue, pink, yellow, etc. are all different hues. Color value, on the other hand, refers to the “lightness” of an individual hue. The image below illustrates different values of the rainbow colors, showing how the same hue can vary greatly in lightness/saturation:

Image by Wikimedia Commons

While both of these can be effective visual encodings (see my previous article in this series for an in-depth discussion of visual encodings), color value has one notable advantage over hue: it can still be perceived if a visualization is printed in grayscale.
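To make the grayscale point concrete, here is a minimal sketch using only Python’s standard library. The specific colors and the choice of the Rec. 601 luma weights for the grayscale conversion are my own assumptions for illustration; real printers and image tools may convert differently.

```python
import colorsys


def to_gray(rgb):
    """Approximate gray level of an (r, g, b) color with channels in [0, 1]."""
    r, g, b = rgb
    # Rec. 601 luma weights: one common grayscale conversion (an assumption here).
    return 0.299 * r + 0.587 * g + 0.114 * b


# One hue (blue) at three increasing lightness levels, i.e. three color values.
blue_values = [colorsys.hls_to_rgb(2 / 3, lightness, 1.0) for lightness in (0.3, 0.5, 0.8)]
blue_grays = [to_gray(rgb) for rgb in blue_values]

# Two clearly different hues, hand-picked so that their gray levels nearly coincide.
red = (1.0, 0.0, 0.0)
green = (0.0, 0.51, 0.0)

print("Blue values in grayscale:", [round(g, 3) for g in blue_grays])
print("Red vs. green in grayscale:", round(to_gray(red), 3), round(to_gray(green), 3))
```

The three values of blue stay clearly ordered after conversion, while the red and the green, which are easy to tell apart in color, collapse to almost the same gray.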

Types of Color Scales

If you want to use color as a visual encoding, you must start by selecting a color scale. In doing so, there are a few characteristics you must consider:

If your data is nominal, then you should use a categorical color scale, which relies solely on color hue.
For quantitative data, you’ll need to make two additional decisions: 1) whether your scale will be sequential or divergent (i.e., whether it will use one or two hues), and 2) whether your scale will be continuous or divided into classes.

Thus, there are five color scales at our disposal, all of which we will discuss below: 1) sequential and unclassed, 2) sequential and classed, 3) divergent and unclassed, 4) divergent and classed, and 5) categorical [1].

Sequential scales (one hue) are useful for visualizing numerical values that go from low to high. Divergent scales can prove helpful when values go from negative to positive or when the designer wishes to emphasize some difference between the colors on the two ends of the scale.

Of course, these are only general rules. Different types of scales are best depending on the specific visualization, and sometimes more than one can work.
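The decision process above can be sketched as a small helper function. The function name and its parameters are hypothetical, invented purely to summarize the rules; as noted, these are general guidelines rather than a strict algorithm.

```python
def choose_color_scale(data_type, has_midpoint=False, classed=False):
    """Suggest one of the five color scale types.

    data_type: "nominal" or "quantitative" (hypothetical labels).
    has_midpoint: True if the values diverge around a meaningful center.
    classed: True if values should be grouped into discrete classes.
    """
    if data_type == "nominal":
        # Nominal data relies on hue alone.
        return "categorical"
    if data_type != "quantitative":
        raise ValueError(f"Unknown data type: {data_type!r}")
    direction = "divergent" if has_midpoint else "sequential"
    grouping = "classed" if classed else "unclassed"
    return f"{direction} and {grouping}"


print(choose_color_scale("nominal"))
print(choose_color_scale("quantitative"))
print(choose_color_scale("quantitative", has_midpoint=True, classed=True))
```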

Sequential and unclassed

The following map uses a sequential, unclassed color scale to illustrate the fraction of Australians who identified as Anglican at the time of the 2011 census. We can see that a single hue, green, increases in value from light to dark. Since there is only one hue, there is no divergence, and since the scale is continuous, there are no classes.

Image by Toby Hudson on Wikimedia Commons
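As a rough sketch of how a sequential, unclassed scale works, the function below maps a number onto a continuous blend between a light and a dark green. The RGB endpoints and the 0 to 50 percent range are assumptions chosen for illustration, not values taken from the map, and plain RGB interpolation is only an approximation of how plotting libraries build perceptually smooth scales.

```python
def sequential_color(value, vmin, vmax, light=(229, 245, 224), dark=(0, 68, 27)):
    """Map a value in [vmin, vmax] to an RGB color between light and dark."""
    # Normalize to [0, 1] and clamp values that fall outside the range.
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    # Linearly interpolate each RGB channel.
    return tuple(round(a + (b - a) * t) for a, b in zip(light, dark))


# Hypothetical percentages: every distinct value gets its own shade.
for pct in (0, 10, 25, 40, 50):
    print(pct, sequential_color(pct, 0, 50))
```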

Sequential and classed

In contrast to the visualization above, the map of the US below has discrete classes that vary the color value. It is still sequential, as only a pink hue is used. The color value increases as the share of adults in their early 20s within a county increases.

One noteworthy element of this visualization is the uneven nature of the classes. (Note the width of the largest category.) This isn’t always good practice, especially if no reason is given.

Image by Derek Montaño on Wikimedia Commons.
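A classed scale boils down to assigning each value to a bucket and giving every bucket one color. The sketch below uses the standard library’s `bisect` module for the lookup; the break points, the sample percentages, and the pink hex codes are all hypothetical and are not taken from the map above.

```python
from bisect import bisect_right

# Hypothetical light-to-dark pinks, one per class.
PINKS = ["#feebe2", "#fbb4b9", "#f768a1", "#ae017e"]


def classify(value, breaks):
    """Return the index of the class a value falls into, given sorted break points."""
    return bisect_right(breaks, value)


even_breaks = [10, 20, 30]  # classes of equal width (plus an open-ended top class)
uneven_breaks = [5, 7, 9]   # narrow lower classes, one very wide top class

shares = [3, 6, 8, 12, 20, 35]  # hypothetical county percentages
even_classes = [classify(s, even_breaks) for s in shares]
uneven_classes = [classify(s, uneven_breaks) for s in shares]

print("Even classes:  ", [PINKS[i] for i in even_classes])
print("Uneven classes:", [PINKS[i] for i in uneven_classes])
```

With the uneven breaks, half of the sample values land in the darkest class, which shows how the choice of class boundaries changes the story a map tells.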

Divergent, classed and unclassed

Divergent scales are a bit trickier to grasp, so let’s consider both types together in a comparative example. In doing so, we’ll also see the different benefits of classed and unclassed scales.

The two charts below were generated in Python using mock data. The data is represented through the following visual encoding channels:

The x-axis consists of a number representing store location.
The y-axis represents the months of the year.
The color represents a “customer satisfaction rating” collected by the fictional stores via monthly surveys.

The classed vs. unclassed aspect of these visualizations works much as it does in the sequential scales above. In the left (unclassed) chart, the full range of values is represented, whereas in the right (classed) one, colors represent grouped buckets of values. The left visualization provides more precision, but the right one is easier to interpret and apply.

The divergent aspect of these scales is more convoluted. Let’s break it down:

The divergent scale here uses two colors: red and green (not the most accessible colors in the world, as we will see later in the article).
The neutral white color (or the two light colors in the classed scale) represents a logical “middle point” in the data, which in this case is the value
This middle point is important, as it creates a situation where a divergent scale lends itself naturally to the data. It makes little sense to use more than one color if values only move in one direction with no meaningful center.
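The role of the middle point can be sketched in a few lines of Python. This is not the code used to produce the charts above; the 0 to 10 rating range, the midpoint of 5, and the RGB endpoints are all assumptions made for illustration.

```python
def diverging_color(value, vmin, vmid, vmax,
                    low=(215, 48, 39), mid=(255, 255, 255), high=(26, 152, 80)):
    """Map a value to red below the midpoint, white at it, and green above it."""
    if value <= vmid:
        # Lower half: blend from the low color toward the neutral midpoint.
        t = (value - vmin) / (vmid - vmin)
        start, end = low, mid
    else:
        # Upper half: blend from the neutral midpoint toward the high color.
        t = (value - vmid) / (vmax - vmid)
        start, end = mid, high
    t = min(max(t, 0.0), 1.0)
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


# Hypothetical satisfaction ratings on a 0 to 10 scale with a midpoint of 5.
for rating in (0, 2.5, 5, 7.5, 10):
    print(rating, diverging_color(rating, vmin=0, vmid=5, vmax=10))
```

Each half of the scale is interpolated separately, so the neutral color always lands exactly on the midpoint even when that midpoint is not halfway between the minimum and maximum.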

Categorical

The final, and arguably most straightforward, color scale type is the categorical one. The chart below, which shows government funding breakdowns across various countries, provides a clear example.

If you have been paying attention to the principles discussed in this series so far, you’ll probably notice that this isn’t a very well-designed data visualization. It gets the general point across, but there are a few too many different colors, resulting in a confusing final design.

That said, it’s an effective use of a categorical scale, appropriately applying this scale type to nominal data (data that has distinct, unordered categories). A common mistake in data visualization, and one you should take care to avoid, is using a categorical scale with several different hues when your data shows a clear numerical increase or decrease. In those situations, refer to one of the color scales discussed above, depending on your specific data.
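One simple way to build a categorical scale is to space hues evenly around the color wheel while holding lightness and saturation fixed. The sketch below does this with the standard library’s `colorsys` module; the category names and the lightness and saturation defaults are hypothetical.

```python
import colorsys


def categorical_palette(categories, lightness=0.55, saturation=0.65):
    """Assign each category an evenly spaced hue, returned as hex color strings."""
    n = len(categories)
    palette = {}
    for i, name in enumerate(categories):
        # Only the hue changes between categories; lightness and saturation stay fixed.
        r, g, b = colorsys.hls_to_rgb(i / n, lightness, saturation)
        palette[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette


# Hypothetical funding categories.
palette = categorical_palette(["Health", "Education", "Defense", "Other"])
for name, color in palette.items():
    print(name, color)
```

The more categories you pass in, the closer neighboring hues become, which is one reason charts with many categories quickly get confusing.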

That sums up the fundamentals of color scales that you need to know to engage in effective data visualization. To conclude, let’s look at a couple more tips for using color well.

(Don’t) Use Color Redundantly

It can be tempting to use color in a visualization when it isn’t needed. For example, it’s quite common to see bar graphs that have clear x-axis labels to distinguish the bars but also give each bar a different color.

This isn’t wrong, but it may be unnecessary. If there are only a few categories and they’re linked with other visualizations, by all means use color to provide an additional visual cue. However, if the visualization functions fine without it, then don’t force it.

In general, redundant encodings (representations) should be avoided unless they provide some additional ease of interpretation for the viewer. Otherwise, the redundancy is either wasteful, as that encoding channel could be used for a different variable, or confusing, as the viewer is left to wonder whether the extra encoding depicts something that’s going over their head.

Make Color Palettes Accessible

This last point is short, but incredibly important. Don’t assume that just because you can distinguish among the colors in a visualization, so can everyone else. Data visualizations should be accessible to everyone, including people who have various types of colorblindness [2].

For example, consider the Python visualizations in the section on divergent color scales above. Do you think someone with red-green color blindness will be able to interpret them correctly? Unlikely.

Luckily, we don’t need to do too much extra work to ensure our visualizations are accessible. There are many online tools [3, 4, 5] that automatically check the accessibility of your chosen color palettes. Some will even help you generate them. Take advantage of them to make your visualizations as accessible as possible.
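To give a sense of what a contrast checker computes, here is a minimal sketch of the WCAG 2.x contrast ratio between two colors. I am assuming the standard WCAG relative luminance formula; note that this measures lightness contrast only and does not simulate any particular type of colorblindness, so treat it as a first check rather than a full accessibility audit. The example red and green are hypothetical.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color with channels in 0..255."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb1, rgb2):
    """WCAG contrast ratio, from 1 (identical luminance) up to 21 (black on white)."""
    l1, l2 = relative_luminance(rgb1), relative_luminance(rgb2)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


print("Black vs. white:", round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
print("Red vs. green:  ", round(contrast_ratio((255, 0, 0), (0, 128, 0)), 2))
```

A low ratio between two colors that are supposed to mean opposite things is a warning sign, since the difference then rests almost entirely on hue.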

Final Thoughts

Congratulations! With the third article in this series, you have learned the essential principles you need to design compelling data visualizations. In the articles to come, we will finally start designing and building our own visualizations! Until then.

References

[1] https://blog.datawrapper.de/which-color-scale-to-use-in-data-vis/
[2] https://www.nei.nih.gov/learn-about-eye-health/eye-conditions-and-diseases/color-blindness/types-color-vision-deficiency
[3] https://coolors.co/contrast-checker/112a46-acc8e5
[4] https://webaim.org/resources/contrastchecker/
[5] https://accessibleweb.com/color-contrast-checker/
