Do European M&Ms Actually Taste Better than American M&Ms?



1. Introduction

1.1 Background and motivation

Chocolate is enjoyed all over the world. From ancient practices of harvesting organic cacao in the Amazon basin, to chocolatiers sculpting edible art in the mountains of Switzerland, to massive factories in Hershey, Pennsylvania churning out 70 million kisses per day, the nuanced forms and flavors of chocolate have been integrated into many cultures and their customs. While quality can vary greatly across chocolate products, one widely known, shelf-stable, easily shareable form of chocolate is M&Ms. Readily found by convenience store check-out counters and in hotel vending machines, the brightly colored pellets are a popular treat whose packaging is re-branded to suit nearly any commercializable American holiday.

While living in Denmark in 2022, I heard a concerning claim: M&Ms manufactured in Europe taste different, and arguably “better,” than M&Ms produced in the United States. While I recognized that fancy European chocolate is indeed quite tasty and often superior to American chocolate, it was unclear to me whether the same claim should hold for M&Ms. I learned that many Europeans perceive an “unpleasant” or “tangy” taste in American chocolate, which is largely attributed to butyric acid, a compound resulting from differences in how milk is treated before incorporation into milk chocolate.

But truthfully, how much of a difference could this make for M&Ms? I imagined M&Ms would retain a relatively processed/mass-produced/cheap candy flavor wherever they were manufactured. As the lone American visiting a diverse lab of international scientists pursuing cutting-edge research in biosustainability, I was inspired to break out my data science toolbox and investigate this M&M flavor phenomenon.

1.2 Previous work

To quote a European woman, who shall remain anonymous, after she tasted an American M&M while traveling in New York:

“They taste so gross. Like vomit. I don’t understand how people can eat this. I threw the rest of the bag away.”

Vomit? Really? In my experience, children raised in the United States had no qualms about eating M&Ms. Growing up, I was accustomed to bowls of M&Ms strategically placed in high-traffic areas around my house to provide readily available sugar. Clearly American M&Ms are edible. But are they significantly different from and/or inferior to their European equivalent?

In response to the anonymous European woman’s scathing report, two other Americans visiting Denmark and I sampled M&Ms purchased locally at the Lyngby Storcenter Føtex. We hoped to experience the incredible improvement in M&M flavor that was apparently hidden from us throughout our youths. But curiously, we detected no obvious flavor improvements.

Unfortunately, neither preliminary study was able to conduct a side-by-side taste test with proper controls and randomized M&M sampling. Thus, we turn to science.

1.3 Study Goals

This study seeks to remedy the previous lack of thoroughness and investigate the following questions:

  1. Is there a global consensus that European M&Ms are in fact better than American M&Ms?
  2. Can Europeans actually detect a difference between M&Ms purchased in the US vs in Europe when they don’t know which one they’re eating? Or is this a grand, coordinated lie among Europeans to make Americans feel embarrassed?
  3. Are Americans actually taste-blind to American vs European M&Ms? Or can they taste a difference but simply not describe this difference as “an improvement” in flavor?
  4. Can these alleged taste differences be perceived by residents of other continents? If so, do they find one flavor obviously superior?

2. Methods

2.1 Experimental design and data collection

Participants were recruited by luring (er, inviting) them to a social gathering (with the promise of free food) that was conveniently co-located with the testing site. Once a participant agreed to pause socializing and join the study, they were positioned at a testing station with a trained experimenter who guided them through the following steps:

  • Participants sat at a table and received two cups: 1 empty and 1 filled with water. With one cup in each hand, the participant was asked to close their eyes and keep them closed through the rest of the experiment.
  • The experimenter randomly extracted one M&M with a spoon, delivered it to the participant’s empty cup, and the participant was asked to eat the M&M (eyes still closed).
  • After the participant ate each M&M, the experimenter collected the taste response by asking the participant to report whether they thought the M&M tasted: Especially Good, Especially Bad, or Normal.
  • Each participant received a total of 10 M&Ms (5 European, 5 American), one at a time, in a random sequence determined by random.org.
  • Between eating each M&M, the participant was asked to take a sip of water to help “cleanse their palate.”
  • Data collected: for each participant, the experimenter recorded the participant’s continent of origin (if this was ambiguous, the participant was asked to list the continent on which they have the strongest memories of eating candy as a child). For each of the 10 M&Ms delivered, the experimenter recorded the M&M origin (“Denmark” or “USA”), the M&M color, and the participant’s taste response. Experimenters were also encouraged to jot down any amusing phrases uttered by the participant during the test, recorded under notes (data available here).
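The study drew its tasting order from random.org. As a rough local stand-in, a balanced random sequence of 5 Denmark and 5 USA M&Ms could be generated as sketched below. The function name `make_tasting_sequence` and the `seed` parameter are illustrative assumptions, not part of the original protocol.

```python
import random


def make_tasting_sequence(n_per_type=5, seed=None):
    """Return a shuffled list with n_per_type 'Denmark' and n_per_type 'USA' entries.

    Hypothetical helper: the study used random.org, not Python's random module.
    """
    rng = random.Random(seed)  # a seed makes the order reproducible for record keeping
    sequence = ["Denmark"] * n_per_type + ["USA"] * n_per_type
    rng.shuffle(sequence)  # in-place shuffle; every ordering is equally likely
    return sequence


sequence = make_tasting_sequence(seed=2022)
print(sequence)
```

Shuffling a fixed pool, rather than drawing each M&M independently, guarantees that every participant tastes exactly five of each type.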

2.2 Sourcing materials and recruiting participants

Two bags of M&Ms were purchased for this study. The American-sourced M&Ms (“USA M&M”) were acquired at the SFO airport and delivered by the author’s parents, who visited her in Denmark. The European-sourced M&Ms (“Denmark M&M”) were purchased at a local Føtex grocery store in Lyngby, just north of Copenhagen.

Experiments were conducted at two main time points. The first 14 participants were tested in Lyngby, Denmark in August 2022. They mostly consisted of friends and housemates the author met at the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark (DTU) who came to a “going away party” into which the experimental procedure was inserted. A few additional family members and friends who visited Denmark were also tested during their travels (e.g. on the train).

The remaining 37 participants were tested in Seattle, WA, USA in October 2022, primarily during a “TGIF happy hour” hosted by graduate students in the computer science PhD program at the University of Washington. This second batch mostly consisted of students and staff of the Paul G. Allen School of Computer Science & Engineering (UW CSE) who responded to the weekly Friday summoning to the Allen Center atrium for free snacks and drinks.

While this study set out to investigate global trends, unfortunately data were only collected from the 51 participants the author was able to lure to the study sites, and the sample is neither well-balanced nor representative of the 6 inhabited continents of Earth (Figure 1). We hope to improve our recruitment tactics in future work. For now, our analytical power with this dataset is limited to response trends for individuals from North America, Europe, and Asia, highly biased by the subcommunities the author happened to interact with in late 2022.

2.3 Risks

While we did not acquire formal approval for experimentation with human test subjects, there were minor risks associated with this experiment: participants were warned that they may be subjected to increased levels of sugar and possible “unpleasant flavors” as a result of participating in this study. No other risks were anticipated.

After the experiment, however, we unfortunately observed several cases of deflated pride when a participant learned their taste response was skewed more positively towards the M&M type they were not expecting. This pride deflation seemed most severe among European participants who learned their own or their fiancé’s preference skewed towards USA M&Ms, though this was not quantitatively measured and cannot be confirmed beyond anecdotal evidence.

3. Results & Discussion

3.1 Overall response to “USA M&Ms” vs “Denmark M&Ms”

3.1.1 Categorical response analysis: entire dataset

In our first analysis, we count the total number of “Bad”, “Normal”, and “Good” taste responses and report the percentage of each response received by each M&M type. M&Ms from Denmark more frequently received “Good” responses than USA M&Ms but also more frequently received “Bad” responses. M&Ms from the USA were most frequently reported to taste “Normal” (Figure 2). This may result from the larger number of participants hailing from North America, where the USA M&M is the default and thus more “Normal,” while the Denmark M&M was more often perceived as better or worse than the baseline.

Figure 2. Qualitative taste response distribution across the whole dataset. The percentage of taste responses for “Bad”, “Normal” or “Good” was calculated for each type of M&M. Figure made with Altair.

Now let’s break out some statistics, such as a chi-squared (χ²) test to compare our observed distributions of categorical taste responses. Using the scipy.stats chi2_contingency function, we built contingency tables of the observed counts of “Good,” “Normal,” and “Bad” responses to each M&M type. Using the χ² test to evaluate the null hypothesis that there is no difference between the two M&Ms, we found the p-value for the test statistic to be 0.0185, which is significant at the common p-value cutoff of 0.05, but not at 0.01. So a solid “maybe,” depending on whether you’d like this result to be significant or not.
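For readers who want to see the shape of this test, here is a minimal sketch of a chi2_contingency call on a 2 x 3 contingency table. The counts below are made-up placeholders (they only sum to 255 tastings per M&M type, i.e. 51 participants times 5), not the study's actual data, so the printed p-value will not match the 0.0185 reported above.

```python
from scipy.stats import chi2_contingency

# HYPOTHETICAL counts for illustration only; not the study's data.
# Rows are M&M types, columns are the "Bad", "Normal", "Good" response counts.
observed = [
    [60, 110, 85],  # Denmark M&Ms
    [55, 140, 60],  # USA M&Ms
]

# Null hypothesis: the response distribution does not depend on M&M type.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```

With two rows and three columns the test has (2 - 1) x (3 - 1) = 2 degrees of freedom.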

3.1.2 Quantitative response analysis: entire dataset

The χ² test helps evaluate whether there is a difference in categorical responses, but next, we want to determine a relative taste ranking between the two M&M types. To do this, we converted taste responses to a quantitative distribution and calculated a taste score. Briefly, “Bad” = 1, “Normal” = 2, “Good” = 3. For each participant, we averaged the taste scores across the 5 M&Ms they tasted of each type, maintaining separate taste scores for each M&M type.
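The scoring step can be sketched in a few lines of standard-library Python. The mapping (“Bad” = 1, “Normal” = 2, “Good” = 3) comes from the text; the record layout, the helper name `average_scores`, and the example responses are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Score mapping as described in the text.
SCORE = {"Bad": 1, "Normal": 2, "Good": 3}


def average_scores(responses):
    """Average the numeric taste scores per (participant, M&M type).

    `responses` is an iterable of (participant_id, mnm_type, response) tuples;
    this layout is a hypothetical one chosen for the example.
    """
    grouped = defaultdict(list)
    for participant_id, mnm_type, response in responses:
        grouped[(participant_id, mnm_type)].append(SCORE[response])
    return {key: mean(scores) for key, scores in grouped.items()}


# Made-up responses for a single participant (5 M&Ms of each type).
responses = [
    (1, "USA", "Normal"), (1, "USA", "Good"), (1, "USA", "Normal"),
    (1, "USA", "Bad"), (1, "USA", "Normal"),
    (1, "Denmark", "Good"), (1, "Denmark", "Good"), (1, "Denmark", "Normal"),
    (1, "Denmark", "Bad"), (1, "Denmark", "Good"),
]

scores = average_scores(responses)
print(scores[(1, "USA")], scores[(1, "Denmark")])
```

Keeping one average per participant and M&M type, rather than pooling all tastings, is what lets the later analyses compare participants to each other.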

Figure 3. Quantitative taste score distributions across the whole dataset. Kernel density estimation of the average taste score calculated for each participant for each M&M type. Figure made with Seaborn.

With the average taste score for each M&M type in hand, we turn to scipy.stats ttest_ind (“T-test”) to evaluate whether the means of the USA and Denmark M&M taste scores are different (the null hypothesis being that the means are identical). If the means are significantly different, it would provide evidence that one M&M is perceived as significantly tastier than the other.

We found the average taste scores for USA M&Ms and Denmark M&Ms to be quite close (Figure 3), and not significantly different (T-test: p = 0.721). Thus, across all participants, we do not observe a difference between the perceived taste of the two M&M types (or if you enjoy parsing triple negatives: “we cannot reject the null hypothesis that there is not a difference”).
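As a sketch of the comparison described above, the snippet below calls scipy.stats ttest_ind on two lists of per-participant average scores. The score lists are invented placeholders for eight imaginary participants, so the printed values are illustrative only and will not reproduce p = 0.721.

```python
from scipy.stats import ttest_ind

# HYPOTHETICAL per-participant average taste scores (1 = Bad ... 3 = Good).
usa_scores = [2.0, 2.2, 1.8, 2.4, 1.6, 2.0, 2.6, 1.4]
denmark_scores = [2.2, 1.8, 2.4, 1.6, 2.8, 2.0, 1.2, 2.4]

# Null hypothesis: the two groups of scores have the same mean.
t_stat, p_value = ttest_ind(usa_scores, denmark_scores)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

A large p-value means the data are compatible with equal means; it does not prove that the two M&M types taste the same.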

But does this change if we separate participants by continent of origin?

3.2 Continent-specific responses to “USA M&Ms” vs “Denmark M&Ms”

We repeated the above χ² and T-test analyses after grouping participants by their continents of origin. The Australia and South America groups were combined as a minimal attempt to preserve data privacy. Due to the relatively small sample size of even the combined Australia/South America group (N = 3), we will refrain from analyzing trends for this group but include the data in several figures for completeness and for the enjoyment of the participants who may eventually read this.

3.2.1 Categorical response analysis: by continent

In Figure 4, we display both the taste response counts (upper panel) and the response percentages (lower panel) for each continent group. Both North America and Asia follow a similar trend to the whole population dataset: participants report Denmark M&Ms as “Good” more frequently than USA M&Ms, but also report Denmark M&Ms as “Bad” more frequently. USA M&Ms were most frequently reported as “Normal” (Figure 4).

In contrast, European participants report USA M&Ms as “Bad” nearly 50% of the time and “Good” only 18% of the time, which is the most negative and least positive response pattern, respectively (when excluding the under-sampled Australia/South America group).

Figure 4. Qualitative taste response distribution by continent. Upper panel: counts of taste responses (click the legend to interactively filter!). Lower panel: percentage of taste responses for each type of M&M. Figure made with Altair.

This appeared striking in bar chart form; however, only North America had a significant χ² p-value (p = 0.0058) when evaluating each continent’s difference in taste response profile between the two M&M types. The European p-value is perhaps “approaching significance” in some circles, but we are about to accumulate several more hypothesis tests and should be mindful of multiple hypothesis testing (Table 1). A false positive result here would be devastating.

When comparing the taste response profiles between two continents for the same M&M type, there are a couple of interesting notes. First, we observed no major taste discrepancies between any pairs of continents when evaluating Denmark M&Ms: the world seems generally consistent in its range of feelings about M&Ms sourced from Europe (right column χ² p-values, Table 2). To visualize this comparison more easily, we reorganize the bars in Figure 4 to group them by M&M type (Figure 5).

Figure 5. Qualitative taste response distribution by M&M type, reported as percentages. (Same data as Figure 4 but re-arranged). Figure made with Altair.

However, when comparing continents to each other in response to USA M&Ms, we see larger discrepancies. We found one pairing to be significantly different: European and North American participants evaluated USA M&Ms very differently (p = 0.000007) (Table 2). It seems very unlikely that this observed difference is due to random chance (left column, Table 2).

3.2.2 Quantitative response analysis: by continent

We again convert the categorical profiles to quantitative distributions to assess continents’ relative preference of M&M types. For North America, we see that the taste score means of the two M&M types are actually quite similar, but there is a higher density around “Normal” scores for USA M&Ms (Figure 6A). The European distributions maintain a bit more of a separation in their means (though not quite significantly so), with USA M&Ms scoring lower (Figure 6B). The taste score distributions of Asian participants are the most similar (Figure 6C).

Reorienting to compare the quantitative means between continents’ taste scores for the same M&M type, only the comparison between North American and European participants on USA M&Ms is significantly different based on a T-test (p = 0.001) (Figure 6D), though now we really are at risk of multiple hypothesis testing! Be cautious if you are taking this analysis at all seriously.

Figure 6. Quantitative taste score distributions by continent. Kernel density estimation of the average taste score calculated for each continent for each M&M type. A. Comparison of North America responses to each M&M. B. Comparison of Europe responses to each M&M. C. Comparison of Asia responses to each M&M. D. Comparison of continents for USA M&Ms. E. Comparison of continents for Denmark M&Ms. Figure made with Seaborn.

At this point, I find myself considering that perhaps Europeans are not just making this up. I’m not saying it’s as dramatic as some of them claim, but perhaps a difference does indeed exist… To some degree, North American participants also perceive a difference, but their evaluation of Europe-sourced M&Ms is not consistently positive or negative.

3.3 M&M taste alignment chart

In our analyses so far, we did not account for the baseline differences in M&M appreciation between participants. For example, say Person 1 scored all Denmark M&Ms as “Good” and all USA M&Ms as “Normal”, while Person 2 scored all Denmark M&Ms as “Normal” and all USA M&Ms as “Bad.” They would have the same relative preference for Denmark M&Ms over USA M&Ms, but Person 2 perhaps just doesn’t enjoy M&Ms as much as Person 1, and the relative preference signal is muddled by averaging the raw scores.

Inspired by the Lawful/Chaotic x Good/Evil alignment chart used in tabletop role playing games like Dungeons & Dragons©™, in Figure 7, we establish an M&M alignment chart to help determine the distribution of participants across M&M enjoyment classes.

Figure 7. M&M enjoyment alignment chart. The x-axis represents a participant’s average taste score for USA M&Ms; the y-axis is a participant’s average taste score for Denmark M&Ms. Figure made with Altair.

Notably, the upper right quadrant, where both M&M types are perceived as “Good” to “Normal”, is mostly occupied by North American participants and a few Asian participants. All European participants land in the left half of the figure, where USA M&Ms are “Normal” to “Bad”, but Europeans are somewhat split between the upper and lower halves, where perceptions of Denmark M&Ms range from “Good” to “Bad.”

An interactive version of Figure 7 is provided below for the reader to explore the counts of various M&M alignment regions.

Figure 7 (interactive): click and brush your mouse over the scatter plot to see the counts of continents in different M&M enjoyment regions. Figure made with Altair.

3.4 Participant taste response ratio

Next, to factor out baseline M&M enjoyment and focus on participants’ relative preference between the two M&M types, we took the log ratio of each person’s USA M&M taste score average divided by their Denmark M&M taste score average.

Equation 1: preference ratio = log(USA M&M taste score average / Denmark M&M taste score average). Equation to calculate each participant’s overall M&M preference ratio.

As such, positive scores indicate a preference towards USA M&Ms while negative scores indicate a preference towards Denmark M&Ms.
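The preference ratio is simple to compute once each participant has one average score per M&M type. The sketch below assumes a natural logarithm, since the text does not state the base; the base only rescales the values and does not change their sign. The function name `preference_ratio` is hypothetical.

```python
import math


def preference_ratio(usa_avg, denmark_avg):
    """Log ratio of a participant's average USA score to their average Denmark score.

    Assumes a natural log (the base is not specified in the text). Average
    scores lie between 1 and 3, so the ratio is always positive and defined.
    """
    return math.log(usa_avg / denmark_avg)


# Person 1 and Person 2 from the alignment-chart example above.
person_1 = preference_ratio(usa_avg=2.0, denmark_avg=3.0)  # USA "Normal", Denmark "Good"
person_2 = preference_ratio(usa_avg=1.0, denmark_avg=2.0)  # USA "Bad", Denmark "Normal"
neutral = preference_ratio(usa_avg=2.0, denmark_avg=2.0)   # no preference

print(person_1, person_2, neutral)
```

Both example participants come out negative, matching their preference for Denmark M&Ms, even though their raw scores sit at different baselines.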

On average, European participants had the strongest preference towards Denmark M&Ms, with Asians also exhibiting a slight preference towards Denmark M&Ms (Figure 8). To the two Europeans who exhibited deflated pride upon learning their slight preference towards USA M&Ms, fear not: you did not think USA M&Ms were “Good,” but simply ranked them as less bad than Denmark M&Ms (see participant_id 4 and 17 in the interactive version of Figure 7). If you assert that M&Ms are a bad American invention not worth replicating and return to consuming artisanal European chocolate, your honor can likely be restored.

Figure 8. Distribution of participant M&M preference ratios by continent. Preference ratios are calculated as in Equation 1. Positive numbers indicate a relative preference for USA M&Ms, while negative indicate a relative preference for Denmark M&Ms. Figure made with Seaborn.

North American participants are pretty split in their preference ratios: some fall quite neutrally around 0, others strongly prefer the familiar USA M&M, while a handful moderately prefer Denmark M&Ms. Anecdotally, North Americans who learned their preference skewed towards European M&Ms displayed signs of inflated pride, as if their results signaled posh refinement.

Overall, a T-test comparing the distributions of M&M preference ratios shows a possibly significant difference in the means between European and North American participants (p = 0.049), but come on, this is like the 20th p-value I’ve reported. This one is probably too close to call.
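To make the multiple-testing worry concrete, here is a sketch of a Bonferroni correction applied to the six p-values quoted in the text of this article. Bonferroni is one common correction chosen here for illustration; the article does not say which correction, if any, was used. The study ran more tests than these six (the “20th p-value” quip suggests roughly 20), so the `n_tests=20` call is a rough assumption rather than an exact count.

```python
# p-values quoted in the text; the labels are paraphrased descriptions.
reported = {
    "chi2, all participants": 0.0185,
    "t-test, all participants": 0.721,
    "chi2, North America, USA vs Denmark M&Ms": 0.0058,
    "chi2, Europe vs North America on USA M&Ms": 0.000007,
    "t-test, Europe vs North America on USA M&Ms": 0.001,
    "t-test, preference ratios, Europe vs North America": 0.049,
}


def bonferroni_survivors(p_values, alpha=0.05, n_tests=None):
    """Return the labels whose p-value stays below alpha divided by the test count.

    If n_tests is None, only the tests passed in are counted.
    """
    m = n_tests if n_tests is not None else len(p_values)
    threshold = alpha / m
    return [label for label, p in p_values.items() if p < threshold]


print(bonferroni_survivors(reported))              # counting only these six tests
print(bonferroni_survivors(reported, n_tests=20))  # assuming roughly 20 tests in total
```

Under either assumption, the Europe vs North America comparisons on USA M&Ms survive the correction, while the borderline p = 0.049 result does not.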

3.5 Taste inconsistency and “Perfect Classifiers”

For each participant, we assessed their taste score consistency by averaging the standard deviations of their responses to each M&M type, and plotting that against their preference ratio (Figure 9).

Figure 9. Participant taste consistency by preference ratio. The x-axis is a participant’s relative M&M preference ratio. The y-axis is the average of the standard deviation of their USA M&M scores and the standard deviation of their Denmark M&M scores. A value of 0 on the y-axis indicates perfect consistency in responses, while higher values indicate more inconsistent responses. Figure made with Altair.
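The consistency measure on the y-axis can be sketched as follows. Whether the original analysis used the population or the sample standard deviation is not stated; this example assumes the population form (`statistics.pstdev`), and the function name `inconsistency` is hypothetical.

```python
from statistics import mean, pstdev


def inconsistency(usa_scores, denmark_scores):
    """Average of the per-type standard deviations of one participant's scores.

    Assumes population standard deviation; 0.0 means the participant gave the
    same response to every M&M within each type.
    """
    return mean([pstdev(usa_scores), pstdev(denmark_scores)])


# A "perfect classifier": all USA M&Ms "Normal" (2), all Denmark M&Ms "Good" (3).
perfect = inconsistency([2, 2, 2, 2, 2], [3, 3, 3, 3, 3])

# A made-up inconsistent participant who used all three responses for USA M&Ms.
mixed = inconsistency([1, 2, 3, 2, 1], [2, 2, 3, 2, 2])

print(perfect, mixed)
```

Note that a score of 0.0 only says the responses were uniform within each type; a participant who rated all 10 M&Ms “Normal” also scores 0.0.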

Most participants were somewhat inconsistent in their ratings, scoring the same M&M type differently across the 5 samples. This would be expected if the taste difference between European-sourced and American-sourced M&Ms is not actually all that perceptible. Most inconsistent were participants who gave the same M&M type “Good”, “Normal”, and “Bad” responses (e.g., points high on the y-axis, with wider standard deviations of taste scores), indicating lower taste perception abilities.

Intriguingly, four participants (one from each continent group) were perfectly consistent: they reported the same taste response for each of the 5 M&Ms from each M&M type, resulting in an average standard deviation of 0.0 (bottom of Figure 9). Excluding the one of the four who simply rated all 10 M&Ms as “Normal”, the other three appeared to be “Perfect Classifiers”, either rating all M&Ms of one type “Good” and the other “Normal”, or rating all M&Ms of one type “Normal” and the other “Bad.” Perhaps these folks are “super tasters.”

3.6 M&M color

Another possible explanation for the inconsistency in individual taste responses is that there exists a perceptible taste difference based on the M&M color. Visually, the USA M&Ms were noticeably more smooth and vibrant than the Denmark M&Ms, which were somewhat more “splotchy” in appearance (Figure 10A). M&M color was recorded during the experiment, and although balanced sampling was not formally built into the experimental design, colors appeared to be sampled roughly evenly, except for Blue USA M&Ms, which were oversampled (Figure 10B).

Figure 10. M&M colors. A. Photo of each M&M color of each type. It’s perhaps a bit hard to perceive on screen in my unprofessionally lit photo, but with the naked eye, USA M&Ms appeared to be brighter and more uniformly colored while Denmark M&Ms have a duller and more mottled color. Is it just me, or can you already hear the Europeans saying “They’re brighter because of all those extra chemicals you put in your food that we ban here!” B. Distribution of M&Ms of each color sampled over the course of the experiment. The Blue USA M&Ms were not intentionally oversampled; they must be especially shiny/tempting to experimenters. Figure made with Altair.

We briefly visualized possible differences in taste responses based on color (Figure 11); however, we do not believe there are enough data to support firm conclusions. After all, on average each participant would likely only taste 5 of the 6 M&M colors once, and 1 color not at all. We leave further M&M color investigations to future work.

Figure 11. Taste response profiles for M&Ms of each color and type. Profiles are reported as percentages of “Bad”, “Normal”, and “Good” responses, though not all M&Ms were sampled exactly evenly. Figure made with Altair.

3.7 Colorful commentary

We assured each participant that there was no “right” answer in this experiment and that all feelings are valid. While some participants took this to heart and sometimes spent over a minute deeply savoring each M&M and evaluating it as if they were a sommelier, many participants appeared to view the experiment as a competition (which occasionally led to deflated or inflated pride). Experimenters wrote down quotes and notes along with M&M responses, some of which were a bit “colorful.” We provide a hastily rendered word cloud for each M&M type for entertainment purposes (Figure 12), though we caution against reading too far into them without diligent sentiment analysis.

Figure 12. A simple word cloud generated from the notes column of each M&M type. Fair warning: these have not been properly analyzed for sentiment and some inappropriate language was recorded. Figure made with WordCloud.

4. Conclusion

Overall, there does not appear to be a “global consensus” that European M&Ms are better than American M&Ms. However, European participants tended to more strongly express negative reactions to USA M&Ms, while North American participants seemed relatively split on whether they preferred M&Ms sourced from the USA vs from Europe. The preference trends of Asian participants often fell somewhere between the North Americans and Europeans.

Therefore, I’ll admit that it’s probable that Europeans are not engaged in a grand coordinated lie about M&Ms. The skew of most European participants towards Denmark M&Ms is compelling, especially since I was the experimenter who personally collected much of the taste response data. If they found a way to cheat, it was done well enough to exceed my own passive perception such that I didn’t notice. However, based on this study, it would appear that a strongly negative “vomit flavor” is not universally perceived and does not become apparent to non-Europeans when tasting both M&M types side by side.

We hope this study has been illuminating! We look forward to extensions of this work with improved participant sampling, additional M&M types sourced from other continents, and deeper investigations into possible taste differences due to color.

Thank you to everyone who participated and ate M&Ms in the name of science!

Figures and analysis can be found on GitHub: https://github.com/erinhwilson/mnm-taste-test
