One cannot ignore the fact that AI models such as ChatGPT have taken over the internet, finding their way into every corner of it.
Many of AI's applications are genuinely useful for a wide range of tasks (in healthcare, engineering, computer vision, education, etc.), and there's no reason why we shouldn't invest our money and time in their development.
That's not the case for Generative AI (GenAI), which is what I'll specifically be referring to in this text. This includes LLMs and RAG systems such as ChatGPT, Claude, Gemini, Llama, and other models. It's crucial to be very specific about what we call AI, which models we use, and what their environmental impacts are.
So, is AI taking over the world? Does it have an IQ of 120? Can it think faster and better than a human?
AI hype is the generalized societal excitement around AI, specifically transformer (GPT-like) models. It has infiltrated every sector — healthcare, IT, economics, art — and every level of the production chain. In fact, a whopping 43% of executives and CEOs already use Generative AI to inform strategic decisions [2]. The following articles relate tech layoffs to AI usage in FAANG and other big companies [3, 4, 5].
AI hype's effects can also be seen in the stock market. The case of NVIDIA Corp is a clear example: since NVIDIA produces key hardware components (GPUs) used to train AI models, its stock value has risen spectacularly (arguably reflecting not actual company growth, but perceived importance).
Humans have always been resistant to adopting new technologies, especially those they don't fully understand. It's a scary step to take. Every breakthrough feels like a "bet" against the unknown — and so we fear it. Most of us don't switch over to the new thing until we're sure its utility and safety justify the risk. Well, that's until something overrides our instincts, something just as rooted in emotion as fear: hype.
Generative AI has a great deal of problems, most of them virtually unsolvable. A few examples are model hallucinations (how many r's in strawberry? [6]), the inability to self-assess (models can't tell whether they're doing a task correctly or not [7]), and others, like security vulnerabilities.
When we take ethics into consideration, things don't get any better. AI opens a huge array of cans of worms: copyright, privacy, environmental and economic issues. A brief summary, to avoid exceeding this article's length:
AI is trained with stolen data: Most, if not the vast majority, of the content used for training is stolen. In the midst of our society's reckoning with the boundaries of authorship protection and fair use, the panic ignited by AI could do as much damage as the thievery itself. The Smithsonian [8], The Atlantic [9], IBM [10], and Nature [11] are all talking about it.
Perpetuation of economic inequalities: Very large, low-return investments made by CEOs usually bounce back onto the working class through massive layoffs, lower salaries, or worse working conditions. This perpetuates social and economic inequalities, and only serves the purpose of keeping the AI hype bubble alive [12].
Contributing to the environmental crisis: Earth.org's study [13] claims that ChatGPT-3 (175B parameters) used 700,000 litres of freshwater for its training and consumes half a litre of water per average conversation with a user. Linearly extrapolating the study to ChatGPT-4 (around 1.8 trillion parameters), roughly 7 million litres of water would have been used for training, and about 5 litres of water are consumed per conversation (a rough sketch of this extrapolation follows below).
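For transparency, here is a minimal sketch of that back-of-the-envelope calculation. It assumes, as the extrapolation above does, that water consumption scales linearly with parameter count, which is a crude assumption, since real consumption also depends on hardware, data-centre cooling, and training time.

```python
# Back-of-the-envelope extrapolation of ChatGPT's water footprint.
# Assumption: water use scales linearly with parameter count (crude at best).

GPT3_PARAMS = 175e9               # parameters of ChatGPT-3, per [13]
GPT4_PARAMS = 1.8e12              # commonly cited estimate for ChatGPT-4
GPT3_TRAINING_WATER_L = 700_000   # litres of freshwater for training, per [13]
GPT3_WATER_PER_CHAT_L = 0.5       # litres per average conversation, per [13]

scale = GPT4_PARAMS / GPT3_PARAMS  # roughly a 10x increase in parameters

training_estimate = GPT3_TRAINING_WATER_L * scale
per_chat_estimate = GPT3_WATER_PER_CHAT_L * scale

print(f"Estimated training water: {training_estimate / 1e6:.1f} million litres")
print(f"Estimated water per conversation: {per_chat_estimate:.1f} litres")
# Prints ~7.2 million litres for training and ~5.1 litres per conversation,
# in line with the figures quoted above.
```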
A recent study by Maxim Lott [14], titled (sic) "Massive Breakthrough in AI intelligence: OpenAI passes IQ 120" [15] and published in his 6,000+ subscriber newsletter, showed promising results when evaluating AI with an IQ test. The new OpenAI o1 achieved a 120 IQ score, leaving a huge gap between itself and the next-best models (Claude-3 Opus, GPT-4 Omni and Claude-3.5 Sonnet, which each scored just above 90 IQ).
These are the averaged results of seven IQ tests. For context, an IQ of 120 would place OpenAI's model among the top 10% of humans in terms of intelligence.
What's the catch? Is this it? Have we already programmed a model (noticeably) smarter than the average human? Has the machine surpassed its creator?
The catch is, as always, the training set. Maxim Lott claims that the test questions weren't in the training set, or, at the very least, that whether they were in there or not wasn't relevant [15]. It's notable that when he evaluates the models with an allegedly private, unpublished (but calibrated) test, the IQ scores get absolutely demolished:
Why does this occur?
This happens because the models hold the contents of their training data set, and by looking up the question they're being asked, they can produce the answer without having to "think" about it.
Think of it as if, before an exam, a human was given both the questions and the answers, and only had to memorize each question-answer pair. You wouldn't say they're intelligent for getting 100%, right?
On top of that, the vision models perform terribly in both tests, with a calculated IQ between 50 and 67. Their scores are consistent with an agent answering at random, which in Mensa Norway's test would lead to 1 out of 6 questions being answered correctly. Extrapolating from M. Lott's observations and from how actual tests like the WAIS-IV work, if 25/35 corresponds to an IQ of 120, then 17.5/35 would correspond to IQ 100, 9/35 would be just above 80 IQ, and answering at random (~6/35 correct) would score around 69–70 IQ.
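To make that mapping concrete, here is a minimal sketch of the linear raw-score-to-IQ interpolation implied above. The anchor points (25/35 at IQ 120, 17.5/35 at IQ 100) and the purely linear shape are assumptions for illustration, not Mensa Norway's official norming table, which is why the figure for 9/35 comes out a little lower than the WAIS-style estimate quoted above.

```python
# Sketch of the raw-score-to-IQ interpolation discussed above.
# Anchors are illustrative assumptions, not Mensa Norway's official norms:
#   25/35 correct   -> IQ 120 (the score reported for OpenAI o1)
#   17.5/35 correct -> IQ 100

N_QUESTIONS = 35
N_OPTIONS = 6
chance_level = N_QUESTIONS / N_OPTIONS  # ~5.8 correct expected when guessing

def iq_from_raw(raw: float) -> float:
    """Linear map through the two assumed anchor points."""
    slope = (120 - 100) / (25 - 17.5)   # IQ points per extra correct answer
    return 120 + slope * (raw - 25)

for raw in (25, 17.5, 9, chance_level):
    print(f"{raw:5.1f} / 35 correct  ->  IQ ~ {iq_from_raw(raw):5.1f}")
# 25 -> 120, 17.5 -> 100, 9 -> ~77, random guessing (~5.8) -> ~69
```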
Not only that, but the models' rationale for most questions seems, at best, significantly off, or plainly wrong. The models appear to find non-existent patterns, or generate pre-written, reused answers to justify their choices.
Moreover, even while claiming that the test was offline-only, it appears that it was posted online for an undetermined number of hours. Quote: "I then created a survey consisting of his new questions, along with some Norway Mensa questions, and asked readers of this blog to take it. About 40 of you did. I then deleted the survey. That way, the questions have never been posted to the public web accessed by search engines, etc, and so they should be safe from AI training data." [15].
The author repeatedly contradicts himself, making ambiguous claims without proof to back them up, yet presenting them as evidence.
So not only were the questions posted to the internet, but the test also included the older questions (the ones that were in the training data). Here, again, we see contradictory statements from Lott.
Sadly, we don't have a detailed breakdown of the results by question, or of the proportions of old versus new questions. Those results would surely be interesting to see. Again, a sign of incomplete research.
So yes, there is evidence that the questions were in the training data, and that none of the models really understand what they're doing or their own "thinking" process.
Further examples can be found in this article about AI and idea generation. Even though it, too, rides the hype wave, it shows how models are incapable of distinguishing between good and bad ideas, implying that they don't understand the underlying concepts behind their tasks [7].
And what's the problem with these results?
Following the scientific method, if a researcher obtained these results, the next logical step would be to accept that OpenAI has not made any significant breakthrough (or that, if it has, it isn't measurable using IQ tests). Instead, Lott doubles down on his "Massive breakthrough in AI" narrative. This is where the misinformation starts.
Let’s close the circle: how are these sorts of articles contributing to the AI hype bubble?
The article's SEO [16] is very clever. Both the title and the thumbnail are incredibly misleading, which in turn makes for very flashy tweets, Instagram and LinkedIn posts. The miraculous scores on the IQ bell curve are just too good to ignore.
In this section, I'll review a few examples of how the "piece of news" is being distributed across social media. Keep in mind that the embedded tweets might take a few seconds to load.
This tweet claims that the results are "according to the Norway Mensa IQ test", which is untrue. The claims weren't made by the test's authors; they were made by a third party. Again, it states the result as fact, and only later hedges into plausible deniability ("insane if true"). Let's see the next one:
This tweet doesn't even hedge and directly presents Lott's study as factual ("AI is smarter than the average human now"). On top of that, only a screenshot of the first plot (questions and answers in the training data, inflated scores) is shown to the viewer, which is incredibly misleading.
This one is definitely misleading. Even if a kind of disclaimer was given, the information is inaccurate. The latter test was NOT contamination-free, since it reportedly contained questions available online, and the models still performed terribly on the visual part of the test. There is no apparent trend to be observed here.
Double- and even triple-checking the information we share is extremely important. While truth is an unattainable absolute, false or partially false information is very real. Hype, generalised societal emotion, or similar forces shouldn't drive us to post carelessly, inadvertently helping to keep alive a movement that should have died years ago, and which is having such a negative economic and social impact.
More and more of what should be confined to the realm of emotion and ideas is affecting our markets, with stocks becoming more volatile every day. The AI boom is just another example of how hype and misinformation combine, and of how disastrous their effects can be.
Disclaimer: as always, replies are open for further discussion, and I encourage everyone to participate. Harassment and any form of hate speech, whether directed at the author of the original post, at third parties, or at myself, will not be tolerated. Any other form of dialogue is more than welcome, whether it's constructive or harsh criticism. Research should always be open to questioning and review.
[1] Google Trends, visualization of "AI" and "ChatGPT" searches on the web since 2021. https://trends.google.com/trends/explore?date=2021-01-01%202024-10-03&q=AI,ChatGPT&hl=en
[2] IBM study (2023) on CEOs and how they see and use AI in their business decisions. https://newsroom.ibm.com/2023-06-27-IBM-Study-CEOs-Embrace-Generative-AI-as-Productivity-Jumps-to-the-Top-of-their-Agendas
[3] CNN, AI in tech layoffs. https://edition.cnn.com/2023/07/04/tech/ai-tech-layoffs/index.html
[4] CNN, layoffs and investment in AI. https://edition.cnn.com/2024/01/13/tech/tech-layoffs-ai-investment/index.html
[5] Bloomberg, AI is driving more layoffs than companies want to admit. https://www.bloomberg.com/news/articles/2024-02-08/ai-is-driving-more-layoffs-than-companies-want-to-admit
[6] Inc., How many r's in strawberry? This AI can't tell you. https://www.inc.com/kit-eaton/how-many-rs-in-strawberry-this-ai-cant-tell-you.html
[7] ArXiv, Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. https://arxiv.org/abs/2409.04109
[8] Smithsonian, Are AI image generators stealing from artists? https://www.smithsonianmag.com/smart-news/are-ai-image-generators-stealing-from-artists-180981488/
[9] The Atlantic, Generative AI Can’t Cite Its Sources. https://www.theatlantic.com/technology/archive/2024/06/chatgpt-citations-rag/678796/
[10] IBM, topic page on AI privacy. https://www.ibm.com/think/topics/ai-privacy
[11] Nature, Intellectual property and data privacy: the hidden risks of AI. https://www.nature.com/articles/d41586-024-02838-z
[12] Springer, The mechanisms of AI hype and its planetary and social costs. https://link.springer.com/article/10.1007/s43681-024-00461-2
[13] Earth.org, Environmental Impact of ChatGPT-3. https://earth.org/environmental-impact-chatgpt/
[14] Twitter, user “maximlott”. https://x.com/maximlott
[15] Substack, Massive Breakthrough in AI intelligence: OpenAI passes IQ 120. https://substack.com/home/post/p-148891210
[16] Moz, What is SEO? https://moz.com/learn/seo/what-is-seo
[17] Thairath tech innovation, tech companies, AI hallucination example. https://www.thairath.co.th/money/tech_innovation/tech_companies/2814211
[18] Twitter, tweet 1 https://x.com/rowancheung/status/1835529620508016823
[19] Twitter, tweet 2 https://x.com/Greenbaumly/status/1837568393962025167
[20] Twitter, tweet 3 https://x.com/AISafetyMemes/status/1835339785419751496