Smaller is smarter


Concerns about the environmental impact of Large Language Models (LLMs) are growing. Although detailed information about the actual costs of LLMs can be difficult to find, let's try to gather some facts to understand the scale.

(Image generated with ChatGPT-4o)

Since comprehensive data on ChatGPT-4 isn't available, let's consider Llama 3.1 405B as an example. This open-source model from Meta is arguably the most "transparent" LLM to date. According to various benchmarks, Llama 3.1 405B is comparable to ChatGPT-4, providing a reasonable basis for understanding LLMs in this range.

The hardware requirements to run the 32-bit version of this model range from 1,620 to 1,944 GB of GPU memory, depending on the source (substratus, HuggingFace). For a conservative estimate, let's use the lower figure of 1,620 GB. To put this into perspective (acknowledging that this is a simplified analogy), 1,620 GB of GPU memory is roughly equivalent to the combined memory of 100 standard MacBook Pros (16 GB each). So, when you ask one of these LLMs for a tiramisu recipe in Shakespearean style, it takes the memory of 100 MacBook Pros to give you an answer.
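As a quick sanity check, that 1,620 GB figure is simply what 405 billion parameters occupy at 4 bytes each. Here is a minimal Python sketch of the comparison, where the 16 GB per laptop is just the figure used above:

```python
# Back-of-envelope check of the memory figures quoted above.
params_billion = 405
bytes_per_param = 4                      # 32-bit (fp32) weights
model_memory_gb = params_billion * bytes_per_param   # 1,620 GB (decimal)

laptop_memory_gb = 16                    # standard MacBook Pro configuration
print(f"Equivalent laptops: {model_memory_gb / laptop_memory_gb:.0f}")  # ~101
```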

I'm trying to translate these figures into something more tangible… and this doesn't even include the training costs, which are estimated to have involved around 16,000 GPUs at an approximate cost of $60 million USD (excluding hardware costs), a significant investment by Meta, in a process that took around 80 days. In terms of electricity consumption, training required about 11 GWh.
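For a rough sense of scale, multiplying those figures out (and assuming, as a simplification, that all 16,000 GPUs run for the full 80 days) gives:

```python
# Rough order-of-magnitude estimate derived from the figures above.
gpus = 16_000
days = 80
gpu_hours = gpus * days * 24
print(f"{gpu_hours:,} GPU-hours")        # 30,720,000 GPU-hours

# Average power per GPU implied by the 11 GWh figure (ignores cooling and other overhead).
energy_kwh = 11_000_000
print(f"~{energy_kwh / gpu_hours * 1000:.0f} W per GPU on average")
```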

The annual electricity consumption per person in a country like France is roughly 2,300 kWh. Thus, 11 GWh corresponds to the yearly electricity usage of about 4,782 people. This consumption resulted in the release of roughly 5,000 tons of CO₂-equivalent greenhouse gases (based on the European average), although this figure can easily double depending on the country where the model was trained.

For comparison, burning 1 liter of diesel produces 2.54 kg of CO₂. Training Llama 3.1 405B in a country like France is therefore roughly equivalent to the emissions from burning around 2 million liters of diesel, which in turn translates to roughly 28 million kilometers of car travel. I think that gives enough perspective… and I haven't even mentioned the water required to cool the GPUs!
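These conversions are easy to reproduce. The only number not stated above is the average car emission factor, which I assume here to be about 180 g CO₂ per km (roughly what the 28-million-kilometer figure implies):

```python
# Reproducing the conversions above step by step.
training_energy_kwh = 11_000_000      # 11 GWh
per_person_kwh = 2_300                # annual electricity use per person in France
print(f"Person-years of electricity: {training_energy_kwh / per_person_kwh:,.0f}")

co2_kg = 5_000_000                    # ~5,000 t CO2-eq (European-average grid)
diesel_kg_per_liter = 2.54
print(f"Diesel equivalent: {co2_kg / diesel_kg_per_liter:,.0f} liters")   # ~2 million

car_kg_per_km = 0.18                  # assumed ~180 g CO2/km for an average car
print(f"Car travel equivalent: {co2_kg / car_kg_per_km:,.0f} km")         # ~28 million
```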

Clearly, AI is still in its infancy, and we can expect more efficient and sustainable solutions to emerge over time. Nevertheless, in this intense race, OpenAI's financial situation highlights a significant gap between its revenues and its operational expenses, particularly when it comes to inference costs. In 2024, the company is projected to spend roughly $4 billion on processing power provided by Microsoft for inference workloads, while its annual revenue is estimated at between $3.5 billion and $4.5 billion. This means that inference costs alone nearly match, or even exceed, OpenAI's total revenue (deeplearning.ai).

All of this is happening in a context where experts are pointing to a performance plateau for AI models (the limits of the scaling paradigm). Increasing model size and the number of GPUs yields significantly diminished returns compared with previous leaps, such as the advances GPT-4 achieved over GPT-3. "The pursuit of AGI has always been unrealistic, and the 'bigger is better' approach to AI was bound to hit a limit eventually, and I think this is what we're seeing here," said Sasha Luccioni, researcher and AI lead at the startup Hugging Face.

But don't get me wrong, I'm not putting AI on trial, because I love it! This research phase is a perfectly normal stage in the development of AI. However, I believe we need to exercise common sense in how we use it: we can't use a bazooka to kill a mosquito every time. AI must be made sustainable, not only to protect our environment but also to address social divides. Indeed, the risk of leaving the Global South behind in the AI race because of high costs and resource demands would represent a major failure of this new intelligence revolution.

So, do you really need the full power of ChatGPT to handle the simplest tasks in your RAG pipeline? Are you looking to keep your operational costs under control? Do you want complete end-to-end control over your pipeline? Are you concerned about your private data circulating on the web? Or perhaps you're simply mindful of AI's impact and committed to using it consciously?

Small language models (SLMs) offer an excellent alternative worth exploring. They can run on your local infrastructure and, when combined with human intelligence, deliver substantial value. Although there is no universally agreed definition of an SLM (in 2019, for instance, GPT-2 with its 1.5 billion parameters was considered an LLM, which is no longer the case), I'm referring to models such as Mistral 7B, Llama-3.2 3B, or Phi-3.5, to name just a few. These models can run on a "good" computer, resulting in a much smaller carbon footprint and ensuring the confidentiality of your data when installed on-premise. Although they are less versatile, when used correctly for specific tasks they can still provide significant value, while being more environmentally virtuous.
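To illustrate how little is needed to get started, here is a minimal sketch (not a production setup) using the Hugging Face transformers library to run a small instruction-tuned model locally. The model name, prompt, and settings are only examples, and a quantized build served through Ollama or llama.cpp would shrink the footprint further:

```python
# Minimal local SLM sketch with Hugging Face transformers.
# Assumes transformers, torch, and accelerate are installed and the model fits on your machine.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",  # example SLM; Mistral 7B or Phi-3.5 work the same way
    device_map="auto",                         # use a local GPU if available, otherwise CPU
)

messages = [
    {"role": "user", "content": "Summarize this support ticket in one sentence: ..."},
]
output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])  # the model's reply
```

After the initial model download, everything runs on your own hardware, which is exactly the point.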


What are your thoughts on this topic?
Let us know in the comments below.
