Is Google’s Reveal of Gemini’s Impact Progress or Greenwashing?


According to a technical paper from Google, accompanied by a blog post on their website, the estimated energy consumption of “the median Gemini Apps text prompt” is 0.24 watt-hours (Wh). The water consumption is 0.26 milliliters, which is about five drops of water according to the blog post, and the carbon footprint is 0.03 gCO2e. Notably, the estimate doesn’t include image or video prompts.

What’s the magnitude of 0.24 Wh? If you send 30 median-like prompts per day for a full year, you’ll have used roughly 2.6 kWh of electricity. That’s the same as running your dishwasher 3–5 times, depending on its energy label.
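
As a quick sanity check on that comparison, here is a minimal back-of-the-envelope sketch in Python; the per-cycle dishwasher energy is my own assumption, not a figure from Google’s paper:

```python
# Back-of-the-envelope check of the dishwasher comparison above.
PROMPT_WH = 0.24          # energy per median Gemini text prompt (Wh), per Google's paper
PROMPTS_PER_DAY = 30
DAYS_PER_YEAR = 365

annual_kwh = PROMPT_WH * PROMPTS_PER_DAY * DAYS_PER_YEAR / 1000
print(f"Annual energy: {annual_kwh:.2f} kWh")  # ~2.63 kWh

# Assumed dishwasher energy per cycle (0.5-1.0 kWh depending on its energy label).
for kwh_per_cycle in (0.5, 1.0):
    print(f"{annual_kwh / kwh_per_cycle:.1f} cycles at {kwh_per_cycle} kWh/cycle")
# -> about 2.6 to 5.3 cycles, i.e. roughly the 3-5 runs mentioned above
```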

Google’s disclosure of the environmental impact of their Gemini models has given rise to a fresh round of debate on the environmental impact of AI and how to measure it.

On the surface, these numbers sound reassuringly small, but the more closely you look, the more complicated the story becomes. Let’s dive in.

Measurement scope

Let’s take a look at what’s included and what’s omitted in Google’s estimates of the median Gemini text prompt.

Inclusions

The scope of their assessment is “material energy sources under Google’s operational control—i.e. the ability to implement changes to behavior.” Specifically, they decompose LLM serving energy consumption as:

  • AI accelerator energy (TPUs – Google’s counterpart to the GPU), including networking between accelerators in the same AI computer. These are direct measurements during serving.
  • Active CPU and DRAM energy – although the AI accelerators (GPUs or TPUs) receive the most attention in the literature, CPU and memory also use noticeable amounts of energy.
  • Energy consumption from idle machines held ready to handle traffic spikes.
  • Overhead energy, i.e. the infrastructure supporting data centers – including cooling systems, power conversion, and other overhead within the data center. This is accounted for through the PUE metric – a factor that measured energy consumption is multiplied by – and they assume a PUE of 1.09 (see the sketch after this list).
  • Google not only measured energy consumption from the LLM that generates the response users see, but also energy from supporting models used for scoring, ranking, classification, etc.
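
To make the PUE adjustment concrete, here is a minimal sketch; the breakdown of per-prompt IT energy below is purely illustrative and not taken from Google’s paper:

```python
# Illustrative only: how a PUE factor scales measured IT-equipment energy
# to include data center overhead (cooling, power conversion, etc.).
PUE = 1.09  # value assumed by Google

def total_energy_wh(it_energy_wh: float, pue: float = PUE) -> float:
    """Total facility energy = IT-equipment energy * PUE."""
    return it_energy_wh * pue

# Hypothetical per-prompt breakdown (Wh) - NOT figures from the paper:
accelerator_wh = 0.14   # TPUs plus networking inside the AI computer
cpu_dram_wh = 0.06      # active host CPU and memory
idle_wh = 0.02          # share of idle machines held for traffic spikes
it_wh = accelerator_wh + cpu_dram_wh + idle_wh

print(f"{total_energy_wh(it_wh):.2f} Wh including overhead")  # ~0.24 Wh
```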

Omissions

Here’s what is not included:

  • All networking before a prompt hits the AI computer, i.e. external networking and internal networking that routes queries to the AI computer.
  • End-user devices, i.e. our phones, laptops, etc.
  • Model training and data storage

Progress or greenwashing?

Above, I outlined the objective facts of the paper. Now, let’s look at different perspectives on the figures.

Progress

We can hail Google’s publication because:

  • Google’s paper stands out because of the detail behind it. They included CPU and DRAM, which is unfortunately unusual. Meta, for example, only measures GPU energy.
  • Google used the median energy consumption rather than the average. The median is not influenced by outliers such as very long or very short prompts and thus arguably tells us what a “typical” prompt consumes (see the sketch after this list).
  • Something is better than nothing. It’s a big step forward from back-of-the-envelope estimates (guilty as charged), and perhaps they’re paving the way for more detailed studies in the future.
  • Hardware manufacturing and end-of-life costs are included.
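
To see why the choice of median versus mean matters, here is a small sketch with made-up per-prompt energy figures; a few heavy reasoning or long-context prompts pull the mean up while the median barely moves:

```python
from statistics import mean, median

# Made-up per-prompt energy figures (Wh): mostly ordinary prompts plus a few
# heavy reasoning / long-context requests. These are NOT data from Google.
prompt_energy_wh = [0.20, 0.20, 0.24, 0.25, 0.30, 0.30, 2.50, 8.00]

print(f"median: {median(prompt_energy_wh):.2f} Wh")  # 0.28 - unaffected by the outliers
print(f"mean:   {mean(prompt_energy_wh):.2f} Wh")    # 1.50 - pulled up by the heavy prompts
```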

Greenwashing

We can criticize Google’s paper because:

  • It lacks cumulative figures – ideally we would like to know the total impact of their LLM services and how much of Google’s total footprint they account for.
  • The authors don’t define what the median prompt looks like, e.g. how long it is and how long the response it elicits is.
  • They used the median energy consumption rather than the average. Yes, you read that right. This can be viewed as either positive or negative. The median “hides” the effect of high-complexity use cases, e.g. very complex reasoning tasks or summaries of very long texts.
  • Carbon emissions are reported using the market-based approach (relying on energy procurement certificates) and not location-based grid data that shows the actual carbon emissions of the energy they used. Had they used the location-based approach, the carbon footprint would have been 0.09 gCO2e per median prompt and not 0.03 gCO2e (see the arithmetic after this list).
  • LLM training costs are not included. The debate about the role of training costs in total costs is ongoing. Does it account for a small or a large part of the total? We don’t have the full picture (yet). But we do know that for some models, it takes hundreds of millions of prompts to reach cost parity, which suggests that model training may be a significant factor in the total energy costs.
  • They didn’t disclose their data, so we cannot double-check their results.
  • The methodology is not entirely clear. For example, it’s unclear how they arrived at the scope 1 and scope 3 emissions of 0.010 gCO2e per median prompt.
  • Google’s water use estimate only considers on-site water consumption and not total water consumption (i.e. it excludes sources such as the water used for electricity generation), which is contrary to standard practice.
  • They exclude emissions from external networking; however, a life cycle assessment of Mistral AI’s Large 2 model shows that network traffic of tokens accounts for a minuscule part of the total environmental costs of LLM inference (<1%). So does end-user equipment (3%).
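
As a rough illustration of the market-based vs. location-based gap, we can back out the implied grid emission factors from the per-prompt figures above. This is my own arithmetic, and it slightly overstates the factors because part of each per-prompt figure is scope 1 and 3 rather than electricity:

```python
# Implied emission factors from the per-prompt figures quoted above.
PROMPT_WH = 0.24          # Wh per median prompt
MARKET_BASED_G = 0.03     # gCO2e per prompt, market-based (procurement certificates)
LOCATION_BASED_G = 0.09   # gCO2e per prompt, location-based (actual grid mix)

# gCO2e per Wh -> multiply by 1000 to get gCO2e per kWh
market_factor = MARKET_BASED_G / PROMPT_WH * 1000
location_factor = LOCATION_BASED_G / PROMPT_WH * 1000

print(f"market-based:   ~{market_factor:.0f} gCO2e/kWh")    # ~125
print(f"location-based: ~{location_factor:.0f} gCO2e/kWh")  # ~375
# The location-based factor is ~3x higher, i.e. the certificates do a lot of the work.
```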

Gemini vs OpenAI ChatGPT vs Mistral

Google’s publication follows disclosures – although of varying degrees of detail – by Mistral AI and OpenAI.

Sam Altman, CEO of OpenAI, recently wrote in a blog post that: “the average query uses about 0.34 watt-hours, about what an oven would use in a little over one second, or a high-efficiency lightbulb would use in a couple of minutes. It also uses about 0.000085 gallons of water; roughly one fifteenth of a teaspoon.” You can read my in-depth analysis of that claim here.

It’s tempting to compare Gemini’s 0.24 Wh per prompt to ChatGPT’s 0.34 Wh, but the numbers are not directly comparable. Gemini’s number is the median, while ChatGPT’s is the average (arithmetic mean, I would venture). Even if they were both medians or both means, we couldn’t necessarily conclude that Google is more energy efficient than OpenAI, because we don’t know anything about the prompts being measured. It could be that OpenAI’s users ask questions that require more reasoning, or simply ask longer questions or elicit longer answers.

According to Mistral AI’s life cycle assessment, a 400-token response from their Large 2 model emits 1.14 gCO₂e and uses 45 mL of water.

Conclusion

So, is Google’s disclosure greenwashing or real progress? I hope I have equipped you to make up your own mind on that question. In my opinion, it’s progress, because it widens the scope of what’s measured and gives us data from real infrastructure. But it also falls short, because the omissions are as important as the inclusions. Another thing to keep in mind is that these numbers may sound digestible, but they don’t tell us much about systemic impact. Personally, I’m nevertheless optimistic that we’re currently witnessing a wave of AI impact disclosures from big tech, and I would be surprised if Anthropic is not up next.


That’s it! I hope you enjoyed the story. Let me know what you think!

Follow me for more on AI and sustainability, and feel free to connect with me on LinkedIn.
