DeepSeek may not be such good news for energy after all


Add the fact that other tech firms, inspired by DeepSeek's approach, may now start building their own similarly low-cost reasoning models, and the outlook for energy consumption is already looking a lot less rosy.

The life cycle of any AI model has two phases: training and inference. Training is the usually months-long process in which the model learns from data. The model is then ready for inference, which happens each time anyone in the world asks it something. Both typically happen in data centers, where they require lots of energy to run chips and cool servers.

On the training side for its R1 model, DeepSeek's team improved what's called a "mixture of experts" technique, in which only a portion of a model's billions of parameters (the "knobs" a model uses to form better answers) are turned on at a given time during training. More notably, they improved reinforcement learning, in which a model's outputs are scored and then used to make it better. This is often done by human annotators, but the DeepSeek team got good at automating it.
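To make the "only a portion of the parameters is turned on" idea concrete, here is a minimal, illustrative sketch of mixture-of-experts routing in PyTorch. It is not DeepSeek's actual architecture; the layer sizes, expert count, and `top_k` value are assumptions chosen for readability. The point is simply that a small router decides which expert sub-networks run for each token, so most of the model's parameters sit idle on any given input.

```python
# Minimal mixture-of-experts sketch (illustrative only, not DeepSeek's design).
# A router scores the experts per token and only the top-k experts are run.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                            # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)   # (tokens, num_experts)
        weights, chosen = torch.topk(scores, self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token
```

In a sketch like this, each token touches only 2 of the 8 experts, which is why mixture-of-experts models can grow their total parameter count without a proportional increase in the compute, and energy, spent per token.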

The introduction of a way to make training more efficient might suggest that AI firms will use less energy to bring their models up to a given standard. That's not really how it works, though.

"Since the value of having a more intelligent system is so high," wrote Anthropic cofounder Dario Amodei on his blog, it "causes firms to spend more, not less, on training models." If firms get more for their money, they'll find it worthwhile to spend more, and therefore use more energy. "The gains in cost efficiency end up entirely dedicated to training smarter models, limited only by the company's financial resources," he wrote. It's an example of what's known as the Jevons paradox.

But that's been true on the training side for as long as the AI race has been going. The energy required for inference is where things get more interesting.

DeepSeek is designed as a reasoning model, which means it's meant to perform well on things like logic, pattern-finding, math, and other tasks that typical generative AI models struggle with. Reasoning models do this using something called "chain of thought," which allows the AI model to break its task into parts and work through them in a logical order before coming to its conclusion.

You can see this with DeepSeek. Ask whether it's okay to lie to protect someone's feelings, and the model first tackles the question with utilitarianism, weighing the immediate good against the potential future harm. It then considers Kantian ethics, which propose that you should act according to maxims that could be universal laws. It considers these and other nuances before sharing its conclusion. (It finds that lying is "generally acceptable in situations where kindness and prevention of harm are paramount, yet nuanced with no universal solution," in case you're curious.)
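To show why that matters for energy, here is a rough, generic sketch of the difference between a direct prompt and a chain-of-thought-style prompt. The wording is hypothetical and not DeepSeek's internal mechanism; the point is that a reasoning model generates many intermediate "thinking" tokens before its answer, and every extra token is extra inference compute.

```python
# Illustrative only: a direct prompt versus a step-by-step (chain-of-thought) prompt.
direct_prompt = "Is it okay to lie to protect someone's feelings? Answer yes or no."

cot_prompt = (
    "Is it okay to lie to protect someone's feelings?\n"
    "Think step by step: weigh the consequences, consider competing ethical "
    "frameworks, then state your conclusion."
)

# A reasoning model answering cot_prompt produces the kind of intermediate
# analysis described above (utilitarian weighing, Kantian maxims) before the
# final answer, so a single query consumes far more compute than direct_prompt.
```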
