“Nvidia may delay mass production of ‘Blackwell’ servers by six months due to heat and other issues”

(Photo = CoreWeave)

Claims have emerged that Nvidia may delay mass production of its ‘Blackwell’-based artificial intelligence (AI) servers until mid-2025 because of high power consumption and high-speed interconnect requirements. That would amount to a delay of about six months.

Citing a report from market research firm TrendForce, Tom’s Hardware reported on the 20th (local time) that Nvidia may have to postpone production of AI servers based on the ‘B200’ and ‘GB200’ platforms until around mid-2025.

Nvidia has not released an official position on this. In addition, in a previous earnings announcement, it said that Blackwell servers had already been delivered to some companies and were in full production.

Nvidia and its partners are expected to ship limited quantities of Blackwell servers this year. Although the schedule was slightly delayed by design issues, the improved B200 processor is said to have entered mass production in October.

However, TrendForce predicted that production of Blackwell servers will not increase significantly in the short term.

The reason given is the power, heat, and inter-chip connection problems that arise when Blackwell servers are actually operated. In addition, it estimated that resolving these issues and starting full-scale mass production would take until the second or third quarter of 2025.

First, the ‘NVL72’ server, equipped with 72 B200 GPUs, is said to consume 120 kilowatts (kW). This is far higher than a conventional server, which consumes 20 kW, and is three times the 40 kW consumed by the previous top-of-the-line ‘H100’ server.

On this point, TrendForce said that Nvidia has updated the specifications of the NVL72, and that its power consumption now reaches 140 kW. This exceeds the power that can be supplied to a single rack in a typical data center.
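The comparison between these figures can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the power figures quoted in this article; the dictionary labels and the `power_ratio` helper are illustrative names, not part of any TrendForce or Nvidia material.

```python
# Power figures quoted in the article, in kilowatts (kW).
# The labels are illustrative names chosen for this sketch.
RACK_POWER_KW = {
    "conventional server": 20,
    "H100 server": 40,
    "NVL72 (original spec)": 120,
    "NVL72 (updated spec)": 140,
}


def power_ratio(system: str, baseline: str) -> float:
    """Return how many times more power `system` draws than `baseline`."""
    return RACK_POWER_KW[system] / RACK_POWER_KW[baseline]


if __name__ == "__main__":
    for baseline in ("conventional server", "H100 server"):
        for system in ("NVL72 (original spec)", "NVL72 (updated spec)"):
            ratio = power_ratio(system, baseline)
            print(f"{system} vs {baseline}: {ratio:.1f}x")
```

On these numbers, the 120 kW figure is three times the H100 server's 40 kW, and the updated 140 kW figure is 3.5 times that level.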

Accordingly, heat generated by the excessive power consumption was cited as the biggest problem.

Even at 120 kW, the server overheated easily. For this reason, Nvidia had to revise the server rack design several times. An increase to 140 kW is likely to lead to further server design changes, which would mean a delayed launch.

Cooling requirements may also increase. Liquid cooling is essential for Blackwell servers, and current coolant distribution units (CDUs) can handle 60 to 80 kW of heat.
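To give a rough sense of the gap, the sketch below compares the quoted CDU capacity with the NVL72 power figures reported in this article. It rests on two simplifying assumptions that do not come from the report: that essentially all electrical power drawn by the rack ends up as heat to be removed, and that a single CDU serves a single rack. The function name `cooling_shortfall_kw` is hypothetical.

```python
# Figures quoted in the article, in kilowatts (kW).
CDU_CAPACITY_RANGE_KW = (60, 80)  # heat a current CDU can handle
NVL72_POWER_KW = (120, 140)       # original and updated NVL72 figures


def cooling_shortfall_kw(rack_power_kw: float, cdu_capacity_kw: float) -> float:
    """Heat (kW) left over after one CDU has removed what it can.

    Assumption for this sketch: all rack power becomes heat, and one CDU
    serves one rack. Real deployments may differ.
    """
    return max(0.0, rack_power_kw - cdu_capacity_kw)


if __name__ == "__main__":
    for power in NVL72_POWER_KW:
        for capacity in CDU_CAPACITY_RANGE_KW:
            gap = cooling_shortfall_kw(power, capacity)
            print(f"{power} kW rack, {capacity} kW CDU: {gap:.0f} kW unhandled")
```

Under those assumptions, even the largest quoted CDU capacity (80 kW) leaves 40 kW unhandled at 120 kW and 60 kW unhandled at 140 kW.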

However, this problem is expected to be solvable, as cooling system companies are working to optimize cold plate designs and expand CDU capacity to handle the additional heat.

TrendForce pointed out that, in addition to power and heat issues, optimizing inter-chip connections is important. However, it did not specify what the problem was.

Meanwhile, it was pointed out that if the mass production schedule for Blackwell servers is delayed by these problems, it could affect the release schedule and availability of the follow-up models ‘B300’ and ‘GB300’, which are due out next year. Because the B300-series Blackwell GPUs will naturally provide more memory and improved computing performance, they are likely to consume more power.

The analysis is that the B300 server is likely to consume more than 140 kW of power and will therefore require more sophisticated technology and cooling solutions.

Reporter Park Chan cpark@aitimes.com
