AWS Integrates AI Infrastructure with NVIDIA NVLink Fusion for Trainium4 Deployment

As demand for AI continues to grow, hyperscalers are searching for ways to speed up deployment of specialized AI infrastructure with the best possible performance.

Announced today at AWS re:Invent, Amazon Web Services is collaborating with NVIDIA to integrate NVIDIA NVLink Fusion, a rack-scale platform that lets companies build custom AI rack infrastructure with NVIDIA NVLink scale-up interconnect technology and a broad ecosystem of partners, to accelerate deployment of the new Trainium4 AI chips, Graviton CPUs, Elastic Fabric Adapters (EFAs) and the Nitro System virtualization infrastructure.

AWS is designing Trainium4 to integrate with NVLink 6 and the NVIDIA MGX rack architecture, the first step in a multigenerational collaboration between NVIDIA and AWS on NVLink Fusion.

With best-in-class scale-up networking, a complete technology stack and a comprehensive ecosystem of partners building on the technology, NVLink Fusion boosts performance, increases return on investment, reduces deployment risk and accelerates time to market for custom AI silicon.

Challenges of deploying custom AI silicon

AI workloads are getting larger, models are becoming more complex, and the pressure to rapidly deploy AI compute infrastructure that meets the needs of the growing market is higher than ever.

Emerging workloads like planning, reasoning and agentic AI, which run on models with hundreds of billions to trillions of parameters and mixture-of-experts (MoE) architectures, require many systems with many accelerators, all working in parallel and connected in a single fabric.

Meeting these demands requires a scale-up network, like NVLink, to connect entire racks of accelerators with a high-bandwidth, low-latency interconnect.

Hyperscalers face two main challenges in deploying such specialized solutions:

  • Long development cycles for rack-scale architecture: In addition to designing a custom AI chip, hyperscalers must develop a scale-up networking solution, scale-out and storage networking, and a rack design covering trays, cooling, power delivery, system management and AI acceleration software. This can cost billions of dollars and take years to deploy.
  • Managing a complex supplier ecosystem: Manufacturing a full-rack architecture requires a complex supplier ecosystem for CPUs and GPUs, scale-up networking, scale-out networking, racks and trays, as well as busbars, power shelves, power whips, cold plates, coolant distribution units and quick disconnects. Managing dozens of suppliers and hundreds of thousands of components is incredibly complex, and a single supply delay or component change can put the entire project at risk.

NVLink Fusion addresses these challenges, helping hyperscalers remove networking performance bottlenecks, reduce deployment risks and accelerate time to market for custom AI silicon.

NVLink Fusion offers a rack-scale AI infrastructure platform that enables hyperscalers and custom ASIC designers to integrate custom ASICs with NVLink and the OCP MGX rack-scale server architecture.

At the core of NVLink Fusion is the NVLink Fusion chiplet. Hyperscalers can drop the chiplet into their custom ASIC designs to connect to the NVLink scale-up interconnect and NVLink Switch. The NVLink Fusion technology portfolio includes the Vera Rubin NVLink Switch tray with the sixth-generation NVLink Switch and 400G custom SerDes. It enables NVLink Fusion adopters to connect up to 72 custom ASICs all-to-all at 3.6 TB/s per ASIC, for a total of 260 TB/s of scale-up bandwidth.
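The quoted totals line up: 72 ASICs at 3.6 TB/s each gives roughly 260 TB/s of aggregate scale-up bandwidth. A quick back-of-the-envelope check (the figures below come straight from the article; nothing else is assumed):

```python
# Sanity check of the NVLink Fusion scale-up bandwidth figures quoted above.
NUM_ASICS = 72        # custom ASICs in one all-to-all scale-up domain
PER_ASIC_TBPS = 3.6   # NVLink bandwidth per ASIC, TB/s

total_tbps = NUM_ASICS * PER_ASIC_TBPS
print(f"Aggregate scale-up bandwidth: {total_tbps:.1f} TB/s")  # 259.2 TB/s, ~260 TB/s as quoted
```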

Figure 1. The NVLink Fusion Chiplet enables connecting 72 custom ASICs all-to-all at 3.6 TB/s per ASIC

NVLink Switch enables peer-to-peer memory access using direct loads, stores and atomic operations, as well as NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) for in-network reductions and multicast acceleration.
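To see why moving reductions into the switch matters, a rough alpha-beta cost model is helpful: a ring all-reduce pays a per-message latency cost on every one of its 2(N-1) steps, while a switch-based in-network reduction collapses the operation into roughly one send and one receive per accelerator. The sketch below is a generic textbook model with made-up latency and buffer-size values, not NVIDIA benchmark data; only the 72-accelerator domain size and 3.6 TB/s link rate come from the article.

```python
# Illustrative alpha-beta cost model: ring all-reduce vs. a switch-based
# in-network reduction (SHARP-style). ALPHA and S are assumptions for
# illustration only; they are not measured NVIDIA figures.

ALPHA = 2e-6   # assumed per-message latency, seconds
BW = 3.6e12    # per-ASIC link bandwidth, bytes/s (3.6 TB/s from the text)
N = 72         # accelerators in the scale-up domain
S = 1e9        # bytes to reduce (e.g. a 1 GB gradient buffer)

# Ring all-reduce: 2*(N-1) steps, each moving S/N bytes per accelerator.
ring = 2 * (N - 1) * (ALPHA + S / (N * BW))

# In-network reduction: roughly one send up to the switch and one
# reduced result back down, so only two latency terms.
sharp = 2 * (ALPHA + S / BW)

print(f"ring  all-reduce: {ring * 1e6:8.1f} us")
print(f"sharp reduction : {sharp * 1e6:8.1f} us")
```

Under this model the bandwidth terms are comparable, but the ring pays the latency term 2(N-1) times versus twice for the in-network reduction, which is why the gap widens as the scale-up domain grows.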

Unlike other scale-up networking approaches, NVLink is a proven, widely adopted technology. Combined with NVIDIA AI acceleration software, NVLink Switch delivers up to 3x the performance and revenue for AI inference¹ by connecting 72 accelerators in a single scale-up domain.

Figure 2. The Vera Rubin NVLink Switch tray includes sixth-generation NVLink 6 Switches with 400G custom SerDes

Reduce development costs and speed up time to market with a proven architecture and ecosystem

NVLink Fusion adopters can tap into a modular portfolio of AI factory technology, including the NVIDIA MGX rack architecture, GPUs, NVIDIA Vera CPUs, co-packaged optics switches, NVIDIA ConnectX SuperNICs, NVIDIA BlueField DPUs and NVIDIA Mission Control software, together with an ecosystem of ASIC designers, CPU and IP providers, and manufacturers.

This technology portfolio enables hyperscalers to reduce development costs and time to market compared with sourcing their own technology stack.

AWS is also harnessing the NVLink Fusion OEM/ODM and supplier ecosystem, which provides all of the components required for a full rack-scale deployment, from the rack and chassis to power delivery and cooling systems. This ecosystem lets hyperscalers eliminate many of the risks associated with rack-scale deployments.

Heterogeneous AI silicon, single rack-scale infrastructure

NVLink Fusion also allows AWS to build a heterogeneous silicon offering using the same footprint, cooling system and power-distribution AI factory designs it already deploys.

NVLink Fusion adopters can use as little or as much of the platform as they need, and every piece can help them quickly scale up to meet the demands of intensive inference and agentic AI model training workloads.

Bringing custom AI chips to market is hard. NVLink Fusion enables hyperscalers and custom ASIC designers to leverage the proven NVIDIA MGX rack architecture and NVLink scale-up networking. By leveraging NVLink Fusion for Trainium4 deployment, AWS will drive faster innovation cycles and accelerate time to market.

Learn more about NVLink Fusion.

¹ 3x performance increase based on fifth-generation NVLink, comparing NVL72 GB200 to NVL8 B200, each with NVLink Switch.


