Update: This service is deprecated and no longer available as of April 10, 2025.
Today, we’re thrilled to announce the launch of Train on DGX Cloud, a new service on the Hugging Face Hub, available to Enterprise Hub organizations. Train on DGX Cloud makes it easy to use open models with the accelerated compute infrastructure of NVIDIA DGX Cloud. Together, we built Train on DGX Cloud so that Enterprise Hub users can easily access the latest NVIDIA H100 Tensor Core GPUs to fine-tune popular Generative AI models like Llama, Mistral, and Stable Diffusion in just a few clicks within the Hugging Face Hub.
GPU Poor No More
This new experience expands on the strategic partnership we announced last year to simplify the training and deployment of open Generative AI models on NVIDIA accelerated computing. One of the main problems developers and organizations face is the scarcity of GPU availability, and the time-consuming work of writing, testing, and debugging training scripts for AI models. Train on DGX Cloud offers a simple solution to these challenges, providing instant access to NVIDIA GPUs, starting with H100 on NVIDIA DGX Cloud. In addition, Train on DGX Cloud offers an easy-to-use, no-code training job creation experience powered by Hugging Face AutoTrain and Hugging Face Spaces.
Enterprise Hub organizations can give their teams instant access to powerful NVIDIA GPUs, only incurring charges per minute of compute instances used for their training jobs.
“Train on DGX Cloud is the easiest, fastest, and most accessible way to train Generative AI models, combining instant access to powerful GPUs, pay-as-you-go pricing, and no-code training,” says Abhishek Thakur, creator of Hugging Face AutoTrain. “It will be a game changer for data scientists everywhere!”
“Today’s launch of Hugging Face AutoTrain powered by DGX Cloud represents a significant step toward simplifying AI model training,” said Alexis Bjorlin, vice president of DGX Cloud, NVIDIA. “By integrating NVIDIA’s AI supercomputer in the cloud with Hugging Face’s user-friendly interface, we’re empowering organizations to accelerate their AI innovation.”
How it works
Training Hugging Face models on NVIDIA DGX Cloud has never been easier. Below you will find a step-by-step tutorial to fine-tune Mistral 7B.
Note: You need access to an Organization with a Hugging Face Enterprise subscription to use Train on DGX Cloud.
You can find Train on DGX Cloud on the model page of supported Generative AI models. It currently supports the following model architectures: Llama, Falcon, Mistral, Mixtral, T5, Gemma, Stable Diffusion, and Stable Diffusion XL.
Open the “Train” menu and select “NVIDIA DGX Cloud” – this will open an interface where you can select your Enterprise Organization.
Then, click on “Create new Space”. When using Train on DGX Cloud for the first time, the service will create a new Hugging Face Space within your Organization, so you can use AutoTrain to create training jobs that will be executed on NVIDIA DGX Cloud. When you want to create another training job later, you will automatically be redirected to the existing AutoTrain Space.
Once in the AutoTrain Space, you can create your training job by configuring the Hardware, Base Model, Task, and Training Parameters.
For Hardware, you can select NVIDIA H100 GPUs, available in 1x, 2x, 4x, and 8x instances, or L40S GPUs (coming soon). The training dataset must be uploaded directly in the “Upload Training File(s)” area; CSV and JSON files are currently supported. Make sure the column mapping is correct, following the example below. For Training Parameters, you can directly edit the JSON configuration on the right side, e.g., changing the number of epochs from 3 to 2.
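To make the dataset and parameter steps concrete, here is a minimal sketch of what a training file and a parameter edit could look like. The column name (`text`) and the parameter keys (`epochs`, `learning_rate`, `batch_size`) are illustrative placeholders rather than the exact AutoTrain schema; use the column-mapping hints shown in the Space for the names your task expects.

```python
# Illustrative only: column names and parameter keys are placeholders,
# not the official AutoTrain schema for every task.
import json

import pandas as pd

# A minimal training file for a text-generation task: one example per row,
# with a single text column that can be mapped in the column-mapping step.
train_df = pd.DataFrame(
    {
        "text": [
            "### Instruction: Summarize the ticket...\n### Response: ...",
            "### Instruction: Translate to French...\n### Response: ...",
        ]
    }
)
train_df.to_csv("train.csv", index=False)

# The kind of JSON you might paste into the Training Parameters editor,
# e.g. lowering the number of epochs from 3 to 2.
params = {
    "epochs": 2,
    "learning_rate": 2e-4,
    "batch_size": 4,
}
print(json.dumps(params, indent=2))
```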
When everything is set up, you can start your training by clicking “Start Training”. AutoTrain will now validate your dataset and ask you to confirm the training.
You can monitor your training by opening the “logs” of the Space.
After your training is complete, your fine-tuned model will be uploaded to a new private repository within your chosen namespace on the Hugging Face Hub.
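Because the result lands in a regular (private) model repository, you can load it like any other Hub model. Here is a minimal sketch, assuming a causal LM fine-tune and a hypothetical repository name; you need to be authenticated (e.g. via `huggingface-cli login`) with an account that can access the private repo.

```python
# Minimal sketch of pulling the fine-tuned model back down for local inference.
# "my-org/autotrain-mistral-7b-finetune" is a hypothetical repo name; use the
# private repository that Train on DGX Cloud created in your namespace.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "my-org/autotrain-mistral-7b-finetune"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

inputs = tokenizer("Tell me about NVIDIA DGX Cloud.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```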
Train on DGX Cloud is available today for all Enterprise Hub Organizations! Give the service a try, and let us know your feedback!
Pricing for Train on DGX Cloud
Usage of Train on DGX Cloud is billed by the minute of the GPU instances used during your training jobs. Current prices for training jobs are $8.25 per GPU hour for H100 instances, and $2.75 per GPU hour for L40S instances. Usage fees accrue to your Enterprise Hub Organization’s current monthly billing cycle once a job is completed. You can check your current and past usage at any time within the billing settings of your Enterprise Hub Organization.
For instance, fine-tuning Mistral 7B on 1500 samples on a single NVIDIA L40S takes ~10 minutes and costs ~$0.45.
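As a quick sanity check on that estimate, here is a minimal sketch of the per-minute pricing arithmetic. It assumes straightforward proration of the hourly rate, which is an approximation of how actual billing is rounded.

```python
# Back-of-the-envelope cost estimate, assuming simple per-minute proration
# of the hourly GPU rate (actual billing may round differently).
PRICE_PER_GPU_HOUR = {"h100": 8.25, "l40s": 2.75}

def estimated_cost(gpu: str, num_gpus: int, minutes: float) -> float:
    return PRICE_PER_GPU_HOUR[gpu] / 60 * num_gpus * minutes

# ~10 minutes on a single L40S: 2.75 / 60 * 10 ≈ $0.46,
# in line with the ~$0.45 figure quoted above.
print(f"${estimated_cost('l40s', 1, 10):.2f}")
```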
We’re just getting started
We’re excited to collaborate with NVIDIA to democratize accelerated machine learning across open science, open source, and cloud services.
Our collaboration on open science through BigCode enabled the training of StarCoder 2 15B, a fully open, state-of-the-art code LLM trained on more than 600 programming languages.
Our collaboration on open source is fueling the new optimum-nvidia library, accelerating the inference of LLMs on the latest NVIDIA GPUs, already achieving 1,200 tokens per second with Llama 2.
Our collaboration on cloud services created Train on DGX Cloud today. We’re also working with NVIDIA to optimize inference and make accelerated computing more accessible to the Hugging Face community, leveraging our collaboration on NVIDIA TensorRT-LLM and optimum-nvidia. In addition, some of the most popular open models on Hugging Face will be available on NVIDIA NIM microservices, which were announced today at GTC.
For those attending GTC this week, make sure to watch session S63149 on Wednesday, 3/20, at 3pm PT, where Jeff will guide you through Train on DGX Cloud and more. Also don’t miss the next Hugging Cast, where we will give a live demo of Train on DGX Cloud and you can ask questions directly to Abhishek and Rafael, on Thursday, 3/21, at 9am PT / 12pm ET / 17h CET – watch the recording here.
