Google Cloud TPUs made available to Hugging Face users

We’re excited to share some great news! AI builders are now able to accelerate their applications with Google Cloud TPUs on Hugging Face Inference Endpoints and Spaces!

For those who may not be familiar, TPUs are custom-made AI hardware designed by Google. They’re known for their ability to scale cost-effectively and deliver impressive performance across various AI workloads. This hardware has played a crucial role in some of Google’s latest innovations, including the development of the Gemma 2 open models. We’re excited to announce that TPUs will now be available for use in Inference Endpoints and Spaces.

This is a big step in our ongoing collaboration to provide you with the best tools and resources for your AI projects. We’re really looking forward to seeing what amazing things you’ll create with this new capability!



Hugging Face Inference Endpoints support for TPUs

Hugging Face Inference Endpoints provides a seamless way to deploy Generative AI models with a few clicks on dedicated, managed infrastructure using the cloud provider of your choice. Starting today, Google TPU v5e is available on Inference Endpoints. Select the model you want to deploy, select Google Cloud Platform, select us-west1, and you’re ready to pick a TPU configuration (a programmatic alternative is sketched after the list below):

We have 3 instance configurations, with more to come:

  • v5litepod-1 TPU v5e with 1 core and 16 GB memory ($1.375/hour)
  • v5litepod-4 TPU v5e with 4 cores and 64 GB memory ($5.50/hour)
  • v5litepod-8 TPU v5e with 8 cores and 128 GB memory ($11.00/hour)

[Screenshot: selecting a TPU configuration in Inference Endpoints]
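
If you prefer scripting over clicking through the UI, the same deployment can be done from Python with the huggingface_hub client. This is a minimal sketch, not the definitive recipe: the TPU-specific `accelerator`, `instance_type`, and `instance_size` values below are assumptions, so copy the exact strings shown in the Inference Endpoints UI for your chosen configuration.

```python
# Minimal sketch: create a TPU v5e Inference Endpoint with huggingface_hub.
# The accelerator/instance strings are assumptions; use the exact values
# shown in the Inference Endpoints UI for your configuration.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "gemma-on-tpu",                # endpoint name (your choice)
    repository="google/gemma-2b",  # any Optimum TPU supported model
    framework="pytorch",
    task="text-generation",
    vendor="gcp",                  # Google Cloud Platform
    region="us-west1",             # the region mentioned above
    accelerator="tpu",             # assumption: TPU accelerator key
    instance_type="v5litepod-1",   # assumption: mirrors the UI name
    instance_size="x1",            # assumption: check the UI
)

endpoint.wait()  # block until the endpoint is up and running
print(endpoint.url)
```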

While you can use v5litepod-1 for models with up to 2 billion parameters without much hassle, we recommend using v5litepod-4 for larger models to avoid memory budget issues. The larger the configuration, the lower the latency will be.
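
As a rough rule of thumb (our back-of-the-envelope assumption, not an official sizing guide): weights in bfloat16 take about 2 bytes per parameter, and activations plus the KV cache need headroom on top, which is why a 7B model is a tight fit on 16 GB:

```python
# Back-of-the-envelope check: bfloat16 weights take ~2 bytes per parameter
# (activations and the KV cache need extra headroom on top of this).
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(f"2B model: {weight_memory_gb(2):.1f} GB")  # ~3.7 GB, fits v5litepod-1 (16 GB)
print(f"7B model: {weight_memory_gb(7):.1f} GB")  # ~13 GB, tight on 16 GB -> v5litepod-4
```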

Together with the product and engineering teams at Google, we’re excited to bring the performance and cost efficiency of TPUs to our Hugging Face community. This collaboration has resulted in some great developments:

  1. We have created an open-source library called Optimum TPU, which makes it super easy for you to train and deploy Hugging Face models on Google TPUs.
  2. Inference Endpoints uses Optimum TPU together with Text Generation Inference (TGI) to serve Large Language Models (LLMs) on TPUs (see the querying sketch after this list).
  3. We’re always working on support for a wide range of model architectures. Starting today, you can deploy Gemma, Llama, and Mistral in a few clicks (Optimum TPU supported models).
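
Because TGI serves the model, a deployed TPU endpoint can be queried like any other TGI endpoint. A minimal sketch, assuming you replace the placeholder URL with the one shown on your endpoint’s page:

```python
# Minimal sketch: query a running TPU endpoint through the TGI API.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",  # hypothetical URL
    token="hf_xxx",  # your Hugging Face access token
)
print(client.text_generation("Explain TPUs in one sentence.", max_new_tokens=64))
```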



Hugging Face Spaces support for TPUs

Hugging Face Spaces provide developers with a platform to create, deploy, and share AI-powered demos and applications quickly. We’re excited to introduce new TPU v5e instance support for Hugging Face Spaces. To upgrade your Space to run on TPUs, navigate to the Settings button in your Space and select the desired configuration (you can also request the hardware programmatically, as sketched after the list below):

  • v5litepod-1 TPU v5e with 1 core and 16 GB memory ($1.375/hour)
  • v5litepod-4 TPU v5e with 4 cores and 64 GB memory ($5.50/hour)
  • v5litepod-8 TPU v5e with 8 cores and 128 GB memory ($11.00/hour)

[Screenshot: selecting TPU hardware in the Space settings]
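
If you’d rather switch hardware from code than from the Settings page, huggingface_hub exposes a hardware request API. A minimal sketch, with the caveat that the TPU flavor string below is an assumption; use the identifier your Space Settings page shows:

```python
# Minimal sketch: request TPU hardware for an existing Space.
# The hardware flavor string is an assumption; verify it in your Space Settings.
from huggingface_hub import HfApi

api = HfApi()
api.request_space_hardware(
    repo_id="your-username/your-space",  # hypothetical Space id
    hardware="v5e-1x1",                  # assumption: 1-core TPU v5e flavor
)
```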

Go build awesome ML-powered demos on TPUs with Hugging Face Spaces and share them with the community!

We’re proud of what we’ve achieved together with Google and can’t wait to see how you’ll use TPUs in your projects.


