Hugging Face and FriendliAI partner to supercharge model deployment on the Hub



FriendliAI’s inference infrastructure is now integrated into the Hugging Face Hub as an option within the “Deploy this model” button, simplifying and accelerating generative AI model serving.




A Collaboration to Advance AI Innovation

Hugging Face empowers developers, researchers, and businesses to innovate in AI. Our shared priority is building impactful partnerships that simplify workflows and provide cutting-edge tools for the AI community.

Today, we’re excited to announce a partnership between Hugging Face and FriendliAI, a leader in accelerated generative AI inference, to enhance how developers deploy and manage AI models. This integration introduces FriendliAI Endpoints as a deployment option on the Hugging Face Hub, offering developers direct access to high-performance, cost-effective inference infrastructure.

FriendliAI is ranked as the fastest GPU-based generative AI inference provider by Artificial Analysis, with groundbreaking technologies including continuous batching, native quantization, and best-in-class autoscaling. With these technologies, FriendliAI continues to raise the standard for AI inference serving performance, delivering faster processing speeds, reduced latency, and substantial cost savings for deploying generative AI models at scale. Through this partnership, Hugging Face users and FriendliAI customers can effortlessly deploy open-source or custom generative AI models with unparalleled efficiency and reliability.



Simplifying Model Deployment

Last year, FriendliAI introduced a Hugging Face integration, enabling users to seamlessly deploy Hugging Face models directly within the Friendli Suite platform. Through this integration, users gained access to hundreds of supported open-source models on Hugging Face, as well as the ability to deploy private models effortlessly. The list of model architectures currently supported by FriendliAI can be found here.

Today, we’re taking this integration further by enabling the same capability directly within the Hugging Face Hub, offering one-click deployment for a seamless user experience. You can deploy models directly from the model card on the Hugging Face Hub using a Friendli Suite account.

Friendli Inference deployment option in Hugging Face

Choosing Friendli Endpoints will take you to FriendliAI’s model deployment page. Here, you can deploy the model on NVIDIA H100 GPUs while concurrently interacting with optimized open-source models. The deployment page features an intuitive interface for setting up Friendli Dedicated Endpoints, the managed service for generative AI inference. Moreover, while your deployment is processing, you can chat with open-source models directly on the page, making it easy to explore and test their capabilities.



Deploy models with NVIDIA H100 in Friendli Dedicated Endpoints

With FriendliAI’s advanced GPU-optimized inference engine, Dedicated Endpoints delivers fast and cost-effective inference as a managed service. Developers can effortlessly deploy open-source or custom models on NVIDIA H100 GPUs using Friendli Dedicated Endpoints by clicking “Deploy now” on the model deployment page.

H100 GPUs are powerful but can be expensive to operate at scale. With FriendliAI’s optimized service, you can reduce the number of GPUs needed while maintaining peak performance, significantly lowering costs. Beyond cost efficiency, Dedicated Endpoints also simplifies the complexities of managing infrastructure. Once your endpoint is live, you can query it programmatically, as sketched below.

Deploy Hugging Face models in the model deployment page
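As a minimal sketch of what querying a live endpoint can look like, the snippet below assumes Friendli Dedicated Endpoints exposes an OpenAI-compatible chat completions API. The base URL, the environment variable, and the use of the endpoint ID as the model name are illustrative assumptions, not confirmed by this post; check FriendliAI’s documentation for the exact details.

```python
# Minimal sketch: querying a deployed Friendli Dedicated Endpoint.
# Assumptions (verify against FriendliAI's docs): the endpoint speaks the
# OpenAI-compatible chat completions protocol, the base URL below is
# correct, and the endpoint ID is passed as the model name.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/dedicated/v1",  # assumed base URL
    api_key=os.environ["FRIENDLI_TOKEN"],  # access token from Friendli Suite
)

completion = client.chat.completions.create(
    model="YOUR_ENDPOINT_ID",  # hypothetical placeholder for your endpoint's ID
    messages=[{"role": "user", "content": "Hello from my dedicated endpoint!"}],
)
print(completion.choices[0].message.content)
```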



Inference Open-Source Models with Friendli Serverless Endpoints

Friendli Serverless Endpoints is the perfect solution for developers who want to run inference on open-source models efficiently. This service provides user-friendly APIs for models optimized by FriendliAI, ensuring high performance at a low cost. You can chat with these powerful open-source models directly on the model deployment page, or call them from your own code, as shown in the sketch below.

Try out Serverless Endpoints in the model deployment page
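For illustration, a serverless call might look like the following. This is a sketch assuming an OpenAI-compatible API; the base URL and model identifier are assumptions rather than values confirmed by this post, so consult FriendliAI’s documentation before use.

```python
# Illustrative sketch: calling a model on Friendli Serverless Endpoints.
# The base URL and model id are assumptions; consult FriendliAI's docs
# for the exact values available to your account.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL
    api_key=os.environ["FRIENDLI_TOKEN"],  # access token from Friendli Suite
)

response = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",  # example model id; may differ
    messages=[{"role": "user", "content": "Explain continuous batching in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```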



What’s Next

We’re thrilled to deepen the FriendliAI and Hugging Face collaboration, enhancing accessibility to open-source AI for developers worldwide. FriendliAI’s high-speed, cost-efficient inference solution eliminates the complexities of infrastructure management, empowering users to focus on innovation. Together with FriendliAI, we remain committed to transforming how AI is developed, driving groundbreaking innovation that shapes the next era of AI.

You can also follow our organization page to stay updated on future news 🔥


