
The pace of AI development today is breathtaking. Every day, hundreds of new models appear on the Hugging Face Hub: some are specialized variants of popular base models like Llama or Qwen, others feature novel architectures or have been trained from scratch for specific domains. Whether it's a medical AI trained on clinical data, a coding assistant optimized for a specific programming language, or a multilingual model fine-tuned for specific cultural contexts, the Hugging Face Hub has become the beating heart of open-source AI innovation.
But here's the challenge: finding a great model is just the start. What happens when you discover a model that's 90% perfect for your use case, but you need that extra 10% of customization? Traditional fine-tuning infrastructure is complex, expensive, and often requires significant DevOps expertise to set up and maintain.
This is precisely the gap that Together AI and Hugging Face are bridging today. We're announcing a powerful new capability that makes the entire Hugging Face Hub available for fine-tuning on Together AI's infrastructure. Now any compatible LLM on the Hub, whether it's from Meta or an individual contributor, can be fine-tuned with the same ease and reliability you expect from Together's platform. 🚀
Getting Started in 5 Minutes
Here's all it takes to start fine-tuning a Hugging Face model on the Together AI platform:
```python
from together import Together

client = Together(api_key="your-api-key")

# Upload the training data; check=True validates the file before training
file_upload = client.files.upload("sft_examples.jsonl", check=True)

job = client.fine_tuning.create(
    # Base model from Together's catalog, used as the training template
    model="togethercomputer/llama-2-7b-chat",
    # The Hugging Face Hub model that actually gets fine-tuned
    from_hf_model="HuggingFaceTB/SmolLM2-1.7B-Instruct",
    training_file=file_upload.id,
    n_epochs=3,
    learning_rate=1e-5,
    # Needed for private repos and for pushing the result back to the Hub
    hf_api_token="hf_***",
    hf_output_repo_name="my-username-org/SmolLM2-1.7B-FT",
)

print(f"Training job started: {job.id}")
```
That's it! Your model will be trained on Together's infrastructure and can be deployed for inference, downloaded, or even uploaded back to the Hub! For private repositories, simply add your HF token with `hf_api_token="hf_xxxxxxxxxxxx"`.
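The job runs asynchronously, so you can poll its status and pull the artifacts from the same client. Here's a minimal sketch, assuming the `retrieve`, `list_events`, and `download` methods of the Together Python SDK; check the SDK reference for the exact signatures:

```python
import time

# Poll the job until it reaches a terminal state
while True:
    job = client.fine_tuning.retrieve(job.id)
    print(f"Status: {job.status}")
    if any(s in str(job.status).lower() for s in ("completed", "error", "cancelled")):
        break
    time.sleep(60)

# Inspect the training events (data validation, checkpoints, completion, ...)
for event in client.fine_tuning.list_events(id=job.id).data:
    print(event.message)

# Optionally download the final weights for local use
client.fine_tuning.download(id=job.id, output="smollm2-ft.tar.zst")
```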
How It Works
As seen in the example above, when you fine-tune a Hugging Face model on Together AI, you actually specify two models:
- Base Model (`model` parameter): A model from Together's official catalog that provides the infrastructure configuration, training optimizations, and inference setup
- Custom Model (`from_hf_model` parameter): Your actual Hugging Face model that gets fine-tuned
Think of the base model as a "training template." It tells our system how to optimally allocate GPU resources, configure memory usage, set up the training pipeline, and prepare the model for inference. Your custom model must have the same architecture as the base model, and a comparable size and sequence length, for optimal results.
As seen in the example above, if you want to fine-tune HuggingFaceTB/SmolLM2-1.7B-Instruct (which uses the Llama architecture), you'd use togethercomputer/llama-2-7b-chat as your base model template, because the two share the same underlying architecture.
The integration works bidirectionally. Together AI can pull any compatible public model from the Hugging Face Hub for training, and with the right API tokens it can download models from private repositories as well. After training, your fine-tuned model can be automatically pushed back to the Hub if you've specified `hf_output_repo_name`, making it available for sharing with your team or the broader community.
In general, all CausalLM models under 100B parameters are expected to work. For a comprehensive walkthrough on how to select base and custom models, and much more, read our full guide!
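Before launching a job, it can be worth sanity-checking that your custom model really does line up with the base model. A minimal sketch using `transformers.AutoConfig`; the fields compared here are our suggestion, not an official compatibility check:

```python
from transformers import AutoConfig

base = AutoConfig.from_pretrained("togethercomputer/llama-2-7b-chat")
custom = AutoConfig.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Same architecture family: both should report "llama" / LlamaForCausalLM
print(base.model_type, custom.model_type)
print(base.architectures, custom.architectures)

# Comparable shape: sequence length, depth, and width
print(base.max_position_embeddings, custom.max_position_embeddings)
print(base.num_hidden_layers, custom.num_hidden_layers)
print(base.hidden_size, custom.hidden_size)
```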
What This Means for Developers
This integration solves a real problem many of us have faced: finding a great model on Hugging Face but not having the infrastructure to actually fine-tune it for your specific needs. Now you can go from discovering a promising model to having a customized version running in production with just a few API calls.
The big win here is removing friction. Instead of spending days setting up training infrastructure, or being limited to whatever models are officially supported by various platforms, you can now experiment with any compatible model from the Hub. Found a specialized coding model that's close to what you need? Train it on your data! 📈
For teams, this means faster iteration cycles. You can test multiple model approaches quickly, build on community innovations, and even use your own fine-tuned models as starting points for further customization.
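Comparing several candidate Hub models on the same dataset, for instance, is just a loop over fine-tuning jobs. A hypothetical sketch, reusing the `client` and `file_upload` from the example above (the second model name is a placeholder):

```python
# Launch one fine-tuning job per candidate model, then compare the results
candidates = [
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    "some-org/another-llama-style-model",  # hypothetical candidate
]

job_ids = []
for hf_model in candidates:
    job = client.fine_tuning.create(
        model="togethercomputer/llama-2-7b-chat",  # shared training template
        from_hf_model=hf_model,
        training_file=file_upload.id,
        n_epochs=3,
        learning_rate=1e-5,
    )
    job_ids.append(job.id)

print(job_ids)
```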
How Teams Are Using This Feature
Beta users and early adopters of this feature are already seeing results across diverse use cases.
Slingshot AI has integrated this capability directly into their model development pipeline. Rather than being limited to Together's model catalog, they can now run parts of the training pipeline on their own infrastructure, upload those models to the Hub, and then perform continued fine-tuning using the Together AI fine-tuning platform. This has dramatically accelerated their development cycles and allowed them to easily experiment with a broad range of model variants.
Parsed has demonstrated the power of this approach in their work showing how small, well-tuned open-source models can outperform much larger closed models. By fine-tuning models on carefully curated datasets, they've achieved superior performance while maintaining cost efficiency and full control over their models.
Common usage patterns we're seeing from other customers include:
- Domain Adaptation: Taking general-purpose models and specializing them for industries like healthcare, finance, or legal work. Teams discover models that already have some domain knowledge and use Together's infrastructure to adapt them to their specific data and requirements.
- Iterative Model Improvement: Starting with a community model, fine-tuning it, then using that result as the starting point for further refinement (see the sketch after this list). This creates a compound improvement effect that would be difficult to achieve when starting from scratch.
- Community Model Specialization: Leveraging models which have already been optimized for specific tasks (like coding, reasoning, or multilingual capabilities) and further customizing them for proprietary use cases.
- Architecture Exploration: Quickly testing newer architectures and model variants as they’re released, without waiting for them to be added to official platforms.
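Iterative improvement maps naturally onto the API: a finished job can push its result to the Hub via `hf_output_repo_name`, and that output repo can then serve as `from_hf_model` for the next round. A hypothetical sketch (the repo names and the second dataset are placeholders):

```python
# Round 1: fine-tune a community model and push the result to the Hub
round_one = client.fine_tuning.create(
    model="togethercomputer/llama-2-7b-chat",
    from_hf_model="HuggingFaceTB/SmolLM2-1.7B-Instruct",
    training_file=file_upload.id,
    hf_api_token="hf_***",
    hf_output_repo_name="my-username-org/SmolLM2-1.7B-FT",
)

# Round 2 (once round 1 completes): continue from your own fine-tune
round_two = client.fine_tuning.create(
    model="togethercomputer/llama-2-7b-chat",
    from_hf_model="my-username-org/SmolLM2-1.7B-FT",  # output of round 1
    training_file=refined_file.id,  # hypothetical second dataset
    hf_api_token="hf_***",
    hf_output_repo_name="my-username-org/SmolLM2-1.7B-FT-v2",
)
```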
The most significant advantage teams report is speed to value. Instead of spending weeks setting up training infrastructure or months training models from scratch, they can discover promising starting points in the community and have specialized models running in production within days.
Cost efficiency is another major benefit. By starting with models that already have relevant capabilities, teams need fewer training epochs and can use smaller datasets to reach their target performance, dramatically reducing compute costs.
Perhaps most importantly, this approach gives teams access to the collective intelligence of the open-source community. Every breakthrough, every specialized adaptation, every novel architecture becomes a potential starting point for their own work.
Show Us What You Build! 🔨
As with any feature of this scope, we're actively improving the experience based on real usage, so your feedback directly shapes the platform!
Start with our implementation guide for examples and troubleshooting tips. If you run into issues or want to share what you're building, hop into our Discord; our team is there, and the community is pretty active about helping each other out.
If you have any feedback about fine-tuning at Together AI, or want to explore it for your tasks in more depth, feel free to reach out to us!
