Anthropic is limiting access to Claude models in open agent platforms for Pro/Max subscribers. Don’t worry though: there are great open models on Hugging Face to keep your agents running, more often than not at a fraction of the cost!
If you’ve been cut off and your OpenClaw, Pi, or Open Code agents need resuscitation, you can move them to open models in two ways:
- Use an open model served through Hugging Face Inference Providers.
- Run a completely local open model on your personal hardware.
The hosted route is the fastest way back to a capable agent. The local route is the right fit if you want privacy, zero API costs, and full control.
To do that, just tell Claude Code, Cursor, or your favorite agent: “help me move my OpenClaw agents to Hugging Face models”, and link this page.
Hugging Face Inference Providers
Hugging Face Inference Providers is an open platform that routes requests to providers serving open-source models. It’s the right choice if you want the most capable models, or if you don’t have the hardware to run them yourself.
First, you’ll need to create a token here. Then you can add that token to OpenClaw like so:
openclaw onboard --auth-choice huggingface-api-key
Paste your Hugging Face token when prompted, and you’ll be asked to pick a model.
We’d recommend GLM-5 for its excellent Terminal Bench scores, but there are hundreds to choose from here.
You can update your Hugging Face model at any time by entering its repo_id in the OpenClaw config:
{
  agents: {
    defaults: {
      model: {
        primary: "huggingface/zai-org/GLM-5:fastest"
      }
    }
  }
}
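Before wiring the model into OpenClaw, you can sanity-check it with a direct request. The sketch below assumes the OpenAI-compatible router endpoint at `https://router.huggingface.co/v1` and an HF token in the `HF_TOKEN` environment variable; adjust both if your setup differs.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible router endpoint for Inference Providers.
ROUTER_URL = "https://router.huggingface.co/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body in the OpenAI chat-completions format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


def send(body: dict) -> dict:
    """POST the request to the router, authenticated with your HF token."""
    req = urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = build_chat_request("zai-org/GLM-5:fastest", "Say hello in one word.")
# send(body)  # uncomment to make the actual network call
```

If the call succeeds outside OpenClaw, any failure inside OpenClaw is a config issue rather than a token or model problem.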
Note: HF PRO subscribers get $2 of free credits every month, which apply to Inference Providers usage; learn more here.
Local Setup
Running models locally gives you full privacy, zero API costs, and the power to experiment without rate limits.
Install llama.cpp, a fully open-source library for low-resource inference.
# on mac or linux
brew install llama.cpp
# on windows
winget install llama.cpp
Start a local server with a built-in web UI:
llama-server -hf unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL
Here, we’re using Qwen3.5-35B-A3B, which works great with 32GB of RAM. If you have different requirements, check the hardware compatibility for the model you’re interested in. There are hundreds to choose from.
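To see why a 35B model fits in 32GB, a rough back-of-the-envelope estimate helps: weights take roughly (effective bits per weight ÷ 8) bytes per parameter, plus some overhead for the KV cache and runtime. The bits-per-weight figures below are approximations, not official llama.cpp numbers.

```python
# Approximate effective bits per weight for common GGUF quantizations.
BITS_PER_WEIGHT = {
    "Q4_K_XL": 4.8,  # ~4-5 bits/weight for 4-bit K-quants (approximate)
    "Q8_0": 8.5,
    "F16": 16.0,
}


def estimated_ram_gb(n_params_billion: float, quant: str,
                     overhead_gb: float = 2.0) -> float:
    """Rough RAM estimate: weight bytes plus a flat overhead allowance."""
    bytes_per_weight = BITS_PER_WEIGHT[quant] / 8
    return n_params_billion * bytes_per_weight + overhead_gb


# A 35B model at Q4_K_XL needs roughly 23 GB, comfortably under 32 GB.
print(round(estimated_ram_gb(35, "Q4_K_XL"), 1))
```

The same arithmetic shows why the F16 version of the same model would not fit: 35 × 2 bytes ≈ 70GB of weights alone.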
Once the GGUF is loaded in llama.cpp, use an OpenClaw config like this:
openclaw onboard --non-interactive \
  --auth-choice custom-api-key \
  --custom-base-url "http://127.0.0.1:8080/v1" \
  --custom-model-id "unsloth-qwen3.5-35b-a3b-gguf" \
  --custom-api-key "llama.cpp" \
  --secret-input-mode plaintext \
  --custom-compatibility openai
Confirm the server is running and the model is loaded:
curl http://127.0.0.1:8080/v1/models
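If you'd rather check programmatically, you can parse the `/v1/models` response and compare it against the `--custom-model-id` flag. The sample response below is an assumed shape based on the OpenAI-compatible API; substitute the real output of the curl command above.

```python
import json

# Sample /v1/models response (assumed shape); replace with real output.
sample = json.loads("""
{"object": "list",
 "data": [{"id": "unsloth-qwen3.5-35b-a3b-gguf", "object": "model"}]}
""")

model_ids = [m["id"] for m in sample["data"]]
print(model_ids)  # --custom-model-id must match one of these exactly
```

A mismatch here is the most common reason OpenClaw fails to reach an otherwise healthy llama-server.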
Which path should you choose?
Use Hugging Face Inference Providers if you want the quickest path back to a capable OpenClaw agent. Use llama.cpp if you want privacy, full local control, and no API bill.
Either way, you don’t need a closed hosted model to get OpenClaw back on its feet!
