How to Run Claude Code for Free Using Ollama and Open Router
Learn two ways to use Claude Code without paying for Anthropic tokens: run open-source models locally with Ollama or route through Open Router's free tier.
Why Developers Are Running Claude Code Without Paying Anthropic
Claude Code is one of the most capable AI coding tools available right now. It reads your codebase, writes real changes, runs terminal commands, and reasons through multi-step problems — all from the command line. The catch is that it’s priced against Anthropic’s API, and heavy usage can get expensive fast.
The good news: you don’t need Anthropic tokens to run Claude Code. Two approaches let you use the same CLI tool at little or no cost — running open-source models locally through Ollama, or routing requests through Open Router’s free tier. Both involve redirecting Claude Code’s API calls away from Anthropic and toward a different backend.
This guide covers both methods step by step: what you need, how to set it up, and what to expect in terms of performance.
What Claude Code Actually Does Under the Hood
Claude Code is a CLI tool built by Anthropic. You install it via npm, then run it in a project directory. It reads files, executes shell commands, edits code, and interacts with you through a conversational interface.
What makes it interesting for this purpose is how it communicates with models: it sends requests to an API endpoint. By default, that endpoint is Anthropic’s servers. But the tool supports an environment variable — ANTHROPIC_BASE_URL — that lets you override where those requests go.
As long as the endpoint you point it to speaks a compatible API format, Claude Code will work. That’s the core mechanic behind both workarounds covered here.
The Two Approaches at a Glance
Before going deep on either method, here’s a quick comparison:
| | Ollama (Local) | Open Router (Cloud) |
|---|---|---|
| Cost | Free (hardware only) | Free tier available |
| Privacy | Fully local | Data leaves your machine |
| Model quality | Depends on hardware | Access to strong free models |
| Setup complexity | Moderate (requires proxy) | Straightforward |
| Internet required | No (after download) | Yes |
| Best for | Sensitive codebases, offline use | Better model performance for free |
If your code is sensitive or you work offline regularly, Ollama is the right pick. If you want better output quality without hardware investment, Open Router’s free tier is the easier path.
Method 1: Run Claude Code Locally with Ollama
Ollama lets you download and run open-source language models on your own machine. Models like Qwen2.5-Coder, DeepSeek-Coder, and Llama 3 run entirely locally — no data leaves your system.
The challenge is that Ollama uses an OpenAI-compatible API format, while Claude Code expects Anthropic’s message format. You need a translation layer between them. The most reliable tool for this is LiteLLM, a lightweight proxy that converts between API formats.
Step 1: Install Ollama
Download Ollama from the official site and install it for your operating system (macOS, Linux, or Windows via WSL).
Once installed, confirm it’s running:
```shell
ollama --version
```
Step 2: Pull a Coding Model
Not all models perform equally on code. For this use case, strong options include:
- Qwen2.5-Coder 14B — excellent at code generation and editing, good context window
- DeepSeek-Coder V2 Lite — solid reasoning for coding tasks
- Llama 3.1 8B — lighter and faster, good for simpler tasks
Pull your chosen model:
```shell
ollama pull qwen2.5-coder:14b
```
This downloads the model weights to your machine. File sizes range from ~4GB to ~20GB depending on the model and quantization level.
After the download, verify it runs:
```shell
ollama run qwen2.5-coder:14b
```
You can exit the interactive session with /bye.
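Before adding the proxy, it's worth confirming that Ollama's HTTP API is reachable, since that's what LiteLLM will talk to. Ollama serves its API on port 11434 by default, and the /api/tags endpoint lists the models you've pulled. This check requires Ollama to be running:

```shell
# List locally available models via Ollama's HTTP API.
# Ollama listens on http://localhost:11434 by default.
curl -s http://localhost:11434/api/tags
```

If the JSON response includes qwen2.5-coder:14b, Ollama is ready for the proxy.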
Step 3: Install and Configure LiteLLM
LiteLLM acts as a proxy, accepting Anthropic-format requests from Claude Code and forwarding them to Ollama in OpenAI format.
Install it via pip:
```shell
pip install litellm
```
Start the proxy pointing at your Ollama model:
```shell
litellm --model ollama/qwen2.5-coder:14b --port 4000
```
Leave this terminal running. LiteLLM is now listening on http://localhost:4000 and will forward requests to Ollama, which is running at http://localhost:11434.
Step 4: Install Claude Code
If you haven’t already, install Claude Code globally:
```shell
npm install -g @anthropic-ai/claude-code
```
Step 5: Set the Environment Variables
In a new terminal, set these two variables before running Claude Code:
```shell
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=ollama
```
The ANTHROPIC_API_KEY value doesn’t matter here — it just needs to be set to something non-empty. “ollama” works fine.
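A common failure mode (covered under troubleshooting later) is launching Claude Code in a shell where the variables aren't actually exported. A small preflight function catches this early; the function name here is illustrative, but the two variables are the real Claude Code overrides:

```shell
# Preflight: confirm both override variables are visible in this shell
# before launching Claude Code. The function name is illustrative only.
check_claude_env() {
  if [ -z "$ANTHROPIC_BASE_URL" ] || [ -z "$ANTHROPIC_API_KEY" ]; then
    echo "error: export ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY first" >&2
    return 1
  fi
  echo "Routing Claude Code requests to $ANTHROPIC_BASE_URL"
}
```

Running check_claude_env && claude ensures Claude Code only starts once the redirect is in place.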
Step 6: Run Claude Code
Navigate to your project directory and launch Claude Code:
```shell
cd /your/project
claude
```
Claude Code will start up and route all requests through LiteLLM to Ollama. You’ll see the familiar Claude Code interface, but the model responding is running entirely on your local hardware.
Hardware Requirements
The model you can run depends on your machine’s specs:
- 8GB RAM — 7B or 8B models (quantized), slower on CPU
- 16GB RAM — 7B–14B models, decent speed on CPU
- 16GB+ VRAM (GPU) — 14B–32B models with GPU acceleration, much faster
For coding tasks, a 14B model offers a meaningful quality jump over 8B. If you have an M-series Mac or a GPU-equipped Linux machine, larger models become practical.
Method 2: Route Claude Code Through Open Router’s Free Tier
Open Router is a unified API that gives you access to dozens of language models from a single endpoint. Many models on Open Router are available on a free tier with rate limits — enough for regular development use without spending anything.
Unlike Ollama, Open Router doesn’t require a proxy. It already speaks Anthropic’s message format, so you can point Claude Code directly at it.
Step 1: Create an Open Router Account
Go to openrouter.ai and sign up. Verification is quick.
Step 2: Get Your API Key
In your Open Router dashboard, navigate to the API Keys section and create a new key. Copy it — you’ll use it as your ANTHROPIC_API_KEY.
Step 3: Find a Free Model
Open Router’s model list shows which models are free (they’re marked with a :free suffix in the model ID). Strong free options for coding as of 2025 include:
- meta-llama/llama-3.3-70b-instruct:free — Meta’s 70B model, impressive for its tier
- google/gemma-3-27b-it:free — Google’s Gemma 3, solid reasoning
- qwen/qwen3-8b:free — Alibaba’s Qwen3, good at code
- mistralai/mistral-7b-instruct:free — fast and lightweight
Free tiers come with rate limits (typically requests per minute and per day). For most development sessions, these limits are workable. If you’re doing heavy automated processing, you may hit them.
Step 4: Set the Environment Variables
Export the following before running Claude Code:
```shell
export ANTHROPIC_BASE_URL=https://openrouter.ai/api/v1
export ANTHROPIC_API_KEY=your_openrouter_key_here
```
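Before pointing Claude Code at it, you can sanity-check the key directly. Open Router exposes a key-status endpoint at /api/v1/auth/key as of this writing (check their API docs if it has moved); this requires a live network connection:

```shell
# Confirm the Open Router key is valid before wiring it into Claude Code.
# Endpoint per Open Router's API docs at the time of writing.
curl -s https://openrouter.ai/api/v1/auth/key \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY"
```

A valid key returns JSON describing the key and its limits; an invalid one returns an error object.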
Step 5: Specify the Model and Run Claude Code
Claude Code needs to know which model to use. Set the model via the ANTHROPIC_MODEL environment variable or pass it with the --model flag:

```shell
export ANTHROPIC_MODEL=meta-llama/llama-3.3-70b-instruct:free
claude
```
Or inline:
```shell
claude --model meta-llama/llama-3.3-70b-instruct:free
```
Claude Code will now send requests to Open Router using the free model you specified. No Anthropic tokens consumed.
Handling Rate Limits
When you hit Open Router’s free tier limits, you’ll see errors returned to Claude Code. A few ways to handle this:
- Add a small credit balance — even $5 on Open Router buys significant usage without committing to a subscription
- Switch to a different free model — limits are per-model, so cycling models can extend your free usage
- Set a lower context window — fewer tokens per request means slower rate limit accumulation
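For scripted, non-interactive usage that tends to trip the free-tier limits, a small retry wrapper with backoff can smooth things over. This is a generic sketch, not part of Claude Code or Open Router; the function name and backoff schedule are arbitrary:

```shell
# Generic retry helper with linear backoff for rate-limited commands
# (e.g. scripted calls that can hit a 429 on Open Router's free tier).
# Illustrative sketch: retries up to 3 times, backing off 1s then 2s.
retry_with_backoff() {
  attempt=1
  max_attempts=3
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $max_attempts attempts" >&2
      return 1
    fi
    sleep "$attempt"          # wait 1s, then 2s, before retrying
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff claude -p "summarize this repo" wraps a single non-interactive invocation (Claude Code's -p flag runs one prompt and exits).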
Comparing Real-World Performance
Using open-source models through either method is different from using Claude 3.5 Sonnet or Claude 3 Opus directly. Here’s an honest breakdown of what changes:
What Works Well
- Simple code edits and refactoring
- Writing boilerplate and repetitive code
- Explaining existing code in a repo
- Generating tests for defined functions
- Searching and reading through files
Where You’ll Notice Gaps
- Complex multi-file reasoning across large codebases
- Architectural decisions requiring deep context retention
- Subtle bug detection in intricate logic
- Following complex multi-step instructions consistently
The 70B+ models available through Open Router perform noticeably better than smaller local models for complex tasks. If you’re doing serious work, the Llama 3.3 70B free tier on Open Router will outperform a local 8B model by a significant margin.
For local Ollama setups, Qwen2.5-Coder 14B offers the best cost-to-quality ratio on most consumer hardware.
Common Issues and How to Fix Them
Claude Code says the API key is invalid
This usually means the ANTHROPIC_BASE_URL isn’t set or isn’t being picked up. Confirm the variable is exported in the same terminal session you’re running Claude Code from.
Requests time out with Ollama
The model might be loading for the first time or running on CPU. First requests are always slower. If timeouts persist, try a smaller model or increase your system’s available RAM.
Open Router returns a 429 error
You’ve hit the rate limit on the free tier. Wait a few minutes or switch to a different free model. Adding even a small credit balance to your Open Router account removes most practical rate limit friction.
LiteLLM returns a format error
Check that your Ollama model is running (ollama ps) and that LiteLLM started cleanly. Restart LiteLLM if needed. If the model name is wrong in the LiteLLM command, it’ll fail silently.
Claude Code doesn’t recognize the model flag
Some older Claude Code versions handle model specification differently. Try setting the ANTHROPIC_MODEL environment variable instead of using the --model flag.
Where MindStudio Fits Into This Picture
The approaches above work well for interactive CLI sessions. But once you want to do more — run AI coding agents on a schedule, chain model calls with business logic, or connect your code-generation workflows to other tools — you need infrastructure that handles those layers.
That’s where MindStudio comes in. MindStudio is a no-code platform for building AI agents and automated workflows, with 200+ models available out of the box — no API keys or separate accounts needed. You get access to models from Anthropic, Meta, Google, Mistral, and others without managing environment variables, proxies, or rate limits yourself.
If you’ve been experimenting with Claude Code through Ollama or Open Router and want to package that capability into a repeatable, shareable workflow — say, an agent that reviews pull requests, generates documentation, or audits code for security issues on a schedule — MindStudio handles the orchestration layer. You can build that workflow in 15–30 minutes, connect it to tools like GitHub, Slack, or Notion, and run it automatically without touching the CLI each time.
The MindStudio Agent Skills Plugin also lets Claude Code and other agentic tools call MindStudio’s capabilities as simple method calls — so if you’re building on top of Claude Code anyway, you can extend it with things like agent.sendEmail(), agent.searchGoogle(), or agent.runWorkflow() without building that infrastructure yourself.
You can try MindStudio free at mindstudio.ai.
Frequently Asked Questions
Is it legal to use Claude Code with non-Anthropic models?
Yes. Claude Code is Anthropic’s tool, and Anthropic’s terms govern use of their API and models. When you redirect Claude Code to use a different model backend — Ollama or Open Router — you’re not using Anthropic’s models or API. You’re using Claude Code as a client application pointed at a different server. There’s nothing in Anthropic’s terms that prohibits this use of the CLI itself.
Will open-source models through Ollama match Claude’s coding performance?
Not at parity, particularly for complex multi-file tasks. Models like Qwen2.5-Coder 14B or Llama 3.3 70B are genuinely capable for many real coding tasks, but they don’t match Claude 3.5 Sonnet on things like subtle bug identification or complex architectural reasoning. For straightforward work — editing functions, writing tests, explaining code — the quality gap is smaller. Use the free approach for day-to-day coding help and pay for Anthropic tokens only when you need Claude’s strongest reasoning on hard problems.
Can I switch between Anthropic and free backends easily?
Yes. The environment variables are the only thing that changes. You can create shell aliases or a simple script that exports the right variables for each mode, then source that before starting Claude Code. Some developers keep a .env.local file in their home directory for each backend and source the right one per session.
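One way to sketch those aliases: a few shell functions, one per backend, dropped into your shell profile. The function names and key placeholders here are illustrative; the environment variables are the real Claude Code overrides:

```shell
# Per-session backend switchers for Claude Code. Function names and the
# *_KEY placeholder variables are illustrative; set them to your own keys.
use_ollama() {
  export ANTHROPIC_BASE_URL=http://localhost:4000   # LiteLLM proxy
  export ANTHROPIC_API_KEY=ollama                   # any non-empty value
}
use_openrouter() {
  export ANTHROPIC_BASE_URL=https://openrouter.ai/api/v1
  export ANTHROPIC_API_KEY="$OPENROUTER_API_KEY"    # your Open Router key
}
use_anthropic() {
  unset ANTHROPIC_BASE_URL                          # back to the default
  export ANTHROPIC_API_KEY="$ANTHROPIC_REAL_KEY"    # your Anthropic key
}
```

Run use_ollama, use_openrouter, or use_anthropic before launching claude in a given session.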
What’s the best free model for coding on Open Router?
As of mid-2025, meta-llama/llama-3.3-70b-instruct:free is one of the strongest free options for coding. Qwen-based models also perform well for code generation specifically. Check the Open Router model page for current availability and rate limits, as the free tier lineup changes as providers add and remove models.
Does this work with Claude Code’s agentic features (file editing, terminal commands)?
Yes. The file editing, terminal command execution, and multi-step agentic behavior in Claude Code are handled by the CLI itself — they don’t depend on Anthropic’s infrastructure. The underlying model still needs to produce the right output format, which capable open-source models generally do. Some edge cases in complex agentic chains may behave differently depending on how well the model follows Claude Code’s expected output structure.
Do I need to keep LiteLLM running for the Ollama method?
Yes. LiteLLM acts as a persistent proxy between Claude Code and Ollama. If LiteLLM stops, Claude Code’s requests have nowhere to go. A simple approach is to run LiteLLM in a dedicated terminal pane or use a tool like screen or tmux to keep it running in the background. On Linux, you can also run it as a systemd service.
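With tmux, for instance, keeping the proxy alive takes one command. This assumes tmux and LiteLLM are installed; the session name is arbitrary:

```shell
# Start the LiteLLM proxy in a detached tmux session so Claude Code
# always has somewhere to send requests. Session name is arbitrary.
tmux new-session -d -s litellm 'litellm --model ollama/qwen2.5-coder:14b --port 4000'

# Reattach later to inspect logs, or kill the session when done:
tmux attach -t litellm
tmux kill-session -t litellm
```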
Key Takeaways
- Claude Code supports an ANTHROPIC_BASE_URL environment variable that lets you redirect API calls to any compatible endpoint.
- Ollama lets you run open-source models locally with full privacy — use LiteLLM as a proxy to handle the API format translation.
- Open Router provides a free tier of cloud-hosted models that work directly with Claude Code, no proxy needed.
- Open Router’s free 70B models outperform smaller local models for complex coding tasks; Ollama wins on privacy and offline capability.
- Both methods work with Claude Code’s full feature set — file editing, terminal commands, and multi-step reasoning.
- For teams who want to automate AI coding workflows beyond the CLI, MindStudio provides a no-code layer with 200+ models built in, no API key management required.