
How to Use OpenRouter Free Models With Claude Code to Cut AI Costs by 99%

Configure Claude Code to route through OpenRouter's free model tier instead of Anthropic's paid API. A step-by-step guide with the exact settings.json setup.

MindStudio Team

The Real Cost of Running Claude Code All Day

If you’ve been using Claude Code heavily, you’ve probably had a moment of sticker shock looking at your Anthropic bill. Claude Opus costs $15 per million input tokens and $75 per million output tokens. Run it through a full coding session with back-and-forth exchanges, context loading, and file reads, and you can burn through $10–30 in a single afternoon.

The good news: you can route Claude Code through OpenRouter’s free model tier and cut those costs to near zero for a significant chunk of your workload. This guide walks through the exact configuration — including the settings.json setup — to make that happen.

This is a practical how-to for developers already using Claude Code who want to use free LLMs on OpenRouter without abandoning their existing workflow.


What OpenRouter Actually Offers for Free

OpenRouter is an API gateway that gives you access to dozens of models through a single endpoint. The key thing for this guide: OpenRouter maintains a free tier where several capable models are available at $0 per token.

These aren’t toy models. The free tier includes models like:

  • Meta Llama 3.1 8B Instruct — solid for boilerplate generation, refactoring, and explanation tasks
  • Meta Llama 3.3 70B Instruct — much stronger reasoning, still free (with rate limits)
  • Google Gemma 3 12B — good at following structured instructions
  • Qwen 2.5 7B Instruct — strong at code tasks relative to its size
  • DeepSeek R1 — free tier available, competitive reasoning performance
  • Mistral 7B Instruct — fast, lightweight, useful for quick edits

Free models on OpenRouter come with rate limits — typically 20 requests per minute and 200 requests per day per model. That’s enough for moderate use. If you need more, you can rotate between free models or add a small OpenRouter credit balance to unlock higher limits on the free models.

The OpenRouter API is compatible with Anthropic’s API format, which is what makes this whole trick work.


Prerequisites Before You Start

Before touching any configuration, make sure you have the following in place:

1. Claude Code installed. You need Claude Code (the Anthropic CLI coding assistant) already working on your machine, runnable via the claude command in your terminal.

2. An OpenRouter account and API key. Sign up at openrouter.ai (it’s free). Once logged in, go to Keys in your dashboard and create a new API key. Copy it and keep it somewhere accessible.

3. Know where your Claude config lives. Claude Code stores its settings in ~/.claude/settings.json on Mac/Linux, or C:\Users\<YourName>\.claude\settings.json on Windows. The directory might not exist yet if you haven’t customized anything; you’ll create it.


Step-by-Step Configuration

Step 1: Create or Open Your settings.json

Navigate to your Claude config directory:

cd ~/.claude

If the directory doesn’t exist:

mkdir ~/.claude

Open or create settings.json:

nano ~/.claude/settings.json

Step 2: Add the OpenRouter Environment Variables

Paste this into your settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api/v1",
    "ANTHROPIC_API_KEY": "sk-or-v1-YOUR_OPENROUTER_KEY_HERE"
  }
}

Replace sk-or-v1-YOUR_OPENROUTER_KEY_HERE with your actual OpenRouter API key.

What this does: Claude Code uses ANTHROPIC_BASE_URL to determine where to send API requests. By pointing it at OpenRouter’s endpoint — which uses the same API format as Anthropic — you’re telling Claude Code to route through OpenRouter instead of Anthropic’s servers. Your OpenRouter API key authenticates those requests.

Step 3: Specify the Model

By default, Claude Code will try to use its configured Claude model. But since you’re now pointing at OpenRouter, you need to specify an OpenRouter-compatible model ID. Update your settings.json:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api/v1",
    "ANTHROPIC_API_KEY": "sk-or-v1-YOUR_OPENROUTER_KEY_HERE"
  },
  "model": "meta-llama/llama-3.3-70b-instruct:free"
}

The model string uses OpenRouter’s naming format: provider/model-name:free for free-tier models.

Step 4: Test the Connection

Run a quick sanity check:

claude "Write a one-line Python function that returns the factorial of a number"

If it works, you’ll see a response. If you get an authentication error, double-check your API key. If you get a model not found error, verify the exact model ID by checking OpenRouter’s model list — model IDs change occasionally.
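You can also script that model-ID check. OpenRouter's GET https://openrouter.ai/api/v1/models endpoint returns every model with its pricing (the prompt and completion prices are strings, in dollars per token). A sketch of filtering that response down to free-tier IDs; the sample payload below is illustrative, not live data:

```python
def free_model_ids(models_response: dict) -> list[str]:
    """Return IDs of models priced at $0 for both prompt and completion.

    `models_response` is the parsed JSON from GET
    https://openrouter.ai/api/v1/models.
    """
    free = []
    for model in models_response.get("data", []):
        pricing = model.get("pricing", {})
        if (float(pricing.get("prompt", "1")) == 0.0
                and float(pricing.get("completion", "1")) == 0.0):
            free.append(model["id"])
    return free

# Illustrative sample in the endpoint's response shape
sample = {"data": [
    {"id": "meta-llama/llama-3.3-70b-instruct:free",
     "pricing": {"prompt": "0", "completion": "0"}},
    {"id": "anthropic/claude-3.5-sonnet",
     "pricing": {"prompt": "0.000003", "completion": "0.000015"}},
]}
print(free_model_ids(sample))  # ['meta-llama/llama-3.3-70b-instruct:free']
```

Running the filter against the live endpoint before a session tells you which ":free" IDs currently exist, which helps since free availability changes over time.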

Step 5: (Optional) Set a Fallback for Complex Tasks

The free models are capable, but they won’t match Claude Opus on complex multi-file refactors or deep architectural reasoning. A practical two-tier setup defaults everything to free models, reserving Anthropic’s endpoint for tasks that genuinely need it (switching back is easy with the environment-variable approach covered later). Within the free tier, you can also split work between a stronger main model and a lighter one:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api/v1",
    "ANTHROPIC_API_KEY": "sk-or-v1-YOUR_OPENROUTER_KEY_HERE"
  },
  "model": "meta-llama/llama-3.3-70b-instruct:free",
  "smallModel": "meta-llama/llama-3.1-8b-instruct:free"
}

The smallModel setting is used by Claude Code for lightweight tasks like summarizing context or generating short completions.
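If your version of Claude Code doesn’t honor a smallModel key (the settings.json schema has changed across releases), the same split can usually be expressed through environment variables; ANTHROPIC_MODEL and ANTHROPIC_SMALL_FAST_MODEL are the variable names Claude Code documents:

```shell
export ANTHROPIC_MODEL="meta-llama/llama-3.3-70b-instruct:free"
export ANTHROPIC_SMALL_FAST_MODEL="meta-llama/llama-3.1-8b-instruct:free"
```

Check your installed version’s documentation to confirm which configuration path it reads.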


Choosing the Right Free Model for Each Task

Not every free model performs equally well on every task. Here’s a practical breakdown:

For Code Generation and Completion

Llama 3.3 70B (meta-llama/llama-3.3-70b-instruct:free) is the strongest free option for writing new code. It handles function generation, class creation, and moderate complexity well. The rate limits (20 req/min) are manageable for focused sessions.

Qwen 2.5 Coder (when available on the free tier) is specifically trained on code and often outperforms general-purpose models of similar size on code completion tasks.

For Refactoring and Explanation

DeepSeek R1 (free tier) has strong reasoning capabilities. If you’re asking Claude Code to explain why something isn’t working or to think through a refactor, R1’s chain-of-thought approach gives better results than faster but shallower models.

For Quick Edits and Low-Stakes Tasks

Mistral 7B or Llama 3.1 8B are fast and cheap (free). For renaming variables, formatting, adding comments, or small one-liner changes, these are more than sufficient and won’t eat into your daily request limits.

What to Avoid Using Free Models For

Be realistic about limitations:

  • Large codebase analysis — Free models have smaller context windows than Claude Opus. Multi-file analysis across 50+ files will degrade quality fast.
  • Extended agentic tasks — Long autonomous sessions where Claude Code runs many sequential tool calls work better with Claude’s native models, which are tuned for that interaction pattern.
  • Highly precise security-sensitive code — Not because free models are untrustworthy, but because they’re more likely to produce plausible-but-wrong code in complex edge cases.

Using Environment Variables Instead of settings.json

If you prefer not to modify settings.json — or want to switch between configurations quickly — you can set the variables at the session level:

export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_API_KEY="sk-or-v1-YOUR_KEY"
claude "refactor this function to use async/await"

Or do it inline for a single command:

ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" \
ANTHROPIC_API_KEY="sk-or-v1-YOUR_KEY" \
claude --model "meta-llama/llama-3.3-70b-instruct:free" "add error handling to this API call"

This is useful if you want to keep your default Claude Code pointed at Anthropic but occasionally route specific sessions through OpenRouter.


Rotating Between Free Models to Avoid Rate Limits

The main friction with free models is the per-model rate limit. OpenRouter’s free tier is typically 20 requests per minute and around 200 requests per day per model. If you’re running a long coding session, you’ll hit that ceiling.

The workaround: rotate between models. Since the models are accessed through the same endpoint, switching is just a model ID change.

Here’s a simple shell function you can add to your .bashrc or .zshrc:

# Cycle through free models (bash- and zsh-compatible)
claude_free() {
  local models=(
    "meta-llama/llama-3.3-70b-instruct:free"
    "qwen/qwen-2.5-72b-instruct:free"
    "deepseek/deepseek-r1:free"
    "google/gemma-3-12b-it:free"
  )
  # Use slice syntax rather than direct indexing: slices are zero-based
  # in both shells, while zsh arrays are 1-indexed.
  local model="${models[@]:$((RANDOM % ${#models[@]})):1}"
  echo "Using model: $model"
  ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" \
  ANTHROPIC_API_KEY="sk-or-v1-YOUR_KEY" \
  claude --model "$model" "$@"
}

Call it with claude_free "your prompt here". It randomly selects a free model each time, distributing requests across your daily quotas.


Realistic Cost Comparison

To put actual numbers on this:

Anthropic Claude Opus (direct)

  • Input: $15 / 1M tokens
  • Output: $75 / 1M tokens
  • A moderate coding session (50 exchanges, ~2,000 tokens each): roughly $7–15

OpenRouter free tier

  • $0 / 1M tokens (up to daily limits)
  • Same session: $0

OpenRouter paid models (via small credit balance)

  • Llama 3.3 70B: ~$0.12 / 1M tokens (for exceeding free limits)
  • Same session: ~$0.15–0.30

The 99% figure in the title is real — and if you stay within free tier limits, it’s 100%. Even if you occasionally exceed limits and pay OpenRouter’s paid rates for the overflow, you’re still looking at 95%+ savings compared to Claude Opus.
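As a sanity check on those figures, here is the arithmetic. The per-exchange token counts are illustrative assumptions, not measurements: roughly 5,000 input tokens per exchange (context gets re-sent each turn) and 2,000 output tokens.

```python
def session_cost(exchanges: int, in_tokens: int, out_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Session cost in dollars; rates are $ per million tokens."""
    per_exchange = in_tokens * in_rate + out_tokens * out_rate
    return exchanges * per_exchange / 1_000_000

# Claude Opus direct: $15/M input, $75/M output
opus = session_cost(50, in_tokens=5000, out_tokens=2000,
                    in_rate=15, out_rate=75)
print(f"Claude Opus session: ${opus:.2f}")  # $11.25
print("Same session on the free tier: $0.00")
```

With heavier context loading per exchange, the number climbs toward the top of the $7–15 range quoted above.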


Troubleshooting Common Issues

“Authentication failed” or 401 errors

Your OpenRouter API key is wrong or expired. Go back to OpenRouter’s dashboard, regenerate a key, and update settings.json. Make sure there are no extra spaces or newline characters in the key string.

“Model not found” errors

Free model availability on OpenRouter changes. A model that was free last month might now have a paid tier. Check the OpenRouter model list, filter by “free,” and update your model ID.

Responses feel much weaker than Claude

You’re likely hitting tasks that genuinely need a larger model. Try DeepSeek R1 or Llama 3.3 70B instead of smaller free models. For tasks that truly need Claude Opus, it’s reasonable to switch back — that’s what the environment variable approach makes easy.

Rate limit errors mid-session

You’ve hit the 20 req/min or daily cap. Wait a few minutes, or switch to a different free model using a different model ID. Each model has its own quota.

Claude Code ignores settings.json changes

Claude Code reads settings.json when it starts, so restart the claude session after editing (a fresh terminal window is the simplest way). Also check that you haven’t exported conflicting ANTHROPIC_* variables in your shell profile, since those may override the file-based config.


Where MindStudio Fits Into This Picture

Cost optimization at the model level — routing Claude Code through OpenRouter free models — solves one problem: the per-token bill for interactive coding sessions.

But if you’re building agents or AI-powered workflows on top of your code, you face a different version of the same problem: which model do you use at each step, how do you manage API keys across services, and how do you keep costs predictable at scale?

MindStudio handles this at the workflow layer. It gives you access to 200+ models — including free and open-source models — through a single platform, with no API keys to manage across services. You pick the right model for each step in a workflow, mix and match Claude for reasoning-heavy steps with cheaper models for formatting or classification steps, and keep costs visible in one place.

For developers who’ve already gone through the work of optimizing their Claude Code setup, MindStudio is a natural next step when you want to productize those workflows or hand them off to non-technical teammates. You can start building on MindStudio for free at mindstudio.ai.

It also pairs well with the MindStudio Agent Skills Plugin — if you want Claude Code or another AI agent to call a MindStudio workflow as a tool (for things like sending emails, running image generation, or triggering business process automations), the @mindstudio-ai/agent npm package makes that straightforward.


Frequently Asked Questions

Does this work with Claude Code’s agentic features?

Partially. Claude Code’s autonomous multi-step features (running shell commands, reading files, making sequential decisions) work through the same API, so they’ll function with OpenRouter models. The quality of those agentic sessions depends heavily on the model you choose — larger models like Llama 3.3 70B handle multi-step tasks better than smaller ones. Some Claude-specific behaviors (like extended thinking mode) won’t work because they’re Anthropic features, not standard API features.

Is it against Anthropic’s terms to route Claude Code through OpenRouter?

Claude Code using a non-Anthropic endpoint doesn’t violate Anthropic’s terms of service — you’re just pointing the client at a different API. OpenRouter is a legitimate API service. That said, you’re no longer using Anthropic’s models when you do this, so you lose any Claude-specific capabilities and quality guarantees. If you need Claude specifically (for compliance, consistency, or capability reasons), you’ll need to use Anthropic’s endpoint.

What happens when I exceed OpenRouter’s free tier limits?

Requests will fail with a rate limit error (429). Your options: wait for the limit to reset (usually hourly or daily), switch to a different free model, or add credits to your OpenRouter account to access paid rate limits. OpenRouter’s paid rates for open-source models are still dramatically cheaper than Claude Opus.
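The switch-on-429 idea can be sketched as a small fallback policy. Here send_request is a placeholder for whatever you use to call the API (an SDK, curl, or the shell function above); 429 is the standard HTTP rate-limit status code:

```python
FREE_MODELS = [
    "meta-llama/llama-3.3-70b-instruct:free",
    "deepseek/deepseek-r1:free",
    "meta-llama/llama-3.1-8b-instruct:free",
]

def with_fallback(send_request, models=FREE_MODELS):
    """Try each model in order; return (model, body) from the first one
    that isn't rate-limited. Raise if every model returns 429."""
    last_body = None
    for model in models:
        status, body = send_request(model)
        if status == 429:  # rate-limited: move on to the next model
            last_body = body
            continue
        return model, body
    raise RuntimeError(f"All models rate-limited; last response: {last_body}")

# Example with a stubbed transport: the first model is over quota
def fake_send(model):
    if model == FREE_MODELS[0]:
        return 429, "rate limit exceeded"
    return 200, f"ok from {model}"

print(with_fallback(fake_send)[0])  # deepseek/deepseek-r1:free
```

Each model carries its own quota, so walking an ordered list like this effectively multiplies your daily capacity.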

Can I use this setup for a team, not just personal use?

Yes, but with caveats. A shared OpenRouter API key for a team will hit free tier limits much faster. For team use, you’d likely want to add a credit balance to OpenRouter and use paid-but-cheap models rather than relying on free tier quotas. Alternatively, running local models with a tool like Ollama can eliminate per-request costs entirely for a self-hosted setup, though that’s a separate configuration from the OpenRouter routing described here.

Which free model is the closest to Claude in coding quality?

Llama 3.3 70B Instruct is the most capable free model currently available on OpenRouter for general coding tasks. DeepSeek R1 (free tier) is stronger for reasoning-intensive problems — debugging tricky logic, architectural decisions, explaining complex systems. Neither matches Claude Opus on very complex tasks, but for the 80% of day-to-day coding work, the quality gap is smaller than the price gap.

Will this setup break if I update Claude Code?

Possibly. Updates to Claude Code occasionally change how environment variables and settings.json are parsed. After any Claude Code update, run a quick test to confirm routing still works. If it breaks, check the Claude Code release notes for any changes to the configuration format.


Key Takeaways

  • Claude Code can be pointed at OpenRouter’s API by setting ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY in ~/.claude/settings.json
  • OpenRouter’s free tier includes capable models (Llama 3.3 70B, DeepSeek R1, Gemma 3, Qwen 2.5) that handle most everyday coding tasks well
  • Free tier limits (20 req/min, ~200/day per model) are manageable; rotating between models extends your daily capacity
  • The cost difference is real — $0 vs. $7–15 per moderate coding session on Claude Opus
  • For complex tasks, long agentic sessions, or Claude-specific features, switching back to Anthropic’s endpoint is straightforward
  • If you’re building production AI workflows rather than just personal coding sessions, MindStudio gives you multi-model access and cost control at the workflow level — try it free

Presented by MindStudio
