
How to Run Claude Code Against DeepSeek V4 for $3 a Session (Step-by-Step)

The free-cloud-code GitHub proxy lets you use the full Claude Code CLI with DeepSeek backends. Here's the exact setup to cut your AI coding costs.

MindStudio Team


You’ve hit the Claude Code rate limit mid-session, or you’ve watched your Anthropic credits drain faster than expected on a project that’s mostly boilerplate. Either way, the math starts to feel wrong. The good news: you can set up the free-cloud-code GitHub proxy in under 15 minutes and run the full Claude Code CLI against DeepSeek V4 Flash backends for roughly $3 a session instead of $5–10.

This isn’t a stripped-down experience. You get the same terminal interface, the same commands, the same agentic loop. The model answering your prompts just costs $1.74/M input tokens and $3.48/M output tokens instead of Claude Opus 4.7’s $5/M input and $25/M output. For output-heavy coding sessions, that’s roughly a 7x difference on the line item that actually matters.

The proxy was built by Ali Sharer, and its GitHub star count has gone from near zero to a sharp upward inflection in a matter of weeks. The core idea is simple: intercept Claude Code’s outbound API calls at a local server and reroute them to OpenRouter, Nvidia NIM, or Ollama. Claude Code never knows the difference. The model, however, will confidently tell you it’s “Claude Opus 4.6,” because the Claude Code system prompt is baked into every request and the model simply pattern-matches on it. That’s a quirk worth knowing upfront.


What You’re Actually Getting (and What You’re Trading Away)

DeepSeek V4 Flash is the smaller of the two V4 variants released. It’s a 284B-parameter mixture-of-experts model that activates only 13B parameters per token at inference, so you pay for 13B parameters’ worth of compute while drawing on the full 284B parameters of trained capacity. The 1 million token context window is real and matches what Anthropic charges a premium for.

The benchmarks put it close to last-generation frontier models — not quite Claude Opus 4.7 or GPT-5.5, but competitive with what was state-of-the-art six months ago. For most coding tasks — scaffolding, refactoring, writing tests, building CRUD apps — that’s more than sufficient.

What you’re trading: the top 10–20% of reasoning quality that Opus 4.7 brings to genuinely hard architectural decisions. One practical mitigation: use a smarter model as an orchestrator for planning, then hand off implementation work to DeepSeek. More on that in the Taking This Further section below.

If you want to understand the token cost tradeoffs more deeply before committing, the Claude Code token management hacks post covers session economics in detail — a lot of those techniques apply regardless of which backend you’re using.


What You Need Before Starting

Accounts and tools:

  • A terminal (macOS/Linux native; Windows users need PowerShell — the repo’s quick start covers both)
  • Node.js installed (the proxy is a Node server)
  • Git installed
  • An OpenRouter account at openrouter.ai — free to create, pay-as-you-go credits
  • Claude Code installed (npm install -g @anthropic-ai/claude-code)

Optional but useful:

  • An Nvidia NIM account if you want the free tier (more on this below)
  • Ollama installed if you want fully local inference

You do not need an active Anthropic subscription. That’s the point. You will need a small OpenRouter credit balance — $5 gets you a lot of DeepSeek sessions at these prices.

Knowledge baseline: You should be comfortable running commands in a terminal and editing a .env file. No coding experience required beyond that.
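
A quick sanity check before you begin, if you want to confirm the baseline tooling is on your PATH (version numbers will vary):

node --version     # any recent LTS release should be fine
git --version
claude --version   # confirms the Claude Code CLI installed correctly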


Setting It Up: Step by Step

Step 1: Clone the repo and install dependencies

Open a terminal and run these three commands from the quick start:

git clone https://github.com/AliSharer/free-cloud-code
cd free-cloud-code
npm install

The third command installs dependencies. If you already have Node and npm set up, this takes under a minute. Now you have the proxy server code on your machine.

Step 2: Get your OpenRouter API key

Go to openrouter.ai, create an account, and navigate to API Keys. Click Create, give it a name, set an expiration (24 hours is fine for testing — you can always create another), and copy the key immediately. OpenRouter only shows it once.

Add a small credit balance — $5 is plenty to start. At $3.48/M output tokens, you’d need to generate nearly 1.5 million output tokens to spend $5. A full habit tracker app session runs maybe 50,000–100,000 output tokens.

Now you have an API key that can route to DeepSeek V4 Flash.

Step 3: Configure the .env file

The proxy reads configuration from a hidden .env file in the project directory. On macOS, hidden files (anything prefixed with a dot) don’t show in Finder by default; toggle them with Cmd + Shift + . (period).
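
If you’d rather not touch Finder at all, standard shell tools see hidden files just fine:

ls -a          # lists .env alongside the visible project files
nano .env      # or any terminal editor; open -e .env opens it in TextEdit on macOS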

Open .env in any text editor. You’ll see placeholder sections for OpenRouter, DeepSeek direct, Nvidia NIM, and Ollama configs. Find the OpenRouter section and paste your API key.


Then scroll to the model field under the OpenRouter config. You need to specify the model ID exactly. For DeepSeek V4 Flash via OpenRouter, the model ID is:

deepseek/deepseek-v4-flash

The format in the .env file should look like:

OPENROUTER_API_KEY=your_key_here
OPENROUTER_MODEL=deepseek/deepseek-v4-flash

Save the file. Now you have a configured proxy that knows where to send requests.

Step 4: Start the proxy server

In your terminal (make sure you’re inside the free-cloud-code directory — run pwd to check):

npm start

The proxy starts on localhost:8082 by default. You’ll see a log line confirming it’s running. Leave this terminal window open — it’s your live request log. Every incoming Claude Code request and its routing will appear here in real time.
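
Before wiring Claude Code to it, you can confirm the server is actually listening (the same check the troubleshooting section below relies on; the exact response body depends on the proxy version):

curl http://localhost:8082   # any response at all beats a connection refused error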

Now you have a local server intercepting API calls.

Step 5: Launch Claude Code against the proxy

Open a second terminal window. Run:

ANTHROPIC_BASE_URL=http://localhost:8082 ANTHROPIC_AUTH_TOKEN=proxy claude

This tells Claude Code to send all API calls to your local proxy instead of Anthropic’s servers. The ANTHROPIC_AUTH_TOKEN=proxy is a dummy value; the proxy doesn’t validate it, but Claude Code needs something in that field.
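
If you don’t want to retype that environment-variable prefix on every launch, export the values once per shell session instead (the same technique the troubleshooting section suggests):

export ANTHROPIC_BASE_URL=http://localhost:8082
export ANTHROPIC_AUTH_TOKEN=proxy
claude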

You should see the standard Claude Code interface load. Type hello and hit enter. In your first terminal window, you’ll see the request come in, get routed to OpenRouter, and come back. The response will say something like “Hello! How can I help you today?” — and if you ask “what model are you?”, it will confidently say it’s Claude Opus 4.6. That’s the system prompt doing its thing.

Now you have Claude Code running against DeepSeek V4 Flash.

Step 6: Build something

The demo from the source that inspired this post: a habit tracker app, built entirely with DeepSeek V4 Flash via this proxy, for approximately $3. The prompt was straightforward:

Build me a simple habit tracking app in a subdirectory called habit-tracker. 
Make it local, straightforward. This is a demo.

DeepSeek scaffolded the HTML, CSS, and JavaScript. A follow-up prompt asked it to redesign with a serif font and a more polished feel. The whole session — including the redesign pass — came in under $3 in OpenRouter credits.

For comparison, the same session against Anthropic’s API directly would run $5–10 depending on how many tokens the planning and execution phases consumed.

Now you have a working setup and a sense of what it costs.


The Free Tier: Nvidia NIM

If you want to spend literally zero dollars, Nvidia NIM offers free model access with an account. The free model worth knowing about is z-ai/glm-4.7 — a capable model that doesn’t cost anything per request on NIM’s free tier.

The setup is identical to OpenRouter. In your .env file, find the Nvidia NIM section, paste your NIM API key (generated at build.nvidia.com under API Keys), and set the model to:

nvidia/z-ai/glm-4.7
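
Mirroring the OpenRouter block from Step 3, the .env entries would look something like this. The variable names here are an illustrative assumption; use whatever placeholders the repo’s .env file actually ships with:

NVIDIA_NIM_API_KEY=your_nim_key_here    # assumed name; check the repo's .env section
NVIDIA_NIM_MODEL=nvidia/z-ai/glm-4.7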

The tradeoff: GLM 4.7 is less capable than DeepSeek V4 Flash for complex coding tasks. It’s fine for simpler scaffolding, but you’ll notice the quality gap on anything requiring multi-file reasoning or architectural decisions.


One practical note: some Claude Code features like “fast mode” will throw API errors against models that don’t support that parameter. Turn off any Claude Code experimental features before running against NIM or Ollama backends.


Troubleshooting the Real Failure Modes

“The model keeps erroring on tool calls.” DeepSeek V4 Flash handles Claude Code’s tool-use protocol well, but some models (especially on Ollama) don’t implement the function-calling spec Claude Code expects. If you’re getting consistent tool errors, switch to OpenRouter + DeepSeek rather than a local model.

“Quality dropped off mid-session.” This is real and expected. The source creator’s recommendation: start a new Claude Code instance every ~50,000 tokens. Context accumulation degrades output quality on smaller models faster than on Opus. The Claude Code effort levels explained post has useful framing on how to think about when to reset versus continue.

“The proxy isn’t intercepting requests.” Check that you’re running the ANTHROPIC_BASE_URL environment variable in the same terminal session where you launch Claude Code. If you set it in one shell and run claude in another, it won’t apply. Alternatively, export it: export ANTHROPIC_BASE_URL=http://localhost:8082.

“I’m getting charged Anthropic rates anyway.” This means Claude Code is bypassing the proxy and hitting Anthropic directly. Verify the ANTHROPIC_BASE_URL is set correctly and that the proxy server is actually running on port 8082. Run curl http://localhost:8082 — you should get a response from the proxy, not a connection refused error.

“The .env file changes aren’t taking effect.” You need to restart the proxy server after editing .env. Stop it with Ctrl+C and run npm start again.

Anthropic’s harness detection. There’s a documented issue where Anthropic’s systems scan for keywords like hermes or openmcp in code and can trigger billing anomalies or access restrictions. This proxy doesn’t use those keywords, but if you’re working in a repo that references other agent frameworks, be aware. The GStack vs Superpowers vs Hermes comparison post covers the landscape of Claude Code frameworks if you’re evaluating which one to build on.


Taking This Further

The orchestrator pattern. The most cost-effective setup isn’t “use DeepSeek for everything.” It’s using a smarter model (Opus 4.7 or similar) as an orchestrator that plans and reviews, then routing implementation work to DeepSeek. The orchestrator sends structured prompts to DeepSeek via the proxy, reviews the output, and iterates. You pay Opus rates for a small number of high-level decisions and DeepSeek rates for the bulk of token generation. When Anthropic ran similar sub-agent experiments pairing Opus with Sonnet, they saw roughly 15% better outcomes than single-model runs — and this approach gets you that architecture at a fraction of the cost.
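
A low-tech way to approximate this, assuming you keep a funded Anthropic account for the planning side: run two Claude Code instances side by side, one against Anthropic directly and one through the proxy, and carry the plan from the first into the second yourself.

# Terminal A: planning and review, billed at Anthropic rates
claude

# Terminal B: implementation, routed to DeepSeek via the proxy
ANTHROPIC_BASE_URL=http://localhost:8082 ANTHROPIC_AUTH_TOKEN=proxy claude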

Context window management. DeepSeek V4 Flash has a 1 million token context window, which means you can feed it large codebases without truncation. But the proxy still has to send Claude Code’s full system prompt on every request — that’s around 30,000 tokens of overhead per call. For short sessions, this is fine. For long sessions, it adds up. The how to save tokens in Claude Code using Opus plan mode post covers strategies that apply here too.
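
A back-of-the-envelope on that overhead, using the input price quoted earlier and assuming no prompt caching on the backend:

30,000 tokens × $1.74 per 1M tokens ≈ $0.05 of input cost per request
100 requests in a session ≈ $5.20 in system-prompt overhead alone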


Ollama for air-gapped or private work. If you’re working on code that can’t leave your machine, Ollama is the right backend. Download a model like gemma4:latest (around 10GB), run ollama serve, and point the proxy at localhost:11434. It’s slower — your laptop isn’t a GPU cluster — but the data never leaves your machine. For teams with strict data residency requirements, this matters more than the cost savings.
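
A minimal sketch of that setup, using the model and port mentioned above (the .env variable names are illustrative, not confirmed against the repo):

ollama pull gemma4:latest    # roughly a 10GB download
ollama serve                 # serves on localhost:11434 by default

# In .env (names assumed; check the repo's Ollama placeholders):
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma4:latest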

Spec-driven development as the next layer up. Once you’re comfortable with Claude Code as an implementation tool, the natural question is what sits above it. Remy takes a different approach to this abstraction: you write an annotated markdown spec describing your application — data types, edge cases, business rules — and it compiles that into a complete TypeScript stack with backend, database, auth, and deployment. The spec is the source of truth; the generated code is derived output. It’s a different mental model than prompt-driven coding, but worth understanding as the space evolves.

Multi-model orchestration without writing the plumbing. If you want to build agents that route between DeepSeek, Opus, and other models based on task complexity — without writing the proxy and orchestration code yourself — MindStudio handles that at a higher level of abstraction. It supports 200+ models out of the box and lets you chain them visually, which is useful when you want the cost optimization without maintaining your own routing infrastructure.

Staying current on Claude Code internals. The Claude Code source code leak post covers eight features that aren’t in the official docs — several of them are relevant to how the proxy interacts with Claude Code’s internal state management.


The setup takes 15 minutes. The cost difference is immediate and measurable. Whether you’re building demos, prototyping ideas, or just want to extend your coding sessions without watching a credit balance drain, routing Claude Code through DeepSeek V4 Flash is one of the more practical optimizations available right now.

Presented by MindStudio
