Optimization Articles
Browse 89 articles about Optimization.
Claude Code Ultra Plan vs Local Plan Mode: Speed, Quality, and Token Cost Compared
Ultra Plan finishes in minutes while local plan mode takes 30–45 minutes. Here's what the difference means for your Claude Code workflows.
What Is Gemma 4's Mixture of Experts Architecture? How 26B Parameters Run Like a 4B Model
Gemma 4's MoE model has 128 experts with 8 active per token, giving you 27B-level intelligence at 4B compute cost. Here's the architecture explained.
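To show how "128 experts with 8 active per token" keeps compute low, here is a minimal sketch of top-k expert routing. It is not Gemma 4's implementation. The gate, the toy "experts" (which only scale their input), and `D_MODEL` are hypothetical, and the sketch renormalizes weights over the selected experts, which is one common router design among several.

```python
import math
import random

NUM_EXPERTS = 128  # expert count stated in the article
TOP_K = 8          # experts active per token, as stated in the article
D_MODEL = 16       # hypothetical toy hidden size, far smaller than a real model

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(token, gate_weights, k=TOP_K):
    """Score every expert, keep the k best, and renormalize their weights."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(token, gate_weights, experts):
    """Run only the routed experts; the other NUM_EXPERTS - k are never evaluated."""
    out = [0.0] * len(token)
    for idx, weight in route(token, gate_weights):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out

# Toy setup: random gate and hypothetical "experts" that just scale their input.
rng = random.Random(0)
gate = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(NUM_EXPERTS)]
scales = [rng.uniform(0.5, 1.5) for _ in range(NUM_EXPERTS)]
experts = [lambda v, s=s: [s * x for x in v] for s in scales]
token = [rng.gauss(0, 1) for _ in range(D_MODEL)]

selected = route(token, gate)
output = moe_layer(token, gate, experts)
print(f"{len(selected)} of {NUM_EXPERTS} experts ran ({TOP_K / NUM_EXPERTS:.2%})")
```

Only 8/128 of the expert weights are touched per token, but shared parts of the network (such as attention and embeddings) run for every token, so the active parameter count is larger than 8/128 of the total.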
How to Use Claude Code Ultra Plan: Requirements, Token Costs, and When to Use It
Ultra Plan requires a Git repo, a Pro or Max subscription, and CLI access. Here's what it costs, how many tokens it uses, and when it's worth it.
What Is Magnific Video Upscaler? How to Upscale AI Video From 720p to 2K
Magnific's video upscaler cleans up skin tones and maintains character consistency without over-sharpening. Here's how it performs on Seedance 2.0 clips.
What Is Anthropic's Prompt Caching and Why Does It Affect Your Claude Subscription Limits?
Anthropic uses prompt caching to reduce compute costs. When third-party tools break caching, your session limits drain faster. Here's the technical explanation.
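A toy model helps explain why a third-party tool can "break" caching: cached work is reused only for a prompt prefix that matches a previous request exactly. The sketch below is a hypothetical simulation, not Anthropic's implementation; the real feature adds explicit cache breakpoints, minimum cacheable lengths, and expiry, all abstracted away here. The timestamp-injecting tool is an invented example.

```python
import hashlib

def prefix_key(blocks):
    """Hash an ordered list of prompt blocks; any change anywhere changes the key."""
    h = hashlib.sha256()
    for block in blocks:
        h.update(block.encode("utf-8"))
        h.update(b"\x00")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()

class ToyPrefixCache:
    """Hypothetical model of prefix caching: reuse only an exactly matching prefix."""

    def __init__(self):
        self.seen = set()

    def process(self, blocks):
        # Find the longest prefix of this request that was cached earlier.
        hit = 0
        for n in range(len(blocks), 0, -1):
            if prefix_key(blocks[:n]) in self.seen:
                hit = n
                break
        # Cache every prefix of this request for future turns.
        for n in range(1, len(blocks) + 1):
            self.seen.add(prefix_key(blocks[:n]))
        return hit  # number of leading blocks served from cache

cache = ToyPrefixCache()
system = "You are a coding agent."
tools = "tool definitions"
print(cache.process([system, tools, "turn 1"]))            # 0: cold cache
print(cache.process([system, tools, "turn 1", "turn 2"]))  # 3: stable prefix reused
stamped = system + " Time: 12:01:07"                        # a tool rewrites the first block
print(cache.process([stamped, tools, "turn 1", "turn 2"]))  # 0: prefix changed, nothing reused
```

The design point is that anything altering early prompt content on every request (timestamps, reordered tool lists) forces the whole history to be processed again at full cost.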
18 Claude Code Token Management Hacks to Extend Your Session
Claude Code sessions drain faster than expected. Here are 18 practical techniques to reduce token usage, preserve context, and get more done per session.
AI Agent Token Budget Management: How Claude Code Prevents Runaway API Costs
Claude Code enforces hard token limits, compaction thresholds, and pre-execution budget checks. Here's how to implement the same pattern in your own agents.
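The pattern described here (a hard limit, a compaction threshold, and a check before each step) can be sketched as a small budget object. This is a hypothetical illustration, not Claude Code's internals; `TokenBudget`, `BudgetExceeded`, and the 0.6 default threshold are assumptions chosen for the example, and token costs are assumed to be estimated by the caller.

```python
from dataclasses import dataclass

class BudgetExceeded(Exception):
    """Raised when a planned step would push usage past the hard limit."""

@dataclass
class TokenBudget:
    hard_limit: int          # maximum tokens the agent may spend in total
    context_window: int      # model context size in tokens
    compact_at: float = 0.6  # hypothetical threshold: compact at 60% of the window
    used: int = 0

    def check(self, estimated_cost: int) -> None:
        """Pre-execution check: refuse a step before it runs, not after."""
        remaining = self.hard_limit - self.used
        if estimated_cost > remaining:
            raise BudgetExceeded(f"step needs {estimated_cost} tokens, only {remaining} left")

    def record(self, actual_cost: int) -> None:
        """Record what a step actually consumed."""
        self.used += actual_cost

    def should_compact(self, context_tokens: int) -> bool:
        """True once the live context crosses the compaction threshold."""
        return context_tokens >= self.compact_at * self.context_window

budget = TokenBudget(hard_limit=50_000, context_window=200_000)
budget.check(10_000)
budget.record(10_000)
try:
    budget.check(45_000)
except BudgetExceeded as err:
    print("blocked:", err)
print(budget.should_compact(150_000))  # True
```

Checking before execution means a runaway step is stopped without spending its tokens, while `record` keeps the running total honest when estimates are off.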
How to Use OpenRouter Free Models With Claude Code to Cut AI Costs by 99%
Configure Claude Code to route through OpenRouter's free model tier instead of Anthropic's paid API. A step-by-step guide with the exact settings.json setup.
AI Token Management: Why Your Claude Code Session Drains Faster Than It Should
Token costs compound in long conversations, growing roughly quadratically with the number of turns. Learn the 18 habits that drain your Claude Code session and how to fix each one.
How to Use the /compact Command in Claude Code to Prevent Context Rot
Running /compact at 60% context capacity—not 95%—keeps your Claude Code sessions sharp. Learn when and how to compact with specific preservation instructions.
How Context Compounding Works in Claude Code (And How to Stop It)
Every Claude Code message re-reads your entire conversation history. Learn why cumulative token costs grow quadratically with conversation length and how to manage them effectively.
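The compounding effect is easy to check with arithmetic. The sketch below assumes each turn adds a fixed, hypothetical number of tokens and that the full history is read on every turn; it ignores prompt caching and compaction, both of which change the real cost.

```python
def cumulative_input_tokens(tokens_per_turn, turns, base_context=0):
    """Total input tokens read when each turn re-sends the full history.

    tokens_per_turn: tokens each turn adds (prompt plus reply); a hypothetical average.
    base_context: tokens present on every turn, e.g. system prompt and tool definitions.
    """
    total = 0
    history = base_context
    for _ in range(turns):
        history += tokens_per_turn  # the new exchange joins the history...
        total += history            # ...and the whole history is read on this turn
    return total

for turns in (10, 20, 40):
    print(turns, cumulative_input_tokens(2_000, turns))
# 10 110000
# 20 420000
# 40 1640000
```

The closed form is `base_context * n + tokens_per_turn * n * (n + 1) / 2`, so doubling the number of turns roughly quadruples the total: quadratic rather than exponential growth, which is still steep enough to justify compacting early.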
Claude Code MCP Servers and Token Overhead: What You Need to Know
Each connected MCP server loads tool definitions into every message, costing up to 18,000 tokens per turn. Here's how to audit and reduce that overhead.
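A rough audit can estimate how many tokens each server's tool definitions add to every turn. This sketch uses a hypothetical 4-characters-per-token heuristic instead of a real tokenizer, and the server names and tool definitions are invented examples shaped like typical tool schemas (name, description, input schema).

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb; not a real tokenizer

def estimate_tokens(tool_def):
    """Approximate the tokens one tool definition adds to every request."""
    text = json.dumps(tool_def)
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def audit_servers(servers, turns):
    """Return {server: (tokens_per_turn, tokens_over_session)}, largest first."""
    report = {}
    for name, tools in servers.items():
        per_turn = sum(estimate_tokens(t) for t in tools)
        report[name] = (per_turn, per_turn * turns)
    return dict(sorted(report.items(), key=lambda kv: kv[1][0], reverse=True))

# Hypothetical servers and tool definitions for illustration only.
servers = {
    "example-db": [
        {"name": "run_query", "description": "Run a read-only SQL query against the database.",
         "input_schema": {"type": "object", "properties": {"sql": {"type": "string"}}, "required": ["sql"]}},
        {"name": "list_tables", "description": "List all tables.",
         "input_schema": {"type": "object", "properties": {}}},
    ],
    "example-notes": [
        {"name": "search_notes", "description": "Full-text search over notes.",
         "input_schema": {"type": "object", "properties": {"q": {"type": "string"}}}},
    ],
}

for name, (per_turn, session) in audit_servers(servers, turns=50).items():
    print(f"{name}: ~{per_turn} tokens/turn, ~{session} over 50 turns")
```

Because the overhead is paid on every turn, multiplying by the session length makes it clear which servers are worth disconnecting when they are not in use.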
18 Claude Code Token Management Hacks to Extend Your Session
Stop burning through your Claude Code session limit. These 18 token management techniques can double or triple your effective usage per session.
Claude Code Skills: Why Code Scripts Outperform Markdown Instructions for Agent Tasks
Most Claude Code skills rely too heavily on markdown instructions. Using executable scripts instead can cut token usage by up to 90% and make agent tasks more reliable.
What Is Claude Code's CLAUDE.md File? The Permanent Instruction Manual for Your AI Agent
The CLAUDE.md file is loaded into every Claude Code session. Learn what to put in it, how to structure it, and why most users are using it wrong.
What Is Google TurboQuant? The KV Cache Compression That Crashed Memory Chip Stocks
Google's TurboQuant algorithm compresses the KV cache to 3 bits with zero accuracy loss, delivering an 8x speedup and a 6x memory reduction on H100 GPUs.