Optimization Articles
Browse 105 articles about Optimization.
What Is Anthropic's Prompt Caching and Why Does It Affect Your Claude Subscription Limits?
Anthropic uses prompt caching to reduce compute costs. When third-party tools break caching, your session limits drain faster. Here's the technical explanation.
18 Claude Code Token Management Hacks to Extend Your Session
Claude Code sessions drain faster than expected. Here are 18 practical techniques to reduce token usage, preserve context, and get more done per session.
AI Agent Token Budget Management: How Claude Code Prevents Runaway API Costs
Claude Code enforces hard token limits, compaction thresholds, and pre-execution budget checks. Here's how to implement the same pattern in your own agents.
How to Use Open Router Free Models With Claude Code to Cut AI Costs by 99%
Configure Claude Code to route through Open Router's free model tier instead of Anthropic's paid API. A step-by-step guide with the exact settings.json setup.
AI Token Management: Why Your Claude Code Session Drains Faster Than It Should
Token costs compound exponentially in long conversations. Learn the 18 habits that drain your Claude Code session and how to fix each one.
How to Use the /compact Command in Claude Code to Prevent Context Rot
Running /compact at 60% context capacity—not 95%—keeps your Claude Code sessions sharp. Learn when and how to compact with specific preservation instructions.
How to Use the /compact Command in Claude Code to Prevent Context Rot
Running /compact at 60% context capacity—not 95%—keeps your Claude Code sessions sharp. Learn when and how to compact with specific preservation instructions.
How Context Compounding Works in Claude Code (And How to Stop It)
Every Claude Code message re-reads your entire conversation history. Learn why token costs compound exponentially and how to manage it effectively.
How Context Compounding Works in Claude Code (And How to Stop It)
Every Claude Code message re-reads your entire conversation history. Learn why token costs compound exponentially and how to manage it effectively.
Claude Code MCP Servers and Token Overhead: What You Need to Know
Each connected MCP server loads tool definitions into every message, costing up to 18,000 tokens per turn. Here's how to audit and reduce that overhead.
Claude Code MCP Servers and Token Overhead: What You Need to Know
Each connected MCP server loads tool definitions into every message, costing up to 18,000 tokens per turn. Here's how to audit and reduce that overhead.
18 Claude Code Token Management Hacks to Extend Your Session
Stop burning through your Claude Code session limit. These 18 token management techniques can double or triple your effective usage per session.
18 Claude Code Token Management Hacks to Extend Your Session
Stop burning through your Claude Code session limit. These 18 token management techniques can double or triple your effective usage per session.
Claude Code Skills: Why Code Scripts Outperform Markdown Instructions for Agent Tasks
Most Claude Code skills rely too heavily on markdown. Using executable scripts instead reduces tokens by up to 90% and makes agent tasks more reliable.
What Is Claude Code's claude.md File? The Permanent Instruction Manual for Your AI Agent
The claude.md file is loaded into every Claude Code session. Learn what to put in it, how to structure it, and why most users are using it wrong.
What Is Google TurboQuant? The KV Cache Compression That Crashed Memory Chip Stocks
Google's TurboQuant algorithm compresses AI memory to 3 bits with zero accuracy loss, delivering 8x speed and 6x memory reduction on H100 GPUs.
What Is the Context Window in Claude Code? How to Manage It for Consistent Results
Claude's context window is its short-term memory with a hard limit. When it fills with stale data, quality drops. Here's how to keep it fresh and focused.
How to Optimize Web Scraping Skills for AI Agents: 6 Token-Saving Techniques
Learn how to reduce token usage by 90% in web scraping AI agent skills by filtering HTML, hardcoding selectors, batching requests, and using incremental runs.
What Is the Context Window Limit in Claude Code? How to Manage It for Better Results
Claude Code's context window is its short-term memory. When it fills with stale content, quality drops. Here's how to keep it fresh and get consistent outputs.
Claude Code Effort Levels Explained: When to Use Low, Medium, High, and Max
Claude Code's effort level setting controls how much reasoning the model applies. Learn when to use each level to balance quality and token cost.