Optimization Articles
Browse 211 articles about Optimization.
DeepSeek Vision vs. Claude Sonnet 4.6 vs. Gemini Flash 3: Which Vision Model Uses 10x Less KV Cache?
DeepSeek's vision model uses ~90 KV cache entries per image vs. ~870 for Sonnet 4.6 and ~1,000 for Gemini Flash 3. Here's what that means for cost.
Andrej Karpathy on DeepSeek's OCR Paper: Why Pixels May Beat Tokens as AI Inputs
Karpathy called DeepSeek's Oct 2025 OCR paper — 10x text compression, 97% accuracy — a sign that tokenizers are on the way out.
Anthropic's Harness Detection Bug: 3 Things That Triggered Unexpected Claude Code Charges
A git commit mentioning 'hermes.md' triggered a $200.98 overage on a plan showing 86% unused. Here's exactly what caused it and how Anthropic responded.
How to Build a Local AI Stack from Scratch: Ollama to vLLM, Step by Step
From Ollama for daily use to vLLM for serving to TensorRT-LLM for production — here's the complete local AI runtime stack and when to use each layer.
Claude Code Skills Architecture: 4 Layers That Keep Your AI Agent Fast and Focused
The .claude/skills/ folder uses progressive context loading — only ~100 tokens read at search time — to keep Claude Code lightweight across dozens of SOPs.
Cursor SDK + GPT-5.5 Scores 87.2% vs Native Codex's 61.5% — The Harness Is the Bottleneck
Switching GPT-5.5 from Codex's native harness to Cursor's SDK raised its functionality score from 61.5% to 87.2%, a gain of roughly 26 points from the harness alone.
GitHub Copilot's CPO Says the Flat-Rate AI Pricing Model Is Dead — What Usage-Based Billing Means for Builders
GitHub Copilot CPO Mario Rodriguez said flat-rate AI pricing 'is no longer sustainable.' Here's what the shift to usage-based billing means for AI builders.
Andrej Karpathy's LLM Wiki Pattern: Cut Claude Token Usage 95% with a Two-Folder System
One user turned 383 files and 100+ meeting transcripts into a compact wiki using Karpathy's raw/wiki pattern — and dropped Claude token usage by 95%.
Mac Mini M4 Pro vs RTX 5090 vs DGX Spark: Which Local AI Hardware Is Right for You in 2026?
Mac mini M4 Pro at 64GB, RTX 5090 at 32GB GDDR7, or DGX Spark at 128GB unified memory — here's the honest hardware comparison for running AI models locally.
Open Brain: The Open-Source Memory System That Lets You Rebuild AI Indexes Without Losing Your Data
Open Brain separates raw data from embeddings in SQL — so when better embedding models arrive, you rebuild the index without touching source data.
Post-Quantum Cryptography: What Engineers Need to Do Before 2029 (And Why Waiting Is Already Too Late)
Governments are already storing encrypted traffic to decrypt once quantum computers arrive. Here's the engineer's checklist for PQC migration before 2029.
The 4-Criteria Job Test That Gets Specialist AI Tools Approved Over Corporate Defaults
Run weekly. Takes 30+ minutes. Instant judgment. Real audience. Use these four criteria to build an evidence-based case for Claude or Codex at work.
5 Claude Code Skills That Cut Token Costs by Up to 70% — Benchmarked Across Real Sessions
Superpowers saves 14% of tokens. Graphify cuts costs by up to 70x on large codebases. Firecrawl cuts token use by up to 80% vs. raw HTML. Five skills benchmarked with real data.
The 7-Model Local AI Portfolio: How to Route Tasks Across Local and Cloud Models for Maximum Performance
One model can't do everything. Here's the 7-model local portfolio — from fast local inference to frontier cloud fallback — and how to route between them.
Claude Design Token Management: How to Stretch Your Weekly Usage Limit
Claude Design has a separate weekly quota from Claude Code. These 10 strategies help you get more done without burning through your session limit.
How to Cut Your AI Inference Bill Before It Spikes: A 5-Step Enterprise Playbook
From use-case audits to escape hatch architecture: the five steps enterprises need to run before AI costs overtake payroll.
How to Connect Firecrawl to Claude Code and Cut Web Scraping Token Costs by 80%
Firecrawl's MCP connector gives Claude Code clean web data instead of raw HTML — cutting token use by up to 80%. Here's the setup and a live lead gen demo.
Goldman Sachs Says AI Inference Is Approaching 10% of Payroll — 5 Steps to Audit Your Exposure Now
Goldman Sachs reports inference costs nearing 10% of payroll. Abacus AI says its AI bill will overtake payroll in 6 months. Here's your cost audit playbook.
Graphify for Claude Code: How a Karpathy-Inspired Knowledge Graph Cuts Large Codebase Costs by 70x
Graphify maps file relationships into a queryable graph before Claude touches your code. For 500+ file projects, it can cut token costs by up to 70x.
How to Manage Claude Code Token Usage: 10 Techniques That Actually Work
Context rot kills AI agent quality. Learn 10 proven techniques to reduce token usage in Claude Code, from plan mode to /compact and skill design.