Optimization Articles
Browse 205 articles about Optimization.
What Is Context Rot in AI Agents and How Do You Fix It?
Context rot happens when AI forgets earlier session data as context grows. Learn the session hooks, semantic search, and GSD framework that prevent it.
Why Anthropic's 70% Inference Margins Matter for Your API Costs — And What to Expect Next
Anthropic's inference margins jumped from 38% to 70% in a year. Here's what that signals about future API pricing and model availability.
Atlassian Rovo Doubled Customer ARR Growth by Replacing RAG with a 20-Year-Old Knowledge Graph
Rovo customers grow ARR 2x faster than non-Rovo customers, and Rovo skips RAG entirely, using Jira/Confluence's existing knowledge graph instead.
Claude Opus API Output Tokens Just Hit 80,000/min — 10x Increase Explained
Opus API output tokens jumped from 8k to 80k per minute overnight. What triggered it and what it means for production pipelines.
Gemini 3.5 (Speed) vs. Gemini Ultra (Memory) — Google's Two-Track Model Strategy Explained
Leaked: Gemini 3.2/3.5 optimized for speed, Gemini Ultra going deep on memory and long-context. Here's what Google's two-track model strategy means for…
GPT 5.5 Instant vs. GPT 5.3 Instant: Free Tier Just Got a Frontier-Level Upgrade
GPT 5.5 Instant scores 81.2 on AIME 2025 math vs. 65.4 for its predecessor. It's now the default for free and Go users. Here's what actually changed.
SubCube's 12M Token Layer for Claude Code and Codex: What a Sparse Attention Plugin Would Actually Change
SubCube plans a long-context layer that plugs into Claude Code and Codex. Here's what 12M tokens of coding context would actually unlock for agent workflows.
SubCube Claims 12M Token Context at 5% of Opus Cost — 5 Numbers Behind the Sparse Attention Breakthrough
SubCube's SSA architecture claims 12M tokens, 52x Flash Attention speed, and sub-5% Opus cost. Here are the five numbers and what they'd mean if true.
Your AI Agent Is Underperforming: Run This 4-Question Harness Audit Before Switching Models
Before you upgrade your model, run this 4-question audit on your orchestration layer. Most performance problems live there, not in the weights.
Better Model vs. Better Harness — Which One Actually Moves Your Agent's Benchmark Score?
The same model shows up to 6x performance variation based solely on harness design. Here's the data on where to invest first.
Claude Code Found the UTC Timezone Bug in a Cal.com Tool Call by Reading the Conversation Transcript
The Cal.com tool was querying availability in UTC instead of local time. Claude found the bug by reading the transcript — without being told where to look.
Codex Automations Silently Default to GPT-5.2 — Here's How to Fix the Hidden Model Setting
Codex automations quietly use GPT-5.2 instead of GPT-5.5 by default. This hidden setting caused a 40-minute automation to stall. Here's the fix.
Google Pomelli Video Animation Only Works in 9:16 — The Hidden Format Requirement Most Users Miss
The animate button in Pomelli only appears after switching to 9:16 story format. Animated text is also unreliable. Here's the workaround for both issues.
Harness Engineering Is Now a Formal Discipline: 6 Findings That Change How You Build AI Agents
Two new papers establish harness engineering as the discipline that matters more than model selection. Here's what the research shows.
Higgsfield MCP vs. CLI for Claude Code Agents — Why the CLI Is Significantly Cheaper for Agentic Workflows
The Higgsfield MCP exposes every tool simultaneously — expensive for agents. The CLI is purpose-built for agentic use and significantly cheaper.
Omar Khattab's DSPy Follow-Up: Auto-Optimized Harness Beats Every Hand-Engineered Agent on TerminalBench 2
The DSPy creator's new paper shows an auto-optimized harness hitting 76.4% on TerminalBench 2 — outscoring every hand-built entry in the field.
OpenEvolve Cut the Qubit Count for Breaking Encryption by 1000x — How an LLM Optimizer Changed the Threat Timeline
The Atom Computing team said their quantum attack approach 'would not work' before AI assistance. OpenEvolve's LLM-based optimizer changed that, cutting the required qubit count by 1000x.
How to Start Your Post-Quantum Migration Before 2029: A Practical Checklist for Engineering Teams
NIST published three PQC standards in August 2024. Here's the practical migration checklist for engineering teams who need to act before the 2029 window closes.
Rewriting Agent Control Logic from Python to Natural Language Cut Runtime from 361 to 41 Minutes
No model swap, no architecture change — just rewriting control logic in natural language dropped runtime by 88% and lifted benchmark scores 17 points.
SubCube Claims a 12M Token Context Window at 5% of Claude Opus Cost: What the Numbers Actually Say
A lab with under 3,000 followers is claiming 12M tokens, 52x speed over Flash Attention, and near-Opus performance. Here's what to believe and what to wait on.