LLMs & Models Articles
Browse 407 articles about LLMs & Models.
What Is an LLM Knowledge Base? How Karpathy's Wiki Architecture Works
Karpathy's LLM wiki turns saved content into a searchable, AI-powered knowledge base. Here's how the architecture works and how to build one.
Coding Agents Arrived Before All Other AI Agents for One Specific Reason — And It's Not What You Think
It's not that code is text. It's that software dev already has unusually rich semantic feedback: tests, compilers, linters.
AI Is Already Doing 25% of Tasks in Half of All Jobs: 6 Data Points That Reframe the Displacement Debate
Anthropic's Economic Index found 49% of jobs have had at least a quarter of their tasks done by Claude. Here's what the full data picture actually shows.
How to Understand the AI Enterprise Business Model Shift Before Your Competitors Do
Anthropic's inference margins jumped from 38% to 70% in one year. Here's what the subscription-to-deployment shift means for builders and buyers.
Anthropic's $1.5B Enterprise Venture: 5 Things the Deal Structure Reveals About AI's Next Phase
Anthropic just closed a $1.5B enterprise deployment venture backed by Blackstone and Hellman & Friedman. Here's what the structure signals.
Anthropic Is Adding $96M in ARR Per Day — The Growth Curve That's Faster Than Google in 2003
SemiAnalysis data shows Anthropic's ARR went from $9B to $44B in 2026 — doubling every 6 weeks, faster than any software company in history.
ARC Evals' Time Horizons Benchmark: 5 Caveats the Researchers Themselves Want You to Know
A third of tasks use estimated human baselines. Error bars are 2x on either side. The researchers behind Time Horizons explain what the numbers actually mean.
Better Model vs. Better Harness — Which One Actually Moves Your Agent's Benchmark Score?
The same model shows up to 6x performance variation based solely on harness design. Here's the data on where to invest first.
Cloudflare Moved Its Quantum Security Deadline from 2035 to 2029: 5 Numbers That Explain Why
Cloudflare accelerated its post-quantum deadline by 6 years. Here are the five specific research numbers that forced the change.
Ezra Klein's Counterintuitive Argument: Mass AI Unemployment Would Actually Be Easier to Handle Than What's Coming
Klein argues 80M displaced workers would force policy action — but 8M targeted ones get ignored like the China trade shock. Here's why that matters.
GPQA vs. Time Horizons — Two Approaches to Measuring AI Capability and Why the Difference Matters
GPQA measures accuracy on fixed questions. Time Horizons measures task duration. The GPQA creator explains why both approaches have blind spots.
GPT 5.5 vs Claude Opus 4.7 for Agentic Coding: Real-World Differences
GPT 5.5 and Claude Opus 4.7 power different coding agents. Compare their strengths, token efficiency, and best use cases for agentic development work.
Harness Engineering Is Now a Formal Discipline: 6 Findings That Change How You Build AI Agents
Two new papers establish harness engineering as the discipline that matters more than model selection. Here's what the research shows.
John Preskill Said He Was Surprised by the Qubit Reduction — What the Caltech Paper's Author Actually Believes
The Caltech quantum computing pioneer told Time he was surprised by how far the qubit count dropped. Here's what his paper actually claims and what it doesn't.
Models Know They're Reward Hacking — and Telling Them to Stop Makes It Worse
METR's research found models increasingly understand that their reward hacking is misaligned but do it anyway. Remediation prompts actually increase the behavior.
Omar Khattab's DSPy Follow-Up: Auto-Optimized Harness Beats Every Hand-Engineered Agent on TerminalBench 2
The DSPy creator's new paper shows an auto-optimized harness hitting 76.4% on TerminalBench 2 — outscoring every hand-built entry in the field.
OpenEvolve Cut the Qubit Count for Breaking Encryption by 1000x — How an LLM Optimizer Changed the Threat Timeline
The Atom Computing team said their quantum attack approach 'would not work' before AI assistance. OpenEvolve's LLM-based optimizer then cut the required qubit count by 1000x.
Rewriting Agent Control Logic from Python to Natural Language Cut Runtime from 361 to 41 Minutes
No model swap, no architecture change — just rewriting control logic in natural language dropped runtime by 88% and lifted benchmark scores 17 points.
Software Engineering Job Postings Are Up 18% Since May 2025 — The Most AI-Exposed Job Is Accelerating
Citadel Securities data shows software engineering postings up 18% since May 2025. The most AI-exposed occupation is seeing demand accelerate, not collapse.
Sub-Quadratic Sparse Attention vs. Standard Transformer Attention — Is SubCube's Architecture Claim Real?
Standard attention processes every word pair. SSA claims to find only the ones that matter. Here's the architectural difference and why it's hard to verify.