John Preskill's Quantum Paper Used an Open-Source LLM Optimizer — and It Made Algorithms 1,000x Better
Caltech's John Preskill co-authored a paper where AI did the heavy lifting — improving early quantum algorithms by 1,000x via OpenEvolve.
John Preskill, one of the most respected names in quantum computing and the physicist who coined the term “quantum supremacy,” told Time that he was surprised by how much the qubit count dropped in his team’s latest paper. The tool responsible for a significant portion of that reduction was OpenEvolve — an open-source optimizer that uses large language models to search algorithm space through a process modeled on natural selection. The Oatomic/Caltech team’s early algorithms were roughly 1,000x worse before AI-assisted improvements. One author said plainly that the project “would not work” without them.
That sentence deserves a second read. Not “AI helped a little.” Not “AI accelerated our timeline.” The project would not work.
If you build AI systems, that framing should interest you — not because quantum computing is your problem today, but because it tells you something specific about what LLM-based optimization is actually capable of when pointed at hard technical search problems.
What OpenEvolve Actually Did
The standard story about AI in research is that it speeds up literature review, helps write code, and catches errors. Useful, but incremental. What happened here is different.
OpenEvolve works by treating algorithm design as an evolutionary search problem. You start with candidate algorithms, evaluate them against some fitness function, and use an LLM to propose mutations and recombinations — essentially the same loop as genetic algorithms, but with a language model doing the variation step instead of random perturbation. The LLM can draw on its training to propose variations that are semantically meaningful rather than random noise.
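The loop described above can be sketched in a few lines. This is a minimal runnable toy, not OpenEvolve's actual implementation: the `llm_propose_variant` stub stands in for a real model call, and the candidates are plain lists of numbers rather than quantum algorithms.

```python
import random

# Hypothetical stand-in for the LLM call. In an OpenEvolve-style loop this
# would prompt a language model to rewrite the candidate; here it nudges one
# entry of a list of numbers so the loop runs end to end.
def llm_propose_variant(candidate):
    variant = candidate[:]
    i = random.randrange(len(variant))
    variant[i] += random.choice([-1, 1])
    return variant

# Fitness function: higher is better. In the quantum case this would be a
# resource estimate; here "cheap" just means entries close to zero.
def fitness(candidate):
    return -sum(abs(x) for x in candidate)

def evolve(seed, generations=200, population_size=8):
    population = [seed]
    for _ in range(generations):
        # Variation step: the LLM proposes mutations of existing candidates.
        variants = [llm_propose_variant(random.choice(population))
                    for _ in range(population_size)]
        # Selection step: keep only the fittest candidates for the next round.
        population = sorted(population + variants, key=fitness,
                            reverse=True)[:population_size]
    return population[0]

best = evolve([9, 9, 9])
```

The structure is the same loop as a classic genetic algorithm; swapping the stub for a real model call is what turns random perturbation into semantically informed variation.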
The Oatomic team used this to optimize the quantum circuits needed to run Shor’s algorithm at cryptographic scale. Shor’s algorithm has been known since the 1990s, but the resource requirements to run it against real-world encryption have historically been enormous. The question the team was working on: how few qubits can you actually get away with?
Their early estimates were bad — about 1,000x worse than what they eventually published. The AI-assisted search combined results from niche subfields of quantum computing in ways the human researchers hadn’t tried. Preskill confirmed the qubit reduction surprised him. The paper’s conclusion: Shor’s algorithm could run at cryptographically relevant scales with approximately 10,000 reconfigurable atomic qubits, and with around 26,000 physical qubits, a system could attack the P-256 elliptic curve problem in a few days under plausible assumptions.
That’s a significant number. For context, Google separately estimated that a quantum attack on P-256 elliptic curve cryptography would require fewer than 1,200 logical qubits and fewer than 19 million Toffoli gates — or in an alternative formulation, 1,450 logical qubits and fewer than 17 million Toffoli gates. Google also estimated this could run on a superconducting quantum computer with fewer than 500,000 physical qubits, potentially executing in minutes.
Two independent research groups, different hardware assumptions, converging on “this is more tractable than we thought.”
Why the 1,000x Number Is the Interesting Part
Most coverage of this story focuses on the cryptographic implications — and those are real. But for builders, the more interesting signal is what the 1,000x improvement says about the tool.
Algorithm optimization is a notoriously hard search problem. The space of possible algorithms is enormous, the fitness landscape is rugged (small changes can make things much worse before they get better), and human intuition about which directions to explore is limited by what researchers have already read and thought about. This is exactly the kind of problem where exhaustive search is intractable and random search is wasteful.
What LLM-based evolutionary search adds is a prior. The model has absorbed a huge amount of technical literature, so when it proposes a mutation to an algorithm, it’s not proposing random bit flips — it’s proposing changes that are at least syntactically coherent and often semantically informed. The Oatomic team specifically noted that the AI combined past scientific results across niche subfields in novel ways. That’s the LLM doing what LLMs are actually good at: pattern-matching across a large corpus and surfacing non-obvious connections.
This is meaningfully different from “AI wrote the paper.” Preskill was explicit that humans were still driving the research — asking the right questions, evaluating outputs, deciding which directions to pursue. The AI was a search engine for the idea space, not a replacement for the scientists navigating it.
That distinction matters if you’re thinking about where to apply similar techniques. The pattern is: human defines the problem and the fitness function, AI searches the solution space, human evaluates and steers. The human contribution is irreplaceable at the framing step. The AI contribution is irreplaceable at the search step, once the space is large enough that human intuition alone can’t cover it.
This same pattern shows up in software tooling. When you’re building an AI-powered workflow and need to chain multiple models together across different tasks, platforms like MindStudio handle the orchestration layer — 200+ models, 1,000+ integrations, and a visual builder for composing agents — so the human effort concentrates on defining what the system should do rather than wiring APIs together.
The Honest Caveats (and Why They Don’t Fully Defuse the Finding)
Princeton’s Jeff Thompson offered the most pointed skepticism: “It is very easy to shrink a computer on paper if you assume better qubits.” That’s a legitimate methodological concern. The Oatomic paper is a theoretical resource estimate, not a working system. At the time of reporting, it hadn’t been peer-reviewed. The assumptions about qubit fidelity and error correction may not hold in practice.
These caveats are real. You should hold the specific numbers loosely.
But here’s what the caveats don’t address: the 1,000x improvement in the Oatomic team’s own algorithms happened. That’s not a claim about future hardware — it’s a claim about what the optimization process produced, measured against the team’s own prior work. The AI found better algorithms. Whether those algorithms run on hardware that exists today is a separate question.
The more important point is directional. If AI-assisted algorithm search can find 1,000x improvements in one research cycle, and quantum hardware continues improving on its own trajectory, the two curves are moving toward each other. The cryptographic threat isn’t static. Google verified its own estimates using a zero-knowledge proof — a technique that lets you prove a claim is valid without revealing the underlying method. That’s an unusual choice, and it suggests Google took the sensitivity of publishing detailed attack circuits seriously enough to work around it.
Cloudflare’s internal reaction was described as “a real shock,” and the company is moving its 2029 migration deadline up considerably. Cloudflare is not a research lab making theoretical claims — it’s an infrastructure company that routes a significant fraction of internet traffic and has to make real engineering commitments. When they say the timeline has moved, they mean they are changing their roadmap.
What This Means If You Build AI Systems
The cryptographic migration story is covered elsewhere. The more specific question for AI builders is: what does OpenEvolve’s success tell you about where LLM-based optimization is useful?
A few observations worth sitting with.
The fitness function is the hard part. OpenEvolve works because the Oatomic team could evaluate candidate algorithms against a clear metric: resource requirements for running Shor’s algorithm. If you can’t define what “better” means precisely, evolutionary search degrades quickly. The human work of specifying the problem is load-bearing.
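To make that concrete, here is a toy fitness function in the spirit of a resource-estimate metric. The circuit fields and the weights are invented for illustration and are not drawn from the paper:

```python
# A search can only optimize what the fitness function makes explicit.
# This toy scorer ranks candidate circuit descriptions (hypothetical dicts,
# not real quantum circuits) by weighted resource cost: fewer qubits and
# fewer gates score higher.
def resource_fitness(candidate, qubit_weight=10.0, gate_weight=1.0):
    cost = qubit_weight * candidate["qubits"] + gate_weight * candidate["gates"]
    return -cost  # higher fitness means a cheaper circuit

baseline = {"qubits": 10_000_000, "gates": 5_000_000}
optimized = {"qubits": 10_000, "gates": 2_000_000}
assert resource_fitness(optimized) > resource_fitness(baseline)
```

If "better" can't be reduced to something this explicit, the selection step has nothing to select on, and the search degrades into guided guessing.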
LLMs are better at variation than at evaluation. In the OpenEvolve loop, the LLM proposes changes and humans (or automated tests) evaluate them. This is the right division of labor. LLMs are unreliable evaluators of their own outputs in technical domains — but they’re good at generating plausible variations that a human or a test suite can then filter. Designs that try to use the LLM for both steps tend to drift.
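A runnable toy of that division of labor, with a random generator standing in for the LLM. The expression pool and the doubling spec are both invented for illustration; the filtering pattern is the point.

```python
import random

# Generator (unreliable): proposes candidate expressions, standing in for
# an LLM's variation step. Most proposals will be wrong, and that's fine.
def llm_propose(n=50):
    ops = ["x + x", "x * x", "x * 2", "x - 1", "x + 1"]
    return [random.choice(ops) for _ in range(n)]

# Evaluator (trusted): an automated test suite decides what survives.
# Spec for this toy: we want an expression that doubles its input.
def passes_suite(expr):
    cases = [(0, 0), (3, 6), (-4, -8)]
    try:
        return all(eval(expr, {"x": x}) == want for x, want in cases)
    except Exception:
        return False

survivors = {e for e in llm_propose() if passes_suite(e)}
# Every survivor satisfies the spec, no matter how noisy the generator is.
```

The guarantee comes entirely from the evaluator. Designs that ask the LLM to judge its own proposals give up that guarantee, which is why they tend to drift.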
Cross-domain synthesis is the specific superpower. The Oatomic team noted that the AI combined results from niche subfields in novel ways. This is consistent with what practitioners report in other domains: LLMs are most useful not when they’re doing deep work within a single well-defined area, but when they’re bridging between areas that human researchers haven’t connected. If your problem has relevant prior work scattered across multiple literatures, LLM-assisted search is probably more valuable than if your problem is well-contained. The Qwen 3.6 Plus agentic coding model is a recent example of a model specifically optimized for this kind of multi-step technical reasoning — the same capability profile that makes LLMs useful in evolutionary search loops.
The 1,000x number is a ceiling, not a floor. The Oatomic team’s early algorithms were bad. OpenEvolve had a lot of room to improve them. If you’re starting from a near-optimal baseline, you won’t see 1,000x gains. The technique is most valuable when the existing solution space is poorly explored — which is often true in new research areas, less often true in mature engineering domains.
This connects to a broader shift in how technical work gets specified. When the Oatomic team handed OpenEvolve a problem, they were essentially writing a spec: here’s the algorithm structure, here’s the fitness function, here’s the search space. The AI compiled solutions from that spec. Tools like Remy take a similar approach to full-stack application development — you write an annotated markdown spec, and the system compiles a complete TypeScript backend, database, auth layer, and deployment from it. The spec is the source of truth; the generated output is derived. The abstraction level has moved up, but the precision requirement hasn’t gone away.
The Broader Pattern
NIST finalized its first three post-quantum cryptography standards on August 13, 2024. Cloudflare reports that more than 65% of human traffic through its network is already post-quantum encrypted. The migration is underway. The harvest-now-decrypt-later threat — where adversaries collect encrypted data today to decrypt it once quantum hardware matures — has been flagged by NSA, CISA, and NIST. The data being collected now includes things that need to stay secret for decades.
The OpenEvolve result sits inside this larger story as a specific data point about acceleration. The threat model for cryptographic systems has historically been updated slowly, because the underlying math changes slowly. What’s different now is that the algorithm optimization layer — the part that determines how efficiently a quantum computer can execute an attack — is now subject to AI-assisted search. That search can run continuously, in parallel, at a pace no human research team can match.
Preskill’s surprise at the qubit reduction is the honest signal here. He’s one of the people best positioned to have intuitions about what’s achievable, and the result exceeded his expectations. When domain experts are surprised by what AI-assisted search finds, that’s worth paying attention to — not because it means the threat is imminent, but because it means the rate of progress is harder to predict than it was.
For AI builders specifically, the takeaway isn’t “go work on post-quantum cryptography.” It’s that evolutionary LLM search is a real technique with documented results in hard technical domains, and the pattern — human defines problem, AI searches solution space, human steers — is worth understanding and applying. The Oatomic team’s experience is one of the cleaner documented examples of what that loop produces when it works.
The researchers said the project would not work without the AI-assisted improvements. That’s a strong claim from people who had every incentive to downplay the AI contribution. Take it seriously.
If you’re curious about what self-optimizing AI systems look like in practice, the MiniMax M2.7 self-evolving model is a related case study — a model that improved itself 30% on internal benchmarks through recursive self-optimization, which is a different mechanism but the same general idea of AI improving AI. And if you’re thinking about how LLM-based reasoning gets applied to hard optimization problems, the Claude Code effort levels breakdown is a practical look at how reasoning depth trades off against cost in a production tool — the same tradeoff the OpenEvolve loop has to manage at scale.
The algorithms got 1,000x better. The hardware is improving independently. The two curves are converging. The researchers are surprised. That’s the situation.