Codex /goal: OpenAI's 'Ralph Loop' Feature That Ran a Device Driver Project for 14 Hours Without Stopping
Codex's /goal feature keeps a task alive across turns until complete — one user ran it on a device driver project for 14 hours overnight. Here's how it works.
A Coding Agent That Ran for 14 Hours Without You Touching It
Andrew Chen, a general partner at a16z, left Codex running on a low-level eGPU and Mac device driver project overnight. Fourteen hours later, it was still chipping away, making progress with each iteration. He hadn’t touched it.
That’s not a benchmark. That’s a qualitative shift in what “using an AI coding tool” means.
The feature responsible is /goal, and Philip Corey from the OpenAI Codex team describes it precisely: “our take on the Ralph loop — keep a goal alive across turns, don’t stop until achieved.” That framing matters. Most AI coding sessions are transactional — you prompt, it responds, you review, you prompt again. /goal breaks that model entirely. You hand it a mission and walk away.
This post is about how /goal actually works, what the Ralph loop concept means in practice, and how to use the feature in a way that produces results worth the token spend.
What “Keep a Goal Alive Across Turns” Actually Means
Standard Codex sessions are stateful within a conversation but fundamentally reactive. You ask, it answers. If the answer is wrong or incomplete, you correct it. The agent waits for you between every step.
The Ralph loop, a pattern popularized in the agentic coding community, is different. The agent holds a goal, takes an action, observes the result, and then decides the next action based on that observation. It doesn't stop and ask you what to do next. It keeps iterating until the goal state is reached or it genuinely can't proceed.
/goal is Codex’s implementation of this. When you invoke it with a well-formed prompt, Codex enters a persistent loop: plan, execute, observe, re-plan, execute again. It will spawn subprocesses, write code, run tests, read error output, adjust, and try again — all without waiting for you to intervene.
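The control flow is easier to see as code. Below is a conceptual sketch of the loop pattern, not Codex's actual implementation; the goal check, planner, and executor are hypothetical stand-ins you'd replace with real tool calls.

```python
def ralph_loop(goal_achieved, plan_next_step, execute, max_iterations=1000):
    """Conceptual sketch of a goal-persistent ("Ralph") loop:
    plan, execute, observe, re-plan -- no human in the inner loop."""
    history = []
    for _ in range(max_iterations):
        if goal_achieved(history):            # verifiable terminal state
            return {"status": "done", "steps": len(history)}
        action = plan_next_step(history)      # re-plan from observations so far
        observation = execute(action)         # run code, tests, tools...
        history.append((action, observation)) # feed results back into planning
    return {"status": "gave_up", "steps": len(history)}

# Toy stand-in goal: "make the counter reach 3". A real run would plan
# code changes and observe compiler or test output instead.
state = {"counter": 0}
result = ralph_loop(
    goal_achieved=lambda h: state["counter"] >= 3,
    plan_next_step=lambda h: "increment",
    execute=lambda a: state.__setitem__("counter", state["counter"] + 1) or state["counter"],
)
print(result)  # {'status': 'done', 'steps': 3}
```

The point of the sketch is the shape: the stop condition is a check the agent can run itself, which is exactly what a good /goal prompt has to provide.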
Alex Finn, who tested the feature extensively, put it bluntly: “It allows your AI agent to quite literally work for days without stopping. You give it a mission, it works until the mission is complete.” He also built a complete extraction-shooter video game in just over an hour using /goal, with the image generation skill enabled so the agent generated all the game assets autonomously.
The token implications are significant. Andrew Chen noted it’s “obvious it’s going to 10,000x token use.” That’s not hyperbole — it’s arithmetic. A session where you prompt 20 times and review each response uses maybe a few hundred thousand tokens. A session that runs for 14 hours unattended uses orders of magnitude more. Budget accordingly before you start.
What You Need Before Running /goal
A Codex CLI setup. /goal runs in the Codex CLI, not the web interface. You need the CLI installed and authenticated against your OpenAI account. If you’ve only used Codex through the browser, this is the prerequisite that trips most people up.
A project with enough context for the agent to navigate. /goal works best on codebases where the agent can read existing files, understand the structure, and make meaningful decisions about what to change. A blank repo with a vague instruction is a recipe for confident nonsense. A repo with existing code, tests, and a README gives the agent something to work with.
The image generation skill, if you want assets. Alex Finn’s game example worked because he enabled the image gen skill before running /goal. The agent could then generate visual assets as part of the same autonomous loop. Skills in Codex are modular — you install them before the session, and the agent can invoke them as tools during the run.
A token budget you’ve thought about. Seriously. Fourteen hours of autonomous LLM work is not cheap. Know your OpenAI API spend limits before you kick off an overnight run. Set hard limits in your account if you haven’t already.
A good prompt. This is the one that most people underestimate, and it’s the subject of the next section.
How to Actually Run /goal (and Not Waste the Tokens)
Step 1: Don’t write the /goal prompt yourself
This is the counterintuitive part. Alex Finn tested /goal extensively and found that “basically any prompt I hand-write after /goal is never good enough. It produces results that might as well have been a normal prompt.”
The fix is meta-prompting. You use a separate AI — one that already has context on your project — to generate the /goal prompt for you. The process:
- Open a chat with an AI that knows your codebase (Claude, GPT-5.5, whatever you use for planning).
- Say: “I’m working with Codex and I want to use their new /goal feature. Please research the /goal feature. Then look at our project and give me three options for how we could use /goal to be maximally productive. Then give me a highly detailed /goal prompt for each.”
- Review the three options. Pick the one that matches the scope you want.
- Take that prompt to the Codex CLI and run /goal [the generated prompt].
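The meta-prompt itself can be templated so you reuse it across projects. A minimal sketch: the wording mirrors the request above, and `project_summary` is a hypothetical placeholder for whatever context your planning AI needs.

```python
META_PROMPT = """I'm working with Codex and I want to use their new /goal feature.
Please research the /goal feature. Then look at our project and give me
three options for how we could use /goal to be maximally productive.
Then give me a highly detailed /goal prompt for each.

Project context:
{project_summary}"""

def build_meta_prompt(project_summary: str) -> str:
    """Fill the template with project context before pasting it into
    your planning AI (Claude, GPT-5.5, whichever you use)."""
    return META_PROMPT.format(project_summary=project_summary.strip())

print(build_meta_prompt(
    "eGPU device driver; failing hotplug integration tests; README is stale"
))
```

Paste the output into your planning chat, pick the best of the three options it returns, and carry that prompt to the Codex CLI.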
The reason this works is that the planning AI can reason about what makes a good /goal prompt — a verifiable end state, a clear scope, checkpoints the agent can test against — in a way that’s hard to do cold when you’re staring at a blank terminal.
The AI Daily Brief host ran this exact process using GPT-5.5. He asked it to research /goal and identify which of his projects were well-suited to it. The model’s response was instructive: “Yes, this is a real /goal-shaped idea, but only after you separate two things. Building the system is a normal Codex project, but running the system every day against the new episode can become a /goal project. The key is: can the objective be made persistent, inspectable, and verifiable?”
That last sentence is the design principle. /goal is not for open-ended exploration. It’s for tasks where you can define done.
Now you have a prompt that was designed for the feature, not just typed at it.
Step 2: Define “done” before you start
The agent needs to know when to stop. A /goal prompt that says “improve the codebase” will run forever or produce random changes. A prompt that says “implement OAuth2 login, write integration tests that pass, and update the README with setup instructions” gives the agent a verifiable terminal state.
Good /goal prompts have:
- A specific output (a feature, a passing test suite, a generated file)
- A way for the agent to verify success (run the tests, check the output, compare against a spec)
- A bounded scope (not “refactor everything,” but “refactor the authentication module”)
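For illustration, a prompt that satisfies all three. The project specifics here (paths, test command) are hypothetical:

```
/goal Implement OAuth2 login for the web app. Done means: (1) users can
sign in via Google and GitHub; (2) the integration tests in tests/auth/
pass when run with `npm test`; (3) the README's Setup section documents
the required environment variables. Do not change code outside src/auth/
and tests/auth/.
```

Notice that each clause is something the agent can check on its own, which is what keeps the loop from drifting.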
Now you have a prompt the agent can actually complete.
Step 3: Run it and leave it alone
Type /goal [your prompt] in the Codex CLI and resist the urge to intervene. The whole point is that the agent handles the iteration loop. If you keep jumping in to correct it, you’re back to a normal session.
Check in periodically — every few hours for a long run — to make sure it hasn’t hit a genuine blocker that requires human input. But the default posture is: let it run.
Now you have an agent working on your behalf while you do other things.
Step 4: Review the output as a diff, not a conversation
When the run completes (or you check in), review what changed as you would any pull request. Look at the diff. Run the tests. Don’t just read the agent’s summary of what it did — verify it.
This is where the persistent, inspectable, verifiable principle pays off. If you defined done correctly, you have a clear way to check whether the agent achieved it.
Now you have a completed task and a clear record of what changed.
The Real Failure Modes
Vague goals that produce confident drift. The agent will keep running even if it’s going in circles. A poorly defined goal doesn’t cause the agent to stop and ask for clarification — it causes the agent to make up its own interpretation and execute against that. You come back to 14 hours of work that solved the wrong problem.
Missing context in the repo. If the agent can’t read enough of the codebase to understand what it’s working with, it will make assumptions. Those assumptions compound over a long run. Before starting a /goal session, make sure the relevant files are accessible and the project structure is legible.
Token spend surprises. Andrew Chen’s observation that /goal will “10,000x token use” is not a complaint — he found it valuable. But if you’re on a tight API budget and you kick off an overnight run without checking your limits, you may wake up to a large bill and an incomplete task. Set spending limits in your OpenAI account before running long sessions.
The Chrome plugin interference. Separately from /goal, Codex’s new Chrome plugin — which enables browser control from within Codex — is working but buggy at launch. In testing, it was blocked by an open extension UI. If you’re running /goal on a task that involves browser automation, be aware that the Chrome integration is still rough. The plugin is installed via Codex → Plugins → install Chrome extension, but expect friction.
Not using skills that the task needs. If your goal involves generating images, you need the image gen skill installed before you run /goal. The agent can’t install skills mid-run. Check what tools the task will need and install them first.
Where This Fits in the Broader Shift Toward Persistent Agents
/goal is one implementation of a pattern that’s appearing across the tooling landscape. Cursor added /orchestrate the same week — described as a skill that “recursively spawns agents to tackle ambitious tasks with the Cursor SDK.” Anthropic’s Claude managed agents got multi-agent orchestration and a “dreaming” feature that reviews past sessions to find patterns. The direction is consistent: agents that run longer, observe their own outputs, and improve without waiting for human prompts between every step.
This is meaningfully different from agentic workflow patterns that chain discrete steps. Those patterns are still human-orchestrated at the macro level — you define the chain, the agent executes it. /goal pushes the orchestration further into the agent itself. The human defines the outcome; the agent figures out the path.
For teams thinking about how to structure longer-running AI work, the AutoResearch loop pattern is a useful frame. Karpathy’s approach — run experiments autonomously, measure results, keep improving overnight — maps directly onto what /goal enables for coding tasks. The agent isn’t just executing a fixed plan; it’s observing results and adjusting.
If you’re building orchestration infrastructure around this kind of persistent agent work, MindStudio’s approach is worth understanding: 200+ models, 1,000+ integrations, and a visual builder for composing agents and workflows — so you can wire up the surrounding system (notifications, logging, downstream triggers) without writing the orchestration code yourself.
The question of how to keep these agents running reliably is also non-trivial. Keeping a Claude Code agent running 24/7 covers the practical infrastructure side — preventing sleep, maintaining sessions — which applies equally to long /goal runs in Codex.
What to Actually Do This Week
Run the meta-prompting process on one real project. Don’t pick something trivial — pick something you’ve been putting off because it felt too large to tackle in a normal session. A feature you’ve been meaning to add. A test suite that needs to be written. A refactor you’ve been avoiding.
Use a separate AI to generate three /goal prompts for it. Pick the one with the clearest verifiable end state. Run it.
If you’re thinking about the next abstraction layer — where the output of a /goal run feeds into a deployed application rather than just a local codebase — Remy is worth a look. It compiles annotated markdown specs into complete TypeScript full-stack applications: backend, database with auto-migrations, auth, frontend, deployment. The spec is the source of truth; the generated code is derived output. That’s a different model than /goal, but they compose: use /goal to figure out what to build, use Remy to compile the spec into something deployed.
The 14-hour device driver run isn’t a party trick. It’s a preview of what the default working relationship with AI coding tools looks like when the session boundary disappears. The agents that matter going forward aren’t the ones that answer your questions — they’re the ones that work while you sleep.