Codex /goal: OpenAI's 'Ralph Loop' Feature That Ran a Device Driver Project for 14 Hours Without Stopping
Codex's /goal feature keeps a task alive across turns until complete — one user ran it on a device driver project for 14 hours overnight. Here's how it works.
A Coding Agent That Ran for 14 Hours Without You Touching It
Andrew Chen, a general partner at a16z, left Codex running on a low-level eGPU and Mac device driver project overnight. Fourteen hours later, it was still chipping away, making progress with each iteration. He hadn’t touched it.
That’s not a benchmark. That’s a qualitative shift in what “using an AI coding tool” means.
The feature responsible is /goal, and Philip Corey from the OpenAI Codex team describes it precisely: “our take on the Ralph loop — keep a goal alive across turns, don’t stop until achieved.” That framing matters. Most AI coding sessions are transactional — you prompt, it responds, you review, you prompt again. /goal breaks that model entirely. You hand it a mission and walk away.
This post is about how /goal actually works, what the Ralph loop concept means in practice, and how to use the feature in a way that produces results worth the token spend.
What “Keep a Goal Alive Across Turns” Actually Means
Standard Codex sessions are stateful within a conversation but fundamentally reactive. You ask, it answers. If the answer is wrong or incomplete, you correct it. The agent waits for you between every step.
The Ralph loop, a pattern popularized in the agentic coding community, is different. The agent holds a goal, takes an action, observes the result, and then decides the next action based on that observation. It doesn't stop and ask you what to do next. It keeps iterating until the goal state is reached or it genuinely can't proceed.
/goal is Codex’s implementation of this. When you invoke it with a well-formed prompt, Codex enters a persistent loop: plan, execute, observe, re-plan, execute again. It will spawn subprocesses, write code, run tests, read error output, adjust, and try again — all without waiting for you to intervene.
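The control flow is easier to see as code. Below is a conceptual sketch of the loop pattern, not Codex's actual implementation; the goal check, planner, and executor are hypothetical stand-ins you'd replace with real tool calls.

```python
def ralph_loop(goal_achieved, plan_next_step, execute, max_iterations=1000):
    """Conceptual sketch of a goal-persistent ("Ralph") loop:
    plan, execute, observe, re-plan -- no human in the inner loop."""
    history = []
    for _ in range(max_iterations):
        if goal_achieved(history):            # verifiable terminal state
            return {"status": "done", "steps": len(history)}
        action = plan_next_step(history)      # re-plan from observations so far
        observation = execute(action)         # run code, tests, tools...
        history.append((action, observation)) # feed results back into planning
    return {"status": "gave_up", "steps": len(history)}

# Toy stand-in goal: "make the counter reach 3". A real run would plan
# code changes and observe compiler or test output instead.
state = {"counter": 0}
result = ralph_loop(
    goal_achieved=lambda h: state["counter"] >= 3,
    plan_next_step=lambda h: "increment",
    execute=lambda a: state.__setitem__("counter", state["counter"] + 1) or state["counter"],
)
print(result)  # {'status': 'done', 'steps': 3}
```

The point of the sketch is the shape: the stop condition is a check the agent can run itself, which is exactly what a good /goal prompt has to provide.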
Alex Finn, who tested the feature extensively, put it bluntly: “It allows your AI agent to quite literally work for days without stopping. You give it a mission, it works until the mission is complete.” He also built a complete extraction-shooter video game in just over an hour using /goal, with the image generation skill enabled so the agent generated all the game assets autonomously.
The token implications are significant. Andrew Chen noted it’s “obvious it’s going to 10,000x token use.” That’s not hyperbole — it’s arithmetic. A session where you prompt 20 times and review each response uses maybe a few hundred thousand tokens. A session that runs for 14 hours unattended uses orders of magnitude more. Budget accordingly before you start.
What You Need Before Running /goal
A Codex CLI setup. /goal runs in the Codex CLI, not the web interface. You need the CLI installed and authenticated against your OpenAI account. If you’ve only used Codex through the browser, this is the prerequisite that trips most people up.
A project with enough context for the agent to navigate. /goal works best on codebases where the agent can read existing files, understand the structure, and make meaningful decisions about what to change. A blank repo with a vague instruction is a recipe for confident nonsense. A repo with existing code, tests, and a README gives the agent something to work with.
The image generation skill, if you want assets. Alex Finn’s game example worked because he enabled the image gen skill before running /goal. The agent could then generate visual assets as part of the same autonomous loop. Skills in Codex are modular — you install them before the session, and the agent can invoke them as tools during the run.
A token budget you’ve thought about. Seriously. Fourteen hours of autonomous LLM work is not cheap. Know your OpenAI API spend limits before you kick off an overnight run. Set hard limits in your account if you haven’t already.
A good prompt. This is the one that most people underestimate, and it’s the subject of the next section.
How to Actually Run /goal (and Not Waste the Tokens)
Step 1: Don’t write the /goal prompt yourself
This is the counterintuitive part. Alex Finn tested /goal extensively and found that “basically any prompt I hand-write after /goal is never good enough. It produces results that might as well have been a normal prompt.”
The fix is meta-prompting. You use a separate AI — one that already has context on your project — to generate the /goal prompt for you. The process:
- Open a chat with an AI that knows your codebase (Claude, GPT-5.5, whatever you use for planning).
- Say: “I’m working with Codex and I want to use their new /goal feature. Please research the /goal feature. Then look at our project and give me three options for how we could use /goal to be maximally productive. Then give me a highly detailed /goal prompt for each.”
- Review the three options. Pick the one that matches the scope you want.
- Take that prompt to the Codex CLI and run /goal [the generated prompt].
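The meta-prompt itself can be templated so you reuse it across projects. A minimal sketch: the wording mirrors the request above, and `project_summary` is a hypothetical placeholder for whatever context your planning AI needs.

```python
META_PROMPT = """I'm working with Codex and I want to use their new /goal feature.
Please research the /goal feature. Then look at our project and give me
three options for how we could use /goal to be maximally productive.
Then give me a highly detailed /goal prompt for each.

Project context:
{project_summary}"""

def build_meta_prompt(project_summary: str) -> str:
    """Fill the template with project context before pasting it into
    your planning AI (Claude, GPT-5.5, whichever you use)."""
    return META_PROMPT.format(project_summary=project_summary.strip())

print(build_meta_prompt(
    "eGPU device driver; failing hotplug integration tests; README is stale"
))
```

Paste the output into your planning chat, pick the best of the three options it returns, and carry that prompt to the Codex CLI.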
The reason this works is that the planning AI can reason about what makes a good /goal prompt — a verifiable end state, a clear scope, checkpoints the agent can test against — in a way that’s hard to do cold when you’re staring at a blank terminal.
The AI Daily Brief host ran this exact process using GPT-5.5. He asked it to research /goal and identify which of his projects were well-suited to it. The model’s response was instructive: “Yes, this is a real /goal-shaped idea, but only after you separate two things. Building the system is a normal Codex project, but running the system every day against the new episode can become a /goal project. The key is: can the objective be made persistent, inspectable, and verifiable?”
That last sentence is the design principle. /goal is not for open-ended exploration. It’s for tasks where you can define done.
Now you have a prompt that was designed for the feature, not just typed at it.
Step 2: Define “done” before you start
The agent needs to know when to stop. A /goal prompt that says “improve the codebase” will run forever or produce random changes. A prompt that says “implement OAuth2 login, write integration tests that pass, and update the README with setup instructions” gives the agent a verifiable terminal state.
Good /goal prompts have:
- A specific output (a feature, a passing test suite, a generated file)
- A way for the agent to verify success (run the tests, check the output, compare against a spec)
- A bounded scope (not “refactor everything,” but “refactor the authentication module”)
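For illustration, a prompt that satisfies all three. The project specifics here (paths, test command) are hypothetical:

```
/goal Implement OAuth2 login for the web app. Done means: (1) users can
sign in via Google and GitHub; (2) the integration tests in tests/auth/
pass when run with `npm test`; (3) the README's Setup section documents
the required environment variables. Do not change code outside src/auth/
and tests/auth/.
```

Notice that each clause is something the agent can check on its own, which is what keeps the loop from drifting.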
Now you have a prompt the agent can actually complete.
Step 3: Run it and leave it alone
Type /goal [your prompt] in the Codex CLI and resist the urge to intervene. The whole point is that the agent handles the iteration loop. If you keep jumping in to correct it, you’re back to a normal session.
Check in periodically — every few hours for a long run — to make sure it hasn’t hit a genuine blocker that requires human input. But the default posture is: let it run.
Now you have an agent working on your behalf while you do other things.
Step 4: Review the output as a diff, not a conversation
When the run completes (or you check in), review what changed as you would any pull request. Look at the diff. Run the tests. Don’t just read the agent’s summary of what it did — verify it.
This is where the persistent, inspectable, verifiable principle pays off. If you defined done correctly, you have a clear way to check whether the agent achieved it.
Now you have a completed task and a clear record of what changed.
The Real Failure Modes
Vague goals that produce confident drift. The agent will keep running even if it’s going in circles. A poorly defined goal doesn’t cause the agent to stop and ask for clarification — it causes the agent to make up its own interpretation and execute against that. You come back to 14 hours of work that solved the wrong problem.
Missing context in the repo. If the agent can’t read enough of the codebase to understand what it’s working with, it will make assumptions. Those assumptions compound over a long run. Before starting a /goal session, make sure the relevant files are accessible and the project structure is legible.
Token spend surprises. Andrew Chen’s observation that /goal will “10,000x token use” is not a complaint — he found it valuable. But if you’re on a tight API budget and you kick off an overnight run without checking your limits, you may wake up to a large bill and an incomplete task. Set spending limits in your OpenAI account before running long sessions.
The Chrome plugin interference. Separately from /goal, Codex’s new Chrome plugin — which enables browser control from within Codex — is working but buggy at launch. In testing, it was blocked by an open extension UI. If you’re running /goal on a task that involves browser automation, be aware that the Chrome integration is still rough. The plugin is installed via Codex → Plugins → install Chrome extension, but expect friction.
Not using skills that the task needs. If your goal involves generating images, you need the image gen skill installed before you run /goal. The agent can’t install skills mid-run. Check what tools the task will need and install them first.
Where This Fits in the Broader Shift Toward Persistent Agents
/goal is one implementation of a pattern that’s appearing across the tooling landscape. Cursor added /orchestrate the same week — described as a skill that “recursively spawns agents to tackle ambitious tasks with the Cursor SDK.” Anthropic’s Claude managed agents got multi-agent orchestration and a “dreaming” feature that reviews past sessions to find patterns. The direction is consistent: agents that run longer, observe their own outputs, and improve without waiting for human prompts between every step.
This is meaningfully different from agentic workflow patterns that chain discrete steps. Those patterns are still human-orchestrated at the macro level — you define the chain, the agent executes it. /goal pushes the orchestration further into the agent itself. The human defines the outcome; the agent figures out the path.
For teams thinking about how to structure longer-running AI work, the AutoResearch loop pattern is a useful frame. Karpathy’s approach — run experiments autonomously, measure results, keep improving overnight — maps directly onto what /goal enables for coding tasks. The agent isn’t just executing a fixed plan; it’s observing results and adjusting.
If you’re building orchestration infrastructure around this kind of persistent agent work, MindStudio’s approach is worth understanding: 200+ models, 1,000+ integrations, and a visual builder for composing agents and workflows — so you can wire up the surrounding system (notifications, logging, downstream triggers) without writing the orchestration code yourself.
The question of how to keep these agents running reliably is also non-trivial. Keeping a Claude Code agent running 24/7 covers the practical infrastructure side — preventing sleep, maintaining sessions — which applies equally to long /goal runs in Codex.
What to Actually Do This Week
Run the meta-prompting process on one real project. Don’t pick something trivial — pick something you’ve been putting off because it felt too large to tackle in a normal session. A feature you’ve been meaning to add. A test suite that needs to be written. A refactor you’ve been avoiding.
Use a separate AI to generate three /goal prompts for it. Pick the one with the clearest verifiable end state. Run it.
If you’re thinking about the next abstraction layer — where the output of a /goal run feeds into a deployed application rather than just a local codebase — Remy is worth a look. It compiles annotated markdown specs into complete TypeScript full-stack applications: backend, database with auto-migrations, auth, frontend, deployment. The spec is the source of truth; the generated code is derived output. That’s a different model than /goal, but they compose: use /goal to figure out what to build, use Remy to compile the spec into something deployed.
The 14-hour device driver run isn’t a party trick. It’s a preview of what the default working relationship with AI coding tools looks like when the session boundary disappears. The agents that matter going forward aren’t the ones that answer your questions — they’re the ones that work while you sleep.