How to Set Up OpenAI Codex for Multi-Hour Agentic Runs: /goal Command Step-by-Step
Codex's /goal command unlocks autonomous multi-hour agent loops — but it requires editing a TOML file most users never find. Here's the full setup.
Most Codex Users Are Running Their Agent at a Fraction of Its Actual Range
You’ve probably been using Codex the same way you use ChatGPT: send a message, wait for a response, review the output, send another message. That loop works fine for one-shot tasks. It doesn’t work for building a complete extraction shooter game, deploying a YouTube analytics dashboard, or running a weekly data pipeline that touches GitHub and Vercel without you watching it.
The /goal command changes the loop entirely. Instead of a back-and-forth conversation, you give Codex a multi-hour or multi-day autonomous run — what the Codex team calls a “Ralph loop.” The catch: it’s gated behind a TOML file edit that most users never find, because the feature isn’t surfaced in the main UI. This post walks through the full setup, the supporting infrastructure that makes long runs actually work, and the failure modes you’ll hit if you skip steps.
What the /goal Command Actually Does (and Why It’s Hidden)
The /goal command activates what Codex internally calls Ralph loops — agentic runs that can continue for multiple hours or, in some configurations, multiple days without requiring human confirmation at each step. You type /goal followed by your objective, and Codex takes over: planning, executing, self-correcting, and continuing until the goal is met or it runs out of runway.
This is meaningfully different from a normal Codex session. In a standard session, Codex pauses frequently to ask for permission or clarification. In a Ralph loop, it keeps going. That’s useful when you want to go do something else while your agent builds. It’s also why the feature requires an explicit opt-in: an agent that runs for six hours without asking questions can do a lot of damage if it’s pointed at the wrong thing.
To enable it, you need to edit the Codex TOML configuration file. The exact path depends on your OS, but on Mac it typically lives at ~/.codex/config.toml — the same directory that holds the global skills folder discussed later. You’re looking for a flag that enables the Ralph loop / goal command feature, which is off by default. Once you flip it and restart the app, /goal becomes available as a slash command in any chat.
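As a sketch, the edit looks something like this. The key name below is a placeholder, not a confirmed setting — check your Codex version’s release notes or changelog for the actual flag:

```toml
# ~/.codex/config.toml
# Hypothetical key name -- the real flag that enables the
# Ralph loop / goal feature may differ by Codex version.
experimental_goal_mode = true
```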
The reason this isn’t a toggle in Settings > General is almost certainly intentional. OpenAI is treating this as early access — powerful enough to be useful, experimental enough that they don’t want every new user hitting it on day one.
Before You Run /goal: The Infrastructure That Makes Long Runs Survivable
A multi-hour agentic run without the right scaffolding is a gamble. You’ll come back to find your agent stuck on a file lock, running the wrong model, or having forgotten everything it learned in the first hour. Three things need to be in place before you kick off a /goal run.
The agents.md file. This is Codex’s equivalent of Claude Code’s CLAUDE.md — a markdown file that lives in your project root and gets read at the start of every new chat. If your /goal run spawns sub-sessions or gets interrupted and restarted, the agents.md is what gives the agent continuity. It should contain your project goal, environment notes including which API keys the project needs (not the key values themselves; those go in .env), known failure modes, and any architectural decisions you’ve already made. Think of it as the onboarding doc you’d write for a contractor who’s going to work unsupervised overnight.
Creating it is simple: open a new chat in your project, describe your project and its goal, and ask Codex to generate the agents.md. It’ll write a structured markdown file and place it in your project directory. From that point forward, every new chat in that project starts with that context loaded.
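A minimal agents.md might look like the following. The project details here are illustrative, borrowed from the YouTube dashboard example used throughout this post:

```markdown
# Project: YouTube Analytics Dashboard

## Goal
Pull channel stats weekly and render trends on a deployed dashboard.

## Environment
- Deployed via Vercel; repo on GitHub
- YOUTUBE_API_KEY lives in .env (never commit it)

## Known failure modes
- API quota errors: back off and retry once, then report and stop
- Open output files stall writes -- close report.xlsx before runs

## Decisions made
- Weekly aggregation, not daily; first screen prioritizes top videos
```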
Full access permissions. By default, Codex runs in a mode where it pauses to ask for approval on actions like network access, file writes, and shell commands. For a /goal run, that’s a problem — it’ll stall within minutes waiting for you to click “allow.” Go to Settings > General > Auto Review and switch from default permissions to full access. The UI flags this in orange, which is appropriate. Full access means Codex can do anything your local user account can do. Use it for projects where you’ve already validated the agent’s behavior in default mode, not for first runs on new codebases.
Model selection. This is where most people silently lose performance. Codex automations and long runs default to GPT-5.2, not GPT-5.5. This isn’t obvious — there’s no warning, no banner, nothing. In practice, a run that should take 20 minutes can stretch to 40+ minutes or stall entirely because 5.2 is less capable at the kind of multi-step reasoning a /goal run requires. Before you start any long run, verify the model selector shows GPT-5.5. For intelligence level, medium handles most planning and brainstorming; high is appropriate for large builds or debugging sessions; extra high is for genuinely hard problems, not routine tasks. Extra high also burns your session budget faster, which matters given the 5-hour reset window. If you’re evaluating model tradeoffs more broadly, the GPT-5.4 vs Claude Opus 4.6 comparison is worth reading before you commit to a model strategy for long runs.
The Rate Limits Panel: Know Your Budget Before You Commit
Codex sessions run on a dual-reset system: a 5-hour rolling window and a weekly cap. Both are visible in Settings > Rate Limits Remaining, shown as percentages. Check this before starting a /goal run. If you’re at 40% remaining on your 5-hour window and you kick off a multi-hour goal, you may hit the wall mid-run.
The intelligence level you choose has a direct impact on how fast you burn through your session. Low costs the least; extra high costs the most. For a /goal run that’s going to take several hours, medium or high is usually the right call — you want enough reasoning capability to handle unexpected problems, but not so much that you’re burning extra-high tokens on routine file operations.
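The go/no-go arithmetic before a run is simple enough to sketch. This is an illustrative helper, not a Codex API — the burn rate is an assumption you’d calibrate against the Rate Limits Remaining panel after a few of your own runs:

```python
def can_start_run(window_remaining_pct: float,
                  est_run_hours: float,
                  burn_pct_per_hour: float = 15.0) -> bool:
    """Rough go/no-go check before kicking off a /goal run.

    burn_pct_per_hour is an assumption -- calibrate it from the
    Rate Limits Remaining panel at your chosen intelligence level
    (higher levels burn the 5-hour window faster).
    """
    projected_burn = est_run_hours * burn_pct_per_hour
    # Leave a safety margin: don't plan to land under 10% remaining.
    return window_remaining_pct - projected_burn >= 10.0

# At 40% remaining, a 3-hour run at ~15%/hour overshoots the window.
print(can_start_run(40.0, 3.0))   # False
print(can_start_run(90.0, 3.0))   # True
```

The point isn’t the specific numbers — it’s that you should do this estimate consciously rather than discover the wall mid-run.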
One practical note from real usage: GPT-5.5 is significantly more token-efficient than you might expect. Input and output tokens are both handled more efficiently than comparable Claude sessions, which means your session budget goes further than the raw numbers suggest. This is part of why long Codex runs are viable in a way that would be expensive on other platforms.
The Context Window Bar and Why You Should Watch It
At the bottom of every Codex chat, there’s a thin bar showing how much of the context window is currently filled. For a /goal run, this matters more than in a normal session because you’re accumulating context over a long period.
Codex auto-compacts context, similar to how Claude Code handles long sessions. But auto-compaction isn’t free — it can cause the agent to lose nuance about earlier decisions. If you’re watching a /goal run and the context bar is approaching full, it’s worth pausing to check whether the agent still has the context it needs. You can ask it directly: “Summarize what you’ve built so far and what the remaining steps are.” If the summary is accurate, you’re fine. If it’s vague or wrong, you may want to inject a reminder or restart with a fresh context that includes a summary of progress.
The agents.md file is your insurance policy here. If you’ve been updating it as the run progresses — adding notes about what’s been built, what failed, what decisions were made — then a context reset is much less painful. The agent reads agents.md at the start of every new chat, so it can pick up roughly where it left off.
Plan Mode: The Step You Should Take Before /goal
Plan mode is a toggle that prevents Codex from executing anything — it can only brainstorm and ask questions. Use it before every /goal run.
The workflow is: enable plan mode, describe your goal, let Codex ask clarifying questions, iterate on the plan until you’re aligned, then disable plan mode and run /goal. This sounds like extra overhead, but it’s the difference between a /goal run that completes successfully and one that burns three hours going down the wrong path.
In practice, plan mode surfaces assumptions you didn’t know you were making. If you say “build a YouTube analytics dashboard,” Codex in plan mode will ask: which data source, what format for the output, what should the first screen prioritize, where should it be deployed. Those questions feel annoying until you realize that without them, the agent would have made those decisions autonomously — and probably not the way you wanted.
The side chat feature is useful here too. If your main session is running a /goal loop and you want to ask a question without interrupting it, click “Open Side Chat” — it opens a parallel conversation in the same project context, so you can check in, ask questions, or brainstorm without touching the main agent session. This pattern maps closely to the multi-agent workflow patterns documented for Claude Code, where isolating sub-tasks into parallel threads prevents context bleed and keeps the main agent focused.
Skills: The Reusable Layer That Makes /goal Runs Consistent
A /goal run is only as good as the recipes the agent has available. Skills are those recipes — markdown files that tell Codex exactly how to do something, step by step, with the right endpoints, the right output format, and the right error handling.
Skills live in two places: globally at ~/.codex/skills/ (available across all Codex projects) and locally in your project directory (available only in that project). The global location is the same directory that Claude Code, Cursor, and OpenClaw read from, which means a skill you write once works across all of them.
The workflow for building a skill is: do the thing once in a normal Codex session, get an output you’re happy with, then say “turn what you just did into a skill.” Codex will reverse-engineer the process, write it as a structured markdown file, and place it in the appropriate skills directory. From that point, you can call it with a slash command or just describe what you want in natural language — Codex will find and apply the relevant skill automatically.
For /goal runs, skills are especially valuable because they give the agent a consistent playbook. Instead of the agent improvising how to pull YouTube comments or generate UI assets, it reads the skill file and follows the recipe. This is also how you encode lessons from previous runs — if a run failed because of a specific API quirk, you update the skill file so the next run doesn’t hit the same wall.
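A skill file is just structured markdown. A hypothetical example, with endpoint names and steps purely illustrative:

```markdown
# Skill: fetch-video-comments

## When to use
The user asks for comments, sentiment, or engagement data for a video.

## Steps
1. Read YOUTUBE_API_KEY from .env; fail fast with a clear error if missing.
2. Call the YouTube Data API commentThreads endpoint, paginating until done.
3. Write results to comments.csv with columns: author, text, likes, published_at.

## Known pitfalls
- Quota errors return HTTP 403: back off, retry once, then report and stop.
- Never open comments.csv in another app mid-run (file lock stalls the agent).
```

The “Known pitfalls” section is where the lessons-from-previous-runs encoding happens.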
This kind of structured, spec-driven approach to building is something Remy takes to its logical conclusion: instead of a skill file that guides an agent, you write an annotated markdown spec and Remy compiles it directly into a full-stack TypeScript application — backend, database, auth, and deployment included. Different tool, same underlying insight: the more precise your source of truth, the more consistent the output.
Automations: Scheduling /goal Runs Without Babysitting Them
The Automations tab in Codex lets you schedule cron-style tasks — set a time, set a prompt, and Codex runs it automatically. This is how you turn a /goal run into a recurring workflow: build the skill once, verify it works, then schedule it to run every Sunday at 5pm without you touching it.
The setup is straightforward: open the Automations tab, create a new automation, write the prompt that describes what you want done, set the schedule, and save. Codex injects that prompt into a new chat in your project at the scheduled time and runs it as a full agentic session.
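The prompt is the entire spec for the run, so write it like one. An illustrative example (details hypothetical):

```text
Pull this week's channel stats using the relevant skill, update the
dashboard data files, run the test suite, and push to main only if
tests pass. If anything fails, write the error to agents.md and stop.
```

Note the explicit failure instruction — without it, a scheduled run that hits an error has no defined behavior.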
Two things to check before you walk away. First, the model: automations default to GPT-5.2, not GPT-5.5. This is a known issue and it’s easy to miss because the automation UI doesn’t flag it prominently. Manually set the model to GPT-5.5 and the reasoning level to high before saving. Second, the permissions: automations run with whatever permissions your project is set to. If you’re on default permissions, the automation will stall waiting for approvals that never come. Full access is usually appropriate for scheduled automations on validated workflows.
One important constraint: Codex automations are local cron jobs. If the Codex app is closed or your machine is off, the automation doesn’t run. For truly 24/7 scheduled runs, you’d need to keep the app running or find a cloud-hosted alternative. This is a real limitation for production workflows — something worth factoring into your architecture before you build a business-critical pipeline on top of it.
If you’re thinking about how to build autonomous agent workflows that don’t depend on a local machine staying awake, MindStudio offers a different model: 200+ AI models, 1,000+ integrations, and a visual builder for chaining agents and workflows that runs in the cloud without any local dependency.
The GitHub + Vercel Pipeline: Where /goal Runs Land
A /goal run that builds something needs somewhere to put it. The standard Codex deployment pipeline is: Codex builds locally → pushes to a GitHub repository → Vercel picks up the commit and auto-deploys.
This pipeline is worth setting up before you run /goal on anything you want to keep. The reason: Vercel and GitHub have a direct integration — any commit to your repo triggers an automatic Vercel deployment. That means your /goal run can push changes to production without you doing anything after the initial setup.
The setup sequence: create a GitHub account if you don’t have one, connect it to Codex (Codex will walk you through the GitHub CLI authentication), let Codex create a repository for your project, then connect that repository to Vercel. From that point, every git push from Codex becomes a live deployment.
One thing to keep in mind: your .env file should never be committed. Codex knows this — it will automatically exclude .env.local and similar files from commits. But if you’ve put API keys anywhere else, move them to .env before your first push, and confirm that .env appears in your .gitignore. Agent harnesses like Codex, Claude Code, and Cursor treat .env files as secrets by convention, but the ignore rule is what actually keeps them out of commits; the dot prefix itself is just a Unix hiding convention.
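A sensible .gitignore covers the variants agents tend to create, not just the one file you wrote:

```gitignore
# Keep secrets out of every commit -- applies to the agent's pushes too
.env
.env.local
.env.*.local
```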
For teams thinking about how autonomous agents fit into a broader architecture, the WAT framework — Workflows, Agents, and Tools — is a useful mental model. The GitHub → Vercel deployment chain is a good example of a workflow layer that sits underneath an agent layer: the agent (Codex) makes decisions, the workflow (git push → Vercel deploy) executes them reliably without the agent needing to understand deployment infrastructure.
What Actually Goes Wrong in Long Runs (and How to Handle It)
Three failure modes show up consistently in multi-hour /goal runs.
File locks. If you have a file open in another application while Codex is trying to write to it, the run stalls. This sounds obvious but it’s easy to forget — you open the Excel output to check progress, Codex tries to update it, and the automation hangs for 40 minutes before you notice. Close files before starting long runs.
Wrong model. Already covered, but worth repeating: check the model selector before every run. GPT-5.2 vs GPT-5.5 is the difference between a run that completes in 20 minutes and one that stalls for 40. The automation UI doesn’t warn you.
Context rot. In very long runs, the agent can lose track of earlier decisions as the context window fills and compacts. The symptom is the agent re-doing work it already did, or making decisions that contradict earlier ones. The fix is a well-maintained agents.md that gets updated as the run progresses, and periodic check-ins where you ask the agent to summarize its current state.
When something goes wrong, the right move is to stop the run, diagnose the problem, fix it (close the file, switch the model, update the agents.md), and restart. Don’t let a stalled run burn session budget trying to solve a problem that has a 20-second human fix.
The Practical Setup Checklist
If you want to run /goal this week, here’s the sequence that actually works:
1. Edit the TOML config file to enable the Ralph loop feature.
2. Create an agents.md in your project with your goal, context, and any known constraints.
3. Switch to full access in Settings > General > Auto Review.
4. Verify the model is set to GPT-5.5 at high intelligence.
5. Use plan mode to align on the approach before you commit.
6. Build and validate the relevant skills first in a normal session.
7. Check your rate limits remaining before starting.
8. Close any files the agent might need to write to.
9. Start the run and check back periodically — not constantly, but not never.
The /goal command is genuinely useful for autonomous agent workflows that would otherwise require you to babysit a session for hours. The TOML edit is a small friction cost for a meaningful capability unlock. The bigger investment is the scaffolding — agents.md, skills, permissions, model selection — that makes long runs reliable rather than lucky.
One opinion worth stating plainly: the teams getting the most out of tools like Codex aren’t the ones running the most ambitious single /goal commands. They’re the ones who’ve built up a library of validated skills, a well-maintained agents.md, and a deployment pipeline they trust. The /goal command is the engine. The scaffolding is what makes it go somewhere useful.