Alex Finn Built a Full Video Game in 1 Hour Using Codex's /goal Command — Here's How

Alex Finn used Codex's /goal agentic loop to build a complete extraction shooter with auto-generated assets in under an hour. The exact workflow.

MindStudio Team

Alex Finn ran Codex’s /goal command, walked away for an hour, and came back to a fully playable extraction shooter — complete with auto-generated art assets. That’s the demo that’s been circulating, and it’s worth understanding exactly how it works rather than treating it as magic.

The /goal command activates a multi-hour agentic loop inside the Codex desktop app. It’s not a standard feature you’ll find in the onboarding docs. To enable it, you have to edit a TOML configuration file — it’s explicitly an early access feature, which means the people finding it are the ones digging into config files rather than reading changelogs. Once enabled, you type /goal in chat, describe what you want to build, and Codex runs an autonomous loop that can sustain itself for “multiple hours or even up to days,” according to the people who’ve tested it.

The extraction shooter demo is the clearest evidence of what that loop can actually do.

What the /goal Loop Actually Did

The key detail in the Alex Finn demo isn’t just that Codex wrote game code. It’s that the loop also generated the visual assets autonomously — because Finn had enabled the image generation skill before triggering /goal.

Codex uses GPT Image 2 internally for asset generation. When you enable that skill before running a long agentic loop, the model doesn’t just write code and leave placeholder sprites — it generates the actual game art as part of the same session. The resulting top-down shooter had assets that, by the accounts of people who watched the demo, looked “surprisingly good for a top-down shooter.” Not AAA. Not animated. But coherent, stylistically consistent, and functional.

This is the workflow that produced it:

  1. Enable the image generation skill (accessible via /image-gen slash command in chat)
  2. Edit the TOML config file to enable /goal early access
  3. Type /goal with a detailed description of the game
  4. Leave it running

The loop handles code generation, asset creation, and iteration without requiring human approval at each step. That last part matters. By default, Codex pauses to ask permission before taking actions — you can see this in Settings > General, where you choose between “Auto Review” and “Full Access.” Full Access mode lets the loop run uninterrupted. For a multi-hour /goal session, you’d want Full Access enabled; otherwise the loop stalls waiting for approvals that nobody’s there to give.
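The shape of that loop, approval gate included, can be sketched in a few lines. This is an illustration of the control flow only, not Codex's actual implementation; the scripted action queue stands in for the model's planning step.

```python
# Illustrative sketch of a /goal-style agentic loop (not Codex's real code).
# The planner is replaced by a scripted queue so the control flow is runnable.

def run_goal_loop(goal, actions, full_access=True, max_steps=100):
    """Work through planned actions autonomously, mimicking a /goal session."""
    history = []
    queue = list(actions)
    for _ in range(max_steps):
        if not queue:               # nothing left to plan: goal considered met
            break
        action = queue.pop(0)       # stand-in for "model decides the next step"
        if not full_access:
            # Auto Review mode: a real session would pause here for a human.
            raise RuntimeError(f"approval needed for: {action}")
        result = f"done: {action}"  # stand-in for writing code or generating art
        history.append((action, result))
    return history

steps = ["write game loop", "generate sprites", "wire up collisions", "playtest"]
log = run_goal_loop("extraction shooter", steps)
```

With `full_access=False` the sketch raises on the first step, which is its version of stalling on an approval nobody is around to give.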

Why a One-Hour Game Build Is Significant

The obvious reaction is “neat demo.” The less obvious reaction is to think about what infrastructure makes this possible.

Codex’s context window sits at approximately 256,000 tokens. Claude Code with Opus runs at around 1 million tokens. On paper, that’s a significant disadvantage for long-running sessions. In practice, people running both tools report that Codex sessions last noticeably longer for equivalent work — because GPT 5.5 is substantially more token-efficient than Opus on both input and output. The /goal loop benefits directly from this: more work gets done before the session limit bites.
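The arithmetic behind that claim is simple: session length is roughly window size divided by tokens consumed per step, so a leaner model can outlast a bigger window. The tokens-per-step figures below are invented round numbers for illustration, not benchmarks; only the window sizes come from the article.

```python
# Rough session-length arithmetic. Both tokens-per-step values are assumed
# for illustration; only the window sizes appear in the article.

def steps_before_limit(window_tokens, tokens_per_step):
    return window_tokens // tokens_per_step

opus_steps = steps_before_limit(1_000_000, 8_000)  # big window, heavier steps (assumed)
codex_steps = steps_before_limit(256_000, 2_000)   # small window, leaner steps (assumed)

print(opus_steps, codex_steps)  # 125 vs 128: the smaller window lasts as long
```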

Codex also auto-compacts context, the same way Claude Code does. The context window bar at the bottom of the chat shows session usage as a percentage, and the session resets every 5 hours (with a weekly reset as well). For a one-hour game build, you’re unlikely to hit the ceiling — but for the “up to days” use case the feature advertises, context management becomes the binding constraint.
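Auto-compaction is easy to picture: once estimated usage crosses a threshold, older messages get folded into a summary. Codex's real strategy isn't public; this sketch only shows the shape of the idea, using a crude four-characters-per-token estimate.

```python
# Sketch of context auto-compaction (hypothetical; Codex's strategy is not public).

def compact(messages, limit_tokens, est=lambda m: len(m) // 4):
    """Fold the older half of the history into one summary when over budget."""
    total = sum(est(m) for m in messages)
    if total <= limit_tokens:
        return messages                       # under the ceiling: no-op
    keep = messages[len(messages) // 2:]      # recent half survives verbatim
    summary = f"[summary of {len(messages) - len(keep)} earlier messages]"
    return [summary] + keep
```

The risk the article flags, losing state, lives in that summary line: whatever the summary drops is gone for the rest of the session.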

The other thing that makes the game demo possible is the agents.md file — Codex’s equivalent of Claude Code’s CLAUDE.md. Every new chat in a project reads this file first. If you’ve structured it well, the agent starts each session with full project context: what’s been built, what the conventions are, what not to repeat. For a long agentic loop that might span multiple internal sessions, this file is what keeps the agent oriented. Without it, a multi-hour loop risks losing the thread.

If you’re thinking about the broader pattern here — agents that maintain state across long sessions, pick up context from structured files, and chain tools together — that’s also the territory that platforms like MindStudio occupy, where you can wire together 200+ models and 1,000+ integrations visually to build agent workflows without writing the orchestration layer yourself.

The Non-Obvious Part: Plan Mode and the Skill System

There’s a detail in how Codex is designed that the game demo illustrates indirectly.


Before triggering /goal, you’d want to run in Plan mode. The Plan mode toggle makes Codex brainstorm and outline without executing anything. You get a structured plan, you can push back on it, and then you switch Plan mode off to execute. For a complex build like a game, this front-loaded planning step is what prevents the agentic loop from going down wrong paths for an hour before you notice.

The image generation skill that Finn enabled is part of Codex’s Skills system. Skills are reusable markdown instruction files — stored either globally in ~/.codex/skills/ or locally within a specific project. They’re not plugins or API connections. They’re closer to recipes: structured instructions that tell Codex how to approach a category of task consistently. The global skills are available across every Codex project. Project-local skills only activate within that project’s context.
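A skill file is just structured markdown. The layout below is a guess at what an image-generation skill might contain, not a copy of the shipped file:

```markdown
# Skill: image-gen (hypothetical example, not the shipped skill file)

## When to use
The task needs visual assets: sprites, icons, tiles, backgrounds.

## Instructions
- Generate assets before wiring them into code, so paths exist at build time.
- Keep one palette and one style per project for visual consistency.
- Save outputs under assets/ and reference them by relative path.
```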

When you call a skill via slash command — /image-gen, /browser-use, /pdf, /skill-creator — you’re invoking one of these markdown files. The skill creator (/skill-creator) can reverse-engineer a skill from something Codex just did, which is how you build a library of reusable workflows over time. Do something once, get an output you like, then say “turn that into a skill.” The next time you need it, it’s a slash command away.

For the game build specifically: enabling /image-gen before /goal means the image generation instructions are active context for the entire loop. The agent knows to generate assets, knows the format, knows the conventions — because the skill file says so.

This kind of structured, spec-driven approach to agent behavior has parallels in how Remy works: you write annotated markdown that carries intent and precision, and the system compiles a complete application from it. The spec is the source of truth; the generated output is derived. The Codex skill system is a lighter version of the same idea applied to agent instructions rather than full-stack apps.

Browser Use as Built-In QA

One thing the game demo doesn’t show — but the broader Codex workflow does — is the browser use feature as an automated QA pass.

Codex has a built-in browser that it can control with mouse and keyboard. When you ask it to stress-test something with /browser-use, it opens the app in the in-app browser, moves the cursor around, clicks buttons, and documents what breaks. In one documented case, Codex’s browser use found six real bugs in a freshly built dashboard — including broken YouTube external links — without any human prompting. It filed the bugs, proposed fixes, and waited for approval to implement them.

For a game build, this matters. A one-hour autonomous loop can produce code that compiles and runs but has interaction bugs — buttons that don’t respond, collision detection that’s off, UI elements that overlap. Browser use can catch these before the loop hands control back to you. If you bake the instruction into your agents.md — “before returning any deliverable, run a browser use QA pass and fix what you find” — the loop handles this automatically.

The combination of /goal + image generation skill + browser use QA is what makes the extraction shooter demo more than a party trick. It’s a pipeline: generate, test, fix, iterate, all within the same session.


The agentic workflow patterns documented for Claude Code are relevant context here too. The underlying ideas — autonomous loops, self-correction, tool use — are consistent across agent harnesses. What differs is the specific tooling and the model underneath.

The Practical Setup

If you want to replicate the game demo, here’s the concrete sequence:

Enable Full Access. Go to Settings > General and switch from “Auto Review” to “Full Access.” Without this, the loop pauses for approvals.

Create an agents.md file. This is the project context file Codex reads at the start of every new chat. Include the game concept, the tech stack, any conventions you want enforced, and any known constraints. The more specific this is, the less the loop drifts.
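What “specific” looks like in practice, using the game build as the example. The structure is illustrative; Codex doesn't enforce a schema for this file:

```markdown
# agents.md (illustrative example for the game build)

## Project
Top-down extraction shooter. HTML5 canvas, plain JavaScript, no build step.

## Conventions
- All art is generated with the image-gen skill and lives under assets/.
- One file per system: input.js, render.js, entities.js.
- No external libraries.

## Before returning any deliverable
Run a browser-use QA pass and fix what you find.
```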

Enable the image generation skill. Type /image-gen in chat before triggering /goal. Confirm it’s active.

Edit the TOML config to enable /goal. This is the early access gate. The TOML file lives in the Codex configuration directory — the exact path depends on your OS, but Codex will tell you where it is if you ask.
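For reference, a TOML feature flag generally has the shape below, but the section and key names here are placeholders: the actual early-access setting isn't documented, so ask Codex for the real path and key before editing.

```toml
# Placeholder only: the real key name for /goal early access may differ.
[experimental]
goal_command = true
```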

Trigger /goal with a detailed prompt. Vague prompts produce vague games. Specify the genre, the mechanics, the visual style, the win condition. The more the agent knows upfront, the less it has to guess during the loop.
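For a sense of scale, a prompt at the level of detail the loop needs might look like this (specifics invented for illustration):

```text
/goal Build a top-down extraction shooter that runs in the browser. Plain
JavaScript and canvas, no frameworks. Mechanics: WASD movement, mouse aim,
enemies spawn in waves, enemies drop loot, and reaching the extraction zone
ends the run. Visual style: 16-bit pixel art with a muted palette. Win
condition: extract alive with at least three loot items. Generate every
sprite with the image-gen skill; no placeholder art.
```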

Check the context window bar. It’s at the bottom of the chat and shows session usage as a percentage. If you’re approaching the limit on a long session, you’ll want to know before the loop auto-compacts and potentially loses state.

Check rate limits. Settings > Rate Limits Remaining shows what percentage of your session you have left and when it expires. The 5-hour reset means a one-hour game build is well within a single session.

One thing to watch: the model intelligence setting. Codex offers Low, Medium, High, and Extra High. Extra High is recommended only for hard bugs — it’s expensive in terms of session usage and can over-engineer simple tasks. For a creative build like a game, High is probably the right setting. Medium for planning, High for execution, Extra High only if the loop gets stuck on something it can’t solve.

The AutoResearch loop pattern — where agents autonomously run experiments, measure results, and iterate — maps reasonably well to what /goal does for code. The agent runs, evaluates its own output, and continues without waiting for human feedback at each step.

What This Points To

The extraction shooter demo is a proof of concept for a specific class of task: creative builds where the requirements are clear enough to specify upfront, the output is self-contained, and the quality bar is “functional and coherent” rather than “production-ready.”

Games fit this well. So do internal dashboards, data visualization tools, prototypes, and automation scripts. The /goal loop is less suited to tasks that require ongoing human judgment — anything where the requirements are ambiguous, where external APIs need credentials you haven’t pre-configured, or where the definition of “done” keeps shifting.


The image generation integration is what makes the game demo visually compelling. Without it, you’d get a functional game with placeholder assets. With it, you get something that looks like a real game. The same pattern applies to any build where visual assets matter: enable /image-gen before triggering /goal, and the loop handles art alongside code.

For anyone building tools that need to go from idea to deployed URL — not just a local prototype — the GitHub and Vercel deployment workflow is the natural next step after a /goal session. Codex can initialize a git repo, push to GitHub, and Vercel auto-deploys on commit. The game Finn built in an hour could be live on a public URL in another ten minutes.

The /goal command is early access for a reason — it’s not polished, the TOML edit is a deliberate friction point, and multi-hour loops can go wrong in ways that are hard to debug after the fact. But the extraction shooter demo is evidence that when the setup is right, the loop produces real output. That’s worth paying attention to.

If you’re curious about how Codex compares to Claude Code on the underlying model capabilities — context window, token efficiency, and which tool lasts longer per session — the GPT-5.4 vs Claude Opus 4.6 comparison covers the model-level tradeoffs in detail. The short version: different strengths, and the right choice depends on what you’re building.

Presented by MindStudio
