ChatGPT Codex Hidden Features: 6 Capabilities Most Users Don't Know Exist
Concurrent tasks, @-invoked Skills, Automations, Gmail integration, project folders, and usage monitoring. Six Codex features worth knowing.
Most Codex Users Are Running It Wrong — Here Are 6 Features That Change That
ChatGPT Codex shipped with a capability most users walk right past: you can run multiple tasks concurrently across separate chat windows, not sequentially like standard ChatGPT. That single fact reframes what the tool is. While you’re waiting on a Word document summary in one window, a second window is already generating Instagram carousel images from a different file. This isn’t a minor UX detail — it’s the difference between a chatbot and an actual work environment. And before you burn through your credits finding that out the hard way, there’s a usage monitor sitting at File > Settings > Usage that most people never open.
Six features in Codex are doing real work that most users haven’t touched. Here’s what they are and how they actually behave.
The Concurrent Task Engine Nobody Talks About
Standard ChatGPT is a turn-based conversation. You ask, it answers, you ask again. Codex breaks that model entirely.
You can open multiple chat windows inside Codex and run separate tasks in each — simultaneously. One window is summarizing two transcript files into a Word document (that took about 5 minutes in testing). Another is generating four Instagram carousel images from an existing Word doc (also around 5 minutes). Neither is waiting on the other.
This matters most when you have a batch of related but independent jobs. A content creator, for instance, might run a document summary in window one, a revenue model spreadsheet in window two, and a social media image set in window three — all at once. The total wall-clock time is the longest single task, not the sum of all three.
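The wall-clock arithmetic here is just parallel execution. A minimal Python sketch makes it concrete (the task names and durations are stand-ins, not measured Codex timings): total elapsed time tracks the longest job, not the sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_task(name, seconds):
    """Stand-in for one Codex chat window doing independent work."""
    time.sleep(seconds)
    return name

start = time.monotonic()
# Three independent jobs, like three Codex windows running at once.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_task,
                            ["summary", "spreadsheet", "images"],
                            [0.3, 0.2, 0.1]))
elapsed = time.monotonic() - start

# Wall-clock time is roughly the longest task (0.3s), not the sum (0.6s).
print(results, round(elapsed, 1))
```

Run sequentially, the same three jobs would take the sum of their durations; that gap is exactly what the multi-window workflow buys you.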
The practical implication: Codex isn’t a replacement for ChatGPT in conversation. It’s a replacement for the mental overhead of managing multiple AI sessions manually. If you’re evaluating how Codex compares to other frontier models for this kind of agentic work, the GPT-5.4 vs Claude Opus 4.6 comparison breaks down where each model’s strengths actually land.
Usage Monitoring: The Tab You Need to Check First
Codex runs on GPT-5.5, and GPT-5.5 burns credits faster than most users expect — especially when the agent is doing multi-step work inside a project folder.
The usage monitor is at File > Settings > Usage. It’s not surfaced prominently anywhere in the main interface, which is why most people miss it until they’ve already run through a significant chunk of their allocation.
The reasoning level you set directly affects how fast that meter moves. Codex offers four levels: low, medium, high, and extra-high. For standard document work — summaries, spreadsheets, content generation — medium is the right default. For anything involving code on the backend, like building a local web app, high or extra-high is worth the cost. Running extra-high reasoning on a simple email summary is just waste.
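If you also drive Codex from the command line rather than the app, the same knob is exposed as a config value. The sketch below follows the Codex CLI's `~/.codex/config.toml` convention; treat the key name and accepted values as an illustration to verify against your installed version, not gospel.

```toml
# ~/.codex/config.toml (Codex CLI, not the desktop app's Settings UI)
# Match the reasoning level to the task: "medium" for routine document work,
# "high" for backend code. Check your CLI version's docs for exact values.
model_reasoning_effort = "medium"
```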
Check the usage tab before you start a heavy session. Set a reasoning level that matches the actual complexity of the task. These two habits will extend your effective working time significantly.
The Settings Switch That Changes How Codex Talks to You
There’s a setting buried in File > Settings that most new users never find: a toggle between “coding mode” and “everyday work.”
By default, Codex leans toward technical output — the kind of responses that make sense if you’re a developer reading stack traces and function signatures. Uncheck coding mode and enable everyday work, and the responses shift to plain language without losing the underlying capability. The agent is still doing the same work; it’s just not narrating it in developer syntax.
This is the first thing to configure before you do anything else. A non-technical user who opens Codex in default mode and gets back a wall of code commentary will reasonably conclude the tool isn’t for them. It is for them — the setting is just wrong.
Project Folders: Working With Files That Already Exist
Codex can work inside a local folder on your computer, reading and writing files directly. This is the feature that separates it most clearly from a standard chat interface.
The setup is straightforward: click the project area in the sidebar, add a new project, and point it at a folder. Codex loads whatever files are in that folder and can reference them in any task you give it.
In practice, this means you can drop two transcript files into a folder, ask Codex to summarize both and produce a Word document, and come back five minutes later to find the document sitting in the same folder. No copy-pasting. No uploading files into a chat window. The output lands where your other files already live.
The same folder context powers image generation. In one documented session, a Word document about GPT-5 workspace agents was already in the project folder. The prompt was simply: create four Instagram carousel images based on this document. Codex read the existing file, generated the images, and wrote them back to the folder — no new uploads, no re-explaining the content. The whole run took around five minutes.
This is also where the permissions model becomes relevant. Default permissions will prompt you to approve certain file operations as they happen. Full access lets Codex work autonomously through the entire task and surface the completed output when it’s done. For long multi-step jobs where you want to walk away and come back, full access is the right choice.
Skills: Reusable Workflows With a Single @ Call
Skills are Codex’s answer to the problem of re-explaining your preferences every session.
A skill is a reusable, task-specific capability — a set of instructions that gets applied automatically whenever you invoke it. You create one via the sidebar, under Plugins > Create Skill. Once it exists, you call it in any prompt with an @ mention.
The concrete example from testing: a Twitter skill that enforces a 240-character limit, requires plain language at roughly a 90 IQ reading level, and prohibits hashtags, emojis, and jargon unless explicitly requested. Once that skill is created, the prompt to generate ten tweets about the GPT-5.5 announcement is just: “Create 10 tweets about the recent GPT-5.5 announcement — @plain Twitter post.” Codex applies the full style guide automatically. The output came back in one minute and forty-nine seconds, formatted correctly, no additional prompting required.
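The skill's constraints are mechanical enough to check programmatically. As an illustration only (this validator is mine, not part of Codex), here is what "enforce the Twitter skill" looks like when written out as rules:

```python
import re

def check_tweet(text: str) -> list[str]:
    """Return rule violations for one tweet under the example skill's style guide."""
    violations = []
    if len(text) > 240:
        violations.append("over 240 characters")
    if "#" in text:
        violations.append("contains a hashtag")
    # Rough emoji check: flag common emoji and symbol code-point ranges.
    if re.search(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", text):
        violations.append("contains an emoji")
    return violations

# A compliant tweet passes; a hashtag-and-emoji tweet gets flagged.
print(check_tweet("GPT-5.5 just shipped. Here is what changed."))
print(check_tweet("Huge news! #GPT55 🚀"))
```

The point of a skill is that Codex applies this checklist for you at generation time, rather than you auditing the output afterward.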
The distinction worth understanding: a skill isn’t a system prompt or a “please behave this way” instruction. It’s closer to a workflow definition — when I ask for this type of job, follow this exact process. That makes it composable with automations, which is where the real leverage appears.
If you’re building more complex agent workflows that need to run across multiple tools and models, MindStudio takes a similar composability approach at a larger scale — 200+ models, 1,000+ integrations, and a visual builder for chaining agents together without writing orchestration code.
Automations: Scheduled Tasks That Run Without You
Automations live in their own tab in the sidebar. They're scheduled tasks — things Codex does on a recurring basis without any manual trigger.
The setup is a new automation with a name, an optional project folder, a schedule (daily, weekly, specific day and time), and a prompt describing what to do. A simple example: every Sunday at 9:00 a.m., summarize the video transcripts in the Codex demo folder and produce a weekly update document.
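Under the hood, a schedule like "every Sunday at 9:00 a.m." is just "compute the next occurrence of that slot." Nothing below is Codex-specific; it's a minimal sketch of the scheduling arithmetic itself:

```python
from datetime import datetime, timedelta

def next_run(now: datetime, weekday: int = 6, hour: int = 9) -> datetime:
    """Next occurrence of a weekly slot (Python weekday: Monday=0, Sunday=6)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    days_ahead = (weekday - now.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:  # slot already passed this week; roll to next week
        candidate += timedelta(days=7)
    return candidate

# Monday, Jan 1, 2024 at 10:00 -> the following Sunday, Jan 7, at 09:00
print(next_run(datetime(2024, 1, 1, 10, 0)))
```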
Where it gets more interesting is when you chain an automation with the Gmail plugin. The automation prompt becomes: summarize the video topics I’ve worked on this week, then use @Gmail to send the summary to [email address]. Now you have a scheduled marketing report that writes itself and delivers itself, every Friday, without anyone touching it.
This is the feature most worth thinking carefully about before you set it up. An automation running on full access permissions, connected to your email, will actually send those messages. Test with a low-stakes email address first. Set the reasoning level to medium or low for recurring summary tasks — there’s no reason to burn extra-high reasoning credits on a weekly digest.
The AI agents for personal productivity use case people describe in theory — "I want AI to handle my recurring admin" — is what Codex automations actually deliver in practice. The scheduling is real, the Gmail connection is real, and the output lands in your inbox without any manual steps.
The Gmail Plugin: Your Inbox as a Data Source
The Gmail plugin installs via the sidebar: Plugins tab > Gmail > Install Plugin. After OAuth authentication, Codex can read your inbox and act on it.
The immediate use case is triage. Ask Codex what’s urgent in your email, and it returns a structured breakdown: what needs attention now, what needs attention soon, what’s worth noting. That’s useful on its own, but it’s a starting point.
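That structured breakdown is essentially a bucketing operation. A toy sketch shows the shape of the output (the keyword rules here are invented purely for illustration; Codex's actual triage reads the message content, not just subjects):

```python
def triage(subjects: list[str]) -> dict[str, list[str]]:
    """Sort email subjects into the three buckets the triage output uses."""
    buckets = {"now": [], "soon": [], "note": []}
    for subject in subjects:
        lowered = subject.lower()
        if "urgent" in lowered or "today" in lowered:
            buckets["now"].append(subject)
        elif "this week" in lowered or "reminder" in lowered:
            buckets["soon"].append(subject)
        else:
            buckets["note"].append(subject)
    return buckets

inbox = ["URGENT: invoice due today", "Reminder: report this week", "Newsletter"]
print(triage(inbox))
```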
The more interesting application is combining Gmail with project folders and automations. Codex can read email, extract relevant information, write that information into a document in your project folder, and then — if you’ve set up the automation — email a summary back out. The loop closes without you in it.
Draft replies are also supported. Codex can write draft responses based on email context and stage them in your Gmail drafts folder, where you review and send them manually. That’s a reasonable middle ground between full autonomy and doing everything yourself.
Building Actual Apps, Not Just Documents
One use case that doesn’t fit neatly into the “document work” framing: Codex can build local web applications.
The example from testing was a valuation tracker — a local web app that displays OpenAI versus Dropbox funding rounds and valuations over time, with events and updates plotted on a timeline. The prompt was basic. The output was a functional, interactive web app that opened in a browser.
This is where the high and extra-high reasoning levels earn their cost. A simple document summary doesn’t need extra-high reasoning. A web app with data visualization and interactive filtering does. The reasoning level setting isn’t just about quality — it’s about matching the cognitive complexity of the task to the compute you’re spending.
For teams that want to go further — from a Codex-generated prototype to a production-grade full-stack application — Remy takes a different approach: you write a spec in annotated markdown, and it compiles a complete TypeScript backend, SQLite database, auth layer, and deployment from that spec. The spec is the source of truth; the code is derived output. That’s a different abstraction layer than Codex, but the direction of travel is the same.
What This Actually Adds Up To
The six features here — concurrent task execution, usage monitoring, the everyday work setting, project folders, Skills, and Automations — aren’t independent tricks. They’re a system.
Concurrent tasks mean you’re not bottlenecked by sequential processing. Usage monitoring means you’re not surprised by credit depletion mid-project. The everyday work setting means the output is readable without a technical background. Project folders mean Codex works with your existing files rather than requiring you to re-upload everything. Skills mean you stop re-explaining your preferences every session. Automations mean recurring work happens on a schedule without manual triggers.
Most Codex users are using one or two of these. The people getting the most out of it are using all six together — and the gap in output between those two groups is significant.
The Claude Code hidden features post covers a similar pattern on the Anthropic side: the most useful capabilities in these agentic tools are rarely the ones in the marketing copy. They’re the ones you find by actually running the tool through real work and paying attention to what it does. That same principle applies to understanding how Claude Code handles batch commands and built-in slash workflows — the surface-level feature list undersells what’s actually available once you dig in.
Codex is not a coding assistant with a misleading name. It’s a work environment that happens to be able to write code when you need it to. The sooner you configure it that way, the more useful it becomes.