Box's CEO Is Hiring 'Agent Engineers' — The New Role That Runs AI Across Every Business System
Aaron Levie is creating internal FTE roles to wire AI agents across Salesforce, Workday, and Box. Here's what the job actually requires.
Aaron Levie Is Already Hiring for a Job That Didn’t Exist Last Year
Aaron Levie, the CEO of Box, posted something last week that deserves more attention than it got. He’s actively hiring and retraining for what he’s calling “agent engineering” roles — internal FTE positions whose entire job is to wire AI agents into the business systems his company already runs: Box, Salesforce, Workday. Not a consulting engagement. Not a pilot program. Full-time employees, on the org chart, with a defined scope.
That’s a signal worth paying attention to.
Levie isn’t describing a vague future. He’s describing something he’s doing right now, in 2025, at a company with thousands of employees and real enterprise infrastructure. And the specificity of what he wrote is what makes it interesting.
The role, as he sketched it, requires someone who is “extremely technical and capable of building secure governed agents for internal workflows” — someone who can connect business systems, codify workflows, and in some cases understand the underlying business process well enough to do the whole thing themselves. But here’s the part that caught my eye: Levie also said that in many cases, this person won’t be able to do it alone. They’ll need to work “with the businesses directly in an embedded fashion.” And that, he suspects, may introduce yet another new role — something closer to “agent product management” on the business side.
Two new roles. From one tweet. From a CEO who is actively building this.
Why This Isn’t Just Another “Automation Engineer”
You’ve heard versions of this story before. Every technology wave produces a round of “new jobs created” discourse that tends to be either breathlessly optimistic or dismissively vague. The argument usually goes: yes, technology displaces some work, but it also creates new categories of work, and it’ll all balance out. The problem is that nobody ever gets specific about what those new jobs actually look like.
Levie is getting specific. And the specificity reveals something that the generic “automation engineer” framing misses entirely.
The traditional automation engineer — the person who builds RPA scripts, maintains ETL pipelines, writes integration code — is working on a relatively bounded problem. The system does X, you need it to do Y, you write the bridge. The work is technical, but the scope is defined. You’re not making judgment calls about what the business should be doing. You’re executing on a spec someone else wrote.
Agent engineering, as Levy is describing it, is different in kind. The person in this role isn’t just building the bridge. They’re deciding which rivers need crossing. They’re working across multiple teams, understanding business processes deeply enough to codify them, and then building agents that can execute on those processes autonomously. That’s a much harder job. It requires a combination of skills that doesn’t currently exist as a packaged credential anywhere.
The closest analog might be the internal “vibe coder” — someone who can translate between what a business team needs and what a technical system can do. But even that framing undersells it. Vibe coding is mostly about building tools. Agent engineering is about building autonomous systems that run business processes without continuous human oversight.
What the Job Actually Requires
Levie’s description gives you the skeleton. The flesh comes from understanding what “wiring up” Salesforce, Workday, and Box actually means in practice.
These are not simple systems. Salesforce alone has an object model that takes months to understand properly. Workday’s API surface is notoriously complex. Box has its own permissioning and governance layer. Getting agents to work across all three — passing context, respecting access controls, making decisions that touch data in multiple systems — is a genuinely hard engineering problem.
But the technical challenge is only half of it. The other half is the process problem. An agent that can query Salesforce and write to Box is useless if it doesn’t know when to do that, why, and what to do when the data doesn’t match what it expected. That’s where the “process people spanning multiple teams” part of Levie’s description comes in. You need someone who understands the business logic well enough to encode it — and who can work with the humans who currently carry that logic in their heads.
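To make that concrete, here is a minimal sketch of what one codified workflow step can look like. Every name here is hypothetical — this is not Box's implementation and not a real Salesforce or Box API call — but it shows the shape of the work: the agent acts only when the business rule says to, checks governance before touching data, and escalates instead of guessing when the record doesn't match expectations.

```python
from dataclasses import dataclass

# Hypothetical record shape; real Salesforce objects are far richer.
@dataclass
class CrmAccount:
    account_id: str
    owner_email: str
    status: str

def sync_contract_to_storage(account: CrmAccount, allowed_writers: set[str],
                             write_fn, escalate_fn) -> str:
    """One codified workflow step: decide *when* to act, not just *how*.

    - Only closed-won accounts trigger a write (the business rule).
    - The write is gated by an access-control check (governance).
    - Unexpected data escalates to a human instead of guessing.
    """
    if account.status != "closed_won":
        return "skipped"                      # rule says: not yet
    if account.owner_email not in allowed_writers:
        escalate_fn(f"{account.owner_email} lacks write access")
        return "escalated"                    # governance violation
    if not account.account_id:
        escalate_fn("account record missing an ID")
        return "escalated"                    # data didn't match expectations
    write_fn(account.account_id)              # e.g., provision a document folder
    return "written"
```

The point isn't these specific checks — it's that the "when, why, and what if" gets pulled out of someone's head and into logic the business can review.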
This is also where the “agent product management” role starts to make sense. The agent engineer can build the system. But someone has to own the requirements. Someone has to decide what the agent should do when a Workday record conflicts with a Salesforce record. Someone has to define the edge cases. In a traditional software project, that’s the product manager. In an agentic workflow, it’s something similar — but the stakes are higher because the agent is making decisions autonomously, not waiting for a human to click a button.
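Much of what that agent PM owns can be expressed as explicit, reviewable policy rather than code buried in the agent. A hedged sketch (the field names and "system of record" framing are illustrative assumptions, not a vendor API): when two systems disagree, the policy either names a winner or refuses to let the agent decide.

```python
def resolve_field(field: str, workday_value, salesforce_value,
                  system_of_record: dict[str, str]):
    """Resolve a conflicting field using an explicit, PM-owned policy.

    system_of_record maps each field to the system that wins, e.g.
    {"salary": "workday", "account_owner": "salesforce"}. Any field
    the policy doesn't cover is escalated rather than auto-resolved.
    """
    if workday_value == salesforce_value:
        return workday_value, "agree"
    winner = system_of_record.get(field)
    if winner == "workday":
        return workday_value, "workday_wins"
    if winner == "salesforce":
        return salesforce_value, "salesforce_wins"
    return None, "escalate_to_human"          # no policy -> no autonomy
```

Writing the `system_of_record` table — and deciding what falls through to a human — is the agent PM's job. Executing it is the agent's.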
If you’re building agent infrastructure at this level of complexity, the orchestration layer matters enormously. Platforms like MindStudio handle this kind of cross-system work with 200+ model integrations and 1,000+ pre-built connectors — including Salesforce and Slack — so the agent engineer isn’t stitching raw APIs together from scratch every time.
The Constraint Nobody Is Talking About
Here’s what’s missing from most of the “agent engineering” conversation: the bottleneck isn’t the technology anymore.
Cheyen Jiao put it well in response to Sam Altman’s now-famous tweet — the one where Altman shared two contrasting quotes side by side. The first: “Post-AGI, no one is going to work and the economy is going to collapse.” The second: “I’m switching to polyphasic sleep because GPT-5 and Codex is so good that I can’t afford to be sleeping for such long stretches and miss out on working.”
Jiao’s read: “The constraint isn’t model quality anymore. It’s how many hours per day you can feed at work.”
That’s the revealed preference of the person building AGI. The CEO of OpenAI is considering restructuring his sleep schedule to get more hours of productive agent use. That’s not a sign of a tool that’s going to make people work less. That’s a sign of a tool so productive that time itself becomes the binding constraint.
Levie said it more bluntly: “Sorry to anyone who thought AI would mean we’d work less, at least for now.”
For the agent engineer specifically, this creates a particular kind of pressure. Tang Yan observed something important about how agent work actually drains people: it’s not the typing, it’s the judgment. Reviewing outputs, catching errors, making decisions about what to run next, context-switching between multiple agent threads — that kind of cognitive load burns through your capacity faster than traditional work. Yan’s estimate: instead of 8-10 normal productive hours, you might get 4-5 extremely intense hours before your brain is fully cooked.
An agent engineer managing a fleet of agents across Salesforce, Workday, and Box isn’t doing one thing. They’re monitoring multiple processes simultaneously, evaluating outputs, debugging failures, and making judgment calls about what to fix and how. That’s cognitively expensive in a way that’s hard to anticipate if you’ve never done it.
The Taxonomy Taking Shape
Levie’s tweet is the most concrete public signal, but it’s not the only one. The broader pattern suggests a whole cluster of new roles forming around agent infrastructure.
Some of them are technical. Agent ops engineers who keep the fleets running — monitoring uptime, handling failures, managing token costs. Context librarians whose job is to curate what agents know: what documents they have access to, what’s current, what’s permissioned for which workflows. Eval engineers who build quality gates rather than assuming every agent output is correct.
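The eval-engineer role in particular reduces to a simple pattern worth seeing in code. This is a generic sketch, not any specific eval framework: agent output only propagates downstream if it passes a set of named checks, and failures are reported by name so someone can see why a result was blocked.

```python
def quality_gate(output: dict, checks: list) -> tuple[bool, list[str]]:
    """Run an agent's output through named checks; block on any failure.

    Each check is a (name, predicate) pair. The gate returns whether the
    output may propagate, plus the names of the failed checks so an eval
    engineer can see *why* something was stopped.
    """
    failures = [name for name, pred in checks if not pred(output)]
    return (not failures), failures

# Example checks for a hypothetical "draft invoice" agent output.
invoice_checks = [
    ("has_customer_id", lambda o: bool(o.get("customer_id"))),
    ("amount_positive", lambda o: o.get("amount", 0) > 0),
    ("currency_known",  lambda o: o.get("currency") in {"USD", "EUR"}),
]
```

Real gates add model-based graders and sampling, but the contract is the same: nothing is assumed correct by default.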
Some of them are coordination roles. When you have agents running across multiple departments, each unlocking different parts of the business’s backlog, someone has to make sure they’re not working at cross-purposes. Coordination architects who design how everything stays legible. Information pipeline owners who route signals to the right places.
And some of them are strategic. Experiment portfolio managers who decide which agentic projects to scale, which to kill, and which to merge. This is essentially a VC function applied to internal agent deployments — and it’s a role that doesn’t exist anywhere on a current org chart.
The through-line across all of these is that they’re not about replacing human judgment. They’re about structuring human judgment so it can operate at a higher level of abstraction. The agent does the execution. The human decides what to execute, evaluates whether it worked, and figures out what to do next.
For teams building toward this kind of infrastructure, the multi-agent system comparison between Paperclip and OpenClaw is a useful reference point — the architectural choices you make early will shape what your agent ops function looks like later.
What “Wiring Up” Actually Looks Like in Practice
One concrete example of what this work involves: the open-source tool Paperclip, which describes itself as “an orchestration layer for zero human companies,” generated enough excitement recently that at least one person — Abdul Khadir — stayed up all night after discovering it at 1 a.m., unable to stop thinking about what it unlocked for his business. That’s not a marketing claim. That’s someone encountering a new capability and immediately seeing the backlog of things it makes possible.
The Paperclip multi-agent setup with Claude Code gives you a sense of what that orchestration actually looks like in practice — a CEO agent, an engineer agent, and a QA agent running in parallel with heartbeat scheduling. It’s a toy version of what Levie is describing, but the architecture is the same: multiple agents, coordinated, running against a shared objective.
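The heartbeat pattern itself is simple enough to sketch in a few lines. This is a deliberately toy version — the role names mirror the CEO/engineer/QA split but none of this is Paperclip's actual API — yet it shows the core idea: on every tick, each agent gets a turn against shared state.

```python
def run_heartbeat(agents: dict, shared_state: dict, ticks: int) -> list[str]:
    """Toy heartbeat scheduler: every tick, each agent reads and updates
    shared state. Real systems add persistence, retries, and wall-clock
    scheduling; the coordination pattern is the same."""
    log = []
    for tick in range(ticks):
        for name, step_fn in agents.items():
            log.append(f"tick {tick}: {name} -> {step_fn(shared_state)}")
    return log

# Hypothetical roles mirroring the CEO / engineer / QA split.
def ceo(state):       # sets the objective
    state.setdefault("backlog", []).append("task")
    return "planned"

def engineer(state):  # executes against it
    return "built" if state.get("backlog") else "idle"

def qa(state):        # evaluates the output
    return "reviewed" if state.get("backlog") else "waiting"
```

Swap the stub functions for LLM calls and the dict for a database, and the skeleton of the enterprise version starts to appear.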
The difference between that toy version and what Box is building is the enterprise context. Salesforce data. Workday records. Governed access controls. Audit trails. The technical complexity scales significantly, but the underlying pattern — agents coordinated around a business process, with humans evaluating outputs and making judgment calls — is the same.
For teams thinking about how to build toward this, the question of how to keep agents running reliably is non-trivial. The guide to keeping Claude Code agents running 24/7 covers some of the practical infrastructure considerations — the kind of thing an agent ops engineer would own.
When the agents are generating code or building internal tools as part of their workflow, the question of what to do with that output matters too. Tools like Remy take a different approach to this layer: you write a spec — annotated markdown — and a complete full-stack application gets compiled from it, TypeScript backend, database, auth, deployment and all. For an agent engineer building internal tooling on top of agent outputs, that’s a meaningfully different workflow than hand-coding each integration.
What This Means If You’re Hiring (or Being Hired)
The honest answer is that nobody knows exactly what this role will look like in two years. Levie himself said as much — he’s in the early innings, figuring it out as he goes. But the shape is clear enough to act on.
If you’re building this function inside a company, the profile you’re looking for is genuinely unusual. Technical enough to build secure, governed agents. Process-literate enough to understand the business workflows those agents will run. Cross-functional enough to work across sales, HR, finance, and operations without losing credibility in any of them. And humble enough to know that the agent’s output needs to be evaluated, not trusted blindly.
That last part is underrated. One of the failure modes Levie is implicitly guarding against is the assumption that agents always get it right. They don’t. An agent wired into Workday that makes a wrong decision about an employee record isn’t a minor bug. It’s a compliance problem. The eval function — building quality gates, catching errors before they propagate — is as important as the build function.
If you’re thinking about this as a career path, the combination of skills that makes someone effective in this role — technical depth, business process understanding, cross-functional communication, judgment under uncertainty — is the same combination that makes someone effective as a founding engineer at a startup. That’s not a coincidence. The AI Daily Brief framing of “agents make every job a startup” is most literally true for the people building and running the agent infrastructure itself.
The AI agents for marketing teams post is a useful reference for what this looks like in one specific domain — the kinds of agents that are already running, and the kinds of human judgment they still require. The agent engineer’s job is to build and maintain those systems. The agent PM’s job is to decide what they should be doing.
The Org Chart Is About to Get Weird
Levie’s tweet is a small thing. One CEO, one company, a few new roles. But the pattern it represents is going to show up everywhere over the next 18 months.
Every enterprise that runs on Salesforce, Workday, SAP, or any other major business system is going to face the same question: who owns the agents that run on top of these systems? Who builds them, maintains them, evaluates their outputs, and decides what they should do next? Right now, that work is being done ad hoc — by whoever is most technical on the team, or by external consultants, or not at all.
The companies that figure out how to staff this function properly — not just technically, but with the process and coordination layer Levie is describing — are going to have a significant advantage. Not because agents are magic, but because the bottleneck is no longer the technology. It’s the human infrastructure around it.
That’s the thing Levie understood when he wrote that tweet. He’s not hiring agent engineers because the technology is impressive. He’s hiring them because without them, the technology doesn’t actually do anything useful. The agents need someone to wire them up, point them at the right problems, and make sure they’re getting it right.
That job exists now. It just doesn’t have a name yet at most companies. Box is working on that.