
What Is a Dark Factory? The AI Coding Pattern That Ships Code Without Human Review

A dark factory is a codebase managed entirely by AI agents. Learn the five levels of AI coding autonomy and how to build one responsibly.

MindStudio Team

The Codebase That Runs Itself

Most teams using AI coding tools today are still in the co-pilot phase. A developer writes a prompt, reviews the output, tweaks it, commits it. The AI assists. The human decides.

A dark factory flips that model entirely. In a dark factory, AI agents write code, test it, review it, and ship it — without a human ever touching the pull request. The lights are off. Nobody’s home. Code just keeps moving.

The term comes from manufacturing. A “dark factory” in industrial settings is a fully automated plant that can run without human workers, literally in the dark, because there’s nobody there who needs the lights on. The same logic applies to software: a dark factory codebase is one where AI agents handle the full development loop autonomously.

This isn’t a future concept. Teams at companies like Stripe are already generating over 1,300 AI-authored pull requests per week through structured agent systems. The question isn’t whether dark factories are possible — it’s how autonomous you should make yours, and how to do it without things going sideways.

This article covers what a dark factory actually is, the five levels of AI coding autonomy that lead up to it, what makes one safe to run, and where Remy fits into the picture.


What a Dark Factory Actually Is

A dark factory is a codebase where the full software development lifecycle — writing, testing, reviewing, and deploying code — is managed by AI agents without requiring human sign-off on individual changes.

It’s not just “AI writes some code.” It’s AI writing code, running tests, interpreting the results, fixing failures, opening pull requests, passing them through automated review, and merging them. A human may have set up the system and defined the goals, but they’re not in the loop for each change.

The concept is closely related to what some builders call fully autonomous software pipelines — multi-agent architectures where one agent plans, another generates, another validates, and the whole system self-corrects.

What makes it different from regular automation

Regular automation executes fixed, pre-defined steps. A CI/CD pipeline runs the same tests the same way every time. If something unexpected happens, it fails and waits for a human.

A dark factory uses AI agents that can reason. They respond to novel situations, adapt their approach, and make judgment calls — like a developer would, but without stopping to ask for help. The distinction between agentic workflows and traditional automation is exactly this: agents can handle ambiguity; automation can’t.
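The difference can be sketched in a few lines. This is a toy illustration, not a real pipeline: `run_tests()` and `propose_fix()` are hypothetical stand-ins for a test runner and a model call.

```python
def run_tests(code: str) -> bool:
    """Stand-in test runner: passes once the known bug marker is gone."""
    return "BUG" not in code

def propose_fix(code: str) -> str:
    """Stand-in for an LLM call that patches the failing code."""
    return code.replace("BUG", "fix")

def traditional_pipeline(code: str) -> str:
    # Fixed steps: on anything unexpected, stop and wait for a human.
    if not run_tests(code):
        raise RuntimeError("tests failed; human intervention required")
    return code

def agentic_loop(code: str, max_attempts: int = 3) -> str:
    # An agent reasons about the failure, adapts, and retries on its own.
    for _ in range(max_attempts):
        if run_tests(code):
            return code
        code = propose_fix(code)
    raise RuntimeError("could not converge; escalate to a human")
```

The automation path halts at the first surprise; the agentic path absorbs it and keeps moving, escalating only when it runs out of attempts.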

What a dark factory is not

  • It’s not a one-shot AI code generator like GitHub Copilot completing a function.
  • It’s not a prompt-to-prototype tool that builds a UI you then hand-edit.
  • It’s not a chatbot that writes code snippets on request.

A dark factory is an ongoing, operational system. It takes goals or tasks as inputs and produces shipped, tested, deployed code as outputs — continuously.


The Five Levels of AI Coding Autonomy

Not every team needs or wants a fully autonomous dark factory. Autonomy exists on a spectrum. Here’s a practical framework for thinking about the five levels, from assisted to fully autonomous.

Level 1: AI-Assisted (Human drives everything)

A developer uses an AI coding tool to generate suggestions, complete functions, or draft boilerplate. Every line gets reviewed before it’s committed. The human is in full control; the AI is just a faster keyboard.

Tools: Copilot, Cursor, inline completions.

Level 2: AI-Generated with Human Review

The AI writes larger chunks — full files, entire features — but a developer reviews every pull request before merging. The AI does the drafting. The human does the approval.

This is where most teams using AI coding agents sit today. It’s a significant productivity gain, but still human-gated.

Level 3: AI-Generated with Automated Review Gates

The AI writes code, and automated systems handle most of the review: test suites, linters, type checkers, security scanners. Humans only intervene when automated checks fail or when a change exceeds a defined risk threshold.

This is where AI agent harnesses become essential. The harness wraps the AI in guardrails — it defines what the agent can and can’t touch, what constitutes a passing result, and when to escalate.
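A minimal harness can be sketched as a policy object that answers three questions: what may the agent touch, what must pass, and when does a human get pulled in. The field names and the 20-file threshold below are illustrative assumptions, not any specific product's schema.

```python
from dataclasses import dataclass

@dataclass
class Harness:
    allowed_paths: list[str]        # what the agent may write to
    required_checks: list[str]      # gates that must pass before merge
    max_changed_files: int = 20     # risk threshold that forces escalation

    def may_write(self, path: str) -> bool:
        return any(path.startswith(p) for p in self.allowed_paths)

    def decide(self, changed_files: list[str],
               check_results: dict[str, bool]) -> str:
        if any(not self.may_write(f) for f in changed_files):
            return "reject"         # touched something out of scope
        if len(changed_files) > self.max_changed_files:
            return "escalate"       # too large to merge unreviewed
        if all(check_results.get(c, False) for c in self.required_checks):
            return "merge"
        return "escalate"           # a gate failed; a human decides

harness = Harness(
    allowed_paths=["src/frontend/"],
    required_checks=["tests", "lint", "typecheck"],
)
```

The point is that the merge decision lives in the harness, not in the model's judgment.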

Level 4: Mostly Autonomous with Human Escalation

The AI handles the full loop — write, test, fix, merge — for a defined scope of work. Humans are notified of what shipped but don’t review individual PRs. The system escalates to a human only when it hits something genuinely outside its boundaries: a new API it doesn’t have access to, a test category it can’t satisfy, a conflict it can’t resolve.

Level 5: Full Dark Factory (No Human in the Loop)

The AI runs the full development cycle end-to-end. It interprets goals, breaks them into tasks, assigns them to sub-agents, writes and tests code, resolves failures, and ships. Humans define the goal and the system boundaries. The code ships itself.

This is the true dark factory. It’s possible today for scoped, well-defined problem spaces. It’s genuinely risky for anything touching user data, production infrastructure, or novel business logic.


How a Dark Factory Actually Works

A dark factory isn’t a single AI model writing code. It’s a coordinated system of specialized agents, each with a defined role.

The core components

Planner agent — Takes a goal or task description and breaks it into concrete, actionable subtasks. This is the highest-level reasoning step.

Generator agent — Writes the code for each subtask. This is usually the most inference-heavy step.

Validator agent — Runs tests, checks types, analyzes output for correctness. Acts as the internal reviewer. This mirrors the planner-generator-evaluator pattern — a GAN-inspired architecture where one agent builds and another critiques.

Orchestrator — Coordinates the other agents, manages state, decides when to retry vs. escalate. Agent orchestration is genuinely one of the hardest problems in this space.

Deployment layer — Handles the mechanical steps of committing, pushing, and deploying once validation passes.
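The four roles can be wired together in a toy loop like the one below. All four are stubs; in a real system each would wrap a model call or an actual test runner, and the success criteria would be far richer.

```python
def planner(goal: str) -> list[str]:
    # Break a goal into subtasks (here: trivially, one per clause).
    return [t.strip() for t in goal.split(";") if t.strip()]

def generator(task: str) -> str:
    # Stand-in for the inference-heavy code-generation step.
    return f"# code for: {task}"

def validator(artifact: str) -> bool:
    # Stand-in for tests and type checks; rejects malformed output.
    return artifact.startswith("# code for:")

def orchestrator(goal: str, max_retries: int = 2) -> dict[str, str]:
    shipped, escalated = {}, []
    for task in planner(goal):
        for _attempt in range(max_retries + 1):
            artifact = generator(task)
            if validator(artifact):
                shipped[task] = artifact   # deployment layer takes over here
                break
        else:
            escalated.append(task)         # retries exhausted: hand to a human
    if escalated:
        raise RuntimeError(f"escalated tasks: {escalated}")
    return shipped
```

Even at this toy scale, the structure shows where the real complexity lives: the orchestrator owns state and the retry-vs-escalate decision, not the model.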

How they coordinate

Agents in a dark factory don’t just run sequentially. Effective architectures use parallelism — multiple agents working on different tasks simultaneously, then merging results. The split-and-merge pattern is one common approach: a planner splits work into parallel branches, sub-agents execute them independently, and a merge step reconciles the outputs.

Git worktrees make this practical. Each agent branch works in isolation, so agents don’t clobber each other’s changes.
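One way to set that up is a worktree and branch per agent. The sketch below only builds the standard `git worktree add -b` commands; actually running them (e.g. via `subprocess`) is left out so the example stays side-effect free, and the repo path and branch naming are made up.

```python
def worktree_commands(repo_dir: str, agent_ids: list[str]) -> list[list[str]]:
    """One isolated worktree and branch per agent, so edits never collide."""
    cmds = []
    for agent in agent_ids:
        branch = f"agent/{agent}"
        path = f"{repo_dir}/.worktrees/{agent}"
        cmds.append(["git", "-C", repo_dir, "worktree", "add", "-b", branch, path])
    return cmds

cmds = worktree_commands("/srv/app", ["planner-1", "codegen-2"])
# Each agent then operates only inside its own .worktrees/<agent> checkout;
# a later merge step reconciles the branches.
```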

What keeps it from going off the rails

This is the crux. An AI agent that can write and ship code without review can also write and ship bad code without review. The answer isn’t to avoid autonomy — it’s to build the right constraints in.

The key pattern is building workflows that control the agent rather than letting the agent control the workflow. The agent executes within a defined boundary. The boundary defines what tools the agent has access to, what it can write to, what constitutes a valid output, and when it must stop and wait.


The Safety Problem You Can’t Skip

Dark factories introduce real risk. An agent that can merge code can also merge code that deletes things, breaks APIs, or introduces security holes — and do it faster and more quietly than a human developer would.

This isn’t hypothetical. There are documented cases of AI agents causing serious damage in production environments, including a 1.9 million row database wipe that happened because an agent had write access it shouldn’t have had.

The principle of progressive autonomy

The practical answer is progressive autonomy: start with narrow, low-risk permissions and expand them only after the system proves it handles that scope correctly.

You don’t hand a new agent system the keys to production on day one. You start it on read-only tasks, then write-to-branch tasks, then write-to-staging, then production — each step gated by demonstrated reliability at the previous level.

What to constrain

  • Scope: Define exactly which parts of the codebase an agent can touch.
  • Tools: Only give the agent access to tools it actually needs. An agent writing frontend code doesn’t need database write access.
  • Blast radius: Ensure any single agent failure can’t take down the whole system. Isolated environments per agent, rollback-ready deployments.
  • Logging: Every agent action should be logged. You may not review every PR, but you need a full audit trail when something goes wrong.
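Progressive autonomy can be made mechanical: permissions widen one tier at a time, gated on a track record at the current tier. The tier names, the 50-run minimum, and the 0.98 success threshold below are assumptions for illustration.

```python
TIERS = ["read_only", "write_branch", "write_staging", "write_production"]

def next_tier(current: str, runs: int, successes: int,
              min_runs: int = 50, min_rate: float = 0.98) -> str:
    """Promote one tier at a time, gated on demonstrated reliability."""
    if runs < min_runs or successes / runs < min_rate:
        return current                          # not enough evidence yet
    i = TIERS.index(current)
    return TIERS[min(i + 1, len(TIERS) - 1)]    # never skip a tier
```

A demotion rule on incident (drop a tier, reset the counters) is the natural companion, omitted here for brevity.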

For a practical breakdown, 5 rules for preventing data loss with AI agents is worth reading before you give any agent write permissions.


Building a Dark Factory: What You Actually Need

Setting up a dark factory isn’t just about picking an AI model and pointing it at your repo. It requires a few distinct layers working together.

A harness

The harness is the structured wrapper around the AI agent. It defines the task format, the tool access, the output contract, and the evaluation criteria. Without a harness, you have an agent that can do anything — which means it will eventually do the wrong thing.

Stripe’s Minions system and Shopify’s Roast are good reference points for how enterprise teams approach this. Both define strict schemas for what agents can do and what a valid output looks like. The differences between these approaches are instructive if you’re designing your own.

A validation layer

Automated testing isn’t optional at Level 4+. If you’re not running tests, you have no automated way to know whether the agent’s output is correct. The builder-validator chain pattern — where a separate agent reviews and critiques the generator’s output before it’s accepted — is one reliable approach.

An orchestration layer

Someone (or something) needs to manage the overall task queue, assign work to agents, track state, and handle retries. This is the orchestration problem, and it’s harder than it sounds. State management across multiple agents, handling partial failures, dealing with conflicting edits — these are all non-trivial.
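A minimal version of that state tracking is a task queue where failures are retried a bounded number of times and then parked for a human. The retry limit and state labels are illustrative choices.

```python
from collections import deque

def drain_queue(tasks: list[str], execute, max_retries: int = 2) -> dict:
    """execute(task) -> bool; returns the final state of every task."""
    queue = deque((t, 0) for t in tasks)
    state = {}
    while queue:
        task, attempts = queue.popleft()
        if execute(task):
            state[task] = "done"
        elif attempts < max_retries:
            queue.append((task, attempts + 1))   # retry later, keep the queue moving
        else:
            state[task] = "needs_human"          # partial failure, parked
    return state
```

Real orchestrators add persistence, per-task timeouts, and conflict detection between tasks, but the shape of the problem is already visible here.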

Open source frameworks like Paperclip exist specifically to handle multi-agent coordination at this level.

A headless execution environment

Dark factories run without a terminal open. That means your agents need to operate in headless mode — triggered by events, running in the background, completing tasks without interactive prompts. Claude Code headless mode is one implementation of this for AI coding specifically.
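Concretely, headless means an event produces a non-interactive invocation. The sketch below maps an issue-webhook payload to a command line, assuming an agent CLI that accepts a prompt non-interactively (Claude Code exposes this via `claude -p`); the payload shape and prompt format here are made up.

```python
def headless_command(event: dict) -> list[str]:
    """Map an incoming event (e.g. an issue webhook) to a background run."""
    prompt = f"Fix issue #{event['number']}: {event['title']}"
    return ["claude", "-p", prompt]

# A scheduler or CI job would run this with subprocess.run(cmd), with no
# terminal attached, and collect the result from exit status and logs.
cmd = headless_command({"number": 7, "title": "null check missing"})
```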


What Dark Factories Are Actually Good For

Not every software task belongs in a dark factory. The pattern works best for:

High-volume, well-defined tasks — Think migrations, refactors, dependency updates, boilerplate generation, test writing. These have clear success criteria and low ambiguity. If a human reviewer would say “yep, looks right” in 30 seconds, an automated validator probably can too.

Repetitive patterns across a large codebase — If you need to apply the same change to 200 files, a dark factory is far faster and more consistent than a human doing it manually.

Continuous maintenance — Security patches, dependency bumps, linting fixes. Work that’s important but tedious.

Background feature development — For clearly-scoped features in stable, well-tested parts of a codebase, a dark factory can draft and test the implementation while a developer works on something else.

Where to keep humans in the loop

Novel business logic — If the agent has to make a judgment call about product behavior, a human should make that call.

Changes with high blast radius — Anything that touches authentication, payments, user data, or core infrastructure.

Ambiguous requirements — If a task description could be interpreted multiple ways, the agent will pick one. Often it’ll pick the wrong one.

The goal isn’t zero human involvement. It’s human involvement at the right level — on decisions that require human judgment, not on mechanical work that doesn’t.


How Remy Fits Into This Picture

Remy approaches this problem from a different angle than most AI coding tools. Rather than asking an AI agent to figure out the right code from a vague prompt, Remy starts with a spec — a structured, annotated markdown document that defines exactly what the application does.

The spec carries the precision: data types, edge cases, validation rules, business logic. The AI compiles that spec into working full-stack code — backend, database, auth, deployment included. The spec is the source of truth. The code is derived output.

This matters for dark factory architectures because it changes what the agent is doing. Instead of reasoning from scratch about what code to write, the agent is compiling a well-defined specification. That’s a more constrained, more reliable operation — closer to a compiler than a co-pilot.

When requirements change, you update the spec and recompile. The agent doesn’t need to infer intent from a chat history or diff a complex codebase. The spec tells it exactly what the application is supposed to do.

For teams building toward Level 4 or Level 5 autonomy, a spec-driven approach removes a major source of agent error: ambiguity about intent. The spec makes intent explicit. The agent’s job is execution, not interpretation.
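A toy illustration of "spec as source of truth": the spec holds the intent, and code is derived from it by a compile step. The spec format and compiler below are invented for illustration and are not Remy's actual format.

```python
SPEC = {
    "entity": "Invoice",
    "fields": {"amount_cents": "int", "currency": "str", "paid": "bool"},
    "rules": ["amount_cents >= 0"],
}

def compile_spec(spec: dict) -> str:
    """Derive code from the spec; changes are made by editing the spec
    and regenerating, never by hand-editing the output."""
    lines = [f"class {spec['entity']}:"]
    for name, typ in spec["fields"].items():
        lines.append(f"    {name}: {typ}")
    for rule in spec["rules"]:
        lines.append(f"    # invariant: {rule}")
    return "\n".join(lines)

code = compile_spec(SPEC)
# A requirements change means editing SPEC and recompiling, so the agent
# executes an explicit intent instead of inferring one from chat history.
```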

You can try Remy at mindstudio.ai/remy.


Frequently Asked Questions

What is a dark factory in software development?

A dark factory is a software development setup where AI agents handle the full development cycle — writing, testing, reviewing, and deploying code — without human sign-off on individual changes. The term comes from manufacturing, where a “dark factory” is a fully automated plant that operates without human workers. Applied to software, it means a codebase that ships code autonomously.

Is a dark factory the same as continuous deployment?

No. Continuous deployment automates the delivery of code after a human approves it. A dark factory automates the creation and review of that code as well. CI/CD is a component of a dark factory, but CD alone doesn’t make it autonomous — someone still wrote and approved the code that gets deployed.

How do you prevent a dark factory from shipping bad code?

Through constraints, not trust. Key safeguards include: automated test suites that must pass before any merge, agent harnesses that limit what the agent can write to, isolated execution environments that prevent cross-agent contamination, full audit logs of every agent action, and staged rollout of permissions (starting with low-risk scopes and expanding only after demonstrated reliability). The progressive autonomy model is the practical framework for doing this safely.

What’s the difference between a dark factory and an AI agent harness?

A harness is a component within a dark factory — the structured wrapper that constrains what an agent can do and defines what a valid output looks like. A dark factory is the broader system that coordinates multiple agents, harnesses, validation layers, and deployment infrastructure to ship code end-to-end. Think of the harness as one building block; the dark factory is the whole assembly.

Do you need to be a large company to run a dark factory?

No. The core pattern is accessible at any scale. Small teams can run effective Level 3 or Level 4 automation with open-source tools and a modest infrastructure budget. The complexity scales with the scope of what you’re automating. A team of three running a well-scoped dark factory for dependency management and test generation is entirely practical today.

What types of tasks should stay out of a dark factory?

Any task where the agent would need to make a judgment call about product behavior, user experience, or business logic without clear, machine-verifiable success criteria. Changes touching payments, authentication, or sensitive user data should also stay human-reviewed, at least until your validation layer is mature enough to catch failures reliably. When in doubt, keep humans in the loop and expand autonomy incrementally.


Key Takeaways

  • A dark factory is a codebase managed entirely by AI agents — code gets written, tested, reviewed, and shipped without individual human approval.
  • Autonomy exists on a five-level spectrum, from AI-assisted (human reviews everything) to fully autonomous (agents handle the entire loop).
  • The core architecture involves a planner, generator, validator, and orchestrator — each with a defined role and constrained scope.
  • Safety requires progressive autonomy: narrow permissions, automated validation, full audit logging, and expanding scope only after demonstrated reliability.
  • Spec-driven development, as in Remy, reduces agent error by making intent explicit — agents compile a specification rather than infer intent from ambiguous prompts.
  • Dark factories work best for high-volume, well-defined tasks with clear success criteria. Novel logic and high-risk changes still benefit from human judgment.

If you’re building toward more autonomous development workflows — whether that’s Level 3 automation or a full dark factory — try Remy to see how spec-driven development changes the reliability of what agents produce.

Presented by MindStudio
