Four Types of AI Agents Explained: Coding Harnesses, Dark Factories, Auto Research, and Orchestration

Not all AI agents are the same. Learn the four distinct agent types used in production, when to use each, and why mixing them up leads to failure.

MindStudio Team

Why “AI Agent” Means Four Very Different Things

The term “AI agent” gets applied to everything from a simple chatbot that books meetings to a complex multi-agent system that processes thousands of documents overnight. That vagueness isn’t just sloppy vocabulary — it causes real engineering failures.

When teams treat all AI agents as structurally equivalent, they build systems that don’t fit their actual needs. A coding harness built like an orchestrator will be slow and expensive. A dark factory designed like an auto research agent will produce unreliable outputs. The wrong architecture for the job isn’t just inefficient — it typically fails outright.

This article breaks down the four distinct types of AI agents used in production multi-agent workflows: coding harnesses, dark factories, auto research agents, and orchestration agents. For each, you’ll see how it works, what it’s built for, and — just as important — when it’s the wrong choice.


Why Architecture Matters More Than Model Choice

Most conversations about AI agents focus on which model to use: GPT-4o, Claude 3.5 Sonnet, Gemini 2.0. That matters, but it’s secondary to architecture.

The type of agent you build shapes everything else: how it reasons, what tools it can access, how it handles errors, and whether it needs human oversight. Two agents running the same model but built as different types will behave completely differently — and succeed or fail for completely different reasons.

There’s also a cost dimension. Orchestration agents, for example, spawn subagents and make many LLM calls per task. Running an orchestration architecture for a simple coding task burns tokens unnecessarily and adds latency with no benefit. Matching architecture to task isn’t just about capability. It’s about efficiency.

Researchers studying multi-agent system design have found that architectural choices — including how agents are structured, how they communicate, and how they handle errors — frequently matter more than raw model capability for task completion. Let’s go through each type.


Coding Harnesses: Agents That Work Inside Development Environments

What They Are

A coding harness is an AI agent that operates within a bounded technical environment — typically a codebase. It can read and write files, run tests, execute terminal commands, and iterate on its own output.

The defining characteristic is a tight feedback loop with a deterministic execution environment. The agent writes code, runs it, sees what breaks, and revises. It doesn’t need to reason abstractly about whether its output is “good” — the tests pass or they don’t.

The word “harness” here is deliberate. A test harness in software development is the scaffolding that lets you execute code and observe results. A coding harness agent operates within exactly that kind of scaffolding.

How They Work

Coding harnesses typically have access to a defined set of tools:

  • File system access — read, write, create, and delete files
  • Terminal and shell execution — run scripts, install packages, execute builds
  • Test runners — run automated tests and capture output
  • Code search — search across a codebase for relevant functions or patterns
  • Version control — commit, branch, and diff via git

The agent loops through a plan-execute-observe cycle: decide what to do, do it, check the result, adjust. Modern examples include Claude Code, GitHub Copilot Workspace, and Devin. What distinguishes them from a simple “code completion” tool is that they act over multiple steps, handling real consequences (broken builds, failing tests) and adapting accordingly.
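The plan-execute-observe cycle can be sketched as a small loop. This is a minimal illustration, not any real product's API: `propose_patch` stands in for an LLM call, `apply_patch` for a file edit, and `run_tests` for a test runner — all hypothetical callables supplied by the caller.

```python
def coding_harness(propose_patch, apply_patch, run_tests, max_iterations=5):
    """Plan-execute-observe loop: propose a change, apply it, run the
    tests, and feed any failure output back into the next proposal."""
    feedback = None
    for attempt in range(1, max_iterations + 1):
        patch = propose_patch(feedback)   # plan (an LLM call in practice)
        apply_patch(patch)                # execute
        passed, output = run_tests()      # observe: deterministic signal
        if passed:
            return {"status": "success", "attempts": attempt}
        feedback = output                 # adjust next iteration
    return {"status": "gave_up", "attempts": max_iterations}

# Toy environment: the "tests" pass once the bug flag is cleared.
state = {"bug": True}
result = coding_harness(
    propose_patch=lambda fb: "fix" if fb else "noop",
    apply_patch=lambda p: state.update(bug=(p != "fix")),
    run_tests=lambda: (not state["bug"],
                       "1 test failed" if state["bug"] else ""),
)
```

The point of the sketch is the shape of the loop: the agent never has to judge its own output abstractly, because the test runner supplies a binary pass/fail signal.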

When to Use a Coding Harness

  • The task involves writing, editing, or debugging code
  • There’s a clear success condition you can test programmatically
  • The environment is deterministic — the same input should produce consistent output
  • The scope is bounded to a single repository or project

When Not To Use One

Coding harnesses struggle outside their native environment. Ask one to “research competitors and write a market analysis” and it will either fail or produce something useless. The tight technical feedback loop that makes it powerful inside a codebase becomes a liability outside it.

They also don’t coordinate well with other agents. A coding harness is designed to work alone in its lane. If your task requires multiple agents handling different parts of a problem, you need something else.


Dark Factories: Fully Automated, Humanless Pipelines

What They Are

The term “dark factory” comes from manufacturing. A lights-out facility — sometimes called a dark factory — is so automated it can run without lights because there are no humans present to need them.

In AI, a dark factory agent is a fully automated pipeline that processes work at scale without human involvement. It receives inputs (documents, records, data feeds), processes them through a defined set of steps, and produces outputs — often thousands of items at a time — without anyone watching.

These aren’t conversational agents. They don’t wait for a user’s next message. They run, finish, and stop — or loop indefinitely on a schedule.

How They Work

Dark factory agents are typically structured as:

  1. Input ingestion — pulling from a data source (email inbox, database, file storage, API feed)
  2. Processing steps — one or more AI or non-AI operations per item (classify, extract, summarize, transform)
  3. Output routing — writing results to a database, triggering downstream systems, sending notifications
  4. Error handling and logging — since there’s no human to catch mistakes, robust error handling must be built in from the start

The AI component might be a single LLM call per item, or it might be a mini-pipeline with several steps. The key is that the entire thing runs unattended. Reliability and throughput are what matter. Flexibility is secondary.
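The four-stage structure above can be sketched as a batch loop. The names here (`process_item`, `route_output`) are hypothetical stand-ins for the per-item LLM call and the downstream write; the essential part is that a bad item is logged and skipped rather than halting the whole unattended run.

```python
import logging

def dark_factory(items, process_item, route_output):
    """Unattended pipeline: process each ingested item, route the
    result downstream, and log failures instead of stopping the batch."""
    logger = logging.getLogger("dark_factory")
    processed, failed = 0, 0
    for item in items:                    # 1. input ingestion
        try:
            result = process_item(item)   # 2. processing (e.g. one LLM call)
            route_output(result)          # 3. output routing
            processed += 1
        except Exception as exc:
            # 4. no human is watching: record the failure and keep going
            logger.error("item failed: %r (%s)", item, exc)
            failed += 1
    return {"processed": processed, "failed": failed}

def classify(ticket):
    """Stand-in for an LLM classification step."""
    if not ticket:
        raise ValueError("empty ticket")
    return {"ticket": ticket, "label": "billing"}

sink = []  # stand-in for a database write
stats = dark_factory(["ticket A", "ticket B", ""], classify, sink.append)
```

A real deployment would add retries, dead-letter queues, and sampling-based quality checks, but the control flow — ingest, process, route, log — is the same.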

When to Use a Dark Factory

Dark factories are right when:

  • Volume is high and inputs are structurally similar — classifying thousands of support tickets, extracting data from thousands of contracts, summarizing hundreds of research papers
  • Processing is scheduled or event-driven — nightly data enrichment, weekly report generation, processing every new record that hits a database
  • The task has known, consistent structure — inputs are similar enough that the same processing logic works reliably across all of them
  • Human review isn’t required per item — the downstream use case tolerates a small error rate, or errors are caught at a sampling stage

When Not To Use One

Dark factories break down when tasks require dynamic judgment or inputs with high variance. If every item requires a different approach, or if the agent needs to decide mid-task what to do next, a rigid pipeline will produce unreliable outputs on the edge cases — and often fail silently.

They also don’t work for exploratory tasks. A dark factory can extract structured information from legal contracts, but it can’t “figure out what information matters.” That requires a more adaptive agent type.


Auto Research Agents: Autonomous Information Gathering

What They Are

An auto research agent browses, reads, and synthesizes information autonomously. It starts with a question or goal, decides where to look, retrieves information from multiple sources, evaluates relevance, and produces a coherent output — without a human directing each step.

This is meaningfully different from a dark factory. A dark factory processes a known set of inputs. An auto research agent decides what inputs to retrieve based on what it finds along the way. The retrieval path is dynamic.

It’s also different from a standard RAG (retrieval-augmented generation) system. RAG retrieves from a pre-indexed, static knowledge base — you know where to look before you start. An auto research agent can search the web, follow links, query multiple sources, reformulate its queries based on interim findings, and change direction entirely if the initial approach isn’t working.

How They Work

A typical auto research agent loop:

  1. Goal decomposition — break the research question into sub-questions
  2. Search and retrieval — query search engines, read web pages, pull documents, query APIs
  3. Evaluation — assess whether retrieved information is relevant and sufficient
  4. Follow-up retrieval — if not sufficient, decide what to look for next
  5. Synthesis — compile findings into a structured output
  6. Validation — check for contradictions, gaps, or low-confidence claims

The agent exercises real judgment at each step. It might decide a source isn’t credible and look elsewhere. It might discover that the original question was framed wrong and reformulate it. This adaptive loop is what makes auto research agents powerful — and also what makes them more expensive and harder to control than other types.
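The six-step loop above can be sketched as follows. Every callable here (`decompose`, `search`, `is_sufficient`, `refine`, `synthesize`) is a hypothetical stand-in — in practice these would be LLM calls and web or API requests; the sketch only shows the adaptive control flow that distinguishes this type from a fixed pipeline.

```python
def auto_research(question, decompose, search, is_sufficient,
                  refine, synthesize, max_rounds=4):
    """Adaptive retrieval loop: search, evaluate, reformulate, repeat."""
    queries = decompose(question)             # 1. goal decomposition
    findings = []
    for _ in range(max_rounds):
        for q in queries:
            findings.extend(search(q))        # 2. search and retrieval
        if is_sufficient(findings):           # 3. evaluation
            break
        queries = refine(question, findings)  # 4. follow-up retrieval
    return synthesize(findings)               # 5-6. synthesis, validation

# Toy corpus standing in for the open web.
corpus = {"pricing": ["Competitor X raised prices"],
          "features": ["X shipped feature Y"]}
report = auto_research(
    "What changed at Competitor X?",
    decompose=lambda q: ["pricing"],
    search=lambda q: corpus.get(q, []),
    is_sufficient=lambda f: len(f) >= 2,
    refine=lambda q, f: ["features"],   # pivot based on interim findings
    synthesize=lambda f: " | ".join(f),
)
```

Note that the second round's queries depend on the first round's findings — that data-dependent retrieval path is exactly what a static RAG pipeline cannot do.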

When to Use an Auto Research Agent

  • The answer requires gathering from multiple sources, not one
  • The retrieval path is uncertain upfront — you don’t know exactly where to look until you start
  • The output requires synthesis, not just extraction — comparing, reasoning across sources, drawing conclusions
  • The research domain changes frequently enough that a static knowledge base won’t stay current

Common use cases: competitive intelligence, literature reviews, due diligence, real-time market monitoring, and investigative research across web sources.

When Not To Use One

Auto research agents are expensive and slow. They make many LLM calls and many web requests per task. Don’t use one when information is available in a structured database, or when a simple RAG query would suffice.

They also require careful handling of source quality. An auto research agent that can read any web page will hit low-quality, misleading, or outright false content. Without guardrails, it can synthesize misinformation confidently. This is the agent type that most needs human review before outputs are used in decisions.


Orchestration Agents: Agents That Manage Other Agents

What They Are

An orchestration agent coordinates other agents. It receives a complex goal, breaks it into tasks, assigns those tasks to specialized subagents, collects results, and assembles a final output.

Think of it as a project manager that delegates to a team. The orchestrator doesn’t do the individual work — it decides what work needs to be done, routes it to the right specialist, and handles dependencies between tasks.

How They Work

Orchestration typically follows this pattern:

  1. Task decomposition — break the high-level goal into discrete, assignable subtasks
  2. Agent selection — choose the right subagent or tool for each subtask
  3. Parallel or sequential execution — run subtasks simultaneously where possible, handle dependencies where they exist
  4. Result collection — gather outputs from subagents
  5. Integration — synthesize outputs into a coherent final result
  6. Error handling — decide what to do when a subagent fails or returns unexpected output

The subagents can be any of the other types: coding harnesses, dark factory pipelines, research agents, or even other orchestrators (for hierarchical multi-agent systems). The orchestrator’s job is coordination, not execution.
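The coordination pattern can be sketched with Python's standard thread pool. The subagents here are hypothetical callables (each would be one of the other agent types in practice), and for simplicity the sketch assumes one subtask per agent name; the point is the decompose → dispatch in parallel → collect → integrate shape, with per-subagent error handling.

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(goal, decompose, agents, integrate):
    """Decompose a goal, dispatch subtasks to named subagents in
    parallel, collect results, and integrate them into one output."""
    subtasks = decompose(goal)  # [(agent_name, task), ...]
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(agents[name], task)
                   for name, task in subtasks}   # parallel execution
        for name, future in futures.items():
            try:
                results[name] = future.result()  # result collection
            except Exception as exc:
                # error handling: record the failure, don't crash the run
                results[name] = f"FAILED: {exc}"
    return integrate(results)                    # integration

report = orchestrate(
    "weekly brief",
    decompose=lambda g: [("research", "gather news"),
                         ("factory", "process reviews")],
    agents={
        "research": lambda t: f"news for {t}",
        "factory": lambda t: "2,000 reviews summarized",
    },
    integrate=lambda r: "; ".join(f"{k}: {v}" for k, v in sorted(r.items())),
)
```

Sequential dependencies would replace the single pool submission with staged submissions, but the orchestrator's role is unchanged: coordination, not execution.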

When to Use Orchestration

  • A single task requires multiple distinct capabilities — e.g., researching a topic, writing copy based on research, generating images to accompany that copy, and formatting everything for publication
  • Subtasks can be parallelized — running them simultaneously reduces total time
  • You need specialization — different subagents can be prompted or configured differently for different types of work
  • The workflow is complex enough that a single monolithic agent would struggle with context length or reasoning depth

When Not To Use Orchestration

Orchestration adds overhead. Every coordination layer means more LLM calls, more latency, and more points of failure. If a task can be handled well by a single agent, an orchestrator just makes it slower and more complex.

It also increases debugging difficulty. When something goes wrong in an orchestrated system, tracing the failure requires understanding every agent’s role and what it produced. Well-designed orchestration systems log intermediate outputs for exactly this reason.

One common mistake: teams reach for orchestration because it feels sophisticated. But the most effective system is usually the simplest one that gets the job done. Start with a single agent. Add orchestration only when a single agent demonstrably can’t handle the task.


How the Four Types Work Together in Production

These agent types aren’t mutually exclusive. Real production systems often combine several of them under a single orchestrator, with each type deployed where it has a structural advantage.

Here’s a concrete example — a competitive intelligence system:

  1. The orchestration agent receives a weekly research brief request
  2. It dispatches an auto research agent to gather recent news, product updates, and pricing changes from competitor sources
  3. It dispatches a dark factory pipeline to process thousands of customer reviews from app stores, extracting sentiment and feature mentions at scale
  4. It dispatches a coding harness to pull structured data from internal analytics systems and generate comparison charts
  5. The orchestration agent collects all outputs and synthesizes them into a formatted report

Each agent type is used where it has a natural advantage. The auto research agent handles dynamic, unpredictable retrieval. The dark factory handles volume with consistent structure. The coding harness handles structured data and visualization. The orchestrator ties it together.

This is what a multi-agent workflow actually means in practice — not just multiple LLM calls, but multiple architecturally distinct agents, each optimized for a specific kind of work.


A Decision Framework for Choosing Agent Type

Before building, run through these questions:

  • Does the task involve writing or running code with testable outputs? → Coding Harness
  • Does the task involve processing large volumes of structurally similar items without human review? → Dark Factory
  • Does the task require gathering information from multiple sources that aren’t predefined? → Auto Research
  • Does the task require multiple distinct capabilities or parallel workstreams? → Orchestration

When in doubt:

  • Start simple. Try a single agent first. Add complexity only when you hit a clear limit.
  • Match architecture to failure mode. If your biggest risk is hallucination on dynamic information, use auto research with source citations. If your biggest risk is missing volume SLAs, use a dark factory with error handling.
  • Don’t over-architect. An orchestrated system of five agents that could be replaced by one well-prompted agent is a liability, not an asset.
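The decision questions above can be expressed as a simple ordered routing function — a sketch only, since real architecture choices need more nuance than four booleans:

```python
def choose_agent_type(writes_code=False, high_volume_similar=False,
                      open_ended_sources=False, multiple_capabilities=False):
    """Map the decision-framework questions to an agent type.
    Questions are checked in the table's order; the first 'yes' wins."""
    if writes_code:
        return "coding harness"
    if high_volume_similar:
        return "dark factory"
    if open_ended_sources:
        return "auto research"
    if multiple_capabilities:
        return "orchestration"
    return "single general-purpose agent"  # default: start simple
```

The fall-through default encodes the "start simple" rule: if none of the four questions gets a clear yes, a single agent is the right first attempt.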

Building These Agent Types with MindStudio

MindStudio is a no-code platform that supports all four agent types natively. You can build and deploy any of them — or combinations — without writing infrastructure code.

Here’s how each type maps to the platform:

  • Coding harnesses can be built as webhook or API endpoint agents that accept code tasks, using MindStudio’s JavaScript and Python function support for execution and validation
  • Dark factory pipelines map directly to MindStudio’s scheduled background agents — set a trigger (time-based, webhook, or email-triggered), define the processing steps, connect to data sources using 1,000+ pre-built integrations, and deploy
  • Auto research agents use MindStudio’s built-in web search, fetch, and document processing capabilities to gather and synthesize information across sources — no external API management required
  • Orchestration is handled through MindStudio’s workflow builder, where you can chain agents, pass outputs as inputs to downstream steps, handle branching logic, and run subworkflows in parallel

If you’re building orchestrated systems where external agents (Claude Code, LangChain, CrewAI) need to call MindStudio workflows as tools, the Agent Skills Plugin is worth knowing. It exposes MindStudio’s capabilities as typed method calls — agent.searchGoogle(), agent.runWorkflow(), agent.sendEmail() — so external agents can delegate specific subtasks without managing authentication, rate limiting, or retries.

For teams who want to understand how multi-agent orchestration works in practice, MindStudio handles the infrastructure layer so the focus stays on what each agent should actually do.

You can start building for free at mindstudio.ai.


Frequently Asked Questions

What is a coding harness AI agent?

A coding harness is an AI agent designed to operate inside a software development environment. It can read and write code files, run terminal commands, execute tests, and iterate on its own output based on the results. The key feature is a tight feedback loop with a deterministic execution environment — the agent knows whether its output succeeded because the tests pass or fail. Common examples include Claude Code and GitHub Copilot Workspace.

What does “dark factory” mean in AI?

The term comes from lights-out manufacturing — facilities that run without humans and therefore without lights. In AI, a dark factory agent is a fully automated pipeline that processes work at scale without any human in the loop. It ingests inputs, applies defined processing steps (often including one or more LLM calls), and routes outputs — running unattended on a schedule or in response to events like incoming emails or database records.

What’s the difference between an auto research agent and a RAG system?

A RAG system retrieves information from a known, pre-indexed knowledge base. The retrieval path is defined in advance. An auto research agent decides what to retrieve based on what it’s looking for — it can search the web, follow links, query multiple sources, and change its retrieval strategy mid-task. RAG is better for stable, structured knowledge bases. Auto research is better when information is dynamic or when you don’t know where to look upfront.

What is an orchestration agent in AI?

An orchestration agent manages other agents. It breaks a complex goal into subtasks, routes each subtask to a specialized subagent, handles dependencies between tasks, and assembles the final output. It’s the coordination layer in a multi-agent system — it doesn’t do the specialized work itself, it delegates, monitors, and integrates results.

Can one AI system use multiple agent types at once?

Yes — and this is how most sophisticated production systems work. An orchestration agent typically coordinates a mix of other agent types, each optimized for a specific kind of work. A competitive intelligence system might use an auto research agent for web gathering, a dark factory for high-volume data processing, and a coding harness for structured data analysis, all coordinated by an orchestrator.

When should I avoid using an orchestration agent?

When a single agent can handle the task adequately. Orchestration adds overhead: more LLM calls, more latency, more failure points, and more debugging complexity. Those costs are worth paying when a task genuinely requires multiple distinct capabilities or parallel workstreams. They’re not worth paying when they just make a simple task more complicated. A good rule: if you can describe a single agent that could do the whole task, start there.


Key Takeaways

  • The four agent types — coding harnesses, dark factories, auto research agents, and orchestration agents — each have a specific architecture optimized for a specific kind of work. They are not interchangeable.
  • Coding harnesses work inside deterministic development environments with testable feedback loops. Dark factories process high volumes of similar items unattended. Auto research agents gather and synthesize from dynamic, unpredictable sources. Orchestration agents coordinate other agents.
  • Mismatching agent type to task is one of the most common reasons AI systems fail in production — not model quality, but architectural fit.
  • Real production systems frequently combine multiple agent types under an orchestrator, each deployed where it has a structural advantage.
  • Start with the simplest architecture that can work. Add complexity only when a single agent demonstrably hits a wall.

Ready to build? MindStudio gives you the tools to build, test, and deploy all four agent types — and combine them — without writing infrastructure code. You can learn more about building AI agents for specific use cases or explore how automated workflows handle complex, multi-step tasks to see how these patterns apply to real problems.
