Four Types of AI Agents Explained: Coding Harnesses, Dark Factories, Auto Research, and Orchestration

Not all AI agents are the same. Learn the four distinct agent types used in production, when to use each, and why mixing them up leads to failure.

MindStudio Team

Why “AI Agent” Means Four Very Different Things

The term “AI agent” gets applied to everything from a simple chatbot that books meetings to a complex multi-agent system that processes thousands of documents overnight. That vagueness isn’t just sloppy vocabulary — it causes real engineering failures.

When teams treat all AI agents as structurally equivalent, they build systems that don’t fit their actual needs. A coding harness built like an orchestrator will be slow and expensive. A dark factory designed like an auto research agent will produce unreliable outputs. The wrong architecture for the job isn’t just inefficient — it typically fails outright.

This article breaks down the four distinct types of AI agents used in production multi-agent workflows: coding harnesses, dark factories, auto research agents, and orchestration agents. For each, you’ll see how it works, what it’s built for, and — just as important — when it’s the wrong choice.


Why Architecture Matters More Than Model Choice

Most conversations about AI agents focus on which model to use: GPT-4o, Claude 3.5 Sonnet, Gemini 2.0. That matters, but it’s secondary to architecture.

The type of agent you build shapes everything else: how it reasons, what tools it can access, how it handles errors, and whether it needs human oversight. Two agents running the same model but built as different types will behave completely differently — and succeed or fail for completely different reasons.

There’s also a cost dimension. Orchestration agents, for example, spawn subagents and make many LLM calls per task. Running an orchestration architecture for a simple coding task burns tokens unnecessarily and adds latency with no benefit. Matching architecture to task isn’t just about capability. It’s about efficiency.

Researchers studying multi-agent system design have found that architectural choices — including how agents are structured, how they communicate, and how they handle errors — frequently matter more than raw model capability for task completion. Let’s go through each type.


Coding Harnesses: Agents That Work Inside Development Environments

What They Are

A coding harness is an AI agent that operates within a bounded technical environment — typically a codebase. It can read and write files, run tests, execute terminal commands, and iterate on its own output.

The defining characteristic is a tight feedback loop with a deterministic execution environment. The agent writes code, runs it, sees what breaks, and revises. It doesn’t need to reason abstractly about whether its output is “good” — the tests pass or they don’t.

The word “harness” here is deliberate. A test harness in software development is the scaffolding that lets you execute code and observe results. A coding harness agent operates within exactly that kind of scaffolding.

How They Work

Coding harnesses typically have access to a defined set of tools:

  • File system access — read, write, create, and delete files
  • Terminal and shell execution — run scripts, install packages, execute builds
  • Test runners — run automated tests and capture output
  • Code search — search across a codebase for relevant functions or patterns
  • Version control — commit, branch, and diff via git

The agent loops through a plan-execute-observe cycle: decide what to do, do it, check the result, adjust. Modern examples include Claude Code, GitHub Copilot Workspace, and Devin. What distinguishes them from a simple “code completion” tool is that they act over multiple steps, handling real consequences (broken builds, failing tests) and adapting accordingly.
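The plan-execute-observe cycle can be sketched as a small loop. This is a minimal illustration, not any real product's API: `propose_patch` stands in for an LLM call, `apply_patch` for a file edit, and `run_tests` for a test runner — all hypothetical callables supplied by the caller.

```python
def coding_harness(propose_patch, apply_patch, run_tests, max_iterations=5):
    """Plan-execute-observe loop: propose a change, apply it, run the
    tests, and feed any failure output back into the next proposal."""
    feedback = None
    for attempt in range(1, max_iterations + 1):
        patch = propose_patch(feedback)   # plan (an LLM call in practice)
        apply_patch(patch)                # execute
        passed, output = run_tests()      # observe: deterministic signal
        if passed:
            return {"status": "success", "attempts": attempt}
        feedback = output                 # adjust next iteration
    return {"status": "gave_up", "attempts": max_iterations}

# Toy environment: the "tests" pass once the bug flag is cleared.
state = {"bug": True}
result = coding_harness(
    propose_patch=lambda fb: "fix" if fb else "noop",
    apply_patch=lambda p: state.update(bug=(p != "fix")),
    run_tests=lambda: (not state["bug"],
                       "1 test failed" if state["bug"] else ""),
)
```

The point of the sketch is the shape of the loop: the agent never has to judge its own output abstractly, because the test runner supplies a binary pass/fail signal.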

When to Use a Coding Harness

  • The task involves writing, editing, or debugging code
  • There’s a clear success condition you can test programmatically
  • The environment is deterministic — the same input should produce consistent output
  • The scope is bounded to a single repository or project

When Not To Use One

Coding harnesses struggle outside their native environment. Ask one to “research competitors and write a market analysis” and it will either fail or produce something useless. The tight technical feedback loop that makes it powerful inside a codebase becomes a liability outside it.

They also don’t coordinate well with other agents. A coding harness is designed to work alone in its lane. If your task requires multiple agents handling different parts of a problem, you need something else.


Dark Factories: Fully Automated, Humanless Pipelines

What They Are

The term “dark factory” comes from manufacturing. A lights-out facility — sometimes called a dark factory — is so automated it can run without lights because there are no humans present to need them.

In AI, a dark factory agent is a fully automated pipeline that processes work at scale without human involvement. It receives inputs (documents, records, data feeds), processes them through a defined set of steps, and produces outputs — often thousands of items at a time — without anyone watching.

These aren’t conversational agents. They don’t wait for a user’s next message. They run, finish, and stop — or loop indefinitely on a schedule.

How They Work

Dark factory agents are typically structured as:

  1. Input ingestion — pulling from a data source (email inbox, database, file storage, API feed)
  2. Processing steps — one or more AI or non-AI operations per item (classify, extract, summarize, transform)
  3. Output routing — writing results to a database, triggering downstream systems, sending notifications
  4. Error handling and logging — since there’s no human to catch mistakes, robust error handling must be built in from the start

The AI component might be a single LLM call per item, or it might be a mini-pipeline with several steps. The key is that the entire thing runs unattended. Reliability and throughput are what matter. Flexibility is secondary.
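The four-stage structure above can be sketched as a batch loop. The names here (`process_item`, `route_output`) are hypothetical stand-ins for the per-item LLM call and the downstream write; the essential part is that a bad item is logged and skipped rather than halting the whole unattended run.

```python
import logging

def dark_factory(items, process_item, route_output):
    """Unattended pipeline: process each ingested item, route the
    result downstream, and log failures instead of stopping the batch."""
    logger = logging.getLogger("dark_factory")
    processed, failed = 0, 0
    for item in items:                    # 1. input ingestion
        try:
            result = process_item(item)   # 2. processing (e.g. one LLM call)
            route_output(result)          # 3. output routing
            processed += 1
        except Exception as exc:
            # 4. no human is watching: record the failure and keep going
            logger.error("item failed: %r (%s)", item, exc)
            failed += 1
    return {"processed": processed, "failed": failed}

def classify(ticket):
    """Stand-in for an LLM classification step."""
    if not ticket:
        raise ValueError("empty ticket")
    return {"ticket": ticket, "label": "billing"}

sink = []  # stand-in for a database write
stats = dark_factory(["ticket A", "ticket B", ""], classify, sink.append)
```

A real deployment would add retries, dead-letter queues, and sampling-based quality checks, but the control flow — ingest, process, route, log — is the same.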

When to Use a Dark Factory

Dark factories are right when:

  • Volume is high and inputs are structurally similar — classifying thousands of support tickets, extracting data from thousands of contracts, summarizing hundreds of research papers
  • Processing is scheduled or event-driven — nightly data enrichment, weekly report generation, processing every new record that hits a database
  • The task has known, consistent structure — inputs are similar enough that the same processing logic works reliably across all of them
  • Human review isn’t required per item — the downstream use case tolerates a small error rate, or errors are caught at a sampling stage

When Not To Use One

Dark factories break down when tasks require dynamic judgment or inputs with high variance. If every item requires a different approach, or if the agent needs to decide mid-task what to do next, a rigid pipeline will produce unreliable outputs on the edge cases — and often fail silently.

They also don’t work for exploratory tasks. A dark factory can extract structured information from legal contracts, but it can’t “figure out what information matters.” That requires a more adaptive agent type.


Auto Research Agents: Autonomous Information Gathering

What They Are

An auto research agent browses, reads, and synthesizes information autonomously. It starts with a question or goal, decides where to look, retrieves information from multiple sources, evaluates relevance, and produces a coherent output — without a human directing each step.

This is meaningfully different from a dark factory. A dark factory processes a known set of inputs. An auto research agent decides what inputs to retrieve based on what it finds along the way. The retrieval path is dynamic.

It’s also different from a standard RAG (retrieval-augmented generation) system. RAG retrieves from a pre-indexed, static knowledge base — you know where to look before you start. An auto research agent can search the web, follow links, query multiple sources, reformulate its queries based on interim findings, and change direction entirely if the initial approach isn’t working.

How They Work

A typical auto research agent loop:

  1. Goal decomposition — break the research question into sub-questions
  2. Search and retrieval — query search engines, read web pages, pull documents, query APIs
  3. Evaluation — assess whether retrieved information is relevant and sufficient
  4. Follow-up retrieval — if not sufficient, decide what to look for next
  5. Synthesis — compile findings into a structured output
  6. Validation — check for contradictions, gaps, or low-confidence claims

The agent exercises real judgment at each step. It might decide a source isn’t credible and look elsewhere. It might discover that the original question was framed wrong and reformulate it. This adaptive loop is what makes auto research agents powerful — and also what makes them more expensive and harder to control than other types.
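The six-step loop above can be sketched as follows. Every callable here (`decompose`, `search`, `is_sufficient`, `refine`, `synthesize`) is a hypothetical stand-in — in practice these would be LLM calls and web or API requests; the sketch only shows the adaptive control flow that distinguishes this type from a fixed pipeline.

```python
def auto_research(question, decompose, search, is_sufficient,
                  refine, synthesize, max_rounds=4):
    """Adaptive retrieval loop: search, evaluate, reformulate, repeat."""
    queries = decompose(question)             # 1. goal decomposition
    findings = []
    for _ in range(max_rounds):
        for q in queries:
            findings.extend(search(q))        # 2. search and retrieval
        if is_sufficient(findings):           # 3. evaluation
            break
        queries = refine(question, findings)  # 4. follow-up retrieval
    return synthesize(findings)               # 5-6. synthesis, validation

# Toy corpus standing in for the open web.
corpus = {"pricing": ["Competitor X raised prices"],
          "features": ["X shipped feature Y"]}
report = auto_research(
    "What changed at Competitor X?",
    decompose=lambda q: ["pricing"],
    search=lambda q: corpus.get(q, []),
    is_sufficient=lambda f: len(f) >= 2,
    refine=lambda q, f: ["features"],   # pivot based on interim findings
    synthesize=lambda f: " | ".join(f),
)
```

Note that the second round's queries depend on the first round's findings — that data-dependent retrieval path is exactly what a static RAG pipeline cannot do.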

When to Use an Auto Research Agent

  • The answer requires gathering from multiple sources, not one
  • The retrieval path is uncertain upfront — you don’t know exactly where to look until you start
  • The output requires synthesis, not just extraction — comparing, reasoning across sources, drawing conclusions
  • The research domain changes frequently enough that a static knowledge base won’t stay current

Common use cases: competitive intelligence, literature reviews, due diligence, real-time market monitoring, and investigative research across web sources.

When Not To Use One

Auto research agents are expensive and slow. They make many LLM calls and many web requests per task. Don’t use one when information is available in a structured database, or when a simple RAG query would suffice.

They also require careful handling of source quality. An auto research agent that can read any web page will hit low-quality, misleading, or outright false content. Without guardrails, it can synthesize misinformation confidently. This is the agent type that most needs human review before outputs are used in decisions.


Orchestration Agents: Agents That Manage Other Agents

What They Are

An orchestration agent coordinates other agents. It receives a complex goal, breaks it into tasks, assigns those tasks to specialized subagents, collects results, and assembles a final output.

Think of it as a project manager that delegates to a team. The orchestrator doesn’t do the individual work — it decides what work needs to be done, routes it to the right specialist, and handles dependencies between tasks.

How They Work

Orchestration typically follows this pattern:

  1. Task decomposition — break the high-level goal into discrete, assignable subtasks
  2. Agent selection — choose the right subagent or tool for each subtask
  3. Parallel or sequential execution — run subtasks simultaneously where possible, handle dependencies where they exist
  4. Result collection — gather outputs from subagents
  5. Integration — synthesize outputs into a coherent final result
  6. Error handling — decide what to do when a subagent fails or returns unexpected output

The subagents can be any of the other types: coding harnesses, dark factory pipelines, research agents, or even other orchestrators (for hierarchical multi-agent systems). The orchestrator’s job is coordination, not execution.
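The coordination pattern can be sketched with Python's standard thread pool. The subagents here are hypothetical callables (each would be one of the other agent types in practice), and for simplicity the sketch assumes one subtask per agent name; the point is the decompose → dispatch in parallel → collect → integrate shape, with per-subagent error handling.

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(goal, decompose, agents, integrate):
    """Decompose a goal, dispatch subtasks to named subagents in
    parallel, collect results, and integrate them into one output."""
    subtasks = decompose(goal)  # [(agent_name, task), ...]
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(agents[name], task)
                   for name, task in subtasks}   # parallel execution
        for name, future in futures.items():
            try:
                results[name] = future.result()  # result collection
            except Exception as exc:
                # error handling: record the failure, don't crash the run
                results[name] = f"FAILED: {exc}"
    return integrate(results)                    # integration

report = orchestrate(
    "weekly brief",
    decompose=lambda g: [("research", "gather news"),
                         ("factory", "process reviews")],
    agents={
        "research": lambda t: f"news for {t}",
        "factory": lambda t: "2,000 reviews summarized",
    },
    integrate=lambda r: "; ".join(f"{k}: {v}" for k, v in sorted(r.items())),
)
```

Sequential dependencies would replace the single pool submission with staged submissions, but the orchestrator's role is unchanged: coordination, not execution.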

When to Use Orchestration

  • A single task requires multiple distinct capabilities — e.g., researching a topic, writing copy based on research, generating images to accompany that copy, and formatting everything for publication
  • Subtasks can be parallelized — running them simultaneously reduces total time
  • You need specialization — different subagents can be prompted or configured differently for different types of work
  • The workflow is complex enough that a single monolithic agent would struggle with context length or reasoning depth

When Not To Use Orchestration

Orchestration adds overhead. Every coordination layer means more LLM calls, more latency, and more points of failure. If a task can be handled well by a single agent, an orchestrator just makes it slower and more complex.

It also increases debugging difficulty. When something goes wrong in an orchestrated system, tracing the failure requires understanding every agent’s role and what it produced. Well-designed orchestration systems log intermediate outputs for exactly this reason.

One common mistake: teams reach for orchestration because it feels sophisticated. But the most effective system is usually the simplest one that gets the job done. Start with a single agent. Add orchestration only when a single agent demonstrably can’t handle the task.


How the Four Types Work Together in Production

These agent types aren’t mutually exclusive. Real production systems often combine several of them under a single orchestrator, with each type deployed where it has a structural advantage.

Here’s a concrete example — a competitive intelligence system:

  1. The orchestration agent receives a weekly research brief request
  2. It dispatches an auto research agent to gather recent news, product updates, and pricing changes from competitor sources
  3. It dispatches a dark factory pipeline to process thousands of customer reviews from app stores, extracting sentiment and feature mentions at scale
  4. It dispatches a coding harness to pull structured data from internal analytics systems and generate comparison charts
  5. The orchestration agent collects all outputs and synthesizes them into a formatted report

Each agent type is used where it has a natural advantage. The auto research agent handles dynamic, unpredictable retrieval. The dark factory handles volume with consistent structure. The coding harness handles structured data and visualization. The orchestrator ties it together.

This is what a multi-agent workflow actually means in practice — not just multiple LLM calls, but multiple architecturally distinct agents, each optimized for a specific kind of work.


A Decision Framework for Choosing Agent Type

Before building, run through these questions:

  • Does the task involve writing or running code with testable outputs? → Coding Harness
  • Does the task involve processing large volumes of structurally similar items without human review? → Dark Factory
  • Does the task require gathering information from multiple sources that aren’t predefined? → Auto Research
  • Does the task require multiple distinct capabilities or parallel workstreams? → Orchestration

When in doubt:

  • Start simple. Try a single agent first. Add complexity only when you hit a clear limit.
  • Match architecture to failure mode. If your biggest risk is hallucination on dynamic information, use auto research with source citations. If your biggest risk is missing volume SLAs, use a dark factory with error handling.
  • Don’t over-architect. An orchestrated system of five agents that could be replaced by one well-prompted agent is a liability, not an asset.
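The decision questions above can be expressed as a simple ordered routing function — a sketch only, since real architecture choices need more nuance than four booleans:

```python
def choose_agent_type(writes_code=False, high_volume_similar=False,
                      open_ended_sources=False, multiple_capabilities=False):
    """Map the decision-framework questions to an agent type.
    Questions are checked in the table's order; the first 'yes' wins."""
    if writes_code:
        return "coding harness"
    if high_volume_similar:
        return "dark factory"
    if open_ended_sources:
        return "auto research"
    if multiple_capabilities:
        return "orchestration"
    return "single general-purpose agent"  # default: start simple
```

The fall-through default encodes the "start simple" rule: if none of the four questions gets a clear yes, a single agent is the right first attempt.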

Building These Agent Types with MindStudio

MindStudio is a no-code platform that supports all four agent types natively. You can build and deploy any of them — or combinations — without writing infrastructure code.

Here’s how each type maps to the platform:

  • Coding harnesses can be built as webhook or API endpoint agents that accept code tasks, using MindStudio’s JavaScript and Python function support for execution and validation
  • Dark factory pipelines map directly to MindStudio’s scheduled background agents — set a trigger (time-based, webhook, or email-triggered), define the processing steps, connect to data sources using 1,000+ pre-built integrations, and deploy
  • Auto research agents use MindStudio’s built-in web search, fetch, and document processing capabilities to gather and synthesize information across sources — no external API management required
  • Orchestration is handled through MindStudio’s workflow builder, where you can chain agents, pass outputs as inputs to downstream steps, handle branching logic, and run subworkflows in parallel

If you’re building orchestrated systems where external agents (Claude Code, LangChain, CrewAI) need to call MindStudio workflows as tools, the Agent Skills Plugin is worth knowing. It exposes MindStudio’s capabilities as typed method calls — agent.searchGoogle(), agent.runWorkflow(), agent.sendEmail() — so external agents can delegate specific subtasks without managing authentication, rate limiting, or retries.

For teams who want to understand how multi-agent orchestration works in practice, MindStudio handles the infrastructure layer so the focus stays on what each agent should actually do.

You can start building for free at mindstudio.ai.


Frequently Asked Questions

What is a coding harness AI agent?

A coding harness is an AI agent designed to operate inside a software development environment. It can read and write code files, run terminal commands, execute tests, and iterate on its own output based on the results. The key feature is a tight feedback loop with a deterministic execution environment — the agent knows whether its output succeeded because the tests pass or fail. Common examples include Claude Code and GitHub Copilot Workspace.

What does “dark factory” mean in AI?

The term comes from lights-out manufacturing — facilities that run without humans and therefore without lights. In AI, a dark factory agent is a fully automated pipeline that processes work at scale without any human in the loop. It ingests inputs, applies defined processing steps (often including one or more LLM calls), and routes outputs — running unattended on a schedule or in response to events like incoming emails or database records.

What’s the difference between an auto research agent and a RAG system?

A RAG system retrieves information from a known, pre-indexed knowledge base. The retrieval path is defined in advance. An auto research agent decides what to retrieve based on what it’s looking for — it can search the web, follow links, query multiple sources, and change its retrieval strategy mid-task. RAG is better for stable, structured knowledge bases. Auto research is better when information is dynamic or when you don’t know where to look upfront.

What is an orchestration agent in AI?

An orchestration agent manages other agents. It breaks a complex goal into subtasks, routes each subtask to a specialized subagent, handles dependencies between tasks, and assembles the final output. It’s the coordination layer in a multi-agent system — it doesn’t do the specialized work itself, it delegates, monitors, and integrates results.

Can one AI system use multiple agent types at once?

Yes — and this is how most sophisticated production systems work. An orchestration agent typically coordinates a mix of other agent types, each optimized for a specific kind of work. A competitive intelligence system might use an auto research agent for web gathering, a dark factory for high-volume data processing, and a coding harness for structured data analysis, all coordinated by an orchestrator.

When should I avoid using an orchestration agent?

When a single agent can handle the task adequately. Orchestration adds overhead: more LLM calls, more latency, more failure points, and more debugging complexity. Those costs are worth paying when a task genuinely requires multiple distinct capabilities or parallel workstreams. They’re not worth paying when they just make a simple task more complicated. A good rule: if you can describe a single agent that could do the whole task, start there.


Key Takeaways

  • The four agent types — coding harnesses, dark factories, auto research agents, and orchestration agents — each have a specific architecture optimized for a specific kind of work. They are not interchangeable.
  • Coding harnesses work inside deterministic development environments with testable feedback loops. Dark factories process high volumes of similar items unattended. Auto research agents gather and synthesize from dynamic, unpredictable sources. Orchestration agents coordinate other agents.
  • Mismatching agent type to task is one of the most common reasons AI systems fail in production — not model quality, but architectural fit.
  • Real production systems frequently combine multiple agent types under an orchestrator, each deployed where it has a structural advantage.
  • Start with the simplest architecture that can work. Add complexity only when a single agent demonstrably hits a wall.

Ready to build? MindStudio gives you the tools to build, test, and deploy all four agent types — and combine them — without writing infrastructure code. You can learn more about building AI agents for specific use cases or explore how automated workflows handle complex, multi-step tasks to see how these patterns apply to real problems.
