
OpenClaw Multi-Agent Setup: Run Specialized Agents in Parallel

Step-by-step guide to setting up multiple specialized AI agents in OpenClaw. Learn orchestration patterns, session isolation, and real-world multi-agent architectures that scale.

Here’s what my OpenClaw setup looked like six months ago: one agent doing everything. Writing emails, searching the web, reviewing code, summarizing documents — all the same model, all in the same conversation context.

It worked, but it was slow, expensive, and wrong in subtle ways. A research-heavy task would accumulate 80,000 tokens of context. My coding agent would “remember” an email conversation from two days ago. Simple tasks waited behind complex ones in the queue.

Multi-agent setup fixes all of this. You run specialized agents in parallel — each one scoped to a specific job, isolated from the others, using exactly the model it needs.

This guide covers everything from the basic concepts to real-world architectures I’ve built and tested. By the end, you’ll have a working multi-agent setup that handles more work, costs less, and produces cleaner results.

I run my OpenClaw on xCloud — a managed hosting platform that handles the server side for you. All the configuration in this guide works regardless of where you host.


Table of Contents

  1. What Is a Multi-Agent Setup?
  2. Single Agent vs. Multi-Agent: What Changes
  3. The Four Agent Roles in OpenClaw
  4. Step-by-Step: Your First Multi-Agent Config
  5. Orchestration Patterns
  6. Session Isolation: Why It Matters
  7. Three Real-World Architectures
  8. Monitoring Multiple Agents
  9. Common Mistakes and How to Fix Them
  10. FAQ

What Is a Multi-Agent Setup?

A multi-agent setup runs several specialized AI agents from a single OpenClaw instance. Each agent has a defined role, its own model, isolated context, and specific tools. A main orchestrator routes tasks to the right agent and assembles the results.

A single-agent setup is one LLM trying to do everything in one context window. A multi-agent setup is more like a small team: a researcher, a writer, a coder, and a planner — each handling their lane, working in parallel, and handing off results.

The main orchestrator is the only agent users interact with directly. It decides whether to answer inline or delegate to a specialist. Sub-agents receive tasks, complete them, and return results to the orchestrator. The user sees the final output; the internal coordination happens transparently.


Single Agent vs. Multi-Agent: What Changes

Here’s a concrete comparison of the same workflow in both setups:

Task: Research a topic, write a summary, save it to a file, and send it via email.

| Step | Single Agent | Multi-Agent |
| --- | --- | --- |
| Research | Orchestrator does it | Research agent does it |
| Write summary | Same context, same model | Writing agent, isolated context |
| Save to file | Same agent | File agent handles it |
| Send email | Same agent | Communication agent sends it |
| Context after task | 80,000+ tokens accumulated | Each agent has ~5,000-15,000 tokens |
| Total API cost | $0.40-0.80 | $0.08-0.20 |
| Can tasks run in parallel? | No | Yes |

The cost difference comes from context isolation. In a single-agent setup, every step of the task adds to a growing conversation history. By step 4, the model is sending all of steps 1-3 as context again. In a multi-agent setup, each agent starts fresh with only what it needs.
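The compounding effect is easy to see with a quick back-of-the-envelope calculation. This is an illustrative sketch with made-up token counts, not OpenClaw internals:

```python
# Illustrative: input-context tokens re-sent per step when history
# accumulates, vs. when each step runs in an isolated session.
step_outputs = [20_000, 20_000, 20_000, 20_000]  # tokens produced per step

def single_agent_tokens(outputs):
    """Each step re-sends the full history of all prior steps as input."""
    total, history = 0, 0
    for out in outputs:
        total += history   # prior conversation re-sent as input context
        history += out     # this step's output joins the history
    return total

def multi_agent_tokens(outputs, handoff=5_000):
    """Each agent starts fresh with only a small task hand-off."""
    return handoff * len(outputs)

print(single_agent_tokens(step_outputs))  # 120000
print(multi_agent_tokens(step_outputs))   # 20000
```

The single-agent total grows quadratically with the number of steps; the isolated total grows linearly. That gap is where the 4-5x cost difference in the table comes from.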

According to Anthropic’s internal benchmarks on multi-agent architectures, specialized sub-agents outperform a single generalist model on domain-specific tasks by 23-38% on average. The improvement is even larger for tasks requiring long sequential reasoning chains.


The Four Agent Roles in OpenClaw

OpenClaw recognizes four agent roles. Understanding them before writing your config saves a lot of rework.

1. Orchestrator (Main Agent)

The primary agent — the one that receives user messages and decides what to do. It’s the only agent with access to your full conversation history. Keep this one lean: its job is routing and assembling results, not doing deep domain work.

Best model: Claude Sonnet — smart enough to reason about delegation, not wastefully expensive.

2. Sub-Agent (Specialist)

Spawned by the orchestrator to complete a specific task. Each sub-agent gets a scoped set of tools and a narrow system prompt. It runs, returns its result, and terminates. Sub-agents can run in parallel.

Best model: Depends on the task. Use Haiku for simple formatting or lookups. Use Sonnet for code review or complex summarization. Use Opus only when the task requires deep multi-step reasoning.

3. Background Agent (Cron)

Runs on a schedule without user interaction. Common uses: heartbeat checks, daily summaries, inbox monitoring, news digests. These agents should always use isolated sessions and cheap models — they’re high-frequency and low-stakes.

Best model: Gemini Flash or a free OpenRouter model for heartbeats. Haiku for substantive background work.

4. Tool Agent (Micro-Agent)

The smallest unit — a focused agent that wraps a single tool or API. Think: a “search agent” that just runs web searches and returns structured results, or a “calendar agent” that only reads and writes calendar events. These are called by other agents, never directly by users.

Best model: Haiku or GPT-4o-mini. Tool agents should be fast and cheap.


Step-by-Step: Your First Multi-Agent Config

Here’s how to go from a single-agent setup to a working multi-agent config. I’ll build this incrementally so each step is testable.

Step 1: Audit Your Current Usage

Before adding agents, find out what your single agent currently spends its time on. Run OpenClaw for a week with logging enabled, then check the logs:

grep "task_type" ~/.openclaw/logs/activity.log | sort | uniq -c | sort -rn

You’ll see a breakdown like:

  • 42% — web search and research
  • 28% — coding tasks
  • 19% — writing and formatting
  • 11% — file operations and admin

These percentages tell you what specialist agents to build first. Build agents for your top 2-3 task types.
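If you want the breakdown without chaining shell commands, the same count can be done in a few lines of Python. The `task_type=` key-value log format here is a hypothetical example; adjust the parsing to whatever your actual log lines look like:

```python
from collections import Counter

# Hypothetical log lines; the real OpenClaw log format may differ.
log_lines = [
    "task_type=research model=haiku tokens=1200",
    "task_type=coding model=sonnet tokens=3400",
    "task_type=research model=haiku tokens=900",
    "task_type=writing model=haiku tokens=700",
]

def task_breakdown(lines):
    """Return a task_type -> percentage map from key=value log lines."""
    counts = Counter(
        line.split("task_type=")[1].split()[0]
        for line in lines if "task_type=" in line
    )
    total = sum(counts.values())
    return {task: round(100 * n / total) for task, n in counts.items()}

print(task_breakdown(log_lines))  # {'research': 50, 'coding': 25, 'writing': 25}
```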

Step 2: Define Your Agent Roster

Start with a minimal roster. Add agents only when you have clear evidence they’re needed.

{
  "agents": {
    "orchestrator": {
      "model": "anthropic/claude-sonnet-4-6",
      "system_prompt": "You are a personal assistant that manages a team of specialists. For research tasks, delegate to @research. For coding tasks, delegate to @coder. For writing tasks, delegate to @writer. Answer simple questions directly without delegating.",
      "tools": ["spawn_agent", "read_file", "send_message"]
    },
    "research": {
      "model": "anthropic/claude-haiku-4-5-20251001",
      "system_prompt": "You are a research specialist. Search the web, read pages, and return structured summaries. Always cite your sources with URLs.",
      "tools": ["web_search", "read_url"],
      "session": "isolated"
    },
    "coder": {
      "model": "anthropic/claude-sonnet-4-6",
      "system_prompt": "You are a coding specialist. You read code, write code, run tests, and explain technical concepts. You do not send messages or browse the web.",
      "tools": ["read_file", "write_file", "run_command"],
      "session": "isolated"
    },
    "writer": {
      "model": "anthropic/claude-haiku-4-5-20251001",
      "system_prompt": "You are a writing specialist. You draft, edit, summarize, and format text. You do not run commands or browse the web.",
      "tools": ["read_file", "write_file"],
      "session": "isolated"
    }
  }
}

Key things to notice:

  • Each agent has "session": "isolated" — this is critical (more on this in the next section)
  • Each agent’s tools list is scoped — the writer can’t run commands, the coder can’t send messages
  • System prompts are narrow and specific

Step 3: Set Up Parallel Execution

By default, OpenClaw runs sub-agents sequentially. For independent tasks, enable parallel execution:

{
  "orchestration": {
    "parallel_agents": true,
    "max_concurrent": 3,
    "timeout_per_agent": 120
  }
}

With this config, the orchestrator can spawn up to 3 sub-agents simultaneously. Note that parallelism only helps when tasks are independent: three unrelated research queries that took 90 seconds back to back now finish in about 30, but a strictly sequential pipeline (research, then write, then format) still runs one step at a time because each step needs the previous step's output.
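OpenClaw handles the concurrency for you, but the underlying idea is ordinary async fan-out. A toy sketch in Python, with sleeps standing in for agent latency:

```python
import asyncio

# Toy model of fan-out: each "agent" is an async task with its own latency.
async def run_agent(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)
    return f"{name} done"

async def main():
    # Sequential execution would cost the SUM of the latencies;
    # gather() costs only the SLOWEST single agent.
    return await asyncio.gather(
        run_agent("research", 0.03),
        run_agent("writer", 0.02),
        run_agent("formatter", 0.01),
    )

print(asyncio.run(main()))  # ['research done', 'writer done', 'formatter done']
```

`gather` returns results in argument order regardless of which agent finishes first, which mirrors how an orchestrator can assemble parallel results deterministically.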

Step 4: Add Background Agents

Background agents run on schedules. Add them under the cron key:

{
  "cron": {
    "morning_digest": {
      "agent": "research",
      "prompt": "Summarize the top 5 news items relevant to AI development and software engineering from the last 24 hours. Format as a bullet list with one sentence per item.",
      "model": "google/gemini-2.0-flash:free",
      "schedule": "0 8 * * *",
      "session": "isolated",
      "output": "file:~/.openclaw/digests/{{date}}.md"
    },
    "heartbeat": {
      "prompt": "Reply with OK.",
      "model": "openrouter/google/gemini-2.0-flash:free",
      "schedule": "*/30 * * * *",
      "session": "isolated"
    }
  }
}

The morning digest uses the research agent’s tools and system prompt but overrides the model to the free Gemini Flash tier. The heartbeat is pure infrastructure — it uses a free model and a trivial prompt.

Step 5: Test Each Agent in Isolation

Before running the full multi-agent setup, test each specialist agent individually:

openclaw agent test research --prompt "What is the latest Claude model?"
openclaw agent test coder --prompt "Write a Python function that calculates Fibonacci numbers"
openclaw agent test writer --prompt "Summarize this in 3 sentences: [paste text]"

Fix any agent that returns wrong, unhelpful, or out-of-scope results before adding it to the orchestrator flow. It’s much easier to debug in isolation.


Orchestration Patterns

Once you have agents defined, you need to decide how they coordinate. There are three patterns I’ve found useful.

Pattern 1: Sequential Pipeline

Each agent’s output becomes the next agent’s input. Use this for tasks with clear stages.

User → Orchestrator → Research Agent → Writer Agent → Orchestrator → User

Example config for a “research and write” pipeline:

{
  "pipelines": {
    "research_and_write": {
      "steps": [
        {
          "agent": "research",
          "prompt": "Research: {{user_input}}"
        },
        {
          "agent": "writer",
          "prompt": "Write a 300-word summary based on this research: {{prev_output}}"
        }
      ]
    }
  }
}

Invoke it from the orchestrator’s system prompt: "For research + writing tasks, use the research_and_write pipeline."
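Under the hood, a sequential pipeline is just template substitution plus a loop: each step's output is spliced into the next step's prompt. A minimal sketch, with a stub in place of the real model call:

```python
# Minimal sketch of a sequential pipeline: each step's output feeds the
# next step's prompt via {{user_input}} / {{prev_output}} placeholders.
def run_pipeline(steps, user_input, call_agent):
    prev_output = ""
    for step in steps:
        prompt = (step["prompt"]
                  .replace("{{user_input}}", user_input)
                  .replace("{{prev_output}}", prev_output))
        prev_output = call_agent(step["agent"], prompt)
    return prev_output

# Stub agent caller for demonstration; a real one would hit the model API.
def fake_agent(name, prompt):
    return f"[{name}] handled: {prompt}"

steps = [
    {"agent": "research", "prompt": "Research: {{user_input}}"},
    {"agent": "writer", "prompt": "Summarize: {{prev_output}}"},
]
print(run_pipeline(steps, "agent orchestration", fake_agent))
```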

Pattern 2: Fan-Out (Parallel Analysis)

One task gets split across multiple agents simultaneously. Use this when you need multiple perspectives or parallel independent analyses.

Orchestrator → [Research Agent, Coder Agent] → Writer Agent → Orchestrator → User

Example: analyzing a GitHub PR. The research agent checks related issues while the coder agent reviews the diff; those two run in parallel. The writer agent then drafts the review comment from both outputs, and the orchestrator assembles the final result.

{
  "parallel_tasks": [
    {
      "agent": "research",
      "prompt": "Search for related GitHub issues for this PR: {{pr_url}}"
    },
    {
      "agent": "coder",
      "prompt": "Review this diff for bugs and style issues: {{diff}}"
    },
    {
      "agent": "writer",
      "prompt": "Draft a constructive PR review comment based on these inputs: {{research_output}} and {{coder_output}}"
    }
  ]
}

Pattern 3: Recursive Delegation

Agents spawn sub-agents of their own. Use this carefully — unbounded recursion is expensive and hard to debug.

A safe pattern: limit recursion depth to 2.

{
  "orchestration": {
    "max_depth": 2,
    "depth_exceeded_action": "return_partial_result"
  }
}

At depth 2, agents return whatever they have instead of spawning more sub-agents. This prevents runaway cascades.
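The depth guard itself is a simple invariant: an agent may delegate only while its depth is below the limit; past that, it must return whatever it has. A sketch of the idea (the `complex:` task encoding is invented for illustration):

```python
# Sketch of a recursion depth guard: delegate only while depth < MAX_DEPTH,
# otherwise return a partial result instead of spawning more sub-agents.
MAX_DEPTH = 2

def run_agent(task: str, depth: int = 0) -> str:
    if depth >= MAX_DEPTH:
        return f"partial result for {task!r} (depth limit hit)"
    if task.startswith("complex:"):
        # Delegate the inner task one level down.
        sub = run_agent(task.removeprefix("complex:"), depth + 1)
        return f"assembled({sub})"
    return f"done({task})"

print(run_agent("simple task"))
print(run_agent("complex:complex:complex:deep task"))
```

However deeply nested the task is, the call tree is bounded at MAX_DEPTH levels, so a badly-scoped task degrades to a partial answer instead of a runaway cascade.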


Session Isolation: Why It Matters

Session isolation means each sub-agent starts with a clean context window, uses only the tools and memory it needs for its task, and closes when it’s done. Without isolation, sub-agents inherit the full conversation history and compound your costs.

Here’s what happens without isolation:

  1. Orchestrator has 30,000 tokens of conversation history
  2. Orchestrator spawns research agent — the research agent inherits all 30,000 tokens
  3. Research agent completes its work (adds 10,000 more tokens)
  4. Orchestrator spawns writer agent — the writer agent inherits all 40,000 tokens

The writer agent has 40,000 tokens of context it doesn’t need and will never use. You’re paying for all of it.

With "session": "isolated":

  • Research agent gets: system prompt (2,000 tokens) + task prompt (500 tokens) = 2,500 tokens
  • Writer agent gets: system prompt (1,500 tokens) + research output (3,000 tokens) = 4,500 tokens

That’s a 10x reduction in context tokens for these two agents alone (7,000 vs. 70,000).

What to Share Between Agents

Even with isolation, agents sometimes need shared context. Use workspace files for this:

{
  "workspace": {
    "shared_files": [
      "~/.openclaw/context/user_profile.md",
      "~/.openclaw/context/project_facts.md"
    ]
  }
}

These files get injected into every agent’s context. Keep them under 1,000 tokens total — they’re included on every call.


Three Real-World Architectures

Here are three multi-agent setups I use daily.

Architecture 1: Personal Assistant

Four agents covering my main personal tasks:

orchestrator (Sonnet)
├── research (Haiku) — web search and reading
├── writer (Haiku) — drafts, summaries, formatting
├── file_manager (Haiku) — read/write files, organize
└── communicator (Haiku) — email and calendar

Background:
├── morning_digest (Gemini Flash Free) — 8am daily
└── heartbeat (Gemini Flash Free) — every 30 minutes

Monthly cost: ~$18-25 in API fees.

Architecture 2: Developer Workflow

Built for code-heavy work with deeper reasoning where it counts:

orchestrator (Sonnet)
├── planner (Sonnet) — architecture decisions, task breakdown
├── coder (Sonnet) — reads and writes code, runs tests
├── reviewer (Opus, on-demand) — only invoked for final PR review
├── debugger (Sonnet) — reads errors, searches docs, proposes fixes
└── documenter (Haiku) — writes docstrings, README sections, changelogs

The key design choice: Opus is invoked on-demand for PR reviews only — not as the default coder. This gives you the benefit of Opus’s deep reasoning at the point where it matters most (catching bugs before merge) without paying for it on routine tasks.

Monthly cost: ~$45-70 in API fees depending on PR volume.

Architecture 3: Content Pipeline

Built for high-volume content production:

orchestrator (Sonnet)
├── researcher (Haiku) — topic research, fact-checking
├── outliner (Haiku) — creates structured outlines from research
├── writer (Sonnet) — drafts sections from outlines (needs quality)
├── editor (Haiku) — grammar, clarity, consistency checks
└── formatter (Haiku) — markdown formatting, frontmatter, TOC generation

Background:
└── trend_monitor (Gemini Flash Free) — monitors RSS feeds, flags content opportunities

This pipeline runs in sequence for each content piece. Parallelism is used for the research phase when multiple angles are being investigated simultaneously.

Monthly cost: ~$30-50 for moderate volume (15-20 articles/month).


Monitoring Multiple Agents

With multiple agents running, you need more than just cost tracking — you need to know which agents are working correctly and which are producing poor outputs.

Per-Agent Logging

Enable structured logging per agent:

{
  "logging": {
    "level": "info",
    "per_agent": true,
    "log_path": "~/.openclaw/logs/agents/",
    "include": ["model", "input_tokens", "output_tokens", "duration_ms", "task_type"]
  }
}

Each agent gets its own log file: research.log, coder.log, etc. This makes it easy to track which agents are expensive and which are fast.
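Once you have per-agent logs with token counts, ranking agents by spend is a short script. The one-JSON-object-per-line format below is an assumption; adapt the field names to whatever your `include` list actually emits:

```python
import json

# Hypothetical structured log lines (one JSON object per agent call);
# the real OpenClaw log schema may differ.
log_lines = [
    '{"agent": "research", "input_tokens": 2500, "output_tokens": 900}',
    '{"agent": "coder", "input_tokens": 6000, "output_tokens": 2200}',
    '{"agent": "research", "input_tokens": 3100, "output_tokens": 1100}',
]

def tokens_by_agent(lines):
    """Sum input + output tokens per agent, most expensive first."""
    totals = {}
    for line in lines:
        rec = json.loads(line)
        used = rec["input_tokens"] + rec["output_tokens"]
        totals[rec["agent"]] = totals.get(rec["agent"], 0) + used
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(tokens_by_agent(log_lines))  # [('coder', 8200), ('research', 7600)]
```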

Quality Monitoring

A common trap with multi-agent setups: a sub-agent silently produces garbage output, the orchestrator assembles it into a response, and you only notice weeks later that the research agent has been returning hallucinated facts.

Add output validation to critical agents:

{
  "agents": {
    "research": {
      "output_validation": {
        "must_include": ["source_url"],
        "min_length": 100,
        "on_failure": "retry_once"
      }
    }
  }
}

The must_include check validates that the agent’s output contains a URL before it’s passed downstream. If it doesn’t, it retries once. If it fails again, the orchestrator gets an error signal and can either ask the user or skip the research step.
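The validate-retry-escalate loop is straightforward to reason about in code. A sketch of the logic, with a stub agent standing in for the real call:

```python
# Sketch of validation-and-retry: check the output, retry once on failure,
# then surface an error signal the orchestrator can act on.
def validate(output: str, must_include, min_length: int) -> bool:
    return (len(output) >= min_length
            and all(token in output for token in must_include))

def run_with_validation(call_agent, prompt, must_include, min_length=100):
    for attempt in range(2):  # initial call + one retry
        output = call_agent(prompt)
        if validate(output, must_include, min_length):
            return {"ok": True, "output": output}
    return {"ok": False, "error": "validation failed after retry"}

# Stub agent that always omits the required source_url field.
bad_agent = lambda prompt: "x" * 200
print(run_with_validation(bad_agent, "research X", ["source_url"]))
```

The important design point is the return shape: the orchestrator gets a structured `ok`/`error` signal rather than garbage output it would silently pass downstream.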

Budget Alerts Per Agent

Don’t just set a total monthly cap — set per-agent caps so you catch runaway agents before they drain your budget:

{
  "budget": {
    "monthly_total": 100,
    "per_agent_limits": {
      "orchestrator": 30,
      "research": 25,
      "coder": 30,
      "writer": 15
    },
    "alert_at": 0.8,
    "action_on_limit": "downgrade_model"
  }
}

If the research agent hits 80% of its $25 budget, OpenClaw sends you an alert. If it hits the cap, it downgrades to a cheaper model for the rest of the month instead of cutting off completely.
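The enforcement logic amounts to two thresholds per agent: alert at 80% of the cap, downgrade at 100%. A sketch of that policy (the fallback model choice and limits here are illustrative, not OpenClaw defaults):

```python
# Sketch of per-agent budget enforcement: alert at 80% of the cap,
# downgrade to a cheaper model at 100% instead of cutting off.
LIMITS = {"research": 25.0, "coder": 30.0}  # monthly USD caps per agent
CHEAP_FALLBACK = "anthropic/claude-haiku-4-5-20251001"

def pick_model(agent: str, spent: float, preferred: str) -> str:
    cap = LIMITS[agent]
    if spent >= cap:
        return CHEAP_FALLBACK  # hard cap: downgrade, don't cut off
    if spent >= 0.8 * cap:
        print(f"alert: {agent} at {spent / cap:.0%} of its budget")
    return preferred

print(pick_model("research", 21.0, "anthropic/claude-sonnet-4-6"))
print(pick_model("research", 26.0, "anthropic/claude-sonnet-4-6"))
```

Downgrading rather than blocking keeps background agents and pipelines running at reduced quality, which is usually the right failure mode for personal setups.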


Common Mistakes and How to Fix Them

Mistake 1: Too many agents too soon. Start with 2-3 agents. Every agent you add is a coordination surface that can fail. Add agents when you have clear evidence a specialist would perform better than the orchestrator handling a task directly.

Mistake 2: No session isolation. This single oversight can multiply your costs by 3-5x. Always set "session": "isolated" on sub-agents unless you have a specific reason to share context.

Mistake 3: Giving every agent every tool. A writer doesn’t need run_command. A researcher doesn’t need send_email. Narrow tool access prevents accidents and makes agent behavior more predictable.

Mistake 4: Over-engineering the orchestrator prompt. If your orchestrator system prompt is 2,000 words, it’s too complex. The orchestrator should route, not reason. Short, clear delegation rules work better than elaborate decision trees.

Mistake 5: Ignoring sub-agent failures. If a sub-agent returns an error or empty output, the orchestrator needs to know what to do. Define fallback_behavior for each agent:

{
  "agents": {
    "research": {
      "fallback_behavior": "return_empty_with_error_message",
      "max_retries": 1
    }
  }
}

FAQ

Q: How many agents can I run in parallel?
OpenClaw supports up to 10 concurrent sub-agents, but practical limits depend on your server memory and API rate limits. For most setups, 3-5 concurrent agents is the sweet spot. Beyond that, you’re more likely to hit rate limits than gain speed.

Q: Can sub-agents spawn their own sub-agents?
Yes, with max_depth set. The default is max_depth: 1 (no recursive delegation). Set it to 2 if you need agents-of-agents. Going deeper than 2 is rarely worth the complexity.

Q: What’s the difference between a background agent and a cron agent?
They’re the same thing — OpenClaw uses the terms interchangeably. Both run on a schedule, both should use isolated sessions. The cron key in your config is where you define them.

Q: Do sub-agents remember previous conversations?
With "session": "isolated", no. Each sub-agent call is stateless. If you need a sub-agent to reference prior outputs, pass the relevant context explicitly in the task prompt.

Q: How do I debug an agent that’s producing bad output?
Use openclaw agent test <agent_name> --verbose --prompt "...". The --verbose flag shows the full prompt (including system prompt and injected context) sent to the model, plus the raw response. This reveals whether the problem is in the system prompt, the injected context, or the model’s response.

Q: Can different agents use different API providers?
Yes. Mix Anthropic, OpenAI, and OpenRouter models freely:

{
  "agents": {
    "researcher": { "model": "openai/gpt-4o-mini" },
    "coder": { "model": "anthropic/claude-sonnet-4-6" },
    "writer": { "model": "openai/gpt-4o-mini" }
  }
}

Multi-Agent Setup Checklist

Before going live, run through this:

  • Each sub-agent has "session": "isolated"
  • Each agent’s tools list is scoped to what it actually needs
  • System prompts are under 500 words per agent
  • Parallel execution is enabled ("parallel_agents": true)
  • Background agents use free or cheap models
  • Per-agent budget caps are set
  • Output validation is configured for critical agents
  • Per-agent logging is enabled
  • You’ve tested each agent in isolation before combining
  • Fallback behavior is defined for every sub-agent

Start with 3 agents. Get them working well. Then add more.


Multi-agent setups are not more complex than a single agent — they’re just more explicit about what work goes where. That explicitness is the point. When you know exactly which agent does what, you can optimize, monitor, and fix each part independently.

The setups in this guide run reliably on xCloud’s managed OpenClaw hosting, which handles server uptime, SSL, and automatic backups. If you want the agents without the server ops, that’s the fastest way to start.


Looking to cut API costs for your multi-agent setup? Read OpenClaw Cost Optimization: How to Cut Your Monthly Bill by 90% — it covers model routing, context management, and caching strategies that apply directly to multi-agent configs.