Feature deep-dive

How On Device AI coordinates a team of specialized agents on your device

On Device AI adds two multi-agent modes: subagent delegation, where a main agent routes tasks to specialized experts and synthesizes their results, and Chat Flow, where multiple agents respond in parallel to the same input. Both modes run locally on your Apple device.

Why single agents hit a ceiling

The familiar pattern: you ask, the model answers. For a quick question, that works. But ask something like "what are the strongest arguments for and against entering the Japanese market?" and you're asking one model to do five different jobs at the same time.

It has to research market data, assess competitive dynamics, evaluate risk, identify opportunities, and organize all of it into a useful response—while managing a context window that fills up as the conversation gets longer. The deeper the question, the more likely something gets dropped or flattened.

Subagent delegation solves this by giving the main agent a roster of specialists it can call. The Researcher looks up facts with a web search tool. The Brainstormer generates angles the main agent might have missed. The main agent synthesizes what comes back and responds to the user. Each expert sees only what it needs to see. You get a better answer without managing any of the handoff yourself.

How delegation works in practice

When the main agent decides a specialist can help, it emits a structured delegation request that On Device AI intercepts at the tool-call layer. The request carries three fields: the target expert name, the task description, and an output_mode (currently json). The app parses this, runs the designated expert, and injects a canonicalized result back into the main agent's context.

The result injected back to the main agent carries the expert name, a status of completed, failed, or cancelled, a plain-text summary, and a structured data object with the expert's actual output—making it easy for the main agent to consume and reference specific fields in its synthesis.
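The round-trip described above can be sketched as two plain data shapes. This is an illustrative sketch in Python, not the app's actual schema—the field names mirror the article, but the exact structure is an assumption:

```python
# Hypothetical delegation request the main agent emits, intercepted
# at the tool-call layer. Field names follow the article's description.
delegation_request = {
    "expert": "Researcher",        # target expert name
    "task": "Find recent market-size estimates for Japan",
    "output_mode": "json",         # currently the only supported mode
}

# Canonicalized result injected back into the main agent's context.
delegation_result = {
    "expert": "Researcher",
    "status": "completed",         # completed | failed | cancelled
    "summary": "Found three recent market-size estimates.",
    "data": {"estimates": []},     # expert's structured output
}
```

The structured `data` object is what lets the main agent reference specific fields during synthesis instead of re-parsing free text.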

The main agent now has clean, structured output from a specialist without you touching a single prompt. You just asked a question.

Expert configuration and context strategies

A delegation flow has a main agent and an expert roster. Each expert is a configured AI participant with its own system prompt, an optional preferred model (local or cloud), an optional tool allowlist, and a context strategy.

Context strategy is the decision that matters most per expert—it controls what the expert sees when it runs:

| Strategy | What the expert sees |
| --- | --- |
| taskOnly | Only the delegated task. Isolated, no conversation history. |
| ownHistory | The delegated task plus this expert's own prior turns in the session. |
| fullConversation | The complete conversation context up to the delegation point. |

taskOnly is the default, and it's the right choice for most experts. Each delegation is fully isolated, with no risk of prior unrelated messages contaminating the expert's output. The ownHistory strategy is useful for experts that need continuity across multiple delegations in the same session—for example, a planning agent building an iterative analysis across several rounds.

Expert session memory under ownHistory is in-memory and session-scoped. Starting a new conversation clears it entirely.
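The three strategies reduce to a simple context-assembly decision. A minimal sketch, with hypothetical names rather than the app's actual internals:

```python
def build_expert_context(strategy, task, conversation, expert_history):
    """Assemble what an expert sees for one delegated task."""
    if strategy == "taskOnly":
        return [task]                       # isolated: task only (the default)
    if strategy == "ownHistory":
        return expert_history + [task]      # this expert's prior turns + task
    if strategy == "fullConversation":
        return conversation + [task]        # everything up to the delegation
    raise ValueError(f"unknown strategy: {strategy}")

# ownHistory is session-scoped and in-memory: starting a new
# conversation simply clears the store.
expert_history_store = {}                   # expert name -> prior turns

def start_new_conversation():
    expert_history_store.clear()            # nothing persists across sessions
```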

Depth, model switching, and loop prevention

Delegation depth defaults to 1, meaning experts answer the main agent but cannot themselves delegate further. This is intentional—one level of delegation covers the vast majority of real workflows without the complexity cost of nested agent trees. Pro users can increase delegation depth for flows that genuinely need multi-level routing.

The system prevents self-delegation, where an expert would call itself at the same depth. At maximum depth, delegation instructions are stripped from the expert's prompt automatically, so agents cannot attempt to go deeper even if their prompt would otherwise allow it.
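The two guards—depth limiting and self-delegation prevention—amount to a short check before any delegation request is honored. A sketch under assumed names (the real checks live at the tool-call layer, and prompt stripping at max depth makes this a backstop rather than the only defense):

```python
MAX_DEPTH = 1  # default; Pro users can raise this for multi-level routing

def may_delegate(caller_name, target_name, current_depth):
    """Return True if a delegation from caller to target is allowed."""
    if current_depth >= MAX_DEPTH:
        # At max depth, delegation instructions are also stripped from
        # the expert's prompt, so it should never even attempt this.
        return False
    if caller_name == target_name:
        return False    # no self-delegation at the same depth
    return True
```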

Each expert can have a preferred model—local GGUF, MLX, or any configured cloud provider. If the expert's preferred model differs from the current one, On Device AI switches models automatically before running the expert turn. You'll see the model name in a loading indicator. If a model fails to load, the delegation fails cleanly and the main agent gets a failure result rather than silently continuing without it.
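The switch-or-fail-cleanly behavior can be summarized in a few lines. A hypothetical sketch—`load_model` and the result shape are assumptions, not the app's API:

```python
def run_expert_turn(expert, active_model, load_model, run):
    """Switch to the expert's preferred model if needed; fail cleanly if not."""
    preferred = expert.get("preferred_model")
    if preferred and preferred != active_model:
        try:
            active_model = load_model(preferred)   # loading indicator shows here
        except RuntimeError:
            # The main agent receives an explicit failure result instead of
            # the run silently continuing on the wrong model.
            return ({"expert": expert["name"], "status": "failed",
                     "summary": f"could not load {preferred}"}, active_model)
    return run(expert, active_model), active_model
```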

Chat Flow mode: when you want agents running in parallel

Subagent delegation is sequential—one expert runs, returns, and the main agent continues. Chat Flow mode is a different pattern. Multiple AI participants respond to the conversation in sequence, but each one's input can be configured independently, which enables fan-out and fan-in workflows.

A typical configuration might route the user's input to a Researcher and a Creative agent simultaneously (fan-out), then pass their outputs to a Synthesizer that sees everything (fan-in). The Researcher and Creative agents each see the user's original message but not each other's work. The Synthesizer sees the full thread.

Each Chat Flow participant has a context base selector—the rule that determines what message their response is based on:

| Context base | What the participant responds to |
| --- | --- |
| previousMessage (default) | The most recent message in the flow chain. |
| latestUserMessage | The triggering user input directly; enables the fan-out pattern. |
| latestMessageFromRole | A specific prior role's output; useful for targeted synthesis. |

The conversation view labels each role message with its base message so you can trace exactly what each agent was responding to. Invalid context base references—for example, a role that was deleted—automatically fall back to latestUserMessage.
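Resolution of the context base selector, including the fallback for deleted roles, can be sketched as follows. Names and message shapes are hypothetical, chosen only to illustrate the three rules:

```python
def resolve_base_message(base, flow_chain, user_message):
    """Pick the message a Chat Flow participant responds to."""
    if base == "previousMessage":
        return flow_chain[-1] if flow_chain else user_message
    if base == "latestUserMessage":
        return user_message                 # fan-out: everyone sees the input
    if isinstance(base, dict) and "latestMessageFromRole" in base:
        role = base["latestMessageFromRole"]
        for msg in reversed(flow_chain):
            if msg["role"] == role:
                return msg                  # fan-in: targeted synthesis
        return user_message                 # role deleted/missing: fall back
    return user_message                     # invalid selector: fall back
```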

What you see while delegation runs

On Device AI doesn't hide delegation progress behind a spinner. When an expert is running, the interface shows the active stage (starting, loading the model, streaming output, or completed), the expert's name, the depth level, and live streaming text as the expert generates its response. If the model supports chain-of-thought reasoning, both the reasoning stream and the answer stream are visible in real time.

You can expand any delegation run to see the full task sent to the expert, the streaming output as it's generated, and a terminal summary once the run completes. Or keep it collapsed for a clean conversation view. Failed delegations surface the failure back to the main agent so it can acknowledge the issue in its final response rather than silently producing an incomplete answer.

Default expert templates

Every new delegation flow starts with two pre-configured experts you can use immediately or customize:

| Expert | Default role | Tools |
| --- | --- | --- |
| Researcher | Accurate, structured information gathering using web search | Web search |
| Brainstormer | Creative lateral thinking: generates 5–10 diverse angles or ideas | None |

Both experts use taskOnly context strategy by default, keeping each delegation isolated. You can edit, delete, or add new experts. Common additions include an Analyst for structured data evaluation, a Writer for drafting and editing, or a Strategist that reads full conversation context to synthesize recommendations from prior expert outputs.

Use cases worth building

Business analysis. Configure a Strategic Coordinator as the main agent with a Researcher and Risk Assessor in the expert roster. Ask "Should we expand into this market?" The Researcher pulls relevant data using web search. The Risk Assessor evaluates the inputs. The main agent synthesizes a recommendation. The entire analysis happens on your device, not on someone else's server.

Research synthesis. Enable the Researcher expert with web search. The main agent delegates literature lookups and returns a clean synthesis. Each search delegation is isolated via taskOnly, which prevents prior search sessions from bleeding into new ones. The main agent holds the thread and builds on each result.

Content development. Use Chat Flow mode with a Researcher, a Writer, and a Fact-Checker. The Researcher responds to the user's input directly, the Writer receives the Researcher's output, and the Fact-Checker verifies the draft. Each agent does one job. The flow handles the handoffs.

Parallel code review. Fan out to an Architecture Reviewer, Security Auditor, and Performance Analyst—all reading the same code input simultaneously—then route their outputs to a Synthesizer that delivers consolidated feedback. Three specialist perspectives, one coherent response.

Task Planner tool

For workflows that require step-by-step execution, On Device AI includes a Task Planner tool for Pro subscribers. The planner creates a structured, in-memory task plan from your request, tracks progress across tool calls, and injects a compact plan summary into the system prompt so the model always knows where it is in the sequence.

Available operations include creating and clearing a plan, getting the current plan state, setting the current task, updating task status, and appending, editing, or deleting individual tasks. The planner runs within the bounded tool loop—no infinite execution, and no step-by-step confirmation required from you.
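A minimal in-memory planner mirroring those operations might look like this. The class and field names are assumptions for illustration, not the tool's actual schema:

```python
class TaskPlanner:
    """Session-scoped plan: created from a request, tracked across tool calls."""

    def __init__(self):
        self.tasks, self.current = [], None

    def create_plan(self, titles):
        self.tasks = [{"title": t, "status": "pending"} for t in titles]
        self.current = 0 if self.tasks else None

    def set_status(self, index, status):
        self.tasks[index]["status"] = status

    def summary(self):
        # Compact plan summary injected into the system prompt so the
        # model always knows where it is in the sequence.
        lines = []
        for i, t in enumerate(self.tasks):
            marker = "->" if i == self.current else "  "
            lines.append(f"{marker} [{t['status']}] {t['title']}")
        return "\n".join(lines)
```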

Privacy: why running on-device changes the calculation

Most multi-agent platforms send data to cloud infrastructure. Every delegation, every expert invocation—each one is a round-trip to a remote server. That means whatever your agents are working on (strategy documents, customer data, source code, private messages) leaves your device on every single step.

On Device AI runs everything locally on your Apple hardware. No round-trips for AI inference unless you explicitly add a cloud provider using your own API keys. Apple Silicon's unified memory architecture means the full context is available without the latency of a remote call, and nothing is transmitted during the delegation process.

Expert session memory under ownHistory is runtime-only and cleared when you start a new conversation. No expert state persists between sessions.

The single-active-model constraint

On-device hardware has limits that cloud platforms don't. On Device AI enforces a hard constraint: only one LLM instance is active at any time. In both subagent delegation and Chat Flow modes, agents run sequentially, not simultaneously—which avoids memory contention on consumer Apple hardware.

For Chat Flow, this means three parallel agents actually run one at a time, in order. The fan-out is logical (each agent sees the same input independently) but the execution is sequential. The result looks the same to you—three independent outputs. You just can't run them at the exact same millisecond.
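The distinction between logical and physical parallelism reduces to a loop: each fan-out participant gets the same input, but runs only after the previous one finishes. A sketch with a hypothetical `run_agent` callable:

```python
def run_fan_out(agents, user_message, run_agent):
    """Logical fan-out under the single-active-model constraint."""
    outputs = []
    for agent in agents:                    # sequential: one LLM active at a time
        outputs.append(run_agent(agent, user_message))
    return outputs                          # independent outputs, same input
```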

If experts in your delegation flow use different models, On Device AI handles the model switching between expert turns automatically. You'll see a loading indicator with the model name as it transitions.

Subscription tiers

| Feature | Free | Pro |
| --- | --- | --- |
| Delegation flows | 1 flow | Unlimited |
| Experts per delegation flow | Up to 2 | More experts |
| Delegation depth | Fixed at 1 | Configurable |
| Chat Flow (multi-participant) | 1 flow, up to 4 participants | Unlimited flows |
| Task Planner tool | Not included | Included |

Pro-only flows are preserved if you downgrade—no data is lost. They become read-only until Pro access is restored.

Getting started

To try subagent delegation in On Device AI, open the app on iPhone, iPad, or Mac and start a new conversation. Tap the role or agent button at the top of the conversation view and switch to Subagents mode. The default flow includes a Researcher with web search and a Brainstormer. You can start chatting immediately, or tap to edit either expert's system prompt, preferred model, and tool allowlist before you begin.

For Chat Flow mode (parallel agents), use the same mode selector and switch to Flow Agent mode. Configure participants and set each one's context base selector to control the fan-out and fan-in pattern. Both modes are available on all supported Apple devices running On Device AI.

The delegation flow editor and Chat Flow editor are the same interface. Add as many experts or participants as your plan allows, reorder them by dragging, and set per-participant models and tools independently. The app handles everything from model loading to result injection once you send your first message.
