Pillar: multi-cli-coordination | Date: April 2026
Scope: Worktree-per-CLI vs branch-per-issue vs shared-main + locks: what real teams have settled on with evidence. Concurrent in-flight edit detection across agents beyond git merge conflicts. Issue tracker patterns: SQLite local vs GitHub Issues vs hybrid, with production evidence. Lock-free coordination patterns from real multi-agent systems. Pre-push hooks and merge-gate patterns that have shipped successfully. Evidence from teams running 4+ parallel CLI agents simultaneously.
Sources: 15 gathered, consolidated, synthesized.
Critical finding: In any shared-main scenario without filesystem isolation, the second write wins — both agents read identical file state, make independent decisions, and write back to the same paths, producing undetected data loss with no merge conflict raised. Zero surveyed practitioners advocate shared-main-plus-locks; every production deployment uses one worktree per agent.[2]
The evidence across practitioner deployments, academic datasets, and open-source tooling has converged on a single isolation primitive: one task = one branch = one worktree = one agent. Branch-per-issue without filesystem isolation fails for the same reason as shared-main — all agents share one working directory, so a checkout by any agent changes what all others are reading and writing, silently corrupting their context regardless of branch-level isolation.[2][14] Git worktrees solve this with separate checked-out copies sharing a single .git object store: each agent gets an independent working directory, index, and branch reference, with disk overhead of approximately 5 GB per worktree on a 2 GB codebase.[2] Claude Code added native worktree support in v2.1.50 via the --worktree flag, which auto-creates named worktrees at .claude/worktrees/<name>/.[1][15]
The AgenticFlict dataset (107,026 simulated AI-generated pull requests across 59,000+ repositories) provides the most rigorous measurement of conflict rates at scale: a 27.67% overall conflict rate, with per-agent rates ranging from 15.24% (GitHub Copilot) to 31.85% (OpenAI Codex).[4] Each conflicting PR averaged 4.36 affected files and 11+ conflict regions. The sharpest result is the near-linear relationship between PR size and conflict rate: small PRs conflict at ~9.9%, while large PRs conflict at ~32–33%.[4] The practical corollary is that task scoping is a coordination mechanism, not just a code quality practice: keeping each agent's changeset small and file-exclusive is the single highest-leverage intervention available before any coordination infrastructure is deployed.
Textual merge conflicts, however, are only the detectable subset of coordination failures. The Semantic Consensus Framework research identifies Semantic Intent Divergence (SID): agents developing conflicting interpretations of shared objectives even when their file edits do not textually conflict.[5] Across 600 controlled experimental runs on AutoGen, CrewAI, and LangGraph, baseline workflow failure rates ranged from 41% to 86.7% depending on the framework. Ungoverned execution achieved just 0.2% completion; the Semantic Consensus Framework reached 100% by capturing agent intentions as a directed graph before execution.[5] The root cause attribution is the most operationally significant finding in the corpus: 79% of multi-agent failures stem from specification and coordination gaps, not model capability limitations.[5] Better models alone do not solve the coordination problem.
Worktrees isolate the filesystem only. Teams running 4+ parallel agents on real infrastructure collide on five additional resource categories: ports, SQLite database files, PostgreSQL schemas, Docker daemons, and Redis key namespaces.[12] Each must be explicitly partitioned per agent. Failures on shared ports or databases produce symptoms indistinguishable from code bugs, so teams burn debugging cycles that would end immediately once the failure is traced to the coordination layer. The incident.io case study documents one successful configuration: 4–5 simultaneous agents, each assigned a domain with zero file overlap (UI, build tools, tests, backend).[13] Three independent sources converge on 5–8 concurrent agents as the practical ceiling, limited not by compute, disk, or coordination overhead, but by human review capacity.[2][3][12]
For issue tracking and work assignment, multiple independent implementations have settled on local SQLite over GitHub Issues. The structural argument is latency and rate limits: GitHub's API returns results in 100–500 ms with a ceiling of 5,000 authenticated requests per hour, while SQLite queries resolve in microseconds with no rate limit.[6] The behavioral argument is atomicity: GitHub Issues has no native atomic claim operation, exposing work-assignment to TOCTOU races when multiple agents query the same backlog simultaneously. SQLite's UNIQUE constraint solves this with a single INSERT — if two agents attempt to claim the same issue, exactly one succeeds and the other receives a CONFLICT with no polling required.[7][11] Both Beads and parallel-cc independently converged on this two-layer pattern: SQL enforces atomicity, while lane JSON files on the filesystem provide fast local reads for agents checking their own state.
Dead-agent recovery is a first-class concern in any multi-agent system. Pessimistic locking fails categorically: a dead agent holding a lock blocks all work indefinitely. Heartbeat-based reclamation handles this without human intervention — each active agent updates a timestamp; a configurable stale timeout releases uncompleted work back to the queue when the heartbeat stops.[9][11] The Kleisli.IO analysis reframes the entire coordination problem: "Multi-session AI agent workflows are concurrent processes with private state, independent failure modes, and shared mutable resources" — treating them as distributed systems problems motivates lock-free primitives (atomic SQL claims, event sourcing, CRDTs, stigmergic traces) over approaches borrowed from single-process concurrency.[9]
The recce.hq 4-gate architecture documents the most concretely measured quality gate configuration in the corpus: implementing four sequential gates increased weekly commits from 20–50 to 100–200+ while maintaining code quality standards.[8] The architecture distinguishes hard (binary) gates from soft (LLM-readable) gates. Instruction files — AGENTS.md and CLAUDE.md — are Gate 1 and soft: they define file ownership, prohibited zones, and build commands, but agents can rationalize exceptions to text rules. Pre-commit hooks (Biome linting) are Gate 2 and hard: binary pass/fail, no exceptions rationalized. Pre-push hooks (typecheck, full test suite, security scans) are Gate 3 and hard. Human review is Gate 4 and soft, reserved for logical and business logic concerns that binary checks cannot evaluate.[8] The architectural principle is direct: "Hooks are binary — agents cannot rationalize exceptions." An agent can decide to ignore a CLAUDE.md rule; it cannot decide to make a failing test suite pass. Instruction files are necessary documentation but require binary enforcement to function as real coordination gates.
A coordination failure mode absent from the merge-conflict literature deserves explicit attention: agent "dementia," documented by Beads, describes agents whose context window exhausts within approximately 10 minutes, causing them to forget their current scope and re-explore the codebase — overlapping with other agents' claimed work without any file-level collision detectable by git.[6] Fine-grained task scoping addresses this directly: keeping agents early in their context windows, where decision quality and scope adherence are highest, is a coordination benefit independent of any quality benefit.
Practitioners building multi-CLI systems should sequence their investments in this order. First, establish filesystem isolation via worktrees and explicit resource partitioning (ports, databases, namespaces) per agent; without this, no higher-level coordination is reliable. Second, implement atomic SQLite-based work assignment with heartbeat reclamation to handle the full crash-and-recovery lifecycle without human intervention. Third, add binary pre-push gates (typecheck, tests) as the merge quality layer; instruction files alone are insufficient. Fourth, enforce small-task scoping as a structural rule, not a guideline: in the AgenticFlict data, small PRs conflict at roughly one-third the rate of large PRs (~9.9% vs. ~32–33%), and small tasks also protect agent context coherence. The ceiling of 5–8 concurrent agents is not a technical limit but a human review limit. Teams that automate their review tier (Generator-Verifier pattern, automated PR checks) can extend it further, but teams that skip the infrastructure to reach that ceiling faster will spend the savings on debugging invisible coordination failures.
The dominant pattern across practitioner sources and production deployments is unambiguous: one task = one branch = one worktree = one agent. Running multiple Claude Code sessions against the same working directory without filesystem isolation causes agents to overwrite each other's edits and corrupt each other's context — described by one practitioner as "all hell breaks loose."[14][11] No surveyed source advocates shared-main with file locks as a preferred approach.
Key finding: The second write wins in any shared-main scenario — both agents read identical file state, make independent decisions, and write back to the same paths, producing undetected data loss with no merge conflict raised.[2]
| Strategy | Filesystem Isolation | Branch Isolation | Disk Overhead | Setup Complexity | Recommended By |
|---|---|---|---|---|---|
| Worktree-per-CLI[1] | Yes (separate checkout) | Yes (independent branches) | ~5 GB per worktree on 2 GB codebase[2] | Low (1 command) | appxlab.io[2], shareuhack.com[3], mindstudio.ai[12], claudefa.st[1], Dev.to[14], recce.hq[8] |
| Branch-per-issue, shared checkout | No | Yes | None | None | Not recommended — shared working directory causes context corruption[2][14] |
| Shared-main + locks | No | No | None | High (lock management) | 0 sources — explicit anti-pattern[2] |
| Full clone per agent | Yes | Yes | Doubles codebase per agent | Low | (standard git alternative; not discussed by surveyed practitioners — see git-worktree docs) |
The branch-per-issue, shared checkout approach fails because all agents share the same working directory: a checkout by one agent changes what all other agents are reading and writing, causing silent context corruption regardless of branch-level isolation.[2][14]
See also: Autonomous Build Loop (single-agent worktree patterns)
Git worktrees create separate checked-out copies that share a single .git object store. Each agent has an independent working directory, index, and branch reference, while disk usage remains lower than full clones because objects are shared.[1][2]
Claude Code added native worktree support in v2.1.50 via the --worktree flag. The flag auto-creates named worktrees at .claude/worktrees/<name>/ on a branch named worktree-<name>, created from the default remote branch.[1][15][3]
| Operation | Native Claude Code (v2.1.50+) | Manual (pre-v2.1.50 / non-CC) | Source |
|---|---|---|---|
| Create named worktree | `claude --worktree feature-auth` | `git worktree add ../project-feature-a -b feature-a` | [1][14] |
| Create auto-named worktree | `claude --worktree` | — | [3] |
| Launch agent in worktree | Built-in | `cd ../project-feature-a && claude` | [14] |
| Cleanup | Auto (no changes) / persists (modified) | `git worktree remove ../project-feature-a` | [1][15] |
| Batch parallel launch | `/batch` command | — | [15] |
| Subagent isolation | `isolation: "worktree"` frontmatter | — | [15] |
Worktrees with no changes are automatically deleted upon session completion. Modified worktrees persist for human review. Teams add .claude/worktrees/ to .gitignore to prevent version control pollution.[1][15] See Section 8 for heartbeat-based stale detection and automatic work reclamation patterns.
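For the manual path in the table above (pre-v2.1.50 or non-Claude Code agents), creating one worktree per task is mechanical enough to script. A minimal sketch, assuming a Python launcher, sibling directories named `<repo>-<task>`, and one new branch per task named after the task; these conventions and the helper names are hypothetical. It only builds the git argument lists, so the plan can be inspected before anything runs.

```python
from pathlib import Path


def worktree_plan(repo: Path, tasks: list, base: str = "main") -> list:
    """One task = one branch = one worktree: build a `git worktree add` command per task."""
    commands = []
    for task in tasks:
        # Sibling directory next to the main checkout, e.g. ../project-feature-a
        path = repo.parent / f"{repo.name}-{task}"
        # -b creates a new branch named after the task, starting from `base`
        commands.append(["git", "-C", str(repo), "worktree", "add", "-b", task, str(path), base])
    return commands


def cleanup_plan(repo: Path, task: str) -> list:
    """Remove a finished task's worktree (git refuses if it has modified or untracked files)."""
    return ["git", "-C", str(repo), "worktree", "remove", str(repo.parent / f"{repo.name}-{task}")]


if __name__ == "__main__":
    for cmd in worktree_plan(Path("/work/project"), ["feature-a", "feature-b"]):
        print(" ".join(cmd))
```

Pass each list to `subprocess.run(cmd, check=True)` to execute it. Because `git worktree remove` refuses a dirty worktree unless `--force` is given, the cleanup step naturally leaves modified worktrees in place for human review.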
Worktrees isolate the filesystem only. Teams running 4+ parallel agents frequently collide on other shared resources. mindstudio.ai documents five resource categories requiring explicit per-agent isolation beyond worktrees.[12]
Key finding: Filesystem isolation via worktrees is necessary but not sufficient — agents colliding on ports, databases, or Redis namespaces produce failures indistinguishable from code bugs.[12]
| Resource | Isolation Mechanism | Example |
|---|---|---|
| Ports[12] | Per-agent port assignment | Agent 1: 8000, Agent 2: 8001, Agent 3: 8002 |
| SQLite databases[12] | Per-agent DB file | test_agent1.db, test_agent2.db |
| PostgreSQL[12] | Per-agent schema or database branch | Neon database branching |
| Docker daemons[12] | Per-agent daemon or container namespace | — |
| Redis[12] | Per-agent key prefix | agent1:, agent2: |
| API keys / secrets[12] | Per-agent credentials or sandboxed accounts | E2B sandbox integration[11] |
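One way to make the partitioning in this table impossible to get wrong is to derive every per-agent slice from a single agent index. A minimal sketch: the port scheme mirrors the table (8000, 8001, ...), while the file, schema, and key-prefix naming and the use of a Docker Compose project name as the container namespace are hypothetical choices, not mindstudio.ai's prescription.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResources:
    """Non-overlapping slices of shared resources for one agent (hypothetical naming scheme)."""
    port: int
    sqlite_path: str
    pg_schema: str
    redis_prefix: str
    compose_project: str

    def as_env(self) -> dict:
        # COMPOSE_PROJECT_NAME is read by Docker Compose; the other variable names are hypothetical
        return {
            "PORT": str(self.port),
            "SQLITE_PATH": self.sqlite_path,
            "PG_SCHEMA": self.pg_schema,
            "REDIS_PREFIX": self.redis_prefix,
            "COMPOSE_PROJECT_NAME": self.compose_project,
        }


def resources_for(agent_index: int, base_port: int = 8000) -> AgentResources:
    """Derive every resource from a 1-based agent index so two agents can never share one."""
    if agent_index < 1:
        raise ValueError("agent_index is 1-based")
    name = f"agent{agent_index}"
    return AgentResources(
        port=base_port + agent_index - 1,  # agent1 -> 8000, agent2 -> 8001, ...
        sqlite_path=f"test_{name}.db",
        pg_schema=name,
        redis_prefix=f"{name}:",
        compose_project=name,
    )
```

A launcher can export `resources_for(i).as_env()` into the environment before starting agent `i` in its worktree, so application code never hard-codes a shared port or database.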
The AgenticFlict dataset provides the most rigorous large-scale measurement of AI-generated merge conflicts to date: 107,026 simulated AI-generated pull requests across 59,000+ repositories, revealing a 27.67% overall conflict rate.[4]
| Metric | Value | Source |
|---|---|---|
| Total simulated PRs analyzed | 107,026 | [4] |
| Repositories sampled | 59,000+ | [4] |
| Overall conflict rate | 27.67% | [4] |
| Avg. affected files per conflicting PR | 4.36 | [4] |
| Avg. conflict regions per conflicting PR | 11+ | [4] |
| GitHub Copilot conflict rate | 15.24% | [4] |
| OpenAI Codex conflict rate | 31.85% | [4] |
| Small PR conflict rate | ~9.9% | [4] |
| Large PR conflict rate | ~32–33% | [4] |
The roughly 2× difference in conflict rates between AI systems (Copilot 15.24% vs. Codex 31.85%) suggests that the choice of coding agent substantially affects coordination overhead, not just code quality.[4] The near-linear relationship between PR size and conflict rate provides a strong empirical argument for small, scoped tasks.
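The ratios quoted in this section can be checked directly from the table values. This is plain arithmetic on the reported figures, treating "~32–33%" as a range.

```python
# Conflict rates from the AgenticFlict table, in percent of PRs
copilot, codex = 15.24, 31.85
small_pr = 9.9
large_pr_range = (32.0, 33.0)  # "~32-33%" treated as a range
overall = 27.67

agent_ratio = codex / copilot                                      # Codex vs. Copilot
size_ratio = tuple(small_pr / large for large in large_pr_range)   # small vs. large PRs
one_in_n = 100 / overall                                           # "1 conflicting PR in every n"

print(f"Codex/Copilot: {agent_ratio:.2f}x")
print(f"small/large PR: {size_ratio[1]:.2f} to {size_ratio[0]:.2f}")
print(f"overall: 1 conflicting PR in every {one_in_n:.1f}")
```

The results (about 2.09×, 0.30 to 0.31, and 1 in 3.6) are consistent with the "roughly 2×" and "roughly one-third" readings, and show that the overall rate sits somewhat above 1 in 4.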
Key finding: Standard git merge conflict detection is necessary but not sufficient: more than 1 in 4 simulated AI-generated PRs (27.67%) produced a textual conflict, and textual conflicts represent only the detectable subset of coordination failures.[4]
The parallel-cc project implements file claim coordination as a pre-write registry — agents declare file ownership before writing, preventing concurrent writes from being attempted at all.[11] This operates upstream of git's post-write conflict detection. Additionally, parallel-cc uses AST analysis — not just line-level diffs — for merge conflict detection, understanding code structure rather than text changes.[11]
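The pre-write registry can be sketched with the same SQLite primitive used for issue claims: a primary key on the file path. A minimal sketch, assuming cooperating agents that call `claim_files` before editing anything; the table, function names, and all-or-nothing policy are hypothetical, not parallel-cc's implementation (which additionally uses AST-level merge analysis, not shown here).

```python
import sqlite3

SCHEMA = """CREATE TABLE IF NOT EXISTS file_claims (
    path  TEXT PRIMARY KEY,  -- at most one owning agent per file path
    agent TEXT NOT NULL
)"""


def open_registry(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit; transactions are explicit
    conn.execute(SCHEMA)
    return conn


def claim_files(conn: sqlite3.Connection, agent: str, paths: list) -> bool:
    """Declare ownership of every path before writing any of them: all or nothing."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        for path in paths:
            conn.execute("INSERT INTO file_claims (path, agent) VALUES (?, ?)", (path, agent))
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # some path is already owned: undo this agent's partial claims
        return False
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return True


def release_files(conn: sqlite3.Connection, agent: str) -> None:
    """Drop all of an agent's claims once its work is merged or abandoned."""
    conn.execute("DELETE FROM file_claims WHERE agent = ?", (agent,))
```

The rollback matters: without it, an agent that loses on its second file would still hold its first one, which is exactly the partial-ownership state that leads to overlapping edits. In this sketch, re-claiming a path the agent already owns also returns `False`.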
The Semantic Consensus paper identifies a failure mode orthogonal to git merge conflicts: Semantic Intent Divergence (SID), where agents develop conflicting interpretations of shared objectives even when their file edits do not textually conflict.[5] Across 600 experimental runs (controlled simulations on AutoGen, CrewAI, and LangGraph — academic research, not live production deployments), multi-agent workflows failed at rates ranging from 41% (best-performing framework) to 86.7% (worst-performing) across the three frameworks tested.[5] No evidence of the Semantic Consensus Framework being deployed in production CLI-scale workflows appears in the research corpus.
| Approach | Workflow Completion Rate | Notes |
|---|---|---|
| Ungoverned execution | 0.2% | Baseline — no coordination |
| Next-best baseline | 25.1% | — |
| Semantic Consensus Framework (SCF) | 100% | 4,200-line Python middleware |
Source: Semantic Consensus paper, 600 controlled experimental runs. Per-framework baseline failure rates: AutoGen, CrewAI, and LangGraph showed 41–86.7% workflow failure rates (13–59% completion) without coordination, depending on framework — context that makes the SCF’s 100% completion result meaningful. (Academic research context; no production CLI-scale deployments confirmed.)[5]
Of all multi-agent failures analyzed, 79% stem from specification and coordination issues, not model capability limitations.[5] The SCF conflict detection achieves 65.2% recall with 27.9% precision, deliberately biased toward conservative blocking to prevent downstream cascade failures.[5]
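To make the conservative-blocking trade-off concrete, the two reported detection figures can be turned into derived quantities. This is arithmetic on the numbers above only; the F1 score and ratios are not figures reported by the paper.

```python
recall, precision = 0.652, 0.279

# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
# For every correctly blocked conflict, how many blocks were false alarms
false_blocks_per_true_block = (1 - precision) / precision
# Share of real conflicts that slip through undetected
missed_share = 1 - recall

print(f"F1 = {f1:.3f}")
print(f"false blocks per correct block = {false_blocks_per_true_block:.2f}")
print(f"real conflicts missed = {missed_share:.1%}")
```

Roughly 2.6 false blocks accompany each correct one, while about a third of real conflicts are still missed: the detector trades reviewer interruptions for fewer cascading failures.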
| Component | Function |
|---|---|
| Process Context Layer[5] | Ingests BPMN workflow definitions to establish process boundaries |
| Semantic Intent Graph[5] | Captures agent intentions as a directed graph before execution |
| Conflict Detection Engine[5] | Identifies contradictory intents, resource contention, causal violations |
| Consensus Resolution Protocol[5] | Priority order: policy authority → capability authority → temporal priority |
| Drift Monitor[5] | Detects gradual semantic divergence in long-running workflows |
| Process-Aware Governance[5] | Framework-agnostic via adapter interfaces |
Key finding: 79% of multi-agent failures originate from specification and coordination gaps, not model capability — meaning coordination infrastructure, not better models, is the primary lever for improving parallel agent reliability.[5]
Multiple independent implementations have converged on the same conclusion: agent-native issue tracking belongs in local SQLite, not GitHub Issues. The argument is both structural (network latency, API rate limits) and behavioral (agents query work at high frequency; GitHub Issues is not optimized for programmatic access patterns).[6][11]
| Attribute | SQLite Local (Beads) | GitHub Issues | Hybrid (SQLite + JSONL) |
|---|---|---|---|
| Query latency | Microseconds | 100–500 ms (network) | Microseconds (local path) |
| Rate limits | None | 5,000 req/hr authenticated; 60 req/hr unauthenticated (GitHub REST API published limits) | None (local operations) |
| Offline operation | Yes | No | Yes |
| Atomic claim support | Yes (UNIQUE constraint) | No native atomic operations | Yes (SQL layer) |
| Multi-agent concurrent writes | WAL mode + UNIQUE constraints | Race conditions on concurrent updates | WAL mode; Dolt server for high concurrency |
| Distributed collaboration | No native sync | Yes (central server) | Yes (git-tracked JSONL)[6] |
| Production evidence | Beads[6], parallel-cc[11] | Not recommended for agents[6] | Beads hybrid model[6] |
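The rate-limit row becomes concrete with a budget calculation. A minimal sketch, assuming every agent polls GitHub on a fixed interval and all agents authenticate as the same user, so they share one 5,000-request hourly budget; the agent count, polling intervals, and calls per poll are hypothetical inputs, not measurements.

```python
GITHUB_AUTH_LIMIT_PER_HOUR = 5_000  # authenticated REST limit from the table above


def hourly_requests(agents: int, poll_interval_s: float, calls_per_poll: int = 1) -> float:
    """Requests per hour when every agent polls the backlog at a fixed interval."""
    polls_per_agent = 3600 / poll_interval_s
    return agents * calls_per_poll * polls_per_agent


def fits_budget(agents: int, poll_interval_s: float, calls_per_poll: int = 1) -> bool:
    return hourly_requests(agents, poll_interval_s, calls_per_poll) <= GITHUB_AUTH_LIMIT_PER_HOUR


if __name__ == "__main__":
    # Hypothetical fleet: 6 agents, each poll = list open issues + read one issue (2 calls)
    for interval in (30, 10, 5):
        used = hourly_requests(agents=6, poll_interval_s=interval, calls_per_poll=2)
        print(f"poll every {interval:>2}s: {used:,.0f} req/hr, within limit: {used <= GITHUB_AUTH_LIMIT_PER_HOUR}")
```

At 6 agents polling every 5 seconds with two calls per poll, the fleet needs 8,640 requests per hour before making a single claim or comment write; the same polling against a local SQLite file has no budget to exhaust.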
Beads implements a two-store architecture that separates performance from collaboration:[6]
- SQLite (`.beads/beads.db`): Primary local database for fast agent queries, atomic claims, state transitions
- JSONL (`.beads/issues.jsonl`): Git-tracked for team collaboration and distributed consistency; enables offline-first workflow with periodic sync
When multiple agents write concurrently (beyond standard WAL mode), Beads recommends Dolt server mode:[7]
- `bd dolt set mode server` enables concurrent writes from multiple processes
- Per-agent queue filtering (`bd ready --assignee agent-name`) reduces coordination overhead by letting agents query only their own work[7]
Key finding: Agent-native issue tracking should be local-first SQLite, not GitHub Issues; the rate limit and network latency constraints of GitHub's API make it unsuitable as the primary coordination store for high-frequency multi-agent workflows.[6]
Two independent implementations — Beads and parallel-cc — converged on the same minimal coordination primitive: the atomic claim. The pattern uses SQLite's UNIQUE constraint as the synchronization mechanism, providing TOCTOU-safe work assignment without polling or locking.[7][11]
| Step | Operation | Mechanism |
|---|---|---|
| 1 | Agent queries available work | bd ready --json or equivalent |
| 2 | Agent attempts atomic claim | INSERT INTO claims (issue_id, agent_id) VALUES (?, ?) |
| 3a (success) | SQLite UNIQUE constraint passes | Agent owns the issue; proceeds with work |
| 3b (conflict) | SQLite UNIQUE constraint rejects duplicate | Agent receives CONFLICT; picks a different issue |
| 4 | Agent queries who holds conflicted issue | SELECT on claims table |
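Steps 2 through 4 can be sketched with Python's built-in `sqlite3` module. The `claims` schema mirrors the INSERT shown in the table; the function names are hypothetical and this is not Beads' or parallel-cc's code.

```python
import sqlite3


def open_tracker(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path, timeout=5.0)  # wait up to 5 s for a competing writer
    conn.execute("PRAGMA journal_mode=WAL")       # readers do not block the writer (file DBs only)
    conn.execute("CREATE TABLE IF NOT EXISTS claims ("
                 "issue_id TEXT NOT NULL UNIQUE, agent_id TEXT NOT NULL)")
    conn.commit()
    return conn


def try_claim(conn: sqlite3.Connection, issue_id: str, agent_id: str) -> bool:
    """Steps 2-3: the INSERT is the whole claim; UNIQUE arbitrates simultaneous attempts."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO claims (issue_id, agent_id) VALUES (?, ?)",
                         (issue_id, agent_id))
        return True   # 3a: this agent owns the issue
    except sqlite3.IntegrityError:
        return False  # 3b: another agent already claimed it; pick a different issue


def holder(conn: sqlite3.Connection, issue_id: str):
    """Step 4: report which agent holds a contested issue (None if unclaimed)."""
    row = conn.execute("SELECT agent_id FROM claims WHERE issue_id = ?", (issue_id,)).fetchone()
    return row[0] if row else None
```

Two agents calling `try_claim` for the same issue at the same moment both reach the INSERT, but SQLite serializes writers, so exactly one commits and the other gets `IntegrityError`. There is no separate "is it free?" read to race against, which is what closes the TOCTOU window.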
The production implementation uses a two-layer design separating atomicity from read performance:[7]
- SQL layer: UNIQUE constraint provides atomic INSERT; prevents TOCTOU races at the database layer
- Filesystem layer: lane JSON files provide fast local reads for agents checking their own state
This design means agents reading their own state can use filesystem reads (fast), while coordination safety is enforced by the SQL layer (atomic).[7]
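The two-layer idea can be sketched as follows, assuming one JSON "lane" file per agent under a hypothetical `lanes/` directory; the file layout, function names, and schema are illustrative, not Beads' or parallel-cc's actual format. The SQL insert is the only authority on ownership; the lane file is regenerated from the database after each successful claim.

```python
import json
import os
import sqlite3
import tempfile
from pathlib import Path

CLAIMS_DDL = ("CREATE TABLE IF NOT EXISTS claims ("
              "issue_id TEXT NOT NULL UNIQUE, agent_id TEXT NOT NULL)")


def claim_and_mirror(conn: sqlite3.Connection, lanes_dir: Path, issue_id: str, agent_id: str) -> bool:
    """SQL layer decides ownership; the lane file is only a fast local copy of the result."""
    try:
        with conn:  # atomic INSERT; UNIQUE rejects a second claimant
            conn.execute("INSERT INTO claims (issue_id, agent_id) VALUES (?, ?)", (issue_id, agent_id))
    except sqlite3.IntegrityError:
        return False
    mine = [row[0] for row in conn.execute(
        "SELECT issue_id FROM claims WHERE agent_id = ? ORDER BY issue_id", (agent_id,))]
    lanes_dir.mkdir(parents=True, exist_ok=True)
    # Write a temp file, then rename it over the lane file so readers never see a partial write
    fd, tmp_path = tempfile.mkstemp(dir=lanes_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"agent": agent_id, "issues": mine}, f)
    os.replace(tmp_path, lanes_dir / f"{agent_id}.json")
    return True


def my_lane(lanes_dir: Path, agent_id: str) -> list:
    """Fast read path for an agent checking its own state: no database round trip."""
    lane = lanes_dir / f"{agent_id}.json"
    return json.loads(lane.read_text(encoding="utf-8"))["issues"] if lane.exists() else []
```

Because lane files are derived copies, a reclaim performed by another process is not reflected until the lane is rewritten; treat them as a read cache, never as the source of truth for ownership.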
Beads provides explicit API support for the full claim workflow:[6][7]
- `bd ready --json`: Structured work discovery
- `bd update <id> --claim --assignee agent-name`: Atomic claiming
- `bd ready --assignee agent-name`: Filtered queue for claiming agent only
The Kleisli.IO analysis frames the core insight: "Multi-session AI agent workflows are concurrent processes with private state, independent failure modes, and shared mutable resources."[9] This reframe, treating multi-agent AI as a distributed systems problem, motivates lock-free approaches over pessimistic locking, which is inappropriate for agents that may crash, get rate-limited, or run for unpredictable durations.
Key finding: A dead agent holding a pessimistic lock blocks all work indefinitely. A dead agent in an event-sourced system simply stops producing events — its claimed work is reclaimed via heartbeat timeout without human intervention.[9][11]
| Mechanism | Implementation | Trade-offs | Source |
|---|---|---|---|
| Atomic SQL claim | SQLite UNIQUE INSERT | Fast, no polling; requires shared DB file access | [7][11] |
| Event sourcing | Append-only JSONL logs | Full audit trail; logs grow without compaction | [9] |
| CRDTs | Conflict-free replicated data types | Convergent merges; overhead for solo devs; no arbitrary analytics | [9] |
| Stigmergic coordination | Sessions leave traces for subsequent sessions | Indirect; no direct communication needed | [9] |
| Heartbeat + stale timeout | Timestamp update + configurable reclaim window | Automatic recovery; requires tuning timeout window | [11] |
| File claim registry | Pre-write ownership declaration | Prevents conflicts before they occur; requires cooperative agents | [11] |
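Heartbeat plus stale-timeout reclamation can be sketched on top of the atomic claim table. A minimal sketch: the schema, function names, and the injectable `now` parameter (used so the logic can be exercised without waiting) are hypothetical, not parallel-cc's or Beads' implementation.

```python
import sqlite3
import time
from typing import List, Optional

DDL = """CREATE TABLE IF NOT EXISTS claims (
    issue_id     TEXT PRIMARY KEY,
    agent_id     TEXT NOT NULL,
    heartbeat_at REAL NOT NULL  -- unix time of the owner's last heartbeat
)"""


def connect(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit; explicit BEGIN where needed
    conn.execute(DDL)
    return conn


def claim(conn, issue_id: str, agent_id: str, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    try:
        conn.execute("INSERT INTO claims VALUES (?, ?, ?)", (issue_id, agent_id, now))
        return True
    except sqlite3.IntegrityError:
        return False


def heartbeat(conn, agent_id: str, now: Optional[float] = None) -> int:
    """Called periodically by a live agent. Returns 0 if it no longer holds any claim."""
    now = time.time() if now is None else now
    return conn.execute("UPDATE claims SET heartbeat_at = ? WHERE agent_id = ?",
                        (now, agent_id)).rowcount


def reclaim_stale(conn, stale_after_s: float, now: Optional[float] = None) -> List[str]:
    """Release claims whose owner has been silent for longer than stale_after_s."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_s
    conn.execute("BEGIN IMMEDIATE")  # hold the write lock so SELECT and DELETE see the same rows
    try:
        stale = [r[0] for r in conn.execute(
            "SELECT issue_id FROM claims WHERE heartbeat_at < ? ORDER BY issue_id", (cutoff,))]
        conn.execute("DELETE FROM claims WHERE heartbeat_at < ?", (cutoff,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return stale
```

The tuning burden listed in the table shows up in `stale_after_s`: it must comfortably exceed the heartbeat interval (several missed beats), or a slow but live agent loses its work. A live agent should also treat `heartbeat(...) == 0` as a signal to stop, since its claim has been handed back to the queue.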
The Kleisli framework identifies three fundamental coordination query patterns that any multi-agent system must address:[9]
The kli system validated stigmergic coordination in production: Session 1 discovered an "alist/plist mismatch that silently broke WebSocket communication." Subsequent sessions inherited the diagnostic reasoning, not just the fixed code. This demonstrates that lock-free systems can transfer analytical context across sessions without direct inter-session communication.[9]
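Event sourcing and stigmergic traces can be sketched together as an append-only JSONL log that every session writes to and any later session replays. The event kinds (`claimed`, `completed`, `note`), field names, and file name are hypothetical, not kli's actual format. A crashed session simply stops appending; its open claims remain visible in the replayed state until a reclaim rule, such as a heartbeat timeout, releases them.

```python
import json
import tempfile
from pathlib import Path


def append_event(log: Path, session: str, kind: str, **data) -> None:
    """Append one JSON line. Sessions never rewrite earlier lines, so no lock is taken;
    this relies on small appends not interleaving on a local filesystem (an assumption)."""
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"session": session, "kind": kind, **data}) + "\n")


def replay(log: Path) -> dict:
    """Fold the whole log into current state: open claims plus traces left for later sessions."""
    state = {"claims": {}, "notes": []}
    if not log.exists():
        return state
    for line in log.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        if event["kind"] == "claimed":
            state["claims"][event["task"]] = event["session"]
        elif event["kind"] == "completed":
            state["claims"].pop(event["task"], None)
        elif event["kind"] == "note":  # stigmergic trace: diagnostic reasoning, not just code
            state["notes"].append(event["text"])
    return state


if __name__ == "__main__":
    log = Path(tempfile.mkdtemp()) / "events.jsonl"
    append_event(log, "s1", "claimed", task="ws-fix")
    append_event(log, "s1", "note", text="alist/plist mismatch silently broke WebSocket messages")
    append_event(log, "s1", "completed", task="ws-fix")
    print(replay(log))  # a later session starts from the notes, not from scratch
```

The audit-trail trade-off from the table is visible here: `replay` reads the full history every time, so a long-lived log eventually needs compaction (for example, periodically writing a snapshot of the folded state).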
The recce.hq 4-gate architecture documents a shipped configuration that increased weekly commits from 20–50 to 100–200+ while maintaining code quality standards.[8] The architecture is additive — each gate layer handles a distinct class of failure.
| Gate | Type | Mechanism | What It Blocks |
|---|---|---|---|
| Gate 1: Instruction Files[8] | Soft (LLM-read) | AGENTS.md (universal) + CLAUDE.md (tool-specific) | Misaligned agent behavior; defines prohibited zones and state file rules |
| Gate 2: Pre-commit[8] | Hard (binary) | Biome linting; missing type annotations, unused variables, `any` types | Style drift; agents cannot rationalize exceptions |
| Gate 3: Pre-push[8] | Hard (binary) | Type checking (mypy/tsc), full test suite, security scans | Type errors, test failures, security issues leaving local development |
| Gate 4: Human Review[8] | Soft (judgment) | Multi-agent cross-review: Claude (dev), Copilot (mechanics), humans (edge cases) | Logical errors, edge cases, business logic misalignment |
Key finding: "Hooks are binary — agents cannot rationalize exceptions." This is the core value proposition of pre-push hooks over instruction-file approaches alone. An AI agent can decide to ignore a CLAUDE.md rule; it cannot decide to make a failing test suite pass.[8]
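Gate 3 can be sketched as a pre-push hook. Git runs an executable `.git/hooks/pre-push`, passing the remote name and URL as arguments, and aborts the push if the hook exits non-zero. Everything else here is an assumption: the gate list (mypy and pytest for a Python project), the runner function, and the choice to treat a missing tool as a failure.

```python
#!/usr/bin/env python3
"""Sketch of a .git/hooks/pre-push gate runner (hypothetical gate list)."""
import subprocess
import sys

# Assumed hard gates for a Python project; substitute tsc, npm test, a security scanner, etc.
GATES = [
    ("typecheck", ["mypy", "."]),
    ("tests", ["pytest", "-q"]),
]


def run_gates(gates) -> list:
    """Run every gate and return the names of those that failed (binary: exit code only)."""
    failed = []
    for name, argv in gates:
        try:
            # The hook's stdin carries ref data from git; don't pass it on to the gate
            result = subprocess.run(argv, stdin=subprocess.DEVNULL)
        except FileNotFoundError:  # a missing tool counts as a failure, never a silent pass
            failed.append(name)
            continue
        if result.returncode != 0:
            failed.append(name)
    return failed


def main(argv) -> int:
    remote = argv[1] if len(argv) > 1 else "(unknown remote)"
    failures = run_gates(GATES)
    if failures:
        print(f"push to {remote} blocked by failing gates: {', '.join(failures)}", file=sys.stderr)
        return 1  # non-zero exit makes git abort the push
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:  # git calls: pre-push <remote-name> <remote-url>
    sys.exit(main(sys.argv))
```

Save it as `.git/hooks/pre-push` and make it executable (`chmod +x`). By default, linked worktrees use the hooks directory of the shared repository, so one hook covers every agent's worktree.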
The Anthropic coordination patterns document describes the Generator-Verifier pattern applied at the PR level: one agent produces output, another validates against explicit criteria.[10] Applied to merge gates, this means a dedicated verification agent runs against every PR before merge is allowed. Weakness: vague verification criteria break the pattern — explicit, measurable criteria are required.[10]
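The "explicit, measurable criteria" requirement can be made concrete as a deterministic check that a verifier (an agent or a CI job) applies to each PR before merge. The criteria below (an owned path-prefix domain, a maximum file count, a passing test run) and all names are hypothetical examples chosen to echo the small, file-exclusive task scoping argued for earlier; they are not taken from the Anthropic document.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MergeCriteria:
    """Explicit, measurable merge criteria; the defaults are hypothetical examples."""
    owned_prefixes: Tuple[str, ...]  # the agent's file domain, e.g. ("src/ui/",)
    max_files: int = 10
    require_passing_tests: bool = True


def verify(changed_files: List[str], tests_passed: bool, c: MergeCriteria) -> List[str]:
    """Return every violated criterion; an empty list means the PR may merge."""
    problems = []
    if len(changed_files) > c.max_files:
        problems.append(f"{len(changed_files)} files changed, limit is {c.max_files}")
    # str.startswith accepts a tuple; end prefixes with "/" so "src/ui/" does not match "src/uikit/"
    outside = sorted(f for f in changed_files if not f.startswith(c.owned_prefixes))
    if outside:
        problems.append("outside owned domain: " + ", ".join(outside))
    if c.require_passing_tests and not tests_passed:
        problems.append("test suite failed")
    return problems
```

Returning every violation, rather than stopping at the first, gives the generating agent a complete, unambiguous list to fix, which is the property that vague criteria lack.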
| Metric | Before Gates | After 4-Gate Architecture | Source |
|---|---|---|---|
| Weekly commits | 20–50 | 100–200+ | [8] |
| Change set size | Large, hard to review | Smaller, more reviewable (60–90% context savings reported) | [8] |
Three independent sources converge on a practical ceiling of 5–8 concurrent agents on the same codebase. The limiting factor is not compute, disk, or coordination overhead — it is human review capacity.[2][3][12]
| Source | Cited Ceiling | Primary Bottleneck |
|---|---|---|
| appxlab.io[2] | 5–7 agents | Review overhead + resource contention |
| shareuhack.com[3] | 5–7 agents | "Productive ceiling" (practical, not technical) |
| mindstudio.ai[12] | 5–8 agents | Review capacity, not technical limits |
| Team | Simultaneous Agents | Domain Partitioning | Outcome | Source |
|---|---|---|---|---|
| incident.io | 4–5 agents | UI, build tools, tests, backend — "each in its own file domain with zero overlap" | Successful parallel development | [13] |
| Boris Cherny (Claude Code creator) | 10–15 sessions | 5 terminal sessions across 5 git checkouts + 5–10 browser/mobile sessions | Sustained operation; "less intervention = faster results" | [3][13] |
Note on the Cherny anomaly: Cherny's 10–15 simultaneous sessions appear to contradict the 5–8 ceiling. The resolution: Cherny's sessions include browser/mobile sessions on independent tasks, whereas the 5–8 ceiling refers specifically to simultaneous code-writing agents on the same codebase requiring human review and merge.[3][13]
Each worktree consumes approximately 5 GB on a 2 GB codebase.[2] Six concurrent agents therefore require 30+ GB of disk, a non-trivial constraint when all agents run on a single local development machine.
One practitioner (Dev.to) reports that parallel development "sucks up tokens" to the point of exceeding Claude Pro subscription limits.[14] The same practitioner describes the human coordination overhead as "endlessly ping-ponging between rooms" and states that parallel development has not become their default workflow despite theoretical benefits.[14]
See also: Cost Optimization (token economics of parallel vs sequential sessions)
Anthropic documents five production multi-agent patterns with distinct coordination architectures.[10] The Agent Teams pattern maps directly to multiple concurrent Claude Code CLIs on the same repository.
| Pattern | Best For | Maps to Multi-CLI | Known Weakness |
|---|---|---|---|
| Generator-Verifier[10] | Quality-critical output with clear evaluation standards | PR merge gates | Vague verification criteria break the pattern |
| Orchestrator-Subagent[10] | Clear task decomposition with bounded subtasks | One orchestrator CLI dispatches workers | Orchestrator becomes information bottleneck |
| Agent Teams[10] | Parallel workloads, independent features | Direct match: multiple CLIs, one per worktree | Requires careful partitioning to avoid conflicts |
| Message Bus[10] | Growing agent ecosystems | Publish/subscribe event coordination | Debugging complex event cascades is difficult |
| Shared State[10] | Collaborative research; no single point of failure | Shared SQLite tracker with atomic claims | Reactive loops and duplicate work require explicit termination |
Anthropic recommendation: Start with Orchestrator-Subagent — handles the widest range of tasks with minimal overhead. Evolve to other patterns as limitations emerge.[10]
| Model | Structure | Best Fit | Merge Strategy |
|---|---|---|---|
| Fully Independent Agents[12] | Separate features, no dependencies | Frontend UI + backend API worked in parallel | Merge in any order |
| Operator + Workers[12] | One orchestrator, specialized sub-agents | Divide-and-conquer problems | Operator coordinates merge order |
| Split-and-Merge[12] | Single complex task distributed across agents | Large refactors | Explicit merge strategy required before split |
appxlab.io documents a three-principle decomposition checklist that measurably reduces coordination conflicts:[2]
appxlab.io also reports that "incorporating architectural documentation into agent context produces measurable gains in functional correctness" — agents given explicit scope boundaries produce fewer coordination failures than agents given only task descriptions.[2]
appxlab.io recommends: open draft PRs immediately to surface scope overlap early; merge in dependency order; use lead-agent orchestration for speed (requires clean decomposition) or human review gates for production safety.[2]
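Merging "in dependency order" is a topological sort over branch dependencies. A minimal sketch using Python's standard-library `graphlib` (3.9+); the branch names and the dependency map are hypothetical. A cycle is reported as an error, since branches that depend on each other cannot be merged one at a time and usually signal an unclean decomposition.

```python
from graphlib import CycleError, TopologicalSorter


def merge_order(depends_on: dict) -> list:
    """Order branches so each merges only after every branch it builds on.

    depends_on maps a branch name to the set of branches it depends on.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] holds the detected cycle; the task split needs rework before merging
        raise ValueError(f"circular branch dependencies: {err.args[1]}") from None
```

Independent branches (such as a docs-only change) may appear anywhere in the result, matching the "merge in any order" case for fully independent agents.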
Four independent sources converge on instruction files as the first layer of coordination infrastructure — establishing explicit rules for file ownership, prohibited zones, build commands, and code conventions before code is written.[8][2][3][12]
| Layer | Path | Scope | Size Guideline | Shared Across Worktrees? |
|---|---|---|---|---|
| Global[3] | `~/.claude/CLAUDE.md` | Personal preferences, developer identity | Under 50 lines | Yes (all projects) |
| Project[3] | `./CLAUDE.md` | Team standards, file ownership boundaries | Under 200 lines | Yes (committed to git) |
| Task[3] | `.claude/rules/` | Granular rules for specific file paths | — | Yes (git-tracked) |
A critical property: all worktrees share the auto-memory directory, enabling cross-session learning without explicit synchronization.[3][13]
recce.hq documents the AGENTS.md coordination contract as Gate 1 in their shipped architecture, with content organized around explicit boundaries:[8]
- Tool-specific rules in CLAUDE.md that extend AGENTS.md
Key finding: Instruction files are the lowest-cost coordination layer but also the weakest, because agents can rationalize exceptions to text rules. They are necessary as human-readable documentation but must be backed by binary enforcement (pre-commit hooks, pre-push hooks) for reliability.[8]
Three dedicated tooling systems address multi-CLI coordination with distinct architectural philosophies: parallel-cc (filesystem + SQLite), Beads (issue tracker + atomic claims), and kli (event sourcing + CRDTs).
| Capability | parallel-cc[11] | Beads[6][7] | kli[9] |
|---|---|---|---|
| Auto worktree creation | Yes (zero config) | No (separate concern) | No |
| Session tracking | SQLite + heartbeat | SQLite (per-agent assignee) | Append-only JSONL |
| Dead-agent recovery | Heartbeat timeout + auto-reclaim | Configurable stale timeout | Session stops producing events; manual reclaim |
| Atomic work assignment | Yes (file claim + SQL) | Yes (UNIQUE constraint) | Indirect (event sourcing) |
| Conflict detection method | AST analysis + file claims | TOCTOU-safe SQL | CRDT merge |
| Git integration | Worktrees + Git Live mode (auto PR) | JSONL git sync | Plain files under version control |
| Cloud/sandbox support | E2B sandbox (1-hour sessions) | No | No |
| External dependencies | Node.js 20+, gtr, jq | Python, SQLite; Dolt (optional) | None (plain files) |
| Distribution model | Open source (GitHub) | Open source (GitHub) | Open source (described in blog) |
Installation: running ./scripts/install.sh --all creates the ~/.parallel-cc/ database directory. Requirements: Node.js 20+, gtr, jq, git.[11] Key features beyond coordination:
Beads addresses a coordination problem not explicitly covered by worktree isolation: agent "dementia" — context window exhaustion causing agents to forget their task state every ~10 minutes.[6] Fine-grained Beads issues keep agents early in their context windows, where decision quality is highest. This is a coordination problem because an agent that forgets its scope will re-explore and potentially overlap with other agents' work.[6]
The kli system's defining characteristic is zero external dependencies — "plain files under version control."[9] This makes it reproducible and auditable but limits query expressiveness compared to SQLite-backed systems. CRDTs ensure that any two sessions can merge their state without coordination, at the cost of additional per-operation overhead that becomes significant for solo developers.[9]