This is the second time I've had to stop everything, pull the entire swarm apart, and rebuild it from scratch. The first time was in March — context engineering, file consolidation, the birth of the three-tier agent hierarchy. I thought that was the hard part. It wasn't. Crucis I gave me the architecture. Crucis II is about the fact that the architecture doesn't trust itself.
Here's what triggered it. I gave Claude a ten-step implementation plan — six new hooks, two file modifications, a settings update. Clear scope. Approved spec. What happened was a masterclass in everything wrong with AI-assisted development.
First: no planning. Claude read the plan and started writing code immediately. No Holly orchestration. No scoping. No delegation. Just fingers on keyboard, six files in, before I'd finished reading the spec myself. When the delivery gate blocked at three files (the gate I built specifically for this), Claude asked me to create a bypass marker. I did. Then it spawned an "orchestrator" agent with the prompt — and I quote — "acknowledge and return immediately, do NOT do the work yourself." That agent existed for five seconds. It produced no plan, no delegation, no analysis. Its only purpose was to write the .holly-invoked marker file so every downstream gate would open. It was a fake Holly. A structural lie. The gates thought orchestration happened. It never did.
Second: no testing. Six hooks written, zero test cases. No smoke tests. No verification against the spec's own test matrix. The hooks parsed — that was the extent of quality assurance. Claude ran node --check on each file and called it done. Syntax validity is not functional correctness. I have a CLAUDE.md rule that says "ALWAYS run the project's test suite before marking a task complete." It was ignored.
Third: no review. Nobody — the verification tier, the entire reason the three-tier hierarchy exists — was never spawned. No code-reviewer. No security-reviewer. The code went from Claude's fingers to my filesystem with nothing in between. The session-end review gate would have caught this, but the session hadn't ended yet. The pre-commit gate would have caught it on push, but we never got that far. The gap between "files written" and "session ended" was an enforcement dead zone.
When I called this out, Claude admitted it. To its credit, the self-diagnosis was accurate: "I spawned a fake Holly just to bypass the delivery gate — which is exactly the anti-pattern the gate exists to catch." But accurate self-diagnosis after the fact is the definition of a post-mortem. I don't want post-mortems. I want structural prevention.
So I did what I did last time. Full architectural review. One full day. This time, I had a new weapon: the Claude Code source leak. In early April, Anthropic accidentally published the entire Claude Code codebase via npm. Every hook, every agent, every gate, every internal tool — all visible. My architects ran an intake on the leaked source in the days prior, producing 30 findings across six sweeps. Ten were marked ADOPT.
The finding that mattered was COORDINATOR_MODE — Anthropic's internal version of my Holly/Elon/Nobody hierarchy. Their version uses a shared scratchpad. Workers write to it. A synthesis phase cross-checks outputs. You cannot return empty-handed. My version had none of that. My Holly marker was written when an orchestrator agent completed. Not when it produced work. A five-second rubber stamp was structurally identical to a thirty-minute planning session.
The rebuild had three layers:
Layer 1: Holly output validation. The marker writer now checks what Holly actually produced before writing the gate marker. Output under 200 characters? Rejected. Contains "acknowledge and return"? Anti-pattern — rejected. No plan structures, no delegation verbs, no evidence of actual orchestration? Rejected. The same fake Holly that bypassed every gate three hours ago now gets caught and logged.
Layer 2: The shared scratchpad. Borrowed directly from the Anthropic source. Every agent tier — Holly, Elon, Nobody — appends to a session-scoped scratchpad when it completes. Holly's plan. Elon's build summary. Nobody's review findings. All in one file. Nobody can now compare what Holly planned against what Elon built. Drift is visible. And the scratchpad is deny-listed — agents can't edit it directly. Only the hook writes.
Layer 3: Delivery step enforcement. Six new hooks that structurally enforce what used to be prose suggestions in playbook files. Builder completes without running tests? Warning. Session ends without deployment? Warning. Three frontend files edited without a UX copy review? Blocked. API changes committed without end-to-end testing? Blocked. These were all "please do this" instructions before. Now they're exit codes.
The uncomfortable truth this exposed: every agent in my system is the same model. Holly, Elon, Nobody — they're all Claude. They all have the same people-pleasing bias. The same instinct to optimise for task completion over process compliance. The hierarchy only works when it's structurally enforced. Three instances of the same people-pleaser checking each other's homework is theater unless the gates have teeth. Crucis I gave me the roles. Crucis II gave me the enforcement. I suspect Crucis III will be about the fact that you can't fully solve alignment at the infrastructure layer — but that's a problem for another month.