MASO Control Domain: Execution Control¶

Part of the MASO Framework · Control Specifications Covers: ASI02 (Tool Misuse) · ASI05 (Unexpected Code Execution) · ASI08 (Cascading Failures) · LLM05 (Improper Output Handling) · ASI07 (Insecure Inter-Agent Comms, structural validation) Also covers: CR-01 (Deadlock/Livelock) · CR-02 (Oscillation) · SM-01 (Cumulative Harm) · GV-02 (Metric Gaming) · OP-02 (Latency) · OP-03 (Partial Failure) · OP-04 (Token Exhaustion) · OP-04a (Agent Unavailability) · OP-05 (Irreversible Action Chains)

Principle¶

Every agent action is bounded: bounded by permission, bounded by impact, bounded by time. No single agent can cause unlimited damage. When an agent fails, the failure is contained to that agent. When errors cascade, automated circuit breakers engage before human response is required.

Execution control is where the PACE resilience methodology meets real-time operations. The controls in this domain define the triggers that move the system from Primary to Alternate and beyond.

Why This Matters in Multi-Agent Systems¶

Tool misuse compounds across agents. In a single-model system, a tool misuse event is contained to one context. In a multi-agent system, Agent A's misuse of Tool X produces output that becomes Agent B's input for Tool Y. The damage from chained tool misuse can far exceed what any single agent could accomplish alone.

Code execution pathways multiply. When agents generate and execute code, each agent is a potential entry point for code injection. If Agent A generates code that Agent B executes in its sandbox, the security boundary depends on both the generation controls (Agent A) and the execution controls (Agent B). A weakness in either is exploitable.

Cascading failures are the default, not the exception. Multi-agent systems are tightly coupled by design - agents depend on each other's outputs. A hallucination in one agent becomes a flawed plan in the next, becomes a destructive action in the third. Without explicit isolation, errors propagate at the speed of the orchestration.

Runaway loops consume resources exponentially. Two agents triggering each other in a cycle can generate exponential resource consumption. The loop may look like productive work to a naive monitor - each agent is calling tools, producing outputs, and delegating tasks - but the system is burning tokens and compute on a recursive dead end.

Single agent loss cascades through the orchestration (OP-04). When one agent in a multi-agent system becomes unavailable - model provider outage, sandbox crash, credential revocation - every agent that depends on its output is affected. Without explicit failover, a single agent failure degrades or halts the entire orchestration. The system's availability is determined by its least available component, not its most robust.

Irreversible actions compound across agent chains (OP-05). Agent A sends an email. Agent B deletes a record. Agent C makes an API call to a third party. Each action was individually approved, but the chain is collectively irreversible. When Agent D detects that Agent A's email was based on hallucinated data, the downstream actions cannot be undone. Reversibility must be assessed for the chain, not just per-action, and compensating controls must exist for actions that cannot be recalled.

Token exhaustion degrades agents and their monitors simultaneously (OP-04). As an agent's context window fills, attention dilution weakens system prompt instructions (including safety constraints, role boundaries, and tool restrictions) without any adversarial action. The lost-in-the-middle effect causes critical constraints to be functionally forgotten. Hallucination rates increase. Instruction-following degrades. This is a dual failure path: the agent gets worse, and the Model-as-Judge evaluating it gets worse at the same time if it's also accumulating context. A degraded Judge reviewing a degraded agent's output is a compounding failure that bypasses two control layers simultaneously. In multi-agent systems, each agent burns tokens independently (you can't see the degradation from the orchestrator's perspective), and retry loops accelerate exhaustion by consuming more context on each failed attempt. Prompt injection becomes easier as system prompt influence weakens, and blast radius caps may be ignored if the agent stops reliably following structured constraints. Token exhaustion is gradual, not binary: the agent doesn't crash, it gets subtly worse. This means PACE transitions may not trigger without explicit context utilisation monitoring.

Data integrity failures are silent and cumulative. Security guardrails catch injections. Content filters catch harmful output. But when Agent A returns a JSON object with a missing field and Agent B silently treats that field as null, the resulting action is wrong - not malicious, just wrong. In production multi-agent systems, the majority of runtime failures come not from security violations but from structural data integrity failures between components: malformed tool outputs, unexpected types, truncated responses, hallucinated field names, and partial results presented as complete. These failures are harder to detect than security violations because they don't trigger guardrails - the output is syntactically valid but semantically broken.

Controls by Tier¶

Tier 1 - Supervised¶

Control	Requirement	Implementation Notes
EC-1.1 Human approval gate	Every write operation, external API call, and state-modifying action requires human approval	System presents proposed action (tool, parameters, target) and waits for confirmation.
EC-1.2 Tool allow-lists	Each agent has a defined list of permitted tools; unlisted tools are blocked	Enforced at the guardrails layer.
EC-1.3 Per-agent rate limits	Maximum actions per time window per agent	Prevents runaway loops before human review catches them. Recommended: 100 calls/hr.
EC-1.4 Read auto-approval	Read operations within scoped permissions proceed without human approval	Establishes the efficiency baseline that Tier 2 will extend.
EC-1.5 Interaction timeout	All agent negotiation sequences have a maximum turn count	Recommended: 10 turns. Exceeding cap triggers deterministic resolution (orchestrator decides or task escalates to human). Prevents deadlock and livelock (CR-01).
EC-1.5a Agent spawn rate limit	Maximum number of new agent instances the orchestrator can create per time window	Prevents runaway spawning that exhausts compute, memory, or API quota. Recommended: define per-orchestration and per-time-window limits. Exceeding the spawn rate limit blocks new agent creation and triggers an alert.
EC-1.6 Reversibility assessment	Every action is classified as reversible, time-bounded reversible, or irreversible before execution	Irreversible actions require human approval (reinforces EC-1.1). Time-bounded reversible actions carry a reversal window (e.g., "email can be recalled within 60 seconds"). Classification is logged with each action (OP-05).
EC-1.7 Agent health check	Each agent's availability is verified before task assignment	Orchestrator confirms agent is responsive before delegating. If unavailable, task is queued or routed to an alternative. Prevents silent failure from assigning work to a dead agent (OP-04).
EC-1.8 Output format verification	Agent outputs checked against expected format before delivery to the next component or human reviewer	Basic validation: is the response valid JSON if JSON was expected? Are required top-level fields present? Is the response non-empty? Malformed outputs are rejected and the task retried or escalated - not silently passed downstream.
EC-1.9 Token budget monitoring	Per-agent context utilisation tracked with tiered alerting thresholds	Track context window consumption as an operational metric alongside cost and latency. Alert at 70% (info), 85% (warning), 95% (critical). At Tier 1, alerts are informational; operator decides response. Prevents silent degradation where guardrails weaken as context fills (OP-04).
EC-1.10 Retry budget caps	Maximum retry count per task per agent	Recommended: 3 retries. Each retry consumes additional context and degrades agent performance. After cap, agent reports explicit failure rather than continuing to spiral. Prevents the degradation loop where retries accelerate token exhaustion (OP-04).

Tier 2 - Managed¶

All Tier 1 controls remain active, plus:

Control	Requirement	Implementation Notes
EC-2.1 Action classification	Every action classified as auto-approve, escalate, or block	Classification considers: action type, target system, data classification, agent history, Model-as-Judge evaluation.
EC-2.2 Sandboxed execution	Agents generating/executing code operate in isolated environments	Per-agent sandbox with defined filesystem, network, and process boundaries. Sandbox destroyed and recreated after each execution.
EC-2.3 Blast radius caps	Maximum impact scope per agent	Max records modifiable, max financial value, max external API calls. Exceeding any cap triggers PACE escalation.
EC-2.4 Circuit breakers	Agent paused on error threshold	E.g., 3 guardrail blocks in 10 minutes triggers circuit breaker. Agent is paused, event is logged, monitoring evaluates next step.
EC-2.5 Model-as-Judge gate	Evaluation model reviews all agent outputs before external commit	Evaluates: factual accuracy, policy compliance, goal integrity (cross-ref PG-2.2), output safety, data leakage. Judge criteria include faithfulness and analytical depth, not just format compliance (GV-02).
EC-2.6 Decision commit protocol	Decisions passing judge review are committed; reversal requires human authorisation or documented input change	Prevents oscillation (CR-02). Tie-break rules defined for equal-weight alternatives.
EC-2.7 Aggregate harm assessment	Judge evaluation includes full task plan context, not just individual agent output	For multi-step plans, judge evaluates the whole plan before execution begins. Catches cumulative harm from individually benign subtasks (SM-01).
EC-2.7a Dry-run / simulation mode	High-risk or first-time actions execute against a sandbox or staging environment before production commit	The gateway routes the action to a simulation target, captures the result, and presents it for human review or Judge evaluation. Only after validation does the action execute against production. Mandatory for irreversible actions in new deployments where no behavioral baseline exists.
EC-2.8 Tool completion attestation	Required tool calls defined per task; tool failure or skip produces explicit incomplete status	Judge verifies all required tools completed before approving output. Prevents partial failure masquerading as success (OP-03).
EC-2.9 Latency SLOs and oversight SLA enforcement	Per-orchestration end-to-end latency targets defined and monitored; maximum time before human review required is enforced per risk tier	Documents which control layers operate synchronously (blocking) vs asynchronously (post-commit audit). Judge may run async for auto-approved actions to reduce latency. For escalated actions, a configurable oversight SLA defines the maximum wait time for human review; actions not reviewed within the SLA window fail safe (denied, not auto-approved).
EC-2.10 Agent failover	Critical agents have a defined failover path: backup agent, graceful degradation, or controlled halt	Failover activates automatically on health check failure (EC-1.7). Backup agents operate with the same NHI scope and tool allow-list as the primary. Orchestration continues in degraded mode if non-critical agents are unavailable; halts if critical agents are unavailable with no backup (OP-04).
EC-2.11 Chain reversibility assessment	For multi-step plans, the Judge evaluates aggregate reversibility before execution begins	If the plan contains irreversible actions, the Judge flags the irreversibility point and requires explicit human acknowledgement. Compensating actions must be defined for each irreversible step (e.g., correction email, reversal transaction, notification to affected party) (OP-05).
EC-2.12 Multimodal boundary validation	When multimodal data (images, audio, video, documents) crosses an agent boundary, modality-specific guardrails are applied at the receiving agent	Text-in-image injection, steganographic payloads, inaudible audio commands, and embedded document instructions are checked at each agent boundary, not just at system input. Cross-ref Multimodal Controls.
EC-2.13 Output schema enforcement	Every agent declares the schema of its output; outputs are validated against this schema before delivery	JSON Schema, Pydantic models, or equivalent. Validation checks: required fields present, types correct, enums within allowed values, string lengths within bounds, nested structures conform. Schema failures produce a structured error - not a silent pass with missing fields. Schemas are versioned and published in the agent manifest alongside tool declarations.
EC-2.14 Inter-agent data contracts	Every agent-to-agent data transfer is validated against a declared contract at the receiving agent's boundary	The contract specifies the expected input schema, required fields, acceptable value ranges, and maximum payload size. The receiving agent validates inbound data before processing - not the sending agent alone. This is the structural equivalent of zero-trust: trust nothing, parse strictly, validate everything. Contracts are enforced at the message bus or gateway layer, not by the agents themselves.
EC-2.15 Serialisation boundary validation	Structured outputs (JSON, XML, function call arguments) are parsed with strict-mode deserialisation; no lenient parsing	Strict mode: reject unknown fields, reject type mismatches (string where number expected), reject `null` for required fields, reject malformed escape sequences. No silent coercion. Deserialised objects are validated against the output schema (EC-2.13) immediately after parsing. Path traversal patterns in string fields (e.g., `../../etc/passwd`) are caught by parameter constraints, not by the parser - but the parser must not silently accept them.
EC-2.16 Context rotation with structured state preservation	Agents approaching context limits checkpoint structured state and resume with a clean context window	Checkpoint captures: current goal, active constraints, accumulated decisions, pending actions, as typed structured fields (JSON schema), not free-text summaries. Free-text summarisation introduces semantic drift (EP-05): "must" becomes "should," qualifiers vanish, requirements soften. After checkpoint, context is flushed and the agent resumes with the structured state plus the original system prompt. The agent retains its work but gets a fresh attention budget. Rotation threshold: configurable, recommended at 75% context capacity (OP-04).
EC-2.17 Judge context isolation	The Model-as-Judge manages its own context budget independently from the agents it evaluates	The Judge must not share a context window with task agents. Judge context consumption is monitored separately (EC-1.9). If the Judge approaches its own context limit, this is a PACE trigger: a degraded Judge reviewing degraded agents is a correlated failure that bypasses two layers simultaneously. Judge rotation follows the same structured checkpoint protocol as task agents (EC-2.16) but is triggered independently (OP-04).

Tier 3 - Autonomous¶

All Tier 2 controls remain active, plus:

Control	Requirement	Implementation Notes
EC-3.1 Infrastructure-enforced blast radius	Caps enforced at platform level, not by agent or orchestrator	Agent cannot override its own blast radius through any mechanism. Analogous to OS-level resource limits.
EC-3.2 Self-healing circuit breakers	Circuit breaker triggers PACE P→A automatically; backup agent activates; system returns to P on stabilisation	Self-healing cycle limit: max 3 in 24 hours before auto-escalation to Contingency.
EC-3.3 Multi-model cross-validation	High-consequence actions validated by Model-as-Judge AND an independent second model	Disagreement between validators triggers human escalation.
EC-3.4 Time-boxing	Every autonomous task has a maximum execution time	Task not completed within time box → agent paused, state captured, task escalated. Prevents indefinite autonomous operation on drifted tasks.
EC-3.5 Automated rollback scope	When integrity compromise is detected, automated rollback covers the compromised agent and all downstream actions that depended on its output	Rollback scope is determined by the decision chain (OB-2.1). Downstream agents are notified. Actions that cannot be rolled back trigger compensating actions automatically. Human is notified of the rollback scope and any irreversible residual.
EC-3.6 Transformation integrity chain	Data validated after each processing step in a multi-agent pipeline; cumulative integrity tracked end-to-end	Each agent in a pipeline attests to the structural validity of its output. The attestation chain travels with the data (analogous to DP-3.4 data provenance but for structural integrity). If any agent in the chain produces output that fails schema validation, downstream processing halts immediately - not after the malformed data has been transformed two more times. Integrity violations are correlated with the specific pipeline step that introduced the corruption.

Action Classification Rules (Tier 2+)¶

The action classification engine is the core mechanism that replaces per-action human approval with risk-proportionate automation. Rules should be defined collaboratively between the AI security team and the business function that owns the agent system.

Auto-approve (no human involvement):

Read operations within the agent's scoped permissions
Write operations to internal staging areas (reversible, low-consequence)
Tool invocations within pre-approved parameter ranges
Actions that the Model-as-Judge approves and the blast radius cap is not at risk

Escalate (human approval required):

Write operations to production systems
Actions involving external parties (email sends, API calls to third-party services)
Irreversible operations (data deletion, financial transactions)
Actions flagged by the Model-as-Judge for any reason
Actions where the blast radius cap would exceed 50% of the defined maximum
First-time use of a tool by an agent (no baseline data)

Block (automatic denial):

Actions outside the agent's tool allow-list
Actions that violate the guardrails layer
Actions targeting systems not in the agent's scope
Actions during a PACE Alternate or Contingency phase that exceed the phase-specific restrictions

Deployment Topology: Evaluation Roles vs. Services¶

The MASO architecture describes evaluation roles as logically distinct components: tactical judge, strategic evaluator, meta-evaluator, observer, domain judges. This is correct for the security model: each role has a different threat profile, different OISpec, and different failure modes. They must be logically independent.

They do not need to be operationally independent. Logical separation and operational consolidation are compatible, as long as context isolation is maintained between evaluations.

How Roles Map to Services¶

Evaluation Role	Logical Requirement	Operational Reality
Tactical judge	Must evaluate every agent action against its OISpec. Must use a different model from the task agent.	A single distilled SLM sidecar process, co-located with the task agents. One process serves all tactical evaluations with per-agent OISpec routing.
Domain judges (fraud, security, compliance)	Must evaluate from distinct policy perspectives. Verdicts must be independent.	Can be a single SLM with multi-criteria evaluation prompts, producing structured verdicts per domain. Or: consolidated into the tactical SLM call with domain-specific criteria sections. Separate SLMs only if domain-specific fine-tuning requires different models.
Strategic evaluator	Must assess combined outputs against workflow intent. Must have visibility across all agent outputs.	A single LLM API call at phase boundaries. Not a persistent service. Invoked as a function, not deployed as an agent.
Meta-evaluator	Must monitor judge drift independently from the judge and the task agents.	A scheduled job (cron, Airflow, Step Functions) that injects known test cases into the tactical judge and measures accuracy. Runs daily or weekly. Not a persistent service.
Observer	Must score anomalies continuously. Must be independent of task agent infrastructure at Tier 3.	A metrics pipeline (Prometheus, CloudWatch, Datadog) consuming the decision chain log and computing anomaly scores. At Tier 3, runs on separate infrastructure. Not an LLM agent.

Example: Fraud Detection Deployment¶

Architecture diagrams suggest: 3 task agents + 1 tactical judge + 1 fraud judge + 1 compliance judge + 1 strategic evaluator + 1 meta-evaluator + 1 observer = 9 components

Actual deployment:

Service	What It Runs	Infrastructure
Task agent pod (×3)	Fraud analysis agents, each with SLM sidecar	3 pods, each with task agent + SLM process
SLM sidecar (in each pod)	Tactical evaluation + fraud domain + compliance domain (multi-criteria)	Co-located with task agent. ~1GB RAM for INT4 model.
Strategic evaluation function	Single LLM call at batch/phase boundaries	Serverless function (Lambda, Cloud Functions). Invoked, not persistent.
Calibration pipeline	Daily injection of test cases into SLM. Accuracy measured against known labels.	Scheduled job. Runs once per day.
Monitoring	Anomaly scoring from decision chain log. PACE trigger logic.	Existing observability stack.

Actual services to operate: 3 pods + 1 serverless function + 1 scheduled job + existing monitoring. That is the operational footprint of the "9 component" architecture.

Context Isolation Requirements¶

Consolidating evaluation roles into fewer services is acceptable only if context isolation is maintained:

Requirement	Why It Matters
Each evaluation call gets a fresh context	A tactical evaluation of Agent A must not carry residual context from evaluating Agent B.
Domain evaluations are independent	The fraud evaluation prompt must not include the compliance evaluation result, and vice versa. Verdicts must be produced independently before conflict resolution applies.
SLM sidecar does not share memory with the task agent	The SLM runs in a separate process or container. It reads the task agent's output; it does not share the task agent's context window.
Meta-evaluation does not run on the same model instance as tactical evaluation	The calibration pipeline calls the SLM from outside, as a client. It does not have access to the SLM's internal state.

Testing Criteria¶

Tier 1 Tests¶

Test ID	Test	Pass Criteria
EC-T1.1	Approval gate	Submit a write operation. Confirm it blocks until human approval. Reject the approval and confirm the action is prevented.
EC-T1.2	Tool scope	Attempt to invoke a tool not on the agent's allow-list. Guardrail blocks it.
EC-T1.3	Rate limit	Submit actions exceeding the configured rate. Throttling engages.
EC-T1.4	Read auto-approval	Submit a read operation within scope. Confirm it executes without human approval.
EC-T1.5	Interaction timeout	Trigger a negotiation loop. Confirm the turn cap is enforced and resolution engages.
EC-T1.6	Role-based tool enforcement	For each agent role (analyst, executor, critic), attempt to invoke tools assigned to a different role. All attempts blocked. (Amendment: CR-03)
EC-T1.7	Operator challenge rate	Present operators with outputs containing deliberate errors. Measure challenge rate. Target: > 80% detection. (Amendment: HF-01)
EC-T1.8	Reversibility classification	Submit a reversible action, a time-bounded reversible action, and an irreversible action. Verify each is classified correctly and the irreversible action requires human approval.
EC-T1.9	Agent health check	Take an agent offline. Assign it a task. Verify the orchestrator detects unavailability and routes the task to an alternative or queues it.
EC-T1.10	Output format verification	Agent returns malformed JSON (e.g., unclosed bracket, missing required field). Verify the output is rejected before reaching the next component or human reviewer. Retry or escalation triggered.
EC-T1.11	Token budget alerting	Fill an agent's context to 70%, 85%, and 95%. Verify tiered alerts fire at each threshold with correct severity.
EC-T1.12	Retry budget cap	Configure a retry cap of 3. Trigger 4 consecutive failures. Verify the agent reports explicit failure after the third retry rather than continuing.

Tier 2 Tests¶

Test ID	Test	Pass Criteria
EC-T2.1	Action classification	Submit reads, low-consequence writes, high-consequence writes, and out-of-scope actions. Each is classified correctly.
EC-T2.2	Sandbox isolation	From within an agent's sandbox, attempt to access the host filesystem, network outside allowed ranges, and other agent processes. All attempts blocked.
EC-T2.3	Blast radius cap	Attempt to exceed a defined blast radius cap. System blocks the excess and triggers PACE escalation.
EC-T2.4	Circuit breaker	Trigger the error threshold. Agent is paused within 30 seconds. Event is logged.
EC-T2.5	Model-as-Judge detection	Submit known-bad outputs (policy violations, goal drift, data leakage). Measure judge detection rate. Target: > 95% for HIGH severity, > 80% for MEDIUM.
EC-T2.6	Interaction timeout	Trigger a negotiation loop between two agents exceeding the turn cap. Deterministic resolution engages.
EC-T2.7	Decision oscillation	Two agents reverse a decision 3 times. Decision commit protocol detects oscillation and locks the decision or escalates.
EC-T2.8	Aggregate harm	Submit a multi-step plan where each step is benign individually but the aggregate is harmful. Judge evaluates the plan holistically and flags the risk.
EC-T2.9	Tool completion attestation	Remove a required tool call from the execution chain. Judge blocks the output due to missing attestation.
EC-T2.10	Latency SLO breach	Trigger an orchestration that exceeds the defined latency SLO. Alert fires before the SLO is breached by 2x.
EC-T2.11	Agent failover	Take a critical agent offline. Verify backup agent activates and the orchestration continues in degraded mode. Verify a non-critical agent failure allows the orchestration to complete without the missing agent.
EC-T2.12	Chain reversibility	Submit a multi-step plan containing an irreversible action at step 3 of 5. Judge flags the irreversibility point before execution begins and requires human acknowledgement.
EC-T2.13	Multimodal boundary	Send an image containing text-in-image injection from Agent A to Agent B. Verify Agent B's boundary guardrails detect the injection before processing.
EC-T2.14	Output schema enforcement	Agent produces output missing a required field. Verify schema validation catches the violation before delivery. Agent produces output with a wrong type (string where integer expected). Verify rejection. Agent produces valid output conforming to schema. Verify acceptance.
EC-T2.15	Inter-agent data contract	Agent A sends Agent B a payload missing a required field defined in B's input contract. Verify B rejects the payload at the boundary before processing begins. Agent A sends a payload exceeding the declared maximum size. Verify rejection.
EC-T2.16	Serialisation safety	Inject malformed JSON with: (a) type mismatch (string `"42"` where integer `42` expected), (b) null for required field, (c) unknown/extra fields, (d) malformed escape sequence. Verify strict-mode parser rejects all four. No silent coercion.
EC-T2.17	Context rotation	Fill an agent's context to 80%. Trigger context rotation. Verify: (a) structured state is checkpointed with typed fields, (b) context is flushed, (c) agent resumes with original system prompt + structured state, (d) agent correctly continues previous task.
EC-T2.18	Context rotation fidelity	After context rotation, verify that critical constraints ("must," "never," "exactly") from the original task are preserved in the structured checkpoint, not softened through summarisation.
EC-T2.19	Judge context isolation	Fill the Judge's context to 85% while task agents are below 50%. Verify the Judge triggers its own PACE transition independently of the task agents' status.
EC-T2.20	Correlated exhaustion detection	Fill both a task agent and its Judge to 85% simultaneously. Verify PACE escalation is more aggressive than when either is exhausted alone (P→A at minimum, P→C for Tier 3).

Tier 3 Tests¶

Test ID	Test	Pass Criteria
EC-T3.1	Infrastructure blast radius	From within the agent's execution environment, attempt every known technique to exceed the blast radius. Infrastructure blocks all attempts.
EC-T3.2	Self-healing cycle	Inject anomalous behavior. Confirm P→A transition, backup activation, and automatic A→P return on stabilisation.
EC-T3.3	Self-healing limit	Trigger the self-healing cycle limit (3 in 24hr). Confirm automatic escalation to Contingency.
EC-T3.4	Cross-validation disagreement	Submit an action where one validator approves and the other rejects. Confirm human escalation.
EC-T3.5	Time-box expiry	Start a task with a tight time box. Let it expire. Confirm pause, state capture, and escalation.
EC-T3.6	Automated rollback scope	Inject a hallucination at Agent A that propagates to Agents B and C. Trigger integrity detection. Verify automated rollback covers Agent A's action and all downstream work from B and C. Verify irreversible residual is reported to the human.
EC-T3.7	Transformation integrity chain	In a three-agent pipeline (A → B → C), Agent B produces output with a schema violation. Verify: (a) Agent C does not receive the malformed data, (b) the violation is attributed to Agent B's processing step specifically, (c) the integrity chain log identifies the exact point of corruption.

Maturity Indicators¶

Level	Indicator
Initial	Agents can invoke any available tool. No rate limits. No blast radius caps. Human reviews outputs manually with no systematic process.
Managed	Tool allow-lists defined. Human approval gate for all writes. Rate limits configured. Actions logged with approval status.
Defined	Action classification engine operational. Sandboxed execution. Blast radius caps. Circuit breakers. Model-as-Judge gate.
Quantitatively Managed	Classification accuracy measured. Judge false positive/negative rates tracked and reported. Circuit breaker engagement frequency monitored. Blast radius cap utilisation tracked per agent.
Optimising	Infrastructure-enforced blast radius. Self-healing P↔A cycles. Multi-model cross-validation. Time-boxing. Action classification rules tuned based on operational data.

Common Pitfalls¶

Blast radius caps that are too generous. A cap of "10,000 records per hour" for an agent that normally modifies 50 records per hour is not a cap - it's a ceiling so high it provides no protection. Caps should be set at 2–3x the expected peak volume, not at theoretical maximums.

Circuit breakers that only count errors. An agent that never triggers guardrails but produces subtly incorrect output is more dangerous than one that fails loudly. Circuit breakers should include quality metrics (Model-as-Judge scores) not just error counts.

Sandboxes with network access. A sandbox that isolates the filesystem but allows unrestricted network access is not a sandbox - it's a launchpad. Network scope should be limited to the specific endpoints the agent's tools require.

Conflating the Model-as-Judge with the task agent. The judge must be independent - a different model, ideally from a different provider, with no access to the task agent's system prompt or configuration. If the judge uses the same model as the task agent, they share the same blindspots.

Evaluating individual steps but not the aggregate plan. Each subtask passes guardrails and the judge. But the combined effect is harmful - a planning agent has decomposed a harmful objective into individually benign steps. The judge must evaluate multi-step plans holistically (EC-2.7), not just step by step.

Treating task completion as the quality metric. An agent that reports 100% completion with zero uncertainty is more suspicious than one that reports 85% with documented unknowns. Judge criteria must include faithfulness, analytical depth, and evidence quality - not just format compliance and completion rate (GV-02).

Ignoring latency as a security-relevant metric. Latency SLOs are not just a performance concern. An orchestration that takes 10x longer than expected may indicate a runaway loop, a deadlock, or an agent being manipulated into excessive processing. Latency monitoring feeds into anomaly detection.

Assessing reversibility per-action but not per-chain. Each action in a multi-step plan is individually approved, but the aggregate chain may be irreversible. Agent A sends an email (reversible within 60 seconds), Agent B updates a record (reversible), Agent C notifies an external party (irreversible). By the time Agent C acts, the 60-second window on Agent A's email has closed. The chain's reversibility decays over time and must be assessed as a whole before execution begins.

No failover for the agent everyone depends on. The most critical agent in the orchestration is often the one with no backup - because it was deployed as a singleton and nobody defined what happens when it's unavailable. Agent criticality should be assessed at design time, and critical agents must have a failover path: backup agent, graceful degradation, or controlled halt. "The orchestration waits indefinitely" is not a failover strategy.

Applying text guardrails to multimodal inter-agent data. When an image, audio file, or document crosses an agent boundary, text-based DLP and injection detection are insufficient. Each modality requires modality-specific validation at the receiving agent's boundary - not just at the system's external input layer.

Validating content safety but not data structure. Guardrails catch prompt injection and PII. The Model-as-Judge catches policy violations and goal drift. But neither catches a model output that returns {"status": "complete", "result": null} when downstream agents expect result to be a non-empty object. Structural validation - schema conformance, type correctness, field completeness - is a distinct concern from content safety and must be enforced separately at every agent boundary. Without it, parsing failures, type coercion bugs, and silent data corruption become the dominant failure mode in production multi-agent systems.

Ignoring token exhaustion as a security risk. Token exhaustion is not just a cost or performance concern; it's a security degradation path. As context fills, the model's attention to system prompt constraints weakens, hallucination rates increase, and instruction-following degrades. This is a dual failure: the agent gets worse and the Judge monitoring it gets worse at the same time. Treating context capacity as infinite, or relying on the model to "just handle it," means your guardrails silently weaken under load. Token budget monitoring (EC-1.9) and context rotation (EC-2.16) are security controls, not optimisations.

Treating serialisation as a solved problem. When an agent returns structured output (JSON, XML, function call arguments), the output is a string that must be parsed. Strict-mode parsing should be the default. Lenient parsers that silently coerce types, accept trailing commas, or ignore extra fields mask data integrity failures and can introduce injection vectors when untrusted content is deserialised into executable structures.