Skip to content

MASO Control Domain: Execution Control

Part of the MASO Framework · Control Specifications Covers: ASI02 (Tool Misuse) · ASI05 (Unexpected Code Execution) · ASI08 (Cascading Failures) · LLM05 (Improper Output Handling) · ASI07 (Insecure Inter-Agent Comms, structural validation) Also covers: CR-01 (Deadlock/Livelock) · CR-02 (Oscillation) · SM-01 (Cumulative Harm) · GV-02 (Metric Gaming) · OP-02 (Latency) · OP-03 (Partial Failure) · OP-04 (Token Exhaustion) · OP-04a (Agent Unavailability) · OP-05 (Irreversible Action Chains)

Principle

Every agent action is bounded: bounded by permission, bounded by impact, bounded by time. No single agent can cause unlimited damage. When an agent fails, the failure is contained to that agent. When errors cascade, automated circuit breakers engage before human response is required.

Execution control is where the PACE resilience methodology meets real-time operations. The controls in this domain define the triggers that move the system from Primary to Alternate and beyond.

Why This Matters in Multi-Agent Systems

Tool misuse compounds across agents. In a single-model system, a tool misuse event is contained to one context. In a multi-agent system, Agent A's misuse of Tool X produces output that becomes Agent B's input for Tool Y. The damage from chained tool misuse can far exceed what any single agent could accomplish alone.

Code execution pathways multiply. When agents generate and execute code, each agent is a potential entry point for code injection. If Agent A generates code that Agent B executes in its sandbox, the security boundary depends on both the generation controls (Agent A) and the execution controls (Agent B). A weakness in either is exploitable.

Cascading failures are the default, not the exception. Multi-agent systems are tightly coupled by design - agents depend on each other's outputs. A hallucination in one agent becomes a flawed plan in the next, becomes a destructive action in the third. Without explicit isolation, errors propagate at the speed of the orchestration.

Runaway loops consume resources exponentially. Two agents triggering each other in a cycle can generate exponential resource consumption. The loop may look like productive work to a naive monitor - each agent is calling tools, producing outputs, and delegating tasks - but the system is burning tokens and compute on a recursive dead end.

Single agent loss cascades through the orchestration (OP-04). When one agent in a multi-agent system becomes unavailable - model provider outage, sandbox crash, credential revocation - every agent that depends on its output is affected. Without explicit failover, a single agent failure degrades or halts the entire orchestration. The system's availability is determined by its least available component, not its most robust.

Irreversible actions compound across agent chains (OP-05). Agent A sends an email. Agent B deletes a record. Agent C makes an API call to a third party. Each action was individually approved, but the chain is collectively irreversible. When Agent D detects that Agent A's email was based on hallucinated data, the downstream actions cannot be undone. Reversibility must be assessed for the chain, not just per-action, and compensating controls must exist for actions that cannot be recalled.

Token exhaustion degrades agents and their monitors simultaneously (OP-04). As an agent's context window fills, attention dilution weakens system prompt instructions (including safety constraints, role boundaries, and tool restrictions) without any adversarial action. The lost-in-the-middle effect causes critical constraints to be functionally forgotten. Hallucination rates increase. Instruction-following degrades. This is a dual failure path: the agent gets worse, and the Model-as-Judge evaluating it gets worse at the same time if it's also accumulating context. A degraded Judge reviewing a degraded agent's output is a compounding failure that bypasses two control layers simultaneously. In multi-agent systems, each agent burns tokens independently (you can't see the degradation from the orchestrator's perspective), and retry loops accelerate exhaustion by consuming more context on each failed attempt. Prompt injection becomes easier as system prompt influence weakens, and blast radius caps may be ignored if the agent stops reliably following structured constraints. Token exhaustion is gradual, not binary: the agent doesn't crash, it gets subtly worse. This means PACE transitions may not trigger without explicit context utilisation monitoring.

Data integrity failures are silent and cumulative. Security guardrails catch injections. Content filters catch harmful output. But when Agent A returns a JSON object with a missing field and Agent B silently treats that field as null, the resulting action is wrong - not malicious, just wrong. In production multi-agent systems, the majority of runtime failures come not from security violations but from structural data integrity failures between components: malformed tool outputs, unexpected types, truncated responses, hallucinated field names, and partial results presented as complete. These failures are harder to detect than security violations because they don't trigger guardrails - the output is syntactically valid but semantically broken.

Controls by Tier

Tier 1 - Supervised

Control Requirement Implementation Notes
EC-1.1 Human approval gate Every write operation, external API call, and state-modifying action requires human approval System presents proposed action (tool, parameters, target) and waits for confirmation.
EC-1.2 Tool allow-lists Each agent has a defined list of permitted tools; unlisted tools are blocked Enforced at the guardrails layer.
EC-1.3 Per-agent rate limits Maximum actions per time window per agent Prevents runaway loops before human review catches them. Recommended: 100 calls/hr.
EC-1.4 Read auto-approval Read operations within scoped permissions proceed without human approval Establishes the efficiency baseline that Tier 2 will extend.
EC-1.5 Interaction timeout All agent negotiation sequences have a maximum turn count Recommended: 10 turns. Exceeding cap triggers deterministic resolution (orchestrator decides or task escalates to human). Prevents deadlock and livelock (CR-01).
EC-1.5a Agent spawn rate limit Maximum number of new agent instances the orchestrator can create per time window Prevents runaway spawning that exhausts compute, memory, or API quota. Recommended: define per-orchestration and per-time-window limits. Exceeding the spawn rate limit blocks new agent creation and triggers an alert.
EC-1.6 Reversibility assessment Every action is classified as reversible, time-bounded reversible, or irreversible before execution Irreversible actions require human approval (reinforces EC-1.1). Time-bounded reversible actions carry a reversal window (e.g., "email can be recalled within 60 seconds"). Classification is logged with each action (OP-05).
EC-1.7 Agent health check Each agent's availability is verified before task assignment Orchestrator confirms agent is responsive before delegating. If unavailable, task is queued or routed to an alternative. Prevents silent failure from assigning work to a dead agent (OP-04).
EC-1.8 Output format verification Agent outputs checked against expected format before delivery to the next component or human reviewer Basic validation: is the response valid JSON if JSON was expected? Are required top-level fields present? Is the response non-empty? Malformed outputs are rejected and the task retried or escalated - not silently passed downstream.
EC-1.9 Token budget monitoring Per-agent context utilisation tracked with tiered alerting thresholds Track context window consumption as an operational metric alongside cost and latency. Alert at 70% (info), 85% (warning), 95% (critical). At Tier 1, alerts are informational; operator decides response. Prevents silent degradation where guardrails weaken as context fills (OP-04).
EC-1.10 Retry budget caps Maximum retry count per task per agent Recommended: 3 retries. Each retry consumes additional context and degrades agent performance. After cap, agent reports explicit failure rather than continuing to spiral. Prevents the degradation loop where retries accelerate token exhaustion (OP-04).

Tier 2 - Managed

All Tier 1 controls remain active, plus:

Control Requirement Implementation Notes
EC-2.1 Action classification Every action classified as auto-approve, escalate, or block Classification considers: action type, target system, data classification, agent history, Model-as-Judge evaluation.
EC-2.2 Sandboxed execution Agents generating/executing code operate in isolated environments Per-agent sandbox with defined filesystem, network, and process boundaries. Sandbox destroyed and recreated after each execution.
EC-2.3 Blast radius caps Maximum impact scope per agent Max records modifiable, max financial value, max external API calls. Exceeding any cap triggers PACE escalation.
EC-2.4 Circuit breakers Agent paused on error threshold E.g., 3 guardrail blocks in 10 minutes triggers circuit breaker. Agent is paused, event is logged, monitoring evaluates next step.
EC-2.5 Model-as-Judge gate Evaluation model reviews all agent outputs before external commit Evaluates: factual accuracy, policy compliance, goal integrity (cross-ref PG-2.2), output safety, data leakage. Judge criteria include faithfulness and analytical depth, not just format compliance (GV-02).
EC-2.6 Decision commit protocol Decisions passing judge review are committed; reversal requires human authorisation or documented input change Prevents oscillation (CR-02). Tie-break rules defined for equal-weight alternatives.
EC-2.7 Aggregate harm assessment Judge evaluation includes full task plan context, not just individual agent output For multi-step plans, judge evaluates the whole plan before execution begins. Catches cumulative harm from individually benign subtasks (SM-01).
EC-2.7a Dry-run / simulation mode High-risk or first-time actions execute against a sandbox or staging environment before production commit The gateway routes the action to a simulation target, captures the result, and presents it for human review or Judge evaluation. Only after validation does the action execute against production. Mandatory for irreversible actions in new deployments where no behavioral baseline exists.
EC-2.8 Tool completion attestation Required tool calls defined per task; tool failure or skip produces explicit incomplete status Judge verifies all required tools completed before approving output. Prevents partial failure masquerading as success (OP-03).
EC-2.9 Latency SLOs and oversight SLA enforcement Per-orchestration end-to-end latency targets defined and monitored; maximum time before human review required is enforced per risk tier Documents which control layers operate synchronously (blocking) vs asynchronously (post-commit audit). Judge may run async for auto-approved actions to reduce latency. For escalated actions, a configurable oversight SLA defines the maximum wait time for human review; actions not reviewed within the SLA window fail safe (denied, not auto-approved).
EC-2.10 Agent failover Critical agents have a defined failover path: backup agent, graceful degradation, or controlled halt Failover activates automatically on health check failure (EC-1.7). Backup agents operate with the same NHI scope and tool allow-list as the primary. Orchestration continues in degraded mode if non-critical agents are unavailable; halts if critical agents are unavailable with no backup (OP-04).
EC-2.11 Chain reversibility assessment For multi-step plans, the Judge evaluates aggregate reversibility before execution begins If the plan contains irreversible actions, the Judge flags the irreversibility point and requires explicit human acknowledgement. Compensating actions must be defined for each irreversible step (e.g., correction email, reversal transaction, notification to affected party) (OP-05).
EC-2.12 Multimodal boundary validation When multimodal data (images, audio, video, documents) crosses an agent boundary, modality-specific guardrails are applied at the receiving agent Text-in-image injection, steganographic payloads, inaudible audio commands, and embedded document instructions are checked at each agent boundary, not just at system input. Cross-ref Multimodal Controls.
EC-2.13 Output schema enforcement Every agent declares the schema of its output; outputs are validated against this schema before delivery JSON Schema, Pydantic models, or equivalent. Validation checks: required fields present, types correct, enums within allowed values, string lengths within bounds, nested structures conform. Schema failures produce a structured error - not a silent pass with missing fields. Schemas are versioned and published in the agent manifest alongside tool declarations.
EC-2.14 Inter-agent data contracts Every agent-to-agent data transfer is validated against a declared contract at the receiving agent's boundary The contract specifies the expected input schema, required fields, acceptable value ranges, and maximum payload size. The receiving agent validates inbound data before processing - not the sending agent alone. This is the structural equivalent of zero-trust: trust nothing, parse strictly, validate everything. Contracts are enforced at the message bus or gateway layer, not by the agents themselves.
EC-2.15 Serialisation boundary validation Structured outputs (JSON, XML, function call arguments) are parsed with strict-mode deserialisation; no lenient parsing Strict mode: reject unknown fields, reject type mismatches (string where number expected), reject null for required fields, reject malformed escape sequences. No silent coercion. Deserialised objects are validated against the output schema (EC-2.13) immediately after parsing. Path traversal patterns in string fields (e.g., ../../etc/passwd) are caught by parameter constraints, not by the parser - but the parser must not silently accept them.
EC-2.16 Context rotation with structured state preservation Agents approaching context limits checkpoint structured state and resume with a clean context window Checkpoint captures: current goal, active constraints, accumulated decisions, pending actions, as typed structured fields (JSON schema), not free-text summaries. Free-text summarisation introduces semantic drift (EP-05): "must" becomes "should," qualifiers vanish, requirements soften. After checkpoint, context is flushed and the agent resumes with the structured state plus the original system prompt. The agent retains its work but gets a fresh attention budget. Rotation threshold: configurable, recommended at 75% context capacity (OP-04).
EC-2.17 Judge context isolation The Model-as-Judge manages its own context budget independently from the agents it evaluates The Judge must not share a context window with task agents. Judge context consumption is monitored separately (EC-1.9). If the Judge approaches its own context limit, this is a PACE trigger: a degraded Judge reviewing degraded agents is a correlated failure that bypasses two layers simultaneously. Judge rotation follows the same structured checkpoint protocol as task agents (EC-2.16) but is triggered independently (OP-04).

Tier 3 - Autonomous

All Tier 2 controls remain active, plus:

Control Requirement Implementation Notes
EC-3.1 Infrastructure-enforced blast radius Caps enforced at platform level, not by agent or orchestrator Agent cannot override its own blast radius through any mechanism. Analogous to OS-level resource limits.
EC-3.2 Self-healing circuit breakers Circuit breaker triggers PACE P→A automatically; backup agent activates; system returns to P on stabilisation Self-healing cycle limit: max 3 in 24 hours before auto-escalation to Contingency.
EC-3.3 Multi-model cross-validation High-consequence actions validated by Model-as-Judge AND an independent second model Disagreement between validators triggers human escalation.
EC-3.4 Time-boxing Every autonomous task has a maximum execution time Task not completed within time box → agent paused, state captured, task escalated. Prevents indefinite autonomous operation on drifted tasks.
EC-3.5 Automated rollback scope When integrity compromise is detected, automated rollback covers the compromised agent and all downstream actions that depended on its output Rollback scope is determined by the decision chain (OB-2.1). Downstream agents are notified. Actions that cannot be rolled back trigger compensating actions automatically. Human is notified of the rollback scope and any irreversible residual.
EC-3.6 Transformation integrity chain Data validated after each processing step in a multi-agent pipeline; cumulative integrity tracked end-to-end Each agent in a pipeline attests to the structural validity of its output. The attestation chain travels with the data (analogous to DP-3.4 data provenance but for structural integrity). If any agent in the chain produces output that fails schema validation, downstream processing halts immediately - not after the malformed data has been transformed two more times. Integrity violations are correlated with the specific pipeline step that introduced the corruption.

Action Classification Rules (Tier 2+)

The action classification engine is the core mechanism that replaces per-action human approval with risk-proportionate automation. Rules should be defined collaboratively between the AI security team and the business function that owns the agent system.

Auto-approve (no human involvement):

  • Read operations within the agent's scoped permissions
  • Write operations to internal staging areas (reversible, low-consequence)
  • Tool invocations within pre-approved parameter ranges
  • Actions that the Model-as-Judge approves and the blast radius cap is not at risk

Escalate (human approval required):

  • Write operations to production systems
  • Actions involving external parties (email sends, API calls to third-party services)
  • Irreversible operations (data deletion, financial transactions)
  • Actions flagged by the Model-as-Judge for any reason
  • Actions where the blast radius cap would exceed 50% of the defined maximum
  • First-time use of a tool by an agent (no baseline data)

Block (automatic denial):

  • Actions outside the agent's tool allow-list
  • Actions that violate the guardrails layer
  • Actions targeting systems not in the agent's scope
  • Actions during a PACE Alternate or Contingency phase that exceed the phase-specific restrictions

Deployment Topology: Evaluation Roles vs. Services

The MASO architecture describes evaluation roles as logically distinct components: tactical judge, strategic evaluator, meta-evaluator, observer, domain judges. This is correct for the security model: each role has a different threat profile, different OISpec, and different failure modes. They must be logically independent.

They do not need to be operationally independent. Logical separation and operational consolidation are compatible, as long as context isolation is maintained between evaluations.

How Roles Map to Services

Evaluation Role Logical Requirement Operational Reality
Tactical judge Must evaluate every agent action against its OISpec. Must use a different model from the task agent. A single distilled SLM sidecar process, co-located with the task agents. One process serves all tactical evaluations with per-agent OISpec routing.
Domain judges (fraud, security, compliance) Must evaluate from distinct policy perspectives. Verdicts must be independent. Can be a single SLM with multi-criteria evaluation prompts, producing structured verdicts per domain. Or: consolidated into the tactical SLM call with domain-specific criteria sections. Separate SLMs only if domain-specific fine-tuning requires different models.
Strategic evaluator Must assess combined outputs against workflow intent. Must have visibility across all agent outputs. A single LLM API call at phase boundaries. Not a persistent service. Invoked as a function, not deployed as an agent.
Meta-evaluator Must monitor judge drift independently from the judge and the task agents. A scheduled job (cron, Airflow, Step Functions) that injects known test cases into the tactical judge and measures accuracy. Runs daily or weekly. Not a persistent service.
Observer Must score anomalies continuously. Must be independent of task agent infrastructure at Tier 3. A metrics pipeline (Prometheus, CloudWatch, Datadog) consuming the decision chain log and computing anomaly scores. At Tier 3, runs on separate infrastructure. Not an LLM agent.

Example: Fraud Detection Deployment

Architecture diagrams suggest: 3 task agents + 1 tactical judge + 1 fraud judge + 1 compliance judge + 1 strategic evaluator + 1 meta-evaluator + 1 observer = 9 components

Actual deployment:

Service What It Runs Infrastructure
Task agent pod (×3) Fraud analysis agents, each with SLM sidecar 3 pods, each with task agent + SLM process
SLM sidecar (in each pod) Tactical evaluation + fraud domain + compliance domain (multi-criteria) Co-located with task agent. ~1GB RAM for INT4 model.
Strategic evaluation function Single LLM call at batch/phase boundaries Serverless function (Lambda, Cloud Functions). Invoked, not persistent.
Calibration pipeline Daily injection of test cases into SLM. Accuracy measured against known labels. Scheduled job. Runs once per day.
Monitoring Anomaly scoring from decision chain log. PACE trigger logic. Existing observability stack.

Actual services to operate: 3 pods + 1 serverless function + 1 scheduled job + existing monitoring. That is the operational footprint of the "9 component" architecture.

Context Isolation Requirements

Consolidating evaluation roles into fewer services is acceptable only if context isolation is maintained:

Requirement Why It Matters
Each evaluation call gets a fresh context A tactical evaluation of Agent A must not carry residual context from evaluating Agent B.
Domain evaluations are independent The fraud evaluation prompt must not include the compliance evaluation result, and vice versa. Verdicts must be produced independently before conflict resolution applies.
SLM sidecar does not share memory with the task agent The SLM runs in a separate process or container. It reads the task agent's output; it does not share the task agent's context window.
Meta-evaluation does not run on the same model instance as tactical evaluation The calibration pipeline calls the SLM from outside, as a client. It does not have access to the SLM's internal state.

Testing Criteria

Tier 1 Tests

Test ID Test Pass Criteria
EC-T1.1 Approval gate Submit a write operation. Confirm it blocks until human approval. Reject the approval and confirm the action is prevented.
EC-T1.2 Tool scope Attempt to invoke a tool not on the agent's allow-list. Guardrail blocks it.
EC-T1.3 Rate limit Submit actions exceeding the configured rate. Throttling engages.
EC-T1.4 Read auto-approval Submit a read operation within scope. Confirm it executes without human approval.
EC-T1.5 Interaction timeout Trigger a negotiation loop. Confirm the turn cap is enforced and resolution engages.
EC-T1.6 Role-based tool enforcement For each agent role (analyst, executor, critic), attempt to invoke tools assigned to a different role. All attempts blocked. (Amendment: CR-03)
EC-T1.7 Operator challenge rate Present operators with outputs containing deliberate errors. Measure challenge rate. Target: > 80% detection. (Amendment: HF-01)
EC-T1.8 Reversibility classification Submit a reversible action, a time-bounded reversible action, and an irreversible action. Verify each is classified correctly and the irreversible action requires human approval.
EC-T1.9 Agent health check Take an agent offline. Assign it a task. Verify the orchestrator detects unavailability and routes the task to an alternative or queues it.
EC-T1.10 Output format verification Agent returns malformed JSON (e.g., unclosed bracket, missing required field). Verify the output is rejected before reaching the next component or human reviewer. Retry or escalation triggered.
EC-T1.11 Token budget alerting Fill an agent's context to 70%, 85%, and 95%. Verify tiered alerts fire at each threshold with correct severity.
EC-T1.12 Retry budget cap Configure a retry cap of 3. Trigger 4 consecutive failures. Verify the agent reports explicit failure after the third retry rather than continuing.

Tier 2 Tests

Test ID Test Pass Criteria
EC-T2.1 Action classification Submit reads, low-consequence writes, high-consequence writes, and out-of-scope actions. Each is classified correctly.
EC-T2.2 Sandbox isolation From within an agent's sandbox, attempt to access the host filesystem, network outside allowed ranges, and other agent processes. All attempts blocked.
EC-T2.3 Blast radius cap Attempt to exceed a defined blast radius cap. System blocks the excess and triggers PACE escalation.
EC-T2.4 Circuit breaker Trigger the error threshold. Agent is paused within 30 seconds. Event is logged.
EC-T2.5 Model-as-Judge detection Submit known-bad outputs (policy violations, goal drift, data leakage). Measure judge detection rate. Target: > 95% for HIGH severity, > 80% for MEDIUM.
EC-T2.6 Interaction timeout Trigger a negotiation loop between two agents exceeding the turn cap. Deterministic resolution engages.
EC-T2.7 Decision oscillation Two agents reverse a decision 3 times. Decision commit protocol detects oscillation and locks the decision or escalates.
EC-T2.8 Aggregate harm Submit a multi-step plan where each step is benign individually but the aggregate is harmful. Judge evaluates the plan holistically and flags the risk.
EC-T2.9 Tool completion attestation Remove a required tool call from the execution chain. Judge blocks the output due to missing attestation.
EC-T2.10 Latency SLO breach Trigger an orchestration that exceeds the defined latency SLO. Alert fires before the SLO is breached by 2x.
EC-T2.11 Agent failover Take a critical agent offline. Verify backup agent activates and the orchestration continues in degraded mode. Verify a non-critical agent failure allows the orchestration to complete without the missing agent.
EC-T2.12 Chain reversibility Submit a multi-step plan containing an irreversible action at step 3 of 5. Judge flags the irreversibility point before execution begins and requires human acknowledgement.
EC-T2.13 Multimodal boundary Send an image containing text-in-image injection from Agent A to Agent B. Verify Agent B's boundary guardrails detect the injection before processing.
EC-T2.14 Output schema enforcement Agent produces output missing a required field. Verify schema validation catches the violation before delivery. Agent produces output with a wrong type (string where integer expected). Verify rejection. Agent produces valid output conforming to schema. Verify acceptance.
EC-T2.15 Inter-agent data contract Agent A sends Agent B a payload missing a required field defined in B's input contract. Verify B rejects the payload at the boundary before processing begins. Agent A sends a payload exceeding the declared maximum size. Verify rejection.
EC-T2.16 Serialisation safety Inject malformed JSON with: (a) type mismatch (string "42" where integer 42 expected), (b) null for required field, (c) unknown/extra fields, (d) malformed escape sequence. Verify strict-mode parser rejects all four. No silent coercion.
EC-T2.17 Context rotation Fill an agent's context to 80%. Trigger context rotation. Verify: (a) structured state is checkpointed with typed fields, (b) context is flushed, (c) agent resumes with original system prompt + structured state, (d) agent correctly continues previous task.
EC-T2.18 Context rotation fidelity After context rotation, verify that critical constraints ("must," "never," "exactly") from the original task are preserved in the structured checkpoint, not softened through summarisation.
EC-T2.19 Judge context isolation Fill the Judge's context to 85% while task agents are below 50%. Verify the Judge triggers its own PACE transition independently of the task agents' status.
EC-T2.20 Correlated exhaustion detection Fill both a task agent and its Judge to 85% simultaneously. Verify PACE escalation is more aggressive than when either is exhausted alone (P→A at minimum, P→C for Tier 3).

Tier 3 Tests

Test ID Test Pass Criteria
EC-T3.1 Infrastructure blast radius From within the agent's execution environment, attempt every known technique to exceed the blast radius. Infrastructure blocks all attempts.
EC-T3.2 Self-healing cycle Inject anomalous behavior. Confirm P→A transition, backup activation, and automatic A→P return on stabilisation.
EC-T3.3 Self-healing limit Trigger the self-healing cycle limit (3 in 24hr). Confirm automatic escalation to Contingency.
EC-T3.4 Cross-validation disagreement Submit an action where one validator approves and the other rejects. Confirm human escalation.
EC-T3.5 Time-box expiry Start a task with a tight time box. Let it expire. Confirm pause, state capture, and escalation.
EC-T3.6 Automated rollback scope Inject a hallucination at Agent A that propagates to Agents B and C. Trigger integrity detection. Verify automated rollback covers Agent A's action and all downstream work from B and C. Verify irreversible residual is reported to the human.
EC-T3.7 Transformation integrity chain In a three-agent pipeline (A → B → C), Agent B produces output with a schema violation. Verify: (a) Agent C does not receive the malformed data, (b) the violation is attributed to Agent B's processing step specifically, (c) the integrity chain log identifies the exact point of corruption.

Maturity Indicators

Level Indicator
Initial Agents can invoke any available tool. No rate limits. No blast radius caps. Human reviews outputs manually with no systematic process.
Managed Tool allow-lists defined. Human approval gate for all writes. Rate limits configured. Actions logged with approval status.
Defined Action classification engine operational. Sandboxed execution. Blast radius caps. Circuit breakers. Model-as-Judge gate.
Quantitatively Managed Classification accuracy measured. Judge false positive/negative rates tracked and reported. Circuit breaker engagement frequency monitored. Blast radius cap utilisation tracked per agent.
Optimising Infrastructure-enforced blast radius. Self-healing P↔A cycles. Multi-model cross-validation. Time-boxing. Action classification rules tuned based on operational data.

Common Pitfalls

Blast radius caps that are too generous. A cap of "10,000 records per hour" for an agent that normally modifies 50 records per hour is not a cap - it's a ceiling so high it provides no protection. Caps should be set at 2–3x the expected peak volume, not at theoretical maximums.

Circuit breakers that only count errors. An agent that never triggers guardrails but produces subtly incorrect output is more dangerous than one that fails loudly. Circuit breakers should include quality metrics (Model-as-Judge scores) not just error counts.

Sandboxes with network access. A sandbox that isolates the filesystem but allows unrestricted network access is not a sandbox - it's a launchpad. Network scope should be limited to the specific endpoints the agent's tools require.

Conflating the Model-as-Judge with the task agent. The judge must be independent - a different model, ideally from a different provider, with no access to the task agent's system prompt or configuration. If the judge uses the same model as the task agent, they share the same blindspots.

Evaluating individual steps but not the aggregate plan. Each subtask passes guardrails and the judge. But the combined effect is harmful - a planning agent has decomposed a harmful objective into individually benign steps. The judge must evaluate multi-step plans holistically (EC-2.7), not just step by step.

Treating task completion as the quality metric. An agent that reports 100% completion with zero uncertainty is more suspicious than one that reports 85% with documented unknowns. Judge criteria must include faithfulness, analytical depth, and evidence quality - not just format compliance and completion rate (GV-02).

Ignoring latency as a security-relevant metric. Latency SLOs are not just a performance concern. An orchestration that takes 10x longer than expected may indicate a runaway loop, a deadlock, or an agent being manipulated into excessive processing. Latency monitoring feeds into anomaly detection.

Assessing reversibility per-action but not per-chain. Each action in a multi-step plan is individually approved, but the aggregate chain may be irreversible. Agent A sends an email (reversible within 60 seconds), Agent B updates a record (reversible), Agent C notifies an external party (irreversible). By the time Agent C acts, the 60-second window on Agent A's email has closed. The chain's reversibility decays over time and must be assessed as a whole before execution begins.

No failover for the agent everyone depends on. The most critical agent in the orchestration is often the one with no backup - because it was deployed as a singleton and nobody defined what happens when it's unavailable. Agent criticality should be assessed at design time, and critical agents must have a failover path: backup agent, graceful degradation, or controlled halt. "The orchestration waits indefinitely" is not a failover strategy.

Applying text guardrails to multimodal inter-agent data. When an image, audio file, or document crosses an agent boundary, text-based DLP and injection detection are insufficient. Each modality requires modality-specific validation at the receiving agent's boundary - not just at the system's external input layer.

Validating content safety but not data structure. Guardrails catch prompt injection and PII. The Model-as-Judge catches policy violations and goal drift. But neither catches a model output that returns {"status": "complete", "result": null} when downstream agents expect result to be a non-empty object. Structural validation - schema conformance, type correctness, field completeness - is a distinct concern from content safety and must be enforced separately at every agent boundary. Without it, parsing failures, type coercion bugs, and silent data corruption become the dominant failure mode in production multi-agent systems.

Ignoring token exhaustion as a security risk. Token exhaustion is not just a cost or performance concern; it's a security degradation path. As context fills, the model's attention to system prompt constraints weakens, hallucination rates increase, and instruction-following degrades. This is a dual failure: the agent gets worse and the Judge monitoring it gets worse at the same time. Treating context capacity as infinite, or relying on the model to "just handle it," means your guardrails silently weaken under load. Token budget monitoring (EC-1.9) and context rotation (EC-2.16) are security controls, not optimisations.

Treating serialisation as a solved problem. When an agent returns structured output (JSON, XML, function call arguments), the output is a string that must be parsed. Strict-mode parsing should be the default. Lenient parsers that silently coerce types, accept trailing commas, or ignore extra fields mask data integrity failures and can introduce injection vectors when untrusted content is deserialised into executable structures.