MASO Control Domain: Privileged Agent Governance¶

Part of the MASO Framework · Control Specifications Extends: Execution Control · Observability · Identity & Access Covers: Orchestrator security · Judge governance · Observer assurance · Nested orchestration

Principle¶

Any agent with authority over other agents - to plan, evaluate, monitor, or terminate - requires controls proportionate to that authority. Orchestrators, evaluators, and observers are not exempt from the control architecture. They are subject to a version of it that matches their specific threat model.

The controls in other MASO domains secure task agents against each other and against external threats. This domain secures the system against failures in the agents that govern it.

Why This Matters¶

Orchestrators influence outcomes through decisions, not tools. A compromised orchestrator doesn't need tool access to cause harm. It causes harm through task decomposition, agent selection, and result interpretation. Existing controls (IA-2.5, tool scoping) don't address this class of threat because the orchestrator's power is in planning, not execution.

Evaluators are single points of trust. The Model-as-Judge approves or blocks every task agent output. If Judge criteria drift, or the Judge model is compromised, Layer 2 collapses silently - every output passes because the standard has shifted. The system reports healthy while the controls are effectively disabled.

Observers with kill switch authority can be weaponised. An over-sensitive observer repeatedly triggers emergency shutdowns, degrading the system into permanent contingency. Operators disable it to restore service. The system now runs without its safety net - and nobody formally accepted that risk.

Nested orchestration multiplies these risks. In hierarchical topologies, a compromised sub-orchestrator affects its entire agent cluster. Per-agent blast radius caps don't contain aggregate harm from a coordinated sub-tree.

Agent Role Classification¶

Before applying controls, classify each agent in the orchestration by its governance role:

Role	Authority	Examples	Threat Profile
Task agent	Executes scoped actions using tools	Analyst, writer, code executor, retriever	Tool misuse, data leakage, injection
Orchestrator	Plans, selects agents, sequences execution, interprets results	Planner, coordinator, router, aggregator	Goal subversion, malicious decomposition, selective suppression
Evaluator	Approves, blocks, or escalates other agents' outputs	Model-as-Judge, critic, reviewer, quality gate	Criteria drift, adversarial bypass, calibration decay
Observer	Monitors, scores anomalies, triggers escalation or kill switch	Monitoring agent, anomaly detector, safety agent	Miscalibration, over/under-sensitivity, weaponised shutdown

An agent may hold multiple roles. An orchestrator that also evaluates intermediate outputs is both orchestrator and evaluator - and requires controls from both categories.

Controls by Tier¶

Tier 1 - Supervised¶

Control	Requirement	Implementation Notes
PA-1.1 Role declaration	Every agent's governance role (task, orchestrator, evaluator, observer) is explicitly declared in configuration	No implicit role assumption. An orchestrator that also evaluates must be declared as both.
PA-1.2 Orchestrator plan logging	Every task decomposition and agent selection decision by the orchestrator is logged with reasoning	Enables human review of orchestrator planning decisions during supervised operation.
PA-1.3 Judge decision logging	Every Judge approve/escalate/block decision is logged with the criteria applied and the reasoning	Builds the baseline data set required for Tier 2 calibration.
PA-1.4 Privileged agent inventory	Maintain a register of all agents operating in orchestrator, evaluator, or observer roles	Updated when the agent topology changes. Reviewed monthly.

What you're building at Tier 1: Visibility. You can answer: which agents have authority over other agents, what decisions are they making, and why?

Tier 2 - Managed¶

All Tier 1 controls remain active, plus:

Control	Requirement	Implementation Notes
PA-2.1 Orchestrator intent verification	Orchestrator task decomposition is verified against the original user intent by an independent model	The model verifying orchestrator plans must not be the orchestrator itself. Catches goal subversion through malicious decomposition.
PA-2.2 Judge calibration testing	Monthly calibration: present the Judge with known-good and known-bad test cases; measure detection accuracy	Minimum threshold: > 95% detection of HIGH severity violations, > 80% MEDIUM. Calibration results logged and trended.
PA-2.3 Judge criteria versioning	All Judge evaluation criteria are version-controlled with change history and approval trail	Criteria changes require review by a human with domain knowledge. No silent updates to what the Judge evaluates against.
PA-2.4 Judge disagreement protocol	When multiple evaluators reach conflicting verdicts on the same action or output, a defined resolution procedure activates	See Inter-Judge Conflict Resolution below for the full protocol. Applies to both multi-model cross-validation (EC-3.3) and multi-domain evaluation (e.g. fraud judge vs. security judge vs. compliance judge evaluating the same action).
PA-2.5 Observer precision monitoring	Observer false positive and false negative rates tracked and reported monthly	Target: < 5% false positive rate on PACE escalation triggers. False negatives measured through red team injection (PA-2.8).
PA-2.6 Kill switch dual authorisation	Observer kill switch requires secondary confirmation: either a second independent signal or a human within a defined window	Prevents single miscalibrated observer from taking down the system. Exception: if the confirming mechanism is unavailable, the kill switch fires after the defined window (fail-safe).
PA-2.7 Orchestrator behavioral baseline	Orchestrator decisions (agent selection, task decomposition patterns, retry frequency) baselined and monitored for drift	Uses the same anomaly scoring model as task agents (OB-2.2) but with orchestrator-specific signals.
PA-2.8 Privileged agent red team	Quarterly adversarial testing of orchestrator, judge, and observer	Test scenarios: inject a goal-subverting decomposition plan, present the Judge with adversarial inputs designed to bypass criteria, trigger known anomalies to verify observer detection.

What you're building at Tier 2: Measurement. You can answer: are our privileged agents still performing correctly, and how do we know?

Tier 3 - Autonomous¶

All Tier 2 controls remain active, plus:

Control	Requirement	Implementation Notes
PA-3.1 Orchestrator output evaluation	Orchestrator's aggregated output - not just task agent outputs - is evaluated by the Judge	Catches failures smoothed over during aggregation: intermediate hallucinations, dropped caveats, selective emphasis.
PA-3.2 Nested orchestration scoping	Each orchestration level has explicit permission boundaries; sub-orchestrators cannot exceed the scope delegated to them	Extends IA-2.4 (no transitive permissions) and IA-3.3 (delegation contracts) to orchestrator-to-sub-orchestrator boundaries.
PA-3.3 Sub-tree blast radius	Blast radius caps apply per orchestration sub-tree, not just per individual agent	A sub-orchestrator managing 5 agents with individual caps of 100 records each does not implicitly have a 500-record aggregate cap. Sub-tree cap is defined independently.
PA-3.4 Judge model rotation	Judge model is periodically rotated or supplemented with an alternative to prevent long-term adversarial adaptation	Rotation schedule based on risk tier. Prevents attackers from optimising against a static evaluator.
PA-3.5 Continuous calibration	Judge calibration runs continuously (daily injection of known test cases) rather than monthly	Automated. Results feed into anomaly scoring. Calibration failure triggers PACE escalation.
PA-3.6 Observer self-test	Observer runs a self-diagnostic cycle at defined intervals, injecting synthetic anomalies into its own detection pipeline	Confirms the observer's detection capability is operational. Failure to detect the synthetic anomaly triggers an alert to a human, not to the observer itself.
PA-3.7 Cross-level evaluation	In nested topologies, evaluation occurs at each orchestration level, not just at the leaf (task agent) level	Each sub-orchestrator's aggregation decisions are evaluated before results flow up to the parent orchestrator.

What you're building at Tier 3: Assured autonomy. Privileged agents are continuously verified, not just initially configured and assumed correct.

Inter-Judge Conflict Resolution¶

When a workflow uses multiple judges evaluating the same action from different perspectives (fraud detection, security policy, compliance, data protection), those judges will disagree. This is not a failure; it is the expected behaviour of independent evaluation. A fraud judge says "flag this transaction." A security judge says "transaction is within policy." A compliance judge says "block, insufficient documentation." Which verdict wins?

Without a defined resolution protocol, teams either ignore conflicts (the loudest judge wins) or escalate everything to humans (defeating the purpose of automated evaluation). Both outcomes erode trust in the evaluation architecture.

The Problem of Multi-Domain Evaluation¶

Multi-domain evaluation is different from multi-model cross-validation (EC-3.3). Cross-validation asks two models the same question and flags when they disagree. Multi-domain evaluation asks different questions about the same action:

Evaluation Domain	Question Being Asked	Evaluation Timing
Fraud	Is this transaction fraudulent?	Synchronous (operational)
Security	Does this action violate security policy?	Synchronous (operational)
Compliance	Does this action satisfy regulatory requirements?	Synchronous (operational)
Data protection	Does this action expose or mishandle sensitive data?	Synchronous (operational)
Intent alignment	Does this action satisfy the agent's declared OISpec?	Synchronous (operational)
Ethics / bias / fairness	Does this action align with organisational values and fairness standards?	Asynchronous (policy-driven, see below)

These are not redundant checks. They evaluate orthogonal concerns. A transaction can be non-fraudulent but non-compliant. An action can be policy-compliant but misaligned with intent. Conflict between domain judges is meaningful signal, not noise.

The operational domains (fraud through intent alignment) participate in the real-time conflict resolution protocol below. The policy-driven domains (ethics, bias, fairness) run post-action and produce advisories, not blocks. See Policy-Driven Evaluation Domains for the rationale and implementation pattern.

Resolution Protocol¶

Step 1: Declare judge precedence at design time¶

Every workflow OISpec must include a judge precedence order that defines which evaluation domain takes priority when verdicts conflict. This is not a technical decision. It is a business and regulatory decision made by the workflow owner.

{
  "judge_precedence": {
    "order": ["compliance", "data_protection", "security", "fraud", "intent_alignment"],
    "override_rules": [
      {
        "condition": "any_judge_verdict == block",
        "action": "block",
        "rationale": "Any domain can block; no domain can unblock what another has blocked"
      },
      {
        "condition": "fraud == flag AND security == approve",
        "action": "escalate",
        "rationale": "Domain disagreement on the same action requires human arbitration"
      }
    ]
  }
}

Step 2: Apply the "most restrictive wins" default¶

Unless the precedence order specifies otherwise, the default resolution is: the most restrictive verdict wins. If any judge says block, the action is blocked. If any judge says escalate while others approve, the action is escalated.

Fraud Judge	Security Judge	Compliance Judge	Resolution
Approve	Approve	Approve	Approve
Approve	Approve	Flag	Escalate
Flag	Approve	Approve	Escalate
Block	Approve	Approve	Block
Flag	Flag	Approve	Escalate (multi-domain concern)
Block	Flag	Block	Block

This is conservative by design. False positives from multi-domain disagreement are preferable to false negatives where a legitimate concern is overridden by another domain's approval.

Step 2a: Time-constrained conflicts with competing actions¶

The "most restrictive wins" default handles simple approve/flag/block disagreements. Real operational conflicts are harder. When fraud is in progress, judges may agree that action is needed but disagree on which action:

Judge	Verdict	Prescribed Action
Fraud judge	Flag: active fraud detected	Pursue the money. Reverse the transaction. Notify the fraud team.
Security judge	Block: security violation in progress	Freeze the account. Revoke session credentials. Isolate the compromised endpoint.
Compliance judge	Block: regulatory hold required	Place transaction on hold for the maximum permissible period. Gather documentation.

These are not contradictory verdicts. They are competing priorities with a shared urgency. The fraud judge wants to chase the money. The security judge wants to contain the breach. The compliance judge wants to preserve the audit trail. All three are legitimate, and delay harms all of them.

Resolution for time-constrained conflicts:

1. Security containment takes precedence over fraud pursuit. If a security violation is active (compromised credentials, unauthorized access, active breach), the security action executes first. You cannot pursue stolen funds through a compromised channel. Containment is the prerequisite for everything else.

2. Parallel degraded actions where possible. Once the security action has executed (account frozen, session revoked), the fraud and compliance actions can proceed in a degraded mode that respects the security boundary:

After Security Containment	Degraded Fraud Action	Degraded Compliance Action
Account frozen	Initiate recovery through the fraud team (not through the compromised agent). Compensate the customer directly if the transaction is confirmed fraudulent.	Regulatory hold is satisfied by the freeze. Documentation gathered from the immutable audit trail.
Session revoked	Chase the destination account through inter-bank channels.	File the required regulatory notification within the statutory window.
Endpoint isolated	Monitor for further exfiltration attempts from the isolated endpoint.	Preserve forensic evidence for regulatory inquiry.

3. Time-bounded resolution window. When judges prescribe competing actions, the orchestrator applies a resolution window:

Risk Level	Resolution Window	If No Resolution
CRITICAL (active fraud + active breach)	Security action executes immediately. Other actions degraded within 60 seconds.	Human escalation. Transactions held at maximum permissible duration.
HIGH (suspected fraud, no active breach)	Transactions held for review. Human arbitration within 15 minutes.	Most restrictive action (hold) persists. Risk of loss accepted if no human responds.
MEDIUM	Standard escalation. Human arbitration within 1 hour.	Automated resolution per precedence order.

4. Accept residual risk explicitly. If the resolution window expires without human arbitration, the system must either:

Apply the most restrictive action and accept the operational impact (frozen accounts, delayed transactions, customer friction), or
Release the hold and accept the risk of loss, with the decision logged and attributed to the workflow owner.

There is no silent default. The system either acts conservatively or accepts risk explicitly. It does not quietly let a hold expire.

The workflow OISpec must declare which of these two defaults applies for each risk level. This is a business decision: "We would rather freeze an innocent customer's account for 24 hours than lose $50,000 to fraud" vs. "We would rather accept a $500 loss than freeze a customer's account for more than 2 hours." Both are legitimate. Neither should be left to the orchestrator to decide at runtime.

{
  "conflict_resolution": {
    "time_constrained": {
      "security_breach_active": {
        "primary_action": "security_containment",
        "parallel_degraded": true,
        "resolution_window_seconds": 60,
        "expiry_default": "most_restrictive"
      },
      "fraud_suspected_no_breach": {
        "primary_action": "hold_transaction",
        "human_arbitration_window_minutes": 15,
        "expiry_default": "accept_risk_with_logging",
        "max_hold_duration": "regulatory_maximum"
      }
    }
  }
}

Step 3: Log the conflict, not just the resolution¶

Every inter-judge conflict is logged with:

All judge verdicts with reasoning
The resolution applied (precedence rule or default)
Whether the conflict was resolved automatically or escalated to a human
The human's decision (if escalated) and their reasoning

This creates the data set needed to tune precedence rules over time. If a specific conflict pattern is consistently resolved the same way by humans, that resolution can be automated.

Step 4: Track conflict patterns¶

Persistent disagreement between two judges on the same class of action indicates one of three problems:

Pattern	Likely Cause	Response
Fraud flags what security approves, repeatedly	Different risk thresholds or overlapping scope	Align evaluation criteria between domains
Compliance blocks what all other judges approve	Compliance criteria are stricter than operational policy	Business decision: tighten operational policy or accept the compliance overhead
Two judges consistently contradict on edge cases	Ambiguous evaluation criteria	Sharpen the OISpec for both judges

Conflict rate is a judge health metric. A conflict rate above 15% between any two judges indicates a criteria alignment problem, not a healthy diversity of opinion.

Policy-Driven Evaluation Domains: Ethics, Bias, and Fairness¶

Not all evaluation domains belong on the same decision path. Fraud, security, and compliance are operational domains: they have measurable criteria, they require real-time verdicts, and their thresholds are set by regulation or technical standards. A transaction either exceeds the velocity threshold or it does not. A credential is either compromised or it is not.

Ethics, bias, and fairness are different. They are policy-driven evaluation domains where:

Criteria are set by organisational values, not by regulation or technical measurement
Reasonable people disagree on what constitutes a violation
Context changes the evaluation (the same output may be appropriate in one jurisdiction and inappropriate in another)
An LLM evaluating "is this biased?" is applying its own training biases to detect bias, a circular problem that operational domains do not face

These domains still need evaluation. But the evaluation mechanism is different from real-time operational judging.

How Policy-Driven Evaluation Works¶

Characteristic	Operational Domains (fraud, security, compliance)	Policy-Driven Domains (ethics, bias, fairness)
Criteria source	Regulation, technical standards, measurable thresholds	Organisational policy, values statements, jurisdiction-specific norms
Evaluation timing	Real-time (synchronous or near-synchronous)	Post-action (asynchronous), with alert to HITL
Verdict type	Approve/flag/block (actionable immediately)	Warning/advisory (informs human review)
Who defines the rules	Regulators, security teams, compliance officers	Ethics boards, diversity committees, legal counsel, organisational leadership
LLM reliability	High for measurable criteria (velocity, credential status, documentation presence)	Low for subjective judgments (fairness, cultural sensitivity, implicit bias)
Failure mode	False negatives: missed fraud, missed breach	Systematic bias: the evaluator reproduces the biases it is supposed to detect

Implementation Pattern¶

Policy-driven evaluation is an offline monitoring and evaluation process, not an inline judge. It sits outside the direct agent architecture, consuming signals from the runtime system alongside external sources that the agent architecture has no visibility into.

Agent architecture (runtime):
  → Operational judges (sync): fraud, security, compliance → approve/flag/block
  → Action committed (or blocked by operational judges)
  → Decision chain log captures full audit trail

Offline monitoring and evaluation (separate process):
  → Consumes: decision chain logs, agent outputs, outcome data
  → Consumes: external signals (customer feedback, complaints, appeal outcomes,
     regulatory correspondence, demographic outcome distributions)
  → Produces: advisory reports, pattern alerts, portfolio-level analysis
  → Surfaces: warnings to human reviewers, ethics board, compliance
  → Feeds: organisational policy updates, OISpec revisions, guardrail tuning

Why offline, not inline?

Blocking on subjective criteria creates unpredictable friction. A bias evaluator that blocks 5% of legitimate transactions based on ambiguous criteria will be disabled within a week. Offline evaluation with human review preserves the evaluation without creating operational friction.
The most important signals come from outside the agent architecture. Customer complaints, appeal outcomes, regulatory feedback, demographic outcome data, and adverse action challenges are external signals that no inline judge can access. A customer who was denied credit and successfully appeals provides ground truth that no amount of LLM self-evaluation can replicate.
Portfolio-level detection is more reliable than per-transaction detection. A single decision may not be detectably biased. A pattern of 10,000 decisions that systematically disadvantages a protected class is detectable through statistical analysis. Offline evaluation enables this portfolio view.
LLMs are unreliable evaluators of their own biases. An LLM asked "is this output biased?" may say no, because the bias is in the model's own training data. Statistical monitoring of outcomes across protected classes is more reliable than per-output LLM evaluation.

External Signal Sources¶

Policy-driven evaluation is only as good as the data it consumes. The runtime decision chain provides the agent's view. External sources provide the world's view:

Signal Source	What It Reveals	How It Integrates
Customer feedback and complaints	Outcomes perceived as unfair, unexplained, or harmful by the affected party	Complaint categorisation feeds into the policy evaluation pipeline. Spikes in specific complaint categories trigger investigation.
Appeal and dispute outcomes	Ground truth on whether automated decisions were correct	Appeal overturn rates per demographic group, per decision category. Systematic overturn patterns indicate bias the inline judges missed.
Regulatory correspondence	Regulator concerns, examination findings, enforcement signals	Mapped to specific agent workflows and evaluation criteria. Triggers OISpec or guardrail revision.
Demographic outcome distributions	Statistical fairness across protected classes	Approval/denial rates, risk scores, pricing outcomes segmented by protected class. Disparity above threshold triggers investigation (not automated action).
Employee and operator feedback	Concerns from humans working with the agent system	Operators who notice patterns (e.g. "the system seems to flag these cases more often") provide early warning before statistical evidence accumulates.
Ombudsman or mediator findings	Independent third-party assessment of disputed decisions	External validation of whether the agent system's reasoning is defensible.
Market and peer benchmarking	Whether the organisation's outcomes are outliers relative to industry norms	If the organisation's denial rate for a demographic group is 3x the industry average, that is a signal regardless of whether the agent's per-decision reasoning appears sound.

These signals are not available to inline judges. They accumulate over time. They require human interpretation. They are the foundation of meaningful ethics and fairness evaluation, and they belong in a broader monitoring process, not in a synchronous evaluation gate.

What Organisations Must Define¶

The framework provides the monitoring mechanism and the integration points for external signals. The organisation provides the policy and the governance structure. This means:

Organisation Responsibility	What It Covers
Ethics policy	What constitutes an ethical violation in the organisation's context. Which outputs require ethics review. What the response is when a violation is detected. This is an organisational document, not a technical specification.
Bias detection criteria	Which protected classes are monitored. What statistical thresholds trigger investigation (e.g. approval rate disparity >5% between groups). What external data sources feed the monitoring pipeline (complaints, appeals, demographic outcome data).
Fairness standards	What "fair" means for the organisation's specific use case. Whether fairness is measured by equal treatment, equal outcomes, or another standard. This varies by jurisdiction (EU AI Act vs. US civil rights law vs. other frameworks).
External signal integration	Which external sources are connected to the monitoring pipeline (customer feedback systems, complaint management, ombudsman findings). Who is responsible for feeding these signals into the evaluation process. What the SLA is for incorporating new external evidence.
Review cadence and governance	How often monitoring reports are reviewed. Who reviews them (ethics board, diversity committee, legal counsel, organisational leadership). What triggers an immediate review (regulatory correspondence, complaint spike, appeal overturn rate breach) vs. periodic aggregate review.
Remediation process	What happens when systematic bias is detected. Whether affected decisions are reversed or compensated. Who is notified (affected customers, regulators, board). What changes to the agent configuration, OISpecs, or guardrails. How the remediation is verified.

Integration with the Evaluation Stack¶

Policy-driven evaluation does not replace operational evaluation. It runs outside the agent architecture as a broader monitoring process:

Layer	Operational (inline, sync)	Policy-driven (offline, async)
Guardrails	Input/output validation, tool scoping	Protected-class keyword detection as a flagging signal (not a block)
SLM sidecar	Tactical evaluation against OISpec, fraud/security/compliance domain criteria	Not used. LLMs are unreliable for subjective policy evaluation at the per-action level.
Cloud Judge	High-risk action evaluation (CRITICAL tier, synchronous)	Not used inline. May be used offline for sampled retrospective evaluation, but external signals (complaints, appeals) are more reliable ground truth.
Human review	Escalated operational decisions	Periodic review of monitoring reports. Investigation of patterns surfaced by statistical analysis and external signals.
Statistical monitoring	Anomaly scoring, drift detection	Outcome distribution analysis across protected classes. Disparity alerting. The primary detection mechanism.
External signal pipeline	Not applicable (inline architecture has no visibility)	Customer feedback, complaints, appeals, regulatory correspondence, demographic outcome data, ombudsman findings. The ground truth that the inline architecture cannot access.

The statistical monitoring component combined with external signals is the most effective layer for bias and fairness detection. It does not evaluate individual outputs. It monitors the aggregate distribution of outcomes, correlates with external feedback, and alerts when patterns emerge that no per-action evaluator could detect.

Do not make ethics evaluation a synchronous gate

The temptation to add an "ethics judge" to the synchronous evaluation path is understandable. Resist it. An LLM-based ethics evaluator running synchronously will produce false positives on ambiguous cases, creating operational friction that leads teams to disable it. It will produce false negatives on systematic biases, because it shares the same training biases as the task agent. Post-action statistical monitoring with human review is more reliable and more durable than per-action LLM-based ethics evaluation.

What This Does Not Solve¶

Precedence order is a policy decision, not a technical one. The framework defines the mechanism. The organisation decides the policy. In financial services, compliance typically takes precedence. In healthcare, patient safety takes precedence. In security operations, the security domain takes precedence. There is no universal answer.

Judges can agree and still be wrong. Multi-domain evaluation reduces the risk of single-domain blind spots, but if all judges share a common assumption (e.g. the same training data bias), they can unanimously approve something they should all flag. This is why judge model diversity (Judge Assurance, Control 2) and adversarial testing (PA-2.8) remain necessary even with multi-domain evaluation.

Recognising Judge Proliferation¶

The evaluation architecture can look alarming on paper. A workflow with 5 task agents, a tactical judge, a strategic evaluator, a meta-evaluator, an observer, and 3 domain-specific judges appears to require 12 running services. Teams that read the architecture diagrams literally may perceive "judge hell": an uncontrollable proliferation of evaluation agents that costs more than the system it protects.

This perception is understandable. It is also based on a misreading of the architecture. The framework describes evaluation roles, not evaluation services. The distinction matters.

Roles vs. Services¶

Evaluation Role	What It Does	How It Deploys
Tactical judge	Evaluates each agent action against its OISpec	A distilled SLM sidecar (10-50ms, infrastructure cost only). Not a separate service.
Strategic evaluator	Assesses combined agent outputs against workflow intent	A single LLM call at phase boundaries. A batch job, not a persistent agent.
Meta-evaluator	Monitors judge drift against judge OISpec	A scheduled calibration pipeline (daily/weekly). Injects known test cases and measures accuracy.
Observer	Anomaly scoring, PACE escalation	A metrics pipeline feeding the anomaly scoring model. Existing monitoring infrastructure.
Domain judges (fraud, security, compliance)	Evaluates actions from a specific policy perspective	Can be consolidated into a single evaluation call with structured multi-domain criteria. Or separate SLM sidecars if latency requires it.

A fraud detection workflow at Tier 2 with SLM sidecars requires:

1 SLM sidecar process (tactical evaluation, possibly multi-domain)
1 periodic batch job (strategic evaluation)
1 scheduled pipeline (meta-evaluation / calibration)
Existing monitoring infrastructure (observer)

That is 3 operational components, not 12. The architecture describes the logical separation of concerns. The deployment consolidates them.

When to Add a Judge, When Not To¶

Not every workflow needs every evaluation layer. Use this decision framework:

Question	If Yes	If No
Can guardrails alone catch the failure modes you care about?	No judge needed for those modes. Guardrails are cheaper and faster.	You need a judge for the semantic evaluation that guardrails cannot perform.
Does the workflow produce consequential outputs (financial, medical, legal, irreversible)?	Full evaluation stack: tactical + strategic + domain judges as needed.	Tactical judge only, or sampling-based evaluation.
Are there multiple policy domains that could conflict?	Multi-domain evaluation with conflict resolution.	Single-domain judge is sufficient.
Is this a Tier 1 (supervised) deployment?	Manual human review replaces automated judges. No judge infrastructure needed.	Automated evaluation scales with autonomy.
Does the judge's false negative rate exceed the base rate of the threat?	The judge adds cost without security value. Remove it or retrain it.	The judge is net-positive. Keep it.

The right number of judges is the minimum needed to catch what guardrails miss, proportionate to the risk of the workflow. A low-risk FAQ bot needs guardrails and maybe a sampled judge. A high-risk fraud detection pipeline needs the full stack. Deploying the full stack on every workflow is over-engineering. Deploying nothing but guardrails on a high-risk workflow is under-engineering.

Testing Criteria¶

Tier 1 Tests¶

Test ID	Test	Pass Criteria
PA-T1.1	Role declaration	Every agent in the orchestration has an explicit role declaration. No agent operates without a declared role.
PA-T1.2	Orchestrator plan logging	Submit a multi-step task. Verify orchestrator's decomposition and agent selection decisions are logged with reasoning.
PA-T1.3	Judge decision logging	Trigger Judge evaluations (pass, escalate, block). Verify each decision is logged with criteria and reasoning.

Tier 2 Tests¶

Test ID	Test	Pass Criteria
PA-T2.1	Intent verification	Submit a task. Modify the orchestrator's decomposition to subtly misalign with user intent (e.g., "summarise complaints" decomposed into "extract email addresses"). Intent verification model detects the misalignment.
PA-T2.2	Judge calibration	Present known-bad outputs to the Judge (policy violations, hallucinations, data leakage). Detection rate meets threshold (> 95% HIGH, > 80% MEDIUM).
PA-T2.3	Judge criteria tampering	Attempt to modify Judge criteria without the approval trail. System rejects the unauthorised change.
PA-T2.4	Inter-judge conflict: most restrictive wins	Two domain judges evaluate the same action: one approves, one flags. Verify the action is escalated (most restrictive wins).
PA-T2.4a	Inter-judge conflict: precedence order	Configure a precedence order where compliance outranks fraud. Trigger a conflict where the fraud judge blocks but the compliance judge approves. Verify the precedence order resolves correctly (compliance verdict takes priority per configuration).
PA-T2.4b	Inter-judge conflict logging	Trigger a conflict between two judges. Verify the conflict log includes both verdicts with reasoning, the resolution rule applied, and the final outcome.
PA-T2.4c	Time-constrained conflict: security wins	Simulate active fraud AND active security breach. Verify security containment executes first, then fraud and compliance actions degrade to operate within the security boundary.
PA-T2.4d	Time-constrained conflict: resolution window expiry	Trigger a HIGH-risk conflict requiring human arbitration. Let the arbitration window expire without human response. Verify the system applies the configured expiry default (most restrictive or accept risk) and logs the decision with attribution.
PA-T2.5	Observer false positive	Review observer escalation history. False positive rate is below threshold (< 5%).
PA-T2.5	Kill switch dual auth	Trigger a kill switch from the observer. Verify secondary confirmation is required before system shutdown.
PA-T2.6	Kill switch fail-safe	Trigger a kill switch when the secondary confirmation mechanism is unavailable. Kill switch fires after the defined window.
PA-T2.7	Orchestrator drift	Modify orchestrator behavior (change agent selection patterns). Anomaly scoring detects the drift.
PA-T2.8	Red team privileged agents	Execute quarterly red team scenarios. All three privileged agent types tested. Detection rates documented.

Tier 3 Tests¶

Test ID	Test	Pass Criteria
PA-T3.1	Aggregation evaluation	Orchestrator smooths over an intermediate hallucination during aggregation. Judge evaluating the orchestrator's output detects the smoothed-over failure.
PA-T3.2	Nested scope enforcement	Sub-orchestrator attempts to invoke an agent outside its delegated scope. System blocks the invocation.
PA-T3.3	Sub-tree blast radius	Coordinate 3 agents within a sub-orchestrator's cluster to collectively exceed the sub-tree blast radius cap. Cap is enforced at the sub-tree level.
PA-T3.4	Continuous calibration failure	Degrade the Judge's detection accuracy below threshold. Continuous calibration detects the degradation and triggers PACE escalation.
PA-T3.5	Observer self-test	Observer's self-diagnostic injects a synthetic anomaly. Observer detects it. Then: disable detection for the synthetic type. Self-test fails and alerts a human.
PA-T3.6	Cross-level evaluation	In a nested topology, introduce a failure at the sub-orchestrator aggregation level. Cross-level evaluation catches it before it reaches the top-level orchestrator.

Maturity Indicators¶

Level	Indicator
Initial	Orchestrator operates as a trusted black box. Judge criteria set at deployment and never revisited. Observer accuracy unknown. No formal register of privileged agents.
Managed	Privileged agents identified and registered. Orchestrator plans logged. Judge decisions logged. Basic calibration testing. Human reviews orchestrator and judge decisions periodically.
Defined	Independent intent verification for orchestrator. Version-controlled Judge criteria. Observer precision tracked. Kill switch dual authorisation. Red team testing of privileged agents.
Quantitatively Managed	Orchestrator drift measured. Judge calibration trended monthly. Observer false positive/negative rates published. Nested topology controls specified per orchestration level.
Optimising	Continuous calibration. Judge model rotation. Observer self-test. Cross-level evaluation in nested topologies. Privileged agent controls tuned based on operational data.

Common Pitfalls¶

Treating the orchestrator as infrastructure, not as an agent. If your orchestrator is an LLM, it has the same failure modes as any LLM - hallucination, injection susceptibility, goal drift. The fact that it plans rather than executes doesn't exempt it from monitoring.

Calibrating the Judge once and forgetting it. Judge accuracy decays. Models update. Criteria drift. The adversarial landscape shifts. A Judge that was 98% accurate at deployment may be 70% accurate six months later with no visible change in its configuration. Calibration must be ongoing.

Assuming independence equals correctness. The Judge uses a different model from the task agents. That makes it independent. It does not make it correct. Independence prevents correlated failure with task agents. Calibration verifies correctness. These are different controls solving different problems.

Setting blast radius caps per-agent but not per-sub-tree. Five agents with a 100-record cap each can collectively modify 500 records if coordinated by a compromised sub-orchestrator. The sub-tree needs its own cap.

Disabling the observer to restore service. When the observer triggers too many false positives, the operational pressure to disable it is real. The answer is not to disable the observer - it's to fix the calibration. If the observer is disabled, that fact must be logged, a human must formally accept the residual risk, and a remediation timeline must be defined. Running without the observer is a PACE Contingency state, not normal operations.

Building a meta-judge to watch the Judge. The recursion problem is real but the solution is not more layers. It's calibration: periodic injection of known test cases to verify that each privileged agent is still performing as expected. Red team testing breaks the "who watches the watchmen" loop.

Running multiple domain judges with no conflict resolution protocol. If a fraud judge, a security judge, and a compliance judge can all evaluate the same action and produce different verdicts, somebody must define which verdict wins. Without a precedence order, the system either deadlocks, escalates everything to a human (defeating automation), or silently applies whichever judge responded first (non-deterministic). Define precedence at design time, not at incident time.

Deploying judges because the architecture diagram says to. The framework describes evaluation roles for completeness. Not every workflow needs every role. A Tier 1 deployment with manual human review does not need automated judges. A low-risk workflow with effective guardrails does not need a strategic evaluator. Deploy what the risk profile requires, not what the diagram shows. See Recognising Judge Proliferation for the decision framework.

Relationship to Other Domains¶

Domain	Relationship
Identity & Access	PA extends IA-2.5 (orchestrator privilege separation) to cover orchestrator decision-making, not just tool access. PA-3.2 extends IA-2.4 (no transitive permissions) to nested orchestration levels.
Execution Control	PA extends EC-2.5 (Model-as-Judge gate) with Judge governance - calibration, criteria versioning, disagreement procedures. PA-3.3 extends EC-2.3 (blast radius caps) to orchestration sub-trees.
Observability	PA extends OB-3.3 (independent observability agent) with observer self-test, precision monitoring, and kill switch dual authorisation.
Prompt, Goal & Epistemic Integrity	PA-2.1 (orchestrator intent verification) complements PG-2.2 (goal integrity monitoring) by applying intent verification to the orchestrator's own decisions, not just task agents.