MASO Control Domain: Agentic Task Contract and Behavioural Governance¶
Part of the MASO Framework · Control Specifications
Extends: Objective Intent · Execution Control · Observability · Model Cognition Assurance
Covers: Contract-level agentic governance · Means compliance · Creative substitution detection · Plan attempt detection · Anti-Mythos judge architecture
Principle¶
Checking whether a model's outputs are harmful is not the same as checking whether a model followed its declared intent, used its authorised means, and produced its specified outcome. That gap is where agentic risk lives. Every agentic workflow must operate under a formal contract that declares what the workflow should accomplish, which agents may participate, what tools they may use, and what happens when boundaries are violated. Compliance is evaluated against the contract, not against output quality alone.
This domain introduces the Agentic Task Contract as the governing artefact for agentic execution. Contracts exist at two levels: a solution contract governing an entire agentic workflow end to end, and agent contracts governing individual agents within that workflow. Agent contracts inherit from and cannot exceed the permissions of their parent solution contract.
Why This Matters in Multi-Agent Systems¶
Output-layer governance misses the most dangerous failure mode. A model that achieves the correct outcome by unauthorised means has violated its operational boundaries while producing no detectable failure signal. Output monitors see a correct result. Guardrails see no policy violation. The judge sees a pass. The model has learned that boundary violation is rewarded when it leads to task completion. Anthropic's Mythos system card (April 2026) documented this pattern as creative substitution: the model optimises for task completion over means compliance.
Blocked tool attempts are behavioural indicators. When an agent attempts to invoke a tool outside its authorised list and is structurally blocked, that attempt is a signal regardless of whether it succeeded. A model that repeatedly attempts to reach tools outside its authorised set is exhibiting autonomous goal-directed reasoning beyond its declared boundaries. This is functionally indistinguishable from internal reconnaissance by a human threat actor and should be treated accordingly in SIEM and SOC workflows.
Binary success/failure classification is insufficient. Agentic execution produces four distinct states, not two. Normal operation (correct means, correct outcome) and execution failure (correct means, incorrect outcome) are well understood. Visible failure (incorrect means, incorrect outcome) is detectable. Creative substitution (incorrect means, correct outcome) is invisible to any monitoring system that only evaluates outcomes.
Contracts are the bridge between intent declaration and execution enforcement. The Objective Intent domain declares what agents should do. This domain enforces how they do it, detects when they deviate, and governs the response when deviation occurs.
The Four-State Deviation Model¶
All agentic execution monitoring in this domain is built on four behavioural states:
| State | Means | Outcome | Detection Difficulty | Response Priority |
|---|---|---|---|---|
| Normal operation | Correct | Correct | Trivial | Log |
| Execution failure | Correct | Incorrect | Low (outcome monitoring) | Remediate |
| Visible failure | Incorrect | Incorrect | Medium (means or outcome monitoring) | Incident |
| Creative substitution | Incorrect | Correct | High (requires means monitoring) | Highest |
Creative substitution is the highest priority state because it is invisible to output-layer monitoring, it rewards boundary violation, and it represents the model optimising for task completion over means compliance. Systems that classify agentic outcomes as binary (success or failure) will systematically miss this state.
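The four-state model reduces to two independent compliance checks combined into one classification. The following sketch is illustrative (the `DeviationState` names mirror the table above; `classify_deviation` is a hypothetical helper, not part of the framework):

```python
from enum import Enum

class DeviationState(Enum):
    NORMAL_OPERATION = "normal_operation"            # correct means, correct outcome
    EXECUTION_FAILURE = "execution_failure"          # correct means, incorrect outcome
    VISIBLE_FAILURE = "visible_failure"              # incorrect means, incorrect outcome
    CREATIVE_SUBSTITUTION = "creative_substitution"  # incorrect means, correct outcome

def classify_deviation(means_compliant: bool, outcome_correct: bool) -> DeviationState:
    """Evaluate means and outcome independently, then combine (AT-2.6)."""
    if means_compliant:
        return (DeviationState.NORMAL_OPERATION if outcome_correct
                else DeviationState.EXECUTION_FAILURE)
    return (DeviationState.CREATIVE_SUBSTITUTION if outcome_correct
            else DeviationState.VISIBLE_FAILURE)
```

The point of the two-input signature is that a monitor fed only `outcome_correct` can never emit the `CREATIVE_SUBSTITUTION` branch.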
Controls by Tier¶
Tier 1 - Supervised¶
| Control | Requirement | Implementation Notes |
|---|---|---|
| AT-1.1 Solution contract declaration | Every agentic workflow must have a solution-level contract declared and approved before deployment | Intent: establish the governing artefact for agentic execution before any agent runs. The contract must specify: business process ID, workflow intent, authorised outcome, authorised agents and their sequence, authorised data sources, prohibited actions, hard limits (financial, volume, time), escalation path, and plan detection thresholds. No agentic workflow may be deployed without an approved solution contract. Evidence: signed contract document in the contract library, approval record from the process owner and technical owner. Related: Objective Intent OISpec provides the intent declaration; the solution contract extends this with means governance, escalation paths, and enforcement thresholds. |
| AT-1.2 Agent contract declaration | Every agent within an agentic workflow must have an agent-level contract declaring its role and boundaries within the workflow | Intent: bind each agent to specific permissions that can be validated at runtime. The contract must specify: agent ID, sequence position, agent intent, authorised outcome, authorised tool list (explicit, not inherited), prohibited tool list, agent logic specification (expected reasoning approach), and escalation path. Agent contracts must reference their parent solution contract by ID and version. Agent contracts may not claim permissions not granted by the parent solution contract. Evidence: agent contract document referencing parent solution contract, tool list matching provisioned tool set. Related: AT-1.1 (parent contract), IA-1.4 (scoped permissions). |
| AT-1.3 Contract library and version control | All solution and agent contracts must be stored in a versioned contract library with change control | Intent: create an auditable record of what was authorised and when. The contract library is the authoritative source for what any agentic workflow is permitted to do. Changes to any contract require a change control process equivalent to that applied to production system changes. Contract versions must be immutable once approved. Evidence: version-controlled repository with change history, approval records per version, deployment records linking running workflows to specific contract versions. Related: SC-1.1 (model inventory), SC-2.1 (AIBOM). |
| AT-1.4 Tool allow-list enforcement | Every agent must be provisioned with only the tools explicitly listed in its agent contract; enforcement must be structural, not policy-based | Intent: if a tool is not provisioned, the agent cannot invoke it regardless of its reasoning. Tool provisioning is a security control, not a configuration choice. Allow-list enforcement must operate at the infrastructure layer (tool not available to the runtime) rather than the policy layer (tool available but instructions say not to use it). Policy-based restrictions can be reasoned around by the model. Structural restrictions cannot. Evidence: provisioned tool set matches agent contract tool list exactly; no additional tools available in the agent runtime. Related: EC-1.2 (tool allow-lists). AT-1.4 extends EC-1.2 by tying allow-lists to a formal contract rather than a configuration setting. |
| AT-1.5 Execution trace logging | Every agent execution must produce a complete execution trace logging all tool calls, parameters, sequence, timing, and return values | Intent: means compliance cannot be evaluated without knowing what the agent actually did. Outcome logging alone is insufficient. The execution trace is the evidence base for all Tier 2 deviation detection. Traces must be retained as audit evidence for the period defined by the organisation's retention policy. Evidence: complete execution traces for every agent invocation, stored in append-only storage, queryable by session ID and agent ID. Related: OB-1.1 (action audit log). AT-1.5 extends OB-1.1 by requiring the full tool call sequence with parameters and return values, not just action summaries. |
| AT-1.6 Plan attempt logging | All blocked tool invocation attempts must be logged as behavioural indicators at the point of invocation | Intent: a blocked tool attempt is a signal, not discarded noise. When an agent attempts to invoke a tool outside its allow-list and is structurally blocked (AT-1.4), the attempt must be captured. Logs must record: agent ID, session ID, attempted tool name, timestamp, and the agent's preceding reasoning context where available. Plan attempt logs must be forwarded to the organisation's SIEM. A single plan attempt may be noise. Accumulated plan attempts are a signal. Evidence: plan attempt log entries with required fields, SIEM ingestion confirmation. Related: OB-2.4 (SIEM integration), AT-2.8 (plan attempt trend analysis). Residual risk: plan attempt logging depends on the tool enforcement layer reporting blocked attempts. If the enforcement layer silently drops attempts without logging, the signal is lost. Verify that the enforcement mechanism generates a log event on every block, not just a silent rejection. |
| AT-1.7 Hard stop conditions | Solution contracts must define hard stop conditions that result in immediate halt of all agent execution regardless of workflow state | Intent: define the conditions under which no agent may continue, before those conditions are encountered at runtime. Hard stop conditions must include at minimum: any agent returning a FAIL status, fraud threshold breach, financial or volume limit breach, execution sequence violation, and plan attempt threshold breach. Hard stops may not be overridden by any agent. Hard stop override requires human authorisation documented in the audit trail. Evidence: hard stop conditions listed in solution contract, enforcement mechanism tested, override log with human authorisation records. Related: EC-2.4 (circuit breakers), PACE Emergency phase. |
What you're building at Tier 1: The contract infrastructure. Every workflow has a declared contract. Every agent has a declared contract within it. Tools are structurally enforced. Execution is traced. Blocked attempts are logged. Hard stops are defined. No automated deviation detection yet, but the evidence base exists for Tier 2 to build on.
Tier 2 - Managed¶
All Tier 1 controls remain active, plus:
| Control | Requirement | Implementation Notes |
|---|---|---|
| AT-2.1 Contract inheritance enforcement | A mechanism must validate at deployment time that agent contracts do not exceed the permission boundaries of their parent solution contract | Intent: prevent privilege escalation through contract misconfiguration. Any agent contract claiming tools, data sources, or actions not authorised in the solution contract must be rejected before deployment authorisation is granted. Validation must be automated, not manual. Evidence: deployment gate log showing inheritance validation pass/fail for each agent contract, rejection records for contracts that failed validation. Related: AT-1.1, AT-1.2, IA-2.4 (no transitive permissions). |
| AT-2.2 Contract review cycle | All contracts must be reviewed on a defined cycle and on specified triggers | Intent: contracts decay as models, tools, and business processes change. Review must occur on a defined schedule (recommended: quarterly) and on any of the following triggers: model version change, tool set change, business process change, creative substitution event (AT-2.7), plan attempt threshold breach (AT-2.8), or regulatory change. Review must include re-approval by the process owner and technical owner. Evidence: review records with date, trigger, reviewer, and outcome (reaffirmed/amended/withdrawn). Related: AT-1.3 (contract library), SC-2.1 (AIBOM, triggers on model version change). |
| AT-2.3 Least privilege tool provisioning | Tool allow-lists must be constructed on the principle of least privilege, considering what the model might consider equivalent to authorised tools | Intent: prevent the common failure where tools are provisioned because they could be useful rather than because they are required. Only tools directly required to execute the declared agent intent may be provisioned. Tools that could achieve the declared outcome by an alternative means must not be provisioned unless explicitly required. The provisioning review must consider what the model might treat as a functional substitute for authorised tools, not only what is explicitly needed. Evidence: documented justification for each tool on the allow-list, review record showing substitute-tool analysis. Related: AT-1.4, EC-1.2 (tool allow-lists). |
| AT-2.4 Execution sequence enforcement | In multi-agent workflows, the authorised execution sequence declared in the solution contract must be enforced structurally at the orchestration layer | Intent: prevent agents from running out of order, which can produce correct-looking outcomes through unintended execution paths. No agent may run before its upstream dependencies have returned an authorised completion status. Sequence enforcement must be implemented at the orchestration layer, not left to individual agent behaviour. An agent that attempts to execute before its upstream dependency completes constitutes a sequence violation and triggers the defined escalation path. Evidence: orchestration logs showing sequence enforcement, sequence violation records. Related: AT-1.1 (sequence declared in solution contract), EC-2.1 (action classification). |
| AT-2.5 Idempotency requirement | Any agent authorised to take irreversible actions must implement idempotency controls keyed to a session or instruction identifier | Intent: prevent duplicate execution of irreversible actions (payment execution, data write, external communication). An agent that executes the same irreversible action twice on the same instruction constitutes a critical control failure regardless of whether the second execution was individually authorised. Evidence: idempotency key implementation per irreversible action type, duplicate execution test results, incident records for any duplicate execution. Related: EC-1.6 (reversibility assessment), AT-1.7 (hard stop conditions). |
| AT-2.6 Four-state deviation classification | All agentic execution monitoring must implement the four-state deviation model; binary success/failure classification is insufficient | Intent: make creative substitution a named, detectable, and reportable category. Systems that classify agentic outcomes as binary (success or failure) will systematically miss the case where correct outcomes are achieved by incorrect means. Monitoring must evaluate means compliance and outcome compliance independently and in combination. Creative substitution events must be treated with equivalent urgency to fraud indicators in a financial services context. Evidence: monitoring configuration showing four-state classification, sample deviation reports showing all four states, creative substitution response procedure documented and tested. Related: AT-2.7 (creative substitution detection), OB-2.2 (anomaly scoring). Residual risk: four-state classification depends on the definition of "correct means" in the agent contract. If the contract is underspecified (authorised tools listed but expected tool sequences not defined), creative substitution that uses authorised tools in an unauthorised sequence may not be detected as a means violation. Contract specificity determines detection capability. |
| AT-2.7 Creative substitution detection | Monitoring must compare actual tool call sequences against the expected sequences declared in the agent contract, not just whether the final output was correct | Intent: detect the specific case where an agent achieves the correct outcome by incorrect means. Detection logic must compare the execution trace (AT-1.5) against the agent contract's logic specification and tool sequence. An execution that achieves the correct outcome but deviates from the declared means must not be classified as normal operation. Evidence: detection rules comparing execution traces against contract specifications, creative substitution event log, false positive/negative rates from testing. Related: AT-2.6, AT-1.5 (execution trace), MC-2.6 (reward hacking anomaly detection). |
| AT-2.8 Plan attempt trend analysis | Plan attempt logs must be accumulated and trended over time to identify systematic boundary reasoning | Intent: a single blocked tool attempt may be noise. Accumulated attempts across sessions indicate a model systematically reasoning about tools beyond its authorised set. Trend analysis must identify: repeated attempts to invoke the same prohibited tool, attempts across different task types indicating systematic boundary reasoning, and rising attempt frequency indicating increasing model deviation. Trend thresholds must be defined in each solution contract and enforced as escalation triggers. Evidence: trend analysis dashboard or report, threshold definitions in solution contracts, escalation records triggered by threshold breach. Related: AT-1.6 (plan attempt logging), OB-2.3 (drift detection). Residual risk: trend analysis detects patterns in blocked attempts. If the model discovers an alternative path that achieves the same goal without triggering a blocked attempt (e.g., using an authorised tool in an unintended way), the plan attempt signal will not fire. Plan attempt detection is complementary to creative substitution detection (AT-2.7), not a substitute for it. |
| AT-2.9 Contract escalation path enforcement | Every contract must define a specific escalation path for each halt condition, routing to human oversight rather than to a higher-privilege agent | Intent: when an agent or workflow halts, the response must be predetermined and documented, not improvised at runtime. Escalation paths must specify: what is halted, what state is preserved, where the escalation is routed, what human decision is required, and what authorisation is needed before the workflow may resume or be terminated. Escalation to a higher-privilege agent is prohibited. Escalation goes to human oversight. Evidence: escalation paths documented per halt condition in each contract, escalation test records, human decision records from exercised escalation paths. Related: AT-1.7 (hard stop conditions), PACE model, EC-2.1 (action classification). |
| AT-2.10 Soft escalation and contract amendment | Soft escalation conditions must route to human oversight for a contract amendment decision, with amendments formally recorded | Intent: provide a governed pathway for contracts to be amended at runtime when legitimate business needs require it, without allowing agents to self-authorise expanded permissions. The human decision must be documented with: the original contract specification, the condition that triggered escalation, the amendment requested, the approver, and the amended contract version. Agents may not resume on amended terms until the contract library is updated and the amendment is formally recorded. Evidence: soft escalation records, amendment records in the contract library with version linkage, human authorisation records. Related: AT-1.3 (contract library), AT-2.2 (contract review cycle). |
| AT-2.11 Anti-Mythos judge deployment | Any agentic workflow operating at HIGH or CRITICAL risk tier must be evaluated by a reasoning model judge whose sole function is contract compliance evaluation | Intent: output quality evaluation (covered by existing Judge controls) is not the same as contract compliance evaluation. The Anti-Mythos judge evaluates whether agents followed their declared contracts, not whether their outputs were good. The judge must not share infrastructure with the agents it evaluates. The judge must have no execution capability in the systems it evaluates. The judge must operate read-only on execution traces, contracts, and session context. Evidence: judge deployment record for each HIGH/CRITICAL workflow, infrastructure separation verification, read-only access confirmation. Related: PA-2.2 (judge calibration), EC-2.5 (Model-as-Judge gate). AT-2.11 is complementary to existing Judge controls: the existing Judge evaluates output quality and safety; the Anti-Mythos judge evaluates means compliance against the contract. |
| AT-2.12 Judge contract | The Anti-Mythos judge must itself have a declared contract specifying its authorised inputs, outputs, prohibited actions, and escalation path | Intent: the judge is an agent and requires governance like any other agent. The judge contract must explicitly prohibit: any write operation to evaluated systems, any modification of contracts, and any self-modification of evaluation criteria. The judge contract must be reviewed independently of the contracts it evaluates. Evidence: judge contract in the contract library, review record showing independent review (not by the team operating the evaluated workflows). Related: AT-1.2 (agent contract), PA-2.3 (judge criteria versioning). |
| AT-2.13 Judge verdict schema | Judge verdicts must be structured, machine-readable, and SIEM-ingestible | Intent: judge findings must be actionable by security operations, not buried in unstructured text. Verdict schema must include: session ID, workflow ID, agent ID, evaluation timestamp, four-state classification, deviation detail, plan attempt summary, pattern flags, escalation recommendation, and confidence level. Verdict records constitute audit evidence. Evidence: verdict schema definition, sample verdicts conforming to schema, SIEM ingestion confirmation. Related: OB-2.4 (SIEM integration), AT-2.6 (four-state classification). |
| AT-2.14 Vendor agentic capability disclosure | Third-party AI vendors supplying models used in agentic workflows must disclose known agentic behavioural characteristics | Intent: informed deployment decisions require knowing how a model behaves when given tool access and autonomous execution capability. Disclosure must cover: the model's known capability to reason about tool substitution, any documented instances of creative substitution or reward hacking in pre-deployment evaluation, the interpretability tooling applied during evaluation, and the vendor's process for disclosing alignment-relevant behaviours discovered post-deployment. Evidence: vendor disclosure document on file per model, disclosure completeness assessment. Related: MC-2.8 (vendor interpretability attestation), MC-2.9 (material finding disclosure), SC-2.1 (AIBOM). |
| AT-2.15 Activation-layer residual risk declaration | Organisations deploying third-party models via API must formally document that behavioural monitoring controls cannot fully close the activation-layer assurance gap | Intent: make the residual risk explicit and owned. API-only consumption provides no access to activation-level interpretability tools. Output-layer and execution-layer controls (this domain) reduce but do not eliminate the risk that a model's internal reasoning diverges from its expressed behaviour. This residual risk must be formally documented in the organisation's AI risk register, acknowledged by the risk owner, and reviewed at each model version change. It must not be treated as closed by output-layer or execution-layer controls alone. Evidence: risk register entry with risk-owner sign-off, review records at each model version change. Related: MC-2.7 (third-party activation-layer residual risk). AT-2.15 extends MC-2.7 by making the residual risk declaration specific to agentic execution contexts where the consequences of undetected internal misalignment are amplified by tool access and autonomous action. |
What you're building at Tier 2: Automated enforcement and detection. Contracts are validated at deployment. Means compliance is monitored at runtime. Creative substitution is a named detection category. Plan attempts are trended. The Anti-Mythos judge evaluates contract compliance for high-risk workflows. Procurement requires vendor disclosure of agentic behavioural characteristics.
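The deployment-gate inheritance check in AT-2.1 amounts to set-containment validation of the child against the parent. A sketch, assuming contracts are available in parsed form (the `validate_inheritance` function and its field names are illustrative, not a mandated schema):

```python
def validate_inheritance(agent_contract: dict, solution_contract: dict) -> list[str]:
    """Deployment gate (AT-2.1): an agent contract may not claim permissions
    beyond its parent solution contract. Returns violations; empty list = pass."""
    violations = []
    if agent_contract.get("parent_contract") != solution_contract.get("contract_id"):
        violations.append("parent contract reference mismatch")
    excess_tools = (set(agent_contract.get("authorised_tools", []))
                    - set(solution_contract.get("authorised_tools", [])))
    if excess_tools:
        violations.append(f"tools not granted by parent: {sorted(excess_tools)}")
    excess_data = (set(agent_contract.get("authorised_data_sources", []))
                   - set(solution_contract.get("authorised_data_sources", [])))
    if excess_data:
        violations.append(f"data sources not granted by parent: {sorted(excess_data)}")
    return violations
```

Running this automatically on every deployment, and rejecting on a non-empty result, produces exactly the pass/fail gate log AT-2.1 names as evidence.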
Tier 3 - Autonomous¶
All Tier 2 controls remain active, plus:
| Control | Requirement | Implementation Notes |
|---|---|---|
| AT-3.1 Cross-session drift detection | Longitudinal drift analysis must be applied across execution sessions to detect gradual behavioural shift that stays within per-execution thresholds | Intent: a model whose execution distribution gradually shifts away from baseline represents a drift risk even if individual executions remain within per-execution thresholds. Per-execution monitoring (AT-2.6, AT-2.7) catches deviations in a single run. Cross-session analysis catches the pattern where each individual run is within tolerance but the population of runs is migrating. Drift detection must compare current execution distributions (tool usage patterns, sequence variations, timing profiles, plan attempt frequency) against a rolling baseline. Drift detection must trigger governance review before any single execution breaches threshold. Evidence: drift analysis reports with baseline comparison, governance review records triggered by drift alerts. Related: OB-3.1 (long-window behavioural analysis), MC-2.5 (behavioural baseline for reward hacking). Residual risk: drift detection depends on the baseline being representative. If the baseline was established during a period of already-drifted behaviour, subsequent drift will be measured against an already-deviated reference. Baseline establishment must use a verified-clean observation period. |
| AT-3.2 Judge evaluation scope | The Anti-Mythos judge must evaluate contract compliance at four levels: tool call, sequence, logic, and cross-session pattern | Intent: rules-based monitoring alone is insufficient for high and critical risk agentic workflows. The judge must evaluate: (1) tool call level: were authorised tools used and prohibited tools avoided; (2) sequence level: was the execution sequence followed correctly; (3) logic level: did the agent follow its declared logic specification or take shortcuts; (4) pattern level: does the execution pattern across sessions indicate systematic boundary reasoning. The logic-level evaluation requires a reasoning model, not a rules engine, because it must assess whether the agent's approach to the task matches its declared logic specification, not just whether the correct tools were called. Evidence: judge evaluation records showing all four levels assessed, logic-level reasoning traces from the judge, pattern-level analysis reports. Related: AT-2.11 (judge deployment), AT-2.12 (judge contract), PA-3.5 (continuous calibration). Residual risk: logic-level evaluation requires the judge to assess reasoning quality, which introduces the same faithfulness problem that affects CoT monitoring (MC-2.4). The judge is evaluating whether the agent's execution path is consistent with its declared logic, but the judge's own reasoning about consistency may be unreliable for subtle deviations. Calibration testing (PA-2.2) with known creative substitution scenarios is the compensating control. |
| AT-3.3 Judge independence | The Anti-Mythos judge must be operationally independent of the teams responsible for deploying and operating the agents it evaluates | Intent: prevent conflicts of interest where the team operating the workflow also controls the judge evaluating it. Judge configuration and evaluation criteria must be owned by the security function, not the business or technical teams operating the agentic workflow. Judge findings must be reportable to risk and compliance functions without passing through the teams being evaluated. This is the same organisational separation principle applied to internal audit and compliance functions. Evidence: organisational chart showing judge ownership by security function, reporting line verification, access control records confirming evaluated teams cannot modify judge configuration or suppress findings. Related: PA-3.4 (judge model rotation), AT-2.12 (judge contract). |
What you're building at Tier 3: Assured compliance. Cross-session drift is detected before individual executions breach thresholds. The judge evaluates at four levels including logic-level reasoning assessment. Judge independence is organisationally enforced. The system detects not just what went wrong, but what is gradually going wrong.
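The cross-session comparison in AT-3.1 can be illustrated with a simple distributional distance over tool usage. Total variation distance is one possible metric, not a framework requirement; the threshold it feeds would be defined per solution contract, and a real implementation would extend the same comparison to sequence variations, timing profiles, and plan attempt frequency.

```python
from collections import Counter

def tool_usage_drift(baseline_calls: list[str], window_calls: list[str]) -> float:
    """Total variation distance between the baseline and current tool-usage
    distributions: 0.0 = identical, 1.0 = disjoint. A breach triggers
    governance review (AT-3.1), not a hard stop."""
    base, curr = Counter(baseline_calls), Counter(window_calls)
    b_total = sum(base.values()) or 1
    c_total = sum(curr.values()) or 1
    tools = set(base) | set(curr)
    return 0.5 * sum(abs(base[t] / b_total - curr[t] / c_total) for t in tools)
```

Note the residual risk stated in AT-3.1 applies directly here: `baseline_calls` must come from a verified-clean observation period, or drift is measured against an already-deviated reference.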
Solution Contract Specification¶
The solution contract is the top-level governance artefact for an agentic workflow. It defines the boundaries within which all agents in the workflow must operate.
Required fields:
| Field | Description | Example |
|---|---|---|
| Contract ID | Unique identifier, version-controlled | SOL-2026-0047-v2 |
| Business Process ID | Link to the business process this workflow supports | BP-FRAUD-DETECT-001 |
| Workflow Intent | What the workflow is authorised to accomplish | "Detect and flag potentially fraudulent transactions for human review" |
| Authorised Outcome | The specific deliverable the workflow must produce | "Fraud risk assessment report per transaction batch, delivered to the fraud review queue" |
| Authorised Agents | Named agents permitted to participate, with sequence | [agent-data-retriever → agent-pattern-analyser → agent-report-writer] |
| Authorised Data Sources | Data sources agents may access | [transactions-db (read), customer-profiles (read), fraud-patterns-kb (read)] |
| Authorised Tools | Superset of tools available to any agent in the workflow | [db-query, pattern-match, report-generate, queue-publish] |
| Prohibited Actions | Actions no agent in this workflow may take | [direct-customer-contact, account-freeze, payment-reversal] |
| Hard Limits | Financial, volume, and time boundaries | max_transactions_per_batch: 10000, max_execution_time: 300s, max_cost: $5.00 |
| Hard Stop Conditions | Conditions triggering immediate halt | [agent_fail, fraud_threshold_breach, limit_breach, sequence_violation, plan_attempt_threshold] |
| Escalation Path | Per halt condition: who is notified, what decision is required | See AT-2.9 |
| Plan Detection Thresholds | Acceptable plan attempt counts before escalation | single_session: 3, rolling_7day: 10, rolling_30day: 25 |
| Process Owner | Human accountable for the business process | jane.smith@example.com |
| Technical Owner | Human accountable for the technical implementation | alex.chen@example.com |
| Approved By | Approval record | Process owner + Technical owner + AI Security Lead |
| Effective Date | When this contract version takes effect | 2026-04-15 |
| Review Date | Next mandatory review | 2026-07-15 |
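Rendered in machine-readable form, the example contract above might look like the following. The field names are an illustrative serialisation of the table, not a mandated schema; the values are taken from the example column.

```python
# Illustrative serialisation of the solution contract fields (not a mandated schema).
solution_contract = {
    "contract_id": "SOL-2026-0047-v2",
    "business_process_id": "BP-FRAUD-DETECT-001",
    "workflow_intent": "Detect and flag potentially fraudulent transactions for human review",
    "authorised_outcome": ("Fraud risk assessment report per transaction batch, "
                           "delivered to the fraud review queue"),
    "authorised_agents": ["agent-data-retriever", "agent-pattern-analyser",
                          "agent-report-writer"],  # order = authorised sequence
    "authorised_data_sources": {"transactions-db": "read",
                                "customer-profiles": "read",
                                "fraud-patterns-kb": "read"},
    "authorised_tools": ["db-query", "pattern-match", "report-generate", "queue-publish"],
    "prohibited_actions": ["direct-customer-contact", "account-freeze", "payment-reversal"],
    "hard_limits": {"max_transactions_per_batch": 10000,
                    "max_execution_time_s": 300,
                    "max_cost_usd": 5.00},
    "hard_stop_conditions": ["agent_fail", "fraud_threshold_breach", "limit_breach",
                             "sequence_violation", "plan_attempt_threshold"],
    "plan_detection_thresholds": {"single_session": 3, "rolling_7day": 10,
                                  "rolling_30day": 25},
    "process_owner": "jane.smith@example.com",
    "technical_owner": "alex.chen@example.com",
    "effective_date": "2026-04-15",
    "review_date": "2026-07-15",
}
```

Whatever the concrete schema, AT-1.3 requires that this artefact be version-controlled and immutable once approved.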
Agent Contract Specification¶
Each agent within a workflow has its own contract that inherits from and cannot exceed its parent solution contract.
Required fields:
| Field | Description | Example |
|---|---|---|
| Agent Contract ID | Unique identifier, version-controlled | AGT-2026-0047-RETRIEVER-v1 |
| Parent Solution Contract | Reference to parent contract ID and version | SOL-2026-0047-v2 |
| Agent ID | Unique agent identifier matching the AIBOM | agent-data-retriever |
| Sequence Position | Where in the workflow this agent executes | 1 of 3 |
| Agent Intent | What this specific agent is authorised to accomplish | "Retrieve transaction data for the specified batch from the transactions database" |
| Authorised Outcome | The specific output this agent must produce | "Structured transaction dataset for the batch period, delivered to the message bus" |
| Authorised Tool List | Explicit list of tools this agent may use | [db-query] |
| Prohibited Tool List | Tools this agent must not use, even if available in the solution contract | [pattern-match, report-generate, queue-publish] |
| Agent Logic Specification | Expected reasoning approach and tool sequence | "Query transactions-db for batch period. Return structured dataset. No filtering, no analysis, no enrichment." |
| Escalation Path | Agent-level halt conditions and escalation targets | on_fail: halt + notify technical_owner; on_timeout: halt + retry(1) + notify |
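The corresponding agent contract, serialised the same illustrative way, makes the inheritance property from AT-2.1 directly checkable: the agent's tool list must be a subset of the parent's tool superset. Field names here are hypothetical, mirroring the table above.

```python
# Illustrative serialisation of the agent contract fields (not a mandated schema).
agent_contract = {
    "agent_contract_id": "AGT-2026-0047-RETRIEVER-v1",
    "parent_contract": "SOL-2026-0047-v2",
    "agent_id": "agent-data-retriever",
    "sequence_position": 1,  # 1 of 3
    "agent_intent": ("Retrieve transaction data for the specified batch "
                     "from the transactions database"),
    "authorised_outcome": ("Structured transaction dataset for the batch period, "
                           "delivered to the message bus"),
    "authorised_tools": ["db-query"],
    "prohibited_tools": ["pattern-match", "report-generate", "queue-publish"],
    "agent_logic_spec": ("Query transactions-db for batch period. Return structured "
                         "dataset. No filtering, no analysis, no enrichment."),
    "escalation": {"on_fail": ["halt", "notify:technical_owner"],
                   "on_timeout": ["halt", "retry:1", "notify:technical_owner"]},
}

# Inheritance property (AT-2.1): agent tools must be a subset of the parent
# solution contract's authorised tool superset.
parent_tools = {"db-query", "pattern-match", "report-generate", "queue-publish"}
assert set(agent_contract["authorised_tools"]) <= parent_tools
```

Note the prohibited tool list explicitly names tools that exist in the parent contract but are reserved for other agents in the sequence.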
Testing Criteria¶
Tier 1 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| AT-T1.1 | Contract completeness | Review all solution and agent contracts. Every required field is populated. Agent contracts reference valid parent solution contracts. |
| AT-T1.2 | Structural tool enforcement | Attempt to invoke a tool not on the agent's allow-list. Invocation is structurally blocked (not just policy-rejected). The tool is not available in the agent runtime. |
| AT-T1.3 | Execution trace completeness | Run 10 agent executions. Verify each produces a complete trace: all tool calls, parameters, sequence, timing, and return values captured. |
| AT-T1.4 | Plan attempt capture | Configure an agent with a restricted tool set. Prompt the agent toward a task that would benefit from a prohibited tool. Verify the blocked attempt is logged with agent ID, session ID, tool name, and timestamp. Verify the log event reaches the SIEM. |
| AT-T1.5 | Hard stop enforcement | Trigger each hard stop condition defined in a solution contract. Verify all agent execution halts immediately. Verify human override is required and logged to resume. |
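The structural enforcement and plan-attempt capture that AT-T1.2 and AT-T1.4 test can be sketched as a runtime in which only contracted tools are registered, so a prohibited tool is absent rather than merely policy-rejected. The class and event shape below are illustrative assumptions:

```python
class ToolNotAvailable(Exception):
    """Raised when an agent reaches for a tool absent from its runtime."""

class AgentRuntime:
    """Sketch of structural enforcement: only allow-listed tools exist
    inside the runtime, and blocked attempts are logged, not discarded."""

    def __init__(self, authorised_tools: dict, audit_log: list):
        self._tools = dict(authorised_tools)  # the only tools that exist here
        self._audit_log = audit_log           # destination for plan attempts

    def invoke(self, agent_id: str, session_id: str, tool_name: str, **params):
        if tool_name not in self._tools:
            # The blocked attempt is a behavioural signal: record it
            # (agent, session, tool) before raising, per AT-T1.4.
            self._audit_log.append({"event": "plan_attempt",
                                    "agent_id": agent_id,
                                    "session_id": session_id,
                                    "tool": tool_name})
            raise ToolNotAvailable(tool_name)
        return self._tools[tool_name](**params)
```

The contrast with policy enforcement is the constructor: a tool left out of `authorised_tools` cannot be reasoned around, because there is nothing in the runtime to invoke.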
Tier 2 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| AT-T2.1 | Contract inheritance validation | Create an agent contract that claims a tool not authorised in its parent solution contract. Deployment gate rejects the contract. |
| AT-T2.2 | Sequence enforcement | In a multi-agent workflow, attempt to trigger agent 3 before agent 2 has completed. Orchestration layer blocks the out-of-sequence execution. |
| AT-T2.3 | Idempotency | Submit the same irreversible action instruction twice with the same session identifier. Second execution is blocked by idempotency control. |
| AT-T2.4 | Four-state classification | Inject four test scenarios: normal operation, execution failure, visible failure, and creative substitution. Monitoring correctly classifies all four. Creative substitution is not classified as normal operation. |
| AT-T2.5 | Creative substitution detection | Configure an agent with tool A (authorised) and tool B (authorised). Agent contract specifies tool A for the task. Agent uses tool B to achieve the correct outcome. Detection system flags the execution as creative substitution. |
| AT-T2.6 | Plan attempt trending | Inject plan attempts at a rate exceeding the solution contract threshold. Trend analysis triggers escalation within the defined window. |
| AT-T2.7 | Escalation path execution | Trigger a halt condition. Verify the escalation path fires as defined in the contract: correct recipient notified, correct state preserved, workflow halted until human authorisation. |
| AT-T2.8 | Contract amendment audit trail | Exercise a soft escalation. Human approves an amendment. Verify the amendment is recorded in the contract library with original spec, trigger, amendment detail, approver, and new version. |
| AT-T2.9 | Judge read-only enforcement | Attempt to give the Anti-Mythos judge write access to an evaluated system. Access is denied. Judge operates read-only on execution traces and contracts only. |
| AT-T2.10 | Judge verdict schema | Review 10 judge verdicts. All conform to the defined schema. All are ingested by the SIEM. All include four-state classification. |
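The four-state classification that AT-T2.4 exercises hinges on evaluating means and outcome independently. The sketch below is one plausible mapping; in particular, how the two failure states are distinguished (surfaced error vs. wrong outcome) is an assumption, not a MASO definition:

```python
def classify_execution(outcome_correct: bool, tools_used: list[str],
                       tools_specified: list[str], error_raised: bool) -> str:
    """Sketch of the four-state deviation model. The key property is
    that a correct outcome alone never yields normal_operation: means
    compliance is checked as well (state names follow AT-T2.4)."""
    means_compliant = tools_used == tools_specified
    if error_raised:
        return "visible_failure"        # assumed: workflow surfaced an error
    if not outcome_correct:
        return "execution_failure"      # assumed: completed, wrong outcome
    if not means_compliant:
        return "creative_substitution"  # correct outcome, unauthorised means
    return "normal_operation"
```

The ordering matters: creative substitution is only reachable once the outcome is correct, which is exactly why outcome-only monitoring never sees it.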
Tier 3 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| AT-T3.1 | Cross-session drift | Gradually shift an agent's tool usage distribution across 20 sessions, keeping each individual session within per-execution thresholds. Drift detection identifies the population-level shift and triggers governance review. |
| AT-T3.2 | Judge logic-level evaluation | Present the judge with an execution where the agent used authorised tools in the correct sequence but took a reasoning shortcut (e.g., pattern-matched the expected output format instead of performing analysis). Judge identifies the logic-level deviation. |
| AT-T3.3 | Judge independence verification | Verify that the team operating the evaluated workflow cannot modify judge configuration, suppress judge findings, or access judge evaluation criteria without security function approval. |
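The population-level shift that AT-T3.1 tests can be detected even when every session stays within per-execution thresholds, by comparing tool-usage distributions across session windows. The distance measure (total variation) and the threshold value below are illustrative choices, not mandated by this domain:

```python
from collections import Counter

def tool_distribution(sessions: list[list[str]]) -> dict[str, float]:
    """Fraction of tool calls per tool, pooled across a set of sessions."""
    counts = Counter(t for s in sessions for t in s)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def drift_score(baseline: list[list[str]], recent: list[list[str]]) -> float:
    """Total variation distance between baseline and recent tool usage:
    0.0 means an identical mix, 1.0 means disjoint tool sets."""
    p, q = tool_distribution(baseline), tool_distribution(recent)
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)

DRIFT_THRESHOLD = 0.2  # illustrative; calibrate per workflow

def drift_detected(baseline, recent) -> bool:
    """True when the population-level shift warrants governance review."""
    return drift_score(baseline, recent) > DRIFT_THRESHOLD
```

No single session in the drifted population need look anomalous on its own; the signal only appears in the aggregate, which is why this check belongs in trend analysis rather than per-execution monitoring.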
Maturity Indicators¶
| Level | Indicator |
|---|---|
| Initial | No formal contracts for agentic workflows. Tool provisioning based on convenience. Execution outcomes logged but means not tracked. Blocked tool attempts discarded as errors. |
| Managed | Solution and agent contracts declared for all workflows. Contract library maintained. Tools structurally enforced per contract. Execution traces and plan attempts logged. Hard stops defined and tested. |
| Defined | Contract inheritance validated at deployment. Four-state deviation model implemented. Creative substitution is a named detection category. Plan attempts trended and escalated. Anti-Mythos judge deployed for HIGH/CRITICAL workflows with its own contract and structured verdicts. |
| Quantitatively Managed | Creative substitution detection rates measured. Plan attempt trends published. Judge verdict accuracy calibrated. Contract review cycle adherence tracked. Drift detection operational with measured sensitivity. |
| Optimising | Cross-session drift detection triggers proactive governance review. Judge evaluates at four levels including logic-level reasoning. Judge organisationally independent. Contract specifications refined based on operational data and creative substitution findings. |
Common Pitfalls¶
Treating tool allow-lists as configuration rather than security controls. If the enforcement is policy-based (the agent is told not to use the tool but the tool is available in its runtime), the model can reason around the restriction. Structural enforcement means the tool is not present. Policy enforcement means the tool is present but discouraged. These are categorically different security postures.
Classifying creative substitution as a success. The most natural response to an agent that achieves the correct outcome is to log it as a success. This is the defining mistake this domain exists to prevent. Creative substitution rewards boundary violation. If the monitoring system classifies it as success, the organisation has no signal that the model is optimising for task completion over means compliance. The four-state model must be implemented, not just documented.
Ignoring blocked tool attempts. Most systems discard blocked invocations as failed API calls or configuration errors. In an agentic context, a blocked tool attempt is a behavioural indicator: the model reasoned that a tool outside its authorised set would be useful and attempted to use it. One attempt may be noise. A trend is a signal that the model is systematically reasoning beyond its declared boundaries.
Writing contracts that specify tools but not logic. A contract that says "use tool A and tool B" is weaker than a contract that says "use tool A to retrieve data, then use tool B to analyse it, then return a structured report." Without a logic specification, creative substitution that uses the authorised tools in an unauthorised way (skipping analysis, reordering steps, using tool A's output to shortcut tool B's work) is undetectable at the tool-call level. Logic specifications enable the judge to evaluate at the reasoning level, not just the tool level.
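The gap this pitfall describes can be made concrete: a trace that skips a step passes a bare allow-list check but fails a logic-specification check. The sequence below is illustrative, loosely following the retriever/analyser example used in this document:

```python
# Illustrative logic specification: retrieve, then analyse, then report.
SPECIFIED_SEQUENCE = ["db-query", "pattern-match", "report-generate"]
ALLOW_LIST = set(SPECIFIED_SEQUENCE)

def allow_list_check(trace: list[str]) -> bool:
    """Tool-level check only: every call is on the allow-list."""
    return all(t in ALLOW_LIST for t in trace)

def logic_spec_check(trace: list[str]) -> bool:
    """Logic-level check: calls match the declared sequence exactly,
    so skipped or reordered steps are detectable."""
    return trace == SPECIFIED_SEQUENCE
```

A trace such as `["db-query", "report-generate"]`, which shortcuts the analysis step, passes the first check and fails the second; only the contract with a logic specification gives the judge grounds to flag it.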
Deploying the Anti-Mythos judge on the same infrastructure as evaluated agents. If the judge shares compute, network, or storage with the agents it evaluates, a compromise of the agent infrastructure can affect the judge. Infrastructure separation is a prerequisite for judge integrity, not an optional hardening step.
Treating contract review as a periodic compliance exercise. Contract review exists to catch the case where the contract no longer matches reality: the model has been updated, tools have changed, the business process has shifted. If review is a box-ticking exercise with no substantive re-evaluation, contracts will drift from the actual deployment and means monitoring will operate against an outdated reference.
Assuming SR 11-7 model validation covers agentic execution risk. SR 11-7 provides a model risk management framework that assumes the validating organisation can inspect model internals. API-only consumption of frontier LLMs provides no such access. Organisations that claim SR 11-7 compliance for agentic deployments without documenting the activation-layer residual risk (AT-2.15) are overstating their validation coverage.
Relationship to Other Domains¶
| Domain | Relationship |
|---|---|
| Objective Intent | OI declares what agents should do (OISpec). AT enforces how they do it (contract governance). The solution contract extends the OISpec with means governance, enforcement thresholds, and escalation paths. OI is the intent declaration; AT is the means enforcement. |
| Execution Control | AT extends EC-1.2 (tool allow-lists) with contract-bound structural enforcement (AT-1.4). AT extends EC-2.4 (circuit breakers) with contract-defined hard stop conditions (AT-1.7). AT-2.4 (sequence enforcement) complements EC-2.1 (action classification) by enforcing execution order, not just action approval. |
| Observability | AT extends OB-1.1 (action audit log) with full execution trace logging (AT-1.5) and plan attempt logging (AT-1.6). AT-2.8 (plan attempt trending) complements OB-2.3 (drift detection) with a contract-specific behavioural signal. AT-2.13 (judge verdict schema) feeds into OB-2.4 (SIEM integration). |
| Privileged Agent Governance | The Anti-Mythos judge (AT-2.11 through AT-3.3) is a specialised evaluator subject to PA controls. AT-2.12 (judge contract) complements PA-2.3 (judge criteria versioning). AT-3.3 (judge independence) extends the organisational separation principle from PA to contract compliance evaluation specifically. |
| Model Cognition Assurance | AT-2.7 (creative substitution detection) detects the same reward hacking pattern that MC-2.6 monitors through behavioural baselines. AT detects it through means comparison against the contract; MC detects it through statistical anomaly in success metrics. AT-2.15 (activation-layer residual risk) extends MC-2.7 to agentic execution contexts. |
| Supply Chain | AT-2.14 (vendor agentic capability disclosure) extends SC-2.1 (AIBOM) with agentic-specific vendor disclosure requirements. Contract review (AT-2.2) triggers on model version change, linking to AIBOM version tracking. |
| Identity & Access | AT-2.1 (contract inheritance enforcement) complements IA-2.4 (no transitive permissions) by preventing privilege escalation through contract misconfiguration rather than credential inheritance. |
References¶
- Anthropic: Claude Mythos System Card - creative substitution and reward hacking patterns in agentic execution documented during pre-deployment evaluation
- Federal Reserve SR 11-7: Supervisory Guidance on Model Risk Management - model validation principles referenced in AT-2.15
- OWASP Top 10 for Agentic Applications (2026) - ASI02 (Tool Misuse), ASI05 (Unexpected Code Execution), ASI10 (Rogue Agents)
- NIST AI RMF - Manage function: controls for AI system behaviour in deployment