MASO Control Domain: Observability¶
Part of the MASO Framework · Control Specifications
Covers: ASI09 (Human-Agent Trust Exploitation) · ASI10 (Rogue Agents) · LLM09 (Misinformation) · LLM10 (Unbounded Consumption)
Also covers: GV-01 (Non-determinism/Irreproducibility) · SR-06 (Logging as Breach Vector) · HF-02 (Accountability Blur)
Principle¶
You cannot secure what you cannot see. Every agent decision, every inter-agent message, every tool invocation, and every output is captured in an immutable audit trail. Monitoring is not retrospective - it is continuous, automated, and feeds directly into the PACE escalation logic. At Tier 3, the observability system is itself an independent agent with its own infrastructure and kill switch authority.
Observability is the domain that makes all other domains verifiable. Without it, identity controls are unauditable, data fencing is uncheckable, and execution caps are unenforceable.
Why This Matters in Multi-Agent Systems¶
Accountability chains are longer and harder to trace. In a single-model system, the path from input to output is one step. In a multi-agent system, a final output may be the result of five agents, three tool invocations, and two delegation hops. When something goes wrong, you need to reconstruct the entire chain to identify the root cause.
Behavioral drift is harder to detect. A single model's output quality can be monitored against a baseline. In a multi-agent system, drift in one agent may be masked by compensating behavior in another. Agent A starts hallucinating, but Agent B's summarisation smooths over the inaccuracies. The final output looks acceptable while the intermediate data is corrupted.
Trust exploitation scales with agent confidence. ASI09 - Human-Agent Trust Exploitation - is particularly dangerous in multi-agent systems because agents can present consensus. "All three agents agree this is the right action" is more persuasive than a single agent's recommendation, even when the consensus is based on shared poisoned data or correlated errors.
Rogue behavior can be concealed by collaboration. ASI10 - Rogue Agents - is harder to detect when the rogue agent's outputs are processed by other agents before reaching human review. The observability layer must monitor intermediate states, not just final outputs.
Controls by Tier¶
Tier 1 - Supervised¶
| Control | Requirement | Implementation Notes |
|---|---|---|
| OB-1.1 Action audit log | Every agent action logged: agent ID, timestamp, action type, tool, parameters, result, approval status | Append-only log. At Tier 1, tamper-resistance is sufficient; full immutability is deferred to Tier 2. |
| OB-1.2 Inter-agent message log | All messages on the bus captured with sender ID, recipient ID, timestamp, message hash | Content captured in full at Tier 1 (volume is low). |
| OB-1.3 Weekly manual review | Human reviews a sample of agent actions and inter-agent messages weekly | Focus on: unexpected tool usage, unusual message patterns, rejected actions. |
| OB-1.4 Output quality log | Human approval decisions (approve/reject) logged alongside agent output | Builds the baseline data set required for Tier 2 anomaly detection. |
Tier 2 - Managed¶
All Tier 1 controls remain active, plus:
| Control | Requirement | Implementation Notes |
|---|---|---|
| OB-2.1 Immutable decision chain | Full causal chain from input through each agent's reasoning and actions to final output | Tamper-proof storage. Each entry cryptographically linked to its predecessor. |
| OB-2.2 Continuous anomaly scoring | Per-agent behavioral score updated in real time against established baseline | Score feeds directly into PACE escalation triggers. |
| OB-2.3 Drift detection | Statistical comparison of agent output distributions against rolling baseline | Detects gradual quality degradation, topic drift, and style changes. Window: 7-day rolling baseline, alert on >2σ deviation. |
| OB-2.4 SIEM/SOAR integration | Agent observability events forwarded to enterprise security operations | Correlation with non-AI security events. Agent anomalies visible alongside network/endpoint alerts. |
| OB-2.5 Cost and consumption monitoring | Per-agent token usage, API costs, and execution time tracked with alerting thresholds | Detects runaway loops and unbounded consumption (LLM10) before budget limits are reached. |
| OB-2.5a Context utilisation monitoring | Per-agent context window consumption tracked as a security-relevant metric with tiered alerting | Context utilisation is distinct from cost. It measures how much of the agent's attention budget has been consumed, which directly correlates with guardrail effectiveness and instruction-following reliability. Tiered alerts: 70% (info), 85% (warning → PACE P→A at Tier 2+), 95% (critical → fail-closed). Monitor Judge context utilisation independently; correlated exhaustion (agent AND Judge both above 85%) triggers accelerated PACE escalation (OP-04). |
| OB-2.6 Log security | Logs classified, encrypted at rest and in transit, access-restricted to security operations, retention-limited | Full context capture stored in a separate higher-classification log tier from operational monitoring. Prevents logging from becoming a breach vector (SR-06). |
| OB-2.7 Accountable human | Every workflow has a designated human owner recorded in the decision chain and AIBOM | accountable_human field mandatory in decision chain log. Judge approval does not transfer responsibility from the human owner. Prevents accountability blur (HF-02). |
| OB-2.8 Emergent failure accountability | For failures that arise from agent-to-agent interaction (hallucination amplification, consensus failures, transitive delegation errors), the workflow owner is accountable, not individual agent owners | Prevents accountability gaps where no single agent "failed" but the system produced a harmful outcome. Workflow owner is responsible for the orchestration design that permitted the emergent failure. |
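The tiered alerting in OB-2.5a (70% info, 85% warning, 95% critical, with accelerated escalation on correlated agent-and-Judge exhaustion) can be sketched as a small decision function. This is an illustrative sketch only: the thresholds come from the control text, but the function name, field names, and action labels are assumptions, not normative MASO identifiers.

```python
# Sketch of OB-2.5a tiered context-utilisation alerting.
# Thresholds (0.70 / 0.85 / 0.95) are from the control text; all
# names and action labels here are illustrative assumptions.

def context_alert(agent_util: float, judge_util: float) -> dict:
    """Map agent and Judge context utilisation to an alert level and PACE action."""
    def level(u: float) -> str:
        if u >= 0.95:
            return "critical"   # fail-closed
        if u >= 0.85:
            return "warning"    # PACE P->A at Tier 2+
        if u >= 0.70:
            return "info"
        return "normal"

    agent_level = level(agent_util)
    judge_level = level(judge_util)
    # Correlated exhaustion (OP-04): agent AND Judge both above 85%
    correlated = agent_util >= 0.85 and judge_util >= 0.85
    return {
        "agent": agent_level,
        "judge": judge_level,
        "correlated_exhaustion": correlated,
        "pace_action": (
            "accelerated_escalation" if correlated
            else "fail_closed" if agent_level == "critical"
            else "p_to_a" if agent_level == "warning"
            else "none"
        ),
    }
```

Note that the Judge's utilisation is evaluated independently, per the control: a healthy agent paired with an exhausted Judge still degrades evaluation quality.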
Tier 3 - Autonomous¶
All Tier 2 controls remain active, plus:
| Control | Requirement | Implementation Notes |
|---|---|---|
| OB-3.1 Long-window behavioral analysis | 30-day and 90-day behavioral trend analysis in addition to rolling 7-day | Detects slow drift that evades short-term anomaly detection. The "behavioral slow drift" attack pattern. |
| OB-3.2 Causal chain reconstruction | Automated tool that reconstructs the full decision chain for any given output on demand | Investigation time from "output flagged" to "root cause identified" must be under 4 hours. |
| OB-3.3 Independent observability agent | Separate agent on separate infrastructure monitoring the primary agent system | Read-only access to primary system logs and metrics. Cannot be influenced by task agents. Has kill switch authority as a safety net. |
| OB-3.4 Cross-agent correlation | Automated detection of correlated anomalies across multiple agents | Catches coordinated failures, shared poisoned data, and consensus manipulation. |
| OB-3.5 Decision traceability | For regulated decisions, full trace captured in sufficient detail for regulatory explanation | Standard is explainability (the decision can be explained after the fact), not reproducibility (the same input produces the same output). Satisfies non-determinism governance requirement (GV-01). |
Anomaly Scoring Model (Tier 2+)¶
The anomaly score is a composite metric that drives PACE escalation decisions. It is not a single number - it is a vector of signals, each weighted by the agent's risk profile.
Input signals:
| Signal | Weight (example) | Description |
|---|---|---|
| Tool usage pattern | 0.15 | Deviation from baseline tool invocation distribution |
| Output quality | 0.20 | Model-as-Judge scores vs. baseline |
| Response latency | 0.05 | Significant changes in processing time |
| Message volume | 0.10 | Inter-agent message frequency vs. baseline |
| Error rate | 0.10 | Guardrail blocks, tool failures, timeouts |
| Content drift | 0.15 | Semantic similarity of outputs vs. baseline |
| Cost trajectory | 0.05 | Token usage trend vs. baseline |
| Temporal profile | 0.10 | Activity timing vs. established schedule - flags weekend, off-hours, or out-of-cycle invocations that deviate from the agent's historical activation pattern. Borrowed from insider risk UEBA: unusual working hours are one of the strongest early signals of compromised credentials. |
| Peer group deviation | 0.10 | Behavioral divergence from other agents with the same role and configuration. If one agent in a fleet of five starts behaving differently while its peers remain stable, the individual agent is flagged. Filters out environmental changes that affect all agents equally. |
| Context utilisation | 0.10 | Current context window consumption relative to capacity. High utilisation correlates with attention dilution, weakened guardrail adherence, and increased hallucination rate. Context above 85% is itself an anomaly signal regardless of other metrics. When combined with declining output quality scores, indicates active degradation (OP-04). |
Thresholds (configurable per agent):
| Score | Action |
|---|---|
| 0–39 | Normal. No action. |
| 40–59 | Elevated. Increase monitoring frequency. Alert AI security engineer. |
| 60–79 | High. PACE P→A transition triggered. Agent isolated. Backup activated. |
| 80–100 | Critical. PACE P→C or P→E depending on scope. Incident response engaged. |
Weights and thresholds should be calibrated during Tier 1 operations using the manual review data (OB-1.3, OB-1.4) and adjusted as operational baselines mature.
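The composite score and threshold mapping above can be sketched as a weighted sum. This is an illustrative sketch, not normative MASO code: it assumes each upstream detector has already normalised its signal to a 0–100 deviation scale, and note that the example weights in the table sum to 1.10, so the sketch normalises by the total weight to keep the composite on a 0–100 scale.

```python
# Illustrative composite anomaly score. Weights mirror the example table;
# per-signal values are assumed pre-normalised to 0-100 by upstream
# detectors. All names are assumptions, not normative MASO fields.

WEIGHTS = {
    "tool_usage": 0.15, "output_quality": 0.20, "latency": 0.05,
    "message_volume": 0.10, "error_rate": 0.10, "content_drift": 0.15,
    "cost_trajectory": 0.05, "temporal_profile": 0.10,
    "peer_deviation": 0.10, "context_utilisation": 0.10,
}

def anomaly_score(signals: dict) -> float:
    """Weighted sum of per-signal deviations, normalised to 0-100."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return weighted / total_weight

def pace_action(score: float) -> str:
    """Map a composite score to the threshold table's action bands."""
    if score >= 80:
        return "critical"   # PACE P->C or P->E, incident response
    if score >= 60:
        return "high"       # PACE P->A, agent isolated
    if score >= 40:
        return "elevated"   # increased monitoring, engineer alerted
    return "normal"
```

A missing signal defaults to zero deviation here; a production implementation would more likely treat missing telemetry as its own alert condition, since a blind spot is not the same as normal behaviour.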
Testing Criteria¶
Tier 1 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| OB-T1.1 | Log completeness | Perform 50 agent actions. Verify all 50 appear in the audit log with all required fields. |
| OB-T1.2 | Message log capture | Send 20 inter-agent messages. Verify all 20 captured with sender, recipient, timestamp, and hash. |
| OB-T1.3 | Log integrity | Attempt to modify a log entry. Tampering is detectable (Tier 1: detected; Tier 2+: prevented). |
| OB-T1.4 | Weekly review coverage | Confirm the weekly review process is documented, executed, and findings are recorded. |
Tier 2 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| OB-T2.1 | Decision chain reconstruction | Select a random output. Reconstruct the full chain from input through all agents. Chain is complete with no gaps. |
| OB-T2.2 | Anomaly score accuracy | Inject known anomalous behavior (unusual tool usage, quality degradation). Anomaly score rises above alerting threshold within 15 minutes. |
| OB-T2.3 | Drift detection | Gradually shift agent output distribution over 72 hours. Drift detection triggers alert before human-visible quality impact. |
| OB-T2.4 | SIEM integration | Generate an agent anomaly event. Verify it appears in SIEM within 5 minutes with correct severity and context. |
| OB-T2.5 | Cost alerting | Trigger a runaway loop. Cost monitoring alerts before the per-agent budget threshold is reached. |
| OB-T2.6 | Log access control | Attempt to access agent logs from a non-security-ops account. Access is blocked. Verify logs are encrypted at rest. |
| OB-T2.7 | Accountable human traceability | Select 10 random decisions from the audit log. Verify each has an accountable_human field linking to a named individual. |
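OB-T2.3 exercises the statistical drift detection described in OB-2.3: output distributions compared against a 7-day rolling baseline with an alert on >2σ deviation. A minimal z-score sketch, assuming per-output quality scores are available (the field names and the use of a simple mean/stdev comparison are illustrative assumptions; production drift detection would typically use distributional tests, not just the mean):

```python
# Minimal sketch of OB-2.3-style drift detection: compare the current
# window's mean quality score against the rolling baseline and alert
# on a deviation greater than sigma_threshold standard deviations.
import statistics

def drift_alert(baseline_scores: list, current_scores: list,
                sigma_threshold: float = 2.0) -> bool:
    """True if the current mean deviates >sigma_threshold sigma from baseline."""
    mu = statistics.mean(baseline_scores)
    sigma = statistics.stdev(baseline_scores)
    if sigma == 0:
        # Degenerate baseline: any change at all is a deviation.
        return statistics.mean(current_scores) != mu
    z = abs(statistics.mean(current_scores) - mu) / sigma
    return z > sigma_threshold
```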
Tier 3 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| OB-T3.1 | Long-window detection | Introduce gradual drift over 3 weeks (below 7-day detection threshold). 30-day analysis detects the trend. |
| OB-T3.2 | Cross-agent correlation | Poison a shared RAG source. Verify the correlation engine detects coordinated anomalies across all agents consuming that source. |
| OB-T3.3 | Independent observability agent integrity | Attempt to influence the observability agent from within the primary agent system (send messages, modify its inputs). All attempts fail due to infrastructure isolation. |
| OB-T3.4 | Observability agent kill switch | Simulate a scenario where the observability agent detects a critical anomaly and no human responds within 5 minutes. Confirm the observability agent triggers Emergency shutdown. |
| OB-T3.5 | Decision traceability | Select a regulated decision from the audit trail. Reconstruct a complete regulatory explanation from the trace data alone. Explanation must be producible within 4 hours. |
Decision Chain Log Format¶
Every entry in the decision chain should include:
```json
{
  "chain_id": "uuid",
  "sequence": 1,
  "timestamp": "ISO-8601",
  "agent_id": "agent-analyst-01",
  "agent_nhi": "NHI-hash (Tier 2+)",
  "action_type": "tool_invocation | delegation | output | escalation",
  "input_hash": "sha256 of input data",
  "tool": "tool-name (if applicable)",
  "tool_parameters": "{ ... } (sanitised)",
  "output_hash": "sha256 of output data",
  "llm_judge_score": 0.92,
  "llm_judge_flags": [],
  "anomaly_score": 23,
  "context_utilisation": 0.62,
  "approval": "auto | human | escalated | blocked",
  "approver": "system | operator-id",
  "pace_phase": "primary",
  "accountable_human": "operator-id or workflow-owner-id",
  "previous_entry_hash": "sha256 (Tier 2+ immutability)"
}
```
This format enables: full chain reconstruction, per-agent performance trending, PACE phase tracking, and forensic investigation. At Tier 2+, the previous_entry_hash creates a hash chain that makes tampering detectable.
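The hash chain works by storing, in each entry, the SHA-256 of the previous entry, so any retroactive modification breaks every subsequent link. A minimal verification sketch, assuming entries are canonicalised as sorted-key JSON before hashing (the canonicalisation scheme is an assumption; the spec only requires that each entry be cryptographically linked to its predecessor):

```python
# Sketch of Tier 2+ hash-chain verification. Canonicalisation via
# sorted-key compact JSON is an illustrative assumption.
import hashlib
import json

def entry_hash(entry: dict) -> str:
    """SHA-256 over a canonical serialisation of the entry."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_chain(entries: list) -> bool:
    """True if every entry's previous_entry_hash matches its predecessor."""
    for prev, curr in zip(entries, entries[1:]):
        if curr.get("previous_entry_hash") != entry_hash(prev):
            return False
    return True
```

Tampering with any field of any earlier entry changes its hash and invalidates the link recorded in the next entry, which is what makes the chain tamper-evident (not tamper-proof: an attacker who can rewrite the entire chain can re-link it, which is why OB-2.6 restricts log access).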
Decision Trace (Consolidated Audit View)¶
The decision chain log (OB-2.1) captures every event in a multi-agent workflow. For investigation and audit, that granularity is essential. For a compliance officer reviewing a flagged fraud case, or an operations lead triaging an escalation, it is too much. They need a single document that answers: what happened, who evaluated it, and why did it resolve the way it did.
The Decision Trace collapses the multi-layer evaluation chain into one structured output per decision.
Decision Trace Format¶
```json
{
  "trace_id": "uuid",
  "workflow_id": "uuid",
  "workflow_name": "Fraud Detection Pipeline",
  "timestamp": "ISO-8601",
  "accountable_human": "operator-id",
  "subject": {
    "description": "Transaction #TXN-29481 flagged for review",
    "input_hash": "sha256",
    "risk_classification": "HIGH"
  },
  "agent_chain": [
    {
      "agent_id": "agent-fraud-scorer-01",
      "action": "Scored transaction risk at 0.87 based on velocity pattern and geo-mismatch",
      "evidence": ["3 transactions in 2 minutes", "IP geolocation: Lagos, card billing: London"],
      "confidence": 0.87,
      "chain_entry_ref": "chain-id:seq:4"
    },
    {
      "agent_id": "agent-evidence-gatherer-02",
      "action": "Retrieved customer history: 0 prior fraud flags, account age 4 years",
      "evidence": ["Customer history API response"],
      "confidence": 0.72,
      "chain_entry_ref": "chain-id:seq:7"
    }
  ],
  "evaluation": {
    "tactical_judge": {
      "agent_id": "judge-tactical-01",
      "verdict": "flag",
      "criteria_applied": "Agent output satisfies OISpec: risk score traceable to data points, uncertainty preserved",
      "oispec_version": 3
    },
    "domain_judges": [
      {
        "domain": "fraud",
        "agent_id": "judge-fraud-01",
        "verdict": "flag",
        "reasoning": "Velocity pattern exceeds threshold. Geo-mismatch confirmed."
      },
      {
        "domain": "compliance",
        "agent_id": "judge-compliance-01",
        "verdict": "approve",
        "reasoning": "Transaction documentation sufficient for regulatory requirements."
      }
    ],
    "inter_judge_conflict": {
      "detected": true,
      "resolution": "Most restrictive wins: fraud flag overrides compliance approve",
      "precedence_rule": "default:most_restrictive",
      "escalated_to_human": true
    },
    "strategic_evaluator": {
      "verdict": "pass",
      "reasoning": "Agent chain outputs internally consistent. Uncertainty preserved through chain."
    },
    "meta_evaluation": {
      "judge_calibration_status": "within_threshold",
      "last_calibration": "ISO-8601",
      "drift_detected": false
    }
  },
  "resolution": {
    "outcome": "escalated_to_human",
    "human_decision": "confirmed_fraud",
    "human_reasoning": "Velocity pattern consistent with known card-testing behaviour",
    "human_decision_timestamp": "ISO-8601"
  },
  "regulatory_mapping": {
    "explainability_standard": "EU AI Act Art. 14",
    "decision_traceable": true,
    "human_oversight_documented": true
  }
}
```
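The `default:most_restrictive` precedence rule in the `inter_judge_conflict` block can be sketched as a simple ordering over verdicts. This is a hypothetical sketch inferred from the example trace: the verdict vocabulary (including `block`) and the rule that any inter-judge conflict escalates to the accountable human are assumptions, not normative MASO definitions.

```python
# Hypothetical sketch of default:most_restrictive verdict resolution.
# The ordering approve < flag < block is an illustrative assumption.

RESTRICTIVENESS = {"approve": 0, "flag": 1, "block": 2}

def resolve_verdicts(verdicts: list) -> dict:
    """Apply most-restrictive-wins and report whether a conflict occurred."""
    conflict = len(set(verdicts)) > 1
    winner = max(verdicts, key=lambda v: RESTRICTIVENESS[v])
    return {
        "verdict": winner,
        "conflict_detected": conflict,
        # Per the example trace, a detected inter-judge conflict is
        # escalated to the accountable human alongside the resolution.
        "escalated_to_human": conflict,
    }
```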
When to Generate a Decision Trace¶
| Trigger | Detail |
|---|---|
| Any action escalated to human review | The human needs the trace to make their decision |
| Any inter-judge conflict | The conflict and its resolution must be auditable |
| Any PACE transition | Root cause traceability for incident response |
| Regulatory decision | Any output that constitutes a regulated decision (credit, insurance, medical, employment) |
| Post-hoc audit request | Generated on demand from the decision chain log |
The Decision Trace is not a replacement for the decision chain log. It is a view over it, generated on demand or at trigger points, collapsing the full event stream into a single auditable document.
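Generating the trace as a view can be sketched as a pure function over chain log entries. A minimal sketch, assuming entries follow the decision chain log format above; the grouping and summarisation logic (which action types appear in the agent chain, how the outcome is derived from approval statuses) is illustrative, not normative.

```python
# Sketch of Decision Trace generation as a read-only view over the
# decision chain log. Entry fields follow the chain log format; the
# summarisation rules here are illustrative assumptions.

def build_trace(chain_entries: list, workflow_id: str) -> dict:
    """Collapse one workflow's chain entries into a single auditable summary."""
    agent_chain = [
        {
            "agent_id": e["agent_id"],
            "action_type": e["action_type"],
            # Back-reference into the chain log for full reconstruction
            "chain_entry_ref": f'{e["chain_id"]}:seq:{e["sequence"]}',
        }
        for e in chain_entries
        if e["action_type"] in ("tool_invocation", "delegation", "output")
    ]
    escalated = any(e["approval"] == "escalated" for e in chain_entries)
    return {
        "workflow_id": workflow_id,
        "accountable_human": chain_entries[0]["accountable_human"],
        "agent_chain": agent_chain,
        "outcome": "escalated_to_human" if escalated else "auto_resolved",
    }
```

Because the function only reads the chain log, regenerating a trace after the fact (the post-hoc audit trigger) produces the same document as one generated at escalation time, provided the underlying hash chain is intact.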
Relationship to Regulatory Requirements¶
| Regulation | Requirement | How Decision Trace Satisfies It |
|---|---|---|
| EU AI Act Art. 14 | Human oversight with sufficient information to understand AI system behaviour | Trace provides agent chain, evidence, evaluation verdicts, and human decision in one document |
| DORA Art. 11 | ICT incident root cause analysis and decision traceability | Trace links to decision chain entries for full reconstruction |
| APRA CPS 234 | Information security incident response with accountability | accountable_human field, human decision with reasoning |
Maturity Indicators¶
| Level | Indicator |
|---|---|
| Initial | Agent actions not systematically logged. No inter-agent message capture. Issues discovered by end users or manual testing. |
| Managed | Action audit log operational. Inter-agent messages captured. Weekly manual review. Approval decisions logged. |
| Defined | Immutable decision chains. Continuous anomaly scoring. Drift detection. SIEM integration. Cost monitoring with alerts. |
| Quantitatively Managed | Anomaly detection accuracy measured (true positive rate, false positive rate). Drift detection latency tracked. Decision chain reconstruction time measured against SLA. |
| Optimising | Long-window behavioral analysis. Independent observability agent. Cross-agent correlation. Causal chain reconstruction automated. Anomaly scoring weights tuned from operational data. |
Common Pitfalls¶
Logging everything but monitoring nothing. A comprehensive audit log that nobody reviews and no automated system analyses is compliance theatre. Logs must feed into active monitoring and alerting, not just exist for post-incident forensics.
Anomaly detection based solely on error rates. The most dangerous agent behaviors don't generate errors - they produce plausible-looking outputs that are subtly wrong. Anomaly scoring must include output quality metrics (Model-as-Judge scores, semantic similarity to baseline), not just error counts and rate limits.
Trusting the observability system that runs on the same infrastructure as the agents. At Tier 3, the observability agent must be on separate infrastructure. If a compromised agent can influence the monitoring system - by flooding it with noise, modifying its inputs, or degrading its resources - the monitoring is compromised. Independence requires infrastructure isolation, not just logical separation.
Setting drift detection thresholds too tight. Overly sensitive drift detection generates alert fatigue. If the anomaly score triggers a PACE transition three times a day for false positives, the team will start ignoring it. Calibrate thresholds using Tier 1 manual review data and expect a 2–4 week tuning period at Tier 2 before thresholds stabilise.
Comprehensive logging without log security. Agent logs contain reasoning chains, tool parameters, context fragments, and potentially sensitive data. Without classification, encryption, and access controls, the observability layer becomes a high-value target for data exfiltration - the very attack it's supposed to detect (SR-06).
No named human on the decision chain. "The agents decided" is not accountability. Every workflow must have a designated human owner. The decision chain log must record who that person is. Judge approval is a tool, not a transfer of responsibility.
Monitoring cost but not context utilisation. OB-2.5 tracks token spend (how much you're paying). But context utilisation (how full the agent's attention window is) is the security-relevant metric. An agent at 90% context capacity with a small bill is more dangerous than an agent at 30% capacity with a large bill. The former has weakened guardrails; the latter is just expensive. Monitor both, but treat context utilisation as a security signal, not just an operational one (OP-04).