MASO Control Domain: Observability¶
Part of the MASO Framework · Control Specifications
Covers: ASI09 (Human-Agent Trust Exploitation) · ASI10 (Rogue Agents) · LLM09 (Misinformation) · LLM10 (Unbounded Consumption)
Also covers: GV-01 (Non-determinism/Irreproducibility) · SR-06 (Logging as Breach Vector) · HF-02 (Accountability Blur)
Principle¶
You cannot secure what you cannot see. Every agent decision, every inter-agent message, every tool invocation, and every output is captured in an immutable audit trail. Monitoring is not retrospective - it is continuous, automated, and feeds directly into the PACE escalation logic. At Tier 3, the observability system is itself an independent agent with its own infrastructure and kill switch authority.
Observability is the domain that makes all other domains verifiable. Without it, identity controls are unauditable, data fencing is uncheckable, and execution caps are unenforceable.
Why This Matters in Multi-Agent Systems¶
Accountability chains are longer and harder to trace. In a single-model system, the path from input to output is one step. In a multi-agent system, a final output may be the result of five agents, three tool invocations, and two delegation hops. When something goes wrong, you need to reconstruct the entire chain to identify the root cause.
Behavioral drift is harder to detect. A single model's output quality can be monitored against a baseline. In a multi-agent system, drift in one agent may be masked by compensating behavior in another. Agent A starts hallucinating, but Agent B's summarisation smooths over the inaccuracies. The final output looks acceptable while the intermediate data is corrupted.
Trust exploitation scales with agent confidence. ASI09 - Human-Agent Trust Exploitation - is particularly dangerous in multi-agent systems because agents can present consensus. "All three agents agree this is the right action" is more persuasive than a single agent's recommendation, even when the consensus is based on shared poisoned data or correlated errors.
Rogue behavior can be concealed by collaboration. ASI10 - Rogue Agents - is harder to detect when the rogue agent's outputs are processed by other agents before reaching human review. The observability layer must monitor intermediate states, not just final outputs.
Controls by Tier¶
Tier 1 - Supervised¶
| Control | Requirement | Implementation Notes |
|---|---|---|
| OB-1.1 Action audit log | Every agent action logged: agent ID, timestamp, action type, tool, parameters, result, approval status | Append-only log. At Tier 1, tamper-resistance is sufficient; full immutability is deferred to Tier 2. |
| OB-1.2 Inter-agent message log | All messages on the bus captured with sender ID, recipient ID, timestamp, message hash | Content captured in full at Tier 1 (volume is low). |
| OB-1.3 Weekly manual review | Human reviews a sample of agent actions and inter-agent messages weekly | Focus on: unexpected tool usage, unusual message patterns, rejected actions. |
| OB-1.4 Output quality log | Human approval decisions (approve/reject) logged alongside agent output | Builds the baseline data set required for Tier 2 anomaly detection. |
Tier 2 - Managed¶
All Tier 1 controls remain active, plus:
| Control | Requirement | Implementation Notes |
|---|---|---|
| OB-2.1 Immutable decision chain | Full causal chain from input through each agent's reasoning and actions to final output | Tamper-proof storage. Each entry cryptographically linked to its predecessor. |
| OB-2.2 Continuous anomaly scoring | Per-agent behavioral score updated in real time against established baseline | Score feeds directly into PACE escalation triggers. |
| OB-2.3 Drift detection | Statistical comparison of agent output distributions against rolling baseline | Detects gradual quality degradation, topic drift, and style changes. Window: 7-day rolling baseline, alert on >2σ deviation. |
| OB-2.4 SIEM/SOAR integration | Agent observability events forwarded to enterprise security operations | Correlation with non-AI security events. Agent anomalies visible alongside network/endpoint alerts. |
| OB-2.5 Cost and consumption monitoring | Per-agent token usage, API costs, and execution time tracked with alerting thresholds | Detects runaway loops and unbounded consumption (LLM10) before budget limits are reached. |
| OB-2.5a Context utilisation monitoring | Per-agent context window consumption tracked as a security-relevant metric with tiered alerting | Context utilisation is distinct from cost. It measures how much of the agent's attention budget has been consumed, which directly correlates with guardrail effectiveness and instruction-following reliability. Tiered alerts: 70% (info), 85% (warning → PACE P→A at Tier 2+), 95% (critical → fail-closed). Monitor Judge context utilisation independently; correlated exhaustion (agent AND Judge both above 85%) triggers accelerated PACE escalation (OP-04). |
| OB-2.6 Log security | Logs classified, encrypted at rest and in transit, access-restricted to security operations, retention-limited | Full context capture stored in a separate higher-classification log tier from operational monitoring. Prevents logging from becoming a breach vector (SR-06). |
| OB-2.7 Accountable human | Every workflow has a designated human owner recorded in the decision chain and AIBOM | accountable_human field mandatory in decision chain log. Judge approval does not transfer responsibility from the human owner. Prevents accountability blur (HF-02). |
| OB-2.8 Emergent failure accountability | For failures that arise from agent-to-agent interaction (hallucination amplification, consensus failures, transitive delegation errors), the workflow owner is accountable, not individual agent owners | Prevents accountability gaps where no single agent "failed" but the system produced a harmful outcome. Workflow owner is responsible for the orchestration design that permitted the emergent failure. |
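The tiered alerting in OB-2.5a (70% info, 85% warning, 95% critical, with accelerated escalation on correlated agent-and-Judge exhaustion) can be sketched as a small decision function. This is an illustrative sketch only: the thresholds come from the control text, but the function name, field names, and action labels are assumptions, not normative MASO identifiers.

```python
# Sketch of OB-2.5a tiered context-utilisation alerting.
# Thresholds (0.70 / 0.85 / 0.95) are from the control text; all
# names and action labels here are illustrative assumptions.

def context_alert(agent_util: float, judge_util: float) -> dict:
    """Map agent and Judge context utilisation to an alert level and PACE action."""
    def level(u: float) -> str:
        if u >= 0.95:
            return "critical"   # fail-closed
        if u >= 0.85:
            return "warning"    # PACE P->A at Tier 2+
        if u >= 0.70:
            return "info"
        return "normal"

    agent_level = level(agent_util)
    judge_level = level(judge_util)
    # Correlated exhaustion (OP-04): agent AND Judge both above 85%
    correlated = agent_util >= 0.85 and judge_util >= 0.85
    return {
        "agent": agent_level,
        "judge": judge_level,
        "correlated_exhaustion": correlated,
        "pace_action": (
            "accelerated_escalation" if correlated
            else "fail_closed" if agent_level == "critical"
            else "p_to_a" if agent_level == "warning"
            else "none"
        ),
    }
```

Note that the Judge's utilisation is evaluated independently, per the control: a healthy agent paired with an exhausted Judge still degrades evaluation quality.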
Tier 3 - Autonomous¶
All Tier 2 controls remain active, plus:
| Control | Requirement | Implementation Notes |
|---|---|---|
| OB-3.1 Long-window behavioral analysis | 30-day and 90-day behavioral trend analysis in addition to rolling 7-day | Detects slow drift that evades short-term anomaly detection. The "behavioral slow drift" attack pattern. |
| OB-3.2 Causal chain reconstruction | Automated tool that reconstructs the full decision chain for any given output on demand | Investigation time from "output flagged" to "root cause identified" must be under 4 hours. |
| OB-3.3 Independent observability agent | Separate agent on separate infrastructure monitoring the primary agent system | Read-only access to primary system logs and metrics. Cannot be influenced by task agents. Has kill switch authority as a safety net. |
| OB-3.4 Cross-agent correlation | Automated detection of correlated anomalies across multiple agents | Catches coordinated failures, shared poisoned data, and consensus manipulation. |
| OB-3.5 Decision traceability | For regulated decisions, full trace captured in sufficient detail for regulatory explanation | Standard is explainability (the decision can be explained after the fact), not reproducibility (the same input produces the same output). Satisfies non-determinism governance requirement (GV-01). |
Anomaly Scoring Model (Tier 2+)¶
The anomaly score is a composite metric that drives PACE escalation decisions. It is not a single number - it is a vector of signals, each weighted by the agent's risk profile.
Input signals:
| Signal | Weight (example) | Description |
|---|---|---|
| Tool usage pattern | 0.15 | Deviation from baseline tool invocation distribution |
| Output quality | 0.20 | Model-as-Judge scores vs. baseline |
| Response latency | 0.05 | Significant changes in processing time |
| Message volume | 0.10 | Inter-agent message frequency vs. baseline |
| Error rate | 0.10 | Guardrail blocks, tool failures, timeouts |
| Content drift | 0.15 | Semantic similarity of outputs vs. baseline |
| Cost trajectory | 0.05 | Token usage trend vs. baseline |
| Temporal profile | 0.10 | Activity timing vs. established schedule - flags weekend, off-hours, or out-of-cycle invocations that deviate from the agent's historical activation pattern. Borrowed from insider risk UEBA: unusual working hours are one of the strongest early signals of compromised credentials. |
| Peer group deviation | 0.10 | Behavioral divergence from other agents with the same role and configuration. If one agent in a fleet of five starts behaving differently while its peers remain stable, the individual agent is flagged. Filters out environmental changes that affect all agents equally. |
| Context utilisation | 0.10 | Current context window consumption relative to capacity. High utilisation correlates with attention dilution, weakened guardrail adherence, and increased hallucination rate. Context above 85% is itself an anomaly signal regardless of other metrics. When combined with declining output quality scores, indicates active degradation (OP-04). |
Thresholds (configurable per agent):
| Score | Action |
|---|---|
| 0–39 | Normal. No action. |
| 40–59 | Elevated. Increase monitoring frequency. Alert AI security engineer. |
| 60–79 | High. PACE P→A transition triggered. Agent isolated. Backup activated. |
| 80–100 | Critical. PACE P→C or P→E depending on scope. Incident response engaged. |
Weights and thresholds should be calibrated during Tier 1 operations using the manual review data (OB-1.3, OB-1.4) and adjusted as operational baselines mature.
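The composite score and threshold mapping above can be sketched as a weighted sum. This is an illustrative sketch, not normative MASO code: it assumes each upstream detector has already normalised its signal to a 0–100 deviation scale, and note that the example weights in the table sum to 1.10, so the sketch normalises by the total weight to keep the composite on a 0–100 scale.

```python
# Illustrative composite anomaly score. Weights mirror the example table;
# per-signal values are assumed pre-normalised to 0-100 by upstream
# detectors. All names are assumptions, not normative MASO fields.

WEIGHTS = {
    "tool_usage": 0.15, "output_quality": 0.20, "latency": 0.05,
    "message_volume": 0.10, "error_rate": 0.10, "content_drift": 0.15,
    "cost_trajectory": 0.05, "temporal_profile": 0.10,
    "peer_deviation": 0.10, "context_utilisation": 0.10,
}

def anomaly_score(signals: dict) -> float:
    """Weighted sum of per-signal deviations, normalised to 0-100."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return weighted / total_weight

def pace_action(score: float) -> str:
    """Map a composite score to the threshold table's action bands."""
    if score >= 80:
        return "critical"   # PACE P->C or P->E, incident response
    if score >= 60:
        return "high"       # PACE P->A, agent isolated
    if score >= 40:
        return "elevated"   # increased monitoring, engineer alerted
    return "normal"
```

A missing signal defaults to zero deviation here; a production implementation would more likely treat missing telemetry as its own alert condition, since a blind spot is not the same as normal behaviour.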
Testing Criteria¶
Tier 1 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| OB-T1.1 | Log completeness | Perform 50 agent actions. Verify all 50 appear in the audit log with all required fields. |
| OB-T1.2 | Message log capture | Send 20 inter-agent messages. Verify all 20 captured with sender, recipient, timestamp, and hash. |
| OB-T1.3 | Log integrity | Attempt to modify a log entry. Tampering is detectable (Tier 1: detected; Tier 2+: prevented). |
| OB-T1.4 | Weekly review coverage | Confirm the weekly review process is documented, executed, and findings are recorded. |
Tier 2 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| OB-T2.1 | Decision chain reconstruction | Select a random output. Reconstruct the full chain from input through all agents. Chain is complete with no gaps. |
| OB-T2.2 | Anomaly score accuracy | Inject known anomalous behavior (unusual tool usage, quality degradation). Anomaly score rises above alerting threshold within 15 minutes. |
| OB-T2.3 | Drift detection | Gradually shift agent output distribution over 72 hours. Drift detection triggers alert before human-visible quality impact. |
| OB-T2.4 | SIEM integration | Generate an agent anomaly event. Verify it appears in SIEM within 5 minutes with correct severity and context. |
| OB-T2.5 | Cost alerting | Trigger a runaway loop. Cost monitoring alerts before the per-agent budget threshold is reached. |
| OB-T2.6 | Log access control | Attempt to access agent logs from a non-security-ops account. Access is blocked. Verify logs are encrypted at rest. |
| OB-T2.7 | Accountable human traceability | Select 10 random decisions from the audit log. Verify each has an accountable_human field linking to a named individual. |
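OB-T2.3 exercises the statistical drift detection described in OB-2.3: output distributions compared against a 7-day rolling baseline with an alert on >2σ deviation. A minimal z-score sketch, assuming per-output quality scores are available (the field names and the use of a simple mean/stdev comparison are illustrative assumptions; production drift detection would typically use distributional tests, not just the mean):

```python
# Minimal sketch of OB-2.3-style drift detection: compare the current
# window's mean quality score against the rolling baseline and alert
# on a deviation greater than sigma_threshold standard deviations.
import statistics

def drift_alert(baseline_scores: list, current_scores: list,
                sigma_threshold: float = 2.0) -> bool:
    """True if the current mean deviates >sigma_threshold sigma from baseline."""
    mu = statistics.mean(baseline_scores)
    sigma = statistics.stdev(baseline_scores)
    if sigma == 0:
        # Degenerate baseline: any change at all is a deviation.
        return statistics.mean(current_scores) != mu
    z = abs(statistics.mean(current_scores) - mu) / sigma
    return z > sigma_threshold
```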
Tier 3 Tests¶
| Test ID | Test | Pass Criteria |
|---|---|---|
| OB-T3.1 | Long-window detection | Introduce gradual drift over 3 weeks (below 7-day detection threshold). 30-day analysis detects the trend. |
| OB-T3.2 | Cross-agent correlation | Poison a shared RAG source. Verify the correlation engine detects coordinated anomalies across all agents consuming that source. |
| OB-T3.3 | Independent observability agent integrity | Attempt to influence the observability agent from within the primary agent system (send messages, modify its inputs). All attempts fail due to infrastructure isolation. |
| OB-T3.4 | Observability agent kill switch | Simulate a scenario where the observability agent detects a critical anomaly and no human responds within 5 minutes. Confirm the observability agent triggers Emergency shutdown. |
| OB-T3.5 | Decision traceability | Select a regulated decision from the audit trail. Reconstruct a complete regulatory explanation from the trace data alone. Explanation must be producible within 4 hours. |
Decision Chain Log Format¶
Every entry in the decision chain should include:
```json
{
  "chain_id": "uuid",
  "sequence": 1,
  "timestamp": "ISO-8601",
  "agent_id": "agent-analyst-01",
  "agent_nhi": "NHI-hash (Tier 2+)",
  "action_type": "tool_invocation | delegation | output | escalation",
  "input_hash": "sha256 of input data",
  "tool": "tool-name (if applicable)",
  "tool_parameters": "{ ... } (sanitised)",
  "output_hash": "sha256 of output data",
  "llm_judge_score": 0.92,
  "llm_judge_flags": [],
  "anomaly_score": 23,
  "context_utilisation": 0.62,
  "approval": "auto | human | escalated | blocked",
  "approver": "system | operator-id",
  "pace_phase": "primary",
  "accountable_human": "operator-id or workflow-owner-id",
  "previous_entry_hash": "sha256 (Tier 2+ immutability)"
}
```
This format enables: full chain reconstruction, per-agent performance trending, PACE phase tracking, and forensic investigation. At Tier 2+, the previous_entry_hash creates a hash chain that makes tampering detectable.
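The hash chain works by storing, in each entry, the SHA-256 of the previous entry, so any retroactive modification breaks every subsequent link. A minimal verification sketch, assuming entries are canonicalised as sorted-key JSON before hashing (the canonicalisation scheme is an assumption; the spec only requires that each entry be cryptographically linked to its predecessor):

```python
# Sketch of Tier 2+ hash-chain verification. Canonicalisation via
# sorted-key compact JSON is an illustrative assumption.
import hashlib
import json

def entry_hash(entry: dict) -> str:
    """SHA-256 over a canonical serialisation of the entry."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_chain(entries: list) -> bool:
    """True if every entry's previous_entry_hash matches its predecessor."""
    for prev, curr in zip(entries, entries[1:]):
        if curr.get("previous_entry_hash") != entry_hash(prev):
            return False
    return True
```

Tampering with any field of any earlier entry changes its hash and invalidates the link recorded in the next entry, which is what makes the chain tamper-evident (not tamper-proof: an attacker who can rewrite the entire chain can re-link it, which is why OB-2.6 restricts log access).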
Decision Trace (Consolidated Audit View)¶
The decision chain log (OB-2.1) captures every event in a multi-agent workflow. For investigation and audit, that granularity is essential. For a compliance officer reviewing a flagged fraud case, or an operations lead triaging an escalation, it is too much. They need a single document that answers: what happened, who evaluated it, and why did it resolve the way it did.
The Decision Trace collapses the multi-layer evaluation chain into one structured output per decision.
Decision Trace Format¶
```json
{
  "trace_id": "uuid",
  "workflow_id": "uuid",
  "workflow_name": "Fraud Detection Pipeline",
  "timestamp": "ISO-8601",
  "accountable_human": "operator-id",
  "subject": {
    "description": "Transaction #TXN-29481 flagged for review",
    "input_hash": "sha256",
    "risk_classification": "HIGH"
  },
  "agent_chain": [
    {
      "agent_id": "agent-fraud-scorer-01",
      "action": "Scored transaction risk at 0.87 based on velocity pattern and geo-mismatch",
      "evidence": ["3 transactions in 2 minutes", "IP geolocation: Lagos, card billing: London"],
      "confidence": 0.87,
      "chain_entry_ref": "chain-id:seq:4"
    },
    {
      "agent_id": "agent-evidence-gatherer-02",
      "action": "Retrieved customer history: 0 prior fraud flags, account age 4 years",
      "evidence": ["Customer history API response"],
      "confidence": 0.72,
      "chain_entry_ref": "chain-id:seq:7"
    }
  ],
  "evaluation": {
    "tactical_judge": {
      "agent_id": "judge-tactical-01",
      "verdict": "flag",
      "criteria_applied": "Agent output satisfies OISpec: risk score traceable to data points, uncertainty preserved",
      "oispec_version": 3
    },
    "domain_judges": [
      {
        "domain": "fraud",
        "agent_id": "judge-fraud-01",
        "verdict": "flag",
        "reasoning": "Velocity pattern exceeds threshold. Geo-mismatch confirmed."
      },
      {
        "domain": "compliance",
        "agent_id": "judge-compliance-01",
        "verdict": "approve",
        "reasoning": "Transaction documentation sufficient for regulatory requirements."
      }
    ],
    "inter_judge_conflict": {
      "detected": true,
      "resolution": "Most restrictive wins: fraud flag overrides compliance approve",
      "precedence_rule": "default:most_restrictive",
      "escalated_to_human": true
    },
    "strategic_evaluator": {
      "verdict": "pass",
      "reasoning": "Agent chain outputs internally consistent. Uncertainty preserved through chain."
    },
    "meta_evaluation": {
      "judge_calibration_status": "within_threshold",
      "last_calibration": "ISO-8601",
      "drift_detected": false
    }
  },
  "resolution": {
    "outcome": "escalated_to_human",
    "human_decision": "confirmed_fraud",
    "human_reasoning": "Velocity pattern consistent with known card-testing behaviour",
    "human_decision_timestamp": "ISO-8601"
  },
  "regulatory_mapping": {
    "explainability_standard": "EU AI Act Art. 14",
    "decision_traceable": true,
    "human_oversight_documented": true
  }
}
```
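The `default:most_restrictive` precedence rule in the `inter_judge_conflict` block can be sketched as a simple ordering over verdicts. This is a hypothetical sketch inferred from the example trace: the verdict vocabulary (including `block`) and the rule that any inter-judge conflict escalates to the accountable human are assumptions, not normative MASO definitions.

```python
# Hypothetical sketch of default:most_restrictive verdict resolution.
# The ordering approve < flag < block is an illustrative assumption.

RESTRICTIVENESS = {"approve": 0, "flag": 1, "block": 2}

def resolve_verdicts(verdicts: list) -> dict:
    """Apply most-restrictive-wins and report whether a conflict occurred."""
    conflict = len(set(verdicts)) > 1
    winner = max(verdicts, key=lambda v: RESTRICTIVENESS[v])
    return {
        "verdict": winner,
        "conflict_detected": conflict,
        # Per the example trace, a detected inter-judge conflict is
        # escalated to the accountable human alongside the resolution.
        "escalated_to_human": conflict,
    }
```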
When to Generate a Decision Trace¶
| Trigger | Detail |
|---|---|
| Any action escalated to human review | The human needs the trace to make their decision |
| Any inter-judge conflict | The conflict and its resolution must be auditable |
| Any PACE transition | Root cause traceability for incident response |
| Regulatory decision | Any output that constitutes a regulated decision (credit, insurance, medical, employment) |
| Post-hoc audit request | Generated on demand from the decision chain log |
The Decision Trace is not a replacement for the decision chain log. It is a view over it, generated on demand or at trigger points, collapsing the full event stream into a single auditable document.
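Generating the trace as a view can be sketched as a pure function over chain log entries. A minimal sketch, assuming entries follow the decision chain log format above; the grouping and summarisation logic (which action types appear in the agent chain, how the outcome is derived from approval statuses) is illustrative, not normative.

```python
# Sketch of Decision Trace generation as a read-only view over the
# decision chain log. Entry fields follow the chain log format; the
# summarisation rules here are illustrative assumptions.

def build_trace(chain_entries: list, workflow_id: str) -> dict:
    """Collapse one workflow's chain entries into a single auditable summary."""
    agent_chain = [
        {
            "agent_id": e["agent_id"],
            "action_type": e["action_type"],
            # Back-reference into the chain log for full reconstruction
            "chain_entry_ref": f'{e["chain_id"]}:seq:{e["sequence"]}',
        }
        for e in chain_entries
        if e["action_type"] in ("tool_invocation", "delegation", "output")
    ]
    escalated = any(e["approval"] == "escalated" for e in chain_entries)
    return {
        "workflow_id": workflow_id,
        "accountable_human": chain_entries[0]["accountable_human"],
        "agent_chain": agent_chain,
        "outcome": "escalated_to_human" if escalated else "auto_resolved",
    }
```

Because the function only reads the chain log, regenerating a trace after the fact (the post-hoc audit trigger) produces the same document as one generated at escalation time, provided the underlying hash chain is intact.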
Relationship to Regulatory Requirements¶
| Regulation | Requirement | How Decision Trace Satisfies It |
|---|---|---|
| EU AI Act Art. 14 | Human oversight with sufficient information to understand AI system behaviour | Trace provides agent chain, evidence, evaluation verdicts, and human decision in one document |
| DORA Art. 11 | ICT incident root cause analysis and decision traceability | Trace links to decision chain entries for full reconstruction |
| APRA CPS 234 | Information security incident response with accountability | accountable_human field, human decision with reasoning |
Maturity Indicators¶
| Level | Indicator |
|---|---|
| Initial | Agent actions not systematically logged. No inter-agent message capture. Issues discovered by end users or manual testing. |
| Managed | Action audit log operational. Inter-agent messages captured. Weekly manual review. Approval decisions logged. |
| Defined | Immutable decision chains. Continuous anomaly scoring. Drift detection. SIEM integration. Cost monitoring with alerts. |
| Quantitatively Managed | Anomaly detection accuracy measured (true positive rate, false positive rate). Drift detection latency tracked. Decision chain reconstruction time measured against SLA. |
| Optimising | Long-window behavioral analysis. Independent observability agent. Cross-agent correlation. Causal chain reconstruction automated. Anomaly scoring weights tuned from operational data. |
Common Pitfalls¶
Logging everything but monitoring nothing. A comprehensive audit log that nobody reviews and no automated system analyses is compliance theatre. Logs must feed into active monitoring and alerting, not just exist for post-incident forensics.
Anomaly detection based solely on error rates. The most dangerous agent behaviors don't generate errors - they produce plausible-looking outputs that are subtly wrong. Anomaly scoring must include output quality metrics (Model-as-Judge scores, semantic similarity to baseline), not just error counts and rate limits.
Trusting the observability system that runs on the same infrastructure as the agents. At Tier 3, the observability agent must be on separate infrastructure. If a compromised agent can influence the monitoring system - by flooding it with noise, modifying its inputs, or degrading its resources - the monitoring is compromised. Independence requires infrastructure isolation, not just logical separation.
Setting drift detection thresholds too tight. Overly sensitive drift detection generates alert fatigue. If the anomaly score triggers a PACE transition three times a day for false positives, the team will start ignoring it. Calibrate thresholds using Tier 1 manual review data and expect a 2–4 week tuning period at Tier 2 before thresholds stabilise.
Comprehensive logging without log security. Agent logs contain reasoning chains, tool parameters, context fragments, and potentially sensitive data. Without classification, encryption, and access controls, the observability layer becomes a high-value target for data exfiltration - the very attack it's supposed to detect (SR-06).
No named human on the decision chain. "The agents decided" is not accountability. Every workflow must have a designated human owner. The decision chain log must record who that person is. Judge approval is a tool, not a transfer of responsibility.
Monitoring cost but not context utilisation. OB-2.5 tracks token spend (how much you're paying). But context utilisation (how full the agent's attention window is) is the security-relevant metric. An agent at 90% context capacity with a small bill is more dangerous than an agent at 30% capacity with a large bill. The former has weakened guardrails; the latter is just expensive. Monitor both, but treat context utilisation as a security signal, not just an operational one (OP-04).