Multi-Agent Security Operations (MASO) Framework¶
Risk-proportionate controls for securing multi-model agent orchestration.
MASO extends the parent framework's principles into multi-agent territory. The same philosophy applies: controls should be proportionate to risk, applied at the right time for the right purposes. AI product owners can quickly identify the controls relevant to their deployment and consciously deselect those that do not apply. Every organisation has its own way of working, and the framework is designed to fit that context rather than override it.
Built or Bought: MASO Applies Either Way¶
MASO is designed for AI agent systems your organisation operates, whether you build them from scratch or deploy them on a managed platform. If you are building custom multi-agent systems (using LangGraph, AutoGen, CrewAI, or your own orchestration), MASO provides both the security requirements and the architectural patterns. If you are using a cloud platform's agent orchestration (AWS Bedrock Agents, Azure AI Agent Service), MASO provides the mental model for what security controls should be in place; the platform provides the implementation mechanisms.
The seven control domains, three implementation tiers, and PACE resilience model describe what needs to be true for multi-agent AI to be safe. The technical implementation varies by platform and approach. The security model does not.
For AI you consume as a service (copilots, productivity tools, SaaS with embedded AI), MASO's control domains are not directly applicable. Those systems are covered by vendor-side controls and your organisation's data governance. See Maturity Levels for how the framework addresses consumed AI differently from AI you operate.
Architecture¶
Evaluation Architecture: Inline vs. Offline¶
The evaluation architecture separates two fundamentally different types of judgment. Inline evaluation (left) runs at agent speed using SLMs with measurable criteria: security, privacy, compliance, and business intent. These domains have clear thresholds. When they conflict, security and privacy override business intent. A Flight Recorder captures every action, verdict, and reasoning chain.
Offline evaluation (right) handles ethics, bias, and fairness. These are policy-driven domains where criteria are set by organisational values, not technical measurement, and where the most important evidence (customer complaints, appeal outcomes, demographic distributions) accumulates over time and is invisible to inline judges. An LLM evaluates sampled decisions retrospectively, statistical monitoring detects portfolio-level patterns, and findings route to human governance for review, investigation, and policy updates that feed back into MASO as guardrail tuning and OISpec revisions.
Three-Layer Defence¶
MASO operates on a three-layer defence model adapted for multi-agent dynamics:
Layer 1 - Guardrails enforce hard boundaries: input validation, output sanitisation, tool permission scoping, and rate limiting. Deterministic, non-negotiable, machine-speed.
Layer 2 - Model-as-Judge Evaluation uses a dedicated evaluation model (distinct from task agents) to assess quality, safety, and policy compliance of agent actions and outputs before they are committed. In multi-agent systems, this layer also evaluates inter-agent communications for goal integrity and instruction injection.
Layer 3 - Human Oversight provides the governance backstop. Scope scales inversely with demonstrated trustworthiness and directly with consequence severity. Write operations, external API calls, and irreversible actions escalate based on risk classification.
The critical addition for multi-agent systems is the Secure Inter-Agent Message Bus - a validated, signed, rate-limited communication channel through which all agent-to-agent interaction must pass. No direct agent-to-agent communication is permitted outside this bus.
The Flight Recorder captures every agent action, judge verdict, tool invocation, inter-agent message, conflict resolution, and PACE state transition in an immutable, tamper-evident log. This serves two purposes: forensic investigation when things go wrong, and feeding the offline evaluation pipeline with the evidence it needs for portfolio-level analysis of ethics, bias, and fairness.
Visual Navigation¶
Seven coloured lines represent seven control domains. Stations are key controls. Zones are implementation tiers. Interchanges mark where domains share control points (Judge Gate, PACE Bridge, Agent Registry). River PACE flows through the centre, mapping resilience phases to tier progression.
Control Domains¶
The framework organises controls into eight domains. The first five map to specific OWASP risks. The sixth - Prompt, Goal & Epistemic Integrity - addresses both the three OWASP risks that require cross-cutting controls and the nine epistemic risks identified in the Emergent Risk Register that have no OWASP equivalent. The seventh - Privileged Agent Governance - addresses the unique risks of orchestrators, planners, and other agents with elevated authority.
0. Prompt, Goal & Epistemic Integrity¶
Every agent's instructions, objectives, and information chain must be trustworthy and verifiable. Input sanitisation on all channels - not just user-facing. System prompt isolation prevents cross-agent extraction. Immutable task specifications with continuous goal integrity monitoring. Epistemic controls prevent groupthink, hallucination amplification, uncertainty stripping, and semantic drift across agent chains.
Covers: LLM01, LLM07, ASI01, plus Epistemic Risks EP-01 through EP-09
1. Identity & Access¶
Every agent must have a unique Non-Human Identity (NHI). No shared credentials. No inherited permissions from the orchestrator. Short-lived, scoped credentials that are rotated automatically. Zero-trust mutual authentication on the inter-agent message bus.
Covers: ASI03, ASI07, LLM06
2. Data Protection¶
Cross-agent data fencing prevents uncontrolled data flow between agents operating at different classification levels. Output DLP scanning at the message bus catches sensitive data in inter-agent communications. RAG integrity validation ensures the knowledge base hasn't been tampered with. Memory poisoning detection flags inconsistencies between stored context and expected agent state.
Covers: LLM02, LLM04, ASI06, LLM08
3. Execution Control¶
Every tool invocation runs in a sandboxed environment with strict parameter allow-lists. Code execution is isolated per agent with filesystem, network, and process scope containment. Blast radius caps limit the damage any single agent can do before circuit breakers engage. PACE escalation is triggered automatically when error rates exceed defined thresholds.
Covers: ASI02, ASI05, ASI08, LLM05
4. Observability¶
Immutable decision chain logs capture the full reasoning and action history of every agent. Behavioral drift detection compares current agent behavior against established baselines. Per-agent anomaly scoring feeds into the PACE escalation logic. SIEM and SOAR integration enables correlation with broader security operations.
Covers: ASI09, ASI10, LLM09, LLM10
5. Supply Chain¶
Model provenance tracking and AIBOM generation for every model in the agent system. MCP server vetting with signed manifests and runtime integrity checks. A2A trust chain validation for inter-agent protocol endpoints. Continuous scanning of the agent toolchain for known vulnerabilities and poisoned components.
Covers: LLM03, ASI04
6. Privileged Agent Governance¶
Orchestrators, planners, and meta-agents hold disproportionate authority - they can create agents, assign tasks, allocate resources, and modify workflows. These privileged agents require elevated controls: mandatory human approval gates, authority delegation limits, audit trails for every privilege exercise, and independent monitoring that the privileged agent cannot influence.
Covers: ASI03, ASI07, LLM06 (elevated controls for high-authority agents)
7. Objective Intent¶
Every agent, judge, and workflow operates against a developer-declared Objective Intent Specification (OISpec), a structured, version-controlled contract defining what the agent should accomplish and within what parameters. Tactical judges evaluate individual agents against their OISpecs. A strategic evaluation agent assesses whether combined agent actions satisfy the workflow's aggregated intent. Judges are themselves monitored against their own OISpecs. This is the bridge from fault detection to behavioral assurance: from catching things that go wrong to verifying that things go right.
Covers: Intent alignment at all levels: individual agent compliance (tactical), aggregate workflow compliance (strategic), and judge behavioral monitoring (lateral). Most critical at HIGH and CRITICAL risk tiers.
OWASP Risk Coverage¶
Full mapping against both OWASP threat taxonomies relevant to multi-agent systems.
OWASP Top 10 for LLM Applications (2025)¶
These risks apply to each individual agent. In a multi-agent context, each risk compounds across agents.
| Risk | Multi-Agent Amplification | MASO Control Domain |
|---|---|---|
| LLM01: Prompt Injection | Injection in one agent's context propagates through inter-agent messages. A poisoned document processed by an analyst agent becomes instructions to an executor agent. | Input guardrails per agent · Message bus validation · Goal integrity monitor |
| LLM02: Sensitive Information Disclosure | Data shared between agents across trust boundaries. Delegation creates implicit data flows. | Cross-agent data fencing · Output DLP at message bus · Per-agent data classification |
| LLM03: Supply Chain Vulnerabilities | Multiple model providers, MCP servers, tool integrations multiply the attack surface. | AIBOM per agent · Signed tool manifests · MCP server vetting · Runtime component audit |
| LLM04: Data and Model Poisoning | Poisoned RAG data consumed by one agent contaminates reasoning of downstream agents. | RAG integrity validation · Source attribution · Cross-agent output verification |
| LLM05: Improper Output Handling | Agent outputs become inputs to other agents. Unsanitised output from Agent A becomes executable input for Agent B. | Output validation at every agent boundary · Model-as-Judge review · Schema enforcement |
| LLM06: Excessive Agency | The defining risk. Delegation creates transitive authority chains. If Agent A delegates to Agent B, and B has tool X, then A effectively has access to tool X. | Least privilege per agent · No transitive permissions · Scoped delegation contracts · PACE containment |
| LLM07: System Prompt Leakage | An agent's system prompt may be extractable by other agents in the same orchestration. | Prompt isolation per agent · Separate system prompt boundaries · Obfuscation |
| LLM08: Vector and Embedding Weaknesses | Shared vector databases across agents create a single point of compromise for RAG poisoning. | Per-agent RAG access controls · Embedding integrity verification · Source validation |
| LLM09: Misinformation | Hallucinations compound. One agent's hallucination becomes another's "fact". In self-reinforcing loops, misinformation amplifies. | Cross-agent validation · Dedicated fact-checking agent · Confidence scoring with source attribution |
| LLM10: Unbounded Consumption | Runaway agent loops cause exponential resource consumption. | Per-agent rate limits · Orchestration cost caps · Loop detection · Circuit breakers |
OWASP Top 10 for Agentic Applications (2026)¶
These risks are specific to autonomous agent behavior - the primary threat surface for MASO.
| Risk | Description | MASO Controls |
|---|---|---|
| ASI01: Agent Goal Hijack | Attacker manipulates an agent's objectives through poisoned inputs. Hijacking one agent redirects an entire workflow. | Goal integrity monitor · Prompt boundary enforcement · Signed task specifications · Model-as-Judge goal validation |
| ASI02: Tool Misuse | Agents use legitimate tools in unintended, unsafe, or destructive ways. Chained tool misuse across agents compounds damage. | Signed tool manifests with strict parameter schemas · Argument validation · Sandboxed execution · Per-tool audit logging |
| ASI03: Identity & Privilege Abuse | Agents with leaked, over-scoped, or shared credentials. Credential sharing between agents is a common design flaw. | Unique NHI per agent · Short-lived scoped credentials · Zero-trust mutual authentication · No credential inheritance |
| ASI04: Agentic Supply Chain | Dynamic composition of MCP servers, A2A protocols, and tool plugins at runtime. | Runtime component signing · MCP server allow-listing · A2A trust chain validation · Dependency scanning |
| ASI05: Unexpected Code Execution | Natural language to code pathways bypass traditional code review gates. | Code execution sandbox · Execution allow-lists · Output containment · Time-boxing |
| ASI06: Memory & Context Poisoning | Persistent memory carries poisoned data across sessions. Shared memory becomes a persistent backdoor. | Session-isolated memory per agent · Memory integrity checksums · Context window fencing · Memory decay policies |
| ASI07: Insecure Inter-Agent Communication | Spoofed, tampered, or replayed messages between agents. | Signed and encrypted message bus · Mutual TLS per agent · Schema validation · Rate limiting · Replay protection |
| ASI08: Cascading Failures | Single fault propagates with escalating impact. Hallucination → flawed plan → destructive action. | Blast radius caps · Circuit breaker patterns · PACE escalation triggers · Independent error detection per agent |
| ASI09: Human-Agent Trust Exploitation | Agents produce confident, authoritative explanations that manipulate operators into approving harmful actions. Multi-agent consensus amplifies this. | Confidence calibration · Independent human verification · Decision audit trails · No agent can claim consensus authority |
| ASI10: Rogue Agents | Behavioral drift, misalignment, concealment, or self-directed action. Rogue behavior in one agent may be concealed by collaborating agents. | Continuous drift detection · Kill switch · Anomaly scoring against baselines · Regular red-team testing |
PACE Resilience for Multi-Agent Operations¶
The PACE methodology (Primary, Alternate, Contingency, Emergency) from the parent framework is extended for multi-agent failure modes.
Primary - Normal Operations. All agents active within designated roles. Full three-layer security stack operational. Inter-agent communication through the signed message bus. Behavioral baselines actively monitored.
Alternate - Agent Failover. Triggered when a single agent shows anomalous behavior. The anomalous agent is isolated. A backup agent (potentially from a different provider) is activated. Tool permissions tightened to read-only. All write operations require human approval. Transition authority: automated (monitoring agent or orchestrator can initiate P→A without human approval, but must notify).
Contingency - Degraded Mode. Triggered when multiple agents are compromised, message bus integrity is questioned, or the alternate agent also exhibits anomalous behavior. Multi-agent orchestration is suspended. Single pre-validated agent operates in fully supervised mode. All agent state captured for forensics. Transition authority: security team or AI security officer.
Emergency - Full Shutdown. Triggered by cascading failures, confirmed exfiltration, coordinated manipulation, or rogue behavior. All agents terminated. Tool access revoked. Memory and context snapshots preserved in immutable storage. Full rollback initiated. Transition authority: CISO or incident commander.
Recovery (E→P): Requires post-incident review confirming root cause identification, control remediation, and updated baselines before returning to Primary.
Implementation Tiers¶
Tier 1 - Supervised (Low Autonomy)¶
All agent actions require human approval. Inter-agent communication is logged but not encrypted. Behavioral monitoring is periodic (batch review). Suitable for pilot deployments and low-consequence use cases.
Minimum controls: Guardrails layer, basic tool scoping, human-in-the-loop for all writes, action audit log.
Tier 2 - Managed (Medium Autonomy)¶
Agents execute read operations and low-consequence writes autonomously. High-consequence actions escalate to human oversight. Inter-agent communication is signed and validated. Behavioral monitoring is continuous with automated anomaly alerting. PACE Alternate and Contingency fully configured.
Required controls: All three security layers, per-agent NHI, signed message bus, Model-as-Judge evaluation, continuous anomaly scoring, PACE A and C configured and tested.
Tier 3 - Autonomous (High Autonomy)¶
Agents operate with minimal human intervention for pre-approved task categories. Human oversight focuses on exception handling and strategic review. Full PACE cycle operational and tested through regular red-team exercises. All seven control domains fully implemented.
Required controls: Everything in Tier 2, plus kill switch tested and auditable, drift detection with baseline comparison, blast radius caps enforced, circuit breakers active, full OWASP coverage validated, regular adversarial testing.
Threat Intelligence¶
| Document | Purpose |
|---|---|
| Incident Tracker | Real-world AI security incidents mapped to framework controls, with confidence ratings and prevention mechanisms |
| Emerging Threats | 8 forward-looking threat patterns: cross-agent worms, agent collusion, transitive authority exploitation, MCP supply chain, epistemic cascading failure, memory poisoning, A2A protocol attacks, AI vs AI defences |
Threat Intelligence Grounding¶
Every control in MASO is grounded in observed or demonstrated attack patterns:
Confirmed Incidents (2025): EchoLeak (indirect prompt injection → data exfiltration, informs ASI01/LLM01), Amazon Q Exploit (tool misuse via manipulated inputs, informs ASI02), GitHub MCP Exploit (poisoned MCP server components, informs ASI04), AutoGPT RCE (natural language → code execution, informs ASI05), Gemini Memory Attack (persistent memory poisoning, informs ASI06), Replit Meltdown (rogue agent behavior, informs ASI10).
Emerging Patterns: Multi-agent consensus manipulation via shared knowledge base poisoning (ASI09), transitive delegation attacks creating implicit privilege escalation, agent-to-agent prompt injection through inter-agent output, credential harvesting via poisoned MCP tool descriptors, behavioral slow drift evading threshold-based detection.
Red Team Operations¶
| Document | Purpose |
|---|---|
| Red Team Playbook | 13 structured test scenarios across three tiers - from inter-agent injection propagation to PACE transition under attack. Includes test results template and reporting metrics |
Integration & Examples¶
| Document | Purpose |
|---|---|
| Integration Guide | MASO control implementation patterns for LangGraph, AutoGen, CrewAI, and AWS Bedrock Agents. Framework comparison matrix, per-control mapping, and architecture-specific guidance |
| Worked Examples | End-to-end MASO implementation for investment research (financial services), clinical decision support (healthcare), and grid operations (critical infrastructure). Includes PACE failure scenarios |
Stress Testing at Scale¶
| Document | Purpose |
|---|---|
| Stress Testing MASO at Scale | Tabletop methodology for identifying framework breakpoints as agent count grows from single digits to 100+. Eight stress dimensions covering epistemic cascade depth, delegation graph complexity, cross-cluster PACE cascades, observability volume, provider concentration, data boundary enforcement, kill switch practicality, and compound attack surfaces |
Regulatory Alignment¶
MASO inherits the parent framework's regulatory mappings and extends them to multi-agent-specific requirements:
| Regulation/Standard | Relevant Articles/Clauses | MASO Relevance |
|---|---|---|
| EU AI Act | Art. 9, 14, 15 | Human oversight proportional to autonomy level. PACE provides the operational model. |
| NIST AI RMF | Govern, Map, Measure, Manage | Control domains map directly: Observability → Measure, Execution Control → Manage. |
| ISO 42001 | §8.1-8.6, Annex A/B | Per-agent risk assessment and control assignment. |
| MITRE ATLAS | Agent-focused techniques (Oct 2025) | Threat intelligence aligned with ATLAS agent-specific attack techniques. |
| DORA | Art. 11 | Digital operational resilience for AI agents in financial services. PACE provides the resilience model. |
| APRA CPS 234 | Information Security | Australian prudential requirements for AI agent deployments in financial services. |
Operational Concerns¶
These questions come up in every MASO deployment. The answers sit across the framework - collected here so you don't have to hunt.
| Question | Where It's Answered |
|---|---|
| What does the Judge layer cost? When should it run async? | Cost & Latency - sampling rates, latency budgets, tiered evaluation cascade |
| What if the Judge is wrong or manipulated? | Judge Assurance · When the Judge Can Be Fooled · Privileged Agent Governance |
| How do we prevent operator fatigue at scale? | Human Factors - skill development, alert fatigue, canary testing, challenge rates |
| How do we vet models, tools, and MCP servers? | Supply Chain Controls - AIBOM, signed manifests, model provenance, dependency scanning |
| What emergent risks have no OWASP equivalent? | Emergent Risk Register - 34 risks across 9 categories including epistemic, coordination, and inference-side attacks |
| How do we evaluate whether agents are doing what they were designed to do? | Objective Intent - developer-declared OISpecs for every agent, judge, and workflow. Tactical judges evaluate individual compliance, strategic evaluators assess aggregate behavior, judge meta-evaluators close the watchmen loop |
| Won't multiple judges create "judge hell"? How many evaluation agents do I actually need? | Privileged Agent Governance - evaluation roles vs. services, judge necessity decision framework, deployment topology. Judge Assurance - the judge necessity test and ROI assessment. Execution Control - how the conceptual architecture maps to actual services |
| What happens when judges disagree? (e.g. fraud says flag, security says approve) | Privileged Agent Governance - precedence orders, most-restrictive-wins default, conflict logging, pattern tracking |
| What about ethics, bias, and fairness evaluation? | Privileged Agent Governance - offline policy-driven evaluation outside the inline architecture. Statistical portfolio monitoring, external signals (complaints, appeals, demographics), findings feed back as guardrail tuning. See Evaluation Architecture diagram |
| What does the full evaluation stack cost, not just one judge? | Cost & Latency - compound cost model for cloud judge vs. SLM scenarios, with fraud detection worked example |
| How much latency does multi-layer evaluation add to time-sensitive workflows? | Cost & Latency - sync vs. async breakdown, critical-path analysis for fraud detection and trading compliance |
| How do I get a single audit view of a multi-agent decision? | Observability - Decision Trace format collapsing the full evaluation chain into one auditable document per decision |
| What about DLP, API validation, database controls, and existing IAM? | Defence in Depth Beyond the AI Layer - MASO controls are a layer within your wider security architecture, not a replacement for it. External DLP (inbound and outbound), API gateways, database access controls, SIEM, and secure coding practices all apply |
Relationship to Parent Framework¶
MASO is the multi-agent extension of AI Runtime Security. It inherits the three-layer defence model, PACE resilience methodology, risk classification matrix, and regulatory mapping framework. It also inherits the core philosophy: controls are proportionate to risk, organisations select what they need based on their own context, and the goal is reducing harm rather than imposing process.
It extends into multi-agent territory by addressing multi-model orchestration security, inter-agent communication integrity, the OWASP Agentic Top 10 (2026), compound risk dynamics, Non-Human Identity management, and kill switch architecture.
File Structure¶
README.md # This document
controls/
├── prompt-goal-and-epistemic-integrity.md
├── identity-and-access.md
├── data-protection.md
├── execution-control.md
├── observability.md
├── supply-chain.md
├── objective-intent.md
└── risk-register.md
threat-intelligence/
├── incident-tracker.md
└── emerging-threats.md
red-team/
└── red-team-playbook.md
integration/
└── integration-guide.md
examples/
└── worked-examples.md
implementation/
├── tier-1-supervised.md
├── tier-2-managed.md
└── tier-3-autonomous.md
stress-test/
└── 100-agent-stress-test-overview.md
MASO 2.0: Anticipated Changes¶
Anticipated Changes to AI and Framework: MASO 2.0
Six AI capability trajectories that will stress or break the current framework, with architectural responses and a phased roadmap:
| Evolution Vector | Framework Impact | MASO 2.0 Response |
|---|---|---|
| Judge ceiling | Primary models exceed Judge evaluation capability | Verifiable action constraints, evidence-based reasoning, domain-specific verification oracles, ensemble Judge |
| Human oversight scaling | Transaction review becomes untenable at agent scale | Humans shift to governance over review, outcome-based oversight, automated escalation triage |
| Session boundary dissolution | Persistent/ambient agents invalidate session-based analysis | Continuous behavioral streams, intent inheritance, memory integrity as core control |
| Multi-agent emergent behaviors | Fleet interactions produce unanticipated states | Interaction graph analysis, fleet-level baselines, composition constraints, emergent behavior simulation |
| AI-vs-AI adversarial dynamics | Offensive AI outpaces human-speed defense updates | Continuous adversarial simulation, adaptive guardrails, Judge unpredictability |
| Regulatory divergence | Jurisdictions impose conflicting requirements | Jurisdiction-aware control profiles, compliance evidence automation |
Three-phase roadmap: Extend (0–6 months) → Architect (6–18 months) → Paradigm shift (18–36 months).
Learning
Learn the MASO Framework
AIruntimesecurity.co.za provides structured learning paths for the Multi-Agent Security Operations framework, from core concepts through to implementation.
What's Next¶
The framework core, implementation tiers, control domain specifications, threat intelligence, red team playbook, integration guide, and worked examples are complete. Planned extensions:
- Terraform/CloudFormation modules for automated MASO infrastructure deployment on AWS and Azure.
- Compliance evidence packs - pre-built documentation sets for ISO 42001, NIST AI RMF, and EU AI Act audits.
- Agent orchestration security benchmark - quantitative scoring methodology for multi-agent system security posture.