AI Runtime Security - Single-Agent Controls¶
Most AI governance guidance assumes you can test your way to safety. You cannot.
AI systems are non-deterministic. Same prompt, same model, same parameters - and the response can still differ. Your test suite proves the system can behave correctly. It cannot prove it will on the next request.
- AI systems are non-deterministic - same input, different output, by design
- Guardrails fail silently - the most dangerous outputs look perfectly normal
- Multi-agent systems amplify errors - hallucinations compound, permissions propagate, failures correlate
Guardrails → Judge → Human Oversight → Circuit Breaker
Block known-bad → Detect unknown-bad → Decide edge cases → Fail safely
Start here:
- Read the 5-minute executive summary - the entire framework on one page
- Explore the technical architecture - three layers, what fails when
- Get working controls in 30 minutes - from zero to deployed
Architecture¶
Three layers, one principle: you can't fully test a non-deterministic system before deployment, so you continuously verify behavior in production. This is a closed-loop control system - not evaluate-once-and-deploy, but constrain-and-continuously-verify.
Layer 1 - Guardrails (containment boundaries) block known-bad inputs and outputs at machine speed (~10ms). Deterministic pattern matching: content filters, PII detection, topic restrictions, rate limits. Permissions derive from business intent - what the use case requires - not from evaluation of the model's capabilities. These are action-space constraints that leave the model's reasoning unconstrained. Every request passes through. No exceptions.
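A minimal sketch of what a deterministic guardrail layer looks like in code. The patterns, blocklist, and thresholds below are illustrative placeholders - production deployments use dedicated PII and content classifiers, not two regexes - but the shape is the point: fast, deterministic checks that every request passes through.

```python
import re
import time
from collections import defaultdict, deque

# Illustrative patterns only - real deployments use dedicated PII detectors.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US-SSN-like
    re.compile(r"\b\d{13,16}\b"),          # card-number-like
]
BLOCKED_TOPICS = {"wire transfer instructions", "account credentials"}

class RateLimiter:
    """Sliding-window request limit per caller."""
    def __init__(self, max_requests: int, window_s: float):
        self.max_requests = max_requests
        self.window_s = window_s
        self.calls = defaultdict(deque)

    def allow(self, caller: str) -> bool:
        now = time.monotonic()
        q = self.calls[caller]
        while q and now - q[0] > self.window_s:
            q.popleft()  # drop calls outside the window
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True

def guardrail_check(caller: str, text: str, limiter: RateLimiter) -> tuple[bool, str]:
    """Deterministic pattern matching on every request: (allowed, reason)."""
    if not limiter.allow(caller):
        return False, "rate_limit"
    if any(p.search(text) for p in PII_PATTERNS):
        return False, "pii_detected"
    lowered = text.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return False, "blocked_topic"
    return True, "ok"

limiter = RateLimiter(max_requests=5, window_s=60)
print(guardrail_check("user-1", "My SSN is 123-45-6789", limiter))
# → (False, 'pii_detected')
```

Note that nothing here inspects the model's reasoning - the checks constrain the action space (what enters and leaves) exactly as the paragraph above describes.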
Layer 2 - Model-as-Judge catches unknown-bad through independent model evaluation. The Judge can be a large LLM running asynchronously (500ms to 5s) or a distilled SLM running inline as a sidecar (10ms to 50ms). Either way, enterprise-owned and configured, evaluating task agent outputs against policy, factual grounding, tone, and safety criteria. Catches what guardrails cannot pattern-match, including within-bounds adversarial behavior that containment alone cannot address.
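The Judge interface can be sketched as follows. The prompt wording, the `JudgeResult` type, and the stub model are all hypothetical - the real judge is an enterprise-configured LLM or distilled SLM - but the structure shows the key property: the judge is an independent model call that evaluates the task agent's output against policy criteria and returns a verdict.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Illustrative evaluation prompt - real criteria come from enterprise policy.
JUDGE_PROMPT = """You are a policy evaluator. Score the candidate response.
Criteria: policy compliance, factual grounding, tone, safety.
Return JSON: {{"verdict": "pass" or "flag", "reasons": [...]}}

User request: {request}
Candidate response: {response}"""

@dataclass
class JudgeResult:
    verdict: str
    reasons: list

def evaluate(request: str, response: str,
             judge_llm: Callable[[str], str]) -> JudgeResult:
    """Send the task agent's output to an independent judge model.
    The judge detects and flags - it does not block inline."""
    raw = judge_llm(JUDGE_PROMPT.format(request=request, response=response))
    parsed = json.loads(raw)
    return JudgeResult(parsed["verdict"], parsed.get("reasons", []))

# Stub standing in for a real model call (e.g. a distilled SLM sidecar).
def stub_judge(prompt: str) -> str:
    flagged = "guaranteed returns" in prompt.lower()
    return json.dumps({
        "verdict": "flag" if flagged else "pass",
        "reasons": ["unsubstantiated financial claim"] if flagged else [],
    })

result = evaluate("Investment advice?", "This fund has guaranteed returns.", stub_judge)
print(result.verdict)  # → flag
```

Swapping `stub_judge` for an async call to a large LLM, or an inline call to a sidecar SLM, changes latency but not the contract.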
Layer 3 - Human Oversight provides the accountability backstop. Scope scales with risk: low-risk systems get spot checks, high-risk systems get human approval before commit. Humans decide edge cases. Humans own outcomes. Handles what neither containment nor the Judge can resolve autonomously.
Circuit Breaker stops all AI traffic and activates a non-AI fallback when any layer fails. Not a degradation - a full stop with a predetermined safe state.
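A minimal circuit-breaker sketch, assuming a simple failure-count trip condition (real deployments would also key on Judge alerts, anomaly signals, and manual triggers). Once open, every request routes to the predetermined non-AI fallback - a full stop, not a degradation.

```python
class CircuitBreaker:
    """Stops AI traffic and routes to a non-AI fallback on layer failure.
    Illustrative sketch - trip conditions here are failure counts only."""
    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.open = False  # open circuit = all AI traffic stopped

    def record_failure(self, layer: str) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.open = True  # trip: full stop until incident review

    def handle(self, request, ai_path, fallback_path):
        if self.open:
            return fallback_path(request)  # predetermined safe state
        try:
            return ai_path(request)
        except Exception:
            self.record_failure("ai_path")
            return fallback_path(request)

breaker = CircuitBreaker(failure_threshold=1)

def failing_ai(req):
    raise RuntimeError("judge unavailable")

def fallback(req):
    return "Routed to human support queue."

print(breaker.handle("help", failing_ai, fallback))  # → Routed to human support queue.
print(breaker.open)  # → True: breaker stays open until deliberately reset
```

The deliberate design choice is that the breaker does not auto-close: recovery to Primary is a human decision, consistent with the E→P recovery testing required at the Critical tier.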
Each layer is specifically designed to catch what the previous layer misses. This is compound defence by design - not defence-in-depth by coincidence. For the full argument, see Why Containment Beats Evaluation.
This pattern already exists in production at major platforms: NVIDIA NeMo, AWS Bedrock, Azure AI, LangChain, Guardrails AI, and others. This reference provides the vendor-neutral implementation: risk classification, controls, fail postures, and tested fallback paths.
Get Started¶
| If you want to... | Go here |
|---|---|
| Get the whole framework on one page | Cheat Sheet / Decision Poster |
| Deploy low-risk AI fast | Fast Lane |
| Understand the concepts in 30 minutes | Quick Start |
| Implement controls with working code | Implementation Guide |
| Classify a system by risk | Risk Tiers |
| Deploy an agentic AI system | Agentic Controls |
| Understand what happens when controls fail | PACE Resilience |
| Enforce controls at the infrastructure layer | Infrastructure Controls |
| Track your implementation | Checklist |
| Secure a multi-agent system | MASO Framework |
Before You Build Controls¶
The First Control: Choosing the Right Tool
The most effective way to reduce AI risk is to not use AI where it doesn't belong. Before guardrails, judges, or human oversight - ask whether AI is the right tool for this problem.
If your deployment is internal, read-only, handles no regulated data, and has a human reviewing output - start with the Fast Lane. You may not need the rest.
Risk-Scaled Controls¶
Controls scale to risk so that low-risk AI moves fast and high-risk AI stays safe. The framework provides a menu of control patterns: select the ones you need and consciously deselect the ones you do not. The controls are designed to fit your organisation's context rather than impose a single way of working.
| Risk Tier | Controls Required | PACE Posture | Use Case Examples |
|---|---|---|---|
| Low | Fast Lane: minimal guardrails, self-certification | P only (fail-open with logging) | Internal chatbots, document summarisation, code assistance |
| Medium | Guardrails + Judge, periodic human review | P + A configured | Customer-facing content, recommendation engines, search |
| High | All three layers, human-in-the-loop for writes | P + A + C configured and tested | Financial advice, medical support, regulatory decisions |
| Critical | Full architecture, mandatory human approval | Full PACE cycle with tested E→P recovery | Autonomous actions on regulated data, safety-critical systems |
Classify your system: Risk Tiers
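Since classification is by use case rather than model capability, it can be expressed as a function of deployment attributes. The attribute names and the precedence rules below are a simplified illustration of the table above, not the framework's normative classification logic - see the Risk Tiers page for the full criteria.

```python
def classify_risk_tier(*, external_facing: bool, writes_data: bool,
                       regulated_domain: bool, autonomous_actions: bool) -> str:
    """Map deployment context (not model capability) to a risk tier.
    Illustrative precedence only - consult the Risk Tiers page."""
    if autonomous_actions and regulated_domain:
        return "Critical"   # autonomous actions on regulated data
    if regulated_domain or (writes_data and external_facing):
        return "High"       # regulatory exposure or external writes
    if external_facing:
        return "Medium"     # customer-facing, read-mostly
    return "Low"            # internal, read-only

# Internal document summariser with a human reviewing output:
print(classify_risk_tier(external_facing=False, writes_data=False,
                         regulated_domain=False, autonomous_actions=False))
# → Low
```

Note how the inputs are all properties of the deployment, none of the model - the same model can land in any tier depending on where and how it is used.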
PACE Resilience¶
Every control has a defined failure mode. The PACE methodology ensures that when a control layer degrades - and it will - the system fails safely rather than silently.
Primary: All layers operational. Normal production.
Alternate: One layer degraded. Backup activated. Scope tightened. Example: Judge layer is down → guardrails remain active, all outputs queued for human review.
Contingency: Multiple layers degraded. AI operates in supervised-only mode. Human approves every action. Reduced capacity, high assurance.
Emergency: Confirmed compromise or cascading failure. Circuit breaker fires. AI traffic stopped. Non-AI fallback activated. Incident response engaged.
Even at the lowest risk tier, there's a fallback plan. At the highest, there's a structured degradation path from full autonomy to full stop.
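The four PACE states can be sketched as a simple posture assessment over layer health. The mapping below (zero layers down → Primary, one → Alternate, more → Contingency, confirmed compromise → Emergency) is an illustrative reduction of the descriptions above, assuming health signals per layer are already available.

```python
from enum import Enum

class PaceState(Enum):
    PRIMARY = "P"      # all layers operational; normal production
    ALTERNATE = "A"    # one layer degraded; backup active, scope tightened
    CONTINGENCY = "C"  # multiple layers degraded; supervised-only mode
    EMERGENCY = "E"    # compromise/cascade; circuit breaker fired

def assess_pace(layers_up: dict[str, bool], compromised: bool) -> PaceState:
    """Derive the PACE posture from layer health signals.
    Illustrative mapping - real triggers are defined per system."""
    if compromised:
        return PaceState.EMERGENCY
    down = sum(1 for up in layers_up.values() if not up)
    if down == 0:
        return PaceState.PRIMARY
    if down == 1:
        return PaceState.ALTERNATE
    return PaceState.CONTINGENCY

# Judge down, guardrails and human oversight up → Alternate:
state = assess_pace({"guardrails": True, "judge": False, "oversight": True},
                    compromised=False)
print(state)  # → PaceState.ALTERNATE
```

In the Alternate example, the posture change would also carry the tightened scope described above - guardrails stay active and all outputs queue for human review until the Judge recovers.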
Core Documents¶
| Document | Purpose |
|---|---|
| Cheat Sheet | Entire framework on one page - classify, control, fail posture, test |
| Decision Poster | Visual one-page reference |
| Fast Lane | Pre-approved minimal controls for low-risk AI |
| Risk Tiers | Classify your system, determine control and resilience requirements |
| Risk Assessment | Quantitative control effectiveness, residual risk per tier, NIST AI RMF aligned |
| Controls | Guardrails, Judge, and Human Oversight implementation with per-layer fail postures |
| Agentic | Controls for single autonomous AI agents including graceful degradation path |
| PACE Resilience | What happens when controls fail |
| Checklist | Track implementation and PACE verification progress |
| Emerging Controls | Multimodal, reasoning, and streaming considerations (theoretical) |
Infrastructure Controls¶
This section defines what to enforce. The infrastructure section defines how - 80 technical controls across 11 domains, with standards mappings and platform-specific patterns.
Domains: Identity & Access Management (8), Logging & Observability (10), Network & Segmentation (8), Data Protection (8), Secrets & Credentials (8), Supply Chain (8), Incident Response (8), Tool Access (6), Session & Scope (5), Delegation Chains (5), Sandbox Patterns (6).
Standards mappings: Every control maps to the three-layer model, ISO 42001 Annex A, NIST AI RMF, and OWASP LLM/Agentic Top 10.
Platform patterns: AWS Bedrock, Azure AI, and Databricks reference architectures.
Defence in Depth Beyond the AI Layer¶
The three-layer model above - guardrails, judge, human oversight - addresses controls specific to non-deterministic AI behavior. It does not replace the security controls your organisation already has. It sits inside them.
Your existing DLP systems apply to data flowing into and out of AI systems - both preventing sensitive data from reaching models and catching leakage that AI-specific controls miss. API gateways validate requests and enforce schemas regardless of whether the caller is human or AI. Database access controls and parameterised queries prevent injection even if an agent constructs a malicious query. IAM governs who can invoke AI systems in the first place. SIEM correlates AI events with network, endpoint, and application events. Secure coding practices in the systems agents interact with still matter - arguably more, because the caller is now non-deterministic.
These controls are outside the scope of this reference, but they are part of your defence. When you assess your AI security posture, include them. When you threat-model, include them. When an AI-specific control misses something, these existing controls are your safety net.
When You Need Multi-Agent¶
When AI agents collaborate, delegate tasks, and take autonomous actions across trust boundaries, the single-agent controls on this page are necessary but not sufficient. The MASO Framework extends this architecture into multi-agent orchestration.
| What MASO adds | Why single-agent controls aren't enough |
|---|---|
| Inter-agent message bus security | Agents communicating directly create uncontrolled trust boundaries |
| Non-Human Identity per agent | Shared credentials between agents create lateral movement risk |
| Epistemic integrity controls | Hallucinations compound across agent chains; confidence inflates without evidence |
| Transitive authority prevention | Delegation creates implicit privilege escalation |
| Kill switch architecture | Multi-agent cascading failures require system-wide emergency stop |
| Dual OWASP coverage | Agentic Top 10 (2026) risks only exist when agents act autonomously |
| Document | Purpose |
|---|---|
| MASO Overview | Architecture, PACE integration, OWASP dual mapping, 7 control domains |
| Tier 1 - Supervised | Low autonomy: human approves all writes |
| Tier 2 - Managed | Medium autonomy: NHI, signed bus, Model-as-Judge, continuous monitoring |
| Tier 3 - Autonomous | High autonomy: self-healing PACE, adversarial testing, isolated kill switch |
| Red Team Playbook | 13 adversarial test scenarios for multi-agent systems |
| Integration Guide | LangGraph, AutoGen, CrewAI, AWS Bedrock implementation patterns |
| Worked Examples | Financial services, healthcare, critical infrastructure |
Extensions¶
| Folder | Contents |
|---|---|
| Regulatory | ISO 42001 and EU AI Act mapping |
| Technical | Bypass prevention, metrics |
| Industry Solutions | Guardrails, evaluators, and safety model reference |
| Templates | Risk assessment templates, implementation plans |
| Worked Examples | Per-tier implementation walkthroughs |
Insights¶
Foundational Arguments
| Article | Key Argument |
|---|---|
| The First Control: Choosing the Right Tool | Design thinking before technology selection |
| Why Your AI Guardrails Aren't Enough | Guardrails block known-bad; you need detection for unknown-bad |
| The Judge Detects. It Doesn't Decide. | Async evaluation beats real-time blocking for nuanced decisions |
| Infrastructure Beats Instructions | You can't secure AI systems with prompts alone |
| Risk Tier Is Use Case, Not Technology | Classification reflects deployment context, not model capability |
| Humans Remain Accountable | AI assists decisions; humans own outcomes |
Emerging Challenges
| Article | Key Argument |
|---|---|
| The Verification Gap | Current safety approaches can't confirm ground truth |
| Behavioral Anomaly Detection | Aggregating signals to detect drift from expected behavior |
| Multimodal AI Breaks Your Text-Based Guardrails | Images, audio, and video create new attack surfaces |
| When AI Thinks Before It Answers | Reasoning models need reasoning-aware controls |
| When Agents Talk to Agents | Multi-agent accountability gaps → see MASO |
| The Memory Problem | Long context and persistent memory introduce novel risks |
| You Can't Validate What Hasn't Finished | Real-time streaming challenges existing validation |
| Open-Weight Models Shift the Burden | Self-hosted models inherit the provider's control responsibilities |
| When the Judge Can Be Fooled | The Judge layer needs its own threat model |
Platforms Implementing This Pattern¶
This isn't a theoretical proposal. These platforms already implement variants of this pattern:
| Platform | Approach |
|---|---|
| NVIDIA NeMo Guardrails | Five rail types: input, dialog, retrieval, execution, output |
| LangChain | Middleware chains with human-in-the-loop |
| Guardrails AI | Open-source validator framework |
| Galileo | Eval-to-guardrail lifecycle |
| DeepEval | Model-as-Judge evaluation framework |
| AWS Bedrock Guardrails | Managed input/output filtering |
| Azure AI Content Safety | Content filtering and moderation |
Standards Alignment¶
| Standard | Relevance | Mapping |
|---|---|---|
| OWASP LLM Top 10 | Security vulnerabilities in LLM applications | OWASP mapping |
| OWASP Agentic Top 10 | Risks specific to autonomous AI agents | MASO mapping |
| NIST AI RMF | AI risk management framework | NIST mapping |
| ISO 42001 | AI management system standard | ISO 42001 mapping |
| NIST SP 800-218A | Secure development for generative AI | SP 800-218A mapping |
| MITRE ATLAS | Adversarial threat landscape for AI | MASO threat intelligence |
| DORA | Digital operational resilience | MASO regulatory alignment |
Scope¶
In scope: Custom LLM applications, AI decision support, document processing, single-agent systems - from deployment through incident response.
Out of scope: Vendor AI products (use vendor controls), model training (see MLOps security guidance), and pre-deployment testing. This framework is about reducing harm during live operation.
Pre-deployment complement: For secure development practices covering data sourcing, training, fine-tuning, and model release, see NIST SP 800-218A. This framework begins where SP 800-218A ends.
For multi-agent systems: See MASO.
Contributing¶
Feedback, corrections, and extensions welcome. See CONTRIBUTING.md.