Foundation · Single agent
You can't test your way to a safe AI.
Same prompt, same model, and you can still get a different answer. Tests prove the system can behave. They can't prove it will on the next request. So you verify it live.
The pattern
Four layers, one principle: verify in production.
You cannot fully test a non-deterministic system before deployment, so the controls form a closed loop that watches behaviour as it happens.
01
Guardrails
Deterministic limits at machine speed (~10ms): content filters, PII detection, topic and rate limits.
02
Reviewing controls
Scanners, a semantic firewall, and a model-as-judge evaluate the output against policy, grounding, and intent.
03
Human oversight
The accountability backstop. Spot checks for low risk, approval before commit for high risk.
04
Circuit breaker
Stops AI traffic and switches to a non-AI fallback when any layer fails. A full stop, not a degrade.
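As a rough illustration, the four layers can be wired into one request path. This is a minimal sketch and not a reference implementation: `handle`, `guardrail_ok`, `judge_ok`, `CircuitBreaker`, the email regex, and the banned-topic list are all hypothetical names chosen for the example, and it assumes that any exception raised inside the pipeline counts as a layer failure.

```python
import re
from dataclasses import dataclass

# Toy PII pattern for the example; a real filter would cover far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class CircuitBreaker:
    """Layer 04: once tripped, all traffic goes to the non-AI fallback."""
    is_open: bool = False

    def trip(self) -> None:
        self.is_open = True


def guardrail_ok(text: str) -> bool:
    """Layer 01: a deterministic check (here, a toy email detector)."""
    return EMAIL.search(text) is None


def judge_ok(output: str, banned_topics: set) -> bool:
    """Layer 02: stand-in for scanners or a model-as-judge call."""
    lowered = output.lower()
    return not any(topic in lowered for topic in banned_topics)


def handle(prompt, model, breaker, high_risk=False, approve=lambda out: False):
    """Run one request through all four layers (hypothetical wiring)."""
    fallback = "AI is unavailable; routed to the standard process."
    if breaker.is_open:
        return fallback  # full stop, not a degrade
    try:
        if not guardrail_ok(prompt):
            return "Request blocked by guardrail."
        output = model(prompt)
        if not guardrail_ok(output) or not judge_ok(output, {"medical advice"}):
            return "Response withheld by reviewing controls."
        # Layer 03: high-risk output needs approval before it is committed.
        if high_risk and not approve(output):
            return "Held for human approval."
        return output
    except Exception:
        # Assumption: any error in a layer is treated as a layer failure.
        breaker.trip()
        return fallback
```

Calling `handle("hi", my_model, CircuitBreaker())` returns the model output only if every layer passes. Once the breaker trips, every later call returns the fallback until something outside this sketch resets it.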
Risk-scaled controls
Low-risk AI moves fast. High-risk AI stays safe.
Controls scale to consequence, so you spend scrutiny where it matters and nowhere else.
Low
Fast Lane: minimal guardrails and self-certification. PACE runs fail-open with logging.
Deploy low-risk AI fast →
Medium & high
Guardrails plus reviewing controls, with human review scaling from periodic to in-the-loop for writes.
Implement the layers →
Critical
Full architecture, mandatory approval, and the full PACE cycle with tested recovery.
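One way to make the tiers concrete is a lookup table from risk tier to required controls. The sketch below is illustrative only: `Tier`, `Controls`, `CONTROLS`, and `needs_human_approval` are hypothetical names. The split of "periodic" review to medium and "in-the-loop for writes" to high is one reading of the text above, and the fail-closed setting for every tier above low is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Controls:
    guardrails: str           # "minimal" or "full"
    reviewing_controls: bool  # scanners, semantic firewall, model-as-judge
    human_review: str         # how people are involved
    fail_open: bool           # True: keep serving and log when a control fails


CONTROLS = {
    Tier.LOW: Controls("minimal", False, "self-certification", True),
    # Assumption: tiers above low fail closed.
    Tier.MEDIUM: Controls("full", True, "periodic", False),
    Tier.HIGH: Controls("full", True, "in-the-loop for writes", False),
    Tier.CRITICAL: Controls("full", True, "mandatory approval", False),
}


def needs_human_approval(tier: Tier, is_write: bool) -> bool:
    """Return True when a person must approve before the action commits."""
    if tier is Tier.CRITICAL:
        return True
    if tier is Tier.HIGH:
        return is_write
    return False
```

Keeping the mapping in data rather than scattered conditionals makes it easy to review which scrutiny applies at each tier.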
Understand the failure modes →
Defence in depth beyond the AI layer
The four-layer model addresses controls specific to non-deterministic AI behaviour. It does not replace the security you already have. It sits inside it.
Your DLP still applies to data flowing in and out. API gateways still validate requests whether the caller is human or AI. Database access controls and parameterised queries still prevent injection even if an agent builds a malicious query. IAM still governs who can invoke AI at all, and your SIEM still correlates AI events with everything else. When one of these catches something the AI-specific layers missed, it is doing its job as your safety net.
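The parameterised-query point can be shown with Python's standard `sqlite3` module. This is a small sketch: the `customers` table, the `find_customer` helper, and the sample input are invented for the example. The agent-supplied string is bound as a value, so it is never parsed as SQL.

```python
import sqlite3

# In-memory database with a hypothetical table for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])


def find_customer(conn, name):
    """Look up a customer by exact name using a bound parameter."""
    query = "SELECT id, name FROM customers WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# A string an agent might produce if it has been manipulated.
agent_input = "Ada' OR '1'='1"

print(find_customer(conn, agent_input))  # no rows: treated as a literal name
print(find_customer(conn, "Ada"))        # the single matching row
```

Had the query been built by string concatenation, the same input would have matched every row. With the placeholder, the database layer makes the agent's mistake harmless.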
For multi-agent systems, MASO Environment Containment formalises this: harden every system the agent touches so misbehaviour is structurally harmless, whatever the agent intends.
Where to next
Pick the path that matches your job.
Ship a first feature
AIRSLite: seven controls to get an LLM feature out safely.
Go to AIRSLite →
Build it properly
The Core Controls library: every single-agent control, checklist, and risk tier.
Open Core Controls →
Scale to many agents
MASO: securing systems where agents collaborate and trust gets complicated.
Go to MASO →