Foundation · Single agent

You can't test your way to a safe AI.

Same prompt, same model, and the answer can still change from one request to the next. Tests prove the system can behave. They can't prove it will on the next request. So you verify it live.

Figure: single-agent security architecture. Guardrails, reviewing controls, and human oversight wrap around one AI model, with a circuit breaker behind them.
One model, four layers. Each layer catches what the one before it misses: compound defence by design, not by coincidence.

The pattern

Four layers, one principle: verify in production.

You cannot fully test a non-deterministic system before deployment, so the controls form a closed loop that watches behaviour as it happens.

01

Guardrails

Deterministic limits at machine speed (~10 ms): content filters, PII detection, topic limits, and rate limits.

02

Reviewing controls

Scanners, a semantic firewall, and a model-as-judge evaluate the output against policy, grounding, and intent.

03

Human oversight

The accountability backstop. Spot checks for low risk, approval before commit for high risk.

04

Circuit breaker

Stops AI traffic and switches to a non-AI fallback when any layer fails. A full stop, not a gradual degradation.
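
The four layers can be sketched as one request path. This is a minimal illustration, not a reference implementation: every name in it (`handle`, `guardrail`, `review`, `needs_human`, `CircuitBreaker`, `non_ai_fallback`) is hypothetical, the email regex is a crude stand-in for real PII detection, and "a layer fails" is assumed here to mean that a control raised an error rather than that it rejected an output.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumption: a crude email pattern standing in for real PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class CircuitBreaker:
    """Layer 4: once tripped, no request reaches the model."""
    is_open: bool = False

    def trip(self) -> None:
        self.is_open = True


def non_ai_fallback(prompt: str) -> str:
    """Hypothetical non-AI path, e.g. a static response or a human queue."""
    return "AI is unavailable. Your request was routed to the standard process."


def guardrail(text: str) -> bool:
    """Layer 1: deterministic check. Passes only if no email address is present."""
    return EMAIL.search(text) is None


def review(output: str, judge: Callable[[str], bool]) -> bool:
    """Layer 2: the judge callable stands in for scanners or a model-as-judge."""
    return judge(output)


def needs_human(risk: str) -> bool:
    """Layer 3: high-risk actions wait for approval before commit."""
    return risk == "high"


def handle(
    prompt: str,
    model: Callable[[str], str],
    judge: Callable[[str], bool],
    breaker: CircuitBreaker,
    risk: str = "low",
) -> str:
    if breaker.is_open:
        return non_ai_fallback(prompt)
    try:
        if not guardrail(prompt):
            return "blocked: input guardrail"
        output = model(prompt)
        if not guardrail(output):
            return "blocked: output guardrail"
        if not review(output, judge):
            return "blocked: review"
        if needs_human(risk):
            return "pending: human approval"
        return output
    except Exception:
        # Any error inside a layer is treated as a layer failure:
        # trip the breaker and serve the non-AI path from now on.
        breaker.trip()
        return non_ai_fallback(prompt)
```

A rejection by a layer blocks only that one request. An error inside a layer trips the breaker, and every later call to `handle` skips the model until something resets `is_open`.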

Defence in depth beyond the AI layer

The four-layer model addresses controls specific to non-deterministic AI behaviour. It does not replace the security you already have. It sits inside it.

Your DLP still applies to data flowing in and out. API gateways still validate requests whether the caller is human or AI. Database access controls and parameterised queries still block injection even when an agent supplies malicious query input. IAM still governs who can invoke AI at all, and your SIEM still correlates AI events with everything else. When one of these catches something the AI-specific layers missed, it is doing its job as your safety net.
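
As one concrete case, a parameterised query treats agent-supplied text as data rather than SQL. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The `customers` table and the `find_customer` helper are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("alice",), ("bob",)])


def find_customer(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder binds `name` as a value; it is never parsed as SQL.
    cursor = conn.execute("SELECT id, name FROM customers WHERE name = ?", (name,))
    return cursor.fetchall()


# A value an agent might produce after a prompt injection.
agent_supplied = "alice' OR '1'='1"

print(find_customer(conn, "alice"))         # matches the one real row
print(find_customer(conn, agent_supplied))  # matches nothing: the string is compared literally
```

This only protects values the agent passes into a fixed statement. If the agent is allowed to write whole SQL statements, the database access controls mentioned above have to carry the load.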

For multi-agent systems, MASO Environment Containment formalises this: harden every system the agent touches so misbehaviour is structurally harmless, whatever the agent intends.