Single-Agent Overview
Most AI governance guidance assumes you can test your way to safety. You cannot.
AI systems are non-deterministic. The same prompt, the same model, and the same parameters can produce a different response on any call. Your test suite proves the system can behave correctly. It cannot prove it will on the next request.
This page is a one-screen overview of the single-agent pattern. When you want the control reference, the checklist, the risk tiers, or the specialised controls, go to Core Controls.
Architecture
Three layers plus a circuit breaker, one principle: you cannot fully test a non-deterministic system before deployment, so you continuously verify behaviour in production. This is a closed-loop control system, not evaluate-once-and-deploy.
Guardrails block known-bad inputs and outputs at machine speed (~10ms). Deterministic pattern matching: content filters, PII detection, topic restrictions, rate limits. These are action-space constraints that leave the model's reasoning unconstrained.
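As a rough illustration of what deterministic guardrails can look like, the sketch below combines two regex-based PII checks, a topic blocklist, and a fixed-window rate limiter. The patterns, the `BLOCKED_TOPICS` list, and the names `check_text` and `RateLimiter` are assumptions made for this example, not part of the reference; production PII detection needs much broader coverage.

```python
import re
import time
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
BLOCKED_TOPICS = ("medical diagnosis", "legal advice")  # assumed policy list


@dataclass
class GuardrailResult:
    allowed: bool
    reasons: list = field(default_factory=list)


def check_text(text: str) -> GuardrailResult:
    """Deterministic checks: the same input always yields the same verdict."""
    reasons = []
    if EMAIL_RE.search(text):
        reasons.append("pii:email")
    if CARD_RE.search(text):
        reasons.append("pii:card_number")
    lowered = text.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            reasons.append(f"topic:{topic}")
    return GuardrailResult(allowed=not reasons, reasons=reasons)


class RateLimiter:
    """Fixed-window limiter: at most `limit` calls per `window_s` seconds per caller."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._windows = {}  # caller -> (window_start, count)

    def allow(self, caller: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(caller, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # window expired, start a new one
        if count >= self.limit:
            self._windows[caller] = (start, count)
            return False
        self._windows[caller] = (start, count + 1)
        return True
```

The same `check_text` function can run on both the inbound prompt and the outbound response. Nothing here inspects or alters the model's reasoning; it only constrains what is allowed in and out.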
Model-as-Judge catches unknown-bad through independent model evaluation: either a full-size LLM running asynchronously (500ms to 5s) or a distilled small language model (SLM) running inline (10ms to 50ms). The judge is enterprise-owned and configured, and it evaluates outputs against policy, factual grounding, tone, and safety criteria.
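The difference between the inline and asynchronous judge modes can be sketched as follows. This is a minimal illustration under stated assumptions: `stub_judge` stands in for a real judge model and uses a hypothetical hard-coded rule, and `Verdict`, `respond_inline`, and `respond_async` are names invented for this example.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Verdict:
    passed: bool
    criterion: str
    rationale: str


# A judge is any async callable; a real one would call a separate,
# enterprise-configured model rather than the stub below.
Judge = Callable[[str, str], Awaitable[Verdict]]


async def stub_judge(prompt: str, response: str) -> Verdict:
    """Stand-in for a judge model (hypothetical rule, for demonstration only)."""
    await asyncio.sleep(0)  # placeholder for model latency
    if "guaranteed returns" in response.lower():
        return Verdict(False, "policy", "makes a financial guarantee")
    return Verdict(True, "policy", "no violation found")


async def respond_inline(prompt, response, judge, fallback="I can't help with that."):
    """Inline mode: the verdict gates the response before the user sees it."""
    verdict = await judge(prompt, response)
    return response if verdict.passed else fallback


async def respond_async(prompt, response, judge, flagged: list):
    """Async mode: respond immediately; the judge runs in the background
    and records failures for later review."""
    async def evaluate():
        verdict = await judge(prompt, response)
        if not verdict.passed:
            flagged.append((prompt, response, verdict))

    task = asyncio.create_task(evaluate())
    return response, task
```

Inline mode trades latency for prevention, while async mode keeps the response path fast and relies on detection after the fact, which is why the flagged queue needs an owner.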
Human Oversight provides the accountability backstop. Scope scales with risk: low-risk systems get spot checks, high-risk systems get human approval before commit. Humans decide edge cases, humans own outcomes.
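One way to make "scope scales with risk" concrete is to separate two decisions: whether an action must wait for human approval, and whether a completed interaction is sampled for review. The sketch below assumes illustrative sampling rates and hypothetical names (`Tier`, `needs_approval`, `sampled_for_review`); treating every high and critical interaction as reviewed is also an assumption of this example.

```python
import hashlib
from enum import Enum


class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Assumed sampling rates for post-hoc spot checks; tune per system.
SPOT_CHECK_RATE = {Tier.LOW: 0.01, Tier.MEDIUM: 0.10}


def needs_approval(tier: Tier, is_write: bool) -> bool:
    """High-risk writes and all critical actions wait for a human before commit."""
    if tier is Tier.CRITICAL:
        return True
    return tier is Tier.HIGH and is_write


def sampled_for_review(tier: Tier, interaction_id: str) -> bool:
    """Deterministic sampling: hash the ID into [0, 1) and compare to the tier's rate."""
    rate = SPOT_CHECK_RATE.get(tier, 1.0)  # tiers without a rate are always reviewed
    if rate >= 1.0:
        return True
    digest = hashlib.sha256(interaction_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Hashing the interaction ID instead of calling a random number generator means the same interaction is always either in or out of the sample, which makes the review selection reproducible in an audit.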
Circuit Breaker stops all AI traffic and activates a non-AI fallback when any layer fails. Not a degradation, a full stop with a predetermined safe state.
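A minimal sketch of the full-stop behaviour, assuming a hypothetical `CircuitBreaker` class that wraps an AI handler and a non-AI fallback handler. The failure threshold and the manual `reset` step are assumptions of this example; the default of one failure mirrors "when any layer fails".

```python
class CircuitBreaker:
    """Trips to a non-AI fallback and stays there until explicitly reset."""

    def __init__(self, ai_handler, fallback_handler, max_failures: int = 1):
        self.ai_handler = ai_handler
        self.fallback_handler = fallback_handler
        self.max_failures = max_failures  # assumed threshold, configurable
        self.failures = 0
        self.open = False  # open means AI traffic is stopped

    def record_layer_failure(self) -> None:
        """Call when guardrails, judge, or human review fails or is unavailable."""
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open = True

    def handle(self, request: str) -> str:
        if self.open:
            # Full stop: the AI handler is never invoked while the breaker is open.
            return self.fallback_handler(request)
        try:
            return self.ai_handler(request)
        except Exception:
            self.record_layer_failure()
            return self.fallback_handler(request)

    def reset(self) -> None:
        """Manual recovery once the failed layer has been verified healthy."""
        self.failures = 0
        self.open = False
```

The breaker does not retry or half-open on its own in this sketch. Returning to AI traffic is a deliberate act, which keeps the safe state predetermined rather than emergent.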
Each layer is specifically designed to catch what the previous layer misses. This is compound defence by design, not defence-in-depth by coincidence. For the full argument, see Why Containment Beats Evaluation.
Risk-Scaled Controls
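Controls scale to risk so that low-risk AI moves fast and high-risk AI stays safe. In the PACE column, P, A, C, and E stand for the Primary, Alternate, Contingency, and Emergency postures.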
| Risk Tier | Controls Required | PACE Posture |
|---|---|---|
| Low | Fast Lane: minimal guardrails, self-certification | P only (fail-open with logging) |
| Medium | Guardrails + Judge, periodic human review | P + A configured |
| High | All three layers, human-in-the-loop for writes | P + A + C configured and tested |
| Critical | Full architecture, mandatory human approval | Full PACE cycle with tested E→P recovery |
Classify your system: Risk Tiers. Understand the failure modes: PACE Resilience.
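The tier table can also be expressed as data and checked automatically, for example in a deployment pipeline. The sketch below is one possible encoding: the control identifiers, the `gaps` function, and the inclusion of `circuit_breaker` under "Full architecture" are this example's reading of the table, not names defined by the reference.

```python
# Required controls per tier, using hypothetical identifiers for each table entry.
REQUIRED_CONTROLS = {
    "low": {"guardrails"},
    "medium": {"guardrails", "judge", "periodic_human_review"},
    "high": {"guardrails", "judge", "human_in_the_loop_writes"},
    "critical": {"guardrails", "judge", "human_approval", "circuit_breaker"},
}

# PACE stages that must be configured for each tier.
REQUIRED_PACE = {
    "low": {"P"},
    "medium": {"P", "A"},
    "high": {"P", "A", "C"},
    "critical": {"P", "A", "C", "E"},
}


def gaps(tier: str, controls: set, pace_stages: set) -> set:
    """Return the controls and PACE stages a deployment is missing for its tier."""
    missing = REQUIRED_CONTROLS[tier] - controls
    missing |= {f"pace:{stage}" for stage in REQUIRED_PACE[tier] - pace_stages}
    return missing


# Example: a high-tier system with no human-in-the-loop and no Contingency stage.
report = gaps("high", {"guardrails", "judge"}, {"P", "A"})
```

A check like this only confirms that controls are declared. The table's "configured and tested" requirement still needs evidence from actual failover exercises.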
Defence in Depth Beyond the AI Layer
The three-layer model addresses controls specific to non-deterministic AI behaviour. It does not replace the security controls your organisation already has. It sits inside them.
Your existing DLP applies to data flowing into and out of AI systems. API gateways validate requests regardless of whether the caller is human or AI. Database access controls and parameterised queries prevent injection even if an agent constructs a malicious query. IAM governs who can invoke AI in the first place. SIEM correlates AI events with network, endpoint, and application events. Secure coding practices in the systems agents interact with matter arguably more, because the caller is now non-deterministic.
These controls are outside the scope of this reference, but they are part of your defence. When you threat-model, include them. When one of them catches something the AI-specific layers missed, it is working as your safety net.
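The point about parameterised queries can be shown with Python's standard `sqlite3` module. In this sketch the table, the `find_accounts` function, and the injection string are illustrative assumptions; the technique is that the agent supplies only a bound value and never the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO accounts (owner) VALUES (?)", [("alice",), ("bob",)])


def find_accounts(conn, owner):
    """The SQL text is fixed by the application; the agent's input is bound as data."""
    cur = conn.execute("SELECT id, owner FROM accounts WHERE owner = ?", (owner,))
    return cur.fetchall()


# An injection attempt arriving in agent-constructed input.
agent_supplied = "alice' OR '1'='1"

rows_injected = find_accounts(conn, agent_supplied)  # matched as a literal string
rows_normal = find_accounts(conn, "alice")
```

Because the placeholder binds the whole string as a single value, the injection attempt matches no rows instead of rewriting the query. The downstream system stays safe whether or not the agent's output was caught by the AI-specific layers.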
For multi-agent systems, the MASO Environment Containment strategy formalises this principle: harden every system the agent connects to so that agent misbehaviour is structurally harmless regardless of the agent's intent.
Where to Next
| If you want to... | Go here |
|---|---|
| Ship your first LLM feature | AIRSLite |
| Deploy low-risk AI fast | Fast Lane |
| Get working code in 30 minutes | Quick Start |
| See every single-agent control | Core Controls |
| Classify a system by risk | Risk Tiers |
| Enforce controls at the infrastructure layer | Infrastructure |
| Secure a multi-agent system | MASO Framework |