
Insights

The why before the how. Each article identifies a specific problem that the core controls and worked examples then solve. Together, they make the case for risk-proportionate runtime controls that reduce harm without imposing disproportionate process.

New here? Follow the Golden Thread.

The Reading Paths page sequences these articles into a guided walkthrough of the framework. The Golden Thread takes you from *why runtime security?* through *which controls?* to *how do they improve over time?* in roughly two hours. If you do not know where to start, start there.

Core Arguments

Six articles carry the main argument. Read these, in order, and you have the case for runtime AI security in full. Everything else in Deep Dives elaborates, extends, or stress-tests these claims.

| # | Article | The argument |
| --- | --- | --- |
| 1 | Why AI Security Is a Runtime Problem | AI systems are non-deterministic. Pre-deployment testing cannot prove future safety. Security must be continuous. |
| 2 | Why Your AI Guardrails Aren't Enough | Guardrails catch known-bad patterns. Novel attacks, semantic violations, and emergent behaviour walk straight past them. |
| 3 | The Judge Detects. It Doesn't Decide. | An asynchronous LLM evaluator detects unknown-bad against declared intent, without blocking, and informs humans rather than replacing them (sketched after this table). |
| 4 | Infrastructure Beats Instructions | Telling an agent what not to do fails. Make violations technically impossible through controls enforced outside the agent. |
| 5 | Humans Remain Accountable | AI assists decisions; humans own outcomes. The Judge makes oversight scalable, not optional. |
| 6 | The Feedback Loops That Make It Work | Four loops at different speeds turn guardrails, judges, humans, and outcomes into a self-improving system. Without them, every layer degrades. |
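
To make the detect-don't-decide pattern of articles 3 to 5 concrete, here is a minimal sketch: evaluation runs off the hot path, the agent is never blocked, and findings go to humans. Every name in it (`DeclaredIntent`, `judge`, `review_loop`) is illustrative rather than taken from the framework, and a keyword check stands in for the LLM evaluator so the example runs on its own.

```python
# Minimal sketch of the "detect, don't decide" Judge pattern.
# All names are illustrative; a keyword check stands in for the LLM call.
import asyncio
from dataclasses import dataclass

@dataclass
class DeclaredIntent:
    purpose: str             # what the agent is supposed to achieve
    forbidden: list[str]     # behaviours outside the declared scope

@dataclass
class Finding:
    action: str
    reason: str

async def judge(action: str, intent: DeclaredIntent) -> Finding | None:
    """Evaluate a completed action against declared intent."""
    for term in intent.forbidden:
        if term in action.lower():
            return Finding(action, f"matches forbidden behaviour: {term!r}")
    return None

async def review_loop(queue: asyncio.Queue, intent: DeclaredIntent) -> None:
    """Drain completed actions off the hot path; never blocks the agent."""
    while True:
        action = await queue.get()
        if action is None:   # sentinel: shut down
            break
        finding = await judge(action, intent)
        if finding:
            # Inform humans; the Judge itself takes no remediation action.
            print(f"FLAGGED for human review: {finding.reason}")
        queue.task_done()

async def main() -> None:
    intent = DeclaredIntent("summarise support tickets", ["export", "delete"])
    queue: asyncio.Queue = asyncio.Queue()
    reviewer = asyncio.create_task(review_loop(queue, intent))
    # The agent emits actions and carries on immediately: detection is async.
    for action in ["summarised ticket #42", "export customer table"]:
        await queue.put(action)
    await queue.put(None)
    await reviewer

asyncio.run(main())
```

The design choice that matters is the queue: detection latency is decoupled from response latency, which is what lets oversight scale without becoming a gate.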

After the core six

If those six landed and you want more depth on a specific layer, the Golden Thread adds Containment Through Declared Intent, Process-Aware Evaluation, The Constraint Curve, Practical Guardrails, Judge Assurance, and What Works to fill out the full architecture.

Deep Dives

The rest of the library, grouped by theme. Go straight to the group you need. If you are trying to match a specific problem to a specific control, the core controls index and the worked examples are usually a faster route.

Foundations: other framing arguments

Alternative entry points into the same thesis. Read these when the core six raise a question or when you want a different angle on the same ground.

| Article | One-line summary |
| --- | --- |
| The First Control: Choosing the Right Tool | The best way to reduce AI risk is not to use AI where it doesn't belong. |
| The Model You Choose Is a Security Decision | A flawed model makes every downstream control harder. Evaluate security posture, not just capability. |
| Why Containment Beats Evaluation | You cannot evaluate your way out of non-determinism. Containment bounds what the system can do, regardless of what it tries. |
| Security as Enablement, Not Commentary | Security creates value when delivered as platform infrastructure, not as narrative that diagnoses teams from the sidelines. |
| Risk Tier Is Use Case, Not Technology | Classification is about deployment context, not model capability. |
Architecture: how the layers fit

The internal mechanics of the three-layer pattern, its reference points, and its limits.

| Article | One-line summary |
| --- | --- |
| Practical Guardrails | What guardrails should catch: international PII, RAG filtering, exception governance, five pipeline points. |
| Containment Through Declared Intent | Declared intent is the organising principle that gives every defence layer its reference point. |
| The Intent Layer | Mechanical controls constrain what agents can do; semantic evaluation determines whether actions align with objectives. |
| Process-Aware Evaluation | Evaluating what an agent produced matters less than evaluating how it got there. |
| The Constraint Curve | Proportionate controls find the peak. Over-constraining destroys the value that justified using an LLM. |
| The Verification Gap | Current safety approaches cannot confirm ground truth. Solved by Judge Assurance. |
| Automated Risk Tiering | Classification should take two minutes, produce an immediate result, and auto-apply the controls that make the risk manageable (sketched after this table). |
| The Hallucination Boundary | Tolerance for hallucination is a function of decision authority, blast radius, and reversibility. |
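
As a concrete reading of the last two rows together, here is a toy classifier in the spirit of Automated Risk Tiering, driven by the Hallucination Boundary's three inputs: decision authority, blast radius, and reversibility. The tier names, thresholds, and control lists are invented for illustration; the articles define the real scheme.

```python
# Illustrative use-case-driven risk tiering. Tiers, thresholds, and control
# names are hypothetical; only the shape of the decision is the point.
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class UseCase:
    decision_authority: bool   # does the system act, or only advise?
    irreversible: bool         # can an action be rolled back?
    blast_radius: int          # rough count of people/systems affected

def classify(uc: UseCase) -> Tier:
    """Tier follows deployment context, not model capability."""
    if uc.decision_authority and (uc.irreversible or uc.blast_radius > 1000):
        return Tier.HIGH
    if uc.decision_authority or uc.blast_radius > 100:
        return Tier.MEDIUM
    return Tier.LOW

# Controls auto-applied per tier: classification yields an immediate result.
CONTROLS = {
    Tier.LOW: ["guardrails"],
    Tier.MEDIUM: ["guardrails", "async judge"],
    Tier.HIGH: ["guardrails", "async judge", "human sign-off"],
}

uc = UseCase(decision_authority=True, irreversible=False, blast_radius=5000)
tier = classify(uc)
print(tier, CONTROLS[tier])
```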
Threats and attack surfaces

Where the risk actually lives in production systems: retrieval, tooling, supply chain, memory, and modality.

| Article | One-line summary |
| --- | --- |
| RAG Is Your Biggest Attack Surface | Retrieval pipelines bypass your existing access controls (sketched after this table). |
| The MCP Problem | The protocol everyone is adopting gives agents universal tool access without authentication, authorisation, or monitoring. |
| The Supply Chain Problem | You don't control the model you deploy. |
| The Agent Supply Chain Crisis | The agents you compose are a new supply chain with new failure modes. |
| The Memory Problem | Long context and persistent memory create new risks. |
| Multimodal AI Breaks Your Text-Based Guardrails | Images, audio, and video bypass text controls. |
| Evaluation Integrity Risks | The evaluator can be gamed, poisoned, or quietly wrong. |
| You Don't Know What You're Deploying | Version drift, silent swaps, and opaque weights turn deployment into a moving target. |
| The Sandbox Escape Problem | Tool runtimes are harder to contain than the models that call them. |
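
The RAG row is the one that most often surprises teams, so here is a minimal sketch of the failure and the fix: retrieval must be filtered with the caller's identity, not the pipeline's service account. `Chunk`, `acl`, and `retrieve_as` are hypothetical names, and a substring match stands in for vector search.

```python
# Sketch of permission-aware retrieval. Without the filtering step, the RAG
# pipeline reads with its own service credentials and quietly grants every
# user the union of all access.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    acl: set[str]      # groups allowed to read the source document

def retrieve(query: str, index: list[Chunk]) -> list[Chunk]:
    # Stand-in for a vector search; ranking is irrelevant to the point.
    return [c for c in index if query.lower() in c.text.lower()]

def retrieve_as(user_groups: set[str], query: str,
                index: list[Chunk]) -> list[Chunk]:
    """Drop chunks the caller could not open in the source system."""
    return [c for c in retrieve(query, index) if c.acl & user_groups]

index = [
    Chunk("Q3 salary review outcomes", {"hr"}),
    Chunk("Q3 product roadmap", {"hr", "engineering"}),
]
print(retrieve_as({"engineering"}, "q3", index))  # roadmap only
```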
Agentic AI: where the pattern meets its limits

Multi-agent systems, orchestrators, and long-running behaviour.

| Article | One-line summary |
| --- | --- |
| When Agents Talk to Agents | Multi-agent systems have accountability gaps. |
| Agentic Drift | Objectives, context, and tools drift away from declared intent over time (sketched after this table). |
| The Orchestrator Problem | The most powerful agents in your system have the fewest controls applied to them. |
| The Long-Horizon Problem | Security properties you validated on day one may not hold on day thirty. Time itself is an attack vector. |
| The Visibility Problem | You can't govern AI you don't know is running. Shadow AI, inventories, and governance KPIs. |
| When the Pattern Breaks | The three-layer pattern designed for single-agent systems fails to scale in complex multi-agent architectures. |
| Securing the Connective Tissue | The attack surface has shifted from models to the space between them. |
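
To make Agentic Drift measurable rather than anecdotal, here is a sketch of one cheap signal: the share of recent tool calls that fall outside the declared tool set. The declared set, window, and threshold are all illustrative.

```python
# Toy drift signal (tool names and thresholds illustrative): the share of
# recent tool calls that fall outside the agent's declared tool set.
DECLARED_TOOLS = {"search_docs", "summarise"}   # from the intent declaration

def drift_score(recent_calls: list[str]) -> float:
    """Fraction of the window spent outside declared intent."""
    if not recent_calls:
        return 0.0
    outside = sum(1 for tool in recent_calls if tool not in DECLARED_TOOLS)
    return outside / len(recent_calls)

window = ["search_docs", "summarise", "send_email", "send_email"]
score = drift_score(window)
if score > 0.25:                                # alert threshold, not a rule
    print(f"drift alert: {score:.0%} of recent calls outside declared intent")
```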
Models and technology

Model-level properties that change what the surrounding controls must do.

| Article | One-line summary |
| --- | --- |
| When AI Thinks Before It Answers | Reasoning models need reasoning-aware controls. |
| The Backbone Problem | Concentrated dependency on a handful of backbone models creates systemic risk. |
| Open-Weight Models Shift the Burden | Self-hosted models inherit the provider's control responsibilities. |
| Temporal Decay | Correlated model decay degrades every control layer simultaneously. |
| You Can't Validate What Hasn't Finished | Real-time streaming breaks the validation model (sketched after this table). |
| When Learning Goes Wrong | Online learning and RLHF feedback loops can drift models out of alignment silently. |
| Beyond Language Models | Code, embeddings, and tool-use models need their own control stories. |
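
The streaming row deserves a concrete illustration, because the problem is structural rather than a bug: by the time a violation is detectable, part of the output has already reached the user. A sketch, with a toy card-number pattern standing in for a real scanner and `stream_with_checks` as a hypothetical name:

```python
# Sketch of the streaming validation problem: a scanner can only check
# complete units, so validation of a stream is provisional until it closes.
import re

BLOCKLIST = re.compile(r"\b4[0-9]{12}(?:[0-9]{3})?\b")  # toy card pattern

def stream_with_checks(tokens):
    """Yield tokens onward, scanning at sentence boundaries.

    Anything already yielded has already reached the user; a violation
    that only becomes visible at the end can be logged, not unsent.
    """
    buffer = ""
    for tok in tokens:
        buffer += tok
        yield tok                        # user sees it immediately
        if tok.endswith((".", "!", "?")):
            if BLOCKLIST.search(buffer):
                print("\n[flagged mid-stream: sentence already delivered]")
            buffer = ""
    if BLOCKLIST.search(buffer):         # trailing fragment, checked last
        print("\n[flagged at stream close]")

for t in stream_with_checks(["The card ", "4111111111111111 ", "was used."]):
    print(t, end="")
```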
Evidence and analysis

Deeper examinations of where the framework meets production reality: what works, what scales, and what the research actually supports.

| Article | One-line summary |
| --- | --- |
| State of Reality | The AI security threat is real, specific, and concentrated in measurable failure modes. |
| What Works | Deployed controls are measurably reducing breach detection time and costs. |
| What Scales | Security controls succeed only if their cost grows slower than the system they protect. |
| The Evidence Gap | What research actually supports, and where the science hasn't caught up to the architecture. |
| Risk Stories | Real production incidents show where missing controls caused or worsened failures. |
| The Flight Recorder Problem | You log what happened but not why, or how to replay it. AI systems need provenance chains, not just event logs. |
| PACE Resilience | How the three-layer architecture achieves operational resilience through layered, independent control redundancy. |
| Graph-Based Agent Monitoring | Modelling agent interactions as a live graph to detect anomalous behaviour in near real-time (sketched below). |
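
As a closing illustration, a minimal sketch of the graph-monitoring idea using networkx: record who-calls-whom as a directed graph and flag edges the declared topology does not expect. The topology, edge list, and thresholds are invented for the example.

```python
# Sketch of graph-based agent monitoring: compare observed agent-to-agent
# calls against a declared topology. Names and thresholds are illustrative.
import networkx as nx

DECLARED_EDGES = {("orchestrator", "researcher"), ("orchestrator", "writer")}

G = nx.DiGraph()
for caller, callee in [
    ("orchestrator", "researcher"),
    ("orchestrator", "writer"),
    ("writer", "researcher"),        # not in the declared topology
]:
    G.add_edge(caller, callee)
    if (caller, callee) not in DECLARED_EDGES:
        print(f"anomalous edge: {caller} -> {callee}")

# Degree spikes are another cheap signal: an agent suddenly talking to
# everyone is worth a human look even if each edge seems plausible.
for node, deg in G.out_degree():
    if deg > 2:                      # threshold illustrative
        print(f"degree spike at {node}")
```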