Validated Against Real Incidents¶
Every major control in this framework addresses a documented, public AI security failure. This page shows how.
Part of AI Runtime Security Last updated: March 2026
How to Read This Page¶
The Incident Tracker is organised by incident: "here's what happened, here are the controls." This page inverts that view. It's organised by control: "here's the control, here are the real-world incidents it addresses."
Each control is mapped to the incidents it would have prevented or detected, with the specific mechanism explained. Controls aligned to more incidents address a wider range of known attack patterns. Controls aligned to zero incidents are flagged. They may still be valuable, but they're based on threat modelling rather than observed attacks.
Validation does not mean proven. It means the control addresses a documented attack pattern. Whether the control would have actually prevented the incident in your environment depends on your implementation. This is retroactive analysis, not a guarantee.
Confidence ratings are inherited from the Incident Tracker:
| Rating | Meaning |
|---|---|
| High | Controls directly and deterministically prevent the failure. The mechanism is concrete and testable. |
| Moderate | Controls significantly reduce the risk but cannot fully eliminate it. The failure class has inherent uncertainty. |
Validation Summary¶
Controls by Incident Alignment¶
| Alignment Level | Criteria | Control Count |
|---|---|---|
| Strong | Addresses 3+ real incidents | 6 controls |
| Moderate | Addresses 1–2 real incidents | 14 controls |
| Threat-modelled | Based on emerging threat analysis, not yet observed in production | Remaining controls |
Most-Aligned Controls¶
These controls are referenced across the highest number of documented incidents. They address the widest range of known attack patterns.
| Rank | Control | Incidents | Incident Alignment |
|---|---|---|---|
| 1 | Input guardrails / context sanitisation | 7 of 9 | INC-01, 02, 03, 04, 05, 06, 09 |
| 2 | Model-as-Judge gate (exfiltration, query validation, citation verification) | 6 of 9 | INC-01, 02, 05, 06, 07, 08 |
| 3 | Circuit breaker | 5 of 9 | INC-01, 02, 03, 05, 08 |
| 4 | Tool scoping / capability constraints | 5 of 9 | INC-01, 02, 04, 05, 06 |
| 5 | Audit logging | 5 of 9 | INC-01, 04, 06, 07, 09 |
What this tells you: If you implement nothing else, input guardrails and an independent Judge gate address the widest range of documented attack patterns. Circuit breakers provide the safety net when prevention fails. This is consistent with the framework's core architecture: Guardrails prevent, Judge detects, Circuit breaker contains.
Control-by-Control Validation¶
Untrusted Content Isolation¶
Incident alignment: Moderate (1 incident) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-01: Copilot EchoLeak | Email body and attachments tagged as untrusted data, never instruction. Prevents LLM from treating email content as executable commands |
Why this matters: The root cause of indirect prompt injection is that AI systems treat all input as potential instruction. Untrusted content isolation enforces the instruction/data boundary at the protocol level. This control is architecturally simple but addresses the most common AI attack primitive.
Input Guardrails / Context Sanitisation¶
Incident alignment: Strong (7 incidents) · Confidence: High
The single most broadly validated control. Addresses the widest range of attack vectors because prompt injection (direct and indirect) is the most common AI attack primitive.
| Incident | Attack Vector | How This Control Helps |
|---|---|---|
| INC-01: Copilot EchoLeak | Indirect injection via email | Detects injection patterns in email content before LLM processes it |
| INC-02: Copilot Reprompt | URL parameter injection | Sanitises URL parameters before they enter LLM context; strips injection payloads |
| INC-03: LangChain SQLi | Prompt injection → Cypher queries | Sanitises user input before query generation; detects SQL/Cypher injection patterns |
| INC-04: LangChain Experimental | Injection → code execution | Detects injection patterns that attempt to invoke arbitrary capabilities |
| INC-05: HackerOne exfil | Injection in user-supplied content | Catches injection payloads embedded in documents and messages |
| INC-06: Claude Code Interpreter | Injection → file read + exfil | Detects injection patterns that attempt to access file system or network |
| INC-09: Chevrolet $1 | Direct prompt override | Detects attempts to override system instructions with user-supplied objectives |
Limitations: Guardrails are pattern-based. They catch known injection techniques effectively but can be evaded by novel or highly contextual attacks. This is exactly why the framework pairs guardrails with Judge evaluation.
Tool Scoping / Capability Constraints (Least Privilege)¶
Incident alignment: Strong (5 incidents) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-01: Copilot EchoLeak | Retrieval tools limited to context required for the current task; blocks access to sensitive mailbox data outside authorised scope |
| INC-02: Copilot Reprompt | Outbound tools restricted to explicitly authorised targets; exfiltration endpoints blocked |
| INC-04: LangChain Experimental | Only explicitly approved tools available to the LLM; arbitrary capability invocation denied by default |
| INC-05: HackerOne exfil | Each tool has defined scope of what data it can access and where it can send it |
| INC-06: Claude Code Interpreter | File-system access restricted to defined directory scope; network egress limited to approved endpoints |
Why this is fundamental: Five separate incidents across different platforms and attack types would have been prevented or contained by this single control principle. If your AI system can invoke tools, least privilege is non-negotiable.
Capability Allowlisting / Tool Invocation Policy¶
Incident alignment: Moderate (2 incidents) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-04: LangChain Experimental | Only explicitly approved tools and functions available to the LLM; all others denied by default |
| INC-06: Claude Code Interpreter | Capability segmentation: file read capabilities and network capabilities operate under separate permission grants |
Structured Query Enforcement / Deterministic Query Builder¶
Incident alignment: Moderate (1 incident) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-03: LangChain SQLi | LLM selects from parameterised query templates rather than composing raw queries, eliminating arbitrary query composition entirely |
Why this is deterministic: This control doesn't depend on probabilistic detection. The LLM physically cannot compose arbitrary SQL/Cypher because the architecture only allows parameterised queries. This is the gold standard for AI-to-database interaction security.
Database Least-Privilege Role¶
Incident alignment: Moderate (1 incident) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-03: LangChain SQLi | Database connection uses minimum required permissions (read-only where possible). Limits blast radius even if a query escapes validation |
Execution Sandboxing¶
Incident alignment: Moderate (1 incident) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-04: LangChain Experimental | Code execution occurs in an isolated sandbox with no access to the host system. Even if code execution is triggered, blast radius is contained |
Model-as-Judge Gate (Various Specialisations)¶
Incident alignment: Strong (6 incidents) · Confidence: High (injection incidents), Moderate (hallucination incidents)
The second most broadly validated control category. Deployed as specialised judges for different failure classes.
| Incident | Judge Specialisation | How This Control Helps |
|---|---|---|
| INC-01: Copilot EchoLeak | Exfiltration judge | Evaluates whether outbound actions contain data that shouldn't leave the session |
| INC-02: Copilot Reprompt | Exfiltration detection judge | Evaluates outbound requests for signs of data leakage |
| INC-05: HackerOne exfil | Dual-control (Judge as second factor) | Actions involving sensitive data transmission require Judge confirmation |
| INC-06: Claude Code Interpreter | Sensitive data exfil judge | Evaluates code interpreter actions for patterns consistent with data exfiltration |
| INC-07: Air Canada hallucination | Citation verification judge | Checks that cited policies match the actual source documents and catches hallucinated citations |
| INC-08: NYC MyCity | Regulatory output validator | Evaluates legal/regulatory outputs against source law and catches contradictions |
Important distinction: For injection-based incidents (INC-01, 02, 05, 06), the Judge provides High-confidence defence as an independent second layer. For hallucination incidents (INC-07, 08), the Judge significantly reduces risk but can't fully eliminate it. Subtle hallucinations that are semantically close to the source material may evade verification.
Grounded Response Requirement / Mandatory Source Citation¶
Incident alignment: Moderate (2 incidents) · Confidence: Moderate
| Incident | How This Control Helps |
|---|---|
| INC-07: Air Canada hallucination | Constrains chatbot to cite verified policy documents rather than generating interpretations |
| INC-08: NYC MyCity | Constrains chatbot to retrieve and cite actual regulatory text, not generate interpretations |
Why Moderate confidence: Grounding eliminates the most egregious hallucinations. The Air Canada chatbot couldn't have invented a non-existent policy if it was constrained to citing the actual policy document. But generative models can still produce subtle misinterpretations of grounded content. The framework's position is that policy and regulatory advice should use retrieval-only architectures where possible.
Human Escalation for High-Impact Outputs¶
Incident alignment: Moderate (2 incidents) · Confidence: Moderate
| Incident | How This Control Helps |
|---|---|
| INC-07: Air Canada hallucination | Responses involving financial commitments or policy advice routed to human review |
| INC-08: NYC MyCity | Questions involving discrimination law, tenant rights, and labour law routed to human review |
Authority Separation (LLM Proposes, System Commits)¶
Incident alignment: Moderate (1 incident) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-09: Chevrolet $1 | The LLM can suggest prices and offers but has no ability to make binding commitments. All commitments flow through a deterministic approval system |
Why this is deterministic: Authority separation isn't a probabilistic control. The LLM physically cannot make binding commercial commitments because the architecture separates proposal from commitment. The $1 car offer would never have been confirmable because no approval workflow would have validated it.
Transactional Approval Workflow / Offer-Policy Validator¶
Incident alignment: Moderate (1 incident) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-09: Chevrolet $1 | All pricing and offer responses validated against current business rules before being served. Selling a $50K vehicle for $1 fails policy validation |
Outbound Data Classification / Egress Anomaly Detection¶
Incident alignment: Strong (3 incidents) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-01: Copilot EchoLeak | Outbound traffic monitored for sensitive data patterns; anomalous retrieval triggers alert |
| INC-05: HackerOne exfil | All outbound data classified before transmission; sensitive data blocked from unauthorised destinations |
| INC-06: Claude Code Interpreter | Network egress controlled; outbound traffic to unapproved endpoints blocked |
Circuit Breaker¶
Incident alignment: Strong (5 incidents) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-01: Copilot EchoLeak | Automated halt when retrieval patterns deviate from baseline |
| INC-02: Copilot Reprompt | Unusual outbound request patterns trigger automatic session termination |
| INC-03: LangChain SQLi | Queries matching destructive patterns blocked and session terminated |
| INC-05: HackerOne exfil | Automatic session termination when egress anomalies exceed threshold |
| INC-08: NYC MyCity | Error-rate monitoring triggers automatic scope restriction or shutdown |
Why circuit breakers appear so often: They're the last line of defence. When guardrails miss an injection, when the Judge doesn't catch a subtle attack, the circuit breaker provides a hard stop based on observable behavior anomalies. Five of nine incidents would have been contained by this single control.
Audit Logging / Action Logging¶
Incident alignment: Strong (5 incidents) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-01: Copilot EchoLeak | All retrieval and outbound actions logged with source attribution |
| INC-04: LangChain Experimental | All tool invocations and parameters logged with full context |
| INC-06: Claude Code Interpreter | All file access and network operations logged with context and timing |
| INC-07: Air Canada hallucination | All policy-related responses logged with source citations for accountability |
| INC-09: Chevrolet $1 | All customer interactions and proposed responses logged with policy validation results |
Why logging is a control, not just compliance: In three of these incidents (INC-01, 04, 06), audit logs would have enabled detection of the attack during exploitation, not just after. In two (INC-07, 09), they provide the accountability trail that prevents "we didn't know" defences.
Confidence Threshold Enforcement¶
Incident alignment: Moderate (1 incident) · Confidence: Moderate
| Incident | How This Control Helps |
|---|---|
| INC-07: Air Canada hallucination | Responses below confidence threshold are withheld or qualified with uncertainty language |
Commitment Circuit Breaker (Domain-Specific)¶
Incident alignment: Moderate (1 incident) · Confidence: High
| Incident | How This Control Helps |
|---|---|
| INC-09: Chevrolet $1 | Responses containing commitment language ("binding," "guarantee," "we agree to") automatically blocked |
Validation Coverage Map¶
By Control Category¶
| Category | Controls Validated | Coverage Pattern |
|---|---|---|
| Input controls (guardrails, sanitisation, isolation) | 3 | Broad: addresses 7+ incidents |
| Execution controls (tool scoping, sandboxing, query enforcement) | 5 | Moderate: addresses 1–5 incidents each |
| Judge / evaluation controls | 3 specialisations | Broad: addresses 6 incidents across specialisations |
| Output controls (grounding, authority separation) | 3 | Moderate: addresses 1–2 incidents each |
| Detection controls (anomaly, egress, audit) | 3 | Broad: addresses 3–5 incidents each |
| Containment controls (circuit breaker) | 2 | Broad: addresses 5 incidents |
Confidence Distribution¶
| Confidence | Incident Count | Pattern |
|---|---|---|
| High | 7 of 9 | Injection, exfiltration, unauthorised agency. Deterministic controls directly prevent |
| Moderate | 2 of 9 | Both hallucination incidents. Inherently probabilistic failure class |
What's Not Yet Validated¶
Controls in these categories are based on threat modelling and architectural reasoning, not observed incidents:
-
Epistemic integrity (claim provenance enforcement, self-referential evidence prohibition, uncertainty preservation). These address multi-agent amplification of misinformation. No public incident reports exist because organisations either aren't detecting them or aren't disclosing them. The threat model is strong, but the evidence is research-based, not incident-based.
-
Inter-agent communication controls (message source tagging, inter-agent injection detection). These address the AI worm attack class (Morris II proof-of-concept). The PoC is documented but no production incident has been reported yet.
-
Advanced identity and access (zero-trust agent credentials, non-human identity lifecycle). These extend standard NHI patterns to AI agents. The patterns are proven in traditional service-to-service authentication; the extension to AI agents is logical but not yet documented in public incidents.
-
Tier 3 autonomous controls (self-healing PACE, adversarial testing suites, independent kill switch). These are designed for fully autonomous multi-agent systems, which are still rare in production. The controls are architecturally sound but won't be incident-validated until autonomous systems are common enough to be attacked.
How This Page Evolves¶
This is a living document. As new AI security incidents are publicly disclosed:
- They're added to the Incident Tracker
- The control mappings on this page are updated
- Controls that were "threat-modelled only" may be upgraded to "incident-validated"
- New controls may be added if incidents reveal gaps
If you know of a public AI security incident not listed here, open an issue. We'll map it to controls and update both pages.