AI Security Cheat Sheet

Classify. Control. Define fail posture. Test. One page.

This document uses the simplified three-tier system (Tier 1/2/3). See Risk Tiers - Simplified Tier Mapping for the mapping to LOW/MEDIUM/HIGH/CRITICAL.

1. Classify

All four true?	→ Fast Lane - self-certify, deploy in days
Internal users only	Read-only (no write to external systems)
No regulated data (PII, financial, health, legal)	Human reviews before acting on output

If any criterion fails, classify by the highest applicable tier:

Tier	When	Example
1 - Low	Internal users. May have write access or unreviewed output. No regulated decisions.	Internal chatbot, code assistant, meeting summariser
2 - Medium	Customer-facing. Human reviews before delivery.	Customer support draft, document processing, decision support
3 - High	Regulated decisions, autonomous agents with write access, financial/medical/legal.	Loan decisioning, autonomous trading, clinical support
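The classification rules above can be sketched as a small function. This is a minimal illustration, not a definitive implementation: the `Deployment` fields and the `classify` name are hypothetical, and the table leaves some combinations open (for example, customer-facing output with no human review), which this sketch simply places at Tier 2 or higher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    # The four Fast Lane criteria (hypothetical field names).
    internal_only: bool
    read_only: bool
    no_regulated_data: bool
    human_reviews_output: bool
    # Tier 3 triggers taken from the tier table.
    regulated_decisions: bool = False
    autonomous_write_agent: bool = False


def classify(d: Deployment) -> str:
    """Return the highest applicable tier, checking the strictest tier first."""
    if d.regulated_decisions or d.autonomous_write_agent:
        return "Tier 3"
    if not d.internal_only:
        # Assumption: anything customer-facing is at least Tier 2.
        return "Tier 2"
    if d.read_only and d.no_regulated_data and d.human_reviews_output:
        return "Fast Lane"
    return "Tier 1"
```

Checking the strictest tier first is what makes "highest applicable" hold: a deployment that matches several rows always lands on the most restrictive one.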

2. Controls Required

Control	Fast Lane	Tier 1	Tier 2	Tier 3
Guardrails	Basic filter	Standard	Full suite + injection detection	Hardened, multi-layer
Model-as-Judge	-	10–20% sample	100% async	100% dual-model, pre+post action
Human Oversight	-	-	Dedicated reviewers, SLA-bound	Domain experts, dual approval
Circuit Breaker	Feature flag	Feature flag	Automated health-check	Automated + staffed fallback
Usage Logging	Yes	Yes	Yes	Yes

Agentic add-ons (if agent has write access): tool permission matrix, transaction resolution plan, multi-agent cascade prevention, 5-phase degradation path (Tier 2+), token budget monitoring with context rotation (all tiers).
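As an illustration of the Model-as-Judge row, the sketch below decides whether a given request should be sent to the judge. It assumes a 15% rate for Tier 1 (any value in the 10–20% band would do) and uses a hash of a hypothetical `request_id` so the decision is repeatable for the same request. It does not model the async or dual-model aspects of Tier 2 and Tier 3.

```python
import hashlib

# Share of traffic sent to the judge per tier.
# The Tier 1 value is an assumption inside the 10-20% band.
JUDGE_SAMPLE_RATE = {
    "Fast Lane": 0.0,
    "Tier 1": 0.15,
    "Tier 2": 1.0,
    "Tier 3": 1.0,
}


def should_judge(tier: str, request_id: str) -> bool:
    """Deterministically sample requests for judge evaluation."""
    rate = JUDGE_SAMPLE_RATE[tier]
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Hash-based sampling is one design choice among several; it makes audits easier because the same request ID always gets the same decision, whereas random sampling does not.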

3. Fail Posture

For each control, define: when it fails, does the system fail-open or fail-closed?

Tier	Default	What It Means
Fast Lane / Tier 1	Fail-open	Pass traffic. Log. Fix next business day.
Tier 2	Fail-closed	Block AI traffic. Auto-switch to fallback.
Tier 3	Fail-closed always	No AI traffic passes a degraded control. No exceptions.

Fallback path:

Tier	Fallback	Speed	Maintenance
Fast Lane	Manual process (already exists)	Hours	Near zero
Tier 1	Manual process (documented)	Hours	Near zero
Tier 2	Rule-based / templated	Minutes (auto)	Quarterly
Tier 3	Staffed parallel process	Seconds (auto)	Monthly
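The difference between fail-open and fail-closed can be sketched as a wrapper around a control. All names here (`guarded_call`, `control`, `model`, `fallback`) are hypothetical; the sketch assumes a control "fails" when it raises an exception, and that a healthy control returns `True` to allow the request.

```python
from typing import Callable

# Default posture per tier, taken from the fail-posture table.
FAIL_CLOSED = {"Fast Lane": False, "Tier 1": False, "Tier 2": True, "Tier 3": True}


def guarded_call(
    tier: str,
    prompt: str,
    control: Callable[[str], bool],
    model: Callable[[str], str],
    fallback: Callable[[str], str],
    log: Callable[[str], None] = print,
) -> str:
    """Run a control before the model and apply the tier's fail posture."""
    try:
        allowed = control(prompt)
    except Exception as exc:
        # The control itself is degraded: always log, then apply the posture.
        log(f"control failure on {tier}: {exc!r}")
        if FAIL_CLOSED[tier]:
            return fallback(prompt)  # fail-closed: no AI traffic passes
        return model(prompt)  # fail-open: pass traffic, fix later
    if not allowed:
        return "blocked by control"
    return model(prompt)
```

Note that a control that works and rejects a request is a different case from a control that is down; only the second one is governed by the fail posture.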

4. Agentic Degradation Path

If deploying an agent, define these five phases before go-live:

Phase	Autonomy	What Changes
Normal	Full	All controls active
Constrained	Reduced	Read-only tools, tightened thresholds, all outputs reviewed
Supervised	Propose only	Human approves every action
Bypassed	Isolated	Non-AI fallback active, agent quarantined
Full Stop	None	All sessions terminated, incident response

For each tool the agent uses, answer three questions: Can the action be rolled back? Can it be completed without the agent? Is partial completion dangerous?
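The five phases can be modelled as an ordered enumeration. The sketch below is one possible encoding: the names `Phase`, `degrade`, `recover`, and `agent_may_execute` are hypothetical, and moving one phase at a time is an assumption (a real system may need to jump straight to Full Stop).

```python
from enum import IntEnum


class Phase(IntEnum):
    NORMAL = 0       # full autonomy, all controls active
    CONSTRAINED = 1  # read-only tools, all outputs reviewed
    SUPERVISED = 2   # propose only, human approves every action
    BYPASSED = 3     # non-AI fallback active, agent quarantined
    FULL_STOP = 4    # all sessions terminated


def degrade(phase: Phase) -> Phase:
    """Step one phase down the path, stopping at FULL_STOP."""
    return Phase(min(phase + 1, Phase.FULL_STOP))


def recover(phase: Phase) -> Phase:
    """Step one phase back up, stopping at NORMAL."""
    return Phase(max(phase - 1, Phase.NORMAL))


def agent_may_execute(phase: Phase, tool_is_write: bool) -> bool:
    """Whether the agent may run a tool without human approval."""
    if phase is Phase.NORMAL:
        return True
    if phase is Phase.CONSTRAINED:
        return not tool_is_write  # read-only tools only
    return False  # SUPERVISED and below: the agent never acts on its own
```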

4b. Multi-Agent Systems

If deploying multiple agents that communicate, delegate, or act across trust boundaries, single-agent controls are necessary but not sufficient. The MASO Framework adds seven control domains on top of the foundation.

MASO Control	What It Addresses
Prompt, Goal & Epistemic Integrity	Injection propagation across agents, goal drift, hallucination amplification, groupthink
Identity & Access	Non-Human Identity per agent, no shared credentials, no transitive authority
Data Protection	Cross-agent data fencing, DLP on the message bus, memory isolation
Execution Control	Sandboxed execution, blast radius caps, Model-as-Judge gate, interaction timeouts
Observability	Decision chain audit, anomaly scoring, drift detection, independent kill switch
Supply Chain	AIBOM per agent, signed tool manifests, MCP server vetting
Privileged Agent Governance	Elevated controls for orchestrators, planners, and meta-agents with disproportionate authority

Implementation tiers: Tier 1 - Supervised (human approves all writes) → Tier 2 - Managed (auto-approve low-risk, escalate high-risk) → Tier 3 - Autonomous (self-healing PACE, adversarial testing, kill switch).

Key difference from single-agent: PACE extends to agent orchestration. When one agent fails, the system isolates that agent and tightens permissions across the chain - not just within a single model's control layers.
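A rough sketch of "isolate the failed agent and tighten the chain" follows. It assumes a simple permission model of `"read"` and `"write"` strings per agent and interprets "tighten" as removing write access; both are illustrative assumptions, not part of the MASO Framework, and `isolate_and_tighten` is a hypothetical name.

```python
def isolate_and_tighten(
    permissions: dict[str, set[str]], failed_agent: str
) -> dict[str, set[str]]:
    """Return a new permission map after one agent in the chain fails."""
    tightened: dict[str, set[str]] = {}
    for agent, perms in permissions.items():
        if agent == failed_agent:
            tightened[agent] = set()  # isolated: no permissions at all
        else:
            tightened[agent] = perms - {"write"}  # assumption: drop write access
    return tightened
```

Returning a new map rather than mutating the input keeps the pre-incident permissions available for the later step back up.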

→ Full MASO Framework

5. Test

Test	Fast Lane	Tier 1	Tier 2	Tier 3
Feature flag / kill switch works	Annual	Annual	Quarterly	Monthly
Control layer failure simulation	-	Annual	Quarterly	Monthly
Human escalation exercise	-	Annual	Quarterly	Quarterly
Full degradation walkthrough	-	-	Semi-annual	Quarterly
Non-AI fallback operation	Annual	Annual	Quarterly	Monthly
Recovery (step back up)	-	Annual	Quarterly	Monthly

The Seven Questions

Every AI deployment must answer these before production:

What tier is this?
What controls does it need?
Fail-open or fail-closed?
What's the fallback path?
Has it been tested?
Is this multi-agent? If yes → apply MASO controls on top of the foundation.
What happens when context fills up? As the context window fills, guardrails degrade, hallucinations increase, and instruction-following weakens, and the Judge monitoring the system degrades at the same time. Define a context rotation strategy, alerting thresholds (70/85/95%), and fail-closed behavior. See MASO OP-04.

If you can answer all seven, you're ready. If you can't, you're not.
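For the context-fill question, the 70/85/95% alerting thresholds can be sketched as a simple check. The level names (`"warn"`, `"high"`, `"critical"`) and the function name are hypothetical, and what the system does at each level (rotate context, fail closed) is left to the deployment.

```python
def context_alert(used_tokens: int, context_window: int) -> str:
    """Map context usage to an alert level using the 70/85/95% thresholds."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    if used_tokens * 100 >= 95 * context_window:
        return "critical"
    if used_tokens * 100 >= 85 * context_window:
        return "high"
    if used_tokens * 100 >= 70 * context_window:
        return "warn"
    return "ok"
```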