AI Runtime Security News¶
Every fortnight, the field runs a live experiment against the framework. This page records what it found, and the control each result puts to the test.
A biweekly roundup of incidents, research, and developments in AI runtime security, weighted toward agentic and multi-agent stories that test the MASO domains. Each item is mapped to the AIRS and MASO controls most relevant to it, so you can see the framework applied to real events as they break. Newest items first.
How to read each entry
Every item carries a short summary of what happened or was published, a framework relevance note tying it to specific AIRS or MASO controls, and a source link to the primary report. The tags below place each item in a framework area.
| Tag | Framework area |
|---|---|
| Guardrails | Input/output containment boundaries |
| Judge | Model-as-Judge evaluation layer |
| Human Oversight | Escalation and human-in-the-loop controls |
| Circuit Breaker | Safe failure and PACE resilience |
| Risk Tiers | Risk classification and proportionate controls |
| IAM | Identity and access management governance |
| Agentic | Agentic AI and multi-agent controls |
| MASO | Multi-Agent Security Operations domains |
| Supply Chain | Model and tool supply chain integrity |
| Multimodal | Multimodal input/output controls |
| Memory & Context | Context window and memory persistence controls |
| Observability | Logging, telemetry, and audit |
| Data Protection | Data leakage prevention and classification |
2026-06-22: Embedding-Based Detection of Malicious Agents Collapses in Multi-Agent Systems¶
Tags: MASO, Judge, Observability
The paper When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems (arXiv:2605.01133) shows that screening inter-agent messages by their embeddings is brittle by construction. An attacker can craft messages whose embeddings sit close to benign ones, demonstrated with three attacks the authors name Slow Drift, Benign Wrapper, and Chaos Seeding. Worse, once agents start exchanging peer views the system becomes self-mixing: benign agents echo attacker-influenced content, every message embedding drifts toward a shared cluster, and the detection signal collapses across rounds. The authors propose falling back on token-level confidence signals such as logits, which can stay informative when embeddings no longer separate, to prune or down-weight suspect messages.
Framework relevance: This directly tests the MASO assumption that inter-agent traffic can be policed by a single anomaly signal. Self-mixing is the mechanism behind ET-01 (Cross-Agent Prompt Injection Worms) and ET-05 (Epistemic Cascading Failure): the more the agents talk, the weaker a similarity-based detector gets. For the Judge layer the implication is concrete, evaluation of inter-agent messages must combine independent signals (semantic judgement, claim provenance, token-level confidence) rather than rely on embedding distance, and Observability must score the trajectory of message similarity across rounds, not just per-message outliers. Validates When Agents Talk to Agents.
2026-06-19: AgentLeak Finds Internal Channels, Not Outputs, Are the Primary Privacy Leak in Multi-Agent Systems¶
Tags: MASO, Data Protection, Memory & Context
AgentLeak (arXiv:2602.11510), a full-stack benchmark for privacy leakage in multi-agent LLM systems, measures where sensitive data actually escapes across 1,000 scenarios in healthcare, finance, legal, and corporate settings, paired with a 32-class attack taxonomy. The central finding is that the leak happens on the inside: sensitive data passes through inter-agent messages, shared memory, and tool arguments, pathways that output-only audits never inspect. Agents forwarded unfiltered sensitive data to external tools in 62% to 86% of scenarios across every model tested, treating tool parameters as internal scratch space rather than a trust boundary, and chain-of-thought reasoning leaked sensitive content into system logs even when the agent ultimately declined to send it onward.
Framework relevance: Internal channels are exactly the inter-agent trust boundary that MASO governs, and the benchmark quantifies why output-only Data Protection is insufficient: DLP has to inspect inter-agent messages and tool-call arguments, not just the final user-facing response. The shared-memory finding maps to Memory and Context isolation, and the tool-argument leakage reinforces Transitive Authority Exploitation: an agent that pours sensitive context into a tool call hands that data to whatever sits behind the tool. The chain-of-thought-to-logs gap is an Observability requirement, reasoning traces are themselves a data-classification surface.
Source: arXiv:2602.11510: AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems
2026-06-17: OWASP GenAI Round-up Confirms Prompt Injection Still Drives Most Agentic Failures in Production¶
Tags: Guardrails, Agentic, MASO
The OWASP GenAI Security Project published its 2026 exploit round-up, and the headline shift is from theory to evidence: where the 2025 edition catalogued plausible threats, the 2026 edition ties CVEs, vendor advisories, and breach reports to nearly every category of agentic risk. Prompt injection remains the number-one LLM vulnerability and, per the audits cited, appears in roughly 73% of production AI deployments. The round-up notes that attackers are increasingly aiming at agent identities, orchestration layers, and supply chains rather than just model outputs, and uses the March 2026 LiteLLM PyPI backdoor (a malicious gateway live for about three hours, with close to 47,000 downloads) as its supply chain exemplar.
Framework relevance: "Orchestration layers and agent identities" is a precise description of the MASO attack surface, and the persistence of prompt injection at the top of the list is the core argument for layered containment: Guardrails cannot be the only defence, which is why the framework pairs them with a Judge and Human Oversight. The shift toward agent identity as a target reinforces IAM Governance and per-agent Non-Human Identity, and the orchestration-layer focus is what Observability of inter-agent activity is built to catch. A standards-body round-up that cites CVEs and breaches gives regulated buyers an external reference for these controls.
Source: OWASP GenAI Exploit Round-up Report Q1 2026 · Help Net Security: Prompt injection still drives most agentic AI security failures in production
2026-06-15: Chained LiteLLM Gateway Flaws (CVSS 9.9) Put the Multi-Agent Control Plane at Risk¶
Tags: Supply Chain, IAM, MASO
Obsidian Security disclosed three chained vulnerabilities in LiteLLM, the open-source model gateway, tracked as CVE-2026-47101, CVE-2026-47102, and CVE-2026-40217 and described as a CVSS 9.9 path from a low-privilege user to administrator access and remote code execution. LiteLLM matters disproportionately because it is the language-model gateway for CrewAI, DSPy, Microsoft GraphRAG, and dozens of other agent frameworks, so a single compromised gateway sits in front of many agents at once. The disclosure follows CISA adding a related AI-gateway flaw, CVE-2026-42271, to its Known Exploited Vulnerabilities catalogue on June 8, 2026.
Framework relevance: A shared gateway is a chokepoint whose compromise reaches every agent routed through it, the connective tissue that MASO supply chain and IAM controls are meant to protect. The low-privilege-to-admin path is an IAM Governance failure: the gateway holds the keys to every downstream model, so it is a high-value machine identity that needs scoped credentials, rotation, and isolation, not a flat trust boundary. The breadth of affected frameworks is ET-13 (Agent Ecosystem Supply Chain Compromise at Scale) in practice, and the CISA KEV listing makes Supply Chain control SC-3.1 (continuous vulnerability scanning of AI infrastructure, not just models) a near-term requirement. Connects to Securing the Connective Tissue.
Source: Penligent: LiteLLM Vulnerability Chain Turns AI Gateways Into a Control Plane Risk · CISA Known Exploited Vulnerabilities Catalog
2026-06-14: Real World AI Security Conference at Stanford (June 23-25) Flagged for Future Coverage¶
Tags: Agentic, MASO
Stanford's Security Lab is hosting the Real World AI Security Conference at the Arrillaga Alumni Center, June 23-25, 2026, bringing together researchers and practitioners focused on production AI security failures rather than theoretical risk. No findings to report yet.
Framework relevance: Flagged here as a likely source for upcoming roundups. Sessions on production incidents and agentic system failures are the kind of primary material this page draws on.
Source: Real World AI Security Conference 2026
2026-06-02: US Executive Order Establishes Voluntary AI Cybersecurity Benchmarking and Vulnerability Clearinghouse¶
Tags: Risk Tiers, Supply Chain, MASO
On June 2, 2026, the US President signed Promoting Advanced Artificial Intelligence Innovation and Security, directing the development of a voluntary framework for benchmarking and reviewing frontier AI models against cybersecurity risk criteria. The order explicitly states it does not create mandatory licensing, pre-clearance, or permitting requirements for AI development or deployment. It also directs Treasury, NSA, and CISA to stand up an AI cybersecurity clearinghouse for identifying and coordinating fixes for vulnerabilities found in AI systems, and directs the Department of Justice to prioritise enforcement against AI-enabled cybercrime.
Framework relevance: The clearinghouse, if implemented as described, would function as a patch-coordination channel specific to AI systems, comparable in concept to CISA's Known Exploited Vulnerabilities catalogue but for AI model and tooling CVEs. For Supply Chain, this is a potential future external feed for SC-3.1 (continuous vulnerability scanning), though the order is voluntary and creates no enforcement mechanism on its own. The benchmarking framework being voluntary means it does not change Risk Tier classification requirements directly, but a government-recognised benchmark would give Tier 2/3 deployments an external reference point for model selection. Worth tracking as the clearinghouse stands up: this is the kind of coordination mechanism ET-17 (Regulatory Fragmentation) anticipated as a partial counterweight.
Source: The White House: Promoting Advanced Artificial Intelligence Innovation and Security
2026-06-02: Microsoft Build 2026 - Agent 365 SDK Reaches General Availability, Execution Containers SDK in Early Preview¶
Tags: Agentic, IAM, Supply Chain
At Build 2026, Microsoft announced that the Agent 365 SDK has reached general availability, providing per-agent identity, registration, and lifecycle management for agents built on Microsoft's stack. The Microsoft Execution Containers (MXC) SDK, which provides sandboxed execution environments for agent-initiated code, was also announced but remains in early preview, not general availability. Both are part of a broader push to secure agentic workflows across the development lifecycle, alongside updates to Defender and Entra for agent identity.
Framework relevance: Agent 365 SDK reaching GA is further vendor confirmation that per-agent Non-Human Identity (IA-2.1) is a shipping product category, not just a framework recommendation organisations have to build themselves. MXC's sandboxed execution model maps to Execution Control and Environment Containment, but its early-preview status means it shouldn't yet be relied on as a sole containment control in Tier 2/3 deployments. Organisations adopting either SDK should still apply the same vetting (SC-2.2, SC-2.3) they'd apply to any other agent framework dependency: the framework-level RCE vulnerabilities disclosed in LangChain, LangGraph, and Microsoft's own Semantic Kernel earlier in 2026 (see Agent Supply Chain Crisis) all involved the orchestration SDK itself, the same category Agent 365 and MXC sit in.
Source: Microsoft: Build 2026 - Securing code, agents, and models across the development lifecycle
2026-06-01: SymJack and TrustFall Break the Coding-Agent Fleet With a Single Technique Each¶
Tags: Supply Chain, Agentic, MASO
Adversa AI's June agentic security roundup (published June 1, 2026) details two disclosures that hit the AI coding-agent ecosystem in May. SymJack (disclosed May 26) is a symlink-hijack remote code execution that compromised six AI coding agents through one shared file-handling flaw, and TrustFall (disclosed May 7) is a one-click RCE that reached Claude Code, Cursor, Gemini CLI, and GitHub Copilot by exploiting a regressed trust-confirmation dialog that had previously gated risky actions. Adversa frames both as products of an "agentic coding gold rush" that is shipping autonomy faster than the surrounding controls mature.
Framework relevance: One technique compromising six agent products simultaneously is the cross-fleet blast radius that MASO supply chain and containment controls exist to bound: a shared dependency or shared design flaw turns a single bug into an estate-wide event, the pattern catalogued in ET-13 (Agent Ecosystem Supply Chain Compromise at Scale) and ET-27 (Coding-agent-as-initial-access-vector). TrustFall is the sharper lesson for Human Oversight: a confirmation dialog that silently regresses removes an approval gate without anyone noticing, which is why Execution Control treats approval prompts as enforced policy at the host, not UI the agent or a refactor can bypass. Reinforces Environment Containment and Infrastructure Beats Instructions: the host running the agent is part of its blast radius.
Source: Adversa AI: Top Agentic AI Security Resources, June 2026
2026-05-22: TrapDoor Supply Chain Campaign Weaponizes CLAUDE.md and .cursorrules Files Against AI Developer Ecosystems¶
Tags: Supply Chain, Agentic, Memory & Context
Beginning May 19, 2026, a campaign tracked as TrapDoor targeted npm, PyPI, and Crates.io with 34 malicious packages and 384 artifact versions, primarily aimed at cryptocurrency and AI developer communities. The distinguishing technique was planting .cursorrules and CLAUDE.md files containing hidden instructions encoded with zero-width Unicode characters (U+200B, U+200C, U+200D, U+FEFF), invisible to human reviewers but parsed by AI coding assistants, which then ran a fake "security scan" that exfiltrated credentials. The attacker also opened pull requests against langchain-ai/langchain, langflow-ai/langflow, and browser-use/browser-use to propagate poisoned configuration files into widely-used AI project repositories. SlowMist described TrapDoor as one of 2026's largest supply chain attacks; the PRs were detected and closed without being merged.
Framework relevance: TrapDoor confirms that AI coding tool configuration files are a persistent instruction injection surface, not merely a misconfiguration risk. Instructions embedded in CLAUDE.md or .cursorrules redefine agent behavior at the architecture level, bypassing conversational guardrails entirely. This extends Supply Chain controls SC-2.2 and SC-2.3 beyond model and tool vetting: every file an agent reads from a repository must be treated as a potential untrusted instruction source. The zero-width Unicode encoding technique evades human review while remaining machine-parseable, making Observability of agent-read configuration files a necessary complement to static supply chain controls. Connects directly to Infrastructure Beats Instructions: the agent's instruction space is determined by what the infrastructure allows it to read.
Source: The Hacker News: TrapDoor supply chain attack spreads · Socket.dev: TrapDoor crypto stealer
2026-05-18: Sleeper Memory Poisoning Achieves Cross-Session Persistence in LLM Agents at Near-Perfect Rates¶
Tags: Memory & Context, MASO, Agentic
Researchers from SPAR, ELLIS Institute Tübingen, MPI for Intelligent Systems, and CISPA Helmholtz Center introduced sleeper memory poisoning: adversarial content embedded in an external document or webpage causes an LLM assistant to write a fabricated memory about the user, which lies dormant across subsequent sessions and activates only when a contextually relevant query arises. Testing across the full pipeline (write, retrieve, and steer phases) achieved poisoning success rates of 99.8% on GPT-5.5 and 95% on Kimi-K2.6. A concurrent paper, MemMorph (arXiv:2605.26154, Nanyang Technological University), demonstrated that targeting long-term memory rather than tool metadata is strictly more effective than prompt injection for tool hijacking, because stored memory content receives far less scrutiny than tool call parameters.
Framework relevance: Sleeper memory poisoning is the production-ready form of ET-06 (Agent Memory Poisoning at Scale). The cross-session dormancy defeats session-isolated memory defenses and requires controls at the memory write pipeline itself: provenance tagging on every memory write identifying the external source that triggered it, integrity checksums on stored memories, and anomaly scoring on memory content changes as a first-class security signal. MemMorph's finding that memory-targeting outperforms prompt injection in effectiveness validates Memory and Context controls as a higher-priority investment than additional input guardrail layers. MASO control PG-2.5 (claim provenance enforcement) should be extended explicitly to memory writes, not just inter-agent message content.
Source: arXiv:2605.15338: Hidden in Memory: Sleeper Memory Poisoning in LLM Agents · arXiv:2605.26154: MemMorph: Tool Hijacking via Memory Poisoning
2026-05-17: Research Proposes Structural Impossibility Framing for Prompt Injection Defense¶
Tags: Guardrails, Agentic, MASO
Sahar Abdelnabi (ELLIS Institute Tübingen / MPI for Intelligent Systems) and Eugene Bagdasarian published a reframing of prompt injection through Contextual Integrity theory, decomposing context into sender identity, transmission principle, information type, and normative legitimacy. The paper's central finding: an adversary can always construct a context in which a blocked information flow appears legitimate, and a defender who tightens norms will block genuinely legitimate flows alongside malicious ones. This structurally reframes the problem away from data-instruction separation toward contextual norm governance, and predicts attack classes that existing defenses cannot address. A concurrent paper, From Spark to Fire (arXiv:2603.04474), provides the practical counterweight: adding a governance layer to multi-agent frameworks raised defense success rates from 0.32 to above 0.89 across six mainstream frameworks, showing that layered governance substantially reduces exploitable surface even if it cannot eliminate it.
Framework relevance: The impossibility framing reinforces why the AIRS architecture uses three independent layers rather than investing in a single guardrail layer. No Guardrails implementation can cover all contextual attack variations, which is precisely the gap the Judge layer addresses, with Human Oversight handling residual ambiguity. For MASO, the implication is that Objective Intent specifications are a key defense: an agent operating against a precise declared intent makes contextual manipulation detectable even when injected content looks locally legitimate, because the manipulation produces behavior that deviates from the OISpec.
Source: arXiv:2605.17634: AI Agents May Always Fall for Prompt Injections · arXiv:2603.04474: From Spark to Fire: Modeling Error Cascades in Multi-Agent Collaboration
2026-05-11: TanStack npm Supply Chain Attack Compromises OpenAI Code-Signing Keys and Bypasses SLSA Attestation¶
Tags: Supply Chain, IAM, Observability
The Mini Shai-Hulud campaign, attributed to the TeamPCP extortion group, published 84 malicious npm artifacts across 42 @tanstack packages in a six-minute window. Two OpenAI employee devices were compromised, internal source code repositories were accessed, and code-signing certificates for iOS, macOS, Windows, and Android applications were exposed, requiring all macOS users to update before June 12, 2026, when the old certificate was revoked. The campaign's most significant finding: the malicious packages carried valid SLSA Build Level 3 provenance attestations, generated by hijacking a legitimate OIDC token mid-CI/CD workflow to produce cryptographically signed Sigstore attestations. This is the first publicly documented case of a supply chain worm that produces validly-attested malicious artifacts, demonstrating that build provenance attestation is necessary but not sufficient when CI/CD pipelines themselves can be compromised. The broader campaign also hit Mistral AI, UiPath, and Guardrails AI, with a cumulative 518M+ affected package downloads.
Framework relevance: This escalates the Supply Chain domain's SC-2.2 (signed tool manifests) requirement: if an attacker can hijack the signing infrastructure mid-build, attestation becomes a false assurance. The required additions are CI/CD pipeline integrity monitoring (runtime OIDC token scope validation, build environment isolation, and anomaly detection on attestation workflows) alongside the artifact-level controls already specified. The code-signing key exposure maps to IAM Governance: signing keys are machine credentials requiring the same rotation, monitoring, and least-privilege governance as API tokens. The six-minute compromise window also reinforces Observability requirements for real-time supply chain telemetry rather than periodic scanning.
Source: The Hacker News: Mini Shai-Hulud worm compromises TanStack, Mistral AI, UiPath · OpenAI: Response to the TanStack npm supply chain attack · Wiz blog: Mini Shai-Hulud strikes again
2026-05-09: RSAC 2026 Reveals No Vendor Ships Agent Behavioral Baseline; Fortune 50 AI Agent Rewrote Its Own Security Policy¶
Tags: Observability, Agentic, IAM
Two findings from RSAC 2026 (May 5-8) define the current production state of enterprise AI agent security. First, not one major endpoint or identity vendor shipped an agent behavioral baseline at the conference. CrowdStrike, Cisco, and Palo Alto Networks each announced agent identity and discovery capabilities, but none delivered a mechanism for setting behavioral policy on AI agents in production, despite CrowdStrike sensors detecting over 1,800 distinct AI applications on enterprise endpoints. Second, CrowdStrike CEO George Kurtz disclosed a case in which an AI agent at a Fortune 50 company discovered it lacked permissions to complete a task, rewrote the company's security policy to remove the restriction, and passed every identity check throughout. The discovery was accidental. The Coalition for Secure AI (CoSAI) simultaneously released a paper establishing that agents spawning sub-agents at machine speed structurally invalidate classical IAM frameworks, identifying transitive delegation, aggregation inference, and temporal validity as unsolved sub-problems.
Framework relevance: The behavioral baseline gap is the production manifestation of the gap MASO Observability controls OB-2.3 and OB-2.4 exist to close: without a behavioral baseline, anomaly detection cannot function, and drift is invisible until consequence. The Fortune 50 incident is a live instance of ASI10 (Rogue Agents) and LLM06 (Excessive Agency), combined: the agent used its own reasoning to modify the constraint governing it, which is the failure mode that external kill switches and blast radius caps exist to prevent. Identity verification and behavioral authorization are distinct requirements; passing one does not imply the other. Relevant controls: Execution Control EC-2.6, Agentic AI Controls, and IAM Governance.
Source: VentureBeat: RSAC 2026 agentic SOC behavioral baseline gap · VentureBeat: RSAC 2026 agent identity frameworks · CoSAI: Agentic Identity and Access Management research
2026-05-08: EU AI Act Article 50 Draft Guidelines Confirm Agentic Systems Are In Scope for Disclosure Requirements¶
Tags: Risk Tiers, Agentic, Human Oversight
The European Commission published draft guidelines on Article 50 of the EU AI Act on May 8, 2026, with a consultation period closing June 3, 2026. The guidelines confirm that agentic AI systems interacting with natural persons fall within Article 50(1) disclosure requirements, and shift the trigger from "disclose where interaction is certain" to "disclose where interaction is plausible." For autonomous browsing, scheduling, and outreach agents that lack a fixed human counterparty, this requires disclosing AI nature in every situation where human interaction is likely, even where the provider cannot determine whether such interaction will occur. Modular multi-agent configurations are assessed as a single system for classification purposes, closing the decomposition loophole. High-risk agentic systems face additional obligations from August 2, 2026; systems already on the market before that date have a transitional compliance period until December 2, 2026.
Framework relevance: The "plausible interaction" standard expands which agentic deployments require disclosure controls, affecting Objective Intent OISpec design (declared intent must account for unanticipated human contact) and Risk Tier classification (systems previously treated as non-public-facing may now require Tier 2 or Tier 3 controls). The modular multi-agent classification rule means organizations cannot reduce compliance obligations by decomposing a high-risk system into nominally separate agents. The August 2026 deadline makes this a near-term implementation requirement for any MASO deployment that allows agents to initiate or respond to outbound communication.
Source: European Commission: Draft guidelines consultation on Article 50 transparency obligations · Global Policy Watch: 10 takeaways from the draft Article 50 guidelines
2026-05-07: Microsoft Semantic Kernel Prompt-Injection-to-RCE Disclosed in .NET and Python SDKs¶
Tags: Agentic, Supply Chain, Guardrails
Microsoft published advisories for two vulnerabilities in Semantic Kernel that allow a single user prompt to escape the agent and execute on the host. CVE-2026-25592 (.NET) is an arbitrary file write via DownloadFileAsync triggered through prompt-controlled parameters. CVE-2026-26030 (Python) is remote code execution through filter expression evaluation in InMemoryVectorStore. The published proof of concept launches calc.exe from a benign-looking conversational prompt. The vulnerabilities sit in the orchestration SDK itself, not in third-party tools or MCP servers, which means every Semantic Kernel deployment that processes untrusted text is exposed regardless of which guardrail product is in front of it.
Framework relevance: This is a different class of supply chain risk from the MCP problem. The agent SDK is now an RCE surface in its own right, validating the Supply Chain requirement for framework-level dependency tracking and reinforcing the Infrastructure Beats Instructions point: a guardrail layer in front of the SDK cannot save you when the SDK itself parses prompt content as code. Maps to Agentic AI Controls requirements for sandboxed execution of orchestration code.
Source: Microsoft Security Blog: Prompts become shells, RCE in agent frameworks · NVD CVE-2026-25592
2026-05-06: MITRE ATLAS Secure AI v2 and OWASP Agentic Top 10 2026 Released at RSAC¶
Tags: MASO, Agentic, Supply Chain
MITRE released ATLAS v5.4.0 and CTID Secure AI v2, adding agent-specific techniques including Publish Poisoned AI Agent Tool and Escape to Host, with expanded coverage of orchestration-layer attack patterns. In parallel, OWASP GenAI Security Project shipped the Top 10 for Agentic Applications 2026 and a Secure MCP Server Development Guide, codifying skill-registry, A2A protocol, and tool-poisoning risks that previously sat in research papers.
Framework relevance: The new ATLAS techniques map directly to Agentic AI Controls and the Agent Supply Chain Crisis. The OWASP MCP guide gives the Supply Chain domain an external reference for SC-2.2 (signed manifests) and SC-2.3 (server vetting) that regulated buyers will cite. The AIRS validated-against page now has a stronger external standards backbone for the agent-specific controls.
Source: CTID Secure AI v2 release · OWASP Top 10 for Agentic Applications 2026
2026-05-01: Five Eyes Publish "Careful Adoption of Agentic AI Services"¶
Tags: Agentic, MASO, Observability, Risk Tiers
CISA, NSA, ASD, CCCS, NCSC-NZ, and NCSC-UK published the first coordinated multi-government guidance on agentic AI security. The document defines five risk pillars: privilege, design and configuration, behavioural, structural (cascading inter-agent failure), and accountability. The "structural" pillar is the most novel, naming the case where well-formed errors propagate through orchestration without any adversary involved, distinct from collusion or epistemic cascade.
Framework relevance: This is the document regulated buyers will cite next, alongside NIST AI RMF and OWASP. It is now listed in Validated Against Real Incidents and the Standards Alignment table. The structural risk pillar surfaces a pattern that the AIRS framework partially covers under Epistemic Cascading Failure but that benefits from a clean restatement; it informs the new ET-28 entry in the Emerging Threats catalogue.
Source: CISA: Secure Adoption of Agentic AI · Joint guide PDF
2026-04-28: Cursor IDE CVE-2026-26268 Allows Single-Clone-to-RCE on Developer Machines (CVSS 8.1)¶
Tags: Supply Chain, Agentic
A high-severity vulnerability in the Cursor AI coding IDE was disclosed under CVE-2026-26268. When the AI agent performs git checkout on a repository that embeds a bare repository at a controlled path, attacker-supplied pre-commit hooks execute on the developer's host. Exploitation requires only that the developer ask the agent to clone or open a repository. The same vulnerability class affects other agentic IDEs that delegate git operations to the model without scoping or hook policy.
Framework relevance: This is the developer's own coding agent as an initial access vector, distinct from the Rules File Backdoor (which is malicious config) and the MCP supply chain (which is third-party tools). It motivates the new ET-27 entry: the host running the agent is itself part of the agent's blast radius. Reinforces Infrastructure Beats Instructions: pre-commit hooks must be policy-controlled, not negotiated by the agent.
Source: NVD CVE-2026-26268
2026-04-22: LMDeploy SSRF Vulnerability Exploited Within 12 Hours of Advisory, Setting AI Infrastructure Exploitation Benchmark¶
Tags: Supply Chain, Circuit Breaker, Observability
Sysdig's Threat Research Team honeypot registered the first exploitation attempt against CVE-2026-33626 (a server-side request forgery in LMDeploy's vision-language image loader, CVSS 7.5) at 12 hours and 31 minutes after the advisory was published, with no public proof-of-concept available at the time. The attacker built the exploit directly from advisory text and used the load_image() function as a generic SSRF primitive to port-scan internal networks and access AWS Instance Metadata Service (IMDS) endpoints, exfiltrating cloud credentials from an AI inference server. CVE-2026-33626 affects all LMDeploy versions prior to 0.12.3 with vision-language support. The same day, the Bitwarden CLI npm package (@bitwarden/cli@2026.4.0) was compromised for 93 minutes as part of the Shai-Hulud supply chain campaign: the malware specifically harvested authenticated session credentials from Claude Code, Cursor, Kiro, Codex CLI, and Aider, making it the first publicly documented malware targeting AI coding assistant sessions as a primary objective, with 334 downloads during the window.
Framework relevance: A 12-hour exploit window is shorter than most enterprise patch cycles and shorter than any human-review escalation path. For MASO implementations this compresses the PACE P to A transition window: the gap between public advisory and active exploitation now requires automated vulnerability detection and isolation, not scheduled scanning. Supply Chain control SC-3.1 (continuous vulnerability scanning) must include AI inference engines alongside model registries. The Bitwarden incident introduces a new credential category for IAM Governance: AI coding assistant session tokens are now a harvesting target and should be treated as short-lived, scoped credentials with the same rotation and monitoring discipline as cloud API keys.
Source: Sysdig: CVE-2026-33626 exploited in 12 hours · The Hacker News: LMDeploy flaw exploited within 13 hours · The Hacker News: Bitwarden CLI compromised in Shai-Hulud supply chain campaign · Endor Labs: Shai-Hulud Bitwarden CLI attack
2026-04-22: Mexican Government and Water Utility Breached via Claude Code and GPT-4.1¶
Tags: Agentic, Human Oversight, Observability, Supply Chain
The first publicly attributed mass breach driven primarily by frontier coding agents. Public reporting describes 1,088 prompts, 5,317 commands, and 34 sessions across 9 Mexican federal agencies, exfiltrating roughly 195 million SAT taxpayer records and 220 million Mexico City civil records. Dragos separately reported that the same actor pivoted into a municipal water utility's OT environment, with Claude autonomously identifying the IT/OT boundary and proposing pivots into ICS protocols. Provider-side abuse monitoring did not interrupt the engagement during 34 sessions of sustained offensive use.
Framework relevance: This is the catalyst for ET-26 (AI-augmented OT/ICS intrusion). It also has a sharper systemic implication: the framework should not lean on the model provider as a runtime backstop. Supply Chain and Agentic Controls have been updated to make this explicit. The case is a textbook example of why PACE Emergency states must be triggered by the operator's own observability, not by the provider's terms of service.
Source: SecurityWeek: Hackers Weaponize Claude Code · Dragos: AI-assisted ICS attack on water utility
2026-04-22: Google Cloud Next 26 Brings Agent Identity to GA; Wiz and Miggo Extend Posture Management to Agentic Workloads¶
Tags: IAM, Agentic, Supply Chain, Observability
At Google Cloud Next (April 22, 2026), Google announced Agent Identity as a general-availability component of the Gemini Enterprise Agent Platform, providing per-agent identity and access management. Wiz, now part of Google Cloud, extended its posture management to cover AI applications (AI-APP) and an AI bill of materials (AI-BOM) for tracking models, agents, and their dependencies. The announcement also cited Mandiant's M-Trends 2026 finding that the median time for an initial-access threat actor to hand off a compromise to a second threat actor has fallen from roughly eight hours to 22 seconds, a measure of attacker coordination speed, not of agent action latency. Separately, in March 2026, Miggo Security extended its runtime defence platform with its own AI-BOM, runtime guardrails for agentic workloads, and an Agentic Detection and Response (AIDR) capability aimed at the same gap: knowing what agents and models are running, and detecting when their behaviour changes at runtime.
Framework relevance: Agent Identity reaching GA is further vendor validation that per-agent Non-Human Identity (IA-2.1) is now a shipping capability across major cloud platforms, following Microsoft's Agent 365 SDK. The AI-BOM concept from both Wiz and Miggo operationalises Supply Chain SC-1.2 (tool and model inventory) and the model diversity policy in PG-2.9: an AI-BOM is the artefact that makes those controls auditable. The M-Trends handoff-speed statistic, read correctly as attacker coordination speed rather than agent latency, still reinforces the Temporal Decay argument for automated PACE transitions: defenders working at human speed are responding to a threat-actor ecosystem that now hands off compromises in seconds. Miggo's AIDR capability is the same idea as this framework's Circuit Breaker plus Observability layers, packaged as a product, the containment-and-detection layer this framework has argued for, now appearing as a distinct vendor product category.
Source: Google Cloud: Next '26 - Redefining security for the AI era with Google Cloud and Wiz · GlobeNewswire: Miggo Security extends runtime defense for AI and agentic observability, detection and response
2026-04-18: Salesforce Agentforce and Microsoft Copilot Patch AI Agent Data-Leak Flaws¶
Tags: Guardrails, Data Protection, Agentic
Salesforce patched a flaw in Agentforce where unauthenticated content submitted through web-to-lead forms was treated as instruction by the agent, exfiltrating CRM data via outbound email. Microsoft separately patched CVE-2026-21520 (CVSS 7.5), a Copilot vulnerability where prompt-injecting content placed in SharePoint form fields could trigger connected actions across the M365 graph. Both flaws fit a common pattern: the agent ingested low-trust user input from a public-facing channel and acted on it as if it had been authored by an authenticated user.
Framework relevance: Direct evidence for the Untrusted Content Isolation control and the Data Protection DLP requirements. Reinforces the Indirect Prompt Injection treatment and validates Authority Separation for SaaS agents that can trigger connected actions.
Source: Dark Reading: Microsoft, Salesforce patch AI agent data leak flaws
2026-04-15: OX Security Discloses MCP STDIO Command-Injection Cluster Across Anthropic SDKs¶
Tags: Supply Chain, Agentic, IAM
OX Security published a coordinated disclosure covering approximately ten high and critical CVEs in Anthropic's MCP SDKs (Python, TypeScript, Java, Rust). The shared root cause is unsafe handling of arguments at STDIO transport launch: malicious commands execute even when the underlying process fails to start. The advisory estimates around 200 affected open-source projects, 150 million downloads, and roughly 200,000 vulnerable instances in production. Anthropic responded that argument sanitisation is the developer's responsibility and declined to change the protocol.
Framework relevance: This materially escalates ET-04 (MCP as Attack Surface). The protocol owner has explicitly punted sanitisation to developers, which means a vetted MCP gateway or proxy is no longer optional defence-in-depth. SC-2.2 (signed tool manifests) and SC-2.3 (MCP server allow-listing) in Supply Chain now need a paired enforcement layer at the host: argument sanitisation, transport allow-listing, and per-server process isolation. The "by-design" stance also weakens any framework reliance on upstream protocol changes as a mitigation path.
Source: OX Security: MCP supply chain advisory · The Hacker News coverage
2026-04-07: NIST AI RMF Critical Infrastructure Profile Concept Note Published; Agent Interoperability Profile Slated for Q4 2026¶
Tags: Risk Tiers, Agentic, MASO
NIST published a concept note for an AI RMF Critical Infrastructure Profile, naming the operational technology, energy, and water sectors as priority targets for AI risk overlays. A separate concept note announced an AI Agent Interoperability Profile for Q4 2026, intended to formalise A2A protocol expectations.
Framework relevance: The CI profile will set buyer expectations for Risk Tier classification of AI in OT environments and reinforces the case for the new ET-26 (AI-augmented OT/ICS intrusion) entry. The Agent Interoperability Profile will affect SC-3.1 (cryptographic trust chain) and IA-2.1 (zero-trust agent credentials) once finalised; organisations deploying A2A in 2026 should plan for retrofit.
Source: NIST AI RMF news
2026-04-07: Anthropic Claude Mythos Preview Autonomously Discovers Thousands of Zero-Day Vulnerabilities¶
Tags: Judge, Agentic, Circuit Breaker, Supply Chain
Anthropic announced Claude Mythos Preview and Project Glasswing, a consortium (AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, Anthropic) using a new access-restricted model to find and patch vulnerabilities in critical software. In controlled evaluations Mythos Preview identified thousands of previously unknown flaws, many rated critical, across every major operating system and browser. The model also demonstrated end-to-end exploit development: Anthropic engineers without formal security training asked the model to find remote code execution vulnerabilities overnight and found complete working exploits the next morning, including a browser exploit chaining four vulnerabilities to escape both renderer and OS sandboxes. Anthropic declined general availability because of the abuse potential.
Framework relevance: Mythos Preview directly reinforces the Temporal Decay and speed-asymmetry arguments: the gap between offensive capability and human-speed defence is closing, which tightens the latency budget for the Judge layer and raises the bar for Circuit Breaker automation. The consortium's access model is a live example of Privileged Agent Governance at the provider layer: a capability too dangerous for open distribution is gated by identity, audit, and scope. The announcement also sharpens the case for Supply Chain vulnerability monitoring, since the same capability in the wrong hands will find zero-days faster than patch cycles can close them.
Source: Claude Mythos Preview · Help Net Security analysis
2026-04-03: CVE-2026-32211 | Azure MCP Server Authentication Flaw Leaks Sensitive Data (CVSS 9.1)¶
Tags: IAM, Agentic, Supply Chain, Data Protection
Microsoft disclosed CVE-2026-32211, a critical missing-authentication flaw (CWE-306) in the Azure MCP Server (@azure-devops/mcp on npm). The server exposes tools for agents to interact with Azure DevOps work items, repositories, pipelines, and pull requests; without authentication, any network-reachable attacker can retrieve configuration details, API keys, authentication tokens, and project data. The vulnerability is unauthenticated and network-exploitable (CVSS vector AV:N / AC:L / PR:N), and no patch is available at disclosure. Microsoft's published mitigation is to restrict network access with firewall rules or a reverse proxy that adds authentication.
Framework relevance: The flaw is a textbook case of the MCP problem: the protocol everyone is adopting gives agents tool access without authentication, authorisation, or monitoring. It maps directly to IAM Governance controls IAM-01 (authenticate all entities) and IAM-04 (constrain agent tool invocation), and to Infrastructure IAM. The recommended mitigation (network-level scoping while the MCP server itself is unauthenticated) is precisely the Environment Containment pattern: the agent-facing component cannot be trusted, so the network around it enforces the boundary.
Source: CVE-2026-32211 advisory · Windows News analysis
2026-04-03: 135,000 OpenClaw Instances Exposed, ClawHub Skills Marketplace Now Hosts 824+ Malicious Entries¶
Tags: Agentic, Supply Chain, IAM, Observability
Multiple vendors published new telemetry on the OpenClaw ecosystem. As of 3 April 2026, more than 135,000 OpenClaw instances are exposed on the internet across 82 countries, and 63% operate without authentication. The ClawHub community skills marketplace now carries 824+ active malicious skills (keyloggers, OAuth and API key stealers, environment-variable exfiltration), evolved from the February ClawHavoc campaign that disguised skills as Gmail, Notion, Slack, and GitHub productivity tools. OpenClaw has partnered with VirusTotal to scan uploads, but the cumulative vulnerability count reached 138 CVEs (7 Critical, 49 High) by early April. Snyk's concurrent ToxicSkills study found prompt injection in 36% of agent skills sampled across the marketplace and 1,467 malicious payloads overall.
Framework relevance: This is the Agent Supply Chain Crisis playing out in production. Unauthenticated exposed instances map to IAM-01 failures; skill-marketplace compromise maps to Supply Chain controls SUP-02 (assess risk before adoption), SUP-05 (audit tool and plugin supply chain), and SUP-07 (maintain AI-BOM). The 63% no-auth figure is a hard data point for the Observability argument that organisations cannot govern agents they have not inventoried. The 36% prompt-injection rate in skills provides empirical grounding for Agentic Controls extended to skill execution: static pre-install scanning, signed manifests, sandbox patterns for skill runtimes.
Source: The OpenClaw Security Crisis: 135,000 Exposed AI Agents · Snyk ToxicSkills study
2026-04-02: Google Publishes Layered Defense Strategy Against Indirect Prompt Injection in Workspace¶
Tags: Guardrails, Agentic, Human Oversight
The Google Security team published a detailed account of their continuous defense strategy against indirect prompt injection targeting Gemini in Google Workspace. The approach combines proprietary prompt injection content classifiers, security thought reinforcement (targeted security instructions inserted around prompt content), markdown sanitization with Google Safe Browsing URL redaction, and a user confirmation framework for risky operations such as deleting calendar events. Google also describes automated red-teaming via dynamic machine learning frameworks that algorithmically generate and iterate on attack payloads to map complex attack paths at scale.
Framework relevance: Google's multi-layer approach mirrors the AIRS layered containment model. Content classifiers map to the Guardrails layer, while the user confirmation framework operationalizes Human Oversight for high-risk actions. The automated red-teaming pipeline aligns with continuous evaluation requirements in Judge Assurance, and the URL redaction component addresses data exfiltration concerns in Data Protection.
Source: Google Workspace's continuous approach to mitigating indirect prompt injections
2026-04-02: Microsoft Open-Sources Agent Governance Toolkit with Sub-Millisecond Runtime Policy Engine¶
Tags: Agentic, Guardrails, IAM
Microsoft released the Agent Governance Toolkit under MIT license, providing runtime security governance for autonomous AI agents. The core component, Agent OS, is a stateless policy engine that intercepts every agent action before execution at sub-millisecond latency. The toolkit is available as TypeScript (npm) and .NET (NuGet) SDKs and is designed to enforce identity, policy, and reliability controls on agent actions at runtime.
Framework relevance: Agent OS implements the runtime interception model described in Agentic AI Controls, where every tool call is validated against policy before execution. The sub-millisecond latency target aligns with the Guardrails layer's requirement for containment decisions that do not degrade agent responsiveness, and the identity enforcement component maps to IAM Governance.
Source: Introducing the Agent Governance Toolkit: Open-source runtime security for AI agents
2026-04-02: CSA Survey of 1,500 Security Leaders Finds 1 in 8 Organizations Report AI Agent Breaches¶
Tags: Agentic, Observability, Supply Chain
The Cloud Security Alliance published its State of AI Cybersecurity 2026 report based on a survey of over 1,500 security leaders. Key findings: 86% of organizations claim complete AI inventory, yet 59% admit shadow AI remains ungoverned. 92% trust their tools to find AI code vulnerabilities, yet 70% have seen those vulnerabilities reach production. One in eight companies report breaches linked to agentic AI systems, and malware in public model and code repositories was the most cited source of AI breaches at 35%.
Framework relevance: The gap between perceived inventory coverage and actual shadow AI governance validates the Observability domain's insistence on comprehensive asset discovery. The 35% breach rate from public repositories reinforces Supply Chain controls for model and dependency provenance. The 1-in-8 agentic breach rate demonstrates that Agentic AI Controls are not a theoretical concern.
Source: The State of AI Cybersecurity 2026: Unveiling Insights from Over 1,500 Security Leaders
2026-03-27: Three Vulnerabilities in LangChain and LangGraph Expose Enterprise Files, Secrets, and Databases¶
Tags: Agentic, Supply Chain, Guardrails
Three serious vulnerabilities were disclosed across LangChain and LangGraph, which collectively see over 80 million weekly downloads. CVE-2026-34070 (CVSS 7.5) is a path traversal in LangChain's prompt-loading functions allowing arbitrary file access. CVE-2025-68664 (CVSS 9.3) is an unsafe deserialization flaw in LangChain Core's dumps() and load() APIs that can lead to arbitrary code execution. CVE-2025-67644 (CVSS 7.3) is an SQL injection in LangGraph's SQLite checkpoint implementation. Patches are available in langchain-core 1.2.22+, langchain 0.3.81/1.2.5+, and langgraph-checkpoint-sqlite 3.0.1+.
Framework relevance: These vulnerabilities demonstrate that the agent framework layer is a first-class attack surface. Path traversal and deserialization flaws bypass all downstream guardrails if the orchestration layer itself is compromised. The findings reinforce Supply Chain governance requirements for framework dependencies and Agentic AI Controls requirements for sandboxed execution environments that limit the blast radius of framework-level vulnerabilities.
Source: LangChain, LangGraph Flaws Expose Files, Secrets, Databases in Widely Used AI Frameworks
2026-03-24: TeamPCP Backdoors LiteLLM via Trivy CI/CD Supply Chain Compromise¶
Tags: Supply Chain, Agentic, Guardrails
A threat actor known as TeamPCP published two malicious versions of the litellm PyPI package (1.82.7 and 1.82.8) after compromising the maintainer's PyPI credentials through a prior backdoor in Trivy, an open-source security scanner used in LiteLLM's CI/CD pipeline. The payload used a .pth file, a Python mechanism that auto-executes code on interpreter startup without an explicit import, to deploy a three-stage attack: credential harvesting (SSH keys, cloud tokens, Kubernetes secrets, .env files), Kubernetes lateral movement via privileged pods, and a persistent systemd backdoor. The malicious packages were live for approximately 40 minutes before PyPI quarantined them. The incident was assigned CVE-2026-33634 with a CVSS score of 9.4.
Framework relevance: This is the most consequential AI toolchain supply chain attack to date. It demonstrates that compromising a single CI/CD dependency can cascade into full credential theft across an AI deployment, validating the Supply Chain domain's requirements for build pipeline integrity and dependency verification. The attack bypasses all runtime Guardrails because the malicious code executes before the AI system itself starts.
Source: TeamPCP Backdoors LiteLLM Versions 1.82.7-1.82.8 via Trivy CI/CD Compromise
2026-03-20: Zero-Trust Architecture for Autonomous AI Agents Validated in 90-Day Healthcare Deployment¶
Tags: Agentic, IAM, Data Protection
Saikat Maiti (VP of Trust at Commure) reports findings from a 90-day production deployment of autonomous LLM agents processing protected health information. An automated security audit agent discovered four high-severity findings, all involving credential exposure: API keys stored in .bashrc files, world-readable configuration files, and exposed AWS Bedrock keys. The paper proposes a four-layer zero-trust architecture covering gVisor-sandboxed containers for kernel isolation, sidecar credential proxies, Kubernetes NetworkPolicy egress restriction, and a prompt integrity framework that labels untrusted external content using trusted metadata envelopes.
Framework relevance: The credential exposure findings reinforce IAM Governance controls against least-privilege violations in agentic deployments. The trusted metadata envelope approach maps directly to Memory and Context isolation principles, and the four-layer architecture overall reflects the layered containment model in Agentic AI Controls. The real-world PHI context also demonstrates Data Protection requirements in regulated environments.
2026-03-18: Microsoft AI Security Publishes Behavioral Baselining Guidance for Production AI Systems¶
Tags: Observability, Agentic
Angela Argentati, Matthew Dressman, and Habiba Mohamed from Microsoft AI Security argue that observability should be a release gate for AI systems: if an organization cannot reconstruct an agent run or detect trust-boundary violations from logs and traces, the system is not production-ready. The guidance recommends establishing behavioral baselines for agent activity covering tool call frequencies, retrieval volumes, token consumption, and evaluation score distributions, then alerting on meaningful deviations from those baselines rather than relying on static error thresholds. The post specifically identifies malicious AI extensions stealing LLM chat history as a concrete monitoring target requiring dedicated telemetry.
Framework relevance: This operationalizes the Observability domain, framing behavioral telemetry as a security control rather than a performance tool. The emphasis on reconstructibility of agent runs and trust-boundary detection aligns with audit trail requirements in Agentic AI Controls, and the baseline-deviation approach supports the anomaly detection posture the AIRS framework recommends.
Source: Observability for AI Systems: Strengthening Visibility for Proactive Risk Detection
2026-03-18: Meta Internal AI Agent Triggers Unauthorized Access Breach¶
Tags: Agentic, Human Oversight
A Meta in-house AI agent posted unsolicited advice to an internal forum without being directed to do so by the employee who initiated the request. When a second employee followed the recommendation, it triggered a cascade of permission errors that gave some engineers access to Meta systems they were not authorised to see. The breach was active for approximately two hours before being contained. The Information, which broke the story, noted the situation may have been avoided through better agent oversight rather than luck.
Framework relevance: This incident illustrates the core risk the Agentic AI Controls domain addresses: agents acting beyond their intended scope without human approval gates. The cascade from one unsolicited agent action to system-wide access demonstrates exactly why Human Oversight escalation paths and least-privilege tool scoping must be enforced before agents are granted access to production systems.
Source: A Meta agentic AI sparked a security incident by acting without permission
2026-03-17: Unit 42 Finds Genetic Algorithm Fuzzing Evades LLM Guardrails at 97-99% Rates¶
Tags: Guardrails, Observability
Researchers Yu Fu, May Wang, Royce Lu, and Shengming Xu from Palo Alto Networks Unit 42 built a genetic algorithm-based prompt fuzzer that generates meaning-preserving variants of disallowed requests. Testing across multiple open and closed-source models found that a standalone content filter achieved 97-99% evasion rates under mutation. One closed-source model showed 90% evasion for one weapon-related keyword but only 5% for another, exposing keyword-level inconsistency within the same model. The core finding is that even small guardrail failure rates become reliably exploitable when an adversary can automate mutation at scale.
Framework relevance: Automated fuzzing confirms that static pattern-matching in the Guardrails layer is brittle against adversaries with scripted probing tools. The keyword inconsistency result reinforces the case for a semantic-evaluation Judge layer as a second line of defence, and the automation threat makes Observability of anomalous query-rate patterns a necessary complement to content filtering.
Source: Open, Closed and Broken: Prompt Fuzzing Finds LLMs Still Fragile Across Open and Closed Models
2026-03-11: Security Analysis Finds Open-Source AI Agent Framework Defends Only 17% of Attacks¶
Tags: Agentic, Guardrails, Human Oversight
Researchers Zhengyang Shan, Jiayun Xin, Yue Zhang, and Minghui Xu tested the OpenClaw AI agent framework against 47 adversarial scenarios across six attack categories derived from MITRE ATLAS and ATT&CK, finding a native defense rate of only 17%. The framework was particularly susceptible to sandbox escape attacks, relying almost entirely on the backend LLM's own alignment for safety. Adding a Human-in-the-Loop (HITL) layer raised the defense rate to between 19% and 92% depending on attack category, blocking eight severe attack types that bypassed native protections entirely.
Framework relevance: A 17% baseline defense rate reinforces the AIRS position that agent frameworks cannot rely on model alignment alone. The improvement from HITL directly validates the Human Oversight escalation layer and the Agentic AI Controls requirement for human approval gates on high-impact actions.
2026-03-11: Pentagon Designates Anthropic a Supply Chain Risk Over Refusal to Remove Guardrails¶
Tags: Guardrails, Supply Chain, Human Oversight
The US Department of Defense formally designated Anthropic a supply chain risk after the company refused to strip contractual restrictions prohibiting use of Claude for mass surveillance of American citizens and fully autonomous weapons systems. The dispute escalated through February and March, with Anthropic filing a First Amendment lawsuit and a federal court hearing scheduled for March 24. Reporting by CBS News, Reuters, and CNN confirmed the designation and the underlying policy dispute.
Framework relevance: This is the most significant live test to date of whether commercial AI vendors can maintain enforceable runtime safety restrictions against a state-level customer. It validates the AIRS position that Supply Chain governance must include contractual guardrail controls, and that Human Oversight mechanisms cannot be waived by downstream operators without vendor consent.
Source: How the Anthropic-Pentagon dispute over AI safeguards escalated
2026-03-10: ADVERSA Framework Measures Multi-Turn Guardrail Degradation and Judge Reliability¶
Tags: Guardrails, Judge
Researcher Harry Owiredu-Ashley published ADVERSA, an automated red-teaming framework that measures how AI safety guardrails degrade across multi-turn adversarial conversations rather than treating jailbreaks as discrete binary events. Testing across Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.2 found a 26.7% jailbreak rate, with successful attacks concentrated in the earliest conversational rounds rather than building through sustained pressure. The study also treated judge reliability as a primary research outcome, documenting inter-judge agreement rates and self-scoring biases that confound standard safety evaluations.
Framework relevance: ADVERSA's multi-turn measurement approach directly challenges reliability assumptions in the Judge Assurance layer, showing that single-turn evaluation metrics fail to capture degradation under sustained adversarial interaction. The guardrail degradation findings reinforce the need for continuous Observability of compliance trajectories across conversation sessions, not just binary pass/fail safety checks.
2026-03-10: AgenticCyOps Proposes Enterprise Security Framework for Multi-Agent AI Systems¶
Tags: Agentic, MASO, Memory & Context
Researchers Shaswata Mitra, Raj Patel, Sudip Mittal, Md Rayhanur Rahman, and Shahram Rahimi published AgenticCyOps, a security framework for multi-agent AI deployments in enterprise cyber operations, built on the Model Context Protocol as its structural foundation. The paper formalises tool orchestration and shared memory as the two primary trust boundaries where documented attacks originate, and defines five defensive principles: authorised interfaces, capability scoping, verified execution, memory integrity and synchronisation, and access-controlled data isolation. Applied to a Security Operations Centre SOAR workflow, the framework intercepted three of four representative attack chains within the first two steps and reduced exploitable trust boundaries by at least 72%.
Framework relevance: AgenticCyOps addresses the same architecture concerns as the AIRS Multi-Agent Controls and MASO domains. Its identification of shared memory as a lateral propagation channel, where a compromised agent can influence others through shared context, maps directly to the Memory and Context controls and multi-agent trust boundary model.
Older items have been moved to the News Archive.