Anticipated Changes to AI and Framework: MASO 2.0
Where AI is heading, where the framework breaks, and what MASO 2.0 must address.
Purpose
This document identifies the AI capability trajectories that will stress or break the current framework, assesses each against existing controls, and defines the architectural changes required for MASO 2.0. It is forward-looking by design, grounded in observable trends but projecting beyond the current threat landscape.
Evolution Vectors
Six capability trajectories will reshape the framework's operating environment. Each is assessed for timeline, confidence, and impact on current controls.
1. The Judge Ceiling
Trajectory: Primary models are gaining capability faster than Judge models can keep pace. The AISI data already shows only a weak relationship (R² = 0.097) between capability and safeguard robustness, and shows that black-box monitors lose accuracy on harder tasks, exactly where accurate evaluation matters most.
Where the framework breaks:
The framework's second layer depends on Model-as-Judge providing independent evaluation. The current mitigation (use a different model family) rests on the assumption that different families have different blind spots. But model families are converging on similar architectures and training approaches. As this convergence continues, "model independence" provides diminishing assurance.
| Timeline | What Happens | Framework Impact |
|---|---|---|
| Now | Judge catches most violations; misses subtle, domain-specific, and novel attacks | Acceptable: human sampling compensates |
| 12–18 months | Primary models substantially exceed Judge capability in complex domains | Judge becomes a false assurance layer: catches easy cases, misses hard ones |
| 24–36 months | Human reviewers can no longer meaningfully evaluate primary model output in specialized domains | Ground truth calibration collapses: neither Judge nor humans can verify |
MASO 2.0 response:
| Change | Description | Priority |
|---|---|---|
| Verifiable action constraints | Shift from "evaluate whether the output is safe" to "prove the action was within declared bounds." Formal verification of action-space compliance rather than LLM evaluation of output quality. | Critical |
| Evidence-based reasoning | Require agents to provide verifiable evidence chains for consequential claims. The Judge evaluates the evidence, not the conclusion, which is a narrower, more tractable evaluation task. | High |
| Domain-specific verification oracles | For high-risk domains (medical, legal, financial), integrate deterministic verification systems (drug interaction databases, regulatory rule engines, financial limit checks) that don't depend on LLM evaluation. | High |
| Ensemble Judge with disagreement detection | Multiple Judge models from different families; flag when they disagree rather than relying on any single evaluation. Disagreement itself is the signal. | Medium |
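
The ensemble row above can be illustrated with a minimal sketch. It assumes each Judge is a callable returning a verdict string; the names `ensemble_judge`, `EnsembleResult`, the verdict labels, and the stand-in lambda judges are all hypothetical and not part of the framework. The point it shows is that a split vote escalates instead of letting the majority win quietly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical verdict labels; the framework does not define these names.
ALLOW, BLOCK, ESCALATE = "allow", "block", "escalate"


@dataclass
class EnsembleResult:
    verdict: str            # "allow", "block", or "escalate"
    votes: Dict[str, str]   # judge name -> individual verdict
    agreement: float        # share of judges backing the most common verdict


def ensemble_judge(output: str,
                   judges: Dict[str, Callable[[str], str]],
                   min_agreement: float = 1.0) -> EnsembleResult:
    """Run every judge; escalate when agreement falls below min_agreement."""
    if not judges:
        raise ValueError("at least one judge is required")
    votes = {name: judge(output) for name, judge in judges.items()}
    top_verdict, top_count = Counter(votes.values()).most_common(1)[0]
    agreement = top_count / len(votes)
    # Disagreement is the signal: a split vote is routed onward, not averaged away.
    verdict = top_verdict if agreement >= min_agreement else ESCALATE
    return EnsembleResult(verdict, votes, agreement)


# Stand-in judges for illustration only. Real judges would call models
# from different families; these lambdas are assumptions.
judges = {
    "family_a": lambda text: BLOCK if "transfer" in text else ALLOW,
    "family_b": lambda text: BLOCK if "all funds" in text else ALLOW,
    "family_c": lambda text: ALLOW,
}

unanimous = ensemble_judge("summarise the meeting notes", judges)
split = ensemble_judge("transfer all funds to account 991", judges)
print(unanimous.verdict, split.verdict, round(split.agreement, 2))
```

With the default `min_agreement=1.0`, any dissent escalates. Lowering the threshold trades sensitivity for review volume, which is a policy choice rather than something this sketch can settle.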
2. Human Oversight Becomes Untenable
Trajectory: Three forces converge to undermine the framework's third layer.
- Speed: Agent execution outpaces human review capacity. Thousands of concurrent agents generate review volumes no team can absorb within SLA.
- Complexity: Agent tasks become too sophisticated for human reviewers to meaningfully evaluate. A reviewer approving a complex multi-step agent workflow is rubber-stamping by necessity, not negligence.
- Economics: Human review ($5–50 per review) scales linearly while automation value scales superlinearly. Organizations will reduce human review under business pressure.
Where the framework breaks:
The framework treats HITL as a load-bearing control, the third layer that catches what guardrails and Judge miss. The anti-rubber-stamping controls (canary cases, time tracking, volume limits) detect when reviewers aren't reviewing, but cannot solve the problem of reviewers who genuinely lack the capacity or expertise to evaluate what they're seeing.
| Current Assumption | Reality at Scale |
|---|---|
| Humans review edge cases | Edge cases outnumber reviewers |
| Humans calibrate the Judge | Humans can't evaluate what the Judge is evaluating |
| Humans make final decisions | Decisions are too fast, too numerous, too complex |
| Review SLAs are achievable | 1-hour SLA × 10,000 concurrent agents = impossible |
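
The last row of the table can be made concrete with back-of-the-envelope arithmetic. The sketch below is a throughput lower bound only (it ignores queueing delay, shifts, and breaks). The escalation rate and minutes per review are assumed inputs chosen for illustration; only the 10,000-agent figure and the $5–50 per-review range come from the text above.

```python
import math


def reviews_per_hour(agents: int, escalations_per_agent_hour: float) -> float:
    """Review-requiring events generated by the fleet each hour."""
    return agents * escalations_per_agent_hour


def reviewers_needed(agents: int,
                     escalations_per_agent_hour: float,
                     minutes_per_review: float) -> int:
    """Minimum reviewers whose combined throughput matches the arrival rate."""
    arrivals = reviews_per_hour(agents, escalations_per_agent_hour)
    per_reviewer = 60.0 / minutes_per_review  # reviews one person clears per hour
    return math.ceil(arrivals / per_reviewer)


def hourly_cost_range(agents: int,
                      escalations_per_agent_hour: float,
                      low_usd: float = 5.0,
                      high_usd: float = 50.0) -> tuple:
    """Cost per hour at the low and high ends of the per-review price range."""
    arrivals = reviews_per_hour(agents, escalations_per_agent_hour)
    return arrivals * low_usd, arrivals * high_usd


# Assumed inputs: one escalation per agent every ten hours, ten minutes per review.
needed = reviewers_needed(10_000, 0.1, 10)
low_cost, high_cost = hourly_cost_range(10_000, 0.1)
print(needed, low_cost, high_cost)
```

Under these assumptions the fleet needs 167 reviewers working without pause just to keep the queue from growing, before any SLA slack is added.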
MASO 2.0 response:
| Change | Description | Priority |
|---|---|---|
| Redefine human role: governance, not review | Humans set policy, define constraints, review aggregate patterns, and handle exceptions; they do not review individual transactions. Shift from "human reviews action" to "human defines the action space." | Critical |
| Automated escalation triage | AI-powered triage that routes only genuinely ambiguous cases to humans. Reduce human review volume by 90%+ while maintaining coverage of truly novel situations. | High |
| Outcome-based oversight | Instead of pre-action approval, audit outcomes against declared intent. Statistical sampling of completed workflows replaces per-action review. Catches systematic problems, accepts individual-action risk. | High |
| Human oversight SLA tiers | Explicit acknowledgment that not all systems get human review. Tier 1/2: automated oversight only. Tier 3: human oversight on exceptions. CRITICAL: human oversight on policy and aggregate patterns. | Medium |
| Institutional review boards | For the highest-risk deployments, borrow from clinical research: independent review boards that evaluate the agent deployment design, not individual agent actions. | Medium |
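
As an illustration of the automated escalation triage row, here is a minimal sketch. It assumes an upstream component already attaches a risk score and a novelty flag to each event. The `Event` fields, the thresholds, and the route names are hypothetical. Only the uncertain middle band and novel cases reach a human.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    risk_score: float  # 0.0 (benign) to 1.0 (clear violation); assumed upstream score
    novel: bool        # True when no similar case exists in reviewed history


def triage(event: Event, allow_below: float = 0.2, block_above: float = 0.8) -> str:
    """Route clear cases automatically; send ambiguous or novel ones to a human."""
    if event.novel:
        return "human"       # novelty overrides any score
    if event.risk_score < allow_below:
        return "auto_allow"
    if event.risk_score > block_above:
        return "auto_block"
    return "human"           # the ambiguous middle band


events = [
    Event("e1", 0.05, False),
    Event("e2", 0.95, False),
    Event("e3", 0.50, False),
    Event("e4", 0.05, True),
]
routes = {e.event_id: triage(e) for e in events}
human_share = Counter(routes.values())["human"] / len(events)
print(routes, human_share)
```

How much volume this removes depends entirely on where the band edges sit and how well the upstream score is calibrated, so the thresholds should be treated as tunable policy.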
3. Session Boundaries Dissolve
Trajectory: AI is moving toward persistent, ambient, event-driven agents that operate continuously rather than in discrete sessions.
- Persistent memory: Agents retain context across interactions, blurring where one "session" ends and another begins.
- Multi-agent workflows: A single user task may span dozens of agents, hours of elapsed time, and multiple re-entries.
- Ambient AI: Background agents triggered by events (new email, calendar change, system alert) rather than explicit user requests, with no clear "session start."
Where the framework breaks:
The session-level intent analysis (§6, added in v0.9.0) assumes sessions are discrete, bounded units with a declared intent at the start. Persistent agents don't have session boundaries. Ambient agents don't have declared intent; they respond to triggers. An attack spanning two weeks across hundreds of micro-interactions with a persistent agent produces no single "session" to analyze.
MASO 2.0 response:
| Change | Description | Priority |
|---|---|---|
| Continuous behavioral streams | Replace session-based analysis with continuous behavioral modeling. Treat the agent's entire operational history as a single stream with rolling analysis windows (1h, 24h, 7d, 30d). | Critical |
| Intent inheritance for ambient agents | Ambient agents inherit intent constraints from their deployment configuration, not from a per-session declaration. "This agent handles calendar scheduling" IS the declared intent, evaluated continuously, not per-trigger. | High |
| Memory integrity as a first-class control | As agents become persistent, their memory becomes an attack surface that persists across sessions. Memory integrity verification (checksums, anomaly detection on memory content, memory decay policies) elevates from MASO extension to core control. | High |
| Temporal attack detection | Detect slow-burn attacks that unfold over days or weeks by correlating behavioral signals across time windows that exceed any single "session." UEBA-style temporal profiling applied to agent identity. | Medium |
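
A rough sketch of the rolling-window idea from the table above follows. The window sizes (1h, 24h, 7d, 30d) come from the text; the `BehaviorStream` class, the action name, and the in-memory deque are assumptions for illustration. The sketch also assumes events arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta

# Window sizes taken from the text; everything else is illustrative.
WINDOWS = {
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}


class BehaviorStream:
    """Holds one agent's recent events and counts them per rolling window."""

    def __init__(self):
        self._events = deque()  # (timestamp, action), assumed to arrive in order

    def record(self, ts: datetime, action: str) -> None:
        self._events.append((ts, action))
        # Evict anything older than the longest window.
        horizon = ts - max(WINDOWS.values())
        while self._events and self._events[0][0] < horizon:
            self._events.popleft()

    def counts(self, now: datetime, action: str) -> dict:
        """Count occurrences of `action` inside each window ending at `now`."""
        return {
            name: sum(1 for ts, a in self._events
                      if a == action and now - span <= ts <= now)
            for name, span in WINDOWS.items()
        }


# A slow-burn pattern: one bulk export per day for twenty days.
base = datetime(2025, 1, 1)
stream = BehaviorStream()
for day in range(20):
    stream.record(base + timedelta(days=day), "bulk_export")

now = base + timedelta(days=19, minutes=30)
profile = stream.counts(now, "bulk_export")
print(profile)
```

The short windows see a single export and look unremarkable; only the 7d and 30d windows expose the accumulation, which is the case session-scoped analysis would miss.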
4. Multi-Agent Emergent Behaviors
Trajectory: Agent systems are growing from single agents and simple orchestrations toward complex, dynamic multi-agent ecosystems: agents hiring agents, marketplace-style tool discovery, runtime composition.
Where the framework breaks:
MASO's 128 controls assume predictable interaction patterns: agents communicating through defined channels with known protocols. But as agent systems become more complex, emergent behaviors become possible: two agents that individually operate within policy may produce a combined behavior that violates policy, not because either was compromised but because their interaction creates an unanticipated state.
This is Charles Perrow's "normal accidents" theory applied to AI: in sufficiently complex, tightly coupled systems, unexpected interactions between components produce failures that no component-level analysis would predict.
The observability controls (OB-2.1, OB-2.2) detect emergent behaviors after they manifest. The dry-run/simulation mode tests individual agents, not fleet interaction dynamics. There is no mechanism for predicting emergent behaviors before they cause harm.
MASO 2.0 response:
| Change | Description | Priority |
|---|---|---|
| Interaction graph analysis | Model the multi-agent system as a directed graph. Monitor for unexpected edge formation (new agent-to-agent communication paths), cycle detection (feedback loops), and influence concentration (single agent becoming a hub). | High |
| Fleet-level behavioral baselines | Baseline not just individual agents but the interaction patterns of the fleet. Detect when the system's aggregate behavior deviates from expected patterns, even if individual agents look normal. | High |
| Emergent behavior simulation | Adversarial simulation of agent fleet dynamics: rather than testing individual agents against known attacks, test what happens when agents interact under stress, partial failure, or adversarial inputs. Extend the 100-agent and 10K stress tests from tabletop to automated simulation. | Medium |
| Composition constraints | Limit which agents can interact with which other agents, and through what channels. Static composition (defined at deployment) rather than dynamic composition (agents discover and connect to other agents at runtime). Dynamic composition requires Tier 3+ controls. | High |
| Circuit breakers on interaction patterns | Not just per-agent circuit breakers but system-level: if the total number of inter-agent messages exceeds baseline by >Nσ, pause the orchestration. | Medium |
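
The interaction graph and circuit breaker rows can be sketched together. This is a minimal illustration, assuming inter-agent messages are available as `(sender, receiver)` pairs. The function names, the inbound-share measure of hub concentration, and the default of 3σ are assumptions (the text leaves N unspecified).

```python
from collections import defaultdict
from statistics import mean, stdev


def has_cycle(graph) -> bool:
    """Depth-first search with three colours to find a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(node) -> bool:
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            if colour[nxt] == GREY:  # back edge: a feedback loop
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))


def analyze_interactions(messages, allowed_edges, hub_share=0.5) -> dict:
    """Flag unexpected edges, feedback loops, and agents receiving most traffic."""
    graph, inbound, total = defaultdict(set), defaultdict(int), 0
    for sender, receiver in messages:
        graph[sender].add(receiver)
        inbound[receiver] += 1
        total += 1
    observed = {(s, r) for s, targets in graph.items() for r in targets}
    return {
        "unexpected_edges": sorted(observed - set(allowed_edges)),
        "has_cycle": has_cycle(graph),
        "hubs": sorted(a for a, n in inbound.items() if n / total > hub_share),
    }


def breaker_tripped(baseline_counts, current_count, n_sigma=3.0) -> bool:
    """System-level breaker: trip when volume exceeds mean + n_sigma * stdev."""
    threshold = mean(baseline_counts) + n_sigma * stdev(baseline_counts)
    return current_count > threshold


allowed = {("planner", "researcher"), ("researcher", "writer")}
messages = [("planner", "researcher"), ("researcher", "writer"),
            ("writer", "planner"), ("writer", "planner"), ("writer", "planner")]
report = analyze_interactions(messages, allowed)
print(report)
print(breaker_tripped([100, 110, 90, 105, 95], 200))
```

Here the writer-to-planner path is both an unexpected edge and the link that closes a feedback loop, while each agent taken alone would look normal.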
5. AI-vs-AI Adversarial Dynamics
Trajectory: Today's threat landscape has human adversaries using AI as a force multiplier. The trajectory is toward autonomous offensive AI: adversarial systems that probe, adapt, and exploit defensive controls at machine speed.
CrowdStrike already documents adversaries using generative AI for reconnaissance, social engineering, and malware development. The next step is adversarial AI that autonomously probes guardrails, maps bypass patterns, and generates novel attack variants faster than any manual update cycle can respond.
Where the framework breaks:
The framework's guardrail update cycle is human-speed: adversarial testing identifies a bypass → analyst writes a new rule → rule is deployed → repeat. An AI adversary generating and testing thousands of bypass variations per minute outpaces this cycle by orders of magnitude.
The Judge is also vulnerable: if an adversary can observe the Judge's behavior (by submitting test inputs and observing responses), it can iteratively craft inputs that pass the Judge while achieving the adversary's goal. This is gradient-free adversarial optimization: no access to model weights is required.
MASO 2.0 response:
| Change | Description | Priority |
|---|---|---|
| Continuous adversarial simulation | A persistent red-team agent that continuously probes the defensive layers, finding bypasses before external adversaries do. Run internally, scoped to the organization's own systems. | High |
| Adaptive guardrails | Guardrails that update automatically based on detected attack patterns. When the behavioral anomaly detection layer identifies a new attack signature, it generates a candidate guardrail rule, validates it against a test corpus, and deploys it at machine speed within human-approved policy bounds. | High |
| Judge unpredictability | Rotate Judge evaluation strategies, prompts, and model versions on a random schedule. Prevent adversaries from optimizing against a static evaluation target. The framework already recommends rotating evaluation strategies; MASO 2.0 makes this automated and continuous. | Medium |
| Rate limiting on boundary probing | Detect and throttle inputs that appear to be systematically testing guardrail boundaries. The behavioral anomaly detection layer already identifies "repeated near-misses"; MASO 2.0 adds automated throttling as a response. | Medium |
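
The boundary-probing row can be sketched as a sliding-window counter of near-misses per source. This assumes the guardrail exposes a numeric score per input; the `ProbeThrottle` class, its thresholds, and the three return values are hypothetical choices for illustration.

```python
from collections import defaultdict, deque


class ProbeThrottle:
    """Throttles a source that keeps landing just under the block threshold."""

    def __init__(self, block_threshold=0.8, near_miss_margin=0.1,
                 max_near_misses=5, window_seconds=60.0):
        self.block_threshold = block_threshold
        self.near_miss_margin = near_miss_margin
        self.max_near_misses = max_near_misses
        self.window_seconds = window_seconds
        self._near_misses = defaultdict(deque)  # source -> near-miss timestamps

    def observe(self, source: str, score: float, now: float) -> str:
        """Return 'block', 'throttle', or 'allow' for one scored input."""
        if score >= self.block_threshold:
            return "block"
        recent = self._near_misses[source]
        # Forget near-misses that fell out of the sliding window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if score >= self.block_threshold - self.near_miss_margin:
            recent.append(now)
        if len(recent) >= self.max_near_misses:
            return "throttle"
        return "allow"


throttle = ProbeThrottle()
# One source submits five inputs that all score just under the threshold.
probe_results = [throttle.observe("s1", 0.75, float(t)) for t in range(5)]
other_source = throttle.observe("s2", 0.30, 5.0)
after_cooldown = throttle.observe("s1", 0.10, 100.0)
print(probe_results, other_source, after_cooldown)
```

Throttling is keyed per source, so an adversary rotating identities would evade this sketch; a production control would need a stronger notion of source than a single string.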
6. Regulatory Divergence
Trajectory: AI regulation is accelerating globally and diverging across jurisdictions. The EU AI Act, US executive orders, China's AI regulations, India's proposed governance framework, and Australia's APRA/CPS 234 impose different and sometimes contradictory requirements.
Where the framework breaks:
The framework currently maps to ISO 42001, EU AI Act, and banking regulations. As jurisdictions diverge, a single control configuration may not satisfy all applicable regulations simultaneously. The framework's "start with 5 core documents" simplicity is at risk from jurisdiction-specific complexity.
MASO 2.0 response:
| Change | Description | Priority |
|---|---|---|
| Jurisdiction-aware control profiles | Parameterized control configurations that can be tuned per jurisdiction. Base controls remain universal; jurisdiction-specific overlays add or tighten controls as required. | Medium |
| Compliance evidence automation | Automated generation of compliance evidence packs from runtime telemetry. Reduce the cost of demonstrating compliance across multiple jurisdictions. | Medium |
| Regulatory change tracking | Systematic monitoring of regulatory changes across key jurisdictions, with impact assessment against framework controls. | Low |
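
One way to read "overlays add or tighten controls" is as a merge in which an overlay can never loosen the base. The sketch below assumes that reading. The control names, their values, and the placeholder jurisdictions are invented for illustration and do not describe any real regulation.

```python
def any_true(a: bool, b: bool) -> bool:
    return a or b


# Hypothetical base controls; values are placeholders, not recommendations.
BASE_CONTROLS = {
    "log_retention_days": 90,
    "human_review_sample_rate": 0.01,
    "require_explanations": False,
}

# For each control, the function that picks the stricter of two values.
STRICTER = {
    "log_retention_days": max,
    "human_review_sample_rate": max,
    "require_explanations": any_true,
    "incident_report_hours": min,
}


def resolve_profile(base: dict, overlays: list) -> dict:
    """Apply overlays in order; each may add a control or tighten an existing one."""
    profile = dict(base)
    for overlay in overlays:
        for control, value in overlay.items():
            if control not in STRICTER:
                raise KeyError(f"no strictness rule defined for control {control!r}")
            if control in profile:
                profile[control] = STRICTER[control](profile[control], value)
            else:
                profile[control] = value  # overlay introduces a new control
    return profile


# Placeholder overlays for two unnamed jurisdictions.
jurisdiction_a = {"log_retention_days": 365, "require_explanations": True}
jurisdiction_b = {"log_retention_days": 30, "incident_report_hours": 72}

profile = resolve_profile(BASE_CONTROLS, [jurisdiction_a, jurisdiction_b])
print(profile)
```

Because every merge takes the stricter value, the second overlay's shorter retention period is ignored. Truly contradictory requirements (where no single value satisfies both regimes) cannot be resolved this way and would need separate deployments or an explicit exception process.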
The Deeper Architectural Question
All six vectors point to the same tension:
The framework assumes external controls can contain AI behavior. Infrastructure constraints, Judge evaluation, human oversight: all external to the model. This is correct for current-generation AI. But the capability trajectory implies a future where:
- Models are too capable for external evaluation to be reliable (Judge ceiling)
- Agents are too fast for human oversight to be meaningful (oversight scaling)
- Systems are too complex for infrastructure constraints to anticipate all failure modes (emergent behaviors)
- Adversaries operate at machine speed against human-speed defenses (AI-vs-AI)
The framework's "constrain regardless" architecture is the right response: constrain the action space, not the behavior. A database view that only exposes customer data doesn't need to understand the agent's reasoning. An API gateway with a hard spending cap doesn't need to evaluate the agent's logic.
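The spending-cap example can be sketched in a few lines. This is an illustrative stand-in for a gateway, not a real API: `SpendingGateway`, `ActionRejected`, and the use of integer cents are assumptions.

```python
from dataclasses import dataclass


class ActionRejected(Exception):
    """Raised when a requested action falls outside the declared bounds."""


@dataclass
class SpendingGateway:
    cap_cents: int        # hard limit set by policy, not by the agent
    spent_cents: int = 0  # running total of approved spend

    def authorize(self, amount_cents: int) -> int:
        """Approve a spend only if the running total stays within the cap.

        Returns the remaining budget in cents.
        """
        if amount_cents <= 0:
            raise ActionRejected("amount must be positive")
        if self.spent_cents + amount_cents > self.cap_cents:
            raise ActionRejected("spending cap exceeded")
        self.spent_cents += amount_cents
        return self.cap_cents - self.spent_cents


gateway = SpendingGateway(cap_cents=10_000)
remaining = gateway.authorize(6_000)
try:
    gateway.authorize(5_000)
    second_request = "approved"
except ActionRejected as err:
    second_request = f"rejected: {err}"
print(remaining, second_request, gateway.spent_cents)
```

The gateway never inspects why the agent wants to spend; it enforces the bound and nothing else.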
But tighter action-space constraints reduce utility. The business case for AI agents is that they do useful work autonomously. Every constraint reduces what they can do. MASO 2.0's long-term challenge is not technical; it is finding the equilibrium between constraint and utility as capabilities scale.
This is not solvable in a document. It is a continuous organizational negotiation. What MASO 2.0 can do is make the tradeoff explicit and measurable:
| Metric | What It Measures |
|---|---|
| Constraint coverage | % of possible agent actions that are infrastructure-constrained (vs. instruction-constrained) |
| Evaluation reliability | Judge agreement rate with human ground truth, tracked over time and by task complexity |
| Oversight capacity | Ratio of review-requiring events to available reviewer capacity |
| Bypass detection latency | Time from bypass occurrence to detection, by bypass category |
| Utility impact | Task completion rate and quality under current constraint profile |
When evaluation reliability drops, tighten constraints. When oversight capacity is exceeded, shift humans to governance. When bypass detection latency grows, invest in adaptive defenses. When utility impact becomes unacceptable, accept more risk or reduce agent autonomy.
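These four rules can be written down as a small decision function. The sketch below assumes the metrics are already computed; the field names and every threshold are hypothetical placeholders that an organization would set for itself. Constraint coverage is omitted because the rules above do not attach a response to it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TradeoffMetrics:
    evaluation_reliability: float  # Judge agreement with human ground truth, 0..1
    oversight_load: float          # review-requiring events / reviewer capacity
    bypass_detection_hours: float  # time from bypass occurrence to detection
    task_completion_rate: float    # utility under the current constraint profile


@dataclass(frozen=True)
class Thresholds:
    # All values are placeholders, not recommendations.
    min_reliability: float = 0.90
    max_oversight_load: float = 1.0
    max_detection_hours: float = 24.0
    min_completion_rate: float = 0.80


def recommend(m: TradeoffMetrics, t: Thresholds = Thresholds()) -> List[str]:
    """Map each breached threshold to the response named in the text."""
    actions = []
    if m.evaluation_reliability < t.min_reliability:
        actions.append("tighten constraints")
    if m.oversight_load > t.max_oversight_load:
        actions.append("shift humans to governance")
    if m.bypass_detection_hours > t.max_detection_hours:
        actions.append("invest in adaptive defenses")
    if m.task_completion_rate < t.min_completion_rate:
        actions.append("accept more risk or reduce agent autonomy")
    return actions


stressed = recommend(TradeoffMetrics(0.85, 1.4, 6.0, 0.92))
healthy = recommend(TradeoffMetrics(0.97, 0.6, 2.0, 0.95))
print(stressed, healthy)
```

The function only surfaces which levers are indicated. When several fire at once, or when tightening constraints would itself push utility below its floor, the choice remains an organizational one.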
The framework's job is not to solve this tradeoff. It is to make organizations see it clearly and manage it deliberately.
MASO 2.0 Priority Roadmap
Phase 1: Extend Current Architecture (0–6 months)
| Item | Addresses | Builds On |
|---|---|---|
| Continuous behavioral streams (replace session-based analysis) | Session dissolution | Behavioral Anomaly Detection, OB-2.2 |
| Fleet-level behavioral baselines | Emergent behaviors | UEBA mapping, peer group comparison |
| Ensemble Judge with disagreement detection | Judge ceiling | Judge Assurance, model independence |
| Automated escalation triage | Oversight scaling | HITL queue design, risk-based routing |
| Composition constraints for agent interactions | Emergent behaviors | Inter-agent message bus, NHI scoping |
Phase 2: Architectural Additions (6–18 months)
| Item | Addresses | New Capability |
|---|---|---|
| Verifiable action constraints (formal verification of action-space compliance) | Judge ceiling | Shifts evaluation from "is the output safe?" to "was the action within bounds?" |
| Evidence-based reasoning requirements | Judge ceiling | Agents must provide verifiable evidence chains for consequential claims |
| Continuous adversarial simulation | AI-vs-AI | Persistent red-team agent finding bypasses before external adversaries |
| Adaptive guardrails with human-approved auto-deployment | AI-vs-AI | Machine-speed defense updates within human-defined policy bounds |
| Outcome-based oversight model | Oversight scaling | Audit completed workflows statistically instead of reviewing individual actions |
Phase 3: Paradigm Shifts (18–36 months)
| Item | Addresses | Implication |
|---|---|---|
| Human role redefined: governance over review | Oversight scaling | Fundamental change to Layer 3: humans set policy, not approve actions |
| Domain-specific verification oracles | Judge ceiling | Deterministic verification for high-risk domains, replacing probabilistic LLM evaluation |
| Emergent behavior simulation (automated fleet stress testing) | Emergent behaviors | Testing interaction dynamics, not just individual agents |
| Jurisdiction-aware control profiles | Regulatory divergence | Parameterized compliance across multiple regulatory regimes |
| Constraint-utility equilibrium metrics | All vectors | Making the fundamental tradeoff visible and manageable |
What MASO 2.0 Does NOT Attempt
| Out of Scope | Why |
|---|---|
| Solving AI alignment | The framework constrains agent behavior at the infrastructure level. It does not attempt to solve the alignment problem at the model level. |
| Predicting specific capability jumps | A sudden capability discontinuity (like GPT-3 → GPT-4) could render current controls inadequate overnight. The framework prepares for this by making controls adjustable, not by predicting when it will happen. |
| Eliminating all risk | The framework's honest position since v0.1.0: you cannot prevent all bypasses. MASO 2.0 maintains this: the goal is to make failures detectable, containable, and recoverable. |
| Replacing organizational judgment | The constraint-utility tradeoff is an organizational decision, not a technical one. MASO 2.0 provides the data and the levers. The organization decides where to set them. |
Relationship to Current Framework
MASO 2.0 is an evolution, not a replacement. The three-layer model (Guardrails → Judge → Human Oversight) remains the foundation. What changes is how each layer operates as AI capabilities scale:
| Layer | MASO 1.x | MASO 2.0 |
|---|---|---|
| Guardrails | Static rules, periodic updates | Adaptive rules, continuous adversarial testing, auto-update within policy bounds |
| Judge | Single LLM evaluation, async | Ensemble evaluation, sync pre-action for elevated risk, formal verification for action-space compliance |
| Human Oversight | Transaction review, approval queues | Governance and policy, outcome auditing, exception handling, institutional review |
| Behavioral Analysis | Session-level, statistical | Continuous streams, fleet-level, ML-driven, temporal attack detection |
| Infrastructure Constraints | Per-agent scoping | Interaction graph constraints, composition limits, fleet-level circuit breakers |
The "constrain regardless" principle becomes more important, not less. As evaluation becomes harder and oversight becomes scarcer, infrastructure-level action-space constraints are the control layer that scales with capability, because they don't need to understand what the agent is doing to prevent what it shouldn't be doing.