Economic Governance

You can't govern what you don't meter. AI costs are non-deterministic: treat them like risk, not like infrastructure.

The Problem

Traditional software costs are predictable: compute scales linearly, storage costs are forecastable, and API calls have fixed prices. AI systems break all of these assumptions.

LLM inference costs depend on input length, output length, model selection, and (increasingly) reasoning tokens that are invisible without instrumentation. Agentic systems compound the problem: autonomous agents retry, reformulate, and chain tool calls in patterns that make per-task costs unpredictable. A single agent loop can consume thousands of API calls before anyone notices.

The result is a new class of operational risk: economic risk from uncontrolled AI runtime behavior.

This isn't hypothetical:

  • IDC's 2025 survey found that 92% of decision-makers reported AI agent costs higher than expected, with inference being the most common cause.
  • The Greyhound CIO Pulse 2025 found that 68% of digital leaders experienced major budget overruns during initial agent deployments, with nearly half attributing overruns to runaway tool loops and recursive logic.
  • Mavvrik's 2025 AI Cost Governance Report (372 companies surveyed) found that 84% of companies report AI costs cutting gross margins by more than 6%, and 80% of enterprises miss AI infrastructure forecasts by more than 25%.
  • Gartner predicts that over 40% of agentic AI projects will fail to reach production by 2027, driven by cost and complexity.
  • IDC FutureScape 2026 warns that by 2027, G1000 organisations will face up to a 30% rise in underestimated AI infrastructure costs, not from overspending, but from under-forecasting expenses unique to AI workloads.

The framework's Cost & Latency guide covers how to budget for security controls. This document covers a different problem: how to govern AI economics at runtime by monitoring spend, enforcing budgets, and preventing runaway costs before they become incidents.

Why This Is a Security Problem

Economic governance is not just a finance concern. Uncontrolled AI spending creates security-relevant risks:

Risk | How It Manifests | Security Impact
Resource exhaustion | Agent loops or prompt injection cause excessive API calls | Denial of service through cost, not traffic
Budget-driven shortcuts | Teams disable controls to reduce costs | Security controls bypassed to stay within budget
Adversarial cost inflation | Attacker crafts inputs that maximise token consumption | Financial denial-of-service (FDoS)
Shadow AI spend | Teams use unmonitored AI services to avoid governance | Ungoverned AI systems with no controls
Model downgrade pressure | Cost pressure forces use of cheaper, less capable models | Judge and guardrail effectiveness reduced

Financial denial-of-service is an emerging threat class. An attacker who can trigger expensive model calls (through prompt injection that causes verbose output, or inputs that trigger agent retry loops) can inflict economic harm without exfiltrating data or compromising systems.

Governance Must Move Inside the System

For most of the past decade, AI governance lived comfortably outside the systems it was meant to regulate. As long as AI behaved like a tool (producing predictions or recommendations on demand) that separation mostly worked.

That assumption is breaking down. As AI systems move from assistive components to autonomous actors, governance imposed from the outside no longer scales. O'Reilly Radar's "Control Planes for Autonomous AI" (2025) identifies the critical architectural insight: control planes should function as feedback systems, not gatekeepers. Signals flow continuously from execution into governance: confidence degradation, policy boundary crossings, cost acceleration patterns. Those signals are evaluated in real time, not weeks later during audits. Responses flow back: throttling, intervention, escalation, or constraint adjustment.

The distinction matters: output monitoring tells you what happened. Control plane telemetry tells you why it was allowed to happen.

Economic governance follows the same principle. Budget enforcement that operates outside the AI system (monthly cost reviews, manual spend approvals) cannot contain a cost spike that happens in minutes. The enforcement must be architectural: embedded in the request pipeline, aware of cost in real time, and capable of automated response.

The Economic Governance Model

Economic governance for AI runtime requires four capabilities:

Meter → Attribute → Enforce → Optimise

1. Meter: Know What You're Spending

You cannot govern what you cannot see. Every AI interaction must be instrumented for cost:

Telemetry Point | What to Capture | Why It Matters
Token usage | Input tokens, output tokens, reasoning tokens (where applicable) | Basis for cost calculation
Model selection | Which model served each request | Per-token prices can differ by as much as 50x between models
Tool calls | Number and type of external tool invocations | Agentic systems chain tool calls with compounding cost
Retry count | Number of retries per task | Retries are the primary driver of runaway spend
Latency | Time per request and total task duration | Correlates with cost; long-running tasks signal loops
Cache hits | Requests served from cache vs. fresh inference | Validates cost optimisation effectiveness
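
As a minimal sketch of how these telemetry points turn into a cost figure, the example below computes a per-request cost from a usage record. The model names and prices in `PRICES_PER_MTOK` are hypothetical placeholders, and the record fields and the choice to bill reasoning tokens at the output rate are assumptions; check your provider's pricing before relying on any of them.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; replace with real provider rates.
PRICES_PER_MTOK = {
    "small-model": {"input": 0.25, "output": 1.00},
    "large-model": {"input": 5.00, "output": 20.00},
}


@dataclass
class UsageRecord:
    """One row of economic telemetry for a single model request."""
    model: str
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0  # assumption: billed at the output rate
    tool_calls: int = 0
    retries: int = 0
    cache_hit: bool = False


def request_cost(usage: UsageRecord) -> float:
    """Return the estimated USD cost of one request."""
    if usage.cache_hit:
        return 0.0  # assumption: cached responses incur no inference cost
    price = PRICES_PER_MTOK[usage.model]
    billed_output = usage.output_tokens + usage.reasoning_tokens
    return (
        usage.input_tokens * price["input"] + billed_output * price["output"]
    ) / 1_000_000
```

Recording `reasoning_tokens` separately matters because they can dominate the bill while never appearing in the visible response.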

Integration with existing telemetry. The framework's Runtime Telemetry Reference defines the observability baseline. Economic telemetry should be captured alongside security telemetry: same pipeline, same dashboards, same alerting infrastructure. Cost anomalies and security anomalies often share the same root cause.

2. Attribute: Know Who's Spending It

Visibility without attribution doesn't change behavior. Every AI cost must be attributable to a specific dimension:

Attribution Dimension | Purpose | Example
Application | Which AI system incurred the cost | Customer service bot vs. fraud detection
Team / business unit | Who owns the spend | Engineering, operations, marketing
User / session | Which end-user or session triggered the cost | Per-user cost tracking for abuse detection
Feature | Which capability within an application | Search vs. summarisation vs. decision support
Risk tier | Cost segmented by risk classification | Tier 3 systems should cost more (they have more controls)
Control layer | Cost of security controls vs. primary inference | Generator cost vs. judge cost vs. guardrail cost

Why control-layer attribution matters. Without it, security controls become the first target when cost pressure arrives. If leadership sees "AI costs are over budget" without understanding that 30% is security controls, the response is often to reduce controls rather than optimise inference. Separating generator costs from security costs protects the control framework from budget-driven erosion.
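
Attribution can be sketched as a simple roll-up over tagged cost records. The example assumes each record is a dictionary carrying a `cost_usd` value plus one key per attribution dimension, and that primary inference is tagged `control_layer="generator"`; both conventions are illustrative, not part of the framework.

```python
from collections import defaultdict


def attribute(records, dimension):
    """Sum cost_usd per value of one attribution dimension (e.g. 'team')."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost_usd"]
    return dict(totals)


def security_cost_share(records):
    """Fraction of total spend that went to control layers (judge, guardrail, ...)."""
    by_layer = attribute(records, "control_layer")
    total = sum(by_layer.values())
    if total == 0:
        return 0.0
    # Everything that is not primary generation is treated as security control cost.
    return (total - by_layer.get("generator", 0.0)) / total


records = [
    {"application": "support-bot", "control_layer": "generator", "cost_usd": 0.75},
    {"application": "support-bot", "control_layer": "judge", "cost_usd": 0.125},
    {"application": "support-bot", "control_layer": "guardrail", "cost_usd": 0.125},
]
```

Reporting the control-layer share next to the total gives leadership the context needed to target inference optimisation rather than control reduction.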

3. Enforce: Stop Spending Before It's a Problem

Monitoring alone is insufficient. Economic governance requires enforcement: automated mechanisms that prevent budget overruns before they occur.

Graduated Budget Responses

Do not jump straight to blocking requests when budgets are approached. Use graduated responses:

Budget Threshold | Response | Example Action
50% of period budget | Alert | Notify cost owner via dashboard and email
75% of period budget | Warn | Alert escalation to team lead; increase monitoring frequency
90% of period budget | Throttle | Reduce sampling rates; route to cheaper models for non-critical requests
95% of period budget | Degrade | Disable non-essential features; queue non-urgent requests
100% of period budget | Hard stop | Block new requests; serve cached responses where possible
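
The graduated thresholds above map naturally onto a lookup function. This is a minimal sketch: the thresholds come from the table, while the action names and the `budget_response` function are illustrative assumptions.

```python
# Thresholds from the table above, checked from highest to lowest.
THRESHOLDS = [
    (1.00, "hard_stop"),
    (0.95, "degrade"),
    (0.90, "throttle"),
    (0.75, "warn"),
    (0.50, "alert"),
]


def budget_response(spent_usd: float, period_budget_usd: float) -> str:
    """Return the response level for the current budget utilisation."""
    if period_budget_usd <= 0:
        raise ValueError("period budget must be positive")
    utilisation = spent_usd / period_budget_usd
    for threshold, action in THRESHOLDS:
        if utilisation >= threshold:
            return action
    return "normal"
```

For example, `budget_response(92.0, 100.0)` returns `"throttle"`, which the request pipeline could use to route non-critical requests to a cheaper model.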

Enforcement Mechanisms

Mechanism | What It Does | When to Use
Per-request token caps | Limits max_tokens per individual request | Always; prevents single requests from consuming excessive budget
Per-session budget | Caps total spend within a single user session | Agentic systems where sessions can run for extended periods
Per-user daily/monthly limits | Prevents individual users from consuming disproportionate budget | Multi-tenant systems; abuse prevention
Per-application budget | Hard ceiling on total spend per application per period | Portfolio-level cost governance
Agent loop detection | Detects and terminates repetitive agent behavior | Agentic systems, where loops are the primary runaway cost driver
Circuit breaker | Automatically halts AI processing when cost rate exceeds threshold | Emergency protection against cost spikes
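
A cost-rate circuit breaker can be sketched as a sliding window over recent spend. The class below is an illustrative assumption, not a reference implementation: it trips when spend inside the window exceeds a ceiling and stays open until an operator resets it. Timestamps are passed in explicitly so the behaviour is easy to test.

```python
from collections import deque


class CostCircuitBreaker:
    """Halts processing when spend within a sliding time window exceeds a ceiling."""

    def __init__(self, max_usd_per_window: float, window_seconds: float = 60.0):
        self.max_usd = max_usd_per_window
        self.window = window_seconds
        self.events = deque()  # (timestamp_seconds, cost_usd), oldest first
        self.open = False

    def record(self, cost_usd: float, now: float) -> None:
        """Register a completed request and trip the breaker if the rate is too high."""
        self.events.append((now, cost_usd))
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        if sum(cost for _, cost in self.events) > self.max_usd:
            self.open = True  # stays open until reset() is called

    def allow_request(self) -> bool:
        return not self.open

    def reset(self) -> None:
        """Manual reset, e.g. after an operator has reviewed the spike."""
        self.events.clear()
        self.open = False
```

Requiring a manual reset is a deliberate choice in this sketch: an automatic reset would let a persistent loop or attack resume spending as soon as the window clears.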

The Agent Loop Problem

Agentic AI introduces the most significant economic governance challenge. Unlike single-request systems, agents persist: they retry, reformulate, and chain operations. A single poorly constrained agent can generate hundreds of model calls for one task.

Runtime controls for agent economics:

Control | Implementation
Maximum iterations per task | Hard limit on agent retry/reformulation cycles (e.g., 10 iterations)
Maximum tool calls per session | Cap on external tool invocations per agent session
Token budget per task | Total token budget allocated to complete a single task
Cost-per-step monitoring | Track cost accumulation per agent step; alert on acceleration
Diminishing returns detection | Detect when additional iterations aren't improving outcomes; terminate early
Recursive call depth limits | Prevent agents from spawning unbounded sub-agent chains
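
Three of these controls (iteration cap, token budget per task, and simple loop detection) can be combined in a small guard that the agent loop calls before each step. This is a sketch under assumptions: `AgentTaskGuard`, `BudgetExceeded`, and the idea of an "action signature" (for example, tool name plus normalised arguments) are hypothetical names, and the default limits are placeholders.

```python
class BudgetExceeded(Exception):
    """Raised when an agent task must be terminated for economic reasons."""


class AgentTaskGuard:
    """Per-task limits: iteration cap, token budget, and repeated-action detection."""

    def __init__(self, max_iterations=10, max_tokens=50_000, max_repeats=3):
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.max_repeats = max_repeats
        self.iterations = 0
        self.tokens = 0
        self.seen = {}  # action signature -> number of times attempted

    def check_step(self, action_signature: str, tokens_used: int) -> None:
        """Call once per agent step; raises BudgetExceeded when a limit is hit."""
        self.iterations += 1
        self.tokens += tokens_used
        self.seen[action_signature] = self.seen.get(action_signature, 0) + 1
        if self.iterations > self.max_iterations:
            raise BudgetExceeded("iteration limit reached")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted")
        if self.seen[action_signature] >= self.max_repeats:
            raise BudgetExceeded("loop detected: repeated identical action")
```

Counting identical actions is only the crudest form of loop detection; it will miss loops in which the agent rephrases the same request each time, which is where diminishing returns detection comes in.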

Google's Budget Tracker and BATS framework (arXiv: 2511.17006, 2025) from Google Cloud AI Research, Google DeepMind, UC Santa Barbara, and NYU provides the most rigorous treatment of this problem to date. Budget Tracker is a lightweight plug-in that provides continuous budget awareness, like a fuel gauge: it appends remaining budget information after every tool response. It operates purely at the prompt level with no additional training required. The results are striking: a Gemini-2.5-Pro agent achieved similar accuracy with 10 tool calls versus 100 for standard ReAct, using 40% fewer search calls and cutting total cost by 31%.

BATS (Budget Aware Test-time Scaling) extends this by dynamically adapting planning strategy based on remaining resources, deciding whether to "dig deeper" on a promising lead or "pivot" to new paths. On the BrowseComp benchmark, it achieved 24.6% accuracy at ~$0.23/query versus a parallel scaling baseline requiring >$0.50 for similar accuracy.

The key insight: budget as a first-class input to agent reasoning, not an external constraint applied after the fact.
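
To make the prompt-level idea concrete, the sketch below appends a remaining-budget note to each tool response before it is returned to the model. It illustrates the general technique only: the function name and the wording of the note are assumptions and do not reproduce the format used in the paper.

```python
def annotate_tool_response(tool_output: str, calls_used: int, call_budget: int) -> str:
    """Append a budget status line so the model sees remaining budget at every step."""
    remaining = max(call_budget - calls_used, 0)
    note = (
        f"[budget] tool calls used: {calls_used}/{call_budget}; "
        f"remaining: {remaining}"
    )
    return f"{tool_output}\n{note}"
```

Because the note travels inside the conversation, the agent can factor it into planning, whereas an external hard limit only becomes visible to the agent when a call is refused.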

4. Optimise: Spend Effectively, Not Less

Cost optimisation is not cost reduction. The goal is maximum security and business value per unit of spend.

Optimisation Strategy | How It Works | Typical Savings
Tiered model routing | Route simple requests to cheaper models; reserve expensive models for complex tasks | 40–60%
Prompt optimisation | Reduce input token count through concise prompting | 10–30%
Response caching | Cache identical or semantically similar responses (Tier 1 only; see Cost & Latency) | 5–30%
Adaptive sampling | Adjust judge evaluation frequency based on risk signals (see Cost & Latency) | 20–40%
Batch processing | Aggregate non-urgent requests for batch inference at lower cost | 30–50%
Token-aware rate limiting | Limit by tokens consumed, not requests made; prevents heavy prompts from consuming disproportionate budget | Variable
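
Token-aware rate limiting can be sketched as a token bucket whose unit is LLM tokens rather than requests. The class name and the tokens-per-minute parameter are illustrative assumptions; timestamps are passed in explicitly to keep the example deterministic.

```python
class TokenAwareRateLimiter:
    """Token bucket that meters LLM tokens instead of request count."""

    def __init__(self, tokens_per_minute: int):
        self.capacity = float(tokens_per_minute)
        self.available = float(tokens_per_minute)
        self.refill_per_second = tokens_per_minute / 60.0
        self.last = None  # timestamp of the previous call, in seconds

    def try_consume(self, llm_tokens: int, now: float) -> bool:
        """Return True and deduct the tokens if the request fits the remaining allowance."""
        if self.last is not None:
            elapsed = now - self.last
            self.available = min(
                self.capacity, self.available + elapsed * self.refill_per_second
            )
        self.last = now
        if llm_tokens > self.available:
            return False
        self.available -= llm_tokens
        return True
```

With a request-count limit, one 5,000-token prompt and one 50-token prompt count the same; here the heavy prompt consumes a proportionate share of the allowance.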

Critical principle: never optimise security controls to meet budget. If the budget doesn't support the required control intensity for the risk tier, the correct response is to reduce the system's scope or autonomy (lowering the risk tier), not to weaken controls. This is a governance decision, not an engineering one.

Budget Governance by Risk Tier

Economic governance intensity should match risk tier, just like security controls:

Dimension | Tier 1 (Low) | Tier 2 (Medium) | Tier 3 (High) | Tier 4 (Critical)
Budget monitoring | Monthly review | Weekly review | Daily review | Real-time
Cost attribution | Per-application | Per-application, per-team | Per-application, per-team, per-feature | Per-request
Budget enforcement | Soft alerts only | Graduated responses | Hard limits with circuit breakers | Hard limits, circuit breakers, automatic degradation
Anomaly detection | Manual review | Threshold-based alerts | Statistical anomaly detection | ML-based anomaly detection with automated response
Reporting | Monthly cost report | Weekly cost report with variance analysis | Daily cost report with forecasting | Real-time dashboard with predictive alerts
Governance approval | Annual budget | Quarterly review | Monthly review with variance justification | Weekly review; real-time escalation for overruns

FinOps for AI: What's Different

The FinOps Foundation's 2026 State of FinOps report found that 98% of respondents now manage AI spend (up from 63% in 2025 and 31% in 2024). AI cost governance has moved from emerging concern to mainstream operational requirement in two years.

However, AI FinOps is not cloud FinOps. Traditional cloud cost management assumptions break down:

Cloud FinOps Assumption | Why AI Breaks It
Costs scale linearly with usage | Agent retry loops and reasoning tokens create non-linear cost curves
Resource usage is predictable | Same prompt can cost 10x more depending on model reasoning path
Idle resources are the main waste | Active AI systems waste through inefficient prompting, unnecessary retries, and over-provisioned model selection
Cost allocation maps to infrastructure | AI costs map to business outcomes: cost per resolved ticket, not cost per GPU hour
Monthly billing cycles are sufficient | AI cost spikes happen in minutes, not months

Cost-Per-Outcome Thinking

Mature AI economic governance tracks cost per business outcome, not cost per API call:

Metric | What It Measures | Why It Matters
Cost per resolved interaction | Total AI cost to resolve one customer query | Enables comparison with human-only cost
Cost per decision | AI cost to produce one actionable recommendation | Determines whether AI adds value vs. alternatives
Security cost ratio | Control cost as percentage of total AI cost | Tracks whether security overhead is proportionate
Cost per risk tier | Average interaction cost segmented by tier | Validates that higher-risk systems cost more (they should)
Cost variance coefficient | Standard deviation of per-interaction cost divided by its mean (coefficient of variation, CV) | Measures predictability; high variance signals governance gaps
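
Two of these metrics are simple enough to sketch directly. The example interprets the cost variance coefficient as the coefficient of variation (standard deviation divided by mean), which is an assumption chosen to match the CV thresholds listed under Key Metrics; the function names are illustrative.

```python
from statistics import mean, pstdev


def cost_per_resolved_interaction(total_cost_usd: float, resolved_count: int) -> float:
    """Total AI cost divided by the number of successfully resolved interactions."""
    if resolved_count <= 0:
        raise ValueError("resolved_count must be positive")
    return total_cost_usd / resolved_count


def coefficient_of_variation(per_interaction_costs) -> float:
    """Population standard deviation divided by mean; a unitless predictability measure."""
    average = mean(per_interaction_costs)
    if average == 0:
        return 0.0
    return pstdev(per_interaction_costs) / average
```

Dividing by resolved interactions rather than all interactions means failed or abandoned attempts still count towards the cost of each outcome, which is the point of outcome-based tracking.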

Provider-Native Controls: Know the Gaps

AI model providers offer varying levels of native cost controls. Organisations should understand what providers offer, and where they fall short:

Provider | Native Budget Control | Limitation
Anthropic | Hard monthly spend caps tied to usage tiers; token bucket rate limiting (RPM, TPM, TPD) | Organisation-level only; no per-application or per-user granularity
OpenAI | Budget limit settings; configurable alerts | Reported to function as alerts rather than hard stops in some configurations; verify enforcement behavior
Azure OpenAI | TPM/RPM quotas per deployment | No built-in hard spending cap. Quotas control request rate but do not correlate to total monthly spending; exceeding limits triggers throttling, not blocking
AWS Bedrock | Per-model throughput provisioning; CloudWatch billing alarms | Billing alarms are reactive, not preventive; no native per-request cost enforcement
Google Vertex AI | Budget alerts via Cloud Billing; quota limits | Alerts are notifications, not enforcement; the budget can be exceeded before the alert arrives

The gap is consistent: provider-native controls operate at the infrastructure level (rate limits, billing alerts) but not at the application level (per-user budgets, per-feature limits, graduated responses). This is why a centralised AI gateway is essential.

The AI Gateway Pattern

A centralised AI gateway or proxy, sitting between your applications and model providers, is the primary enforcement point for economic governance:

Gateway | Type | Key Strength
LiteLLM | Open source | Unified interface for 100+ providers; per-key dollar budgets with automatic hard-stop enforcement; 8 ms P95 latency at 1K RPS
Portkey | Commercial | Hierarchical budgets (org → workspace → team → key); soft and hard limits; policy-as-code enforcement
Bifrost | Open source | High performance (11 μs overhead at 5K RPS); real-time cost visibility as first-class capability
Langfuse | Open source | Detailed cost attribution at span level within multi-step agent workflows; strong observability

The gateway pattern provides a single enforcement point regardless of which models or providers are used downstream. It also enables model routing, automatically directing requests to cost-appropriate models based on task complexity.
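
The enforcement role of a gateway can be illustrated with a toy per-key budget check. This sketch is not the API of any of the products listed above; `BudgetGateway`, its methods, and the key names are hypothetical, and a production gateway would also need persistence, concurrency control, and period resets.

```python
class BudgetGateway:
    """Toy gateway that authorises requests against a per-key dollar budget."""

    def __init__(self, key_budgets_usd: dict):
        self.budgets = dict(key_budgets_usd)
        self.spent = {key: 0.0 for key in key_budgets_usd}

    def authorize(self, api_key: str, estimated_cost_usd: float) -> bool:
        """Check before forwarding: would this request push the key over budget?"""
        if api_key not in self.budgets:
            return False  # unknown keys are rejected outright
        return self.spent[api_key] + estimated_cost_usd <= self.budgets[api_key]

    def record(self, api_key: str, actual_cost_usd: float) -> None:
        """Record the actual cost once the provider response has been metered."""
        self.spent[api_key] += actual_cost_usd
```

Splitting the check into `authorize` (before the call, on an estimate) and `record` (after the call, on metered usage) reflects the fact that output length, and therefore true cost, is only known once the response arrives.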

Runtime Enforcement Taxonomy

Runtime budget enforcement operates across six layers. Mature organisations implement controls at multiple layers:

Layer | What It Controls | Examples
1. Provider-level | Rate limits and spend caps from the model provider | Anthropic hard caps, OpenAI budget limits, Azure TPM quotas
2. Gateway-level | Centralised request-level enforcement | LiteLLM virtual key budgets, Portkey hierarchical limits, policy-as-code
3. Agent-level | Budget awareness within agent reasoning | Google BATS budget tracker, iteration caps, loop detection
4. Infrastructure-level | Compute resource constraints | Kubernetes resource quotas, GPU fractional sharing, wall-clock timeboxing
5. Organisational | Human governance processes | Cross-functional FinOps teams, approval workflows, progressive trust models
6. Observability | Detection and feedback | Real-time telemetry, cost anomaly detection, tiered alerting dashboards

Layer 2 (gateway) is the minimum viable enforcement point. Layers 1 and 4 provide defence in depth. Layer 3 is emerging but essential for agentic systems. Layers 5 and 6 provide the governance context that makes technical controls meaningful.

The Financial Denial-of-Service Threat

Financial denial-of-service (FDoS) deserves specific attention as an emerging threat class:

Attack Vectors

Vector | Mechanism | Impact
Verbose prompt injection | Crafted input that causes model to generate maximum-length output | Maximises output token cost per request
Agent loop triggering | Input designed to cause repeated agent retries without resolution | Multiplies cost through iteration
Complex reasoning provocation | Prompts that trigger extended chain-of-thought in reasoning models | Reasoning tokens consumed without producing useful output
Distributed low-rate attacks | Many users each triggering slightly-above-normal costs | Bypasses per-user limits while exceeding aggregate budget
Tool call amplification | Input that causes agent to invoke expensive external tools repeatedly | Compounds cost across multiple services

Defences

Defence | Mechanism
Per-request cost ceiling | Hard max_tokens limit on every request
Per-user cost rate limiting | Token-aware rate limits (not just request count)
Agent iteration caps | Maximum retry/reformulation cycles per task
Cost anomaly detection | Statistical detection of cost patterns deviating from baseline
Circuit breaker | Automatic halt when cost rate exceeds threshold
Input complexity analysis | Pre-screening inputs for characteristics associated with high-cost responses
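
Statistical cost anomaly detection can start as simply as a z-score against a recent baseline. The sketch below assumes a list of per-interval costs for one application or user; the function name and the default threshold of three standard deviations are illustrative choices, not recommendations from the framework.

```python
from statistics import mean, stdev


def is_cost_anomaly(baseline_costs, observed_cost, z_threshold=3.0) -> bool:
    """Flag an observation that sits far above the baseline mean.

    Only upward deviations are flagged, since the concern is runaway spend.
    """
    if len(baseline_costs) < 2:
        return False  # not enough history to estimate spread
    mu = mean(baseline_costs)
    sigma = stdev(baseline_costs)
    if sigma == 0:
        return observed_cost > mu
    return (observed_cost - mu) / sigma > z_threshold
```

Running the same check on aggregate spend as well as per-user spend helps with distributed low-rate attacks, where no single user looks abnormal but the total does.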

Integration with Existing Controls

Economic governance is not a standalone function. It integrates with existing framework controls:

Framework Component | Economic Governance Integration
Guardrails | Input guardrails can reject inputs likely to trigger high-cost responses (complexity screening)
Judge | Judge evaluation costs must be tracked separately; judge sampling rates are an economic governance lever
Circuit Breaker | Extend circuit breaker criteria to include cost thresholds, not just safety thresholds
Observability | Economic telemetry uses the same pipeline as security observability
Risk Tiers | Economic governance intensity scales with risk tier
PACE Resilience | Budget exhaustion is a failure mode requiring PACE-style resilience planning
Incident Response | Cost overruns may require incident response procedures, especially if caused by adversarial action

Governance Operating Model

Economic governance requires clear roles and responsibilities:

Role | Responsibility
AI Product Owner | Sets budget for their AI system; accountable for cost-per-outcome
AI Engineering | Implements cost metering, attribution, and enforcement mechanisms
FinOps / Finance | Provides budget allocation, forecasting, and variance analysis
Security | Monitors for FDoS, ensures cost pressure doesn't degrade security controls
AI Governance Committee | Approves budget exceptions; reviews cost-vs-risk tradeoffs

Decision Rights

Decision | Who Decides
Total AI budget allocation | Finance + Leadership
Budget per application | AI Product Owner + Finance
Model selection (cost/capability tradeoff) | AI Engineering + Product Owner
Security control budget (non-negotiable by tier) | AI Governance Committee
Budget exception requests | AI Governance Committee
Emergency cost circuit breaker threshold | Security + AI Engineering

Implementation Checklist

Phase 1: Visibility (Weeks 1–4)

  • Instrument all AI API calls for token usage and cost capture
  • Implement cost attribution by application and team
  • Create a cost dashboard with daily aggregation
  • Establish baseline cost patterns for each AI system
  • Separate security control costs from primary inference costs

Phase 2: Governance (Weeks 5–8)

  • Define budget allocations per application and risk tier
  • Implement graduated budget alerts (50%, 75%, 90% thresholds)
  • Establish cost reporting cadence (aligned to risk tier)
  • Define cost escalation procedures
  • Document cost-per-outcome metrics for each AI system

Phase 3: Enforcement (Weeks 9–12)

  • Implement per-request token caps
  • Deploy agent loop detection and iteration limits
  • Configure circuit breakers for cost rate spikes
  • Implement per-user and per-session budget limits
  • Test graduated response mechanisms (alert → throttle → degrade → stop)

Phase 4: Optimisation (Ongoing)

  • Implement tiered model routing based on request complexity
  • Deploy response caching where risk-appropriate
  • Optimise prompts for token efficiency
  • Review cost-per-outcome trends monthly
  • Conduct quarterly cost-vs-risk tier alignment reviews

Key Metrics

Metric | Target | Alert Threshold
Cost variance vs. budget | ±10% | >25%
Cost per interaction (by tier) | Within budget | >120% of budget
Security cost ratio | 15–40% (Tier 2), 40–100% (Tier 3) | Dropping below expected range (may indicate control erosion)
Agent cost predictability | CV < 0.3 | CV > 0.5
Budget utilisation | 70–90% | <50% (underutilised) or >95% (at risk)
FDoS incidents | 0 | Any
Cost-driven control exceptions | 0 | Any

What This Doesn't Cover

  • Procurement and licensing costs. This document covers runtime economics, not vendor selection or contract negotiation.
  • Infrastructure capacity planning. See platform-specific guidance in Platform Patterns.
  • Build-phase costs. Development, training, and fine-tuning costs are project costs, not runtime governance. The framework's Business Alignment covers build-vs-run cost analysis.
  • Cost modelling for the security control layers. See Cost & Latency for detailed control cost analysis.

References

Industry Research

  • FinOps Foundation, "FinOps for AI Overview" (2025): finops.org/wg/finops-for-ai-overview
  • FinOps Foundation, "State of FinOps 2026": data.finops.org
  • Mavvrik, "2025 State of AI Cost Governance Report": mavvrik.ai
  • CloudZero, "The State of AI Costs in 2025": cloudzero.com
  • IDC FutureScape 2026, AI infrastructure cost underestimation projections
  • Gartner, AI agent production failure predictions; AI strategy ROI findings
  • Galileo AI, "The Hidden Costs of Agentic AI": galileo.ai

Academic Papers

  • Google Cloud AI Research, Google DeepMind et al., "Budget Aware Test-time Scaling" (BATS), arXiv: 2511.17006 (2025): arxiv.org
  • O'Reilly Radar, "Control Planes for Autonomous AI" (2025): oreilly.com
  • GovAI, "Computing Power and the Governance of AI": governance.ai
  • "AI Governance through Markets", arXiv: 2501.17755 (2025): arxiv.org
  • Springer, "AI Governance: A Systematic Literature Review" (AI and Ethics, 2025): springer.com

Tools