Economic Governance
You can't govern what you don't meter. AI costs are non-deterministic: treat them like risk, not like infrastructure.
The Problem
Traditional software costs are predictable: compute scales linearly, storage costs are forecastable, and API calls have fixed prices. AI systems break all of these assumptions.
LLM inference costs depend on input length, output length, model selection, and (increasingly) reasoning tokens that are invisible without instrumentation. Agentic systems compound the problem: autonomous agents retry, reformulate, and chain tool calls in patterns that make per-task costs unpredictable. A single agent loop can consume thousands of API calls before anyone notices.
The result is a new class of operational risk: economic risk from uncontrolled AI runtime behavior.
This isn't hypothetical:
- IDC's 2025 survey found that 92% of decision-makers reported AI agent costs higher than expected, with inference being the most common cause.
- The Greyhound CIO Pulse 2025 found that 68% of digital leaders experienced major budget overruns during initial agent deployments, with nearly half attributing overruns to runaway tool loops and recursive logic.
- Mavvrik's 2025 AI Cost Governance Report (372 companies surveyed) found that 84% of companies report AI costs cutting gross margins by more than 6%, and 80% of enterprises miss AI infrastructure forecasts by more than 25%.
- Gartner predicts that over 40% of agentic AI projects will fail to reach production by 2027, driven by cost and complexity.
- IDC FutureScape 2026 warns that by 2027, G1000 organisations will face up to a 30% rise in underestimated AI infrastructure costs, not from overspending, but from under-forecasting expenses unique to AI workloads.
The framework's Cost & Latency guide covers how to budget for security controls. This document covers a different problem: how to govern AI economics at runtime by monitoring spend, enforcing budgets, and preventing runaway costs before they become incidents.
Why This Is a Security Problem
Economic governance is not just a finance concern. Uncontrolled AI spending creates security-relevant risks:
| Risk | How It Manifests | Security Impact |
|---|---|---|
| Resource exhaustion | Agent loops or prompt injection cause excessive API calls | Denial of service through cost, not traffic |
| Budget-driven shortcuts | Teams disable controls to reduce costs | Security controls bypassed to stay within budget |
| Adversarial cost inflation | Attacker crafts inputs that maximise token consumption | Financial denial-of-service (FDoS) |
| Shadow AI spend | Teams use unmonitored AI services to avoid governance | Ungoverned AI systems with no controls |
| Model downgrade pressure | Cost pressure forces use of cheaper, less capable models | Judge and guardrail effectiveness reduced |
Financial denial-of-service is an emerging threat class. An attacker who can trigger expensive model calls (through prompt injection that causes verbose output, or inputs that trigger agent retry loops) can inflict economic harm without exfiltrating data or compromising systems.
Governance Must Move Inside the System
For most of the past decade, AI governance lived comfortably outside the systems it was meant to regulate. As long as AI behaved like a tool (producing predictions or recommendations on demand), that separation mostly worked.
That assumption is breaking down. As AI systems move from assistive components to autonomous actors, governance imposed from the outside no longer scales. O'Reilly Radar's "Control Planes for Autonomous AI" (2025) identifies the critical architectural insight: control planes should function as feedback systems, not gatekeepers. Signals flow continuously from execution into governance: confidence degradation, policy boundary crossings, cost acceleration patterns. Those signals are evaluated in real time, not weeks later during audits. Responses flow back: throttling, intervention, escalation, or constraint adjustment.
The distinction matters: output monitoring tells you what happened. Control plane telemetry tells you why it was allowed to happen.
Economic governance follows the same principle. Budget enforcement that operates outside the AI system (monthly cost reviews, manual spend approvals) cannot contain a cost spike that happens in minutes. The enforcement must be architectural: embedded in the request pipeline, aware of cost in real time, and capable of automated response.
The Economic Governance Model
Economic governance for AI runtime requires four capabilities:
Meter → Attribute → Enforce → Optimise
1. Meter: Know What You're Spending
You cannot govern what you cannot see. Every AI interaction must be instrumented for cost:
| Telemetry Point | What to Capture | Why It Matters |
|---|---|---|
| Token usage | Input tokens, output tokens, reasoning tokens (where applicable) | Basis for cost calculation |
| Model selection | Which model served each request | Per-token prices can differ by 50x or more between models |
| Tool calls | Number and type of external tool invocations | Agentic systems chain tool calls with compounding cost |
| Retry count | Number of retries per task | Retries are the primary driver of runaway spend |
| Latency | Time per request and total task duration | Correlates with cost; long-running tasks signal loops |
| Cache hits | Requests served from cache vs. fresh inference | Validates cost optimisation effectiveness |
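As a minimal sketch of per-call metering, the record below captures the telemetry points from the table and derives an estimated cost from token counts. The model names and per-million-token prices are hypothetical placeholders, and billing reasoning tokens at the output rate and treating cache hits as free are simplifying assumptions; check your provider's pricing before relying on either.

```python
from dataclasses import dataclass

# Hypothetical USD prices per million tokens; real prices differ by provider and change often.
PRICES_PER_MTOK = {
    "small-model": {"input": 0.25, "output": 1.25},
    "large-model": {"input": 15.00, "output": 75.00},
}


@dataclass
class UsageRecord:
    """One metered AI call, mirroring the telemetry points in the table above."""

    model: str
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0  # assumption: billed at the output-token rate
    tool_calls: int = 0
    retries: int = 0
    latency_ms: float = 0.0
    cache_hit: bool = False  # assumption: cached responses incur no inference cost

    def cost_usd(self) -> float:
        """Estimate the cost of this call from its token counts."""
        if self.cache_hit:
            return 0.0
        price = PRICES_PER_MTOK[self.model]
        billed_output = self.output_tokens + self.reasoning_tokens
        return (self.input_tokens * price["input"] + billed_output * price["output"]) / 1_000_000


# Usage: a reasoning-heavy call on the larger model.
record = UsageRecord("large-model", input_tokens=1_000, output_tokens=500, reasoning_tokens=500)
print(f"${record.cost_usd():.4f}")  # $0.0900
```

Note how the 500 reasoning tokens double the billed output even though they never appear in the visible response.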
Integration with existing telemetry. The framework's Runtime Telemetry Reference defines the observability baseline. Economic telemetry should be captured alongside security telemetry: same pipeline, same dashboards, same alerting infrastructure. Cost anomalies and security anomalies often share the same root cause.
2. Attribute: Know Who's Spending It
Visibility without attribution doesn't change behavior. Every AI cost must be attributable to a specific dimension:
| Attribution Dimension | Purpose | Example |
|---|---|---|
| Application | Which AI system incurred the cost | Customer service bot vs. fraud detection |
| Team / business unit | Who owns the spend | Engineering, operations, marketing |
| User / session | Which end-user or session triggered the cost | Per-user cost tracking for abuse detection |
| Feature | Which capability within an application | Search vs. summarisation vs. decision support |
| Risk tier | Cost segmented by risk classification | Tier 3 systems should cost more (they have more controls) |
| Control layer | Cost of security controls vs. primary inference | Generator cost vs. judge cost vs. guardrail cost |
Why control-layer attribution matters. Without it, security controls become the first target when cost pressure arrives. If leadership sees "AI costs are over budget" without understanding that 30% is security controls, the response is often to reduce controls rather than optimise inference. Separating generator costs from security costs protects the control framework from budget-driven erosion.
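A sketch of how tagged cost events can be rolled up along any attribution dimension, including the control layer. The `CostEvent` fields mirror the table above; the layer labels (`generator`, `judge`, `guardrail`) and the sample figures are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEvent:
    """A metered cost tagged with the attribution dimensions from the table above."""

    application: str
    team: str
    user: str
    feature: str
    risk_tier: int
    control_layer: str  # illustrative labels: "generator", "judge", "guardrail"
    cost_usd: float


def rollup(events, *dims):
    """Sum cost_usd grouped by one or more dimensions, e.g. rollup(events, "team", "feature")."""
    totals = defaultdict(float)
    for event in events:
        key = tuple(getattr(event, dim) for dim in dims)
        totals[key] += event.cost_usd
    return dict(totals)


def control_share(events):
    """Fraction of total spend that went to security layers rather than the generator."""
    total = sum(e.cost_usd for e in events)
    controls = sum(e.cost_usd for e in events if e.control_layer != "generator")
    return controls / total if total else 0.0


events = [
    CostEvent("support-bot", "ops", "u1", "search", 2, "generator", 0.70),
    CostEvent("support-bot", "ops", "u1", "search", 2, "judge", 0.20),
    CostEvent("support-bot", "ops", "u2", "summarise", 2, "guardrail", 0.10),
]
print(rollup(events, "control_layer"))
print(round(control_share(events), 2))  # 0.3
```

Reporting the control share next to total spend gives leadership exactly the split described above, so "over budget" is not read as "cut the controls".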
3. Enforce: Stop Spending Before It's a Problem
Monitoring alone is insufficient. Economic governance requires enforcement: automated mechanisms that prevent budget overruns before they occur.
Graduated Budget Responses
Do not jump straight to blocking requests when budgets are approached. Use graduated responses:
| Budget Threshold | Response | Example Action |
|---|---|---|
| 50% of period budget | Alert | Notify cost owner via dashboard and email |
| 75% of period budget | Warn | Alert escalation to team lead; increase monitoring frequency |
| 90% of period budget | Throttle | Reduce sampling rates; route to cheaper models for non-critical requests |
| 95% of period budget | Degrade | Disable non-essential features; queue non-urgent requests |
| 100% of period budget | Hard stop | Block new requests; serve cached responses where possible |
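The graduated table translates directly into a threshold lookup. This sketch returns the most severe response whose threshold has been reached; failing closed when no budget is allocated is a design assumption, not something the table prescribes.

```python
from enum import Enum


class BudgetAction(Enum):
    NONE = "none"
    ALERT = "alert"
    WARN = "warn"
    THROTTLE = "throttle"
    DEGRADE = "degrade"
    HARD_STOP = "hard_stop"


# Thresholds from the graduated-response table, checked from most to least severe.
THRESHOLDS = [
    (1.00, BudgetAction.HARD_STOP),
    (0.95, BudgetAction.DEGRADE),
    (0.90, BudgetAction.THROTTLE),
    (0.75, BudgetAction.WARN),
    (0.50, BudgetAction.ALERT),
]


def budget_action(spent_usd: float, period_budget_usd: float) -> BudgetAction:
    """Map current period spend to the most severe response whose threshold is met."""
    if period_budget_usd <= 0:
        return BudgetAction.HARD_STOP  # assumption: no allocated budget means fail closed
    utilisation = spent_usd / period_budget_usd
    for threshold, action in THRESHOLDS:
        if utilisation >= threshold:
            return action
    return BudgetAction.NONE


print(budget_action(920, 1_000))  # BudgetAction.THROTTLE
```

Keeping the thresholds in data rather than code lets each risk tier carry its own table without changing the enforcement logic.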
Enforcement Mechanisms
| Mechanism | What It Does | When to Use |
|---|---|---|
| Per-request token caps | Limits max_tokens per individual request | Always; prevents single requests from consuming excessive budget |
| Per-session budget | Caps total spend within a single user session | Agentic systems where sessions can run for extended periods |
| Per-user daily/monthly limits | Prevents individual users from consuming disproportionate budget | Multi-tenant systems; abuse prevention |
| Per-application budget | Hard ceiling on total spend per application per period | Portfolio-level cost governance |
| Agent loop detection | Detects and terminates repetitive agent behavior | Agentic systems, the primary runaway cost driver |
| Circuit breaker | Automatically halts AI processing when cost rate exceeds threshold | Emergency protection against cost spikes |
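As one way to implement the circuit breaker row, the sketch below tracks spend over a sliding time window and trips once that spend exceeds a ceiling. `max_usd_per_window` and `window_seconds` are hypothetical parameters; a production breaker would also need a half-open recovery policy and state shared across gateway replicas.

```python
import time
from collections import deque


class CostRateCircuitBreaker:
    """Trips when spend inside a sliding time window exceeds a dollar ceiling."""

    def __init__(self, max_usd_per_window: float, window_seconds: float, clock=time.monotonic):
        self.max_usd = max_usd_per_window
        self.window = window_seconds
        self.clock = clock  # injectable so the breaker can be tested with a fake clock
        self.events = deque()  # (timestamp, cost_usd) pairs, oldest first
        self.tripped = False

    def _evict(self, now: float) -> None:
        """Drop cost events that have aged out of the window."""
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def window_spend(self) -> float:
        self._evict(self.clock())
        return sum(cost for _, cost in self.events)

    def record(self, cost_usd: float) -> None:
        """Record a completed request's cost and trip the breaker if the rate is too high."""
        self.events.append((self.clock(), cost_usd))
        if self.window_spend() > self.max_usd:
            self.tripped = True  # stays open until an operator calls reset()

    def allow_request(self) -> bool:
        return not self.tripped

    def reset(self) -> None:
        self.events.clear()
        self.tripped = False


now = [0.0]
breaker = CostRateCircuitBreaker(max_usd_per_window=10.0, window_seconds=60.0, clock=lambda: now[0])
breaker.record(6.0)
now[0] = 30.0
breaker.record(5.0)  # $11 within 60 s: the breaker trips
print(breaker.allow_request())  # False
```

Requiring an explicit `reset()` is a deliberate choice in this sketch: a cost spike is treated as a potential incident that a human should look at before traffic resumes.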
The Agent Loop Problem
Agentic AI introduces the most significant economic governance challenge. Unlike single-request systems, agents persist: they retry, reformulate, and chain operations. A single poorly constrained agent can generate hundreds of model calls for one task.
Runtime controls for agent economics:
| Control | Implementation |
|---|---|
| Maximum iterations per task | Hard limit on agent retry/reformulation cycles (e.g., 10 iterations) |
| Maximum tool calls per session | Cap on external tool invocations per agent session |
| Token budget per task | Total token budget allocated to complete a single task |
| Cost-per-step monitoring | Track cost accumulation per agent step; alert on acceleration |
| Diminishing returns detection | Detect when additional iterations aren't improving outcomes; terminate early |
| Recursive call depth limits | Prevent agents from spawning unbounded sub-agent chains |
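A minimal per-task guard covering the iteration, tool-call, token, loop, and depth rows might look like the following. The default limits are illustrative, the loop check is a deliberately naive heuristic (the same action repeated consecutively), and diminishing-returns detection is left out because it depends on a task-specific quality signal.

```python
from dataclasses import dataclass


class AgentBudgetExceeded(Exception):
    """Raised when an agent task crosses one of its runtime limits."""


@dataclass
class AgentBudgetGuard:
    """Per-task economic limits for one agent run; default values are illustrative."""

    max_iterations: int = 10
    max_tool_calls: int = 20
    max_tokens: int = 200_000
    max_repeats: int = 3  # naive loop heuristic: same action this many times in a row
    max_depth: int = 2  # maximum sub-agent nesting depth
    iterations: int = 0
    tool_calls: int = 0
    tokens: int = 0
    last_action: str = ""
    repeat_count: int = 0

    def check_step(self, action: str, tokens_used: int, is_tool_call: bool, depth: int = 0) -> None:
        """Account for one agent step, then raise if any limit has been crossed."""
        self.iterations += 1
        self.tokens += tokens_used
        self.tool_calls += int(is_tool_call)
        if action == self.last_action:
            self.repeat_count += 1
        else:
            self.last_action, self.repeat_count = action, 1

        if self.iterations > self.max_iterations:
            raise AgentBudgetExceeded("iteration cap reached")
        if self.tool_calls > self.max_tool_calls:
            raise AgentBudgetExceeded("tool-call cap reached")
        if self.tokens > self.max_tokens:
            raise AgentBudgetExceeded("token budget exhausted")
        if self.repeat_count >= self.max_repeats:
            raise AgentBudgetExceeded(f"loop detected: {action!r} repeated {self.repeat_count} times")
        if depth > self.max_depth:
            raise AgentBudgetExceeded("sub-agent depth limit exceeded")


# Usage: an agent stuck issuing the same search is stopped on the third repeat.
guard = AgentBudgetGuard()
try:
    for _ in range(20):
        guard.check_step(action="search:same query", tokens_used=1_500, is_tool_call=True)
except AgentBudgetExceeded as exc:
    print(f"Terminated after {guard.iterations} steps: {exc}")
```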
Google's Budget Tracker and BATS framework (arXiv: 2511.17006, 2025) from Google Cloud AI Research, Google DeepMind, UC Santa Barbara, and NYU provides the most rigorous treatment of this problem to date. Budget Tracker is a lightweight plug-in that provides continuous budget awareness (like a fuel gauge): it appends remaining budget information after every tool response. It operates purely at the prompt level with no additional training required. The results are striking: a Gemini-2.5-Pro agent achieved similar accuracy with 10 tool calls versus 100 for standard ReAct, using 40% fewer search calls and cutting total cost by 31%.
BATS (Budget Aware Test-time Scaling) extends this by dynamically adapting planning strategy based on remaining resources, deciding whether to "dig deeper" on a promising lead or "pivot" to new paths. On the BrowseComp benchmark, it achieved 24.6% accuracy at ~$0.23/query versus a parallel scaling baseline requiring >$0.50 for similar accuracy.
The key insight: budget as a first-class input to agent reasoning, not an external constraint applied after the fact.
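The idea of budget as an input can be sketched at the prompt level, in the spirit of Budget Tracker: after each tool call, a remaining-budget line is appended to the tool output the agent sees. The note's wording and the `ToolBudget` structure are hypothetical; this is not the prompt format used in the paper.

```python
from dataclasses import dataclass


@dataclass
class ToolBudget:
    """Remaining tool-call budget for one agent task (illustrative structure)."""

    total_calls: int
    used_calls: int = 0

    @property
    def remaining(self) -> int:
        return max(self.total_calls - self.used_calls, 0)


def with_budget_note(tool_output: str, budget: ToolBudget) -> str:
    """Count one tool call and append a remaining-budget line for the model to read.

    The note's wording is a hypothetical format, not the one used in the paper.
    """
    budget.used_calls += 1
    note = f"[Budget] Tool calls used: {budget.used_calls}/{budget.total_calls}; remaining: {budget.remaining}."
    if budget.remaining == 0:
        note += " Budget exhausted: give your best final answer now."
    return f"{tool_output}\n\n{note}"


budget = ToolBudget(total_calls=2)
print(with_budget_note("Search results: ...", budget))
```

Because the budget travels inside the context, the model can weigh "dig deeper" against "pivot" itself, while the hard caps from the previous table remain the backstop.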
4. Optimise: Spend Effectively, Not Less
Cost optimisation is not cost reduction. The goal is maximum security and business value per unit of spend.
| Optimisation Strategy | How It Works | Typical Savings |
|---|---|---|
| Tiered model routing | Route simple requests to cheaper models; reserve expensive models for complex tasks | 40–60% |
| Prompt optimisation | Reduce input token count through concise prompting | 10–30% |
| Response caching | Serve cached responses for identical or semantically similar requests (Tier 1 only; see Cost & Latency) | 5–30% |
| Adaptive sampling | Adjust judge evaluation frequency based on risk signals (see Cost & Latency) | 20–40% |
| Batch processing | Aggregate non-urgent requests for batch inference at lower cost | 30–50% |
| Token-aware rate limiting | Limit by tokens consumed, not requests made; prevents heavy prompts from consuming disproportionate budget | Variable |
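Token-aware rate limiting can be sketched as a token bucket denominated in LLM tokens instead of requests. The capacity and refill rate are illustrative, and the estimate passed in (for example, input tokens plus the request's `max_tokens` ceiling) is an assumption about how a caller would bound a request's cost before sending it. One limiter per user or key gives per-user limits.

```python
import time


class TokenAwareRateLimiter:
    """Token bucket measured in LLM tokens rather than request count."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity  # burst allowance, in LLM tokens
        self.refill_per_second = refill_per_second  # sustained rate, in LLM tokens per second
        self.clock = clock
        self.available = float(capacity)
        self.last = clock()

    def try_consume(self, estimated_tokens: int) -> bool:
        """Admit the request only if its estimated token cost fits in the bucket."""
        now = self.clock()
        elapsed = now - self.last
        self.available = min(self.capacity, self.available + elapsed * self.refill_per_second)
        self.last = now
        if estimated_tokens <= self.available:
            self.available -= estimated_tokens
            return True
        return False


now = [0.0]
limiter = TokenAwareRateLimiter(capacity=10_000, refill_per_second=100.0, clock=lambda: now[0])
print(limiter.try_consume(8_000))  # True: fits in the burst allowance
print(limiter.try_consume(5_000))  # False: only 2,000 tokens left
```

A request-count limiter would have admitted both calls; counting tokens is what stops one heavy prompt from consuming the same share as dozens of light ones.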
Critical principle: never optimise security controls to meet budget. If the budget doesn't support the required control intensity for the risk tier, the correct response is to reduce the system's scope or autonomy (lowering the risk tier), not to weaken controls. This is a governance decision, not an engineering one.
Budget Governance by Risk Tier
Economic governance intensity should match risk tier, just like security controls:
| Dimension | Tier 1 (Low) | Tier 2 (Medium) | Tier 3 (High) | Tier 4 (Critical) |
|---|---|---|---|---|
| Budget monitoring | Monthly review | Weekly review | Daily review | Real-time |
| Cost attribution | Per-application | Per-application, per-team | Per-application, per-team, per-feature | Per-request |
| Budget enforcement | Soft alerts only | Graduated responses | Hard limits with circuit breakers | Hard limits, circuit breakers, automatic degradation |
| Anomaly detection | Manual review | Threshold-based alerts | Statistical anomaly detection | ML-based anomaly detection with automated response |
| Reporting | Monthly cost report | Weekly cost report with variance analysis | Daily cost report with forecasting | Real-time dashboard with predictive alerts |
| Governance approval | Annual budget | Quarterly review | Monthly review with variance justification | Weekly review; real-time escalation for overruns |
FinOps for AI: What's Different
The FinOps Foundation's 2026 State of FinOps report found that 98% of respondents now manage AI spend (up from 63% in 2025 and 31% in 2024). AI cost governance has moved from emerging concern to mainstream operational requirement in two years.
However, AI FinOps is not cloud FinOps. Traditional cloud cost management assumptions break down:
| Cloud FinOps Assumption | Why AI Breaks It |
|---|---|
| Costs scale linearly with usage | Agent retry loops and reasoning tokens create non-linear cost curves |
| Resource usage is predictable | The same prompt can cost 10x more depending on the model's reasoning path |
| Idle resources are the main waste | Active AI systems waste through inefficient prompting, unnecessary retries, and over-provisioned model selection |
| Cost allocation maps to infrastructure | AI costs map to business outcomes: cost per resolved ticket, not cost per GPU hour |
| Monthly billing cycles are sufficient | AI cost spikes happen in minutes, not months |
Cost-Per-Outcome Thinking
Mature AI economic governance tracks cost per business outcome, not cost per API call:
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Cost per resolved interaction | Total AI cost to resolve one customer query | Enables comparison with human-only cost |
| Cost per decision | AI cost to produce one actionable recommendation | Determines whether AI adds value vs. alternatives |
| Security cost ratio | Control cost as percentage of total AI cost | Tracks whether security overhead is proportionate |
| Cost per risk tier | Average interaction cost segmented by tier | Validates that higher-risk systems cost more (they should) |
| Cost coefficient of variation (CV) | Standard deviation of per-interaction cost divided by its mean | Measures predictability; high variance signals governance gaps |
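Once per-interaction costs are recorded, these metrics are simple ratios. This sketch uses the population standard deviation for the coefficient of variation, which is a choice rather than a requirement; the function name and its inputs are illustrative.

```python
import statistics


def outcome_metrics(interaction_costs, resolved_count, control_cost, total_cost):
    """Compute cost-per-outcome metrics from the table above.

    interaction_costs: total AI cost of each interaction (list of floats)
    resolved_count: number of interactions that resolved the user's query
    control_cost / total_cost: spend on security controls vs. all AI spend
    """
    mean_cost = statistics.fmean(interaction_costs)
    return {
        "cost_per_resolved_interaction": (
            sum(interaction_costs) / resolved_count if resolved_count else float("inf")
        ),
        "security_cost_ratio": control_cost / total_cost if total_cost else 0.0,
        # Coefficient of variation: population standard deviation divided by the mean.
        "cost_cv": statistics.pstdev(interaction_costs) / mean_cost if mean_cost else 0.0,
    }


metrics = outcome_metrics([0.05, 0.15], resolved_count=1, control_cost=30.0, total_cost=100.0)
print(metrics)
```

Because CV is dimensionless, the same alert threshold can be applied to a cheap chatbot and an expensive agent alike.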
Provider-Native Controls: Know the Gaps
AI model providers offer varying levels of native cost controls. Organisations should understand what providers offer, and where they fall short:
| Provider | Native Budget Control | Limitation |
|---|---|---|
| Anthropic | Hard monthly spend caps tied to usage tiers; token bucket rate limiting (RPM, TPM, TPD) | Organisation-level only; no per-application or per-user granularity |
| OpenAI | Budget limit settings; configurable alerts | Reported to function as alerts rather than hard stops in some configurations; verify enforcement behavior |
| Azure OpenAI | TPM/RPM quotas per deployment | No built-in hard spending cap. Quotas limit throughput (tokens and requests per minute) but do not cap total monthly spend; exceeding them triggers throttling, not blocking |
| AWS Bedrock | Per-model throughput provisioning; CloudWatch billing alarms | Billing alarms are reactive, not preventive; no native per-request cost enforcement |
| Google Vertex AI | Budget alerts via Cloud Billing; quota limits | Alerts are notifications, not enforcement; spend can exceed the budget before the alert arrives |
The gap is consistent: provider-native controls operate at the infrastructure level (rate limits, billing alerts) but not at the application level (per-user budgets, per-feature limits, graduated responses). This is why a centralised AI gateway is essential.
The AI Gateway Pattern
A centralised AI gateway or proxy, sitting between your applications and model providers, is the primary enforcement point for economic governance:
| Gateway | Type | Key Strength |
|---|---|---|
| LiteLLM | Open source | Unified interface for 100+ providers; per-key dollar budgets with automatic hard-stop enforcement; 8ms P95 latency at 1K RPS |
| Portkey | Commercial | Hierarchical budgets (org → workspace → team → key); soft and hard limits; policy-as-code enforcement |
| Bifrost | Open source | High performance (11μs overhead at 5K RPS); real-time cost visibility as first-class capability |
| Langfuse | Open source (observability platform, not a proxy) | Detailed cost attribution at span level within multi-step agent workflows; strong observability |
The gateway pattern provides a single enforcement point regardless of which models or providers are used downstream. It also enables model routing, automatically directing requests to cost-appropriate models based on task complexity.
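To make the pattern concrete, here is a framework-agnostic sketch of what a gateway does per request: check the caller's remaining budget, route by a complexity signal, forward, then record actual spend. It is not LiteLLM's or Portkey's API; `VirtualKey`, `choose_model`, the length-based routing rule, and the injected `estimate_cost`/`call_model` callables are hypothetical stand-ins for real provider SDK calls.

```python
from dataclasses import dataclass


@dataclass
class VirtualKey:
    """Gateway-issued key with its own dollar budget (hypothetical structure)."""

    key_id: str
    max_budget_usd: float
    spent_usd: float = 0.0


class BudgetExceededError(Exception):
    """Raised when a request would push a key past its budget."""


CHEAP_MODEL, EXPENSIVE_MODEL = "small-model", "large-model"  # hypothetical model tiers


def choose_model(prompt: str, needs_reasoning: bool) -> str:
    """Naive complexity routing: long or reasoning-heavy requests get the larger model."""
    return EXPENSIVE_MODEL if needs_reasoning or len(prompt) > 4_000 else CHEAP_MODEL


def gateway_handle(key, prompt, needs_reasoning, estimate_cost, call_model):
    """Single enforcement point: check budget, route, call, then record actual spend."""
    model = choose_model(prompt, needs_reasoning)
    if key.spent_usd + estimate_cost(model, prompt) > key.max_budget_usd:
        raise BudgetExceededError(f"{key.key_id}: request would exceed ${key.max_budget_usd:.2f} budget")
    response, actual_cost = call_model(model, prompt)
    key.spent_usd += actual_cost
    return model, response


# Stand-ins for a provider SDK, used only to exercise the gateway logic.
def estimate(model, prompt):
    return 0.30 if model == EXPENSIVE_MODEL else 0.01


def fake_call(model, prompt):
    return f"[{model}] ok", estimate(model, prompt)


key = VirtualKey("team-a", max_budget_usd=1.00)
print(gateway_handle(key, "Reset my password", False, estimate, fake_call))
```

Checking the projected cost before the call, rather than the spend after it, is what makes this preventive instead of the reactive alerting described in the provider table.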
Runtime Enforcement Taxonomy
Runtime budget enforcement operates across six layers. Mature organisations implement controls at multiple layers:
| Layer | What It Controls | Examples |
|---|---|---|
| 1. Provider-level | Rate limits and spend caps from the model provider | Anthropic hard caps, OpenAI budget limits, Azure TPM quotas |
| 2. Gateway-level | Centralised request-level enforcement | LiteLLM virtual key budgets, Portkey hierarchical limits, policy-as-code |
| 3. Agent-level | Budget awareness within agent reasoning | Google BATS budget tracker, iteration caps, loop detection |
| 4. Infrastructure-level | Compute resource constraints | Kubernetes resource quotas, GPU fractional sharing, wall-clock timeboxing |
| 5. Organisational | Human governance processes | Cross-functional FinOps teams, approval workflows, progressive trust models |
| 6. Observability | Detection and feedback | Real-time telemetry, cost anomaly detection, tiered alerting dashboards |
Layer 2 (gateway) is the minimum viable enforcement point. Layers 1 and 4 provide defence in depth. Layer 3 is emerging but essential for agentic systems. Layers 5 and 6 provide the governance context that makes technical controls meaningful.
The Financial Denial-of-Service Threat
Financial denial-of-service (FDoS) deserves specific attention as an emerging threat class:
Attack Vectors
| Vector | Mechanism | Impact |
|---|---|---|
| Verbose prompt injection | Crafted input that causes model to generate maximum-length output | Maximises output token cost per request |
| Agent loop triggering | Input designed to cause repeated agent retries without resolution | Multiplies cost through iteration |
| Complex reasoning provocation | Prompts that trigger extended chain-of-thought in reasoning models | Reasoning tokens consumed without producing useful output |
| Distributed low-rate attacks | Many users each triggering slightly-above-normal costs | Bypasses per-user limits while exceeding aggregate budget |
| Tool call amplification | Input that causes agent to invoke expensive external tools repeatedly | Compounds cost across multiple services |
Defences
| Defence | Mechanism |
|---|---|
| Per-request cost ceiling | Hard max_tokens limit on every request |
| Per-user cost rate limiting | Token-aware rate limits (not just request count) |
| Agent iteration caps | Maximum retry/reformulation cycles per task |
| Cost anomaly detection | Statistical detection of cost patterns deviating from baseline |
| Circuit breaker | Automatic halt when cost rate exceeds threshold |
| Input complexity analysis | Pre-screening inputs for characteristics associated with high-cost responses |
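Cost anomaly detection at its simplest is a z-score test against a baseline. The sketch below also shows why it matters for distributed low-rate attacks: every user can stay within a per-user limit while aggregate spend is still flagged. Thresholds and sample figures are illustrative; real baselines usually need to account for time-of-day and weekly seasonality.

```python
import statistics


def is_cost_anomaly(baseline, current, z_threshold=3.0):
    """Return True if `current` spend is more than z_threshold standard deviations above the baseline mean."""
    if len(baseline) < 2:
        return False  # too little history to estimate spread
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return current > mean
    return (current - mean) / stdev > z_threshold


def distributed_attack_suspected(baseline_aggregate, per_user_costs, per_user_limit, z_threshold=3.0):
    """Aggregate spend is anomalous even though no single user breached their limit."""
    all_within_limits = all(cost <= per_user_limit for cost in per_user_costs.values())
    aggregate = sum(per_user_costs.values())
    return all_within_limits and is_cost_anomaly(baseline_aggregate, aggregate, z_threshold)


# Hourly spend baseline (USD), then an hour where 100 users each spend slightly more than usual.
hourly_baseline = [100.0, 104.0, 98.0, 101.0, 97.0]
current_hour = {f"user-{i}": 1.40 for i in range(100)}
print(distributed_attack_suspected(hourly_baseline, current_hour, per_user_limit=2.0))  # True
```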
Integration with Existing Controls
Economic governance is not a standalone function. It integrates with existing framework controls:
| Framework Component | Economic Governance Integration |
|---|---|
| Guardrails | Input guardrails can reject inputs likely to trigger high-cost responses (complexity screening) |
| Judge | Judge evaluation costs must be tracked separately; judge sampling rates are an economic governance lever |
| Circuit Breaker | Extend circuit breaker criteria to include cost thresholds, not just safety thresholds |
| Observability | Economic telemetry uses the same pipeline as security observability |
| Risk Tiers | Economic governance intensity scales with risk tier |
| PACE Resilience | Budget exhaustion is a failure mode requiring PACE-style resilience planning |
| Incident Response | Cost overruns may require incident response procedures, especially if caused by adversarial action |
Governance Operating Model
Economic governance requires clear roles and responsibilities:
| Role | Responsibility |
|---|---|
| AI Product Owner | Sets budget for their AI system; accountable for cost-per-outcome |
| AI Engineering | Implements cost metering, attribution, and enforcement mechanisms |
| FinOps / Finance | Provides budget allocation, forecasting, and variance analysis |
| Security | Monitors for FDoS, ensures cost pressure doesn't degrade security controls |
| AI Governance Committee | Approves budget exceptions; reviews cost-vs-risk tradeoffs |
Decision Rights
| Decision | Who Decides |
|---|---|
| Total AI budget allocation | Finance + Leadership |
| Budget per application | AI Product Owner + Finance |
| Model selection (cost/capability tradeoff) | AI Engineering + Product Owner |
| Security control budget (non-negotiable by tier) | AI Governance Committee |
| Budget exception requests | AI Governance Committee |
| Emergency cost circuit breaker threshold | Security + AI Engineering |
Implementation Checklist
Phase 1: Visibility (Weeks 1–4)
- Instrument all AI API calls for token usage and cost capture
- Implement cost attribution by application and team
- Create a cost dashboard with daily aggregation
- Establish baseline cost patterns for each AI system
- Separate security control costs from primary inference costs
Phase 2: Governance (Weeks 5–8)
- Define budget allocations per application and risk tier
- Implement graduated budget alerts (50%, 75%, 90% thresholds)
- Establish cost reporting cadence (aligned to risk tier)
- Define cost escalation procedures
- Document cost-per-outcome metrics for each AI system
Phase 3: Enforcement (Weeks 9–12)
- Implement per-request token caps
- Deploy agent loop detection and iteration limits
- Configure circuit breakers for cost rate spikes
- Implement per-user and per-session budget limits
- Test graduated response mechanisms (alert → warn → throttle → degrade → hard stop)
Phase 4: Optimisation (Ongoing)
- Implement tiered model routing based on request complexity
- Deploy response caching where risk-appropriate
- Optimise prompts for token efficiency
- Review cost-per-outcome trends monthly
- Conduct quarterly cost-vs-risk tier alignment reviews
Key Metrics
| Metric | Target | Alert Threshold |
|---|---|---|
| Cost variance vs. budget | ±10% | >25% |
| Cost per interaction (by tier) | Within budget | >120% of budget |
| Security cost ratio | 15–40% (Tier 2), 40–100% (Tier 3) | Dropping below expected range (may indicate control erosion) |
| Agent cost predictability | CV < 0.3 | CV > 0.5 |
| Budget utilisation | 70–90% | <50% (underutilised) or >95% (at risk) |
| FDoS incidents | 0 | Any |
| Cost-driven control exceptions | 0 | Any |
What This Doesn't Cover
- Procurement and licensing costs. This document covers runtime economics, not vendor selection or contract negotiation.
- Infrastructure capacity planning. See platform-specific guidance in Platform Patterns.
- Build-phase costs. Development, training, and fine-tuning costs are project costs, not runtime governance. The framework's Business Alignment covers build-vs-run cost analysis.
- Cost modelling for the security control layers. See Cost & Latency for detailed control cost analysis.
References
Standards and Frameworks
- NIST AI Risk Management Framework (AI RMF 1.0), economic harm categories and continuous monitoring: nist.gov/itl/ai-risk-management-framework
- ISO/IEC 42001:2023, AI management system standard: iso.org/standard/42001
- OECD, "Governing with Artificial Intelligence" (2025): oecd.org
Industry Research
- FinOps Foundation, "FinOps for AI Overview" (2025): finops.org/wg/finops-for-ai-overview
- FinOps Foundation, "State of FinOps 2026": data.finops.org
- Mavvrik, "2025 State of AI Cost Governance Report": mavvrik.ai
- CloudZero, "The State of AI Costs in 2025": cloudzero.com
- IDC FutureScape 2026, AI infrastructure cost underestimation projections
- Gartner, AI agent production failure predictions; AI strategy ROI findings
- Galileo AI, "The Hidden Costs of Agentic AI": galileo.ai
Academic Papers
- Google Cloud AI Research, Google DeepMind et al., "Budget Aware Test-time Scaling" (BATS), arXiv: 2511.17006 (2025): arxiv.org
- O'Reilly Radar, "Control Planes for Autonomous AI" (2025): oreilly.com
- GovAI, "Computing Power and the Governance of AI": governance.ai
- "AI Governance through Markets", arXiv: 2501.17755 (2025): arxiv.org
- Springer, "AI Governance: A Systematic Literature Review" (AI and Ethics, 2025): springer.com
Tools
- LiteLLM, open-source LLM proxy with budget enforcement: github.com/BerriAI/litellm
- Portkey, AI gateway with hierarchical budget controls: portkey.ai
- Langfuse, open-source LLM observability with cost tracking: langfuse.com
- OpenCost, CNCF Kubernetes cost monitoring with AI plugin: opencost.io