A Day in the Life: The Framework in an Enterprise¶
What a single day looks like when an enterprise with multiple departments, data classifications, and risk tiers operates agentic AI under the framework.
The Enterprise¶
A mid-to-large enterprise. Multiple departments run or depend on agentic AI:
| Department | AI System | Tier | Agent Architecture |
|---|---|---|---|
| HR | Candidate screening assistant | HIGH | Single agent, RAG over CVs and job specs, recommends shortlists |
| Finance | Invoice processing agent | CRITICAL | Multi-agent: OCR agent → validation agent → approval agent |
| Product (Line A) | Customer service chatbot | HIGH | Single agent with tool access (order lookup, refund initiation) |
| Product (Line B) | Internal knowledge assistant | LOW | Single agent, RAG over internal wiki, read-only |
| Product (Line C) | Claims processing orchestrator | CRITICAL | Multi-agent: intake agent → assessment agent → decision agent → payout agent |
| Compliance | Regulatory change monitor | MEDIUM | Single agent scanning regulatory feeds, summarising changes |
| Risk & Security | Threat intelligence assistant | MEDIUM | Single agent summarising threat feeds, no write access |
| Privacy | DSAR response assistant | HIGH | Single agent with read access to customer data stores |
Eight AI systems. Four risk tiers. Three multi-agent orchestrations. Five departments as consumers, three as governance functions. This is realistic - and this is what the framework is designed to govern.
06:00 - Overnight Batch: Judge Evaluation Catches Drift¶
The shared Judge infrastructure runs overnight batch evaluations on the previous day's interactions for MEDIUM-tier systems. This morning's results flag an issue.
What happened: The Compliance regulatory change monitor (MEDIUM tier) has started producing summaries that omit qualifying language from source regulations. Yesterday's outputs included "organisations must implement X" when the actual regulation said "organisations should consider implementing X where proportionate." The Judge's grounding check scored 23 outputs at below-threshold confidence.
Framework response:
| Layer | Action | Reference |
|---|---|---|
| Judge (Layer 2) | Batch evaluation flags 23 outputs with grounding score below 0.7 | Judge Evaluation - MEDIUM tier: batch daily, basic quality |
| Alerting | Alert sent to Compliance AI operator and the shared AI operations team | Observability |
| PACE | System remains in Primary state - the Judge is detecting, not blocking | PACE Resilience - Tier 1 systems fail-open |
Who acts: The Compliance team's designated AI operator reviews the flagged outputs. They weren't sent externally - this is an internal summarisation tool - so the blast radius is limited. The operator adds a correction note to affected summaries and escalates to the AI engineering team to investigate prompt drift.
Time spent: 30 minutes. No escalation to management.
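The batch flagging step above can be sketched as a simple threshold filter. This is a hypothetical illustration, not the framework's implementation: the record fields, the 0.7 threshold, and the `reg-*` identifiers are assumptions made for the example.

```python
# Hypothetical sketch of the overnight Judge batch pass for a MEDIUM-tier
# system: collect each output's grounding score and flag those below threshold.

GROUNDING_THRESHOLD = 0.7  # illustrative; real thresholds would be configured per tier


def flag_low_grounding(evaluations, threshold=GROUNDING_THRESHOLD):
    """Return the outputs whose Judge grounding score falls below threshold."""
    return [e for e in evaluations if e["grounding_score"] < threshold]


# Illustrative batch results for the regulatory change monitor
batch = [
    {"output_id": "reg-101", "grounding_score": 0.92},
    {"output_id": "reg-102", "grounding_score": 0.55},  # dropped "should consider"
    {"output_id": "reg-103", "grounding_score": 0.68},  # softened a qualifier
]

flagged = flag_low_grounding(batch)
print([f["output_id"] for f in flagged])  # ['reg-102', 'reg-103']
```

In the narrative above, 23 such records crossed the threshold, which is what triggered the alert to the Compliance AI operator.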
07:30 - Finance Multi-Agent System: Morning Health Check¶
The Finance invoice processing system (CRITICAL tier) runs a mandatory pre-market health check. This is a multi-agent orchestration: an OCR agent extracts invoice data, a validation agent cross-references against purchase orders and contracts, and an approval agent authorises payments up to delegated limits.
Framework response:
| Control | Check | Result |
|---|---|---|
| Agent identity | All three agent NHIs authenticated, credentials rotated overnight | Pass |
| Inter-agent message bus | Signed message verification between all agent pairs | Pass |
| Judge (Layer 2) | Real-time synchronous Judge active, evaluating 100% of approval decisions | Pass |
| PACE state | All three agents in Primary state | Pass |
| Delegation chain | Approval agent's financial authority confirmed within delegated limits ($50K per invoice, $500K daily aggregate) | Pass |
| Circuit breaker thresholds | Error rate 0.02% (threshold: 1%), latency P99 2.1s (threshold: 5s) | Pass |
Who acts: The Finance AI operations lead reviews the health dashboard. Green across the board. The system processes invoices for the day.
Time spent: 5 minutes. Automated checks, human review of dashboard.
MASO controls active: Identity & Access (per-agent NHI), Execution Control (delegation limits, circuit breakers), Observability (decision chain logging), Prompt & Goal Integrity (immutable task specifications per agent).
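The pre-market checks could be automated along these lines. This is a minimal sketch under assumed metric names (`nhi_authenticated`, `error_rate`, `latency_p99_s`); the thresholds mirror the table above but the data shape is invented for illustration.

```python
# Hypothetical morning health check across the three invoice-processing agents.
# Thresholds mirror the circuit breaker settings described above.

THRESHOLDS = {"error_rate": 0.01, "latency_p99_s": 5.0}


def health_check(agent_metrics):
    """Return (ok, failures) given per-agent metrics; empty failures means green."""
    failures = []
    for agent, m in agent_metrics.items():
        if not m["nhi_authenticated"]:
            failures.append(f"{agent}: NHI authentication failed")
        if m["error_rate"] >= THRESHOLDS["error_rate"]:
            failures.append(f"{agent}: error rate {m['error_rate']:.2%} over threshold")
        if m["latency_p99_s"] >= THRESHOLDS["latency_p99_s"]:
            failures.append(f"{agent}: P99 {m['latency_p99_s']}s over threshold")
    return (not failures, failures)


metrics = {
    "ocr":        {"nhi_authenticated": True, "error_rate": 0.0002, "latency_p99_s": 2.1},
    "validation": {"nhi_authenticated": True, "error_rate": 0.0002, "latency_p99_s": 2.1},
    "approval":   {"nhi_authenticated": True, "error_rate": 0.0002, "latency_p99_s": 2.1},
}

ok, failures = health_check(metrics)
print(ok)  # True -> green across the board
```

The point of the pattern is that the human's five minutes are spent reviewing a dashboard, not running the checks.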
08:15 - Product Line A: Customer Service Escalation¶
The customer service chatbot (Product Line A, HIGH tier) handles its first escalation of the day. A customer asks about a refund for a product that was recalled. The chatbot has tool access to initiate refunds up to $200.
What happened: The chatbot correctly identifies the recalled product and initiates a $150 refund. But the customer then asks: "Can you also compensate me for the inconvenience? I had to take a day off work." The chatbot's guardrails don't block this - it's a legitimate request type. The chatbot drafts a response offering a $500 goodwill credit.
Framework response:
| Layer | Action | Result |
|---|---|---|
| Guardrails (Layer 1) | Content policy check: response doesn't violate content rules | Passed (not a content violation) |
| Guardrails (Layer 1) | Financial authority check: $500 exceeds the chatbot's $200 tool limit | Blocked |
| System response | Chatbot cannot execute the $500 credit; responds with "I'd like to help with that - let me connect you with a team member who can review compensation requests" | Graceful degradation |
| Judge (Layer 2) | Async evaluation flags the interaction: chatbot intended to authorise $500, which indicates a policy alignment issue even though guardrails caught it | Flagged for review |
| Human review | Product team's HITL queue receives the flagged interaction for same-day review | Queued |
Why this matters: The guardrail caught the action. The Judge caught the intent. If the guardrail had been configured at $500 instead of $200, the chatbot would have committed the company to a $500 goodwill payment without authorisation. The Judge flag drives a review of whether the chatbot's training or system prompt needs adjustment to align with the compensation policy.
Who acts: The Product Line A AI operator reviews the Judge flag. They open a ticket for the AI engineering team to adjust the system prompt to clarify compensation authority limits. The customer service manager is notified that a customer received a handoff rather than an automated resolution.
Time spent: 15 minutes for the operator. Engineering ticket created for system prompt update.
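The action-versus-intent distinction can be made concrete with a sketch. This is illustrative only: the function names and the flag payload are assumptions, and the $200 limit comes from the scenario above.

```python
# Hypothetical sketch of the two-layer pattern: the guardrail blocks any tool
# call over the chatbot's financial limit (the action), while the Judge flags
# that the model *attempted* to exceed it (the intent).

TOOL_LIMIT_USD = 200


def guardrail_check(amount_usd):
    """Layer 1: hard block on financial authority."""
    return amount_usd <= TOOL_LIMIT_USD


def judge_flag(intended_amount_usd):
    """Layer 2: flag over-limit intent for review even though it was blocked."""
    if intended_amount_usd > TOOL_LIMIT_USD:
        return {"flag": "policy_alignment", "intended": intended_amount_usd}
    return None


intended = 500
allowed = guardrail_check(intended)  # False -> blocked, graceful handoff to a human
flag = judge_flag(intended)          # flagged for the HITL queue
print(allowed, flag)
```

Without the second layer, a blocked action leaves no signal that the system's policy alignment needs fixing.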
09:00 - CIO's Weekly AI Portfolio Review¶
The CIO holds a 30-minute weekly review of the AI portfolio. The shared AI operations team presents the portfolio dashboard.
Dashboard content (sourced from framework components):
| System | Tier | PACE State | Judge Flag Rate (7d) | Incidents (7d) | Control Coverage |
|---|---|---|---|---|---|
| HR Candidate Screening | HIGH | Primary | 0.3% | 0 | Full |
| Finance Invoice Processing | CRITICAL | Primary | 0.1% | 0 | Full |
| Product A Customer Service | HIGH | Primary | 1.2% ↑ | 0 | Full |
| Product B Knowledge Assistant | LOW | Primary | N/A (5% sample) | 0 | Basic |
| Product C Claims Processing | CRITICAL | Primary | 0.05% | 0 | Full |
| Compliance Reg Monitor | MEDIUM | Primary | 4.8% ↑ | 0 (drift detected) | Full |
| Risk Threat Intel | MEDIUM | Primary | 0.2% | 0 | Full |
| Privacy DSAR Assistant | HIGH | Primary | 0.4% | 0 | Full |
Discussion points:

- Product A flag rate trending up (1.2%): The compensation authority issue from 08:15 is part of a pattern. Three similar flags this week. The CIO asks whether the system prompt needs a broader policy review, not just a patch. Action: Product Line A owner to review AI compensation policy alignment by end of week.
- Compliance monitor drift (4.8% flag rate): Investigated this morning. Root cause appears to be a model provider update that subtly changed summarisation behaviour. The CIO asks: "Is this a supply chain issue? Did we get notified of the model update?" Answer: No notification received. Action: AI engineering team to implement model version pinning for all MEDIUM+ systems and add model change detection to the Supply Chain monitoring.
- Cost review: The shared Judge infrastructure processed 2.1M evaluations last week across all systems. Cost: $18,400. Broken down:

| System | Judge Volume | Cost | % of Total |
|---|---|---|---|
| Finance (CRITICAL, 100%) | 340K | $5,100 | 28% |
| Claims (CRITICAL, 100%) | 280K | $4,200 | 23% |
| Customer Service (HIGH, 40%) | 520K | $3,900 | 21% |
| HR Screening (HIGH, 30%) | 180K | $2,700 | 15% |
| DSAR (HIGH, 25%) | 95K | $1,425 | 8% |
| Others (MEDIUM, sampled) | 685K | $1,075 | 5% |

The CIO notes that CRITICAL-tier systems (Finance + Claims) consume 51% of Judge cost despite processing only 30% of volume. This is expected and proportionate - 100% coverage at CRITICAL tier is a framework requirement, not a tuning decision.

- Skills capacity: The human review queue across all HIGH/CRITICAL systems processed 4,200 reviews last week. The current team of 8 FTE is at 78% utilisation. The CIO flags that Product Line C (Claims) is planning to increase daily volume by 40% next quarter. Action: HR to begin recruiting 3 additional AI review specialists. Training timeline: 2-3 months using the Human Factors competency framework.
Time spent: 30 minutes. Four actions assigned with owners and deadlines.
10:30 - Privacy DSAR Assistant: Data Classification Boundary¶
The Privacy team's DSAR (Data Subject Access Request) response assistant (HIGH tier) is processing a request. The agent has read access to customer data stores to locate and compile data for subject access requests.
What happened: The DSAR assistant locates customer records across three systems. While compiling the response, it retrieves a record from a legacy system that contains health-related data (occupational health notes from an employee's HR file - the subject is both a customer and an employee).
Framework response:
| Layer | Action | Result |
|---|---|---|
| Guardrails (Layer 1) | Data classification filter detects health data category (special category under GDPR/POPIA) | Alert triggered |
| Guardrails (Layer 1) | Output DLP prevents health data from being included in the DSAR compilation without explicit authorisation | Blocked from output |
| Judge (Layer 2) | Evaluates the interaction: correctly identifies cross-domain data access (customer data store returned employee health data) | Flagged as data boundary violation |
| PACE | System remains in Primary - guardrails caught it, but the flag requires human decision | No state change |
Who acts: The Privacy team's AI operator reviews the flag. This is a legitimate DSAR - the subject is entitled to their health data - but the AI system isn't authorised to handle special category data without human review. The operator:
- Confirms the DSAR includes the subject as both customer and employee
- Manually reviews the health data for completeness and accuracy
- Adds it to the DSAR response with the required special category handling
- Logs the incident as a data boundary event (not a breach - the controls worked)
Follow-up: The Privacy officer raises this pattern at the weekly governance meeting. Should the DSAR assistant's data access scope include employee health systems, or should those always route to manual processing? This is a Use Case Definition decision that affects the system's risk classification.
Time spent: 45 minutes. Governance question escalated.
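The output DLP step that blocked the health data can be sketched as a classification-aware filter. The category names and record shape below are assumptions made for illustration, not the framework's actual schema.

```python
# Hypothetical sketch of the output DLP step: records carrying a special
# category tag (e.g. health data under GDPR/POPIA) are withheld from the
# automated DSAR compilation and routed to human review instead.

SPECIAL_CATEGORIES = {"health", "biometric", "religion"}


def compile_dsar(records):
    """Split records into auto-includable output and a human-review queue."""
    auto, review = [], []
    for r in records:
        if SPECIAL_CATEGORIES & set(r.get("categories", [])):
            review.append(r)  # blocked from automated output
        else:
            auto.append(r)
    return auto, review


records = [
    {"id": "crm-1", "categories": ["contact"]},
    {"id": "hr-77", "categories": ["health"]},  # occupational health notes
]
auto, review = compile_dsar(records)
print([r["id"] for r in review])  # ['hr-77']
```

Note that the blocked record is not discarded: the subject is still entitled to it, so it goes to the operator for manual handling, exactly as in the incident above.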
11:00 - Product Line B: Fast Lane Deployment¶
Product Line B wants to deploy an update to their internal knowledge assistant (LOW tier). They've added a new data source - the internal engineering wiki. The system remains read-only, internal-only, no regulated data, with human review of outputs.
Framework response: Fast Lane applies.
| Check | Result |
|---|---|
| Internal users only? | Yes |
| Read-only? | Yes |
| No regulated data? | Yes (engineering wiki contains no PII, financial, or health data) |
| Human reviews output? | Yes (engineers use it as a starting point, not as authoritative) |
Process: The Product B team completes the self-certification checklist, updates the AI system registry, and deploys. Basic guardrails (input injection detection, output content filtering) are inherited from the shared guardrail service. No Judge evaluation required beyond the existing 5% sample.
Who acts: Product B engineering lead. Self-certified. No security review required.
Time spent: 2 hours (integration testing + deployment). Zero security overhead.
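The Fast Lane checklist is simple enough to encode directly. A minimal sketch, assuming the four questions above map to boolean fields in the AI system registry (the field names are invented for this example):

```python
# Hypothetical Fast Lane eligibility check mirroring the four self-certification
# questions: internal-only, read-only, no regulated data, human-reviewed output.


def fast_lane_eligible(system):
    """All four conditions must hold for the self-certification path."""
    return all([
        system["internal_users_only"],
        system["read_only"],
        not system["handles_regulated_data"],
        system["human_reviews_output"],
    ])


knowledge_assistant = {
    "internal_users_only": True,
    "read_only": True,
    "handles_regulated_data": False,
    "human_reviews_output": True,
}
print(fast_lane_eligible(knowledge_assistant))  # True -> self-certify and deploy
```

If any answer flips (say, the new data source turns out to contain PII), the system drops out of the Fast Lane and a security review is required.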
13:00 - Claims Processing: PACE State Change¶
The Claims processing orchestrator (Product Line C, CRITICAL tier) experiences a degradation. The assessment agent's latency spikes from P99 2s to P99 12s. The circuit breaker threshold is 5s.
Framework response:
| Time | Event | PACE State | Action |
|---|---|---|---|
| 13:00 | Assessment agent P99 crosses 5s threshold | Primary → Alternate | System switches to simplified assessment model (fewer validation checks, faster inference) |
| 13:02 | Automated alert to Claims operations team and AI engineering | Alternate | On-call engineer begins investigation |
| 13:05 | Judge evaluation confirms Alternate model producing acceptable quality (accuracy within 2% of Primary) | Alternate | System continues processing claims at reduced thoroughness |
| 13:08 | Root cause identified: upstream data provider (medical records API) is throttling responses | Alternate | Not an AI system issue - external dependency |
| 13:15 | Medical records API recovers | Alternate → Primary | System restores full assessment model |
What didn't happen: The system didn't stop processing claims. It didn't fail-open and process claims without assessment. It degraded to a pre-tested, pre-approved alternate configuration that trades depth for speed while maintaining safety boundaries. This is PACE Resilience working as designed.
MASO controls during degradation: The Execution Control circuit breaker triggered the state change. The inter-agent message bus continued operating normally - the degradation was contained to the assessment agent. The orchestrator agent's delegation to the assessment agent was automatically adjusted to match the Alternate model's reduced scope. The Observability layer logged the complete state transition for post-incident review.
Who acts: On-call AI engineer monitored the transition. Claims operations lead was notified but didn't need to intervene. Post-incident review scheduled for tomorrow.
Time spent: 15 minutes of active monitoring. Zero manual intervention in claim processing.
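The state transitions in the timeline above can be sketched as a small latency circuit breaker. This is a deliberately simplified illustration: a production breaker would add hysteresis, dwell times, and the Contingency/Emergency states; the class and method names are assumptions.

```python
# Hypothetical sketch of the latency circuit breaker that drove the
# Primary -> Alternate -> Primary transition. The 5s threshold mirrors
# the timeline above.

LATENCY_BREAKER_S = 5.0


class PaceController:
    def __init__(self):
        self.state = "Primary"

    def observe(self, p99_latency_s):
        if self.state == "Primary" and p99_latency_s > LATENCY_BREAKER_S:
            self.state = "Alternate"  # switch to the pre-tested simplified model
        elif self.state == "Alternate" and p99_latency_s <= LATENCY_BREAKER_S:
            self.state = "Primary"    # upstream recovered, restore full model
        return self.state


pace = PaceController()
print(pace.observe(2.0))   # Primary
print(pace.observe(12.0))  # Alternate - breaker tripped at 13:00
print(pace.observe(2.1))   # Primary  - recovered at 13:15
```

The key design point is that the Alternate configuration already exists and has already been Judge-validated before the incident, so the transition is a lookup, not an improvisation.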
14:30 - HR Screening: Bias Review Cycle¶
The HR candidate screening assistant (HIGH tier) undergoes its fortnightly bias review. This is a scheduled governance activity, not an incident.
What happens:
| Activity | Owner | Output |
|---|---|---|
| Demographic parity analysis | HR Analytics + AI engineering | Statistical comparison of recommendation rates across protected characteristics for the past 2 weeks |
| Judge evaluation review | AI operations team | Review of all Judge flags in the period - are flags correlated with any candidate characteristics? |
| Adverse impact ratio calculation | HR Legal | Four-fifths rule analysis on AI-recommended vs. AI-not-recommended candidates |
| Sample audit | HR domain expert | Manual review of 50 randomly selected recommendations (25 recommended, 25 not recommended) |
Today's findings: The adverse impact ratio is 0.83 (above the 0.8 four-fifths threshold, but trending down from 0.91 two weeks ago). The HR legal advisor flags this as a watch item. If it drops below 0.8, the system needs intervention.
Framework connection: This is Human Oversight at HIGH tier - structured review with domain expertise. The Risk Assessment methodology requires ongoing measurement, not just initial classification.
Who acts: HR Analytics lead presents findings. HR Legal advisor sets the watch threshold. AI engineering team is on standby for prompt or training data adjustments if needed.
Time spent: 2 hours across four people. Scheduled, not reactive.
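The four-fifths rule calculation behind today's 0.83 watch item is straightforward to show. The candidate counts below are invented to reproduce the figure; the method (ratio of selection rates) is the standard adverse impact test.

```python
# Hypothetical four-fifths rule calculation from the bias review: the adverse
# impact ratio is the selection rate of the group in question divided by the
# selection rate of the most-favoured (reference) group.


def adverse_impact_ratio(selected_a, total_a, selected_b, total_b):
    """Ratio of group A's selection rate to group B's (reference) rate."""
    return (selected_a / total_a) / (selected_b / total_b)


# Illustrative counts producing today's 0.83 figure:
ratio = adverse_impact_ratio(selected_a=83, total_a=200,   # 41.5% recommended
                             selected_b=100, total_b=200)  # 50.0% recommended
print(round(ratio, 2))  # 0.83
print(ratio >= 0.8)     # True - above threshold, but held as a watch item
```

A ratio below 0.8 is the conventional trigger for intervention, which is why the trend (0.91 two weeks ago, 0.83 now) matters as much as the current value.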
15:00 - Risk & Security: Threat Intelligence Update¶
The Risk & Security team's threat intelligence assistant (MEDIUM tier) surfaces a new relevant threat: a published prompt injection technique targeting RAG-based systems that use a specific chunking strategy.
Framework response:
| Action | Owner | Timeline |
|---|---|---|
| Threat assessment | Security team | Immediate: does this technique affect any of our AI systems? |
| Control mapping | AI security architect | Same day: which guardrails would catch this? Which wouldn't? |
| Portfolio impact | Shared AI operations | Same day: which of the 8 systems use the affected RAG pattern? |
| Guardrail update | AI engineering | This week: update injection detection patterns if needed |
| Red team test | Security team | This sprint: attempt the technique against affected systems |
Portfolio impact assessment: Three systems use RAG: HR Screening (HIGH), Product B Knowledge Assistant (LOW), and the Compliance Reg Monitor (MEDIUM). The AI security architect reviews each:
- Product B (LOW): Internal, read-only. Even if exploited, impact is an employee seeing incorrect information. Risk accepted, guardrail update applied as routine maintenance.
- Compliance (MEDIUM): Internal, read-only. Slightly higher impact - incorrect regulatory information could mislead compliance decisions. Guardrail update prioritised. Judge evaluation rate temporarily increased from 10% to 50% until the guardrail update is deployed.
- HR Screening (HIGH): External-facing data (CVs), consequential decisions. Guardrail update is urgent. Red team test scheduled for tomorrow. If the technique bypasses current guardrails, PACE escalation to Alternate (increased Judge coverage to 100%) until the fix is deployed.
Framework reference: This is Adaptive Sampling in action - temporarily increasing Judge evaluation rate in response to elevated threat intelligence.
Who acts: Security team leads assessment. AI security architect maps controls. AI engineering team implements guardrail updates. Each product team is notified of the risk and the remediation timeline.
Time spent: 1 hour for initial assessment. Engineering work scheduled for this week.
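Adaptive sampling amounts to a rate table with a threat-aware override. The rates below are assumptions for illustration (the MEDIUM 10% to 50% bump matches the Compliance monitor above; per-system HIGH baselines actually vary, as the cost table at 09:00 shows).

```python
# Hypothetical adaptive sampling rule: baseline Judge evaluation rates per
# tier, temporarily raised while a relevant threat is open.

BASELINE_RATE = {"LOW": 0.05, "MEDIUM": 0.10, "HIGH": 0.30, "CRITICAL": 1.00}
ELEVATED_RATE = {"LOW": 0.05, "MEDIUM": 0.50, "HIGH": 1.00, "CRITICAL": 1.00}


def judge_sampling_rate(tier, threat_active):
    """Return the fraction of interactions the Judge evaluates."""
    return (ELEVATED_RATE if threat_active else BASELINE_RATE)[tier]


print(judge_sampling_rate("MEDIUM", threat_active=False))  # 0.1
print(judge_sampling_rate("MEDIUM", threat_active=True))   # 0.5
```

CRITICAL stays at 100% in both modes: full coverage there is a tier requirement, not a dial.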
16:30 - Business Owner Review: Product Line C Quarterly Planning¶
The Business Owner for Product Line C (Claims processing, CRITICAL tier) holds a planning session for next quarter. They want to increase daily claim volume by 40% and add a new claim type (motor vehicle).
Framework implications:
| Change | Framework Impact | Action Required |
|---|---|---|
| Volume increase (40%) | Judge cost increases proportionally (CRITICAL = 100% coverage) | Budget: additional ~$1,700/week in Judge costs |
| Volume increase (40%) | Human review queue increases proportionally | Staffing: 2 additional HITL reviewers needed (see 09:00 CIO review) |
| New claim type (motor) | Different domain, different regulations, different fraud patterns | Use Case Definition: re-evaluate - may require separate risk classification |
| New claim type (motor) | Assessment agent needs new training data and evaluation criteria | Judge prompt update + recalibration period |
| New claim type (motor) | PACE fail postures may differ for motor vs. existing claim types | PACE design review for the new claim type |
The Business Owner's decision framework:

- Is the volume increase within the current tier? Yes - scaling within CRITICAL doesn't change the tier, but it changes the cost and staffing requirement. Budget accordingly.
- Does the new claim type change the risk profile? Possibly. Motor vehicle claims involve different regulations, higher average values, and different fraud patterns. The Risk Tiers six-dimension scoring should be re-run for the motor vehicle claim type specifically. It will almost certainly remain CRITICAL, but the control calibration may differ.
- What's the progression path? Start motor vehicle claims at reduced autonomy (human approval on all decisions above $5K) and progress to the current autonomy level after 3 months of operational data confirms control effectiveness. This follows the Progression methodology.
Who acts: Business Owner sets the business requirements. AI engineering team scopes the technical changes. The AI governance board reviews the updated risk classification. Finance approves the incremental budget.
Time spent: 1 hour planning session. Multiple follow-up workstreams created.
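The "~$1,700/week" budget line falls out of the tier rules: CRITICAL means 100% Judge coverage, so Judge cost scales linearly with volume. A back-of-envelope check, using the Claims figure from the CIO review's cost table:

```python
# Hypothetical cost scaling check: at CRITICAL tier the Judge evaluates every
# interaction, so a 40% volume increase means a 40% Judge cost increase.

claims_judge_cost_per_week = 4_200  # current Claims Judge spend (see 09:00 review)
volume_increase_pct = 40            # planned growth next quarter

extra_cost = claims_judge_cost_per_week * volume_increase_pct // 100
print(extra_cost)  # 1680 -> budgeted as ~$1,700/week
```

The same linearity drives the staffing action: human review volume grows with claim volume for the same reason.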
17:00 - End of Day: Governance Roll-up¶
The shared AI operations team compiles the daily governance summary.
Today's metrics across the portfolio:
| Metric | Value | Trend |
|---|---|---|
| Total AI interactions processed | 127,400 | +3% week-on-week |
| Guardrail blocks (Layer 1) | 342 (0.27%) | Stable |
| Judge flags (Layer 2) | 89 (0.07%) | ↑ slightly (Compliance drift) |
| Human reviews completed | 612 | Stable |
| PACE state changes | 1 (Claims: Primary → Alternate → Primary, 15 min) | Within normal range |
| Incidents | 0 | - |
| Data boundary events | 1 (DSAR health data, controls worked as designed) | New pattern flagged |
Open actions from today:
| # | Action | Owner | Due |
|---|---|---|---|
| 1 | Review Product A chatbot compensation policy alignment | Product A owner | Friday |
| 2 | Implement model version pinning for MEDIUM+ systems | AI engineering | Next sprint |
| 3 | Evaluate DSAR assistant scope for employee health data | Privacy officer | Next governance meeting |
| 4 | Update injection detection patterns for new RAG threat | AI engineering | This week |
| 5 | Red team test against HR Screening for new RAG technique | Security team | Tomorrow |
| 6 | Budget approval for Claims volume increase | Finance | Next quarter planning |
| 7 | Recruit 3 additional AI review specialists | HR | Immediate |
What Makes This Work¶
This day illustrates seven framework principles operating simultaneously across a complex enterprise:
1. Proportionate controls eliminate waste¶
Product B's knowledge assistant (LOW tier) deployed an update in 2 hours with zero security overhead. Product C's claims processing (CRITICAL tier) has 100% Judge coverage, dedicated reviewers, and PACE fail postures. Both are governed by the same framework - at different intensities. Without proportionality, either the LOW-tier system is over-controlled (wasting time and money) or the CRITICAL-tier system is under-controlled (accumulating risk).
2. Shared infrastructure reduces portfolio cost¶
One guardrail service, one Judge infrastructure, one logging pipeline, one identity platform serving eight AI systems. The alternative - eight independent implementations - would cost more, fragment expertise, and create inconsistent security posture. The CIO perspective details the platform economics.
3. PACE resilience keeps the business running¶
When the Claims assessment agent degraded at 13:00, the system didn't stop. It transitioned to a pre-tested alternate configuration, maintained safety boundaries, and recovered in 15 minutes. This was designed at deployment time, not improvised during an incident. PACE Resilience is the methodology.
4. The Judge catches what guardrails miss¶
The customer service chatbot's guardrails correctly blocked a $500 payment. But the Judge caught the intent - the chatbot wanted to make the payment, which indicates a policy alignment issue. Without the Judge, the guardrail would have been seen as sufficient. With the Judge, the root cause is addressed. Controls explains the three-layer interaction.
5. Governance is continuous, not periodic¶
The CIO's weekly review, the HR fortnightly bias audit, the daily governance roll-up, the real-time PACE state monitoring - governance happens at multiple cadences simultaneously. The framework doesn't treat governance as a quarterly checkbox. The AI Governance Operating Model provides the structure.
6. Skills are the binding constraint¶
The CIO's review identified that the human review team is at 78% utilisation and a volume increase is coming. Without acting on this, the CRITICAL-tier system would lose its human oversight layer - which means losing a control layer that the risk classification requires. Human Factors provides the workforce planning methodology.
7. Multi-agent systems need multi-agent controls¶
The Finance and Claims systems run multi-agent orchestrations with per-agent identity, inter-agent message signing, delegation chain auditing, and independent failure domains. These controls don't exist in single-agent deployments because the risks don't exist. The MASO Framework provides the seven control domains.
How to Use This Narrative¶
| If You Are... | Focus On... |
|---|---|
| A CIO | 09:00 - Portfolio governance, cost visibility, skills planning |
| A Business Owner | 16:30 - Growth planning within the framework |
| A Security Leader | 15:00 - Threat response across a multi-tier portfolio |
| An AI Engineer | 13:00 - PACE state transitions in multi-agent systems |
| A Privacy Officer | 10:30 - Data boundary controls in practice |
| A Product Owner | 11:00 and 08:15 - Fast Lane vs. HIGH-tier daily operations |
| A Risk Manager | 17:00 - Portfolio-level risk metrics and action tracking |