SOC Content Pack for AI Security¶

Ready-to-deploy detection rules, correlation searches, and dashboard definitions for AI system monitoring.

This content pack extends the SOC Integration architecture guide with concrete, platform-specific detection content. Import what applies to your SIEM. Ignore the rest.

Prerequisites¶

Before deploying these rules, ensure:

AI security events are flowing to your SIEM in the standard log format.
The ai_security index (or equivalent) is created and receiving data.
Correlation IDs are propagated across API gateway → application → LLM provider → Judge (see SOC Integration: Identity Correlation).
Alert routing is configured to the AI Platform Team and SOC queues.

Detection Rules¶

1. Prompt Injection - Repeated Attempts¶

What it detects: A single user making multiple prompt injection attempts in a short window, indicating active adversarial probing.

Splunk SPL:

index=ai_security category="prompt_attack"
| stats count as attempt_count, dc(endpoint) as endpoints_targeted,
        values(endpoint) as target_list by user_id, src_ip
| where attempt_count > 3
| eval severity=case(attempt_count > 10, "critical", attempt_count > 5, "high", 1=1, "medium")

Sentinel KQL:

AISecurity_CL
| where category_s == "prompt_attack"
| summarize attempt_count=count(), endpoints_targeted=dcount(endpoint_s),
            target_list=make_set(endpoint_s) by user_id_s, src_ip_s, bin(TimeGenerated, 15m)
| where attempt_count > 3
| extend severity = case(attempt_count > 10, "critical", attempt_count > 5, "high", "medium")

Datadog Log Query:

service:ai_security @category:prompt_attack | stats count by @user_id,@src_ip | filter count > 3

Field	Value
Severity	Medium–Critical (scales with volume)
MITRE ATLAS	AML.T0051 - LLM Prompt Injection
Framework Control	LOG-06, NET-02
Response	Block user after 10 attempts. Capture full payloads for threat intel.

2. Judge Flag Clustering - High-Severity Burst¶

What it detects: Multiple high-confidence Judge flags for the same user or endpoint in a short period, indicating sustained policy violation or coordinated attack.

Splunk SPL:

index=ai_security category="judge_flag" judge_score>0.8
| stats count as flag_count, avg(judge_score) as avg_score,
        values(policy_violated) as policies by user_id, endpoint
| where flag_count > 5
| sort -flag_count

Sentinel KQL:

AISecurity_CL
| where category_s == "judge_flag" and judge_score_d > 0.8
| summarize flag_count=count(), avg_score=avg(judge_score_d),
            policies=make_set(policy_violated_s) by user_id_s, endpoint_s, bin(TimeGenerated, 1h)
| where flag_count > 5
| order by flag_count desc

Field	Value
Severity	High
Framework Control	Judge Assurance, LOG-01
Response	Triage per Judge Flag procedure. Escalate if multiple policies violated.

3. Data Exfiltration - Anomalous Output Volume¶

What it detects: Model responses significantly larger than baseline, suggesting bulk data extraction or structured data exfiltration.

Splunk SPL:

index=ai_security
| stats avg(tokens_out) as baseline_tokens by endpoint
| join endpoint [
    search index=ai_security
    | where tokens_out > 0
    | eval ratio=tokens_out/tokens_in
]
| where tokens_out > (baseline_tokens * 5) OR ratio > 20
| table _time, user_id, endpoint, tokens_in, tokens_out, ratio, baseline_tokens

Sentinel KQL:

let baseline = AISecurity_CL
| summarize avg_tokens=avg(tokens_out_d) by endpoint_s;
AISecurity_CL
| join kind=inner baseline on endpoint_s
| extend ratio = tokens_out_d / max_of(tokens_in_d, 1)
| where tokens_out_d > (avg_tokens * 5) or ratio > 20
| project TimeGenerated, user_id_s, endpoint_s, tokens_in_d, tokens_out_d, ratio

Field	Value
Severity	High
MITRE ATLAS	AML.T0024 - Exfiltration via ML Inference API
Framework Control	DAT-06, NET-04
Response	Investigate user activity. Check if output contains structured data, PII, or credential patterns.

4. Agent Boundary Violation - Unauthorised Tool Use¶

What it detects: An agent attempting to invoke tools outside its declared permission set, indicating prompt injection, goal hijacking, or misconfiguration.

Splunk SPL:

index=ai_security category="agent_boundary_violation"
| stats count as violation_count, values(tool_attempted) as tools_attempted,
        values(agent_id) as agents by user_id, endpoint
| where violation_count >= 1

Sentinel KQL:

AISecurity_CL
| where category_s == "agent_boundary_violation"
| summarize violation_count=count(), tools_attempted=make_set(tool_attempted_s),
            agents=make_set(agent_id_s) by user_id_s, endpoint_s, bin(TimeGenerated, 1h)
| where violation_count >= 1

Field	Value
Severity	High (single event is significant)
OWASP Agentic	ASI02 (Tool Misuse and Exploitation), ASI03 (Identity and Privilege Abuse)
Framework Control	IAM-04, TOOL-01, TOOL-02
Response	Halt agent immediately. Determine if prompt injection triggered the tool call. Review full conversation context.

5. Guardrail Bypass - Successful Attack¶

What it detects: A guardrail passed a request but the Judge subsequently flagged the same transaction as a policy violation - indicating the guardrail was bypassed.

Splunk SPL:

index=ai_security guardrail_result="pass" category="judge_flag" judge_score>0.9
| stats count as bypass_count by user_id, endpoint, policy_violated
| where bypass_count >= 1
| eval severity="critical"

Sentinel KQL:

AISecurity_CL
| where guardrail_result_s == "pass" and category_s == "judge_flag" and judge_score_d > 0.9
| summarize bypass_count=count() by user_id_s, endpoint_s, policy_violated_s, bin(TimeGenerated, 1h)
| where bypass_count >= 1

Field	Value
Severity	Critical
Framework Control	Guardrails + Judge Assurance, bypass-prevention
Response	This is a confirmed guardrail gap. Escalate to AI Security team. Update guardrail rules. Review all transactions from this user in the window.

6. Model Drift - Judge Accuracy Degradation¶

What it detects: Judge evaluation metrics shifting over time - increasing false positive/negative rates or declining agreement with human reviewers.

Splunk SPL:

index=ai_security category="judge_flag"
| bin _time span=1d
| stats count as total_flags, avg(judge_score) as avg_score by _time, judge_model
| streamstats window=7 avg(avg_score) as rolling_avg_score
| where abs(avg_score - rolling_avg_score) > 0.15

Sentinel KQL:

AISecurity_CL
| where category_s == "judge_flag"
| summarize total_flags=count(), avg_score=avg(judge_score_d) by bin(TimeGenerated, 1d), judge_model_s
| order by TimeGenerated asc
| serialize
| extend rolling_avg = avg_of(prev(avg_score, 1), prev(avg_score, 2), prev(avg_score, 3),
                               prev(avg_score, 4), prev(avg_score, 5), prev(avg_score, 6), avg_score)
| where abs(avg_score - rolling_avg) > 0.15

Field	Value
Severity	Medium
Framework Control	Judge Assurance, operational-metrics
Response	Investigate whether the model provider updated the Judge model. Trigger human calibration review per Judge Assurance.

7. Credential Exposure - Secrets in Model I/O¶

What it detects: Credential patterns (API keys, tokens, connection strings) appearing in model inputs or outputs despite guardrail scanning.

Splunk SPL:

index=ai_security (category="credential_detected" OR category="guardrail_block" policy_violated="credential_exposure")
| stats count as detection_count, values(credential_type) as cred_types by user_id, endpoint, direction
| where detection_count >= 1

Sentinel KQL:

AISecurity_CL
| where category_s == "credential_detected" or
        (category_s == "guardrail_block" and policy_violated_s == "credential_exposure")
| summarize detection_count=count(), cred_types=make_set(credential_type_s) by user_id_s, endpoint_s, direction_s
| where detection_count >= 1

Field	Value
Severity	High
Framework Control	SEC-04, SEC-05, IAM-07
Response	Treat exposed credential as compromised. Trigger immediate rotation (SEC-05). Investigate whether credential was exfiltrated.

8. Anomalous Usage Pattern - Off-Hours / Geo Shift¶

What it detects: AI system usage from a user during unusual hours or from an unexpected geographic location, correlated with AI-specific indicators.

Splunk SPL:

index=ai_security
| iplocation src_ip
| stats count as request_count, values(Country) as countries,
        earliest(_time) as first_seen, latest(_time) as last_seen by user_id
| where (request_count > 50 AND (date_hour < 6 OR date_hour > 22))
        OR mvcount(countries) > 1

Sentinel KQL:

AISecurity_CL
| extend geo = geo_info_from_ip_address(src_ip_s)
| summarize request_count=count(), countries=make_set(tostring(geo.country)),
            first_seen=min(TimeGenerated), last_seen=max(TimeGenerated) by user_id_s
| where request_count > 50 or array_length(countries) > 1

Field	Value
Severity	Medium
Framework Control	LOG-01, IAM-01
Response	Cross-reference with HR/IdP for travel. If not explained, treat as potential account compromise.

Correlation Searches¶

These rules combine multiple AI security signals to surface compound threats.

Compound: Injection Followed by Exfiltration¶

A prompt injection attempt from a user, followed by a high-volume output within the same session, suggests a successful attack leading to data extraction.

Splunk SPL:

index=ai_security (category="prompt_attack" OR (tokens_out > 5000))
| transaction user_id maxspan=30m
| where eventcount > 1 AND mvfind(category, "prompt_attack") >= 0 AND max(tokens_out) > 5000
| eval severity="critical"
| table _time, user_id, eventcount, category, max(tokens_out)

Sentinel KQL:

let attacks = AISecurity_CL | where category_s == "prompt_attack" | project attack_time=TimeGenerated, user_id_s;
let large_outputs = AISecurity_CL | where tokens_out_d > 5000 | project output_time=TimeGenerated, user_id_s, tokens_out_d;
attacks
| join kind=inner (large_outputs) on user_id_s
| where output_time between (attack_time .. (attack_time + 30m))
| project attack_time, output_time, user_id_s, tokens_out_d

Compound: Boundary Violation + Escalation Attempt¶

An agent boundary violation followed by a successful tool invocation on a different tool suggests progressive privilege escalation.

Splunk SPL:

index=ai_security (category="agent_boundary_violation" OR category="tool_invocation")
| transaction agent_id maxspan=10m
| where mvfind(category, "agent_boundary_violation") >= 0
        AND mvfind(category, "tool_invocation") >= 0
| eval severity="critical"

Dashboard Panels¶

Use these queries as the basis for SOC dashboard panels. Adapt to your visualisation platform.

Panel 1: AI Security Event Volume (Timeseries)¶

Purpose: Trending view of all AI security events by category.

Splunk SPL:

index=ai_security
| timechart span=1h count by category

Sentinel KQL:

AISecurity_CL
| summarize count() by category_s, bin(TimeGenerated, 1h)
| render timechart

Panel 2: Top Users by Alert Count¶

Purpose: Identify users generating the most AI security alerts.

Splunk SPL:

index=ai_security severity IN ("high", "critical")
| stats count as alert_count by user_id
| sort -alert_count
| head 10

Panel 3: Guardrail Effectiveness Ratio¶

Purpose: Track what percentage of threats guardrails catch versus what the Judge catches post-guardrail (indicating guardrail gaps).

Splunk SPL:

index=ai_security (category="guardrail_block" OR (category="judge_flag" AND guardrail_result="pass"))
| stats count(eval(category="guardrail_block")) as guardrail_caught,
        count(eval(category="judge_flag" AND guardrail_result="pass")) as judge_caught
| eval guardrail_pct=round(guardrail_caught/(guardrail_caught+judge_caught)*100, 1)
| eval judge_pct=round(judge_caught/(guardrail_caught+judge_caught)*100, 1)

Panel 4: Risk Tier Heatmap¶

Purpose: Show alert distribution across risk tiers and categories. Higher-tier systems generating alerts require faster response.

Splunk SPL:

index=ai_security
| stats count by risk_tier, category
| sort risk_tier, -count

Panel 5: Judge Score Distribution¶

Purpose: Track Judge confidence distribution to detect drift or calibration issues.

Splunk SPL:

index=ai_security category="judge_flag"
| bin judge_score span=0.1
| stats count by judge_score
| sort judge_score

SOC Analyst Quick Reference¶

AI security alerts look different from traditional security events. This reference maps AI concepts to SOC-familiar equivalents.

AI Concept	SOC Equivalent	Key Difference
Prompt injection	SQL injection / XSS	Attack is in natural language, not code. No signature match - requires semantic analysis.
Judge flag	IDS alert	Async detection, not inline blocking. May fire minutes after the event.
Guardrail block	WAF block	Inline, deterministic. May have false positives for legitimate edge-case queries.
Agent boundary violation	Privilege escalation alert	The "user" is an AI agent, not a human. The escalation may be caused by injected instructions.
Model drift	Baseline deviation	The AI model's behavior changed - could be provider update, adversarial manipulation, or data shift.
Token volume spike	Data exfiltration alert	Large output doesn't always mean exfiltration - some queries legitimately produce long answers. Check content, not just volume.
Credential in output	Secret exposure	Model may have memorised a credential from training data or tool output. Treat as compromised regardless of source.

Triage Decision Tree¶

AI Security Alert Received
├── Is the user a human or an AI agent?
│   ├── Human → Follow standard user investigation
│   └── Agent → Check: was the agent's action triggered by user input (prompt injection)?
│       ├── Yes → Investigate the user who triggered the agent
│       └── No → Investigate agent configuration and tool permissions
│
├── Did the guardrail block or did the Judge flag?
│   ├── Guardrail block → Known pattern. Verify guardrail is current. Close if single event.
│   └── Judge flag → Possible new pattern. Review full I/O. Feed back to guardrail team.
│
└── Is this a single event or a pattern?
    ├── Single → Log and monitor
    └── Pattern → Escalate. Check for coordinated activity across users/endpoints.

Deployment Notes¶

Import order: Deploy detection rules before correlation searches. Correlation searches depend on detection rule output.

Tuning period: Run all rules in alert-only mode (no automated response) for 2 weeks. Review false positive rates. Adjust thresholds to your environment's baseline before enabling automated actions.

Maintenance: Review and update detection rules when: - New guardrail categories are added - New AI endpoints are deployed - The Judge model is updated - New agent tools are onboarded - Threat intelligence identifies new attack patterns

SOC Integration - Architecture, alert taxonomy, and triage procedures
Anomaly Detection Ops - Behavioral anomaly detection operations
Operational Metrics - Metrics that feed SOC dashboards
Logging & Observability - Infrastructure-level logging controls

SOC Content Pack for AI Security¶

Prerequisites¶

Detection Rules¶

1. Prompt Injection - Repeated Attempts¶

2. Judge Flag Clustering - High-Severity Burst¶

3. Data Exfiltration - Anomalous Output Volume¶

4. Agent Boundary Violation - Unauthorised Tool Use¶

5. Guardrail Bypass - Successful Attack¶

6. Model Drift - Judge Accuracy Degradation¶

7. Credential Exposure - Secrets in Model I/O¶

8. Anomalous Usage Pattern - Off-Hours / Geo Shift¶

Correlation Searches¶

Compound: Injection Followed by Exfiltration¶

Compound: Boundary Violation + Escalation Attempt¶

Dashboard Panels¶

Panel 1: AI Security Event Volume (Timeseries)¶

Panel 2: Top Users by Alert Count¶

Panel 3: Guardrail Effectiveness Ratio¶

Panel 4: Risk Tier Heatmap¶

Panel 5: Judge Score Distribution¶

SOC Analyst Quick Reference¶

Triage Decision Tree¶

Deployment Notes¶

Related¶