Graph-Based Agent Monitoring: When Relationships Are the Data¶

Tabular logs answer "what did Agent X do?" Graph queries answer "how did the system behave?" For multi-agent systems, the second question is the one that matters.

The Problem With Logs¶

The framework's observability controls specify comprehensive logging: action audit logs (OB-1.1), inter-agent message logs (OB-1.2), immutable decision chains (OB-2.1), and anomaly scoring (OB-2.2). These are well-designed. They are also, in their default form, tabular.

Tabular logs capture events as rows. Each row records one thing that happened: Agent X called Tool Y at time T with parameters P. For a single agent, that is sufficient. You can query it, trend it, alert on it.

For a multi-agent system, the most important data is not in the rows. It is in the relationships between them. Which agent delegated to which. What communication paths emerged. Whether the topology of agent interaction changed. Whether a trust chain that should be three hops deep is suddenly five. Whether an agent that normally talks to two peers is now talking to seven.

These are structural questions. Answering them from tabular logs requires multi-way joins across session IDs, chain IDs, agent IDs, and timestamps. The queries are expensive, brittle, and slow. And "slow" is a problem when the framework specifies near real-time anomaly scoring (OB-2.2) and cross-agent correlation (OB-3.4).

A graph database does not answer these questions faster. It answers them naturally, because relationships are stored as first-class data, not reconstructed from foreign keys at query time.

Why a Graph¶

Every observable event in a multi-agent system is a relationship:

Event	Graph Representation
Agent A delegates task to Agent B	`(A)-[:DELEGATED {task, permissions, timestamp}]->(B)`
Agent B calls a tool	`(B)-[:INVOKED {params, result, latency}]->(Tool)`
Agent B sends a message to Agent C	`(B)-[:MESSAGED {content_hash, confidence, provenance}]->(C)`
Judge evaluates Agent B's output	`(Judge)-[:EVALUATED {score, flags}]->(B_output)`
Agent B accesses a data source	`(B)-[:ACCESSED {query, record_count}]->(DataSource)`
Workflow W includes Agents A, B, C	`(W)-[:CONTAINS]->(A), (W)-[:CONTAINS]->(B), ...`

These are not just log entries. They are a live, queryable model of how the system is behaving right now. The graph is the system's interaction topology, updated in real time, and every graph algorithm in the literature becomes a monitoring tool.

Agent Interaction Graph with Anomaly Detection

Near Real-Time: Is It Possible?¶

Yes. And the architecture is not speculative.

In-memory graph databases like Memgraph are designed for exactly this pattern: high-throughput event ingestion with concurrent graph traversal and algorithm execution. The key characteristics:

Ingestion. Memgraph supports native Kafka and Pulsar stream connectors. Agent events flow from the message bus (which the MASO framework already mandates) through the stream processor into the graph as nodes and edges. Ingestion latency is typically under 10ms per event.

Query. Cypher queries against an in-memory graph execute in single-digit milliseconds for local traversals (neighbourhood queries, path finding) and tens of milliseconds for global algorithms (PageRank, community detection) on graphs with millions of edges. The framework's nine-signal anomaly scoring vector can be computed per agent on every event or on a sub-second polling interval.

Algorithm execution. Memgraph includes built-in graph algorithm libraries (MAGE) that run in-process. PageRank, Louvain community detection, betweenness centrality, and temporal pattern matching run without data export or external computation. Results feed directly into alerting logic.

Total pipeline latency. The realistic end-to-end time from agent event emission to anomaly alert is under 200ms:

Stage	Latency	Technology
Event emission to stream	~5ms	Agent message bus (already specified)
Stream to graph ingestion	~10ms	Kafka/Pulsar connector
Graph update + index	~5ms	In-memory storage
Anomaly algorithm execution	~50ms	PageRank/Louvain on warm graph
Alert evaluation + PACE trigger	~10ms	Threshold comparison
Total	~80ms typical, <200ms worst case

This is well within the framework's near real-time requirements. For comparison, the Model-as-Judge layer adds 500ms to 5 seconds of latency per evaluation. The graph monitoring layer is an order of magnitude faster.

Four Detection Patterns That Only Graphs Can See¶

1. New Edge to Unknown Target¶

The simplest and most powerful pattern. If Agent X has a baseline graph showing connections to Tools A, B, C and Agents Y, Z, any new edge to an unknown target is immediately visible:

MATCH (a:Agent)-[e:ACTION]->(t:Target)
WHERE e.timestamp > now() - duration('PT1H')
  AND NOT EXISTS {
    MATCH (a)-[b:ACTION]->(t)
    WHERE b.timestamp < now() - duration('P7D')
  }
RETURN a.id, t.id, count(e), a.anomaly_score

This catches: exfiltration via new external API calls, tool-chain exploitation (agent reaches tools outside declared scope), and compromised agents being directed to new targets. In tabular logs, this requires a self-join against the full history. In a graph, it is a one-hop neighbourhood comparison.

Maps to: OB-2.2 (anomaly scoring, tool_usage_pattern signal), OB-3.4 (cross-agent correlation).

2. Edge Weight Spike¶

Agent A normally sends 3-5 messages per session to Agent B. Today it has sent 47. The edge weight (message count, data volume, or token count as a property) has spiked relative to the rolling baseline.

MATCH (a:Agent)-[e:MESSAGED]->(b:Agent)
WHERE e.timestamp > now() - duration('PT1H')
WITH a, b, count(e) AS current_count
MATCH (a)-[baseline:BASELINE_COMM]->(b)
WHERE current_count > baseline.mean + (2 * baseline.stddev)
RETURN a.id, b.id, current_count, baseline.mean, baseline.stddev

This catches: data exfiltration through inter-agent channels, runaway loops (LLM10), hallucination amplification chains (one agent feeding fabricated data at high volume), and prompt injection causing excessive delegation.

Maps to: OB-2.2 (message_volume signal), OB-2.5 (cost and consumption monitoring).

3. Centrality Shift¶

In a stable multi-agent system, the orchestrator has high betweenness centrality (most paths go through it) and task agents have low centrality. If a task agent's centrality score suddenly increases, it has become a hub. Something is routing through it that should not be.

CALL pagerank.get()
YIELD node, rank
WHERE node:Agent
WITH node.id AS agent_id, rank AS current_rank
MATCH (a:Agent {id: agent_id})
WHERE abs(current_rank - a.baseline_pagerank) > a.centrality_threshold
RETURN agent_id, current_rank, a.baseline_pagerank

This catches: delegation chain manipulation (agent accumulates authority by becoming an intermediary), compromised agents being used as proxies, and emergent hub formation that violates the intended trust topology.

Maps to: OB-3.4 (peer_group_deviation signal), delegation chain controls (DEL-02).

4. Community Fragmentation¶

Louvain community detection identifies clusters of agents that communicate more with each other than with agents outside the cluster. In a well-configured system, communities map to workflows. If an agent migrates from one community to another, or if the community structure itself fragments, the interaction topology has changed.

CALL community_detection.get()
YIELD node, community_id
WHERE node:Agent
WITH node.id AS agent_id, community_id AS current_community
MATCH (a:Agent {id: agent_id})
WHERE current_community <> a.baseline_community
RETURN agent_id, current_community, a.baseline_community

This catches: workflow boundary violations, agents being recruited into unauthorized collaboration patterns, and slow topology drift that per-agent monitoring misses because no single agent's behavior changes dramatically.

Maps to: OB-3.1 (long-window behavioral analysis), OB-3.4 (cross-agent correlation).

The Temporal Dimension¶

Static graph analysis tells you what the interaction topology looks like right now. Temporal graph analysis tells you how it is changing. Both matter, but temporal analysis catches the attacks that static analysis cannot.

Sliding Window Comparison¶

The framework specifies three temporal windows: 7-day rolling (OB-2.3), 30-day trend (OB-3.1), and 90-day trend with anchor points (OB-3.1). In a graph database, each window becomes a graph snapshot:

Window	Graph Operation	What It Detects
Real-time vs 7-day	Diff current graph topology against 7-day rolling baseline	Sudden changes: new edges, removed edges, weight spikes
7-day vs 30-day	Compare rolling baseline against monthly trend	Gradual drift that 7-day windows normalise
30-day vs 90-day anchor	Compare current topology against known-good anchor	Slow drift attacks: 0.5% per week, invisible to short windows, 13% cumulative over 6 months

Memgraph handles this through temporal edge properties. Each edge carries a timestamp. Baseline graphs are computed as materialized views over time-windowed edge sets. The diff between two temporal snapshots is a graph query, not a batch job.

Temporal Motif Detection¶

Beyond aggregate comparison, temporal graph analysis can detect specific sequences of interactions that form attack patterns:

Temporal Motif Detection

This is a temporal motif: a specific sequence of edges with temporal ordering constraints. Motif detection on a temporal graph is a single query. On tabular logs, it is a window function with self-joins across four tables, ordered by timestamp, filtered by session. The graph query runs in milliseconds. The SQL query might not finish before the next event arrives.

Architecture: Fitting the Graph Into the Framework¶

The graph database does not replace the framework's observability stack. It sits alongside it, consuming the same event stream and producing alerts that feed into the same PACE escalation logic.

Graph Agent Architecture Pipeline

The message bus (already mandated by MASO) is the single source of truth. The stream processor (Kafka, Pulsar, or equivalent) fans out to three consumers:

SIEM/SOAR (OB-2.4) for correlation with non-AI security events
Memgraph for structural anomaly detection on the live interaction graph
Immutable log store (OB-2.1) for forensics and regulatory compliance

The graph database is a hot-path analytics layer. It holds the recent interaction graph (7-day window in memory, 30/90-day as periodic snapshots) and runs continuous graph algorithms. Anomaly alerts feed into the PACE escalation logic at the same priority as per-agent anomaly scores.

What Memgraph Adds That the Existing Stack Does Not¶

Capability	Existing Framework	With Graph DB
Per-agent anomaly scoring	OB-2.2: 9-signal composite vector	Same, but `peer_group_deviation` and `message_volume` computed from graph structure instead of tabular aggregation
Cross-agent correlation	OB-3.4: specified but no implementation pattern	Native: community detection, centrality analysis, neighbourhood comparison
Delegation chain analysis	DEL-02: logged but analysed post-hoc	Live: chain depth, branching factor, authority accumulation visible as graph properties
Topology change detection	Not specified	New: graph diff between temporal snapshots detects structural shifts
Temporal motif detection	Not specified	New: attack sequence patterns detected as temporal subgraph matches

The UEBA Parallel¶

The framework already draws the parallel between User and Entity Behavior Analytics (UEBA) for human insiders and agent behavioral monitoring. The graph database makes this parallel concrete.

UEBA systems in enterprise security have used graph databases for years. The pattern is established: model entities (users, devices, applications) as nodes, model interactions as edges, compute behavioral baselines as graph properties, and detect anomalies as deviations from the baseline graph.

For agents, the mapping is direct:

UEBA Entity	Agent Equivalent	Graph Node
User	Agent	`:Agent {id, role, tier, anomaly_score}`
Device	Runtime environment	`:Runtime {provider, model, version}`
Application	Tool	`:Tool {name, scope, permissions}`
Data store	Data source	`:DataSource {classification, access_policy}`
Network destination	External API	`:ExternalAPI {endpoint, approved}`

UEBA Behavior	Agent Equivalent	Graph Edge
Login	Session start	`[:STARTED_SESSION]`
File access	Data source query	`[:ACCESSED]`
Email sent	Inter-agent message	`[:MESSAGED]`
Privilege escalation	Permission request	`[:REQUESTED_PERMISSION]`
Lateral movement	Delegation chain hop	`[:DELEGATED]`

The graph algorithms that detect insider threats in UEBA (anomalous access patterns, unusual communication partners, privilege accumulation, lateral movement paths) transfer directly to agent monitoring. The threat model is the same. The detection patterns are the same. The technology is the same.

Implementation Considerations¶

Graph Size and Retention¶

An in-memory graph database requires data discipline. Not every log event belongs in the hot graph. A practical retention strategy:

Data	In-Memory (Memgraph)	Cold Storage (Log Store)
Current interaction topology	Always	Also persisted (OB-2.1)
7-day rolling edge history	Full edges with properties	Full content
30-day baseline graph	Aggregated edges (counts, means, stddevs)	Full content
90-day anchor snapshot	Topology only (nodes, edges, no content)	Full content
Historical forensic data	Not stored	Retained per policy

For a system with 50 agents, 20 tools, and 10 data sources, the 7-day in-memory graph typically stays under 1GB even with full edge properties. This is well within Memgraph's operational range on a single node.

Baseline Calibration¶

The graph-based anomaly detection is only as good as the baseline it compares against. The framework's Tier 1 controls (OB-1.3: weekly manual review, OB-1.4: output quality log) exist specifically to build this baseline during the supervised phase.

A practical calibration sequence:

Week 1-4 (Tier 1). Ingest all events into the graph. No alerting. Build the interaction topology. Record graph snapshots as candidate baselines.
Week 4-8. Run anomaly algorithms against the baseline. Review every alert manually. Tune thresholds. The framework specifies OB-1.3 weekly review for this purpose.
Week 8+. Promote to automated alerting. Per-agent anomaly scores from the graph feed into the composite OB-2.2 vector. PACE escalation thresholds apply.
Ongoing. Preserve anchor snapshots at known-good states (post-audit, post-validation). Compare rolling baseline against anchors per OB-3.1.

Alternative Technologies¶

Memgraph is the strongest fit for this pattern because of its in-memory architecture and native stream processing. But the approach is not Memgraph-specific:

Technology	Fit for This Pattern	Trade-offs
Memgraph	Purpose-built for real-time streaming graph analytics	Requires in-memory capacity; newer ecosystem
Neo4j	Mature graph database with extensive algorithm library	Disk-based by default; streaming requires GDS + external connectors; higher latency for real-time
Amazon Neptune	Managed graph service	Higher query latency; limited algorithm support; vendor lock
Apache AGE (PostgreSQL)	Graph queries on existing PostgreSQL	Leverages existing infrastructure; limited algorithm support; not designed for streaming
TigerGraph	High-performance distributed graph	Strong for large-scale analytics; complex deployment; enterprise pricing

The choice depends on existing infrastructure, team expertise, and scale. For the near real-time requirement specified by the framework, an in-memory solution (Memgraph or equivalent) is the natural starting point.

What This Enables for PACE¶

The graph-based monitoring layer gives PACE transitions structural awareness, not just metric thresholds.

PACE Phase	Graph-Informed Trigger
Primary	Graph topology matches baseline. All communities stable. No unknown edges. Centrality distribution normal.
P to A	Graph anomaly detected: new edges to unknown targets, centrality shift, community fragmentation, or edge weight spike above 2 sigma. Anomalous agent isolated. Graph query identifies all agents in the affected subgraph for quarantine.
A to C	Multiple correlated graph anomalies across agents. Community structure has changed. Graph diff against 30-day baseline shows topology divergence beyond threshold. Multi-agent orchestration suspended.
C to E	Graph analysis confirms propagation: anomalous agent's outputs have been consumed by downstream agents (reachability query). Blast radius computed from graph traversal. All reachable agents terminated.

The critical capability: when PACE transitions from Alternate to Contingency, the graph can answer "which other agents are affected?" in a single reachability query. In tabular logs, this requires reconstructing delegation chains from chain IDs across multiple log tables. In a graph, it is:

MATCH path = (compromised:Agent {id: 'agent-analyst-01'})-[:DELEGATED|MESSAGED*1..5]->(downstream)
WHERE ALL(r IN relationships(path) WHERE r.timestamp > $incident_start)
RETURN downstream.id, length(path) AS hops

One query. Milliseconds. The entire blast radius is visible.

Key Takeaways¶

Multi-agent observability is a graph problem. The most important signals are structural: who talks to whom, through what paths, with what frequency, and how that topology changes over time. Graph databases model this natively. Tabular databases reconstruct it expensively.
Near real-time is achievable. In-memory graph databases with stream connectors (Memgraph + Kafka) deliver sub-200ms end-to-end latency from agent event to anomaly alert. This is faster than the Model-as-Judge layer and well within the framework's requirements.
Four detection patterns emerge from graph structure. New edges to unknown targets, edge weight spikes, centrality shifts, and community fragmentation catch behavioral anomalies that per-agent metric monitoring cannot see, because they exist in the relationships, not in the individual agents.
The UEBA parallel is direct. Enterprise security has used graph-based behavioral analytics for human insider threat detection for years. The same entities, relationships, algorithms, and detection patterns transfer to agent monitoring with minimal adaptation.
The graph makes PACE transitions structural. When an anomaly triggers escalation, the graph answers "what is the blast radius?" as a reachability query instead of a log reconstruction exercise. Containment decisions are faster and more precise.
Baseline calibration is the prerequisite. The graph is only as good as the baseline it compares against. The framework's Tier 1 supervised phase (OB-1.3, OB-1.4) is where the baseline graph is built. Skipping supervised operation to go directly to automated alerting produces false positives and missed anomalies.

Observability Controls - the full OB-1.x through OB-3.x control set
PACE Resilience - structured degradation when controls detect anomalies
The Hallucination Boundary - when the outputs the graph is monitoring cross from tolerable to catastrophic
The Verification Gap - why monitoring outputs alone is insufficient
Prompt, Goal and Epistemic Integrity - the epistemic controls that generate the events the graph monitors
Infrastructure Beats Instructions - why structural monitoring outperforms behavioral guidelines