AI Data Retention Guidance
Data retention requirements for AI systems, covering the 10 data states specific to AI and jurisdictional requirements.
The 10 Data States in AI Systems
AI systems create data in states that traditional retention policies may not address:
| State | Description | Retention Consideration |
|---|---|---|
| 1. At rest in feeder systems | Source data before AI processing | Existing policies apply |
| 2. In transit to AI | Data moving to AI system | Transient, no retention |
| 3. In vector store | Embeddings of source documents | May be invertible; retain as source |
| 4. In model context | Prompt + retrieved context | Session-scoped |
| 5. In model memory | Within-session state | Session-scoped |
| 6. In model response | Generated output | Retain per policy |
| 7. In interaction logs | Full interaction records | Key retention decision |
| 8. In Judge evaluation | Judge inputs and outputs | Retain with interaction |
| 9. In HITL queue | Pending human review | Retain with interaction |
| 10. In backups | Copies of above | Mirror source retention |
Retention by Risk Tier
CRITICAL Systems
| Data Type | Minimum Retention | Maximum Retention | Rationale |
|---|---|---|---|
| Full interaction logs | 7 years | 10 years | Regulatory, audit, litigation |
| System prompts (versioned) | 7 years | Indefinite | Audit trail |
| Guardrail configuration | 7 years | Indefinite | Audit trail |
| Judge evaluations | 7 years | 10 years | Assurance evidence |
| HITL decisions | 7 years | 10 years | Accountability |
| Model versions used | 7 years | Indefinite | Reproducibility |
| Incidents | 7 years | Indefinite | Lessons learned |
HIGH Systems
| Data Type | Minimum Retention | Maximum Retention | Rationale |
|---|---|---|---|
| Full interaction logs | 3 years | 7 years | Regulatory, investigation |
| System prompts (versioned) | 3 years | Indefinite | Audit trail |
| Guardrail configuration | 3 years | Indefinite | Audit trail |
| Judge evaluations | 3 years | 5 years | Assurance evidence |
| HITL decisions | 3 years | 5 years | Accountability |
| Model versions used | 3 years | Indefinite | Reproducibility |
| Incidents | 5 years | Indefinite | Lessons learned |
MEDIUM Systems
| Data Type | Minimum Retention | Maximum Retention | Rationale |
|---|---|---|---|
| Metadata + sampled content | 1 year | 3 years | Trend analysis |
| System prompts (versioned) | 1 year | 3 years | Audit trail |
| Guardrail configuration | 1 year | 3 years | Audit trail |
| Judge evaluations (sampled) | 1 year | 3 years | Assurance evidence |
| Model versions used | 1 year | 3 years | Reproducibility |
| Incidents | 3 years | 5 years | Lessons learned |
LOW Systems
| Data Type | Minimum Retention | Maximum Retention | Rationale |
|---|---|---|---|
| Basic metadata | 90 days | 1 year | Troubleshooting |
| System prompts (current) | 90 days | 1 year | Reference |
| Incidents | 1 year | 3 years | Lessons learned |
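The tier tables above can be encoded as data so that deletion jobs compute retention windows instead of hard-coding them. The following is a minimal Python sketch, not a definitive implementation: the `RETENTION` mapping, its key names, and `retention_window` are hypothetical names chosen for illustration, only a few rows are shown, and a year is approximated as 365 days.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Assumption for this sketch: one year is treated as 365 days.
YEAR = 365

# (minimum_days, maximum_days); None stands for "Indefinite" in the tables.
# Hypothetical encoding covering only a subset of the rows.
RETENTION = {
    ("CRITICAL", "interaction_logs"): (7 * YEAR, 10 * YEAR),
    ("CRITICAL", "system_prompts"): (7 * YEAR, None),
    ("HIGH", "interaction_logs"): (3 * YEAR, 7 * YEAR),
    ("MEDIUM", "incidents"): (3 * YEAR, 5 * YEAR),
    ("LOW", "basic_metadata"): (90, 1 * YEAR),
}


def retention_window(tier: str, data_type: str, created: date) -> Tuple[date, Optional[date]]:
    """Return (earliest deletion date, latest deletion date or None if indefinite)."""
    min_days, max_days = RETENTION[(tier, data_type)]
    earliest = created + timedelta(days=min_days)
    latest = created + timedelta(days=max_days) if max_days is not None else None
    return earliest, latest


earliest, latest = retention_window("LOW", "basic_metadata", date(2024, 1, 1))
print(earliest, latest)
```

Keeping the minimum and maximum together makes it straightforward to flag both records deleted too early and records kept too long.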
Jurisdictional Requirements
United Kingdom
| Regulation | Data Type | Requirement |
|---|---|---|
| UK GDPR | Personal data | Delete when no longer necessary; document lawful basis |
| FCA SYSC 9 | Records of services and transactions | 5 years minimum |
| FCA COBS 11 | Order records | 5 years |
| PRA SS1/23 | Model documentation | Duration of model use + 5 years |
| Consumer Duty | Evidence of fair outcomes | 5 years |
European Union
| Regulation | Data Type | Requirement |
|---|---|---|
| GDPR | Personal data | Delete when no longer necessary; document lawful basis |
| EU AI Act | High-risk AI logs | 6 months minimum, longer if needed for obligations |
| EU AI Act | Documentation | Duration of AI system lifecycle |
| MiFID II | Transaction records | 5 years |
| PSD2 | Payment records | 5 years |
United States
| Regulation | Data Type | Requirement |
|---|---|---|
| SOX | Financial records | 7 years |
| HIPAA | Health information | 6 years |
| GLBA | Financial customer information | 5 years |
| CCPA/CPRA | Consumer data | Varies; disclose retention periods |
| SEC Rule 17a-4 | Broker-dealer records | 3-6 years depending on type |
| State laws | Varies | Check applicable states |
Banking-Specific (Global)
| Standard | Data Type | Requirement |
|---|---|---|
| Basel III | Risk model documentation | Duration of use + review cycle |
| SR 11-7 | Model documentation, validation | Duration of use + examination cycle |
| BCBS 239 | Risk data | Sufficient for risk reporting |
Interaction Log Content
What to Log (by Tier)
| Field | CRITICAL | HIGH | MEDIUM | LOW |
|---|---|---|---|---|
| Timestamp | ✓ | ✓ | ✓ | ✓ |
| User identity | ✓ | ✓ | ✓ | Optional |
| Session ID | ✓ | ✓ | ✓ | ✓ |
| Model version | ✓ | ✓ | ✓ | Optional |
| Model parameters | ✓ | ✓ | Optional | Optional |
| System prompt version | ✓ | ✓ | ✓ | Optional |
| Full user input | ✓ | ✓ | Sampled | Optional |
| Retrieved context (RAG) | ✓ | ✓ | Reference only | No |
| Full model output | ✓ | ✓ | Sampled | Optional |
| Guardrail results | ✓ | ✓ | ✓ | ✓ |
| Latency metrics | ✓ | ✓ | ✓ | ✓ |
| Cost | ✓ | ✓ | ✓ | Optional |
| Judge evaluation | ✓ | ✓ | Sampled | No |
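One way to apply the table above is to express it as a policy matrix and filter each interaction event before it is written to the log. The sketch below is illustrative only: the field names, `LOG_POLICY`, `build_log_record`, and the `sample_rate` default are hypothetical, "Optional" fields are simply dropped, and "Reference only" is interpreted as keeping document IDs without the retrieved text.

```python
import random

TIERS = ("CRITICAL", "HIGH", "MEDIUM", "LOW")

# Policy per field, in TIERS order. "always" corresponds to a tick in the table.
LOG_POLICY = {
    "timestamp":             ("always", "always", "always", "always"),
    "user_identity":         ("always", "always", "always", "optional"),
    "session_id":            ("always", "always", "always", "always"),
    "model_version":         ("always", "always", "always", "optional"),
    "model_parameters":      ("always", "always", "optional", "optional"),
    "system_prompt_version": ("always", "always", "always", "optional"),
    "full_user_input":       ("always", "always", "sampled", "optional"),
    "retrieved_context":     ("always", "always", "reference", "no"),
    "full_model_output":     ("always", "always", "sampled", "optional"),
    "guardrail_results":     ("always", "always", "always", "always"),
    "latency_metrics":       ("always", "always", "always", "always"),
    "cost":                  ("always", "always", "always", "optional"),
    "judge_evaluation":      ("always", "always", "sampled", "no"),
}


def build_log_record(event: dict, tier: str, sample_rate: float = 0.1, rng=random) -> dict:
    """Return only the fields of `event` that the tier's policy allows."""
    idx = TIERS.index(tier)
    # One sampling decision per interaction, so input and output stay paired.
    keep_sampled = rng.random() < sample_rate
    record = {}
    for field, policies in LOG_POLICY.items():
        if field not in event:
            continue
        policy = policies[idx]
        if policy == "always" or (policy == "sampled" and keep_sampled):
            record[field] = event[field]
        elif policy == "reference":
            # Keep document IDs only, not the retrieved text.
            record[field] = [doc["id"] for doc in event[field]]
        # "optional" and "no" fields are dropped in this sketch.
    return record
```

Making a single sampling decision per interaction is a design choice: it avoids logging an output without the input that produced it.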
What NOT to Log
| Data Type | Reason | Alternative |
|---|---|---|
| Full credit card numbers | PCI-DSS | Mask (last 4 digits) |
| Full SSN/national ID | Regulatory | Mask or tokenise |
| Passwords/credentials | Security | Never log |
| Raw biometric data | Privacy | Hash or don't log |
| Health data (unless required) | HIPAA/GDPR | Minimise or mask |
PII in Logs
Detection and Handling
| Stage | Action |
|---|---|
| At logging time | Detect PII using guardrails; flag or redact |
| In storage | Encrypt at rest; access controls |
| At retrieval | Verify authorisation; mask if displaying |
| At deletion | Ensure complete removal including backups |
Redaction vs. Tokenisation
| Approach | Use When | Tradeoff |
|---|---|---|
| Redaction | PII not needed for any purpose | Data lost permanently |
| Tokenisation | Need to re-identify for investigation | Token mapping must be secured |
| Masking | Partial visibility sufficient | Some data visible |
| Encryption | Full data needed, access controlled | Key management overhead |
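The difference between masking and tokenisation can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the regular expression is a crude card-number pattern rather than a full PII detector, `mask_card_numbers` and `TokenVault` are hypothetical names, and the in-memory dictionary stands in for a secured token mapping store.

```python
import hashlib
import hmac
import re

# Crude pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_card_numbers(text: str) -> str:
    """Masking: keep the last 4 digits, replace the rest with asterisks."""
    def _mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "*" * (len(digits) - 4) + digits[-4:]
    return CARD_RE.sub(_mask, text)


class TokenVault:
    """Tokenisation: swap a value for a token; the mapping allows re-identification.

    The mapping is held in memory here; in practice it must be secured
    and access controlled, as the table above notes.
    """

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._mapping = {}

    def tokenise(self, value: str) -> str:
        # Keyed hash, so the same value always yields the same token.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = "tok_" + digest[:16]
        self._mapping[token] = value
        return token

    def reidentify(self, token: str) -> str:
        return self._mapping[token]


print(mask_card_numbers("card 4111 1111 1111 1111 ok"))
```

Masking is one-way and leaves part of the value visible, whereas the token reveals nothing by itself but can be reversed by anyone with access to the mapping.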
Vector Store Retention
Vector embeddings require special consideration:
| Concern | Guidance |
|---|---|
| Embeddings can be inverted | Treat embeddings with same classification as source |
| Deletion complexity | Deleting from vector store may require rebuild |
| Versioning | Track which documents are in which version of store |
| Staleness | Set refresh/review cycles (see AI.5.4) |
Recommended Approach
- Classify vector store content at source data level
- Track lineage from source documents to embeddings
- Implement deletion procedures that work with your vector DB
- Verify deletions are complete (not just soft-deleted)
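The lineage and verification steps above can be sketched in Python without assuming any particular vector database. In this sketch, `LineageIndex` and `delete_source_document` are hypothetical names, and `delete_ids` and `still_present` are caller-supplied callables standing in for whatever delete and lookup operations the chosen vector DB actually provides.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class LineageIndex:
    """Maps each source document ID to the embedding IDs derived from it."""

    def __init__(self) -> None:
        self._by_source: Dict[str, Set[str]] = defaultdict(set)

    def record(self, source_doc_id: str, embedding_id: str) -> None:
        self._by_source[source_doc_id].add(embedding_id)

    def embeddings_for(self, source_doc_id: str) -> Set[str]:
        return set(self._by_source.get(source_doc_id, ()))

    def forget(self, source_doc_id: str) -> None:
        self._by_source.pop(source_doc_id, None)


def delete_source_document(
    source_doc_id: str,
    lineage: LineageIndex,
    delete_ids: Callable[[Set[str]], None],
    still_present: Callable[[Set[str]], Set[str]],
) -> Set[str]:
    """Delete every embedding derived from a document, then verify removal."""
    ids = lineage.embeddings_for(source_doc_id)
    delete_ids(ids)
    # Verification step: ask the store which of the IDs are still retrievable.
    remaining = still_present(ids)
    if remaining:
        raise RuntimeError(
            f"Deletion incomplete for {source_doc_id}: {sorted(remaining)}"
        )
    lineage.forget(source_doc_id)
    return ids
```

The lineage entry is removed only after verification succeeds, so a failed or soft deletion can be retried with the same embedding IDs.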
Judge and HITL Data
Judge Evaluation Retention
Judge evaluations contain:
- Copy of interaction being evaluated
- Judge's analysis and findings
- Metadata (Judge model version, evaluation time)
Retain Judge evaluations for the same period as the underlying interaction - they're part of the audit trail.
HITL Decision Retention
HITL decisions must capture:
- What the human reviewed
- What decision they made
- Why (if documented)
- Who made the decision
- When
Retain HITL decisions for accountability, typically for the same period as the underlying interaction or longer.
Deletion Procedures
Standard Deletion
| Step | Action | Verification |
|---|---|---|
| 1 | Identify data eligible for deletion | Query by retention date |
| 2 | Verify no legal hold | Check with legal |
| 3 | Delete from primary storage | Confirm deletion |
| 4 | Delete from backups (per backup policy) | Confirm in next backup cycle |
| 5 | Delete from vector stores if applicable | Verify removal |
| 6 | Log deletion | Maintain deletion record |
Legal Hold
When litigation or regulatory investigation is anticipated:
1. Identify potentially relevant data
2. Suspend deletion for that data
3. Document the hold scope and duration
4. Notify relevant personnel
5. Release hold only when legal confirms
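Steps 1 and 2 of the standard deletion procedure, combined with the legal hold rule, amount to a filter: a record is eligible only if its retention date has passed and it is not under hold. A minimal Python sketch follows; the `Record` type, its fields, and `eligible_for_deletion` are hypothetical, and the set of held IDs is assumed to come from whatever system legal uses to track holds.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Record:
    record_id: str
    delete_after: date  # date on which the retention period expires


def eligible_for_deletion(
    records: Iterable[Record], held_ids: Set[str], today: date
) -> List[Record]:
    """Return records whose retention has expired and that are not on legal hold."""
    return [
        r for r in records
        if r.delete_after <= today and r.record_id not in held_ids
    ]


records = [
    Record("r1", date(2024, 6, 30)),  # expired, no hold
    Record("r2", date(2024, 6, 30)),  # expired, but on hold
    Record("r3", date(2030, 1, 1)),   # retention period still running
]
to_delete = eligible_for_deletion(records, held_ids={"r2"}, today=date(2025, 1, 1))
print([r.record_id for r in to_delete])
```

Checking the hold at deletion time, rather than when the hold is created, means a newly placed hold takes effect on the next run without any changes to stored retention dates.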
Backup Considerations
| Backup Type | Retention Approach |
|---|---|
| Daily incremental | 30-90 days |
| Weekly full | 90 days - 1 year |
| Monthly archive | Per data classification |
| Disaster recovery | Mirror primary retention |
Key principle: Backup retention should not exceed primary retention without explicit justification. Otherwise, backups keep data that the retention policy says should already have been deleted.
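This principle is easy to check automatically. The sketch below is a hypothetical helper, not part of any backup tool: `backups_exceeding_primary`, the backup names, and the day counts in the example are all illustrative assumptions.

```python
from typing import Dict, FrozenSet, List


def backups_exceeding_primary(
    primary_days: int,
    backup_days: Dict[str, int],
    justified: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return backup types retained longer than primary data without justification."""
    return sorted(
        name for name, days in backup_days.items()
        if days > primary_days and name not in justified
    )


# Example: primary data kept 90 days, weekly full backups kept for a year.
violations = backups_exceeding_primary(
    primary_days=90,
    backup_days={"daily_incremental": 30, "weekly_full": 365},
)
print(violations)
```

Passing approved exceptions through `justified` keeps the check aligned with the documented exception process instead of silencing it.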
Audit and Compliance
Documentation Requirements
Maintain documentation of:
- Retention policy (this document)
- Data inventory (what AI data exists where)
- Deletion logs (what was deleted when)
- Legal holds (active and historical)
- Exceptions (with justification and approval)
Periodic Review
| Review Type | Frequency | Scope |
|---|---|---|
| Policy review | Annual | Update for regulatory changes |
| Implementation audit | Annual | Verify policy is followed |
| Deletion verification | Quarterly | Sample check that deletion occurred |
| Legal hold review | Quarterly | Confirm holds still needed |
Implementation Checklist
Initial Setup
Ongoing