Status(GetFullPeerStatus=true) RPCs trigger a full health probe
(network round-trips to management, signal and the relays). The
desktop UI issues these frequently and concurrently, and each call in
a burst of parallel Get() calls fired its own probe. The lastProbe
guard was not protected against concurrent access, and it only
advanced when every component was healthy, so a sustained unhealthy
state (e.g. relay down) disabled the throttle entirely and let every
call re-probe.

Extract the throttle/single-flight policy into probeThrottle:

  - single-flight: only one probe runs at a time; concurrent callers
    that piled up while it ran share its result instead of each
    launching another, even when that probe failed.
  - throttle: lastOK only advances on a fully successful probe, so
    while anything is unhealthy callers keep probing frequently and
    notice recovery quickly (preserved from the original design).

RunHealthProbes now takes a context so a caller that gives up (e.g. a
Status RPC whose client disconnected) cancels the in-flight STUN/TURN
probe instead of letting it run to its per-component timeout. The
engine's own lifetime ctx still applies independently.