* [relay] evict foreign client cache on disconnect
When a foreign relay's TCP connection drops, the manager's
onServerDisconnected handler only triggered reconnect logic for the
home server; the disconnected foreign entry stayed in the relayClients
cache. Subsequent OpenConn calls reused the closed client until the
60-second cleanup tick evicted it, breaking peer connectivity through
that relay for up to a minute.
Evict the foreign entry from the cache on disconnect so the next
OpenConn dials a fresh client.
Also:
- Make the reconnect backoff cap configurable via WithMaxBackoffInterval
ManagerOption; the previous hard-coded 60s constant forced
TestAutoReconnect to sleep ~61s. Test now polls Ready() and finishes
in ~2s.
- Add NB_HOME_RELAY_SERVERS env var that overrides the relay URL list
received from management, so a peer can be pinned to a specific home
relay (used by the netbird-conn-lab Edge 4 reproducer).
* [client] treat empty NB_HOME_RELAY_SERVERS as unset
Returning (urls=[], ok=true) when the env var contained only separators or
whitespace caused callers to wipe the mgmt-provided relay list, leaving the
peer with no relays. Treat a parsed-empty result the same as an unset env.
* [client] Use `net.JoinHostPort` for consistency in constructing host-port pairs
* [client] Fix handling of IPv6 addresses by trimming brackets in `net.JoinHostPort`
* Fix race condition and ensure correct message ordering in
connection establishment
Reorder operations in OpenConn to register the connection before
waiting for peer availability. This ensures:
- Connection is ready to receive messages before peer subscription
completes
- Transport messages and onconnected events maintain proper ordering
- No messages are lost during the connection establishment window
- Concurrent OpenConn calls cannot create duplicate connections
If peer availability check fails, the pre-registered connection is
properly cleaned up.
* Handle service shutdown during relay connection initialization
Ensure relay connections are properly cleaned up when the service is not running by verifying `serviceIsRunning` and removing stale entries from `c.conns` to prevent unintended behaviors.
* Refactor relay client Conn/connContainer ownership and decouple Conn from Client
Conn previously held a direct *Client pointer and called client methods
(writeTo, closeConn, LocalAddr) directly, creating a tight bidirectional
coupling. The message channel was also created externally in OpenConn and
shared between Conn and connContainer with unclear ownership.
Now connContainer fully owns the lifecycle of both the channel and the
Conn it wraps:
- connContainer creates the channel (sized by connChannelSize const)
and the Conn internally via newConnContainer
- connContainer feeds messages into the channel (writeMsg), closes and
drains it on shutdown (close)
- Conn reads from the channel (Read) but never closes it
Conn is decoupled from *Client by replacing the *Client field with
three function closures (writeFn, closeFn, localAddrFn) that are wired
by newConnContainer at construction time. Write, Close, and LocalAddr
delegate to these closures. This removes the direct dependency while
keeping the identity-check logic: writeTo and closeConn now compare
connContainer pointers instead of Conn pointers to verify the caller
is the current active connection for that peer.
* Fix race condition and ensure correct message ordering in
connection establishment
Reorder operations in OpenConn to register the connection before
waiting for peer availability. This ensures:
- Connection is ready to receive messages before peer subscription
completes
- Transport messages and onconnected events maintain proper ordering
- No messages are lost during the connection establishment window
- Concurrent OpenConn calls cannot create duplicate connections
If peer availability check fails, the pre-registered connection is
properly cleaned up.
* Handle service shutdown during relay connection initialization
Ensure relay connections are properly cleaned up when the service is not running by verifying `serviceIsRunning` and removing stale entries from `c.conns` to prevent unintended behaviors.
- Move `util/grpc` and `util/net` to `client` so `internal` packages can be accessed
- Add methods to return the next best interface after the NetBird interface.
- Use `IP_UNICAST_IF` sock opt to force the outgoing interface for the NetBird `net.Dialer` and `net.ListenerConfig` to avoid routing loops. The interface is picked by the new route lookup method.
- Some refactoring to avoid import cycles
- Old behavior is available through `NB_USE_LEGACY_ROUTING=true` env var