* Updates rosenpass version
go-rosenpass v0.4.0 → v0.5.42 bump — detailed findings
Change summary
cunicu.li/go-rosenpass v0.4.0 → v0.5.42 (target)
cilium/ebpf v0.15.0 → v0.19.0 (transitive)
gopacket/gopacket v1.1.1 → v1.4.0 (transitive)
wireguard 2023-07 → 2023-12 (transitive)
wireguard/wgctrl 2023-04 → 2024-12 (transitive)
Wire interop
v0.4.0 (in v0.70.5) <-> v0.5.42 OK
v0.5.42 <-> v0.5.42 OK
Quantum resistance: true both ends
---
**Replay error eliminated.**
Before (on v0.4.0):
`ERROR Failed to handle message: failed to load biscuit (ICR1): detected replay`
Recurring every ~50ms for minutes at a time. Gone entirely after both ends upgraded to v0.5.42. Upstream fix in biscuit/replay handling between v0.4.x and v0.5.x series.
* Fixup [::]:port socket trying to send to v4
* Adds more tests on netbird<->rosenpass interactions
* Anticipates rp handler creation before generateConfig
* [client] Moves deterministic key gen into rosenpass
* go mod tidy
* Adds reminder to reason about rosenpass surface area
* Apply code rabbit suggestions
Exit nodes are mutually exclusive, but the RouteSelector stores routes with
default-on semantics, so every available exit node reported as selected at once.
Reconcile exit-node selection on each network map (and on runtime selection):
keep at most one selected — the user's persisted pick, else whatever management
marks for auto-apply (SkipAutoApply=false), else none. Never auto-activate an
exit node the map doesn't request; it stays off until the user picks it.
The server deselects sibling exit nodes when the user activates one (leaving
non-exit routes untouched), and the tray/React exit-node toggle now appends so
activating an exit node no longer wipes network-route selections.
Make the tray Exit Node submenu selectable (mutually exclusive, sourced from
ListNetworks by NetID) instead of read-only.
Add networksRevision to the status snapshot, bumped by the route manager on
network-map and selection changes, so the tray and the React NetworksContext
re-fetch ListNetworks via the push stream instead of polling. The peer-status
route list only carries chosen routes, so a candidate exit node appearing or
disappearing would otherwise never reach the UI.
The session deadline lived in two sinks kept in sync by hand:
ApplySessionDeadline wrote both the (engine-scoped) sessionwatch.Watcher
and the (server-scoped) peer.Status recorder. The clear paths only
touched the watcher, so the recorder — which is what the Status RPC /
SubscribeStatus snapshot the UI reads from — kept reporting a deadline
that had gone stale, surfacing as a frozen "expires in …" countdown.
Two cases were leaking:
- Profile switch / Down: the watcher is recreated per engine but the
recorder outlives it, so a switch to a profile whose server sends no
deadline left the previous profile's value in place.
- In-place expiry: the watcher arms warning timers at T-WarningLead and
T-FinalWarningLead but nothing at the deadline itself, so once the
moment passed the recorder kept the now-past value indefinitely.
Make the watcher the single writer of the recorder deadline (Update /
clearLocked / Close all route through SetSessionExpiresAt) so teardown
clears it, and guard GetSessionExpiresAt to report a past deadline as
none so in-place expiry stops painting a stale countdown.
Adds a new "private" service mode for the reverse proxy: services reachable exclusively over the embedded WireGuard tunnel, gated by per-peer group membership instead of operator auth schemes.
Wire contract
- ProxyMapping.private (field 13): the proxy MUST call ValidateTunnelPeer and fail closed; operator schemes are bypassed.
- ProxyCapabilities.private (4) + supports_private_service (5): capability gate. Management never streams private mappings to proxies that don't claim the capability; the broadcast path applies the same filter via filterMappingsForProxy.
- ValidateTunnelPeer RPC: resolves an inbound tunnel IP to a peer, checks the peer's groups against service.AccessGroups, and mints a session JWT on success. checkPeerGroupAccess fails closed when a private service has empty AccessGroups.
- ValidateSession/ValidateTunnelPeer responses now carry peer_group_ids + peer_group_names so the proxy can authorise policy-aware middlewares without an extra management round-trip.
- ProxyInboundListener + SendStatusUpdate.inbound_listener: per-account inbound listener state surfaced to dashboards.
- PathTargetOptions.direct_upstream (11): bypass the embedded NetBird client and dial the target via the proxy host's network stack for upstreams reachable without WireGuard.
Data model
- Service.Private (bool) + Service.AccessGroups ([]string, JSON- serialised). Validate() rejects bearer auth on private services. Copy() deep-copies AccessGroups. pgx getServices loads the columns.
- DomainConfig.Private threaded into the proxy auth middleware. Request handler routes private services through forwardWithTunnelPeer and returns 403 on validation failure.
- Account-level SynthesizePrivateServiceZones (synthetic DNS) and injectPrivateServicePolicies (synthetic ACL) gate on len(svc.AccessGroups) > 0.
Proxy
- /netbird proxy --private (embedded mode) flag; Config.Private in proxy/lifecycle.go.
- Per-account inbound listener (proxy/inbound.go) binding HTTP/HTTPS on the embedded NetBird client's WireGuard tunnel netstack.
- proxy/internal/auth/tunnel_cache: ValidateTunnelPeer response cache with single-flight de-duplication and per-account eviction.
- Local peerstore short-circuit: when the inbound IP isn't in the account roster, deny fast without an RPC.
- proxy/server.go reports SupportsPrivateService=true and redacts the full ProxyMapping JSON from info logs (auth_token + header-auth hashed values now only at debug level).
Identity forwarding
- ValidateSessionJWT returns user_id, email, method, groups, group_names. sessionkey.Claims carries Email + Groups + GroupNames so the proxy can stamp identity onto upstream requests without an extra management round-trip on every cookie-bearing request.
- CapturedData carries userEmail / userGroups / userGroupNames; the proxy stamps X-NetBird-User and X-NetBird-Groups on r.Out from the authenticated identity (strips client-supplied values first to prevent spoofing).
- AccessLog.UserGroups: access-log enrichment captures the user's group memberships at write time so the dashboard can render group context without reverse-resolving stale memberships.
OpenAPI/dashboard surface
- ReverseProxyService gains private + access_groups; ReverseProxyCluster gains private + supports_private. ReverseProxyTarget target_type enum gains "cluster". ServiceTargetOptions gains direct_upstream. ProxyAccessLog gains user_groups.
Adds an end-to-end SSO session-extension feature: the management server
publishes per-peer session deadlines on every Login/Sync, a new
ExtendAuthSession RPC refreshes the deadline using a fresh JWT without
tearing down the tunnel, and the daemon tracks the deadline locally so
the UI can fire a T-10min warning toast with an interactive "Extend now"
action.
Two related daemon-side status-stream fixes that together keep the UI's
status in sync with the daemon's contextState:
* state.Set previously only mutated the in-memory enum — transitions
that weren't accompanied by a Mark{Management,Signal,...} call (e.g.
StatusNeedsLogin after a PermissionDenied login, StatusLoginFailed
after OAuth init failure, StatusIdle in the Login defer) left the
UI stuck on the previous snapshot until an unrelated peer event
happened to fire notifyStateChange. Add a callback on contextState
fired from Set (outside the mutex, to avoid lock-order issues with
the recorder's stateChangeMux), and wire it in Server.Start to the
recorder's new public NotifyStateChange. Every state.Set callsite
now pushes automatically; new ones don't need to opt in.
* WaitSSOLogin's WaitToken error branch lumped every failure into
StatusLoginFailed, including context.Canceled aborts from a parallel
profile switch (actCancel/waitCancel). That spurious LoginFailed
then wedged the new profile's Up RPC with "up already in progress:
current status LoginFailed". Split the branch by error type:
context.Canceled lets the top-level defer pick StatusIdle,
context.DeadlineExceeded sets StatusNeedsLogin (retryable; OAuth
device-code window just expired), other errors keep LoginFailed
(real auth/IO failures). Document the full state-transition table
in the function godoc.
The status snapshot tore down on every management retry because
state.Status() blanks the status when an error is wrapped, and the
SubscribeStatus stream propagated that as FailedPrecondition. The UI
treated any stream error as "daemon not running" and flickered the tray
to Not running between retries.
Disconnect was also unresponsive: Down set Idle before the retry
goroutine exited, which then overwrote it with Set(Connecting) on the
next attempt; the backoff sleep (up to 15s) wasn't context-aware, so the
goroutine kept running long after actCancel.
- buildStatusResponse falls back to the underlying status (via new
state.CurrentStatus) instead of breaking the stream on wrapped errors.
- UI only flips to DaemonUnavailable on codes.Unavailable / non-status
errors, so a live daemon returning FailedPrecondition is not reported
as down.
- connect retry uses backoff.WithContext so actCancel interrupts the
inter-attempt sleep, and skips Wrap(err) when the dial fails due to
ctx cancellation.
- Down sets Idle after waiting for giveUpChan, so the retry goroutine
can no longer race the disconnect.
- Tray hides Connect during Connecting and keeps Disconnect enabled so
the user can abort an in-flight connection attempt.
Port IPv6 overlay support (#5631) into the Wails UI:
- Add DisableIPv6 config toggle to Settings (NetworkTab + services)
- Filter ::/0 alongside 0.0.0.0/0 as an exit-node route
- Suppress duplicate v6 default-route notifications in tray
The Status recorder used to fire notifier callbacks while holding d.mux:
- notifyPeerListChanged / notifyPeerStateChangeListeners ran from inside
the locked section of every Update*/AddPeerStateRoute/etc.
- notifyAddressChanged ran from UpdateLocalPeerState and CleanLocalPeerState
while d.mux was held.
- onConnectionChanged was registered with a defer above defer d.mux.Unlock,
so it executed before the mutex was released in the Mark*Connected/
Disconnected helpers.
- notifyPeerStateChangeListeners did a blocking channel send under d.mux,
so a slow subscriber stalled every other d.mux holder.
A listener that re-enters the recorder (e.g. calls GetFullStatus from
within a callback) deadlocks against d.mux, and any callback that takes
longer than expected stalls every other state query for its duration.
Capture the values needed for notification under the lock, release d.mux,
then call the notifier. Build per-peer router-state snapshots inside the
lock and dispatch them via dispatchRouterPeers afterwards. The router-peer
channel send stays blocking, but now happens outside d.mux so a slow
consumer cannot stall any other d.mux holder, and no peer state
transitions are silently dropped.
The notifier itself is unchanged: its internal state was already protected
by its own locks, and the field d.notifier is set once in NewRecorder and
never reassigned, so reading it without d.mux is safe.
Also fix a pre-existing race in Test_notifier_RemoveListener /
Test_notifier_SetListener: setListener spawns a goroutine that writes
listener.peers, but the tests read listener.peers without waiting for it.
Adds a SubscribeStatus gRPC RPC that pushes a fresh FullStatus snapshot
on every peer-recorder state change, replacing the Wails UI's 2-second
Status poll. The daemon's notifier already triggers on Connected /
Disconnected / Connecting / management or signal flip / address
change / peers-list change; we now coalesce those into ticks on a
buffered chan and stream the resulting snapshots over gRPC.
- Status recorder gains SubscribeToStateChanges /
UnsubscribeFromStateChanges + a non-blocking notifyStateChange that
drops ticks when a subscriber's 1-slot buffer is full (next snapshot
the consumer pulls already reflects everything).
- Server.Status handler split: the snapshot composition is shared
with the new SubscribeStatus stream handler so unary and stream
paths return identical bytes.
- UI peers service: pollLoop replaced by statusStreamLoop. The local
name of the existing SubscribeEvents loop is now toastStreamLoop so
the two streams are easy to tell apart — the underlying RPC name is
unchanged.
- Tray applyStatus skips the icon refresh when connected/lastStatus
hasn't changed; rapid SubscribeStatus bursts during health probes
no longer churn Shell_NotifyIcon or the log.
* [relay] evict foreign client cache on disconnect
When a foreign relay's TCP connection drops, the manager's
onServerDisconnected handler only triggered reconnect logic for the
home server; the disconnected foreign entry stayed in the relayClients
cache. Subsequent OpenConn calls reused the closed client until the
60-second cleanup tick evicted it, breaking peer connectivity through
that relay for up to a minute.
Evict the foreign entry from the cache on disconnect so the next
OpenConn dials a fresh client.
Also:
- Make the reconnect backoff cap configurable via WithMaxBackoffInterval
ManagerOption; the previous hard-coded 60s constant forced
TestAutoReconnect to sleep ~61s. Test now polls Ready() and finishes
in ~2s.
- Add NB_HOME_RELAY_SERVERS env var that overrides the relay URL list
received from management, so a peer can be pinned to a specific home
relay (used by the netbird-conn-lab Edge 4 reproducer).
* [client] treat empty NB_HOME_RELAY_SERVERS as unset
Returning (urls=[], ok=true) when the env var contained only separators or
whitespace caused callers to wipe the mgmt-provided relay list, leaving the
peer with no relays. Treat a parsed-empty result the same as an unset env.
* [debug] fix port collision in TestUpload
TestUpload hardcoded :8080, so it failed deterministically when anything
was already on that port and collided across concurrent test runs.
Bind a :0 listener in the test to get a kernel-assigned free port, and
add Server.Serve so tests can hand the listener in without reaching
into unexported state.
* [debug] drop test-only Server.Serve, use SERVER_ADDRESS env
The previous commit added a Server.Serve method on the upload-server,
used only by TestUpload. That left production with an unused function.
Reserve an ephemeral loopback port in the test, release it, and pass
the address through SERVER_ADDRESS (which the server already reads).
A small wait helper ensures the server is accepting connections before
the upload runs, so the close/rebind gap does not cause a false failure.
* [client] Suppress ICE signaling and periodic offers in force-relay mode
When NB_FORCE_RELAY is enabled, skip WorkerICE creation entirely,
suppress ICE credentials in offer/answer messages, disable the
periodic ICE candidate monitor, and fix isConnectedOnAllWay to
only check relay status so the guard stops sending unnecessary offers.
* [client] Dynamically suppress ICE based on remote peer's offer credentials
Track whether the remote peer includes ICE credentials in its
offers/answers. When remote stops sending ICE credentials, skip
ICE listener dispatch, suppress ICE credentials in responses, and
exclude ICE from the guard connectivity check. When remote resumes
sending ICE credentials, re-enable all ICE behavior.
* [client] Fix nil SessionID panic and force ICE teardown on relay-only transition
Fix nil pointer dereference in signalOfferAnswer when SessionID is nil
(relay-only offers). Close stale ICE agent immediately when remote peer
stops sending ICE credentials to avoid traffic black-hole during the
ICE disconnect timeout.
* [client] Add relay-only fallback check when ICE is unavailable
Ensure the relay connection is supported with the peer when ICE is disabled to prevent connectivity issues.
* [client] Add tri-state connection status to guard for smarter ICE retry (#5828)
* [client] Add tri-state connection status to guard for smarter ICE retry
Refactor isConnectedOnAllWay to return a ConnStatus enum (Connected,
Disconnected, PartiallyConnected) instead of a boolean. When relay is
up but ICE is not (PartiallyConnected), limit ICE offers to 3 retries
with exponential backoff then fall back to hourly attempts, reducing
unnecessary signaling traffic. Fully disconnected peers continue to
retry aggressively. External events (relay/ICE disconnect, signal/relay
reconnect) reset retry state to give ICE a fresh chance.
* [client] Clarify guard ICE retry state and trace log trigger
Split iceRetryState.attempt into shouldRetry (pure predicate) and
enterHourlyMode (explicit state transition) so the caller in
reconnectLoopWithRetry reads top-to-bottom. Restore the original
trace-log behavior in isConnectedOnAllWay so it only logs on full
disconnection, not on the new PartiallyConnected state.
* [client] Extract pure evalConnStatus and add unit tests
Split isConnectedOnAllWay into a thin method that snapshots state and
a pure evalConnStatus helper that takes a connStatusInputs struct, so
the tri-state decision logic can be exercised without constructing
full Worker or Handshaker objects. Add table-driven tests covering
force-relay, ICE-unavailable and fully-available code paths, plus
unit tests for iceRetryState budget/hourly transitions and reset.
* [client] Improve grammar in logs and refactor ICE credential checks
* Add support for legacy IDP cache environment variable
* Centralize cache store creation to reuse a single Redis connection pool
Each cache consumer (IDP cache, token store, PKCE store, secrets manager,
EDR validator) was independently calling NewStore, creating separate Redis
clients with their own connection pools — up to 1400 potential connections
from a single management server process.
Introduce a shared CacheStore() singleton on BaseServer that creates one
store at boot and injects it into all consumers. Consumer constructors now
receive a store.StoreInterface instead of creating their own.
For Redis mode, all consumers share one connection pool (1000 max conns).
For in-memory mode, all consumers share one GoCache instance.
* Update management-integrations module to latest version
* sync go.sum
* Export `GetAddrFromEnv` to allow reuse across packages
* Update management-integrations module version in go.mod and go.sum
* Update management-integrations module version in go.mod and go.sum
extraInitialRoutes() was meant to preserve only the fake IP route
(240.0.0.0/8) across TUN rebuilds, but it re-injected any initial
route missing from the current set. When the management server
advertised exit node routes (0.0.0.0/0) that were later filtered
by the route selector, extraInitialRoutes() re-added them, causing
the Android VPN to capture all traffic with no peer to handle it.
Store the fake IP route explicitly and append only that in notify(),
removing the overly broad initial route diffing.