netbird

mirror of https://github.com/netbirdio/netbird.git synced 2026-05-31 21:19:55 +00:00

Author	SHA1	Message	Date
Zoltán Papp	d57b30f8d5	Merge branch 'main' into ui-refactor	2026-05-28 13:43:19 +02:00
Riccardo Manfrin	7ea5e37dd4	[client] Improve rosenpass support (#6136 ) * Updates rosenpass version go-rosenpass v0.4.0 → v0.5.42 bump — detailed findings Change summary cunicu.li/go-rosenpass v0.4.0 → v0.5.42 (target) cilium/ebpf v0.15.0 → v0.19.0 (transitive) gopacket/gopacket v1.1.1 → v1.4.0 (transitive) wireguard 2023-07 → 2023-12 (transitive) wireguard/wgctrl 2023-04 → 2024-12 (transitive) Wire interop v0.4.0 (in v0.70.5) <-> v0.5.42 OK v0.5.42 <-> v0.5.42 OK Quantum resistance: true both ends --- Replay error eliminated. Before (on v0.4.0): `ERROR Failed to handle message: failed to load biscuit (ICR1): detected replay` Recurring every ~50ms for minutes at a time. Gone entirely after both ends upgraded to v0.5.42. Upstream fix in biscuit/replay handling between v0.4.x and v0.5.x series. * Fixup [::]:port socket trying to send to v4 * Adds more tests on netbird<->rosenpass interactions * Anticipates rp handler creation before generateConfig * [client] Moves deterministic key gen into rosenpass * go mod tidy * Adds reminder to reason about rosenpass surface area * Apply code rabbit suggestions	2026-05-28 09:01:18 +02:00
Riccardo Manfrin	9d7ef9b255	[client] Fix statemanager possible deadlock (#6228 ) 1. Stop() takes m.mu.Lock() and defers m.mu.Unlock() 2. <-m.done under lock 3. periodicStateSave defers close(m.done) 4. periodicStateSave calls PersistState() (line 256) which does m.mu.Lock() Double Stop() remains idempotent: second cancel() on dead ctx (no-op) and reads done already closed (immediate return).	2026-05-28 08:54:15 +02:00
Zoltan Papp	966fbec119	routemanager: enforce a single selected exit node Exit nodes are mutually exclusive, but the RouteSelector stores routes with default-on semantics, so every available exit node reported as selected at once. Reconcile exit-node selection on each network map (and on runtime selection): keep at most one selected — the user's persisted pick, else whatever management marks for auto-apply (SkipAutoApply=false), else none. Never auto-activate an exit node the map doesn't request; it stays off until the user picks it. The server deselects sibling exit nodes when the user activates one (leaving non-exit routes untouched), and the tray/React exit-node toggle now appends so activating an exit node no longer wipes network-route selections.	2026-05-27 20:48:16 +02:00
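The selection order described above (persisted user pick, else an auto-apply node, else none) might look roughly like this. The `exitNode` type and `reconcileExitNode` function are hypothetical; the tie-break of taking the first auto-apply node, and falling back to auto-apply when the user's pick is no longer in the map, are assumptions not stated in the commit.

```go
package main

import "fmt"

// exitNode is a hypothetical, trimmed-down view of an exit-node route.
type exitNode struct {
	NetID         string
	SkipAutoApply bool // false means management asks for auto-apply
}

// reconcileExitNode returns the NetID of the single exit node to keep
// selected, or "" when none should be active.
func reconcileExitNode(available []exitNode, userPick string) string {
	// 1. The user's persisted pick wins while it is still in the map.
	if userPick != "" {
		for _, n := range available {
			if n.NetID == userPick {
				return n.NetID
			}
		}
	}
	// 2. Otherwise take the first node management marks for auto-apply
	//    (assumed tie-break).
	for _, n := range available {
		if !n.SkipAutoApply {
			return n.NetID
		}
	}
	// 3. Nothing requested: every exit node stays off.
	return ""
}

func main() {
	nodes := []exitNode{
		{NetID: "exit-a", SkipAutoApply: true},
		{NetID: "exit-b", SkipAutoApply: false},
	}
	fmt.Println("with user pick:", reconcileExitNode(nodes, "exit-a"))
	fmt.Println("without pick:", reconcileExitNode(nodes, ""))
}
```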
Zoltan Papp	f693d268b4	tray: selectable exit nodes + push-based network list refresh Make the tray Exit Node submenu selectable (mutually exclusive, sourced from ListNetworks by NetID) instead of read-only. Add networksRevision to the status snapshot, bumped by the route manager on network-map and selection changes, so the tray and the React NetworksContext re-fetch ListNetworks via the push stream instead of polling. The peer-status route list only carries chosen routes, so a candidate exit node appearing or disappearing would otherwise never reach the UI.	2026-05-27 20:48:16 +02:00
Zoltan Papp	13179081d2	Merge branch 'main' into ui-refactor	2026-05-26 23:41:18 +02:00
Zoltan Papp	53bbc2d551	session: clear stale SSO deadline on teardown and after expiry The session deadline lived in two sinks kept in sync by hand: ApplySessionDeadline wrote both the (engine-scoped) sessionwatch.Watcher and the (server-scoped) peer.Status recorder. The clear paths only touched the watcher, so the recorder — which is what the Status RPC / SubscribeStatus snapshot the UI reads from — kept reporting a deadline that had gone stale, surfacing as a frozen "expires in …" countdown. Two cases were leaking: - Profile switch / Down: the watcher is recreated per engine but the recorder outlives it, so a switch to a profile whose server sends no deadline left the previous profile's value in place. - In-place expiry: the watcher arms warning timers at T-WarningLead and T-FinalWarningLead but nothing at the deadline itself, so once the moment passed the recorder kept the now-past value indefinitely. Make the watcher the single writer of the recorder deadline (Update / clearLocked / Close all route through SetSessionExpiresAt) so teardown clears it, and guard GetSessionExpiresAt to report a past deadline as none so in-place expiry stops painting a stale countdown.	2026-05-26 23:15:03 +02:00
Bethuel Mmbaga	14af179556	[management] Refactor management server bootstrap (#6256)	2026-05-26 17:44:28 +03:00
Viktor Liu	e89b1e0596	[proxy, client] Bound embed client WireGuard per-Device memory (#5962)	2026-05-26 11:51:53 +02:00
Viktor Liu	4983b5cf17	[client] Match DNS wildcard handlers on label boundaries (#6255)	2026-05-25 18:38:48 +02:00
Viktor Liu	b3b0feb3b8	[client] Filter scoped/cloned default routes from BSD network monitor RTM_ADD (#6208)	2026-05-25 18:38:21 +02:00
Maycon Santos	7aebdd69dd	[management, client, proxy] add expose NetBird-only services over tunnel peers (#6226 ) Adds a new "private" service mode for the reverse proxy: services reachable exclusively over the embedded WireGuard tunnel, gated by per-peer group membership instead of operator auth schemes. Wire contract - ProxyMapping.private (field 13): the proxy MUST call ValidateTunnelPeer and fail closed; operator schemes are bypassed. - ProxyCapabilities.private (4) + supports_private_service (5): capability gate. Management never streams private mappings to proxies that don't claim the capability; the broadcast path applies the same filter via filterMappingsForProxy. - ValidateTunnelPeer RPC: resolves an inbound tunnel IP to a peer, checks the peer's groups against service.AccessGroups, and mints a session JWT on success. checkPeerGroupAccess fails closed when a private service has empty AccessGroups. - ValidateSession/ValidateTunnelPeer responses now carry peer_group_ids + peer_group_names so the proxy can authorise policy-aware middlewares without an extra management round-trip. - ProxyInboundListener + SendStatusUpdate.inbound_listener: per-account inbound listener state surfaced to dashboards. - PathTargetOptions.direct_upstream (11): bypass the embedded NetBird client and dial the target via the proxy host's network stack for upstreams reachable without WireGuard. Data model - Service.Private (bool) + Service.AccessGroups ([]string, JSON- serialised). Validate() rejects bearer auth on private services. Copy() deep-copies AccessGroups. pgx getServices loads the columns. - DomainConfig.Private threaded into the proxy auth middleware. Request handler routes private services through forwardWithTunnelPeer and returns 403 on validation failure. - Account-level SynthesizePrivateServiceZones (synthetic DNS) and injectPrivateServicePolicies (synthetic ACL) gate on len(svc.AccessGroups) > 0. 
Proxy - /netbird proxy --private (embedded mode) flag; Config.Private in proxy/lifecycle.go. - Per-account inbound listener (proxy/inbound.go) binding HTTP/HTTPS on the embedded NetBird client's WireGuard tunnel netstack. - proxy/internal/auth/tunnel_cache: ValidateTunnelPeer response cache with single-flight de-duplication and per-account eviction. - Local peerstore short-circuit: when the inbound IP isn't in the account roster, deny fast without an RPC. - proxy/server.go reports SupportsPrivateService=true and redacts the full ProxyMapping JSON from info logs (auth_token + header-auth hashed values now only at debug level). Identity forwarding - ValidateSessionJWT returns user_id, email, method, groups, group_names. sessionkey.Claims carries Email + Groups + GroupNames so the proxy can stamp identity onto upstream requests without an extra management round-trip on every cookie-bearing request. - CapturedData carries userEmail / userGroups / userGroupNames; the proxy stamps X-NetBird-User and X-NetBird-Groups on r.Out from the authenticated identity (strips client-supplied values first to prevent spoofing). - AccessLog.UserGroups: access-log enrichment captures the user's group memberships at write time so the dashboard can render group context without reverse-resolving stale memberships. OpenAPI/dashboard surface - ReverseProxyService gains private + access_groups; ReverseProxyCluster gains private + supports_private. ReverseProxyTarget target_type enum gains "cluster". ServiceTargetOptions gains direct_upstream. ProxyAccessLog gains user_groups.	2026-05-25 17:41:50 +02:00
Zoltán Papp	341848b1ae	fix lint issues in session watcher tests and status humaniser	2026-05-20 18:46:56 +02:00
Zoltán Papp	ef6b4f7538	add SSO session extend flow Adds an end-to-end SSO session-extension feature: the management server publishes per-peer session deadlines on every Login/Sync, a new ExtendAuthSession RPC refreshes the deadline using a fresh JWT without tearing down the tunnel, and the daemon tracks the deadline locally so the UI can fire a T-10min warning toast with an interactive "Extend now" action.	2026-05-20 16:43:14 +02:00
Viktor Liu	9192b4f029	[client] Bump macOS sleep callback timeout to 20s (#6220)	2026-05-20 13:09:22 +02:00
Zoltán Papp	f468f15a30	Merge branch 'main' into ui-refactor # Conflicts: # client/ui/network.go	2026-05-18 10:24:31 +02:00
Viktor Liu	9ed2e2a5b4	[client] Drop DNS probes for passive health projection (#5971)	2026-05-15 17:07:38 +02:00
Viktor Liu	2ccae7ec47	[client] Mirror v4 exit selection onto v6 pair and honour SkipAutoApply per route (#6150)	2026-05-15 16:58:47 +02:00
Viktor Liu	3f914090cb	[client] Bracket IPv6 in embed listeners, expand debug bundle (#6134)	2026-05-14 16:22:53 +02:00
Zoltan Papp	d841a6aa07	[client] Push status snapshot on every state.Set and classify SSO errors Two related daemon-side status-stream fixes that together keep the UI's status in sync with the daemon's contextState: * state.Set previously only mutated the in-memory enum — transitions that weren't accompanied by a Mark{Management,Signal,...} call (e.g. StatusNeedsLogin after a PermissionDenied login, StatusLoginFailed after OAuth init failure, StatusIdle in the Login defer) left the UI stuck on the previous snapshot until an unrelated peer event happened to fire notifyStateChange. Add a callback on contextState fired from Set (outside the mutex, to avoid lock-order issues with the recorder's stateChangeMux), and wire it in Server.Start to the recorder's new public NotifyStateChange. Every state.Set callsite now pushes automatically; new ones don't need to opt in. * WaitSSOLogin's WaitToken error branch lumped every failure into StatusLoginFailed, including context.Canceled aborts from a parallel profile switch (actCancel/waitCancel). That spurious LoginFailed then wedged the new profile's Up RPC with "up already in progress: current status LoginFailed". Split the branch by error type: context.Canceled lets the top-level defer pick StatusIdle, context.DeadlineExceeded sets StatusNeedsLogin (retryable; OAuth device-code window just expired), other errors keep LoginFailed (real auth/IO failures). Document the full state-transition table in the function godoc.	2026-05-14 14:51:51 +02:00
Zoltan Papp	e3efaa5e59	[client] Fix tray flicker and stuck Connecting during management retry The status snapshot tore down on every management retry because state.Status() blanks the status when an error is wrapped, and the SubscribeStatus stream propagated that as FailedPrecondition. The UI treated any stream error as "daemon not running" and flickered the tray to Not running between retries. Disconnect was also unresponsive: Down set Idle before the retry goroutine exited, which then overwrote it with Set(Connecting) on the next attempt; the backoff sleep (up to 15s) wasn't context-aware, so the goroutine kept running long after actCancel. - buildStatusResponse falls back to the underlying status (via new state.CurrentStatus) instead of breaking the stream on wrapped errors. - UI only flips to DaemonUnavailable on codes.Unavailable / non-status errors, so a live daemon returning FailedPrecondition is not reported as down. - connect retry uses backoff.WithContext so actCancel interrupts the inter-attempt sleep, and skips Wrap(err) when the dial fails due to ctx cancellation. - Down sets Idle after waiting for giveUpChan, so the retry goroutine can no longer race the disconnect. - Tray hides Connect during Connecting and keeps Disconnect enabled so the user can abort an in-flight connection attempt.	2026-05-12 20:38:30 +02:00
Zoltan Papp	7a9f5a734f	Merge branch 'main' into ui-refactor Port IPv6 overlay support (#5631) into the Wails UI: - Add DisableIPv6 config toggle to Settings (NetworkTab + services) - Filter ::/0 alongside 0.0.0.0/0 as an exit-node route - Suppress duplicate v6 default-route notifications in tray	2026-05-11 14:10:12 +02:00
Viktor Liu	a4114a5e45	[client] Skip DNS upstream failover on definitive EDE (#6089)	2026-05-11 10:00:23 +02:00
Viktor Liu	205ebcfda2	[management, client] Add IPv6 overlay support (#5631)	2026-05-07 11:33:37 +02:00
Viktor Liu	f532976e05	[client] Add public key to debug bundle config.txt (#6092)	2026-05-06 13:42:47 +02:00
Viktor Liu	71a400f90f	[client] Include MTU and SSH auth/JWT cache config in debug bundle (#6071)	2026-05-06 13:23:43 +02:00
Zoltán Papp	a8812d5fb1	Merge remote-tracking branch 'origin/main' into ui-refactor # Conflicts: # go.mod # go.sum	2026-05-05 15:41:59 +02:00
Viktor Liu	cd8e71002f	[client] Bump go-netroute to v0.4.0 and drop fork (#6062)	2026-05-05 15:26:27 +02:00
Zoltán Papp	4c743bc03d	Merge remote-tracking branch 'origin/main' into ui-refactor # Conflicts: # client/internal/peer/status.go # client/proto/daemon.pb.go # client/proto/daemon_grpc.pb.go # go.mod	2026-05-05 12:49:09 +02:00
alexsavio	bde632c3b2	[client] Replace WG interface monitor polling with netlink subscription on Linux (#5857)	2026-05-04 18:49:39 +02:00
Zoltan Papp	a21f6ecb0a	[client] release Status.mux before invoking notifier callbacks (#6039 ) The Status recorder used to fire notifier callbacks while holding d.mux: - notifyPeerListChanged / notifyPeerStateChangeListeners ran from inside the locked section of every Update/AddPeerStateRoute/etc. - notifyAddressChanged ran from UpdateLocalPeerState and CleanLocalPeerState while d.mux was held. - onConnectionChanged was registered with a defer above defer d.mux.Unlock, so it executed before the mutex was released in the MarkConnected/ Disconnected helpers. - notifyPeerStateChangeListeners did a blocking channel send under d.mux, so a slow subscriber stalled every other d.mux holder. A listener that re-enters the recorder (e.g. calls GetFullStatus from within a callback) deadlocks against d.mux, and any callback that takes longer than expected stalls every other state query for its duration. Capture the values needed for notification under the lock, release d.mux, then call the notifier. Build per-peer router-state snapshots inside the lock and dispatch them via dispatchRouterPeers afterwards. The router-peer channel send stays blocking, but now happens outside d.mux so a slow consumer cannot stall any other d.mux holder, and no peer state transitions are silently dropped. The notifier itself is unchanged: its internal state was already protected by its own locks, and the field d.notifier is set once in NewRecorder and never reassigned, so reading it without d.mux is safe. Also fix a pre-existing race in Test_notifier_RemoveListener / Test_notifier_SetListener: setListener spawns a goroutine that writes listener.peers, but the tests read listener.peers without waiting for it.	2026-05-04 11:59:01 +02:00
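The "capture under the lock, release, then notify" pattern from this commit can be sketched as below. The `recorder` type, `AddPeer`, and `PeerCount` are hypothetical stand-ins; the real recorder dispatches to several notifier paths that are not modelled here.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical, single-listener version of the status recorder.
type recorder struct {
	mu       sync.Mutex
	peers    []string
	onChange func(peers []string)
}

// AddPeer mutates state under mu, copies what the listener needs,
// releases mu, and only then invokes the callback.
func (r *recorder) AddPeer(name string) {
	r.mu.Lock()
	r.peers = append(r.peers, name)
	snapshot := append([]string(nil), r.peers...) // copy taken under the lock
	cb := r.onChange
	r.mu.Unlock()

	if cb != nil {
		cb(snapshot)
	}
}

func (r *recorder) PeerCount() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.peers)
}

func main() {
	r := &recorder{}
	r.onChange = func(peers []string) {
		// Re-entering the recorder is safe because mu is no longer held.
		fmt.Println("peers:", peers, "count:", r.PeerCount())
	}
	r.AddPeer("peer-a")
}
```

Calling `cb` before `r.mu.Unlock()` would make the re-entrant `PeerCount` call block forever, which is the deadlock the commit describes.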
Viktor Liu	50b58a6828	[client, relay] Advertise relay server IP via signal for foreign-relay fallback dial (#6004)	2026-05-04 11:40:25 +02:00
Viktor Liu	057d651d2e	[client, proxy] Add packet capture to debug bundle and CLI (#5891)	2026-05-04 11:28:56 +02:00
Zoltán Papp	88a2bf582d	[client] Push-based status stream for the Wails UI Adds a SubscribeStatus gRPC RPC that pushes a fresh FullStatus snapshot on every peer-recorder state change, replacing the Wails UI's 2-second Status poll. The daemon's notifier already triggers on Connected / Disconnected / Connecting / management or signal flip / address change / peers-list change; we now coalesce those into ticks on a buffered chan and stream the resulting snapshots over gRPC. - Status recorder gains SubscribeToStateChanges / UnsubscribeFromStateChanges + a non-blocking notifyStateChange that drops ticks when a subscriber's 1-slot buffer is full (next snapshot the consumer pulls already reflects everything). - Server.Status handler split: the snapshot composition is shared with the new SubscribeStatus stream handler so unary and stream paths return identical bytes. - UI peers service: pollLoop replaced by statusStreamLoop. The local name of the existing SubscribeEvents loop is now toastStreamLoop so the two streams are easy to tell apart — the underlying RPC name is unchanged. - Tray applyStatus skips the icon refresh when connected/lastStatus hasn't changed; rapid SubscribeStatus bursts during health probes no longer churn Shell_NotifyIcon or the log.	2026-04-30 11:45:43 +02:00
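The tick coalescing described in this commit (a non-blocking send into a 1-slot buffer) can be sketched like this. The `stateNotifier` type and its method set are hypothetical; only the idea of dropping a tick when one is already pending is taken from the message.

```go
package main

import (
	"fmt"
	"sync"
)

// stateNotifier is a hypothetical fan-out of "state changed" ticks.
type stateNotifier struct {
	mu   sync.Mutex
	subs map[chan struct{}]struct{}
}

func newStateNotifier() *stateNotifier {
	return &stateNotifier{subs: make(map[chan struct{}]struct{})}
}

// Subscribe returns a channel with a single buffered slot.
func (n *stateNotifier) Subscribe() chan struct{} {
	ch := make(chan struct{}, 1)
	n.mu.Lock()
	n.subs[ch] = struct{}{}
	n.mu.Unlock()
	return ch
}

func (n *stateNotifier) Unsubscribe(ch chan struct{}) {
	n.mu.Lock()
	delete(n.subs, ch)
	n.mu.Unlock()
}

// notifyStateChange never blocks: if a tick is already pending for a
// subscriber, the new one is dropped, since the next snapshot the consumer
// pulls will already reflect the latest state.
func (n *stateNotifier) notifyStateChange() {
	n.mu.Lock()
	defer n.mu.Unlock()
	for ch := range n.subs {
		select {
		case ch <- struct{}{}:
		default:
		}
	}
}

func main() {
	n := newStateNotifier()
	ch := n.Subscribe()
	for i := 0; i < 3; i++ {
		n.notifyStateChange() // a burst of three changes
	}
	fmt.Println("pending ticks:", len(ch))
}
```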
Viktor Liu	ed828b7af4	Tolerate EEXIST when adding macOS scoped default routes (#6027)	2026-04-29 16:08:47 +02:00
Viktor Liu	11ac2af2f5	Use BindListener for all userspace bind in lazyconn activity (#6028)	2026-04-29 16:07:33 +02:00
Bethuel Mmbaga	df197d5001	[management] Prevent JWT reuse during peer login (#6002)	2026-04-29 15:04:27 +03:00
Viktor Liu	407e9d304b	[client] Move macOS sleep detection into the daemon (purego) (#5926)	2026-04-29 08:09:55 +02:00
Zoltan Papp	8fc4265995	[relay] evict foreign client cache on disconnect (#6015 ) * [relay] evict foreign client cache on disconnect When a foreign relay's TCP connection drops, the manager's onServerDisconnected handler only triggered reconnect logic for the home server; the disconnected foreign entry stayed in the relayClients cache. Subsequent OpenConn calls reused the closed client until the 60-second cleanup tick evicted it, breaking peer connectivity through that relay for up to a minute. Evict the foreign entry from the cache on disconnect so the next OpenConn dials a fresh client. Also: - Make the reconnect backoff cap configurable via WithMaxBackoffInterval ManagerOption; the previous hard-coded 60s constant forced TestAutoReconnect to sleep ~61s. Test now polls Ready() and finishes in ~2s. - Add NB_HOME_RELAY_SERVERS env var that overrides the relay URL list received from management, so a peer can be pinned to a specific home relay (used by the netbird-conn-lab Edge 4 reproducer). * [client] treat empty NB_HOME_RELAY_SERVERS as unset Returning (urls=[], ok=true) when the env var contained only separators or whitespace caused callers to wipe the mgmt-provided relay list, leaving the peer with no relays. Treat a parsed-empty result the same as an unset env.	2026-04-28 15:04:48 +02:00
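The "parsed-empty means unset" rule for the relay override can be sketched as a pure parsing helper. `parseRelayList` is a hypothetical name, the comma separator is an assumption (the commit only says "separators or whitespace"), and the example URLs are made up; a caller would feed it the value of `NB_HOME_RELAY_SERVERS`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRelayList splits an override value into relay URLs. It returns
// ok=false when nothing usable remains, so callers keep the list received
// from management instead of wiping it.
func parseRelayList(raw string) ([]string, bool) {
	var urls []string
	for _, part := range strings.Split(raw, ",") { // comma is an assumed separator
		if u := strings.TrimSpace(part); u != "" {
			urls = append(urls, u)
		}
	}
	if len(urls) == 0 {
		return nil, false
	}
	return urls, true
}

func main() {
	urls, ok := parseRelayList(" , ,")
	fmt.Println(urls, ok)

	urls, ok = parseRelayList("rel://relay-1.example:80, rel://relay-2.example:80")
	fmt.Println(urls, ok)
}
```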
Viktor Liu	801de8c68d	[client] Add TTL-based refresh to mgmt DNS cache via handler chain (#5945)	2026-04-22 15:10:14 +02:00
Zoltan Papp	1165058fad	[client] fix port collision in TestUpload (#5950 ) * [debug] fix port collision in TestUpload TestUpload hardcoded :8080, so it failed deterministically when anything was already on that port and collided across concurrent test runs. Bind a :0 listener in the test to get a kernel-assigned free port, and add Server.Serve so tests can hand the listener in without reaching into unexported state. * [debug] drop test-only Server.Serve, use SERVER_ADDRESS env The previous commit added a Server.Serve method on the upload-server, used only by TestUpload. That left production with an unused function. Reserve an ephemeral loopback port in the test, release it, and pass the address through SERVER_ADDRESS (which the server already reads). A small wait helper ensures the server is accepting connections before the upload runs, so the close/rebind gap does not cause a false failure.	2026-04-21 19:07:20 +02:00
Viktor Liu	064ec1c832	[client] Trust wg interface in firewalld to bypass owner-flagged chains (#5928)	2026-04-21 17:57:16 +02:00
Viktor Liu	75e408f51c	[client] Prefer systemd-resolved stub over file mode regardless of resolv.conf header (#5935)	2026-04-21 17:56:56 +02:00
Zoltan Papp	5a89e6621b	[client] Supress ICE signaling (#5820 ) * [client] Suppress ICE signaling and periodic offers in force-relay mode When NB_FORCE_RELAY is enabled, skip WorkerICE creation entirely, suppress ICE credentials in offer/answer messages, disable the periodic ICE candidate monitor, and fix isConnectedOnAllWay to only check relay status so the guard stops sending unnecessary offers. * [client] Dynamically suppress ICE based on remote peer's offer credentials Track whether the remote peer includes ICE credentials in its offers/answers. When remote stops sending ICE credentials, skip ICE listener dispatch, suppress ICE credentials in responses, and exclude ICE from the guard connectivity check. When remote resumes sending ICE credentials, re-enable all ICE behavior. * [client] Fix nil SessionID panic and force ICE teardown on relay-only transition Fix nil pointer dereference in signalOfferAnswer when SessionID is nil (relay-only offers). Close stale ICE agent immediately when remote peer stops sending ICE credentials to avoid traffic black-hole during the ICE disconnect timeout. * [client] Add relay-only fallback check when ICE is unavailable Ensure the relay connection is supported with the peer when ICE is disabled to prevent connectivity issues. * [client] Add tri-state connection status to guard for smarter ICE retry (#5828) * [client] Add tri-state connection status to guard for smarter ICE retry Refactor isConnectedOnAllWay to return a ConnStatus enum (Connected, Disconnected, PartiallyConnected) instead of a boolean. When relay is up but ICE is not (PartiallyConnected), limit ICE offers to 3 retries with exponential backoff then fall back to hourly attempts, reducing unnecessary signaling traffic. Fully disconnected peers continue to retry aggressively. External events (relay/ICE disconnect, signal/relay reconnect) reset retry state to give ICE a fresh chance. 
* [client] Clarify guard ICE retry state and trace log trigger Split iceRetryState.attempt into shouldRetry (pure predicate) and enterHourlyMode (explicit state transition) so the caller in reconnectLoopWithRetry reads top-to-bottom. Restore the original trace-log behavior in isConnectedOnAllWay so it only logs on full disconnection, not on the new PartiallyConnected state. * [client] Extract pure evalConnStatus and add unit tests Split isConnectedOnAllWay into a thin method that snapshots state and a pure evalConnStatus helper that takes a connStatusInputs struct, so the tri-state decision logic can be exercised without constructing full Worker or Handshaker objects. Add table-driven tests covering force-relay, ICE-unavailable and fully-available code paths, plus unit tests for iceRetryState budget/hourly transitions and reset. * [client] Improve grammar in logs and refactor ICE credential checks	2026-04-21 15:52:08 +02:00
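The tri-state decision described in this entry can be sketched as a pure function. The names `ConnStatus`, `evalConnStatus`, and `connStatusInputs` come from the commit message, but the struct fields and the exact rules below are assumptions; in particular, treating "ICE up, relay down" as Disconnected is a guess, since the message only defines PartiallyConnected as relay up without ICE.

```go
package main

import "fmt"

// ConnStatus is the tri-state result named in the commit message.
type ConnStatus int

const (
	Disconnected ConnStatus = iota
	PartiallyConnected
	Connected
)

// connStatusInputs is a snapshot of the state the decision needs.
// All field names are hypothetical.
type connStatusInputs struct {
	forceRelay     bool // force-relay mode enabled locally
	iceAvailable   bool // remote peer still sends ICE credentials
	relayConnected bool
	iceConnected   bool
}

// evalConnStatus holds no locks and touches no workers, so it can be
// covered by table-driven tests.
func evalConnStatus(in connStatusInputs) ConnStatus {
	// Relay-only modes: ICE is excluded from the check.
	if in.forceRelay || !in.iceAvailable {
		if in.relayConnected {
			return Connected
		}
		return Disconnected
	}
	switch {
	case in.relayConnected && in.iceConnected:
		return Connected
	case in.relayConnected:
		return PartiallyConnected // relay up, ICE not yet
	default:
		return Disconnected // assumed: no relay means retry aggressively
	}
}

func main() {
	in := connStatusInputs{iceAvailable: true, relayConnected: true}
	fmt.Println("partially connected:", evalConnStatus(in) == PartiallyConnected)
}
```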
Zoltan Papp	7f023ce801	[client] Android debug bundle support (#5888) Add Android debug bundle support with Troubleshoot UI	2026-04-20 11:26:30 +02:00
Viktor Liu	2e0e3a3601	[client] Replace exclusion routes with scoped default + IP_BOUND_IF on macOS (#5918)	2026-04-20 10:01:01 +02:00
Maycon Santos	53b04e512a	[management] Reuse a single cache store across all management server consumers (#5889 ) * Add support for legacy IDP cache environment variable * Centralize cache store creation to reuse a single Redis connection pool Each cache consumer (IDP cache, token store, PKCE store, secrets manager, EDR validator) was independently calling NewStore, creating separate Redis clients with their own connection pools — up to 1400 potential connections from a single management server process. Introduce a shared CacheStore() singleton on BaseServer that creates one store at boot and injects it into all consumers. Consumer constructors now receive a store.StoreInterface instead of creating their own. For Redis mode, all consumers share one connection pool (1000 max conns). For in-memory mode, all consumers share one GoCache instance. * Update management-integrations module to latest version * sync go.sum * Export `GetAddrFromEnv` to allow reuse across packages * Update management-integrations module version in go.mod and go.sum * Update management-integrations module version in go.mod and go.sum	2026-04-16 16:04:53 +02:00
Viktor Liu	633dde8d1f	[client] Reconnect conntrack netlink listener on error (#5885)	2026-04-16 22:30:36 +09:00
Viktor Liu	0d86de47df	[client] Add PCP support (#5219)	2026-04-15 11:43:16 +02:00
Zoltan Papp	7483fec048	Fix Android internet blackhole caused by stale route re-injection on TUN rebuild (#5865 ) extraInitialRoutes() was meant to preserve only the fake IP route (240.0.0.0/8) across TUN rebuilds, but it re-injected any initial route missing from the current set. When the management server advertised exit node routes (0.0.0.0/0) that were later filtered by the route selector, extraInitialRoutes() re-added them, causing the Android VPN to capture all traffic with no peer to handle it. Store the fake IP route explicitly and append only that in notify(), removing the overly broad initial route diffing.	2026-04-13 09:38:38 +02:00


818 Commits