When an exit-node peer's network-map installs a 0.0.0.0/0 default route
on the overlay interface before that peer's WireGuard key material is
active, any UDP socket dialing an off-link address is routed into wt0
and the kernel returns ENOKEY.
Two places needed fixing:
1. The mgmt cache refresh path. It reactively refreshes the
control-plane FQDNs advertised by the mgmt (api/signal/stun/turn/
the Relay pool root) after the daemon has installed its own
resolv.conf pointing at the overlay listener. Previously the
refresh dial followed the chain's upstream handler, which followed
the overlay default route and deadlocked on ENOKEY.
2. Foreign relay FQDN resolution. When a remote peer is homed on a
different relay instance than us, we need to resolve a streamline-*
subdomain that is not in the cache. That lookup went through the
same overlay-routed upstream and failed identically, deadlocking
the exit-node test whenever the relay LB put the two peers on
different instances.
Fix both by giving the mgmt cache a dedicated net.Resolver that dials
the original pre-NetBird system nameservers through nbnet.NewDialer.
The dialer marks the socket as control-plane (SO_MARK on Linux,
IP_BOUND_IF on darwin, IP_UNICAST_IF on Windows); the routemanager's
policy rules keep those sockets on the underlay regardless of the
overlay default.
Pool-root domains (the Relay entries in ServerDomains) now register
through a subdomain-matching wrapper so that instance subdomains like
streamline-de-fra1-0.relay.netbird.io also hit the mgmt cache handler.
On cache miss under a pool root, ServeDNS resolves the FQDN on demand
through the bypass resolver, caches the result, and returns it.
Pool-root membership is derived dynamically from mgmt-advertised
ServerDomains.Relay[] — no hardcoded domain lists, no protocol change.
No hardcoded fallback nameservers: if the host had no original system
resolver at all, the bypass resolver stays nil and the stale-while-
revalidate cache keeps serving. The general upstream forwarder and
the user DNS path are unchanged.
* [debug] fix port collision in TestUpload
TestUpload hardcoded :8080, so it failed whenever anything else was
already listening on that port, and concurrent test runs collided.
Bind a :0 listener in the test to get a kernel-assigned free port, and
add Server.Serve so tests can hand the listener in without reaching
into unexported state.
* [debug] drop test-only Server.Serve, use SERVER_ADDRESS env
The previous commit added a Server.Serve method on the upload-server,
used only by TestUpload. That left production with an unused function.
Reserve an ephemeral loopback port in the test, release it, and pass
the address through SERVER_ADDRESS (which the server already reads).
A small wait helper ensures the server is accepting connections before
the upload runs, so the close/rebind gap does not cause a false failure.
The test wrote 500 packets per family and asserted exact-count
delivery within a 5s window, even though its own comment says "Some
packet loss is acceptable for UDP". On FreeBSD/QEMU runners the writer
loops cannot always finish all 500 before the 5s deadline closes the
readers (we have seen 411/500 in CI).
The real assertion of this test is the routing check — the IPv4 peer
only gets v4 packets, the IPv6 peer only gets v6 packets — which remains
strict. Replace the exact-count assertions with a >=80% delivery
threshold so runner speed variance no longer causes false failures.
* [client] Suppress ICE signaling and periodic offers in force-relay mode
When NB_FORCE_RELAY is enabled, skip WorkerICE creation entirely,
suppress ICE credentials in offer/answer messages, disable the
periodic ICE candidate monitor, and fix isConnectedOnAllWay to
only check relay status so the guard stops sending unnecessary offers.
* [client] Dynamically suppress ICE based on remote peer's offer credentials
Track whether the remote peer includes ICE credentials in its
offers/answers. When remote stops sending ICE credentials, skip
ICE listener dispatch, suppress ICE credentials in responses, and
exclude ICE from the guard connectivity check. When remote resumes
sending ICE credentials, re-enable all ICE behavior.
* [client] Fix nil SessionID panic and force ICE teardown on relay-only transition
Fix nil pointer dereference in signalOfferAnswer when SessionID is nil
(relay-only offers). Close stale ICE agent immediately when remote peer
stops sending ICE credentials to avoid traffic black-hole during the
ICE disconnect timeout.
* [client] Add relay-only fallback check when ICE is unavailable
Ensure the relay connection is supported with the peer when ICE is disabled to prevent connectivity issues.
* [client] Add tri-state connection status to guard for smarter ICE retry (#5828)
* [client] Add tri-state connection status to guard for smarter ICE retry
Refactor isConnectedOnAllWay to return a ConnStatus enum (Connected,
Disconnected, PartiallyConnected) instead of a boolean. When relay is
up but ICE is not (PartiallyConnected), limit ICE offers to 3 retries
with exponential backoff then fall back to hourly attempts, reducing
unnecessary signaling traffic. Fully disconnected peers continue to
retry aggressively. External events (relay/ICE disconnect, signal/relay
reconnect) reset retry state to give ICE a fresh chance.
* [client] Clarify guard ICE retry state and trace log trigger
Split iceRetryState.attempt into shouldRetry (pure predicate) and
enterHourlyMode (explicit state transition) so the caller in
reconnectLoopWithRetry reads top-to-bottom. Restore the original
trace-log behavior in isConnectedOnAllWay so it only logs on full
disconnection, not on the new PartiallyConnected state.
* [client] Extract pure evalConnStatus and add unit tests
Split isConnectedOnAllWay into a thin method that snapshots state and
a pure evalConnStatus helper that takes a connStatusInputs struct, so
the tri-state decision logic can be exercised without constructing
full Worker or Handshaker objects. Add table-driven tests covering
force-relay, ICE-unavailable and fully-available code paths, plus
unit tests for iceRetryState budget/hourly transitions and reset.
* [client] Improve grammar in logs and refactor ICE credential checks
* fix(client): skip MAC address filter for network addresses on iOS
iOS does not expose hardware (MAC) addresses due to Apple's privacy
restrictions (since iOS 14), causing networkAddresses() to return an
empty list because all interfaces are filtered out by the HardwareAddr
check. Move networkAddresses() to platform-specific files so iOS can
skip this filter.
WGIface.Close() took w.mu and held it across w.tun.Close(). The
underlying wireguard-go device waits for its send/receive goroutines to
drain before Close() returns, and some of those goroutines re-enter
WGIface during shutdown. In particular, the userspace packet filter DNS
hook in client/internal/dns.ServiceViaMemory.filterDNSTraffic calls
s.wgInterface.GetDevice() on every packet, which also needs w.mu. With
the Close-side holding the mutex, the read goroutine blocks in
GetDevice and Close waits forever for that goroutine to exit:
goroutine N (TestDNSPermanent_updateUpstream):
WGIface.Close -> holds w.mu -> tun.Close -> sync.WaitGroup.Wait
goroutine M (wireguard read routine):
FilteredDevice.Read -> filterOutbound -> udpHooksDrop ->
filterDNSTraffic.func1 -> WGIface.GetDevice -> sync.Mutex.Lock
This surfaces as a 5-minute test timeout on the macOS Client/Unit
CI job (panic: test timed out after 5m0s, running tests:
TestDNSPermanent_updateUpstream).
Release w.mu before calling w.tun.Close(). The other Close steps
(wgProxyFactory.Free, waitUntilRemoved, Destroy) do not mutate any
fields guarded by w.mu beyond what Free() already does, so the lock
is not needed once the tun has started shutting down. A new unit test
in iface_close_test.go uses a fake WGTunDevice to reproduce the
deadlock deterministically without requiring CAP_NET_ADMIN.
* Add support for legacy IDP cache environment variable
* Centralize cache store creation to reuse a single Redis connection pool
Each cache consumer (IDP cache, token store, PKCE store, secrets manager,
EDR validator) was independently calling NewStore, creating separate Redis
clients with their own connection pools — up to 1400 potential connections
from a single management server process.
Introduce a shared CacheStore() singleton on BaseServer that creates one
store at boot and injects it into all consumers. Consumer constructors now
receive a store.StoreInterface instead of creating their own.
For Redis mode, all consumers share one connection pool (1000 max conns).
For in-memory mode, all consumers share one GoCache instance.
* Update management-integrations module to latest version
* sync go.sum
* Export `GetAddrFromEnv` to allow reuse across packages
* Update management-integrations module version in go.mod and go.sum
* Update management-integrations module version in go.mod and go.sum
The iOS GetInfo() function never populated NetworkAddresses, causing
the peer_network_range_check posture check to fail for all iOS clients.
This adds the same networkAddresses() call that macOS, Linux, Windows,
and FreeBSD already use.
Fixes: #3968
Fixes: #4657
extraInitialRoutes() was meant to preserve only the fake IP route
(240.0.0.0/8) across TUN rebuilds, but it re-injected any initial
route missing from the current set. When the management server
advertised exit node routes (0.0.0.0/0) that were later filtered
by the route selector, extraInitialRoutes() re-added them, causing
the Android VPN to capture all traffic with no peer to handle it.
Store the fake IP route explicitly and append only that in notify(),
removing the overly broad initial route diffing.
- Add GetSelectedClientRoutes() to the route manager that filters through FilterSelectedExitNodes, returning only active routes instead of all management routes
- Use GetSelectedClientRoutes() in the DNS route checker so deselected exit nodes' 0.0.0.0/0 no longer matches upstream DNS IPs — this prevented the resolver from switching
away from the utun-bound socket after exit node deselection
- Initialize iOS DNS server with host DNS fallback addresses (1.1.1.1:53, 1.0.0.1:53) and a permanent root zone handler, matching Android's behavior — without this, unmatched
DNS queries arriving via the 0.0.0.0/0 tunnel route had no handler and were silently dropped
Update the mgmProber interface to use HealthCheck() instead of the
now-unexported GetServerPublicKey(), aligning with the changes in the
management client API.
* Unexport GetServerPublicKey, add HealthCheck method
Internalize server key fetching into Login, Register,
GetDeviceAuthorizationFlow, and GetPKCEAuthorizationFlow methods,
removing the need for callers to fetch and pass the key separately.
Replace the exported GetServerPublicKey with a HealthCheck() error
method for connection validation, keeping IsHealthy() bool for
non-blocking background monitoring.
Fix test encryption to use correct key pairs (client public key as
remotePubKey instead of server private key).
* Refactor `doMgmLogin` to return only error, removing unused response
- DNS resolution broke after deselecting an exit node because the route checker used all client routes (including deselected ones) to decide how to forward upstream DNS
queries
- Added GetSelectedClientRoutes() to the route manager that filters out deselected exit nodes, and switched the DNS route checker to use it
- Confirmed fix via device testing: after deselecting exit node, DNS queries now correctly use a regular network socket instead of binding to the utun interface
* [client] Support embed.Client on Android with netstack mode
embed.Client.Start() calls ConnectClient.Run() which passes an empty
MobileDependency{}. On Android, the engine dereferences nil fields
(IFaceDiscover, NetworkChangeListener, DnsReadyListener) causing panics.
Provide complete no-op stubs so the engine's existing Android code
paths work unchanged — zero modifications to engine.go:
- Add androidRunOverride hook in Run() for Android-specific dispatch
- Add runOnAndroidEmbed() with complete MobileDependency (all stubs)
- Wire default stubs via init() in connect_android_default.go:
noopIFaceDiscover, noopNetworkChangeListener, noopDnsReadyListener
- Forward logPath to c.run()
Tested: embed.Client starts on Android arm64, joins mesh via relay,
discovers peers, localhost proxy works for TCP+UDP forwarding.
* [client] Fix TestServiceParamsPath for Windows path separators
Use filepath.Join in test assertions instead of hardcoded POSIX paths
so the test passes on Windows where filepath.Join uses backslashes.
Remove client secret from gRPC auth flow. The secret was originally included to support providers like Google Workspace that don't offer a proper PKCE flow, but this is no longer necessary with the embedded IdP. Deployments using such providers should migrate to the embedded IdP instead.
* Simplify Android ConnStatus API with integer constants
Replace the dual-field PeerInfo design with a unified integer-based
ConnStatus field and exported gomobile-friendly constants.
Changes:
- PeerInfo.ConnStatus: changed from string to int
- Export three constants: ConnStatusIdle, ConnStatusConnecting,
  ConnStatusConnected (mapped to peer.ConnStatus enum values)
- Update PeersList() to convert the peer enum directly to int
Benefits:
- Simpler API surface with a single ConnStatus field
- Better gomobile compatibility for cross-platform usage
- Type-safe integer constants across language boundaries
* test: add All group to setupTestAccount fixture
The setupTestAccount() test helper was missing the required "All" group,
causing "failed to get group all: no group ALL found" errors during
test execution. Add the All group with all test peers to match the
expected account structure.
Fixes the failing account and types package tests when GetGroupAll()
is called in test scenarios.
* [client] Add Expose support to embed library
Add ability to expose local services via the NetBird reverse proxy
from embedded client code.
Introduce ExposeSession with a blocking Wait method that keeps
the session alive until the context is cancelled.
Extract ProtocolType with ParseProtocolType into the expose package
and use it across CLI and embed layers.
* Fix TestNewRequest assertion to use ProtocolType instead of int
* Add documentation for Request and KeepAlive in expose manager
* Refactor ExposeSession to pass context explicitly in Wait method
* Refactor ExposeSession Wait method to explicitly pass context
* Update client/embed/expose.go
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Fix build
* Update client/embed/expose.go
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
---------
Co-authored-by: Viktor Liu <viktor@netbird.io>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Viktor Liu <17948409+lixmal@users.noreply.github.com>
* client/ui: fix Exit Node submenu separator accumulation on Windows
On Windows the tray uses a background poller (every 10s) instead of
TrayOpenedCh to keep the Exit Node menu fresh. Each poll that has a
selected exit node called s.mExitNode.AddSeparator() before the
"Deselect All" item. Because AddSeparator() returns no handle the
separator was never removed in the cleanup pass of
recreateExitNodeMenu(), while every other item (exit node checkboxes
and the "Deselect All" entry) was properly tracked and removed.
After the client has been running for a while with an exit node
selected this leaves hundreds of separator lines stacked in the
submenu, filling the screen height with blank entries (#4702).
On Linux/FreeBSD this is masked because the parent mExitNode item
itself is removed and recreated each cycle, wiping all children
including orphaned separators.
Fix: replace the untracked AddSeparator() call with a regular disabled
sub-menu item that is stored in mExitNodeSeparator and removed at the
start of each recreateExitNodeMenu() call alongside mExitNodeDeselectAll.
Fixes #4702
* client/ui: extract addExitNodeDeselectAll to reduce cognitive complexity
Move the separator + deselect-all creation and its goroutine listener
out of recreateExitNodeMenu into a dedicated helper, bringing the
function's cognitive complexity back under the SonarCloud threshold.