netbird

mirror of https://github.com/netbirdio/netbird.git synced 2026-05-04 16:16:40 +00:00

Author	SHA1	Message	Date
Viktor Liu	2cdd553048	Bump go-netroute to v0.4.0 and drop fork	2026-05-04 18:09:48 +02:00
Zoltan Papp	a547fc74ed	[client] Use ctx.Err() instead of gRPC codes.Canceled to detect shutdown (#6019) Detecting shutdown by inspecting the gRPC status code conflates a local context cancellation with a server- or proxy-sent codes.Canceled. When the latter occurs (e.g. an intermediary proxy resets the stream), the retry loop silently terminates and the client never reconnects. Switch to ctx.Err() in the signal Receive loop and management Sync/Job handlers, and stop matching gRPC Canceled/DeadlineExceeded in the flow client's isContextDone helper. With this change, a server-sent Canceled is treated as a transient error and the backoff retry loop continues.	2026-05-04 11:59:25 +02:00
Zoltan Papp	a21f6ecb0a	[client] release Status.mux before invoking notifier callbacks (#6039) The Status recorder used to fire notifier callbacks while holding d.mux: - notifyPeerListChanged / notifyPeerStateChangeListeners ran from inside the locked section of every Update/AddPeerStateRoute/etc. - notifyAddressChanged ran from UpdateLocalPeerState and CleanLocalPeerState while d.mux was held. - onConnectionChanged was registered with a defer above defer d.mux.Unlock, so it executed before the mutex was released in the MarkConnected/Disconnected helpers. - notifyPeerStateChangeListeners did a blocking channel send under d.mux, so a slow subscriber stalled every other d.mux holder. A listener that re-enters the recorder (e.g. calls GetFullStatus from within a callback) deadlocks against d.mux, and any callback that takes longer than expected stalls every other state query for its duration. Capture the values needed for notification under the lock, release d.mux, then call the notifier. Build per-peer router-state snapshots inside the lock and dispatch them via dispatchRouterPeers afterwards. The router-peer channel send stays blocking, but now happens outside d.mux so a slow consumer cannot stall any other d.mux holder, and no peer state transitions are silently dropped. The notifier itself is unchanged: its internal state was already protected by its own locks, and the field d.notifier is set once in NewRecorder and never reassigned, so reading it without d.mux is safe. Also fix a pre-existing race in Test_notifier_RemoveListener / Test_notifier_SetListener: setListener spawns a goroutine that writes listener.peers, but the tests read listener.peers without waiting for it.	2026-05-04 11:59:01 +02:00
Bethuel Mmbaga	6262b0d841	[management] Track pending approval in peer event metadata (#6040)	2026-05-04 12:47:13 +03:00
Viktor Liu	50b58a6828	[client, relay] Advertise relay server IP via signal for foreign-relay fallback dial (#6004)	2026-05-04 11:40:25 +02:00
Viktor Liu	057d651d2e	[client, proxy] Add packet capture to debug bundle and CLI (#5891)	2026-05-04 11:28:56 +02:00
Misha Bragin	c4b2da4c92	[management] Add public connection ipv4 and ipv6 posture check (#6038) This change enables admins to configure posture checks for connecting public IPs of their peers. It changes the behavior of the check as well and now the evaluation is if the received network is part of the configured network.	2026-04-30 18:36:50 +02:00
Nicolas Frati	dcd1db42ef	[management] Enable PAT creation during setup (#6003) * enable pat creation on setup * remove logic from handler towards setup service * fix lint issue * fix rollback on account id returning empty * fix coderabbit comments * fix setup PAT rollback behavior	2026-04-30 17:21:35 +02:00
Pascal Fischer	f29f5a0978	[management] add monitoring for nmap update source (#6036)	2026-04-30 14:52:54 +02:00
Maycon Santos	3fc5a8d4a1	[misc] fix MSI generation add installer tests (#6031) Add Windows installer build test workflow v0.70.4	2026-04-29 23:44:38 +02:00
Zoltan Papp	57945fc328	[client] Trigger mobile submodule bump PRs on release tags (#6029) Trigger mobile submodule bump PRs on release tags v0.70.3	2026-04-29 17:19:22 +02:00
Viktor Liu	ed828b7af4	Tolerate EEXIST when adding macOS scoped default routes (#6027)	2026-04-29 16:08:47 +02:00
Viktor Liu	11ac2af2f5	Use BindListener for all userspace bind in lazyconn activity (#6028)	2026-04-29 16:07:33 +02:00
Bethuel Mmbaga	df197d5001	[management] Prevent JWT reuse during peer login (#6002)	2026-04-29 15:04:27 +03:00
shuuri-labs	ad93dcf980	[client] Enable UI autostart for silent and MSI installs (#6026) * fix(client): enable UI autostart for silent and MSI installs The MSI installer had no autostart logic and the EXE silent installer skipped the autostart page, leaving the registry entry unwritten. This caused the NetBird UI tray to not start at login after RMM deployments. Add an AUTOSTART property (default: 1) to the MSI that writes the HKLM Run key, and initialize AutostartEnabled in the NSIS .onInit so silent installs match the interactive default. * add real guid for NetBirdAutoStart component	2026-04-29 13:14:46 +02:00
Nicolas Frati	7eba5dafd8	[misc] Add comment automation on release workflow for PRs (#6016) * feat: add comment automation on release workflow for PRs * update action permissions v0.70.2	2026-04-29 11:28:55 +02:00
Viktor Liu	28fe26637b	[client] Fix Windows installer upgrade detection for pre-0.70.1 installs (#6025)	2026-04-29 11:01:07 +02:00
Viktor Liu	407e9d304b	[client] Move macOS sleep detection into the daemon (purego) (#5926)	2026-04-29 08:09:55 +02:00
Viktor Liu	e5474e199f	[client] Use WinRT COM for Windows toasts (#6013) * Use WinRT COM for Windows toasts instead of fyne's PowerShell path * Quote autostart path and split HKCU registry into per-user component v0.70.1	2026-04-28 20:54:06 +02:00
Bethuel Mmbaga	db44848e2d	[management] Drop netmap calculation on peer read (#6006)	2026-04-28 18:25:56 +03:00
EL OUAZIZI Walid	9417ce3b3a	fix(getting-started): Infinite healthcheck loop with existing traefik (#5871)	2026-04-28 17:22:51 +02:00
Zoltan Papp	8fc4265995	[relay] evict foreign client cache on disconnect (#6015) * [relay] evict foreign client cache on disconnect When a foreign relay's TCP connection drops, the manager's onServerDisconnected handler only triggered reconnect logic for the home server; the disconnected foreign entry stayed in the relayClients cache. Subsequent OpenConn calls reused the closed client until the 60-second cleanup tick evicted it, breaking peer connectivity through that relay for up to a minute. Evict the foreign entry from the cache on disconnect so the next OpenConn dials a fresh client. Also: - Make the reconnect backoff cap configurable via WithMaxBackoffInterval ManagerOption; the previous hard-coded 60s constant forced TestAutoReconnect to sleep ~61s. Test now polls Ready() and finishes in ~2s. - Add NB_HOME_RELAY_SERVERS env var that overrides the relay URL list received from management, so a peer can be pinned to a specific home relay (used by the netbird-conn-lab Edge 4 reproducer). * [client] treat empty NB_HOME_RELAY_SERVERS as unset Returning (urls=[], ok=true) when the env var contained only separators or whitespace caused callers to wipe the mgmt-provided relay list, leaving the peer with no relays. Treat a parsed-empty result the same as an unset env.	2026-04-28 15:04:48 +02:00
Zoltan Papp	9c50819f20	Don't mark management disconnected on transient job stream errors (#6005) The JOB stream is a separate channel from the SYNC stream. Server-side EOF or other transient errors on the JOB stream do not indicate that the management connection is unhealthy — the SYNC stream remains the authoritative state signal. Previously, a JOB stream EOF would call notifyDisconnected and the client would emit OnConnecting to the UI. The backoff retry would reconnect the JOB stream, but handleJobStream never calls notifyConnected on success, so the UI was stuck on "Connecting" until the next SYNC event or health check. Keep notifyDisconnected for codes.PermissionDenied since IsLoginRequired relies on managementError to detect expired auth.	2026-04-28 15:04:41 +02:00
Bethuel Mmbaga	6f0eff3ba0	[management] Handle single-string JWT group claim from IdPs (#6014)	2026-04-28 14:48:28 +03:00
Bethuel Mmbaga	f8745723fc	[management] Add Microsoft AD FS support for embedded Dex identity providers (#6008)	2026-04-28 12:42:19 +03:00
Vlad	154b81645a	[management] removed legacy network map code (#5565)	2026-04-27 16:02:54 +02:00
Maycon Santos	34167c8a16	[misc] Update release pipeline version (#5995) v0.70.0	2026-04-27 10:55:38 +02:00
Maycon Santos	d6f08e4840	[misc] Update sign pipeline version (#5981)	2026-04-24 13:13:27 +02:00
Zoltan Papp	f732b01a05	[management] unify peer-update test timeout via constant (#5952) peerShouldReceiveUpdate waited 500ms for the expected update message, and every outer wrapper across the management/server test suite paired it with a 1s goroutine-drain timeout. Both were too tight for slower CI runners (MySQL, FreeBSD, loaded sqlite), producing intermittent "Timed out waiting for update message" failures in tests like TestDNSAccountPeersUpdate, TestPeerAccountPeersUpdate, and TestNameServerAccountPeersUpdate. Introduce peerUpdateTimeout (5s) next to the helper and use it both in the helper and in every outer wrapper so the two timeouts stay in sync. Only runs down on failure; passing tests return as soon as the channel delivers, so there is no slowdown on green runs.	2026-04-23 21:19:21 +02:00
alsruf36	c07c726ea7	[proxy] Set session cookie path to root (#5915)	2026-04-23 18:20:54 +02:00
Pascal Fischer	fa0d58d093	[management] exclude peers for expiration job that have already been marked expired (#5970)	2026-04-23 16:01:54 +02:00
Vlad	b6038e8acd	[management] refactor: changeable pat rate limiting (#5946)	2026-04-23 15:13:22 +02:00
Zoltan Papp	5da05ecca6	[client] increase gRPC health check timeout to 5s (#5961) Bump the IsHealthy() context timeout from 1s to 5s for both the management and signal gRPC clients to reduce false negatives on slower or congested connections.	2026-04-22 20:54:18 +02:00
Viktor Liu	801de8c68d	[client] Add TTL-based refresh to mgmt DNS cache via handler chain (#5945)	2026-04-22 15:10:14 +02:00
Viktor Liu	a822a33240	[self-hosted] Use cscli lapi status for CrowdSec readiness in installer (#5949)	2026-04-22 10:35:22 +02:00
Bethuel Mmbaga	57b23c5b25	[management] Propagate context changes to upstream middleware (#5956)	2026-04-21 23:06:52 +03:00
Zoltan Papp	1165058fad	[client] fix port collision in TestUpload (#5950) * [debug] fix port collision in TestUpload TestUpload hardcoded :8080, so it failed deterministically when anything was already on that port and collided across concurrent test runs. Bind a :0 listener in the test to get a kernel-assigned free port, and add Server.Serve so tests can hand the listener in without reaching into unexported state. * [debug] drop test-only Server.Serve, use SERVER_ADDRESS env The previous commit added a Server.Serve method on the upload-server, used only by TestUpload. That left production with an unused function. Reserve an ephemeral loopback port in the test, release it, and pass the address through SERVER_ADDRESS (which the server already reads). A small wait helper ensures the server is accepting connections before the upload runs, so the close/rebind gap does not cause a false failure.	2026-04-21 19:07:20 +02:00
Zoltan Papp	703353d354	[flow] fix goroutine leak in TestReceive_ProtocolErrorStreamReconnect (#5951) The Receive goroutine could outlive the test and call t.Logf after teardown, panicking with "Log in goroutine after ... has completed". Register a cleanup that waits for the goroutine to exit; ordering is LIFO so it runs after client.Close, which is what unblocks Receive.	2026-04-21 19:06:47 +02:00
Zoltan Papp	2fb50aef6b	[client] allow UDP packet loss in TestICEBind_HandlesConcurrentMixedTraffic (#5953) The test writes 500 packets per family and asserted exact-count delivery within a 5s window, even though its own comment says "Some packet loss is acceptable for UDP". On FreeBSD/QEMU runners the writer loops cannot always finish all 500 before the 5s deadline closes the readers (we have seen 411/500 in CI). The real assertion of this test is the routing check — IPv4 peer only gets v4- packets, IPv6 peer only gets v6- packets — which remains strict. Replace the exact-count assertions with a >=80% delivery threshold so runner speed variance no longer causes false failures.	2026-04-21 19:05:58 +02:00
Vlad	eb3aa96257	[management] check policy for changes before actual db update (#5405)	2026-04-21 18:37:04 +02:00
Viktor Liu	064ec1c832	[client] Trust wg interface in firewalld to bypass owner-flagged chains (#5928)	2026-04-21 17:57:16 +02:00
Viktor Liu	75e408f51c	[client] Prefer systemd-resolved stub over file mode regardless of resolv.conf header (#5935)	2026-04-21 17:56:56 +02:00
Zoltan Papp	5a89e6621b	[client] Supress ICE signaling (#5820) * [client] Suppress ICE signaling and periodic offers in force-relay mode When NB_FORCE_RELAY is enabled, skip WorkerICE creation entirely, suppress ICE credentials in offer/answer messages, disable the periodic ICE candidate monitor, and fix isConnectedOnAllWay to only check relay status so the guard stops sending unnecessary offers. * [client] Dynamically suppress ICE based on remote peer's offer credentials Track whether the remote peer includes ICE credentials in its offers/answers. When remote stops sending ICE credentials, skip ICE listener dispatch, suppress ICE credentials in responses, and exclude ICE from the guard connectivity check. When remote resumes sending ICE credentials, re-enable all ICE behavior. * [client] Fix nil SessionID panic and force ICE teardown on relay-only transition Fix nil pointer dereference in signalOfferAnswer when SessionID is nil (relay-only offers). Close stale ICE agent immediately when remote peer stops sending ICE credentials to avoid traffic black-hole during the ICE disconnect timeout. * [client] Add relay-only fallback check when ICE is unavailable Ensure the relay connection is supported with the peer when ICE is disabled to prevent connectivity issues. * [client] Add tri-state connection status to guard for smarter ICE retry (#5828) * [client] Add tri-state connection status to guard for smarter ICE retry Refactor isConnectedOnAllWay to return a ConnStatus enum (Connected, Disconnected, PartiallyConnected) instead of a boolean. When relay is up but ICE is not (PartiallyConnected), limit ICE offers to 3 retries with exponential backoff then fall back to hourly attempts, reducing unnecessary signaling traffic. Fully disconnected peers continue to retry aggressively. External events (relay/ICE disconnect, signal/relay reconnect) reset retry state to give ICE a fresh chance. * [client] Clarify guard ICE retry state and trace log trigger Split iceRetryState.attempt into shouldRetry (pure predicate) and enterHourlyMode (explicit state transition) so the caller in reconnectLoopWithRetry reads top-to-bottom. Restore the original trace-log behavior in isConnectedOnAllWay so it only logs on full disconnection, not on the new PartiallyConnected state. * [client] Extract pure evalConnStatus and add unit tests Split isConnectedOnAllWay into a thin method that snapshots state and a pure evalConnStatus helper that takes a connStatusInputs struct, so the tri-state decision logic can be exercised without constructing full Worker or Handshaker objects. Add table-driven tests covering force-relay, ICE-unavailable and fully-available code paths, plus unit tests for iceRetryState budget/hourly transitions and reset. * [client] Improve grammar in logs and refactor ICE credential checks	2026-04-21 15:52:08 +02:00
Misha Bragin	06dfa9d4a5	[management] replace mailru/easyjson with netbirdio/easyjson fork (#5938)	2026-04-21 13:59:35 +02:00
Misha Bragin	45d9ee52c0	[self-hosted] add reverse proxy retention fields to combined YAML (#5930)	2026-04-21 10:21:11 +02:00
Zoltan Papp	3098f48b25	[client] fix ios network addresses mac filter (#5906) * fix(client): skip MAC address filter for network addresses on iOS iOS does not expose hardware (MAC) addresses due to Apple's privacy restrictions (since iOS 14), causing networkAddresses() to return an empty list because all interfaces are filtered out by the HardwareAddr check. Move networkAddresses() to platform-specific files so iOS can skip this filter. v0.69.0	2026-04-20 11:49:38 +02:00
Zoltan Papp	7f023ce801	[client] Android debug bundle support (#5888) Add Android debug bundle support with Troubleshoot UI	2026-04-20 11:26:30 +02:00
Michael Uray	e361126515	[client] Fix WGIface.Close deadlock when DNS filter hook re-enters GetDevice (#5916) WGIface.Close() took w.mu and held it across w.tun.Close(). The underlying wireguard-go device waits for its send/receive goroutines to drain before Close() returns, and some of those goroutines re-enter WGIface during shutdown. In particular, the userspace packet filter DNS hook in client/internal/dns.ServiceViaMemory.filterDNSTraffic calls s.wgInterface.GetDevice() on every packet, which also needs w.mu. With the Close-side holding the mutex, the read goroutine blocks in GetDevice and Close waits forever for that goroutine to exit: goroutine N (TestDNSPermanent_updateUpstream): WGIface.Close -> holds w.mu -> tun.Close -> sync.WaitGroup.Wait goroutine M (wireguard read routine): FilteredDevice.Read -> filterOutbound -> udpHooksDrop -> filterDNSTraffic.func1 -> WGIface.GetDevice -> sync.Mutex.Lock This surfaces as a 5 minute test timeout on the macOS Client/Unit CI job (panic: test timed out after 5m0s, running tests: TestDNSPermanent_updateUpstream). Release w.mu before calling w.tun.Close(). The other Close steps (wgProxyFactory.Free, waitUntilRemoved, Destroy) do not mutate any fields guarded by w.mu beyond what Free() already does, so the lock is not needed once the tun has started shutting down. A new unit test in iface_close_test.go uses a fake WGTunDevice to reproduce the deadlock deterministically without requiring CAP_NET_ADMIN.	2026-04-20 10:36:19 +02:00
Viktor Liu	95213f7157	[client] Use Match host+exec instead of Host+Match in SSH client config (#5903)	2026-04-20 10:24:11 +02:00
Viktor Liu	2e0e3a3601	[client] Replace exclusion routes with scoped default + IP_BOUND_IF on macOS (#5918)	2026-04-20 10:01:01 +02:00
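
The locking pattern described in commit a21f6ecb0a (capture values under the lock, release the mutex, then call the notifier) can be sketched roughly as below. This is only an illustration under assumptions: `Recorder`, `UpdatePeer`, `PeerCount` and the listener signature are hypothetical names chosen for the sketch, not the actual NetBird types or API.

```go
package main

import (
	"fmt"
	"sync"
)

// Recorder is a hypothetical stand-in for a status recorder.
// The listener is set once in NewRecorder and never reassigned,
// so it can be read without holding mux.
type Recorder struct {
	mux      sync.Mutex
	peers    map[string]string
	listener func(snapshot []string)
}

func NewRecorder(listener func(snapshot []string)) *Recorder {
	return &Recorder{peers: make(map[string]string), listener: listener}
}

// UpdatePeer mutates state under the lock, copies what the listener
// needs, unlocks, and only then invokes the callback.
func (r *Recorder) UpdatePeer(key, state string) {
	r.mux.Lock()
	r.peers[key] = state
	snapshot := make([]string, 0, len(r.peers))
	for k, v := range r.peers {
		snapshot = append(snapshot, k+"="+v)
	}
	r.mux.Unlock()

	// Called outside the lock: a listener that re-enters the
	// recorder (see PeerCount) no longer deadlocks.
	r.listener(snapshot)
}

// PeerCount takes the same mutex, so calling it from a listener would
// block forever if the listener were invoked while mux was held.
func (r *Recorder) PeerCount() int {
	r.mux.Lock()
	defer r.mux.Unlock()
	return len(r.peers)
}

func main() {
	var r *Recorder
	r = NewRecorder(func(snapshot []string) {
		// Re-entrant call from inside the callback.
		fmt.Println("peers:", r.PeerCount(), "snapshot entries:", len(snapshot))
	})
	r.UpdatePeer("peerA", "connected")
}
```

The trade-off in this sketch is that the listener sees a snapshot that may already be stale by the time it runs; in exchange, a slow or re-entrant listener cannot stall other holders of the mutex.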


2846 Commits