netbird

mirror of https://github.com/netbirdio/netbird.git synced 2026-07-19 15:01:29 +02:00

Author	SHA1	Message	Date
mlsmaycon	3eb1298cb4	Refactor sync fast path tests and fix CI flakiness - Introduce `skipOnWindows` helper to properly skip tests relying on Unix-specific paths. - Replace fixed sleep with `require.Eventually` in `waitForPeerDisconnect` to address flakiness in CI. - Split `commitFastPath` logic out of `runFastPathSync` to close race conditions and improve clarity. - Update tests to leverage new helpers and more precise assertions (e.g., `waitForPeerDisconnect`). - Add `flakyStore` test helper to exercise fail-closed behavior in flag handling. - Enhance `RunFastPathFlagRoutine` to disable the flag on store read errors.	2026-04-21 17:07:31 +02:00
mlsmaycon	93391fc68f	generate only current.bin and android_current.bin on ci/cd	2026-04-21 16:49:54 +02:00
mlsmaycon	48c080b861	Replace Redis dependency with a generic cache store for fast path flag handling	2026-04-21 16:28:24 +02:00
mlsmaycon	3716838c25	Remove unused cacheKey helper and testcontainers imports, simplify Redis container setup	2026-04-21 16:17:31 +02:00
mlsmaycon	5d58000dbd	Merge branch 'main' into cached-serial-check-on-sync	2026-04-21 15:55:47 +02:00
mlsmaycon	8430b06f2a	[management] Add Redis-backed kill switch for Sync fast path Gate the peer-sync fast path on a runtime flag polled from Redis so operators can roll the optimisation out gradually and flip it off without a redeploy. Without NB_PEER_SYNC_REDIS_ADDRESS the routine stays disabled, every Sync runs the full network map path, and no entries accumulate in the peer serial cache — bit-for-bit identical to the pre-fast-path behaviour. When the env var is set, a background goroutine polls the configured key (default "peerSyncFastPath") every minute; values "1" or "true" enable the fast path, anything else disables it. - RunFastPathFlagRoutine mirrors shared/logleveloverrider: dedicated Redis connection, background ticker, redis.Nil treated as disabled. - NewServer takes the flag handle; tryFastPathSync and the recordPeerSyncEntry helpers short-circuit when Enabled() is false. - invalidatePeerSyncEntry still runs on Login regardless of flag state. - NewFastPathFlag(bool) exposed for tests and callers that need to force a state without going through Redis.	2026-04-21 15:52:34 +02:00
Zoltan Papp	5a89e6621b	[client] Suppress ICE signaling (#5820) * [client] Suppress ICE signaling and periodic offers in force-relay mode When NB_FORCE_RELAY is enabled, skip WorkerICE creation entirely, suppress ICE credentials in offer/answer messages, disable the periodic ICE candidate monitor, and fix isConnectedOnAllWay to only check relay status so the guard stops sending unnecessary offers. * [client] Dynamically suppress ICE based on remote peer's offer credentials Track whether the remote peer includes ICE credentials in its offers/answers. When the remote peer stops sending ICE credentials, skip ICE listener dispatch, suppress ICE credentials in responses, and exclude ICE from the guard connectivity check. When the remote peer resumes sending ICE credentials, re-enable all ICE behavior. * [client] Fix nil SessionID panic and force ICE teardown on relay-only transition Fix nil pointer dereference in signalOfferAnswer when SessionID is nil (relay-only offers). Close stale ICE agent immediately when remote peer stops sending ICE credentials to avoid traffic black-hole during the ICE disconnect timeout. * [client] Add relay-only fallback check when ICE is unavailable Ensure the relay connection is supported with the peer when ICE is disabled to prevent connectivity issues. * [client] Add tri-state connection status to guard for smarter ICE retry (#5828) * [client] Add tri-state connection status to guard for smarter ICE retry Refactor isConnectedOnAllWay to return a ConnStatus enum (Connected, Disconnected, PartiallyConnected) instead of a boolean. When relay is up but ICE is not (PartiallyConnected), limit ICE offers to 3 retries with exponential backoff, then fall back to hourly attempts, reducing unnecessary signaling traffic. Fully disconnected peers continue to retry aggressively. External events (relay/ICE disconnect, signal/relay reconnect) reset retry state to give ICE a fresh chance. 
* [client] Clarify guard ICE retry state and trace log trigger Split iceRetryState.attempt into shouldRetry (pure predicate) and enterHourlyMode (explicit state transition) so the caller in reconnectLoopWithRetry reads top-to-bottom. Restore the original trace-log behavior in isConnectedOnAllWay so it only logs on full disconnection, not on the new PartiallyConnected state. * [client] Extract pure evalConnStatus and add unit tests Split isConnectedOnAllWay into a thin method that snapshots state and a pure evalConnStatus helper that takes a connStatusInputs struct, so the tri-state decision logic can be exercised without constructing full Worker or Handshaker objects. Add table-driven tests covering force-relay, ICE-unavailable and fully-available code paths, plus unit tests for iceRetryState budget/hourly transitions and reset. * [client] Improve grammar in logs and refactor ICE credential checks	2026-04-21 15:52:08 +02:00
Misha Bragin	06dfa9d4a5	[management] replace mailru/easyjson with netbirdio/easyjson fork (#5938)	2026-04-21 13:59:35 +02:00
Misha Bragin	45d9ee52c0	[self-hosted] add reverse proxy retention fields to combined YAML (#5930)	2026-04-21 10:21:11 +02:00
Zoltan Papp	3098f48b25	[client] fix ios network addresses mac filter (#5906) * fix(client): skip MAC address filter for network addresses on iOS iOS does not expose hardware (MAC) addresses due to Apple's privacy restrictions (since iOS 14), causing networkAddresses() to return an empty list because all interfaces are filtered out by the HardwareAddr check. Move networkAddresses() to platform-specific files so iOS can skip this filter. (tag: v0.69.0)	2026-04-20 11:49:38 +02:00
Zoltan Papp	7f023ce801	[client] Android debug bundle support (#5888) Add Android debug bundle support with Troubleshoot UI	2026-04-20 11:26:30 +02:00
Michael Uray	e361126515	[client] Fix WGIface.Close deadlock when DNS filter hook re-enters GetDevice (#5916) WGIface.Close() took w.mu and held it across w.tun.Close(). The underlying wireguard-go device waits for its send/receive goroutines to drain before Close() returns, and some of those goroutines re-enter WGIface during shutdown. In particular, the userspace packet filter DNS hook in client/internal/dns.ServiceViaMemory.filterDNSTraffic calls s.wgInterface.GetDevice() on every packet, which also needs w.mu. With the Close side holding the mutex, the read goroutine blocks in GetDevice and Close waits forever for that goroutine to exit: goroutine N (TestDNSPermanent_updateUpstream): WGIface.Close -> holds w.mu -> tun.Close -> sync.WaitGroup.Wait; goroutine M (wireguard read routine): FilteredDevice.Read -> filterOutbound -> udpHooksDrop -> filterDNSTraffic.func1 -> WGIface.GetDevice -> sync.Mutex.Lock. This surfaces as a 5-minute test timeout on the macOS Client/Unit CI job (panic: test timed out after 5m0s, running tests: TestDNSPermanent_updateUpstream). Release w.mu before calling w.tun.Close(). The other Close steps (wgProxyFactory.Free, waitUntilRemoved, Destroy) do not mutate any fields guarded by w.mu beyond what Free() already does, so the lock is not needed once the tun has started shutting down. A new unit test in iface_close_test.go uses a fake WGTunDevice to reproduce the deadlock deterministically without requiring CAP_NET_ADMIN.	2026-04-20 10:36:19 +02:00
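The lock-ordering fix can be reproduced in miniature. `wgIface`, `tunDevice` and `fakeTun` are simplified, hypothetical stand-ins, not the netbird types, and the sketch omits the other Close steps. `fakeTun.Close` blocks until a worker that calls back into `GetDevice` has finished, which is the shape of the deadlock described above.

```go
package main

import "sync"

// tunDevice is a hypothetical stand-in for the tun device. Its Close waits
// for worker goroutines, as the commit describes for wireguard-go.
type tunDevice interface{ Close() error }

// wgIface models only the mutex and the device field.
type wgIface struct {
	mu  sync.Mutex
	tun tunDevice
}

// GetDevice is what the DNS filter hook calls from the packet path.
func (w *wgIface) GetDevice() tunDevice {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.tun
}

// Close copies the device under the lock and releases the lock before the
// blocking tun.Close, so workers that re-enter GetDevice can drain.
func (w *wgIface) Close() error {
	w.mu.Lock()
	tun := w.tun
	w.mu.Unlock()
	if tun == nil {
		return nil
	}
	return tun.Close()
}

// fakeTun reproduces the dependency: Close blocks until a worker that calls
// back into the interface has finished.
type fakeTun struct {
	iface *wgIface
}

func (f *fakeTun) Close() error {
	done := make(chan struct{})
	go func() {
		_ = f.iface.GetDevice() // re-entry during shutdown
		close(done)
	}()
	<-done // plays the role of sync.WaitGroup.Wait in the real device
	return nil
}
```

Moving `w.mu.Unlock()` below `tun.Close()` in this sketch recreates the hang: the worker blocks in `GetDevice`, and `Close` never returns.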
Viktor Liu	95213f7157	[client] Use Match host+exec instead of Host+Match in SSH client config (#5903)	2026-04-20 10:24:11 +02:00
Viktor Liu	2e0e3a3601	[client] Replace exclusion routes with scoped default + IP_BOUND_IF on macOS (#5918)	2026-04-20 10:01:01 +02:00
mlsmaycon	3f4ef0031b	[management] Skip full network map on Sync when peer state is unchanged Introduce a peer-sync cache keyed by WireGuard pubkey that records the NetworkMap.Serial and meta hash the server last delivered to each peer. When a Sync request arrives from a non-Android peer whose cached serial matches the current account serial and whose meta hash matches the last delivery, short-circuit SyncAndMarkPeer and reply with a NetbirdConfig-only SyncResponse mirroring the shape TimeBasedAuthSecretsManager already pushes for TURN/Relay token rotation. The client keeps its existing network map state and refreshes only control-plane credentials. The fast path avoids GetAccountWithBackpressure, the full per-peer map assembly, posture-check recomputation and the large encrypted payload on every reconnect of a peer whose account is quiescent. Slow path remains the source of truth for any real state change; every full-map send (initial sync or streamed NetworkMap update) rewrites the cache, and every Login deletes it so a fresh map is guaranteed after SSH key rotation, approval changes or re-registration. Backend-only: no proto changes and no client changes. Compatibility is provided by the existing client handling of nil NetworkMap in handleSync (every version from v0.20.0 on). Android is gated out at the server because its readInitialSettings path calls GrpcClient.GetNetworkMap which errors on nil map. The cache is wired through BaseServer.CacheStore() so it shares the same Redis/in-memory backend as OneTimeTokenStore and PKCEVerifierStore. 
Test coverage lands in four layers: - Pure decision function (peer_serial_cache_decision_test.go) - Cache wrapper with TTL + concurrency (peer_serial_cache_test.go) - Response shape unit tests (sync_fast_path_response_test.go) - In-process gRPC behavioural tests covering first sync, reconnect skip, android never-skip, meta change, login invalidation, and serial advance (management/server/sync_fast_path_test.go) - Frozen SyncRequest wire-format fixtures for v0.20.0 / v0.40.0 / v0.60.0 / current / android replayed against the in-process server (management/server/sync_legacy_wire_test.go + testdata fixtures)	2026-04-17 16:20:04 +02:00
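The skip decision reduces to a small pure function plus a cache. The cache is written on every full-map send and cleared on Login. This is a sketch under the following assumptions: the in-memory map replaces the shared CacheStore and has no TTL, the field types are guesses, and `canSkipFullMap`, `peerSyncEntry` and `peerSyncCache` are hypothetical names.

```go
package main

import "sync"

// peerSyncEntry records what the server last delivered to a peer.
type peerSyncEntry struct {
	Serial   uint64 // network map serial sent last time
	MetaHash string // hash of the peer meta seen at that time
}

// peerSyncCache is an in-memory stand-in keyed by WireGuard public key.
type peerSyncCache struct {
	mu      sync.RWMutex
	entries map[string]peerSyncEntry
}

func newPeerSyncCache() *peerSyncCache {
	return &peerSyncCache{entries: make(map[string]peerSyncEntry)}
}

// get returns nil on a miss.
func (c *peerSyncCache) get(pubKey string) *peerSyncEntry {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[pubKey]
	if !ok {
		return nil
	}
	return &e
}

// record is called after every full network map send.
func (c *peerSyncCache) record(pubKey string, e peerSyncEntry) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[pubKey] = e
}

// invalidate is called on every Login.
func (c *peerSyncCache) invalidate(pubKey string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, pubKey)
}

// canSkipFullMap is the pure decision: true means a NetbirdConfig-only reply.
func canSkipFullMap(cached *peerSyncEntry, accountSerial uint64, metaHash string, isAndroid bool) bool {
	if isAndroid || cached == nil {
		return false // Android always gets a map; a miss means a full sync
	}
	return cached.Serial == accountSerial && cached.MetaHash == metaHash
}
```

Keeping the decision separate from the cache lets the first-sync, reconnect, Android, meta-change, serial-advance and Login cases be checked without a gRPC server.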
Nicolas Frati	8ae8f2098f	[management] chores: fix lint error on google workspace (#5907) * chores: fix lint error on google workspace * chores: updated google api dependency * update google golang api sdk to latest	2026-04-16 20:02:09 +02:00
Viktor Liu	a39787d679	[infrastructure] Add CrowdSec LAPI container to self-hosted setup script (#5880)	2026-04-16 18:06:38 +02:00
Maycon Santos	53b04e512a	[management] Reuse a single cache store across all management server consumers (#5889) * Add support for legacy IDP cache environment variable * Centralize cache store creation to reuse a single Redis connection pool Each cache consumer (IDP cache, token store, PKCE store, secrets manager, EDR validator) was independently calling NewStore, creating separate Redis clients with their own connection pools, for up to 1400 potential connections from a single management server process. Introduce a shared CacheStore() singleton on BaseServer that creates one store at boot and injects it into all consumers. Consumer constructors now receive a store.StoreInterface instead of creating their own. For Redis mode, all consumers share one connection pool (1000 max conns). For in-memory mode, all consumers share one GoCache instance. * Update management-integrations module to latest version * sync go.sum * Export `GetAddrFromEnv` to allow reuse across packages * Update management-integrations module version in go.mod and go.sum * Update management-integrations module version in go.mod and go.sum	2026-04-16 16:04:53 +02:00
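In Go, one way to express a store that is built once and shared everywhere is `sync.Once`. The sketch below is hypothetical. `StoreInterface` here is a two-method stand-in, not the real `store.StoreInterface`, and the real `BaseServer` may construct its singletons differently.

```go
package main

import "sync"

// StoreInterface is a hypothetical, minimal cache interface used only here.
type StoreInterface interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

// memStore is a trivial in-memory implementation.
type memStore struct {
	mu sync.Mutex
	m  map[string]string
}

func newMemStore() *memStore { return &memStore{m: make(map[string]string)} }

func (s *memStore) Get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[key]
	return v, ok
}

func (s *memStore) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

// BaseServer builds the store lazily, exactly once, and hands the same
// instance to every consumer.
type BaseServer struct {
	once     sync.Once
	cache    StoreInterface
	newStore func() StoreInterface // Redis- or memory-backed factory
	builds   int                   // how many times newStore ran
}

func (s *BaseServer) CacheStore() StoreInterface {
	s.once.Do(func() {
		s.cache = s.newStore()
		s.builds++
	})
	return s.cache
}

// Consumers take the store as a constructor argument instead of opening
// their own connection pool.
type tokenStore struct{ cache StoreInterface }
type pkceStore struct{ cache StoreInterface }

func newTokenStore(c StoreInterface) *tokenStore { return &tokenStore{cache: c} }
func newPKCEStore(c StoreInterface) *pkceStore { return &pkceStore{cache: c} }
```

Because consumers receive the store rather than creating it, a Redis-backed factory would open one client and one connection pool for the whole process.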
Viktor Liu	633dde8d1f	[client] Reconnect conntrack netlink listener on error (#5885)	2026-04-16 22:30:36 +09:00
Michael Uray	7e4542adde	fix(client): populate NetworkAddresses on iOS for posture checks (#5900) The iOS GetInfo() function never populated NetworkAddresses, causing the peer_network_range_check posture check to fail for all iOS clients. This adds the same networkAddresses() call that macOS, Linux, Windows, and FreeBSD already use. Fixes: #3968 Fixes: #4657	2026-04-16 14:25:55 +02:00
Viktor Liu	d4c61ed38b	[client] Add mangle FORWARD guard to prevent Docker DNAT bypass of ACL rules (#5697)	2026-04-16 14:02:52 +02:00
Viktor Liu	6b540d145c	[client] Add --disable-networks flag to block network selection (#5896)	2026-04-16 14:02:31 +02:00
Bethuel Mmbaga	08f624507d	[management] Enforce peer or peer groups requirement for network routers (#5894)	2026-04-16 13:12:19 +03:00
Viktor Liu	95bc01e48f	[client] Allow clearing saved service env vars with --service-env "" (#5893)	2026-04-15 19:22:08 +02:00
Viktor Liu	0d86de47df	[client] Add PCP support (#5219)	2026-04-15 11:43:16 +02:00
Viktor Liu	e804a705b7	[infrastructure] Update sign pipeline version to v0.1.2 (#5884)	2026-04-14 17:08:35 +02:00
Pascal Fischer	46fc8c9f65	[proxy] direct redirect to SSO (#5874)	2026-04-14 13:47:02 +02:00
Viktor Liu	d7ad908962	[misc] Add CI check for proto version string changes (#5854) * Add CI check for proto version string changes * Handle pagination and missing patch data in proto version check	2026-04-14 13:36:26 +02:00
Pascal Fischer	c5623307cc	[management] add context cancel monitoring (#5879)	2026-04-14 12:49:18 +02:00
Vlad	7f666b8022	[management] revert ctx dependency in get account with backpressure (#5878)	2026-04-14 12:16:03 +02:00
Viktor Liu	0a30b9b275	[management, proxy] Add CrowdSec IP reputation integration for reverse proxy (#5722)	2026-04-14 12:14:58 +02:00
Viktor Liu	4eed459f27	[client] Fix DNS resolution with userspace WireGuard and kernel firewall (#5873) (tag: v0.68.2)	2026-04-13 16:23:57 +02:00
Zoltan Papp	13539543af	[client] Fix/grpc retry (#5750) * [client] Fix flow client Receive retry loop not stopping after Close Use backoff.Permanent for canceled gRPC errors so Receive returns immediately instead of retrying until context deadline when the connection is already closed. Add TestNewClient_PermanentClose to verify the behavior. The connectivity.Shutdown check was meaningless because when the connection is shut down, c.realClient.Events(ctx, grpc.WaitForReady(true)) on the next line already fails with codes.Canceled, which is now handled as a permanent error. The explicit state check was just duplicating what gRPC already reports through its normal error path. * [client] remove WaitForReady from stream open call grpc.WaitForReady(true) parks the RPC call internally until the connection reaches READY, only unblocking on ctx cancellation. This means the external backoff.Retry loop in Receive() never gets control back during a connection outage: it cannot tick, log, or apply its retry intervals while WaitForReady is blocking. Removing it restores fail-fast behaviour: Events() returns immediately with codes.Unavailable when the connection is not ready, which is exactly what the backoff loop expects. The backoff becomes the single authority over retry timing and cadence, as originally intended. * [client] Add connection recreation and improve flow client error handling Store gRPC dial options on the client to enable connection recreation on Internal errors (RST_STREAM/PROTOCOL_ERROR). Treat Unauthenticated, PermissionDenied, and Unimplemented as permanent failures. Unify mutex usage and add reconnection logging for better observability. 
* [client] Remove Unauthenticated, PermissionDenied, and Unimplemented from permanent error handling * [client] Fix error handling in Receive to properly re-establish stream and improve reconnection messaging * Fix test * [client] Add graceful shutdown handling and test for concurrent Close during Receive Prevent reconnection attempts after client closure by tracking a `closed` flag. Use `backoff.Permanent` for errors caused by operations on a closed client. Add a test to ensure `Close` does not block when `Receive` is actively running. * [client] Fix connection swap to properly close old gRPC connection Close the old `gRPC.ClientConn` after successfully swapping to a new connection during reconnection. * [client] Reset backoff * [client] Ensure stream closure on error during initialization * [client] Add test for handling server-side stream closure and reconnection Introduce `TestReceive_ServerClosesStream` to verify the client's ability to recover and process acknowledgments after the server closes the stream. Enhance test server with a controlled stream closure mechanism. * [client] Add protocol error simulation and enhance reconnection test Introduce `connTrackListener` to simulate HTTP/2 RST_STREAM with PROTOCOL_ERROR for testing. Refactor and rename `TestReceive_ServerClosesStream` to `TestReceive_ProtocolErrorStreamReconnect` to verify client recovery on protocol errors. * [client] Update Close error message in test for clarity * [client] Fine-tune the tests * [client] Adjust connection tracking in reconnection test * [client] Wait for Events handler to exit in RST_STREAM reconnection test Ensure the old `Events` handler exits fully before proceeding in the reconnection test to avoid dropped acknowledgments on a broken stream. Add a `handlerDone` channel to synchronize handler exits. 
* [client] Prevent panic on nil connection during Close * [client] Refactor connection handling to use explicit target tracking Introduce `target` field to store the gRPC connection target directly, simplifying reconnections and ensuring consistent connection reuse logic. * [client] Rename `isCancellation` to `isContextDone` and extend handling for `DeadlineExceeded` Refactor error handling to include `DeadlineExceeded` scenarios alongside `Canceled`. Update related condition checks for consistency. * [client] Add connection generation tracking to prevent stale reconnections Introduce `connGen` to track connection generations and ensure that stale `recreateConnection` calls do not override newer connections. Update stream establishment and reconnection logic to incorporate generation validation. * [client] Add backoff reset condition to prevent short-lived retry cycles Refine backoff reset logic to ensure it only occurs for sufficiently long-lived stream connections, avoiding interference with `MaxElapsedTime`. * [client] Introduce `minHealthyDuration` to refine backoff reset logic Add `minHealthyDuration` constant to ensure stream retries only reset the backoff timer if the stream survives beyond a minimum duration. Prevents unhealthy, short-lived streams from interfering with `MaxElapsedTime`. * [client] IPv6 friendly connection parsedURL.Hostname() strips IPv6 brackets. For http://[::1]:443, this turns it into ::1:443, which is not a valid host:port target for gRPC. Additionally, fmt.Sprintf("%s:%s", hostname, port) produces a trailing colon when the URL has no explicit port—http://example.com becomes example.com:. Both cases break the initial dial and reconnect paths. Use parsedURL.Host directly instead. 
* [client] Add `handlerStarted` channel to synchronize stream establishment in tests Introduce `handlerStarted` channel in the test server to signal when the server-side handler begins, ensuring robust synchronization between client and server during stream establishment. Update relevant test cases to wait for this signal before proceeding. * [client] Replace `receivedAcks` map with atomic counter and improve stream establishment sync in tests Refactor acknowledgment tracking in tests to use an `atomic.Int32` counter instead of a map. Replace fixed sleep with robust synchronization by waiting on `handlerStarted` signal for stream establishment. * [client] Extract `handleReceiveError` to simplify receive logic Refactor error handling in `receive` to a dedicated `handleReceiveError` method. Streamlines the main logic and isolates error recovery, including backoff reset and connection recreation. * [client] recreate gRPC ClientConn on every retry to prevent dual backoff The flow client had two competing retry loops: our custom exponential backoff and gRPC's internal subchannel reconnection. When establishStream failed, the same ClientConn was reused, allowing gRPC's internal backoff state to accumulate and control dial timing independently. Changes: - Consolidate error handling into handleRetryableError, which now handles context cancellation, permanent errors, backoff reset, and connection recreation in a single path - Call recreateConnection on every retryable error so each retry gets a fresh ClientConn with no internal backoff state - Remove connGen tracking since Receive is sequential and protected by a new receiving guard against concurrent calls - Reduce RandomizationFactor from 1 to 0.5 to avoid near-zero backoff intervals	2026-04-13 10:42:24 +02:00
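The IPv6 parsing problem described in the "IPv6 friendly connection" step above is easy to reproduce with `net/url` alone. Both function names are hypothetical. The first mirrors the removed `Hostname()` plus `Port()` pattern, and the second uses `Host` as the commit does.

```go
package main

import (
	"fmt"
	"net/url"
)

// targetFromHostnamePort reproduces the removed pattern: Hostname() strips
// IPv6 brackets, and an empty Port() leaves a trailing colon.
func targetFromHostnamePort(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s:%s", u.Hostname(), u.Port()), nil
}

// targetFromHost uses the authority exactly as written, brackets included.
func targetFromHost(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Host, nil
}
```

For `http://[::1]:443`, the first helper yields `::1:443`, which is not a valid host:port pair. The second yields `[::1]:443`. For `http://example.com`, the first yields `example.com:` with a trailing colon, and the second yields `example.com`.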
Zoltan Papp	7483fec048	Fix Android internet blackhole caused by stale route re-injection on TUN rebuild (#5865) extraInitialRoutes() was meant to preserve only the fake IP route (240.0.0.0/8) across TUN rebuilds, but it re-injected any initial route missing from the current set. When the management server advertised exit node routes (0.0.0.0/0) that were later filtered by the route selector, extraInitialRoutes() re-added them, causing the Android VPN to capture all traffic with no peer to handle it. Store the fake IP route explicitly and append only that in notify(), removing the overly broad initial route diffing.	2026-04-13 09:38:38 +02:00
Pascal Fischer	5259e5df51	[management] add domain and service cleanup migration (#5850)	2026-04-11 12:00:40 +02:00
Zoltan Papp	ebd78e0122	[client] Update `RaceDial` to accept context for improved cancellation handling (#5849)	2026-04-10 20:51:04 +02:00
Pascal Fischer	cf86b9a528	[management] enable access log cleanup by default (#5842)	2026-04-10 17:07:27 +02:00
Pascal Fischer	ee588e1536	Revert "[management] allow local routing peer resource (#5814)" (#5847)	2026-04-10 14:53:47 +02:00
Pascal Fischer	2a8aacc5c9	[management] allow local routing peer resource (#5814)	2026-04-10 13:08:21 +02:00
Pascal Fischer	15709bc666	[management] update account delete with proper proxy domain and service cleanup (#5817)	2026-04-10 13:08:04 +02:00
Pascal Fischer	789b4113fe	[misc] update dashboards (#5840)	2026-04-10 12:15:58 +02:00
Viktor Liu	d2cdc0efec	[client] Use native firewall for peer ACLs in userspace WireGuard mode (#5668)	2026-04-10 09:12:13 +08:00
Pascal Fischer	ee343d5d77	[management] use sql null vars (#5844)	2026-04-09 18:12:38 +02:00
Maycon Santos	099c493b18	[management] network map tests (#5795) * Add network map benchmark and correctness test files * Add tests for network map components correctness and edge cases * Skip benchmarks in CI and enhance network map test coverage with new helper functions * Remove legacy network map benchmarks and tests; refactor components-based test coverage for clarity and scalability.	2026-04-08 21:28:29 +02:00
Pascal Fischer	c1d1229ae0	[management] use NullBool for terminated flag (#5829) (tag: v0.68.1)	2026-04-08 21:08:43 +02:00
Viktor Liu	94a36cb53e	[client] Handle UPnP routers that only support permanent leases (#5826)	2026-04-08 17:59:59 +02:00
Viktor Liu	c7ba931466	[client] Populate network addresses in FreeBSD system info (#5827)	2026-04-08 17:14:16 +02:00
Viktor Liu	413d95b740	[client] Include service.json in debug bundle (#5825) * Include service.json in debug bundle * Add tests for service params sanitization logic	2026-04-08 21:10:31 +08:00
Viktor Liu	332c624c55	[client] Don't abort UI debug bundle when up/down fails (#5780) (tag: v0.68.0)	2026-04-08 10:33:46 +02:00
Viktor Liu	dc160aff36	[client] Fix SSH proxy stripping shell quoting from forwarded commands (#5669)	2026-04-08 10:25:57 +02:00


2811 Commits