netbird

mirror of https://github.com/netbirdio/netbird.git synced 2026-04-25 11:46:40 +00:00

Author	SHA1	Message	Date
mlsmaycon	69c0b96d73	Refactor fast-path Sync to log skip reasons, streamline `tryFastPathSync` outputs, and improve debug consistency.	2026-04-24 21:25:32 +02:00
mlsmaycon	d3ea28734c	Introduce network serial caching in sync fast path, optimize DB reads, and add granular cache invalidation	2026-04-24 20:50:47 +02:00
mlsmaycon	4dddafc5a1	Add caching for `ExtraSettings` and peer groups in fast path to reduce DB reads.	2026-04-24 19:19:58 +02:00
mlsmaycon	8c521a7cb5	Refactor sync fast path to introduce caching for `ExtraSettings` and peer groups, optimize `MarkPeerConnected` with async writes, and reduce DB round trips.	2026-04-24 18:13:37 +02:00
mlsmaycon	ac6b73005d	Upgrade cache logic in sync fast path to handle legacy entries and avoid corrupting HasUser flag.	2026-04-24 17:35:33 +02:00
mlsmaycon	cf7081e592	Refactor peer cache logic in sync fast path; consolidate and optimize write operations	2026-04-24 13:33:15 +02:00
mlsmaycon	94730fe066	Add debug log for cache hit in sync fast path	2026-04-24 12:00:32 +02:00
mlsmaycon	7e9d3485d8	[management] Cache peer snapshot + consolidate auth reads on Sync hot path Trim the fast-path Sync handler by removing two DB round trips on cache hit: 1. Consolidate GetUserIDByPeerKey + GetAccountIDByPeerPubKey into a single GetPeerAuthInfoByPubKey store call. Both looked up the same peer row by pubkey and returned one column each; the new method SELECTs both columns in one query. AccountManager exposes it as GetPeerAuthInfo. 2. Extend peerSyncEntry with AccountID, PeerID, PeerKey, Ephemeral and a HasUser flag so the cache carries everything the fast path needs. On cache hit with a matching metaHash: - The Sync handler skips GetPeerAuthInfo entirely (entry.AccountID and entry.HasUser drive the loginFilter gate). - commitFastPath skips GetPeerByPeerPubKey by using the cached peer snapshot for OnPeerConnectedWithPeer. Old cache entries from pre-step-2 shape still decode (missing fields zero out) but IsComplete() returns false, so they fall through to the slow path and get rewritten with the full shape on first pass. No migration needed. Expected impact on a 16.8 s pathological Sync observed in production: ~6 s saved from eliminating one auth-read round trip, the pre-fast-path GetPeerAuthInfo on cache hit, and GetPeerByPeerPubKey in commitFastPath. Cache miss / cold start remain on the slow path unchanged. Account-serial, ExtraSettings and peer-group caching — the remaining synchronous DB reads — are deliberately left for a follow-up so the invalidation design can be proven incrementally.	2026-04-24 11:41:59 +02:00
mlsmaycon	5993264d34	Add detailed timing logs to sync fast path operations	2026-04-24 08:07:12 +02:00
mlsmaycon	617ceab2e3	Add `OnPeerConnectedWithPeer` to optimize sync fast path operations	2026-04-22 22:40:31 +02:00
mlsmaycon	53deabbdb4	Add timing log for GetExtraSettings in sync fast path	2026-04-22 15:00:21 +02:00
mlsmaycon	ac3fe4343b	Refactor sync fast path logging for improved clarity and timing accuracy	2026-04-22 14:24:52 +02:00
mlsmaycon	a4ae160993	Fix deferred logging function in `commitFastPath` for correct execution	2026-04-22 11:41:32 +02:00
mlsmaycon	3ac4263257	Add timing instrumentation for sync fast path functions	2026-04-22 01:23:44 +02:00
mlsmaycon	dc86c9655d	Improve timing precision in sync fast path logging	2026-04-22 00:39:09 +02:00
mlsmaycon	66494d61af	Replace Tracef with Debugf for sync fast path logging	2026-04-22 00:06:39 +02:00
mlsmaycon	46446acd30	Add detailed timing logs to sync fast path operations	2026-04-21 23:02:58 +02:00
mlsmaycon	3eb1298cb4	Refactor sync fast path tests and fix CI flakiness - Introduce `skipOnWindows` helper to properly skip tests relying on Unix specific paths. - Replace fixed sleep with `require.Eventually` in `waitForPeerDisconnect` to address flakiness in CI. - Split `commitFastPath` logic out of `runFastPathSync` to close race conditions and improve clarity. - Update tests to leverage new helpers and more precise assertions (e.g., `waitForPeerDisconnect`). - Add `flakyStore` test helper to exercise fail-closed behavior in flag handling. - Enhance `RunFastPathFlagRoutine` to disable the flag on store read errors.	2026-04-21 17:07:31 +02:00
mlsmaycon	48c080b861	Replace Redis dependency with a generic cache store for fast path flag handling	2026-04-21 16:28:24 +02:00
mlsmaycon	3716838c25	Remove unused cacheKey helper and testcontainers imports, simplify Redis container setup	2026-04-21 16:17:31 +02:00
mlsmaycon	8430b06f2a	[management] Add Redis-backed kill switch for Sync fast path Gate the peer-sync fast path on a runtime flag polled from Redis so operators can roll the optimisation out gradually and flip it off without a redeploy. Without NB_PEER_SYNC_REDIS_ADDRESS the routine stays disabled, every Sync runs the full network map path, and no entries accumulate in the peer serial cache — bit-for-bit identical to the pre-fast-path behaviour. When the env var is set, a background goroutine polls the configured key (default "peerSyncFastPath") every minute; values "1" or "true" enable the fast path, anything else disables it. - RunFastPathFlagRoutine mirrors shared/logleveloverrider: dedicated Redis connection, background ticker, redis.Nil treated as disabled. - NewServer takes the flag handle; tryFastPathSync and the recordPeerSyncEntry helpers short-circuit when Enabled() is false. - invalidatePeerSyncEntry still runs on Login regardless of flag state. - NewFastPathFlag(bool) exposed for tests and callers that need to force a state without going through Redis.	2026-04-21 15:52:34 +02:00
mlsmaycon	3f4ef0031b	[management] Skip full network map on Sync when peer state is unchanged Introduce a peer-sync cache keyed by WireGuard pubkey that records the NetworkMap.Serial and meta hash the server last delivered to each peer. When a Sync request arrives from a non-Android peer whose cached serial matches the current account serial and whose meta hash matches the last delivery, short-circuit SyncAndMarkPeer and reply with a NetbirdConfig-only SyncResponse mirroring the shape TimeBasedAuthSecretsManager already pushes for TURN/Relay token rotation. The client keeps its existing network map state and refreshes only control-plane credentials. The fast path avoids GetAccountWithBackpressure, the full per-peer map assembly, posture-check recomputation and the large encrypted payload on every reconnect of a peer whose account is quiescent. Slow path remains the source of truth for any real state change; every full-map send (initial sync or streamed NetworkMap update) rewrites the cache, and every Login deletes it so a fresh map is guaranteed after SSH key rotation, approval changes or re-registration. Backend-only: no proto changes and no client changes. Compatibility is provided by the existing client handling of nil NetworkMap in handleSync (every version from v0.20.0 on). Android is gated out at the server because its readInitialSettings path calls GrpcClient.GetNetworkMap which errors on nil map. The cache is wired through BaseServer.CacheStore() so it shares the same Redis/in-memory backend as OneTimeTokenStore and PKCEVerifierStore. Test coverage lands in four layers: - Pure decision function (peer_serial_cache_decision_test.go) - Cache wrapper with TTL + concurrency (peer_serial_cache_test.go) - Response shape unit tests (sync_fast_path_response_test.go) - In-process gRPC behavioural tests covering first sync, reconnect skip, android never-skip, meta change, login invalidation, and serial advance (management/server/sync_fast_path_test.go) - Frozen SyncRequest wire-format fixtures for v0.20.0 / v0.40.0 / v0.60.0 / current / android replayed against the in-process server (management/server/sync_legacy_wire_test.go + testdata fixtures)	2026-04-17 16:20:04 +02:00
Maycon Santos	53b04e512a	[management] Reuse a single cache store across all management server consumers (#5889) * Add support for legacy IDP cache environment variable * Centralize cache store creation to reuse a single Redis connection pool Each cache consumer (IDP cache, token store, PKCE store, secrets manager, EDR validator) was independently calling NewStore, creating separate Redis clients with their own connection pools — up to 1400 potential connections from a single management server process. Introduce a shared CacheStore() singleton on BaseServer that creates one store at boot and injects it into all consumers. Consumer constructors now receive a store.StoreInterface instead of creating their own. For Redis mode, all consumers share one connection pool (1000 max conns). For in-memory mode, all consumers share one GoCache instance. * Update management-integrations module to latest version * sync go.sum * Export `GetAddrFromEnv` to allow reuse across packages * Update management-integrations module version in go.mod and go.sum * Update management-integrations module version in go.mod and go.sum	2026-04-16 16:04:53 +02:00
Viktor Liu	0a30b9b275	[management, proxy] Add CrowdSec IP reputation integration for reverse proxy (#5722)	2026-04-14 12:14:58 +02:00
Bethuel Mmbaga	9d1a37c644	[management,client] Revert gRPC client secret removal (#5781) * This reverts commit `e5914e4e8b` Signed-off-by: bcmmbaga <bethuelmbaga12@gmail.com> * Deprecate client secret in proto Signed-off-by: bcmmbaga <bethuelmbaga12@gmail.com> * Fix lint Signed-off-by: bcmmbaga <bethuelmbaga12@gmail.com> --------- Signed-off-by: bcmmbaga <bethuelmbaga12@gmail.com>	2026-04-02 18:21:00 +02:00
Bethuel Mmbaga	e5914e4e8b	[management,client] Remove client secret from gRPC auth flow (#5751) Remove client secret from gRPC auth flow. The secret was originally included to support providers like Google Workspace that don't offer a proper PKCE flow, but this is no longer necessary with the embedded IdP. Deployments using such providers should migrate to the embedded IdP instead.	2026-03-31 18:50:49 +03:00
Viktor Liu	0765352c99	[management] Persist proxy capabilities to database (#5720)	2026-03-30 13:03:42 +02:00
Viktor Liu	b550a2face	[management, proxy] Add require_subdomain capability for proxy clusters (#5628)	2026-03-20 11:29:50 +01:00
Pascal Fischer	a1858a9cb7	[management] recover proxies after cleanup if heartbeat is still running (#5617)	2026-03-18 11:48:38 +01:00
Viktor Liu	212b34f639	[management] Add GET /reverse-proxies/clusters endpoint (#5611)	2026-03-18 11:15:56 +08:00
Viktor Liu	387e374e4b	[proxy, management] Add header auth, access restrictions, and session idle timeout (#5587)	2026-03-16 15:22:00 +01:00
Viktor Liu	3e6baea405	[management,proxy,client] Add L4 capabilities (TLS/TCP/UDP) (#5530)	2026-03-13 18:36:44 +01:00
Zoltan Papp	fe9b844511	[client] refactor auto update workflow (#5448) Auto-update logic moved out of the UI into a dedicated updatemanager.Manager service that runs in the connection layer. The UI no longer polls or checks for updates independently. The update manager supports three modes driven by the management server's auto-update policy: No policy set by mgm: checks GitHub for the latest version and notifies the user (previous behavior, now centralized) mgm enforces update: the "About" menu triggers installation directly instead of just downloading the file — user still initiates the action mgm forces update: installation proceeds automatically without user interaction updateManager lifecycle is now owned by daemon, giving the daemon server direct control via a new TriggerUpdate RPC Introduces EngineServices struct to group external service dependencies passed to NewEngine, reducing its argument count from 11 to 4	2026-03-13 17:01:28 +01:00
Pascal Fischer	11eb725ac8	[management] only count login request duration for successful logins (#5545)	2026-03-09 14:56:46 +01:00
Pascal Fischer	30c02ab78c	[management] use the cache for the pkce state (#5516)	2026-03-09 12:23:06 +01:00
Pascal Fischer	e6587b071d	[management] use realip for proxy registration (#5525)	2026-03-06 16:11:44 +01:00
Pascal Fischer	d7c8e37ff4	[management] Store connected proxies in DB (#5472) Co-authored-by: mlsmaycon <mlsmaycon@gmail.com>	2026-03-03 18:39:46 +01:00
Maycon Santos	327142837c	[management] Refactor expose feature: move business logic from gRPC to manager (#5435) Consolidate all expose business logic (validation, permission checks, TTL tracking, reaping) into the manager layer, making the gRPC layer a pure transport adapter that only handles proto conversion and authentication. - Add ExposeServiceRequest/ExposeServiceResponse domain types with validation in the reverseproxy package - Move expose tracker (TTL tracking, reaping, per-peer limits) from gRPC server into manager/expose_tracker.go - Internalize tracking in CreateServiceFromPeer, RenewServiceFromPeer, and new StopServiceFromPeer so callers don't manage tracker state - Untrack ephemeral services in DeleteService/DeleteAllServices to keep tracker in sync when services are deleted via API - Simplify gRPC expose handlers to parse, auth, convert, delegate - Remove tracker methods from Manager interface (internal detail)	2026-02-24 15:09:30 +01:00
Maycon Santos	63c83aa8d2	[client,management] Feature/client service expose (#5411) CLI: new expose command to publish a local port with flags for PIN, password, user groups, custom domain, name prefix and protocol (HTTP default). Management/API: create/renew/stop expose sessions (streamed status), automatic naming/domain, TTL renewals, background expiration, new management RPCs and client methods. UI/API: account settings now include peer_expose_enabled and peer_expose_groups; new activity codes for peer expose events.	2026-02-24 10:02:16 +01:00
Pascal Fischer	5d171f181a	[proxy] Send proxy updates on account delete (#5375)	2026-02-23 16:08:28 +01:00
Vlad	4aff4a6424	[management] fix utc difference on last seen status for a peer (#5348)	2026-02-17 13:29:32 +01:00
Pascal Fischer	f53155562f	[management, reverse proxy] Add reverse proxy feature (#5291) * implement reverse proxy --------- Co-authored-by: Alisdair MacLeod <git@alisdairmacleod.co.uk> Co-authored-by: mlsmaycon <mlsmaycon@gmail.com> Co-authored-by: Eduard Gert <kontakt@eduardgert.de> Co-authored-by: Viktor Liu <viktor@netbird.io> Co-authored-by: Diego Noguês <diego.sure@gmail.com> Co-authored-by: Diego Noguês <49420+diegocn@users.noreply.github.com> Co-authored-by: Bethuel Mmbaga <bethuelmbaga12@gmail.com> Co-authored-by: Zoltan Papp <zoltan.pmail@gmail.com> Co-authored-by: Ashley Mensah <ashleyamo982@gmail.com>	2026-02-13 19:37:43 +01:00
Zoltan Papp	3be16d19a0	[management] Feature/grpc debounce msgtype (#5239) * Add gRPC update debouncing mechanism Implements backpressure handling for peer network map updates to efficiently handle rapid changes. First update is sent immediately, subsequent rapid updates are coalesced, ensuring only the latest update is sent after a 1-second quiet period. * Enhance unit test to verify peer count synchronization with debouncing and timeout handling * Debounce based on type * Refactor test to validate timer restart after pending update dispatch * Simplify timer reset for Go 1.23+ automatic channel draining Remove manual channel drain in resetTimer() since Go 1.23+ automatically drains the timer channel when Stop() returns false, making the select-case pattern unnecessary.	2026-02-06 19:47:38 +01:00
Vlad	af8f730bda	[management] check stream start time for connecting peer (#5267)	2026-02-06 18:00:43 +01:00
Vlad	d488f58311	[management] fix set disconnected status for connected peer (#5247)	2026-02-04 11:44:46 +01:00
Vlad	8931293343	[management] run cancelPeerRoutinesWithoutLock in sync (#5234)	2026-02-01 15:44:27 +01:00
Vlad	7b830d8f72	disable sync lim (#5233)	2026-02-01 14:37:00 +01:00
Vlad	cead3f38ee	[management] fix ephemeral peers being not removed (#5203)	2026-01-28 18:24:12 +01:00
Zoltan Papp	44ab454a13	[management] Fix peer deletion error handling (#5188) When a deleted peer tries to reconnect, GetUserIDByPeerKey was returning Internal error instead of NotFound, causing clients to retry indefinitely instead of recognizing the unrecoverable PermissionDenied error. This fix: 1. Updates GetUserIDByPeerKey to properly return NotFound when peer doesn't exist 2. Updates Sync handler to convert NotFound to PermissionDenied with message 'peer is not registered', matching the behavior of GetAccountIDForPeerKey Fixes the regression introduced in v0.61.1 where deleted peers would see: - Before: 'rpc error: code = Internal desc = failed handling request' (retry loop) - After: 'rpc error: code = PermissionDenied desc = peer is not registered' (exits)	2026-01-26 23:15:34 +01:00
Zoltan Papp	58daa674ef	[Management/Client] Trigger debug bundle runs from API/Dashboard (#4592) (#4832) This PR adds the ability to trigger debug bundle generation remotely from the Management API/Dashboard.	2026-01-19 11:22:16 +01:00
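
Note on the Sync fast path (commits 3f4ef0031b, 8430b06f2a and 7e9d3485d8): the messages describe a decision that skips the full network map only when the kill-switch flag is on, the peer is not Android, a complete cache entry exists, the cached serial equals the current account serial, and the meta hash matches the last delivery. A minimal Go sketch of that decision follows. It is not the repository's code: `canSkipFullMap`, `syncRequest`, `flagEnabledFromValue`, the reason strings, the exact-match flag parsing and the fields checked by `IsComplete` are all hypothetical and chosen only for illustration.

```go
package main

// peerSyncEntry is a hypothetical stand-in for the cached record. The field
// names Serial/MetaHash/AccountID/PeerID follow the commit messages; the
// types are assumptions.
type peerSyncEntry struct {
	Serial    uint64
	MetaHash  uint64
	AccountID string
	PeerID    string
}

// IsComplete reports whether the entry carries the extended fields. Which
// fields the real check inspects is not stated in the log; this is a guess.
func (e *peerSyncEntry) IsComplete() bool {
	return e.AccountID != "" && e.PeerID != ""
}

// syncRequest models only the two request properties the decision needs.
type syncRequest struct {
	IsAndroid bool
	MetaHash  uint64
}

// flagEnabledFromValue mirrors the rule "1" or "true" enables the fast path,
// anything else disables it. Exact matching is an assumption.
func flagEnabledFromValue(v string) bool {
	return v == "1" || v == "true"
}

// canSkipFullMap returns true when a config-only response would suffice.
// When it returns false, the second value names why the slow path is taken.
func canSkipFullMap(flagEnabled bool, entry *peerSyncEntry, req syncRequest, currentSerial uint64) (bool, string) {
	switch {
	case !flagEnabled:
		return false, "fast path disabled"
	case req.IsAndroid:
		return false, "android peer"
	case entry == nil:
		return false, "cache miss"
	case !entry.IsComplete():
		return false, "legacy cache entry"
	case entry.Serial != currentSerial:
		return false, "serial advanced"
	case entry.MetaHash != req.MetaHash:
		return false, "meta changed"
	}
	return true, ""
}
```

Under these assumptions, a reconnecting peer with an unchanged serial and meta hash takes the fast path, while a legacy entry that decodes with zeroed fields falls through to the slow path, where it would be rewritten with the full shape.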


70 Commits
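
Note on update debouncing (commit 3be16d19a0): the message describes sending the first update immediately, coalescing rapid follow-ups, and sending only the latest one after a 1-second quiet period. The Go sketch below models that rule without real timers so it can be stepped through deterministically. It is an illustration, not the repository's implementation: the `debouncer` type, its `Offer` and `Due` methods and the `atMillis` helper are hypothetical, and restarting the quiet period on every update and after every dispatch is an assumption.

```go
package main

import "time"

// quietPeriod is the 1-second quiet period named in the commit message.
const quietPeriod = time.Second

// debouncer is a hypothetical, timer-free model of the coalescing rule.
// The caller supplies the current time instead of relying on time.Timer.
type debouncer struct {
	deadline time.Time // end of the current quiet period
	pending  *string   // latest coalesced update, nil if none
}

// Offer records an update arriving at now. It returns true when the update
// should be sent immediately, and false when it was coalesced.
func (d *debouncer) Offer(now time.Time, update string) bool {
	idle := d.pending == nil && !now.Before(d.deadline)
	d.deadline = now.Add(quietPeriod) // every update restarts the quiet period
	if idle {
		return true
	}
	d.pending = &update // keep only the latest update
	return false
}

// Due returns the coalesced update once the quiet period has elapsed.
func (d *debouncer) Due(now time.Time) (string, bool) {
	if d.pending == nil || now.Before(d.deadline) {
		return "", false
	}
	update := *d.pending
	d.pending = nil
	d.deadline = now.Add(quietPeriod) // restart after dispatching
	return update, true
}

// atMillis is a small helper for building example timestamps.
func atMillis(ms int64) time.Time {
	return time.UnixMilli(ms)
}
```

For example, offering "a" at 0 ms sends it at once; "b" at 100 ms and "c" at 200 ms are coalesced; `Due` reports nothing at 500 ms and returns "c" at 1200 ms, one second after the last update arrived.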