netbird

mirror of https://github.com/netbirdio/netbird.git synced 2026-04-16 15:26:40 +00:00

Author	SHA1	Message	Date
mlsmaycon	06578127fd	Centralize cache store creation to reuse a single Redis connection pool Each cache consumer (IDP cache, token store, PKCE store, secrets manager, EDR validator) was independently calling NewStore, creating separate Redis clients with their own connection pools — up to 1400 potential connections from a single management server process. Introduce a shared CacheStore() singleton on BaseServer that creates one store at boot and injects it into all consumers. Consumer constructors now receive a store.StoreInterface instead of creating their own. For Redis mode, all consumers share one connection pool (1000 max conns). For in-memory mode, all consumers share one GoCache instance.	2026-04-14 19:57:31 +02:00
Viktor Liu	94a36cb53e	[client] Handle UPnP routers that only support permanent leases (#5826)	2026-04-08 17:59:59 +02:00
Viktor Liu	d33cd4c95b	[client] Add NAT-PMP/UPnP support (#5202)	2026-04-08 15:29:32 +08:00
Viktor Liu	aba5d6f0d2	[client] Error out on netbird expose when block inbound is enabled (#5818)	2026-04-07 17:55:35 +02:00
Zoltan Papp	91f0d5cefd	[client] Feature/client metrics (#5512 ) * Add client metrics * Add client metrics system with OpenTelemetry and VictoriaMetrics support Implements a comprehensive client metrics system to track peer connection stages and performance. The system supports multiple backend implementations (OpenTelemetry, VictoriaMetrics, and no-op) and tracks detailed connection stage durations from creation through WireGuard handshake. Key changes: - Add metrics package with pluggable backend implementations - Implement OpenTelemetry metrics backend - Implement VictoriaMetrics metrics backend - Add no-op metrics implementation for disabled state - Track connection stages: creation, semaphore, signaling, connection ready, and WireGuard handshake - Move WireGuard watcher functionality to conn.go - Refactor engine to integrate metrics tracking - Add metrics export endpoint in debug server * Add signaling metrics tracking for initial and reconnection attempts * Reset connection stage timestamps during reconnections to exclude unnecessary metrics tracking * Delete otel lib from client * Update unit tests * Invoke callback on handshake success in WireGuard watcher * Add Netbird version tracking to client metrics Integrate Netbird version into VictoriaMetrics backend and metrics labels. Update `ClientMetrics` constructor and metric name formatting to include version information. * Add sync duration tracking to client metrics Introduce `RecordSyncDuration` for measuring sync message processing time. Update all metrics implementations (VictoriaMetrics, no-op) to support the new method. Refactor `ClientMetrics` to use `AgentInfo` for static agent data. * Remove no-op metrics implementation and simplify ClientMetrics constructor Eliminate unused `noopMetrics` and refactor `ClientMetrics` to always use the VictoriaMetrics implementation. Update associated logic to reflect these changes. 
* Add total duration tracking for connection attempts Calculate total duration for both initial connections and reconnections, accounting for different timestamp scenarios. Update `Export` method to include Prometheus HELP comments. * Add metrics push support to VictoriaMetrics integration * [client] anchor connection metrics to first signal received * Remove creation_to_semaphore connection stage metric The semaphore queuing stage (Created → SemaphoreAcquired) is no longer tracked. Connection metrics now start from SignalingReceived. Updated docs and Grafana dashboard accordingly. * [client] Add remote push config for metrics with version-based eligibility Introduce remoteconfig.Manager that fetches a remote JSON config to control metrics push interval and restrict pushing to a specific agent version range. When NB_METRICS_INTERVAL is set, remote config is bypassed entirely for local override. * [client] Add WASM-compatible NewClientMetrics implementation Replace NewClientMetrics in metrics.go with a WASM-specific stub in metrics_js.go, returning nil for compatibility with JS builds. Simplify method usage for WASM targets. * Add missing file * Update default case in DeploymentType.String to return "unknown" instead of "selfhosted" * [client] Rework metrics to use timestamped samples instead of histograms Replace cumulative Prometheus histograms with timestamped point-in-time samples that are pushed once and cleared. This fixes metrics for sparse events (connections/syncs that happen once at startup) where rate() and increase() produced incorrect or empty results. 
Changes: - Switch from VictoriaMetrics histogram library to raw Prometheus text format with explicit millisecond timestamps - Reset samples after successful push (no resending stale data) - Rename connection_to_handshake → connection_to_wg_handshake - Add netbird_peer_connection_count metric for ICE vs Relay tracking - Simplify dashboard: point-based scatter plots, donut pie chart - Add maxStalenessInterval=1m to VictoriaMetrics to prevent forward-fill - Fix deployment_type Unknown returning "selfhosted" instead of "unknown" - Fix inverted shouldPush condition in push.go * [client] Add InfluxDB metrics backend alongside VictoriaMetrics Add influxdb.go with timestamped line protocol export for sparse one-shot events. Restore victoria.go to use proper Prometheus histograms. Update Grafana dashboards, add InfluxDB datasource, and update docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [client] Fix metrics issues and update dev docker setup - Fix StopPush not clearing push state, preventing restart - Fix race condition reading currentConnPriority without lock in recordConnectionMetrics - Fix stale comment referencing old metrics server URL - Update docker-compose for InfluxDB: add scoped tokens, .env config, init scripts - Rename docker-compose.victoria.yml to docker-compose.yml * [client] Add anonymised peer tracking to pushed metrics Introduce peer_id and connection_pair_id tags to InfluxDB metrics. Public keys are hashed (truncated SHA-256) for anonymisation. The connection pair ID is deterministic regardless of which side computes it, enabling deduplication of reconnections in the ICE vs Relay dashboard. Also pin Grafana to v11.6.0 for file-based provisioning and fix datasource UID references. * Remove unused dependencies from go.mod and go.sum * Refactor InfluxDB ingest pipeline: extract validation logic - Move line validation logic to `validateLine` and `validateField` helper functions. 
- Improve error handling with structured validation and clearer separation of concerns. - Add stderr redirection for error messages in `create-tokens.sh`. * Set non-root user in Dockerfile for Ingest service * Fix Windows CI: command line too long * Remove Victoria metrics * Add hashed peer ID as Authorization header in metrics push * Revert influxdb in docker compose * Enable gzip compression and authorization validation for metrics push and ingest * Reducate code of complexity * Update debug documentation to include metrics.txt description * Increase `maxBodySize` limit to 50 MB and update gzip reader wrapping logic * Refactor deployment type detection to use URL parsing for improved accuracy * Update readme * Throttle remote config retries on fetch failure * Preserve first WG handshake timestamp, ignore rekeys * Skip adding empty metrics.txt to debug bundle in debug mode * Update default metrics server URL to https://ingest.netbird.io * Atomic metrics export-and-reset to prevent sample loss between Export and Reset calls * Fix doc * Refactor Push configuration to improve clarity and enforce minimum push interval * Remove `minPushInterval` and update push interval validation logic * Revert ExportAndReset, it is acceptable data loss * Fix metrics review issues: rename env var, remove stale infra, add tests - Rename NB_METRICS_ENABLED to NB_METRICS_PUSH_ENABLED to clarify that collection is always active (for debug bundles) and only push is opt-in - Change default config URL from staging to production (ingest.netbird.io) - Delete broken Prometheus dashboard (used non-existent metric names) - Delete unused VictoriaMetrics datasource config - Replace committed .env with .env.example containing placeholder values - Wire Grafana admin credentials through env vars in docker-compose - Make metricsStages a pointer to prevent reset-vs-write race on reconnect - Fix typed-nil interface in debug bundle path (GetClientMetrics) - Use deterministic field order in InfluxDB Export 
(sorted keys) - Replace Authorization header with X-Peer-ID for metrics push - Fix ingest server timeout to use time.Second instead of float - Fix gzip double-close, stale comments, trim log levels - Add tests for influxdb.go and MetricsStages * Add login duration metric, ingest tag validation, and duration bounds - Add netbird_login measurement recording login/auth duration to management server, with success/failure result tag - Validate InfluxDB tags against per-measurement allowlists in ingest server to prevent arbitrary tag injection - Cap all duration fields (_seconds) at 300s instead of only total_seconds - Add ingest server tests for tag/field validation, bounds, and auth Add arch tag to all metrics * Fix Grafana dashboard: add arch to drop columns, add login panels * Validate NB_METRICS_SERVER_URL is an absolute HTTP(S) URL * Address review comments: fix README wording, update stale comments * Clarify env var precedence does not bypass remote config eligibility * Remove accidentally committed pprof files --------- Co-authored-by: Viktor Liu <viktor@netbird.io>	2026-03-22 12:45:41 +01:00
Viktor Liu	3e6baea405	[management,proxy,client] Add L4 capabilities (TLS/TCP/UDP) (#5530)	2026-03-13 18:36:44 +01:00
Zoltan Papp	fe9b844511	[client] refactor auto update workflow (#5448) Auto-update logic moved out of the UI into a dedicated updatemanager.Manager service that runs in the connection layer. The UI no longer polls or checks for updates independently. The update manager supports three modes driven by the management server's auto-update policy: (1) No policy set by mgm: checks GitHub for the latest version and notifies the user (previous behavior, now centralized). (2) mgm enforces update: the "About" menu triggers installation directly instead of just downloading the file; the user still initiates the action. (3) mgm forces update: installation proceeds automatically without user interaction. The updateManager lifecycle is now owned by the daemon, giving the daemon server direct control via a new TriggerUpdate RPC. Introduces an EngineServices struct to group external service dependencies passed to NewEngine, reducing its argument count from 11 to 4.	2026-03-13 17:01:28 +01:00
Maycon Santos	15aa6bae1b	[client] Fix exit node menu not refreshing on Windows (#5553) * [client] Fix exit node menu not refreshing on Windows TrayOpenedCh is not implemented in the systray library on Windows, so exit nodes were never refreshed after the initial connect. Combined with the management sync not having populated routes yet when the Connected status fires, this caused the exit node menu to remain empty permanently after disconnect/reconnect cycles. Add a background poller on Windows that refreshes exit nodes while connected, with fast initial polling to catch routes from management sync followed by a steady 10s interval. On macOS/Linux, TrayOpenedCh continues to handle refreshes on each tray open. Also fix a data race on connectClient assignment in the server's connect() method and add nil checks in CleanState/DeleteState to prevent panics when connectClient is nil. * Remove unused exitNodeIDs * Remove unused exitNodeState struct	2026-03-09 18:39:11 +01:00
Zoltan Papp	3acd86e346	[client] "reset connection" error on wake from sleep (#5522) Capture engine reference before actCancel() in cleanupConnection(). After actCancel(), the connectWithRetryRuns goroutine sets engine to nil, causing connectClient.Stop() to skip shutdown. This allows the goroutine to set ErrResetConnection on the shared state after Down() clears it, causing the next Up() to fail.	2026-03-09 10:25:51 +01:00
Zoltan Papp	c2c4d9d336	[client] Fix Server mutex held across waitForUp in Up() (#5460) Up() acquired s.mutex with a deferred unlock, then called waitForUp() while still holding the lock. waitForUp() blocks for up to 50 seconds waiting on clientRunningChan/clientGiveUpChan, starving all concurrent gRPC calls that require the same mutex (Status, ListProfiles, etc.). Replace the deferred unlock with explicit s.mutex.Unlock() on every early-return path and immediately before waitForUp(), matching the pattern already used by the clientRunning==true branch.	2026-02-26 16:47:02 +01:00
Maycon Santos	63c83aa8d2	[client,management] Feature/client service expose (#5411) CLI: new expose command to publish a local port with flags for PIN, password, user groups, custom domain, name prefix and protocol (HTTP default). Management/API: create/renew/stop expose sessions (streamed status), automatic naming/domain, TTL renewals, background expiration, new management RPCs and client methods. UI/API: account settings now include peer_expose_enabled and peer_expose_groups; new activity codes for peer expose events.	2026-02-24 10:02:16 +01:00
Zoltan Papp	37f025c966	Fix a race condition where a concurrent user-issued Up or Down command (#5418) could interleave with a sleep/wake event causing out-of-order state transitions. The mutex now covers the full duration of each handler including the status check, the Up/Down call, and the flag update. Note: if Up or Down commands are triggered in parallel with sleep/wake events, the overall ordering of up/down/sleep/wake operations is still not guaranteed beyond what the mutex provides within the handler itself.	2026-02-24 10:00:33 +01:00
Zoltan Papp	ded04b7627	[client] Consolidate authentication logic (#5010) * Consolidate authentication logic - Moving auth functions from client/internal to client/internal/auth package - Creating unified auth.Auth client with NewAuth() constructor - Replacing direct auth function calls with auth client methods - Refactoring device flow and PKCE flow implementations - Updating iOS/Android/server code to use new auth client API * Refactor PKCE auth and login methods - Remove unnecessary internal package reference in PKCE flow test - Adjust context assignment placement in iOS and Android login methods	2026-01-23 22:28:32 +01:00
Zoltan Papp	ee3a67d2d8	[client] Fix/health result in bundle (#5164) * Add support for optional status refresh callback during debug bundle generation * Always update wg status * Remove duplicated wg status call	2026-01-23 17:06:07 +01:00
Viktor Liu	d0221a3e72	[client] Add cpu profile to debug bundle (#4700)	2026-01-22 12:24:12 +01:00
Zoltan Papp	07e4a5a23c	Fixes profile switching and repeated down/up command failures. (#5142) When Down() and Up() are called in quick succession, the connectWithRetryRuns goroutine could set ErrResetConnection after Down() had cleared the state, causing the subsequent Up() to fail. Fix by waiting for the goroutine to exit (via clientGiveUpChan) before Down() returns. Uses a 5-second timeout to prevent RPC timeouts while ensuring the goroutine completes in most cases.	2026-01-20 18:22:37 +01:00
Zoltan Papp	58daa674ef	[Management/Client] Trigger debug bundle runs from API/Dashboard (#4592) (#4832) This PR adds the ability to trigger debug bundle generation remotely from the Management API/Dashboard.	2026-01-19 11:22:16 +01:00
Viktor Liu	520d9c66cf	[client] Fix netstack upstream dns and add wasm debug methods (#4648)	2026-01-14 13:56:16 +01:00
Zoltan Papp	9c9d8e17d7	Revert "Revert "[relay] Update GO version and QUIC version (#4736)" (#5055)" (#5071) This reverts commit `24df442198`.	2026-01-08 18:58:22 +01:00
Maycon Santos	24df442198	Revert "[relay] Update GO version and QUIC version (#4736)" (#5055) This reverts commit `8722b79799`.	2026-01-07 19:02:20 +01:00
Zoltan Papp	8722b79799	[relay] Update GO version and QUIC version (#4736) - Go 1.25.5 - QUIC 0.55.0	2026-01-07 16:30:29 +01:00
Misha Bragin	e586c20e36	[management, infrastructure, idp] Simplified IdP Management - Embedded IdP (#5008) Embed Dex as a built-in IdP to simplify self-hosting setup. Adds an embedded OIDC Identity Provider (Dex) with local user management and optional external IdP connectors (Google/GitHub/OIDC/SAML), plus device-auth flow for CLI login. Introduces instance onboarding/setup endpoints (including owner creation), field-level encryption for sensitive user data, a streamlined self-hosting provisioning script, and expanded APIs + test coverage for IdP management. more at https://github.com/netbirdio/netbird/pull/5008#issuecomment-3718987393	2026-01-07 14:52:32 +01:00
Viktor Liu	f012fb8592	[client] Add port forwarding to ssh proxy (#5031) * Implement port forwarding for the ssh proxy * Allow user switching for port forwarding	2026-01-07 12:18:04 +08:00
Maycon Santos	07856f516c	[client] Fix/stuck connecting when can't access api.netbird.io (#5033) - Connect on daemon start only if the file existed before - fixed a bug that happened when the default profile config was removed, which would recreate it and reset the active profile to the default.	2026-01-05 13:53:17 +01:00
Zoltan Papp	011cc81678	[client, management] auto-update (#4732)	2025-12-19 19:57:39 +01:00
Zoltan Papp	71b6855e09	[client] Fix engine shutdown deadlock and sync-signal message handling races (#4891) * Fix engine shutdown deadlock and message handling races - Release syncMsgMux before waiting for shutdownWg to prevent deadlock - Check context inside lock in handleSync and receiveSignalEvents - Prevents nil pointer access when messages arrive during engine stop	2025-12-04 19:51:50 +01:00
Maycon Santos	e87b4ace11	[client] Add sleep state tracking to handle wakeup/sleep events reliably (#4894) Adds a new NotifyOSLifecycle RPC and server handler to centralize OS sleep/wake handling, introduces Server.sleepTriggeredDown for coordination, updates client UI to call the new RPC, and adjusts the internal sleep event enum zero-value semantics.	2025-12-03 11:53:39 +01:00
Pascal Fischer	7193bd2da7	[management] Refactor network map controller (#4789)	2025-12-02 12:34:28 +01:00
shuuri-labs	7285fef0f0	feat: Add support for displaying device code (UserCode) on Android TV SSO flow (#4800) - Modified URLOpener interface to pass userCode alongside URL in login.go - added ability to force device auth flow	2025-11-25 15:51:16 +01:00
Pascal Fischer	3351b38434	[management] pass config to controller (#4807)	2025-11-19 11:52:18 +01:00
Viktor Liu	d71a82769c	[client,management] Rewrite the SSH feature (#4015)	2025-11-17 17:10:41 +01:00
Pascal Fischer	cc97cffff1	[management] move network map logic into new design (#4774)	2025-11-13 12:09:46 +01:00
Viktor Liu	75327d9519	[client] Add login_hint to oidc flows (#4724)	2025-11-05 17:00:20 +01:00
Zoltan Papp	d7321c130b	[client] The status cmd will not be blocked by the ICE probe (#4597) The status cmd will not be blocked by the ICE probe Refactor the TURN and STUN probe, and cache the results. The NetBird status command will indicate a "checking…" state.	2025-10-28 16:11:35 +01:00
Viktor Liu	eddea14521	[client] Clean up bsd routes independently of the state file (#4688)	2025-10-27 18:54:00 +01:00
Viktor Liu	277aa2b7cc	[client] Fix missing flag values in profiles (#4650)	2025-10-16 15:13:41 +02:00
Viktor Liu	b5daec3b51	[client,signal,management] Add browser client support (#4415)	2025-10-01 20:10:11 +02:00
Zoltan Papp	998fb30e1e	[client] Check the client status in the earlier phase (#4509) This PR improves the NetBird client's status checking mechanism by implementing earlier detection of client state changes and better handling of connection lifecycle management. The key improvements focus on: • Enhanced status detection - Added waitForReady option to StatusRequest for improved client status handling • Better connection management - Improved context handling for signal and management gRPC connections • Reduced connection timeouts - Increased gRPC dial timeout from 3 to 10 seconds for better reliability • Cleaner error handling - Enhanced error propagation and context cancellation in retry loops Key Changes Core Status Improvements: - Added waitForReady optional field to StatusRequest proto (daemon.proto:190) - Enhanced status checking logic to detect client state changes earlier in the connection process - Improved handling of client permanent exit scenarios from retry loops Connection & Context Management: - Fixed context cancellation in management and signal client retry mechanisms - Added proper context propagation for Login operations - Enhanced gRPC connection handling with better timeout management Error Handling & Cleanup: - Moved feedback channels to upper layers for better separation of concerns - Improved error handling patterns throughout the client server implementation - Fixed synchronization issues and removed debug logging	2025-09-20 22:14:01 +02:00
Zoltan Papp	47e64d72db	[client] Fix client status check (#4474) The client status is not enough to protect the RPC calls from concurrency issues, because it is handled internally in the client in an asynchronous way.	2025-09-11 16:21:09 +02:00
Bethuel Mmbaga	5113c70943	[management] Extends integration and peers manager (#4450)	2025-09-06 13:13:49 +03:00
Bethuel Mmbaga	a8dcff69c2	[management] Add peers manager to integrations (#4405)	2025-09-04 23:07:03 +03:00
Viktor Liu	f063866ce8	[client] Add flag to configure MTU (#4213)	2025-08-26 16:00:14 +02:00
Pascal Fischer	b3056d0937	[management] Use DI containers for server bootstrapping (#4343)	2025-08-15 17:14:48 +02:00
hakansa	70db8751d7	[client] Add --disable-update-settings flag to the service (#4335)	2025-08-13 21:05:12 +03:00
Bethuel Mmbaga	a4e8647aef	[management] Enable flow groups (#4230) Adds the ability to limit traffic events logging to specific peer groups	2025-08-13 00:00:40 +03:00
Viktor Liu	1022a5015c	[client] Eliminate upstream server strings in dns code (#4267)	2025-08-11 11:57:21 +02:00
Viktor Liu	1d5e871bdf	[misc] Move shared components to shared directory (#4286) Moved the following directories: ``` - management/client → shared/management/client - management/domain → shared/management/domain - management/proto → shared/management/proto - signal/client → shared/signal/client - signal/proto → shared/signal/proto - relay/client → shared/relay/client - relay/auth → shared/relay/auth ``` and adjusted import paths	2025-08-05 15:22:58 +02:00
Viktor Liu	3d3c4c5844	[client] Add full sync response to debug bundle (#4287)	2025-08-05 14:55:50 +02:00
hakansa	9ccc13e6ea	[client]: Add config flag to service to override default profile path (#4276)	2025-08-05 12:33:43 +03:00
Viktor Liu	b5ed94808c	[management, client] Add logout feature (#4268)	2025-08-04 10:17:36 +02:00
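
Note on commit 91f0d5cefd: its message says public keys are hashed with a truncated SHA-256 and that the connection pair ID is the same regardless of which side computes it. One way such a scheme could work is sketched below. This is an illustration only, not the code from the commit: the function names, the 8-byte truncation length, the "|" separator, and the sort-before-hash step are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashedPeerID returns a truncated SHA-256 digest of a public key.
// Assumption: 8 bytes (16 hex characters) is kept; the commit does not
// state the truncation length.
func hashedPeerID(pubKey string) string {
	sum := sha256.Sum256([]byte(pubKey))
	return hex.EncodeToString(sum[:8])
}

// connectionPairID orders the two keys before hashing, so both peers
// derive the same ID no matter which one is "local".
// Assumption: lexicographic ordering and a "|" separator are hypothetical.
func connectionPairID(keyA, keyB string) string {
	if keyA > keyB {
		keyA, keyB = keyB, keyA
	}
	sum := sha256.Sum256([]byte(keyA + "|" + keyB))
	return hex.EncodeToString(sum[:8])
}

func main() {
	a, b := "peerA-public-key", "peerB-public-key"
	fmt.Println(hashedPeerID(a))
	// Both orderings yield the same pair ID.
	fmt.Println(connectionPairID(a, b) == connectionPairID(b, a)) // true
}
```

Ordering the inputs before hashing is what makes the result symmetric; any canonical ordering would do.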

185 Commits