netbird

mirror of https://github.com/netbirdio/netbird.git synced 2026-04-16 07:16:38 +00:00

Author	SHA1	Message	Date
Maycon Santos	e2c2f64be7	[client] Fix iOS DNS upstream routing for deselected exit nodes (#5803 ) - Add GetSelectedClientRoutes() to the route manager that filters through FilterSelectedExitNodes, returning only active routes instead of all management routes - Use GetSelectedClientRoutes() in the DNS route checker so deselected exit nodes' 0.0.0.0/0 no longer matches upstream DNS IPs — this prevented the resolver from switching away from the utun-bound socket after exit node deselection - Initialize iOS DNS server with host DNS fallback addresses (1.1.1.1:53, 1.0.0.1:53) and a permanent root zone handler, matching Android's behavior — without this, unmatched DNS queries arriving via the 0.0.0.0/0 tunnel route had no handler and were silently dropped	2026-04-08 08:43:48 +02:00
Zoltan Papp	0efef671d7	[client] Unexport GetServerPublicKey, add HealthCheck method (#5735 ) * Unexport GetServerPublicKey, add HealthCheck method Internalize server key fetching into Login, Register, GetDeviceAuthorizationFlow, and GetPKCEAuthorizationFlow methods, removing the need for callers to fetch and pass the key separately. Replace the exported GetServerPublicKey with a HealthCheck() error method for connection validation, keeping IsHealthy() bool for non-blocking background monitoring. Fix test encryption to use correct key pairs (client public key as remotePubKey instead of server private key). * Refactor `doMgmLogin` to return only error, removing unused response	2026-04-07 12:18:21 +02:00
tham-le	81f45dab21	[client] Support embed.Client on Android with netstack mode (#5623 ) * [client] Support embed.Client on Android with netstack mode embed.Client.Start() calls ConnectClient.Run() which passes an empty MobileDependency{}. On Android, the engine dereferences nil fields (IFaceDiscover, NetworkChangeListener, DnsReadyListener) causing panics. Provide complete no-op stubs so the engine's existing Android code paths work unchanged — zero modifications to engine.go: - Add androidRunOverride hook in Run() for Android-specific dispatch - Add runOnAndroidEmbed() with complete MobileDependency (all stubs) - Wire default stubs via init() in connect_android_default.go: noopIFaceDiscover, noopNetworkChangeListener, noopDnsReadyListener - Forward logPath to c.run() Tested: embed.Client starts on Android arm64, joins mesh via relay, discovers peers, localhost proxy works for TCP+UDP forwarding. * [client] Fix TestServiceParamsPath for Windows path separators Use filepath.Join in test assertions instead of hardcoded POSIX paths so the test passes on Windows where filepath.Join uses backslashes.	2026-04-01 16:19:34 +02:00
Zoltan Papp	91f0d5cefd	[client] Feature/client metrics (#5512 ) * Add client metrics * Add client metrics system with OpenTelemetry and VictoriaMetrics support Implements a comprehensive client metrics system to track peer connection stages and performance. The system supports multiple backend implementations (OpenTelemetry, VictoriaMetrics, and no-op) and tracks detailed connection stage durations from creation through WireGuard handshake. Key changes: - Add metrics package with pluggable backend implementations - Implement OpenTelemetry metrics backend - Implement VictoriaMetrics metrics backend - Add no-op metrics implementation for disabled state - Track connection stages: creation, semaphore, signaling, connection ready, and WireGuard handshake - Move WireGuard watcher functionality to conn.go - Refactor engine to integrate metrics tracking - Add metrics export endpoint in debug server * Add signaling metrics tracking for initial and reconnection attempts * Reset connection stage timestamps during reconnections to exclude unnecessary metrics tracking * Delete otel lib from client * Update unit tests * Invoke callback on handshake success in WireGuard watcher * Add Netbird version tracking to client metrics Integrate Netbird version into VictoriaMetrics backend and metrics labels. Update `ClientMetrics` constructor and metric name formatting to include version information. * Add sync duration tracking to client metrics Introduce `RecordSyncDuration` for measuring sync message processing time. Update all metrics implementations (VictoriaMetrics, no-op) to support the new method. Refactor `ClientMetrics` to use `AgentInfo` for static agent data. * Remove no-op metrics implementation and simplify ClientMetrics constructor Eliminate unused `noopMetrics` and refactor `ClientMetrics` to always use the VictoriaMetrics implementation. Update associated logic to reflect these changes. 
* Add total duration tracking for connection attempts Calculate total duration for both initial connections and reconnections, accounting for different timestamp scenarios. Update `Export` method to include Prometheus HELP comments. * Add metrics push support to VictoriaMetrics integration * [client] anchor connection metrics to first signal received * Remove creation_to_semaphore connection stage metric The semaphore queuing stage (Created → SemaphoreAcquired) is no longer tracked. Connection metrics now start from SignalingReceived. Updated docs and Grafana dashboard accordingly. * [client] Add remote push config for metrics with version-based eligibility Introduce remoteconfig.Manager that fetches a remote JSON config to control metrics push interval and restrict pushing to a specific agent version range. When NB_METRICS_INTERVAL is set, remote config is bypassed entirely for local override. * [client] Add WASM-compatible NewClientMetrics implementation Replace NewClientMetrics in metrics.go with a WASM-specific stub in metrics_js.go, returning nil for compatibility with JS builds. Simplify method usage for WASM targets. * Add missing file * Update default case in DeploymentType.String to return "unknown" instead of "selfhosted" * [client] Rework metrics to use timestamped samples instead of histograms Replace cumulative Prometheus histograms with timestamped point-in-time samples that are pushed once and cleared. This fixes metrics for sparse events (connections/syncs that happen once at startup) where rate() and increase() produced incorrect or empty results. 
Changes: - Switch from VictoriaMetrics histogram library to raw Prometheus text format with explicit millisecond timestamps - Reset samples after successful push (no resending stale data) - Rename connection_to_handshake → connection_to_wg_handshake - Add netbird_peer_connection_count metric for ICE vs Relay tracking - Simplify dashboard: point-based scatter plots, donut pie chart - Add maxStalenessInterval=1m to VictoriaMetrics to prevent forward-fill - Fix deployment_type Unknown returning "selfhosted" instead of "unknown" - Fix inverted shouldPush condition in push.go * [client] Add InfluxDB metrics backend alongside VictoriaMetrics Add influxdb.go with timestamped line protocol export for sparse one-shot events. Restore victoria.go to use proper Prometheus histograms. Update Grafana dashboards, add InfluxDB datasource, and update docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [client] Fix metrics issues and update dev docker setup - Fix StopPush not clearing push state, preventing restart - Fix race condition reading currentConnPriority without lock in recordConnectionMetrics - Fix stale comment referencing old metrics server URL - Update docker-compose for InfluxDB: add scoped tokens, .env config, init scripts - Rename docker-compose.victoria.yml to docker-compose.yml * [client] Add anonymised peer tracking to pushed metrics Introduce peer_id and connection_pair_id tags to InfluxDB metrics. Public keys are hashed (truncated SHA-256) for anonymisation. The connection pair ID is deterministic regardless of which side computes it, enabling deduplication of reconnections in the ICE vs Relay dashboard. Also pin Grafana to v11.6.0 for file-based provisioning and fix datasource UID references. * Remove unused dependencies from go.mod and go.sum * Refactor InfluxDB ingest pipeline: extract validation logic - Move line validation logic to `validateLine` and `validateField` helper functions. 
- Improve error handling with structured validation and clearer separation of concerns. - Add stderr redirection for error messages in `create-tokens.sh`. * Set non-root user in Dockerfile for Ingest service * Fix Windows CI: command line too long * Remove Victoria metrics * Add hashed peer ID as Authorization header in metrics push * Revert influxdb in docker compose * Enable gzip compression and authorization validation for metrics push and ingest * Reduce code complexity * Update debug documentation to include metrics.txt description * Increase `maxBodySize` limit to 50 MB and update gzip reader wrapping logic * Refactor deployment type detection to use URL parsing for improved accuracy * Update readme * Throttle remote config retries on fetch failure * Preserve first WG handshake timestamp, ignore rekeys * Skip adding empty metrics.txt to debug bundle in debug mode * Update default metrics server URL to https://ingest.netbird.io * Atomic metrics export-and-reset to prevent sample loss between Export and Reset calls * Fix doc * Refactor Push configuration to improve clarity and enforce minimum push interval * Remove `minPushInterval` and update push interval validation logic * Revert ExportAndReset; the data loss is acceptable * Fix metrics review issues: rename env var, remove stale infra, add tests - Rename NB_METRICS_ENABLED to NB_METRICS_PUSH_ENABLED to clarify that collection is always active (for debug bundles) and only push is opt-in - Change default config URL from staging to production (ingest.netbird.io) - Delete broken Prometheus dashboard (used non-existent metric names) - Delete unused VictoriaMetrics datasource config - Replace committed .env with .env.example containing placeholder values - Wire Grafana admin credentials through env vars in docker-compose - Make metricsStages a pointer to prevent reset-vs-write race on reconnect - Fix typed-nil interface in debug bundle path (GetClientMetrics) - Use deterministic field order in InfluxDB Export 
(sorted keys) - Replace Authorization header with X-Peer-ID for metrics push - Fix ingest server timeout to use time.Second instead of float - Fix gzip double-close, stale comments, trim log levels - Add tests for influxdb.go and MetricsStages * Add login duration metric, ingest tag validation, and duration bounds - Add netbird_login measurement recording login/auth duration to management server, with success/failure result tag - Validate InfluxDB tags against per-measurement allowlists in ingest server to prevent arbitrary tag injection - Cap all duration fields (_seconds) at 300s instead of only total_seconds - Add ingest server tests for tag/field validation, bounds, and auth Add arch tag to all metrics * Fix Grafana dashboard: add arch to drop columns, add login panels * Validate NB_METRICS_SERVER_URL is an absolute HTTP(S) URL * Address review comments: fix README wording, update stale comments * Clarify env var precedence does not bypass remote config eligibility * Remove accidentally committed pprof files --------- Co-authored-by: Viktor Liu <viktor@netbird.io>	2026-03-22 12:45:41 +01:00
Zoltan Papp	fe9b844511	[client] refactor auto update workflow (#5448 ) Auto-update logic moved out of the UI into a dedicated updatemanager.Manager service that runs in the connection layer. The UI no longer polls or checks for updates independently. The update manager supports three modes driven by the management server's auto-update policy: - No policy set by mgm: checks GitHub for the latest version and notifies the user (previous behavior, now centralized) - mgm enforces update: the "About" menu triggers installation directly instead of just downloading the file; the user still initiates the action - mgm forces update: installation proceeds automatically without user interaction. The updateManager lifecycle is now owned by the daemon, giving the daemon server direct control via a new TriggerUpdate RPC. Introduces an EngineServices struct to group external service dependencies passed to NewEngine, reducing its argument count from 11 to 4.	2026-03-13 17:01:28 +01:00
Viktor Liu	0b21498b39	[client] Fix close of closed channel panic in ConnectClient retry loop (#5470 )	2026-03-02 10:07:53 +01:00
Viktor Liu	0119f3e9f4	[client] Fix netstack detection and add wireguard port option (#5251 ) - Add WireguardPort option to embed.Options for custom port configuration - Fix KernelInterface detection to account for netstack mode - Skip SSH config updates when running in netstack mode - Skip interface removal wait when running in netstack mode - Use BindListener for netstack to avoid port conflicts on same host	2026-02-06 10:03:01 +01:00
Zoltan Papp	58daa674ef	[Management/Client] Trigger debug bundle runs from API/Dashboard (#4592 ) (#4832 ) This PR adds the ability to trigger debug bundle generation remotely from the Management API/Dashboard.	2026-01-19 11:22:16 +01:00
Viktor Liu	520d9c66cf	[client] Fix netstack upstream dns and add wasm debug methods (#4648 )	2026-01-14 13:56:16 +01:00
Zoltan Papp	b7e98acd1f	[client] Android profile switch (#4884 ) Expose the profile-manager service for Android. Logout was not part of the manager service implementation. In the future, I recommend moving this logic there.	2025-12-22 22:09:05 +01:00
Zoltan Papp	011cc81678	[client, management] auto-update (#4732 )	2025-12-19 19:57:39 +01:00
Zoltan Papp	71b6855e09	[client] Fix engine shutdown deadlock and sync-signal message handling races (#4891 ) * Fix engine shutdown deadlock and message handling races - Release syncMsgMux before waiting for shutdownWg to prevent deadlock - Check context inside lock in handleSync and receiveSignalEvents - Prevents nil pointer access when messages arrive during engine stop	2025-12-04 19:51:50 +01:00
Diego Romar	32146e576d	[android] allow selection/deselection of network resources on android peers (#4607 )	2025-11-21 13:36:33 +01:00
Viktor Liu	d71a82769c	[client,management] Rewrite the SSH feature (#4015 )	2025-11-17 17:10:41 +01:00
Viktor Liu	c92e6c1b5f	[client] Block on all subsystems on shutdown (#4709 )	2025-11-05 12:15:37 +01:00
Viktor Liu	55126f990c	[client] Use native windows sock opts to avoid routing loops (#4314 ) - Move `util/grpc` and `util/net` to `client` so `internal` packages can be accessed - Add methods to return the next best interface after the NetBird interface. - Use `IP_UNICAST_IF` sock opt to force the outgoing interface for the NetBird `net.Dialer` and `net.ListenerConfig` to avoid routing loops. The interface is picked by the new route lookup method. - Some refactoring to avoid import cycles - Old behavior is available through `NB_USE_LEGACY_ROUTING=true` env var	2025-09-20 09:31:04 +02:00
Zoltan Papp	47e64d72db	[client] Fix client status check (#4474 ) The client status is not enough to protect the RPC calls from concurrency issues, because it is handled internally in the client in an asynchronous way.	2025-09-11 16:21:09 +02:00
Viktor Liu	d4c067f0af	[client] Don't deactivate upstream resolvers on failure (#4128 )	2025-08-29 17:40:05 +02:00
Viktor Liu	f063866ce8	[client] Add flag to configure MTU (#4213 )	2025-08-26 16:00:14 +02:00
Viktor Liu	1022a5015c	[client] Eliminate upstream server strings in dns code (#4267 )	2025-08-11 11:57:21 +02:00
Viktor Liu	1d5e871bdf	[misc] Move shared components to shared directory (#4286 ) Moved the following directories: ``` - management/client → shared/management/client - management/domain → shared/management/domain - management/proto → shared/management/proto - signal/client → shared/signal/client - signal/proto → shared/signal/proto - relay/client → shared/relay/client - relay/auth → shared/relay/auth ``` and adjusted import paths	2025-08-05 15:22:58 +02:00
Viktor Liu	3d3c4c5844	[client] Add full sync response to debug bundle (#4287 )	2025-08-05 14:55:50 +02:00
hakansa	cb8b6ca59b	[client] Feat: Support Multiple Profiles (#3980 )	2025-07-25 16:54:46 +03:00
Maycon Santos	56a1a75e3f	[client] Support random wireguard port on client (#4085 ) Adds support for using a random available WireGuard port when the user specifies port `0`. - Updates `freePort` logic to bind to the requested port (including `0`) without falling back to the default. - Removes default port assignment in the configuration path, allowing `0` to propagate. - Adjusts tests to handle dynamically assigned ports when using `0`.	2025-07-02 09:01:02 +02:00
Viktor Liu	e71383dcb9	[client] Add missing client meta flags (#3898 )	2025-06-10 14:27:58 +02:00
Viktor Liu	1ce4ee0cef	[client] Add block inbound flag to disallow inbound connections of any kind (#3897 )	2025-06-03 10:53:27 +02:00
Zoltan Papp	daa8380df9	[client] Feature/lazy connection (#3379 ) With the lazy connection feature, the peer connects to target peers on demand. The trigger can be any IP traffic. This feature can be enabled with the NB_ENABLE_EXPERIMENTAL_LAZY_CONN environment variable. When the engine receives a network map, it binds a free UDP port for every remote peer, and the system configures WireGuard endpoints for these ports. When traffic appears on a UDP socket, the system removes this listener and starts the peer connection procedure immediately. Key changes: - Fix slow netbird status -d command - Move the peer connection related code from engine.go to conn_mgr.go - Refactor the iface interface usage and move the interface file next to the engine code - Add new command line flag and UI option to enable the feature - The peer.Conn struct is reusable after it has been closed - Change connection states. Connection states: - Idle: The peer is not attempting to establish a connection. This typically means it's in a lazy state or the remote peer is expired. - Connecting: The peer is actively trying to establish a connection. This occurs when the peer has entered an active state and is continuously attempting to reach the remote peer. - Connected: A successful peer-to-peer connection has been established and communication is active.	2025-05-21 11:12:28 +02:00
Viktor Liu	a675531b5c	[client] Set up signal to generate debug bundles (#3683 )	2025-04-16 11:06:22 +02:00
Zoltan Papp	636a0e2475	[client] Fix engine restart (#3435 ) - Refactor the network monitoring to handle one event and then return - On engine restart, cancel the upper-layer context; stopping the engine becomes the responsibility of the upper layer - Before triggering a restart, the engine checks whether the state is already down. This helps avoid unnecessary delayed network restart events.	2025-03-10 13:32:12 +01:00
Zoltan Papp	aaa23beeec	[client] Prevent to block channel writing (#3474 ) The "runningChan" provides feedback to the UI or any client about whether the service is up and running. If the client exits before the service has started successfully, writing to this channel blocks. - Added a timeout for reading the channel so the caller is not blocked for too long - Modified channel writing operations to be non-blocking	2025-03-10 13:17:09 +01:00
Viktor Liu	b307298b2f	[client] Add netbird ui improvements (#3222 )	2025-02-21 16:29:21 +01:00
hakansa	39986b0e97	[client, management] Support DNS Labels for Peer Addressing (#3252 ) * [client] Support Extra DNS Labels for Peer Addressing * [management] Support Extra DNS Labels for Peer Addressing --------- Co-authored-by: Viktor Liu <17948409+lixmal@users.noreply.github.com>	2025-02-20 13:43:20 +03:00
Viktor Liu	18f84f0df5	[client] Check for fwmark support and use fallback routing if not supported (#3220 )	2025-02-11 13:09:17 +01:00
Viktor Liu	97d498c59c	[misc, client, management] Replace Wiretrustee with Netbird (#3267 )	2025-02-05 16:49:41 +01:00
Viktor Liu	a7ddb8f1f8	[client] Replace engine probes with direct calls (#3195 )	2025-01-28 12:25:45 +01:00
Viktor Liu	bc7b2c6ba3	[client] Report client system flags to management server on login (#3187 )	2025-01-16 13:58:00 +01:00
Viktor Liu	78795a4a73	[client] Add block lan access flag for routers (#3171 )	2025-01-15 17:39:47 +01:00
Viktor Liu	d9905d1a57	[client] Add disable system flags (#3153 )	2025-01-07 20:38:18 +01:00
Viktor Liu	f08605a7f1	[client] Enable network map persistence by default (#3152 )	2025-01-06 14:11:43 +01:00
Pascal Fischer	e40a29ba17	[client] Add support for state manager on iOS (#2996 )	2024-12-06 16:51:42 +01:00
Viktor Liu	e5d42bc963	[client] Add state handling cmdline options (#2821 )	2024-12-03 16:07:18 +01:00
Viktor Liu	17c20b45ce	[client] Add network map to debug bundle (#2966 )	2024-12-03 14:50:12 +01:00
Zoltan Papp	2a5cb16494	[relay] Refactor initial Relay connection (#2800 ) - Can support firewalls with restricted WS rules - Allow running the engine without Relay servers - Keep up to date with Relay address changes	2024-11-22 18:12:34 +01:00
Viktor Liu	a7d5c52203	Fix error state race on mgmt connection error (#2892 )	2024-11-15 22:59:49 +01:00
Viktor Liu	e0bed2b0fb	[client] Fix race conditions (#2869 ) * Fix concurrent map access in status * Fix race when retrieving ctx state error * Fix race when accessing service controller server instance	2024-11-11 14:55:10 +01:00
Viktor Liu	8016710d24	[client] Cleanup firewall state on startup (#2768 )	2024-10-24 14:46:24 +02:00
Viktor Liu	869537c951	[client] Cleanup dns and route states on startup (#2757 )	2024-10-24 10:53:46 +02:00
Carlos Hernandez	f603cd9202	[client] Check wginterface instead of engine ctx (#2676 ) Move code to ensure wgInterface is gone right after the context is cancelled or stopped, covering the off chance that on the next retry the backoff operation is permanently cancelled and the interface is abandoned without being destroyed.	2024-10-04 19:15:16 +02:00
Zoltan Papp	fd67892cb4	[client] Refactor/iface pkg (#2646 ) Refactor the flat code structure	2024-10-02 18:24:22 +02:00
Carlos Hernandez	1ef51a4ffa	[client] Ensure engine is stopped before starting it back (#2565 ) Before starting a new instance of the engine, check if it is nil and stop the current instance	2024-09-13 16:46:59 +02:00


113 Commits