234 Commits

Author SHA1 Message Date
Zoltán Papp
99505b6bb2 [management, client] Add management-controlled client metrics push
Allow enabling/disabling client metrics push from the dashboard via
account settings instead of requiring env vars on every client.

- Add MetricsConfig proto message to NetbirdConfig
- Add MetricsPushEnabled to account Settings (DB-persisted)
- Expose metrics_push_enabled in OpenAPI and dashboard API handler
- Populate MetricsConfig in sync and login responses
- Client dynamically starts/stops push based on management config
- NB_METRICS_PUSH_ENABLED env var overrides management when explicitly set
- Add activity events for metrics push enable/disable
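
The precedence rule above (an explicitly set NB_METRICS_PUSH_ENABLED wins over the management setting) can be sketched as follows. This is a minimal illustration, not the client's actual code: `resolvePushEnabled` and its parameters are hypothetical, and falling back to the management value when the env var cannot be parsed is an assumption.

```go
package main

import "strconv"

// resolvePushEnabled is a hypothetical helper illustrating the precedence rule.
// envValue and envSet describe NB_METRICS_PUSH_ENABLED as returned by
// os.LookupEnv; mgmtEnabled is the value received from the management server.
func resolvePushEnabled(envValue string, envSet bool, mgmtEnabled bool) bool {
	if envSet {
		// An explicitly set, parsable env var overrides management.
		if v, err := strconv.ParseBool(envValue); err == nil {
			return v
		}
		// Assumption: an unparsable value is treated as "not explicitly set".
	}
	return mgmtEnabled
}
```

A caller would typically obtain `envValue` and `envSet` from `os.LookupEnv("NB_METRICS_PUSH_ENABLED")` and re-evaluate the result whenever a sync or login response carries a new management value.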
2026-04-14 17:09:24 +02:00
Viktor Liu
d33cd4c95b [client] Add NAT-PMP/UPnP support (#5202) 2026-04-08 15:29:32 +08:00
Maycon Santos
e2c2f64be7 [client] Fix iOS DNS upstream routing for deselected exit nodes (#5803)
- Add GetSelectedClientRoutes() to the route manager. It filters through FilterSelectedExitNodes and returns only active routes instead of all management routes.
- Use GetSelectedClientRoutes() in the DNS route checker so that a deselected exit node's 0.0.0.0/0 route no longer matches upstream DNS IPs. Previously this match prevented the resolver from switching away from the utun-bound socket after exit node deselection.
- Initialize the iOS DNS server with host DNS fallback addresses (1.1.1.1:53, 1.0.0.1:53) and a permanent root zone handler, matching Android's behavior. Without this, unmatched DNS queries arriving via the 0.0.0.0/0 tunnel route had no handler and were silently dropped.
2026-04-08 08:43:48 +02:00
Viktor Liu
cb73b94ffb [client] Add TCP DNS support for local listener (#5758) 2026-04-08 07:40:36 +02:00
Viktor Liu
aba5d6f0d2 [client] Error out on netbird expose when block inbound is enabled (#5818) 2026-04-07 17:55:35 +02:00
Maycon Santos
decb5dd3af [client] Add GetSelectedClientRoutes to route manager and update DNS route check (#5802)
- DNS resolution broke after deselecting an exit node because the route checker used all client routes (including deselected ones) to decide how to forward upstream DNS queries
- Added GetSelectedClientRoutes() to the route manager, which filters out deselected exit nodes, and switched the DNS route checker to use it
- Confirmed the fix via device testing: after deselecting the exit node, DNS queries now correctly use a regular network socket instead of binding to the utun interface
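
The effect of checking only selected routes can be illustrated with a small sketch. The `route` type and the helper names below are simplified, hypothetical stand-ins, not the route manager's real types; the point is only that a deselected 0.0.0.0/0 exit-node route matches every upstream DNS IP unless it is filtered out first.

```go
package main

import "net/netip"

// route is a simplified, hypothetical stand-in for a client route.
type route struct {
	ID       string
	Prefix   netip.Prefix
	Selected bool // false once the user deselects the route (e.g. an exit node)
}

// mustRoute builds a route for the example and panics on an invalid CIDR.
func mustRoute(id, cidr string, selected bool) route {
	return route{ID: id, Prefix: netip.MustParsePrefix(cidr), Selected: selected}
}

// selectedRoutes returns only the routes that are still selected.
func selectedRoutes(all []route) []route {
	var out []route
	for _, r := range all {
		if r.Selected {
			out = append(out, r)
		}
	}
	return out
}

// isUpstreamRouted reports whether the upstream DNS IP falls inside any route.
// An unparsable address is reported as not routed.
func isUpstreamRouted(upstream string, routes []route) bool {
	addr, err := netip.ParseAddr(upstream)
	if err != nil {
		return false
	}
	for _, r := range routes {
		if r.Prefix.Contains(addr) {
			return true
		}
	}
	return false
}
```

With a deselected exit node in the list, `isUpstreamRouted("1.1.1.1", all)` is true while `isUpstreamRouted("1.1.1.1", selectedRoutes(all))` is false, which is the difference the fix relies on.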
2026-04-05 13:44:53 +02:00
Viktor Liu
145d82f322 [client] Replace iOS DNS IsPrivate heuristic with route manager check (#5694) 2026-03-26 18:11:05 +08:00
Zoltan Papp
91f0d5cefd [client] Feature/client metrics (#5512)
* Add client metrics

* Add client metrics system with OpenTelemetry and VictoriaMetrics support

Implements a comprehensive client metrics system to track peer connection
stages and performance. The system supports multiple backend implementations
(OpenTelemetry, VictoriaMetrics, and no-op) and tracks detailed connection
stage durations from creation through WireGuard handshake.

Key changes:
- Add metrics package with pluggable backend implementations
- Implement OpenTelemetry metrics backend
- Implement VictoriaMetrics metrics backend
- Add no-op metrics implementation for disabled state
- Track connection stages: creation, semaphore, signaling, connection ready, and WireGuard handshake
- Move WireGuard watcher functionality to conn.go
- Refactor engine to integrate metrics tracking
- Add metrics export endpoint in debug server

* Add signaling metrics tracking for initial and reconnection attempts

* Reset connection stage timestamps during reconnections to exclude unnecessary metrics tracking

* Delete otel lib from client

* Update unit tests

* Invoke callback on handshake success in WireGuard watcher

* Add Netbird version tracking to client metrics

Integrate Netbird version into VictoriaMetrics backend and metrics labels. Update `ClientMetrics` constructor and metric name formatting to include version information.

* Add sync duration tracking to client metrics

Introduce `RecordSyncDuration` for measuring sync message processing time. Update all metrics implementations (VictoriaMetrics, no-op) to support the new method. Refactor `ClientMetrics` to use `AgentInfo` for static agent data.

* Remove no-op metrics implementation and simplify ClientMetrics constructor

Eliminate unused `noopMetrics` and refactor `ClientMetrics` to always use the VictoriaMetrics implementation. Update associated logic to reflect these changes.

* Add total duration tracking for connection attempts

Calculate total duration for both initial connections and reconnections, accounting for different timestamp scenarios. Update `Export` method to include Prometheus HELP comments.

* Add metrics push support to VictoriaMetrics integration

* [client] anchor connection metrics to first signal received

* Remove creation_to_semaphore connection stage metric

The semaphore queuing stage (Created → SemaphoreAcquired) is no longer
tracked. Connection metrics now start from SignalingReceived. Updated
docs and Grafana dashboard accordingly.

* [client] Add remote push config for metrics with version-based eligibility

Introduce remoteconfig.Manager that fetches a remote JSON config to control
metrics push interval and restrict pushing to a specific agent version
range. When NB_METRICS_INTERVAL is set, remote config is bypassed
entirely for local override.

* [client] Add WASM-compatible NewClientMetrics implementation

Replace NewClientMetrics in metrics.go with a WASM-specific stub in metrics_js.go, returning nil for compatibility with JS builds. Simplify method usage for WASM targets.

* Add missing file

* Update default case in DeploymentType.String to return "unknown" instead of "selfhosted"

* [client] Rework metrics to use timestamped samples instead of histograms

Replace cumulative Prometheus histograms with timestamped point-in-time
samples that are pushed once and cleared. This fixes metrics for sparse
events (connections/syncs that happen once at startup) where rate() and
increase() produced incorrect or empty results.

Changes:
- Switch from VictoriaMetrics histogram library to raw Prometheus text
  format with explicit millisecond timestamps
- Reset samples after successful push (no resending stale data)
- Rename connection_to_handshake → connection_to_wg_handshake
- Add netbird_peer_connection_count metric for ICE vs Relay tracking
- Simplify dashboard: point-based scatter plots, donut pie chart
- Add maxStalenessInterval=1m to VictoriaMetrics to prevent forward-fill
- Fix deployment_type Unknown returning "selfhosted" instead of "unknown"
- Fix inverted shouldPush condition in push.go

* [client] Add InfluxDB metrics backend alongside VictoriaMetrics

Add influxdb.go with timestamped line protocol export for sparse
one-shot events. Restore victoria.go to use proper Prometheus
histograms. Update Grafana dashboards, add InfluxDB datasource,
and update docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [client] Fix metrics issues and update dev docker setup

- Fix StopPush not clearing push state, preventing restart
- Fix race condition reading currentConnPriority without lock in recordConnectionMetrics
- Fix stale comment referencing old metrics server URL
- Update docker-compose for InfluxDB: add scoped tokens, .env config, init scripts
- Rename docker-compose.victoria.yml to docker-compose.yml

* [client] Add anonymised peer tracking to pushed metrics

Introduce peer_id and connection_pair_id tags to InfluxDB metrics.
Public keys are hashed (truncated SHA-256) for anonymisation. The
connection pair ID is deterministic regardless of which side computes
it, enabling deduplication of reconnections in the ICE vs Relay
dashboard. Also pin Grafana to v11.6.0 for file-based provisioning
and fix datasource UID references.

* Remove unused dependencies from go.mod and go.sum

* Refactor InfluxDB ingest pipeline: extract validation logic

- Move line validation logic to `validateLine` and `validateField` helper functions.
- Improve error handling with structured validation and clearer separation of concerns.
- Add stderr redirection for error messages in `create-tokens.sh`.

* Set non-root user in Dockerfile for Ingest service

* Fix Windows CI: command line too long

* Remove Victoria metrics

* Add hashed peer ID as Authorization header in metrics push

* Revert influxdb in docker compose

* Enable gzip compression and authorization validation for metrics push and ingest

* Reduce code complexity

* Update debug documentation to include metrics.txt description

* Increase `maxBodySize` limit to 50 MB and update gzip reader wrapping logic

* Refactor deployment type detection to use URL parsing for improved accuracy

* Update readme

* Throttle remote config retries on fetch failure

* Preserve first WG handshake timestamp, ignore rekeys

* Skip adding empty metrics.txt to debug bundle in debug mode

* Update default metrics server URL to https://ingest.netbird.io

* Atomic metrics export-and-reset to prevent sample loss between Export and Reset calls

* Fix doc

* Refactor Push configuration to improve clarity and enforce minimum push interval

* Remove `minPushInterval` and update push interval validation logic

* Revert ExportAndReset, it is acceptable data loss

* Fix metrics review issues: rename env var, remove stale infra, add tests

- Rename NB_METRICS_ENABLED to NB_METRICS_PUSH_ENABLED to clarify that
  collection is always active (for debug bundles) and only push is opt-in
- Change default config URL from staging to production (ingest.netbird.io)
- Delete broken Prometheus dashboard (used non-existent metric names)
- Delete unused VictoriaMetrics datasource config
- Replace committed .env with .env.example containing placeholder values
- Wire Grafana admin credentials through env vars in docker-compose
- Make metricsStages a pointer to prevent reset-vs-write race on reconnect
- Fix typed-nil interface in debug bundle path (GetClientMetrics)
- Use deterministic field order in InfluxDB Export (sorted keys)
- Replace Authorization header with X-Peer-ID for metrics push
- Fix ingest server timeout to use time.Second instead of float
- Fix gzip double-close, stale comments, trim log levels
- Add tests for influxdb.go and MetricsStages

* Add login duration metric, ingest tag validation, and duration bounds

- Add netbird_login measurement recording login/auth duration to management
  server, with success/failure result tag
- Validate InfluxDB tags against per-measurement allowlists in ingest server
  to prevent arbitrary tag injection
- Cap all duration fields (*_seconds) at 300s instead of only total_seconds
- Add ingest server tests for tag/field validation, bounds, and auth

* Add arch tag to all metrics

* Fix Grafana dashboard: add arch to drop columns, add login panels

* Validate NB_METRICS_SERVER_URL is an absolute HTTP(S) URL

* Address review comments: fix README wording, update stale comments

* Clarify env var precedence does not bypass remote config eligibility

* Remove accidentally committed pprof files
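
The anonymised peer tracking item above (public keys hashed with truncated SHA-256, plus a connection pair ID that is the same whichever side computes it) could be sketched roughly as below. The truncation length, the separator, and the function names are assumptions made for illustration; the commit message does not specify them.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// idLen is an assumed truncation length in hex characters.
const idLen = 16

// hashPeerID returns a truncated SHA-256 hash of a peer's public key.
func hashPeerID(pubKey string) string {
	sum := sha256.Sum256([]byte(pubKey))
	return hex.EncodeToString(sum[:])[:idLen]
}

// connectionPairID hashes both keys in sorted order, so either side of the
// connection derives the same ID.
func connectionPairID(keyA, keyB string) string {
	if keyA > keyB {
		keyA, keyB = keyB, keyA
	}
	// "|" is an assumed separator; it does not occur in base64-encoded keys.
	sum := sha256.Sum256([]byte(keyA + "|" + keyB))
	return hex.EncodeToString(sum[:])[:idLen]
}
```

Sorting the keys before hashing is what makes the pair ID independent of the computing side, which in turn lets a dashboard deduplicate reconnections of the same pair.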

---------

Co-authored-by: Viktor Liu <viktor@netbird.io>
2026-03-22 12:45:41 +01:00
Viktor Liu
af8eaa23e2 [client] Restart engine when peer IP address changes (#5614) 2026-03-17 17:00:24 +01:00
Zoltan Papp
fe9b844511 [client] refactor auto update workflow (#5448)
Auto-update logic moved out of the UI into a dedicated updatemanager.Manager service that runs in the connection layer. The UI no longer polls or checks for updates independently.

The update manager supports three modes driven by the management server's auto-update policy:
- No policy set by mgm: checks GitHub for the latest version and notifies the user (previous behavior, now centralized)
- mgm enforces update: the "About" menu triggers installation directly instead of just downloading the file; the user still initiates the action
- mgm forces update: installation proceeds automatically without user interaction

The updateManager lifecycle is now owned by the daemon, giving the daemon server direct control via a new TriggerUpdate RPC.
Introduces the EngineServices struct to group external service dependencies passed to NewEngine, reducing its argument count from 11 to 4.
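
The three update modes described in this commit map naturally onto a small decision function. The sketch below is hypothetical: the type and constant names are invented for illustration and are not taken from updatemanager.Manager.

```go
package main

// updatePolicy is a hypothetical representation of the management policy.
type updatePolicy int

const (
	policyNone     updatePolicy = iota // management sets no policy
	policyEnforced                     // management enforces an update
	policyForced                       // management forces an update
)

// updateAction is what the client does when a newer version is available.
type updateAction int

const (
	actionNotify               updateAction = iota // notify the user only
	actionInstallOnUserRequest                     // install when the user triggers it
	actionInstallAutomatically                     // install without user interaction
)

// actionFor maps the management policy to the client-side behavior.
func actionFor(p updatePolicy) updateAction {
	switch p {
	case policyForced:
		return actionInstallAutomatically
	case policyEnforced:
		return actionInstallOnUserRequest
	default:
		return actionNotify
	}
}
```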
2026-03-13 17:01:28 +01:00
Zoltan Papp
f80fe506d5 [client] Fix DNS probe thread safety and avoid blocking engine sync (#5576)
* Fix DNS probe thread safety and avoid blocking engine sync

Refactor ProbeAvailability to prevent blocking the engine's sync mutex
during slow DNS probes. The probe now derives its context from the
server's own context (s.ctx) instead of accepting one from the caller,
and uses a mutex to ensure only one probe runs at a time — new calls
cancel the previous probe before starting. Also fixes a data race in
Stop() when accessing probeCancel without the probe mutex.

* Ensure DNS probe thread safety by locking critical sections

Add proper locking to prevent data races when accessing shared resources during DNS probe execution and Stop(). Update handlers snapshot logic to avoid conflicts with concurrent writers.

* Rename context and remove redundant cancellation

* Cancel first and lock

* Add locking to ensure thread safety when reactivating upstream servers
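
The probe behavior described in this commit (derive the probe context from the server's own context, let a new call cancel the previous probe, and guard probeCancel with a mutex in Stop) can be sketched as follows. The `prober` type is a hypothetical reduction of the DNS server; only the locking and cancellation pattern is the point.

```go
package main

import (
	"context"
	"sync"
)

// prober is a hypothetical reduction of the DNS server's probe state.
type prober struct {
	ctx         context.Context // server-owned context (s.ctx in the commit text)
	mu          sync.Mutex      // guards probeCancel
	probeCancel context.CancelFunc
}

// newProber uses context.Background() for the example; a real server would
// supply its own lifetime context here.
func newProber() *prober {
	return &prober{ctx: context.Background()}
}

// startProbe cancels any probe still in flight and returns the context that
// the new probe should run under.
func (p *prober) startProbe() context.Context {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.probeCancel != nil {
		p.probeCancel()
	}
	ctx, cancel := context.WithCancel(p.ctx)
	p.probeCancel = cancel
	return ctx
}

// stop cancels the running probe; taking the mutex avoids racing with
// startProbe on probeCancel.
func (p *prober) stop() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.probeCancel != nil {
		p.probeCancel()
		p.probeCancel = nil
	}
}
```

Because the slow probe itself runs outside the mutex, under the returned context, callers such as the engine's sync path are never blocked for the probe's duration.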
2026-03-13 13:22:43 +01:00
Maycon Santos
63c83aa8d2 [client,management] Feature/client service expose (#5411)
CLI: new expose command to publish a local port with flags for PIN, password, user groups, custom domain, name prefix and protocol (HTTP default).
Management/API: create/renew/stop expose sessions (streamed status), automatic naming/domain, TTL renewals, background expiration, new management RPCs and client methods.
UI/API: account settings now include peer_expose_enabled and peer_expose_groups; new activity codes for peer expose events.
2026-02-24 10:02:16 +01:00
Zoltan Papp
4a54f0d670 [Client] Remove connection semaphore (#5419)
* [Client] Remove connection semaphore

Remove the semaphore and the initial random sleep time (300ms) from the connectivity logic to speed up the initial connection time.

Note: Implement limiter logic that can prioritize router peers and keep the fast connection option for the first few peers.

* Remove unused function
2026-02-23 20:58:53 +01:00
Maycon Santos
f117fc7509 [client] Log lock acquisition time in receive message handling (#5393)
* Log lock acquisition time in receive message handling

* use offerAnswer.SessionID for session id
2026-02-19 19:18:47 +01:00
Zoltan Papp
fc6b93ae59 [ios] Ensure route settlement on iOS before handling DNS responses (#5360)
* Ensure route settlement on iOS before handling DNS responses to prevent bypassing the tunnel.

* add more logs

* rollback debug changes

* rollback changes

* [client] Improve logging and add comments for iOS route settlement logic

- Switch iOS route settlement log level from Debug to Trace for finer control.
- Add clarifying comments for `waitForRouteSettlement` on non-iOS platforms.

---------

Co-authored-by: mlsmaycon <mlsmaycon@gmail.com>
2026-02-19 18:53:10 +01:00
Pascal Fischer
f53155562f [management, reverse proxy] Add reverse proxy feature (#5291)
* implement reverse proxy


---------

Co-authored-by: Alisdair MacLeod <git@alisdairmacleod.co.uk>
Co-authored-by: mlsmaycon <mlsmaycon@gmail.com>
Co-authored-by: Eduard Gert <kontakt@eduardgert.de>
Co-authored-by: Viktor Liu <viktor@netbird.io>
Co-authored-by: Diego Noguês <diego.sure@gmail.com>
Co-authored-by: Diego Noguês <49420+diegocn@users.noreply.github.com>
Co-authored-by: Bethuel Mmbaga <bethuelmbaga12@gmail.com>
Co-authored-by: Zoltan Papp <zoltan.pmail@gmail.com>
Co-authored-by: Ashley Mensah <ashleyamo982@gmail.com>
2026-02-13 19:37:43 +01:00
Viktor Liu
1ddc9ce2bf [client] Fix nil pointer panic in device and engine code (#5287) 2026-02-12 09:15:42 +01:00
Zoltan Papp
7bc85107eb Adds timing measurement to handleSync to help diagnose sync performance issues (#5228) 2026-02-06 19:50:48 +01:00
Viktor Liu
0119f3e9f4 [client] Fix netstack detection and add wireguard port option (#5251)
- Add WireguardPort option to embed.Options for custom port configuration
- Fix KernelInterface detection to account for netstack mode
- Skip SSH config updates when running in netstack mode
- Skip interface removal wait when running in netstack mode
- Use BindListener for netstack to avoid port conflicts on same host
2026-02-06 10:03:01 +01:00
Viktor Liu
f74bc48d16 [Client] Stop NetBird on firewall init failure (#5208) 2026-01-29 11:05:06 +01:00
Viktor Liu
d4f7df271a [client] Don't track ebpf traffic in conntrack (#5166) 2026-01-27 11:04:23 +01:00
Zoltan Papp
ee3a67d2d8 [client] Fix/health result in bundle (#5164)
* Add support for optional status refresh callback during debug bundle generation

* Always update wg status

* Remove duplicated wg status call
2026-01-23 17:06:07 +01:00
Diego Romar
b3a2992a10 [client/android] - Fix Rosenpass connectivity for Android peers (#5044)
* [client] Add WGConfigurer interface

To allow Rosenpass to work both with kernel
WireGuard via wgctrl (default behavior) and
userspace WireGuard via IPC on Android/iOS
using WGUSPConfigurer

* [client] Remove Rosenpass debug logs

* [client] Return simpler peer configuration in outputKey method

ConfigureDevice, the method previously used in
outputKey via wgClient to update the device's
properties, is now defined in the WGConfigurer
interface and implemented both in kernel_unix and
usp configurers.

PresharedKey datatype was also changed from
boolean to [32]byte to compare it
to the original NetBird PSK, so that Rosenpass
may replace it with its own when necessary.

* [client] Remove unused field

* [client] Replace usage of WGConfigurer

Replaced with preshared key setter interface,
which only defines a method to set / update the preshared key.

Logic has been migrated from rosenpass/netbird_handler to client/iface.

* [client] Use same default peer keepalive value when setting preshared keys

* [client] Store PresharedKeySetter iface in rosenpass manager

To avoid no-op if SetInterface is called before generateConfig

* [client] Add mutex usage in rosenpass netbird handler

* [client] change implementation setting Rosenpass preshared key

Instead of providing a method to configure a device (device/interface.go),
it forwards the new parameters to the configurer (either
kernel_unix.go / usp.go).

This removes dependency on reading FullStats, and makes use of a common
method (buildPresharedKeyConfig in configurer/common.go) to build a
minimal WG config that only sets/updates the PSK.

netbird_handler.go now keeps a list of initializedPeers to choose whether
to set the value of "UpdateOnly" when calling iface.SetPresharedKey.

* [client] Address possible race condition

Between outputKey calls and peer removal; it
checks again if the peer still exists in the
peers map before inserting it in the
initializedPeers map.

* [client] Add psk Rosenpass-initialized check

On client/internal/peer/conn.go, the presharedKey
function would always return the current key
set in wgConfig.presharedKey.

This would eventually overwrite a key set
by Rosenpass if the feature is active.

The purpose here is to set a handler that will
check if a given peer has its psk initialized
by Rosenpass to skip updating the psk
via updatePeer (since it calls presharedKey
method in conn.go).

* Add missing updateOnly flag setup for usp peers

* Change common.go buildPresharedKeyConfig signature

PeerKey datatype changed from string to
wgTypes.Key. Callers are responsible for parsing
a peer key with string datatype.
2026-01-20 13:26:51 -03:00
Zoltan Papp
58daa674ef [Management/Client] Trigger debug bundle runs from API/Dashboard (#4592) (#4832)
This PR adds the ability to trigger debug bundle generation remotely from the Management API/Dashboard.
2026-01-19 11:22:16 +01:00
Viktor Liu
520d9c66cf [client] Fix netstack upstream dns and add wasm debug methods (#4648) 2026-01-14 13:56:16 +01:00
Viktor Liu
b12c084a50 [client] Fall through dns chain for custom dns zones (#5081) 2026-01-12 13:56:39 +01:00
Zoltan Papp
ef2721f4e1 Filter out own peer from remote peers list during peer updates. (#4986) 2025-12-30 10:29:45 +01:00
Zoltan Papp
67f7b2404e [client, management] Feature/ssh fine grained access (#4969)
Add fine-grained SSH access control with authorized users/groups
2025-12-29 12:50:41 +01:00
Zoltan Papp
011cc81678 [client, management] auto-update (#4732) 2025-12-19 19:57:39 +01:00
Maycon Santos
cb6b086164 [client] Reorder subsystem shutdown so peer removal goes first (#4914)
Remove peers before DNS and routes
2025-12-04 21:01:22 +01:00
Zoltan Papp
71b6855e09 [client] Fix engine shutdown deadlock and sync-signal message handling races (#4891)
* Fix engine shutdown deadlock and message handling races

- Release syncMsgMux before waiting for shutdownWg to prevent deadlock
- Check context inside lock in handleSync and receiveSignalEvents
- Prevents nil pointer access when messages arrive during engine stop
2025-12-04 19:51:50 +01:00
Maycon Santos
a293f760af [client] Add conditional peer removal logic during shutdown (#4897) 2025-12-02 16:30:15 +01:00
Maycon Santos
20973063d8 [client] Support disable search domain for custom zones (#4826)
Two new boolean flags, SearchDomainDisabled and SkipPTRProcess, are added to CustomZone and its protobuf; they are propagated through the engine to DNS host logic. Host matching now uses SearchDomainDisabled directly, and PTR collection skips zones with SkipPTRProcess; reverse zones are initialized with SearchDomainDisabled: true.
2025-11-24 17:50:08 +01:00
Diego Romar
32146e576d [android] allow selection/deselection of network resources on android peers (#4607) 2025-11-21 13:36:33 +01:00
Pascal Fischer
3351b38434 [management] pass config to controller (#4807) 2025-11-19 11:52:18 +01:00
Viktor Liu
d71a82769c [client,management] Rewrite the SSH feature (#4015) 2025-11-17 17:10:41 +01:00
Viktor Liu
c92e6c1b5f [client] Block on all subsystems on shutdown (#4709) 2025-11-05 12:15:37 +01:00
Viktor Liu
641eb5140b [client] Allow INPUT traffic on the compat iptables filter table for nftables (#4742) 2025-11-04 21:56:53 +01:00
Viktor Liu
45c25dca84 [client] Clamp MSS on outbound traffic (#4735) 2025-11-04 17:18:51 +01:00
Viktor Liu
86eff0d750 [client] Fix netstack dns forwarder (#4727) 2025-10-31 14:18:09 +01:00
Viktor Liu
1ee575befe [client] Use management-provided dns forwarder port on the client side (#4712) 2025-10-28 22:58:43 +01:00
Zoltan Papp
d7321c130b [client] The status cmd will not be blocked by the ICE probe (#4597)
Refactor the TURN and STUN probe, and cache the results. The NetBird status command will indicate a "checking…" state.
2025-10-28 16:11:35 +01:00
Viktor Liu
404cab90ba [client] Redirect dns forwarder port 5353 to new listening port 22054 (#4707)
- Port dnat changes from https://github.com/netbirdio/netbird/pull/4015 (nftables/iptables/userspace)
  - For userspace: rewrite the original port to the target port
  - Remember original destination port in conntrack
  - Rewrite the source port back to the original port for replies
- Redirect incoming port 5353 to 22054 (tcp/udp)
- Revert port changes based on the network map received from management
- Adjust tracer to show NAT stages
2025-10-28 15:12:53 +01:00
hakansa
4e03f708a4 fix dns forwarder port update (#4613)
2025-10-09 17:39:02 +03:00
hakansa
9bcd3ebed4 [management,client] Make DNS ForwarderPort Configurable & Change Well Known Port (#4479)
Makes the DNS forwarder port configurable in the management and client components and changes the well-known port from 5353 to 22054. The change includes version-aware port assignment to ensure backward compatibility.

- Adds a configurable `ForwarderPort` field to the DNS configuration protocol
- Implements version-based port computation that returns the new port (22054) only when all peers support version 0.59.0 or newer
- Updates the client to dynamically restart the DNS forwarder when the port changes
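
The version-based port computation in the list above can be sketched as below. This is an illustration only: the helper names are hypothetical, the legacy port is passed in by the caller, and treating unparsable versions (for example development builds) as "old" is an assumption, as is returning the new port for an empty peer list.

```go
package main

import (
	"strconv"
	"strings"
)

// newForwarderPort is the new well-known port named in the commit message.
const newForwarderPort = 22054

// minNewPortVersion is the first version that supports the new port (0.59.0).
var minNewPortVersion = [3]int{0, 59, 0}

// parseVersion parses "MAJOR.MINOR.PATCH" with an optional leading "v".
func parseVersion(s string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return [3]int{}, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return [3]int{}, false
		}
		v[i] = n
	}
	return v, true
}

// atLeast reports whether v >= floor, comparing major, minor, then patch.
func atLeast(v, floor [3]int) bool {
	for i := range v {
		if v[i] != floor[i] {
			return v[i] > floor[i]
		}
	}
	return true
}

// forwarderPort returns the new port only if every peer supports it;
// otherwise it keeps the legacy port for backward compatibility.
func forwarderPort(peerVersions []string, legacyPort int) int {
	for _, s := range peerVersions {
		v, ok := parseVersion(s)
		if !ok || !atLeast(v, minNewPortVersion) {
			return legacyPort
		}
	}
	return newForwarderPort
}
```

A single peer below 0.59.0 is enough to keep the whole group on the legacy port, which is why the client needs to restart its forwarder when the computed port later changes.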
2025-10-02 01:02:10 +02:00
Viktor Liu
b5daec3b51 [client,signal,management] Add browser client support (#4415) 2025-10-01 20:10:11 +02:00
hakansa
644ed4b934 [client] Add WireGuard interface lifecycle monitoring (#4370)
2025-09-25 15:36:26 +07:00
Viktor Liu
55126f990c [client] Use native windows sock opts to avoid routing loops (#4314)
- Move `util/grpc` and `util/net` to `client` so `internal` packages can be accessed
- Add methods to return the next best interface after the NetBird interface.
- Use `IP_UNICAST_IF` sock opt to force the outgoing interface for the NetBird `net.Dialer` and `net.ListenerConfig` to avoid routing loops. The interface is picked by the new route lookup method.
- Some refactoring to avoid import cycles
- Old behavior is available through `NB_USE_LEGACY_ROUTING=true` env var
2025-09-20 09:31:04 +02:00
Zoltan Papp
9e81e782e5 [client] Fix/v4 stun routing (#4430)
Deduplicate STUN packet sending.
Originally, because every peer shared the same UDP address, the library could not distinguish which STUN message was associated with which candidate. As a result, the Pion library responded from all candidates for every STUN message.
2025-09-11 10:08:54 +02:00
Zoltan Papp
69d87343d2 [client] Debug information for connection (#4439)
Improve logging

Print the exact time when the first WireGuard handshake occurs
Print the steps for gathering system information
2025-09-08 14:51:34 +02:00