30 Commits

Author SHA1 Message Date
Zoltan Papp
91f0d5cefd [client] Feature/client metrics (#5512)
* Add client metrics

* Add client metrics system with OpenTelemetry and VictoriaMetrics support

Implements a comprehensive client metrics system to track peer connection
stages and performance. The system supports multiple backend implementations
(OpenTelemetry, VictoriaMetrics, and no-op) and tracks detailed connection
stage durations from creation through WireGuard handshake.

Key changes:
- Add metrics package with pluggable backend implementations
- Implement OpenTelemetry metrics backend
- Implement VictoriaMetrics metrics backend
- Add no-op metrics implementation for disabled state
- Track connection stages: creation, semaphore, signaling, connection ready, and WireGuard handshake
- Move WireGuard watcher functionality to conn.go
- Refactor engine to integrate metrics tracking
- Add metrics export endpoint in debug server

* Add signaling metrics tracking for initial and reconnection attempts

* Reset connection stage timestamps during reconnections to exclude unnecessary metrics tracking

* Delete otel lib from client

* Update unit tests

* Invoke callback on handshake success in WireGuard watcher

* Add Netbird version tracking to client metrics

Integrate Netbird version into VictoriaMetrics backend and metrics labels. Update `ClientMetrics` constructor and metric name formatting to include version information.

* Add sync duration tracking to client metrics

Introduce `RecordSyncDuration` for measuring sync message processing time. Update all metrics implementations (VictoriaMetrics, no-op) to support the new method. Refactor `ClientMetrics` to use `AgentInfo` for static agent data.

* Remove no-op metrics implementation and simplify ClientMetrics constructor

Eliminate unused `noopMetrics` and refactor `ClientMetrics` to always use the VictoriaMetrics implementation. Update associated logic to reflect these changes.

* Add total duration tracking for connection attempts

Calculate total duration for both initial connections and reconnections, accounting for different timestamp scenarios. Update `Export` method to include Prometheus HELP comments.

* Add metrics push support to VictoriaMetrics integration

* [client] anchor connection metrics to first signal received

* Remove creation_to_semaphore connection stage metric

The semaphore queuing stage (Created → SemaphoreAcquired) is no longer
tracked. Connection metrics now start from SignalingReceived. Updated
docs and Grafana dashboard accordingly.

* [client] Add remote push config for metrics with version-based eligibility

Introduce remoteconfig.Manager that fetches a remote JSON config to control
metrics push interval and restrict pushing to a specific agent version
range. When NB_METRICS_INTERVAL is set, remote config is bypassed
entirely for local override.

* [client] Add WASM-compatible NewClientMetrics implementation

Replace NewClientMetrics in metrics.go with a WASM-specific stub in metrics_js.go, returning nil for compatibility with JS builds. Simplify method usage for WASM targets.

* Add missing file

* Update default case in DeploymentType.String to return "unknown" instead of "selfhosted"

* [client] Rework metrics to use timestamped samples instead of histograms

Replace cumulative Prometheus histograms with timestamped point-in-time
samples that are pushed once and cleared. This fixes metrics for sparse
events (connections/syncs that happen once at startup) where rate() and
increase() produced incorrect or empty results.

Changes:
- Switch from VictoriaMetrics histogram library to raw Prometheus text
  format with explicit millisecond timestamps
- Reset samples after successful push (no resending stale data)
- Rename connection_to_handshake → connection_to_wg_handshake
- Add netbird_peer_connection_count metric for ICE vs Relay tracking
- Simplify dashboard: point-based scatter plots, donut pie chart
- Add maxStalenessInterval=1m to VictoriaMetrics to prevent forward-fill
- Fix deployment_type Unknown returning "selfhosted" instead of "unknown"
- Fix inverted shouldPush condition in push.go

* [client] Add InfluxDB metrics backend alongside VictoriaMetrics

Add influxdb.go with timestamped line protocol export for sparse
one-shot events. Restore victoria.go to use proper Prometheus
histograms. Update Grafana dashboards, add InfluxDB datasource,
and update docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [client] Fix metrics issues and update dev docker setup

- Fix StopPush not clearing push state, preventing restart
- Fix race condition reading currentConnPriority without lock in recordConnectionMetrics
- Fix stale comment referencing old metrics server URL
- Update docker-compose for InfluxDB: add scoped tokens, .env config, init scripts
- Rename docker-compose.victoria.yml to docker-compose.yml

* [client] Add anonymised peer tracking to pushed metrics

Introduce peer_id and connection_pair_id tags to InfluxDB metrics.
Public keys are hashed (truncated SHA-256) for anonymisation. The
connection pair ID is deterministic regardless of which side computes
it, enabling deduplication of reconnections in the ICE vs Relay
dashboard. Also pin Grafana to v11.6.0 for file-based provisioning
and fix datasource UID references.

* Remove unused dependencies from go.mod and go.sum

* Refactor InfluxDB ingest pipeline: extract validation logic

- Move line validation logic to `validateLine` and `validateField` helper functions.
- Improve error handling with structured validation and clearer separation of concerns.
- Add stderr redirection for error messages in `create-tokens.sh`.

* Set non-root user in Dockerfile for Ingest service

* Fix Windows CI: command line too long

* Remove Victoria metrics

* Add hashed peer ID as Authorization header in metrics push

* Revert influxdb in docker compose

* Enable gzip compression and authorization validation for metrics push and ingest

* Reducate code of complexity

* Update debug documentation to include metrics.txt description

* Increase `maxBodySize` limit to 50 MB and update gzip reader wrapping logic

* Refactor deployment type detection to use URL parsing for improved accuracy

* Update readme

* Throttle remote config retries on fetch failure

* Preserve first WG handshake timestamp, ignore rekeys

* Skip adding empty metrics.txt to debug bundle in debug mode

* Update default metrics server URL to https://ingest.netbird.io

* Atomic metrics export-and-reset to prevent sample loss between Export and Reset calls

* Fix doc

* Refactor Push configuration to improve clarity and enforce minimum push interval

* Remove `minPushInterval` and update push interval validation logic

* Revert ExportAndReset, it is acceptable data loss

* Fix metrics review issues: rename env var, remove stale infra, add tests

- Rename NB_METRICS_ENABLED to NB_METRICS_PUSH_ENABLED to clarify that
  collection is always active (for debug bundles) and only push is opt-in
- Change default config URL from staging to production (ingest.netbird.io)
- Delete broken Prometheus dashboard (used non-existent metric names)
- Delete unused VictoriaMetrics datasource config
- Replace committed .env with .env.example containing placeholder values
- Wire Grafana admin credentials through env vars in docker-compose
- Make metricsStages a pointer to prevent reset-vs-write race on reconnect
- Fix typed-nil interface in debug bundle path (GetClientMetrics)
- Use deterministic field order in InfluxDB Export (sorted keys)
- Replace Authorization header with X-Peer-ID for metrics push
- Fix ingest server timeout to use time.Second instead of float
- Fix gzip double-close, stale comments, trim log levels
- Add tests for influxdb.go and MetricsStages

* Add login duration metric, ingest tag validation, and duration bounds

- Add netbird_login measurement recording login/auth duration to management
  server, with success/failure result tag
- Validate InfluxDB tags against per-measurement allowlists in ingest server
  to prevent arbitrary tag injection
- Cap all duration fields (*_seconds) at 300s instead of only total_seconds
- Add ingest server tests for tag/field validation, bounds, and auth

* Add arch tag to all metrics

* Fix Grafana dashboard: add arch to drop columns, add login panels

* Validate NB_METRICS_SERVER_URL is an absolute HTTP(S) URL

* Address review comments: fix README wording, update stale comments

* Clarify env var precedence does not bypass remote config eligibility

* Remove accidentally committed pprof files

---------

Co-authored-by: Viktor Liu <viktor@netbird.io>
2026-03-22 12:45:41 +01:00
Pascal Fischer
f53155562f [management, reverse proxy] Add reverse proxy feature (#5291)
* implement reverse proxy


---------

Co-authored-by: Alisdair MacLeod <git@alisdairmacleod.co.uk>
Co-authored-by: mlsmaycon <mlsmaycon@gmail.com>
Co-authored-by: Eduard Gert <kontakt@eduardgert.de>
Co-authored-by: Viktor Liu <viktor@netbird.io>
Co-authored-by: Diego Noguês <diego.sure@gmail.com>
Co-authored-by: Diego Noguês <49420+diegocn@users.noreply.github.com>
Co-authored-by: Bethuel Mmbaga <bethuelmbaga12@gmail.com>
Co-authored-by: Zoltan Papp <zoltan.pmail@gmail.com>
Co-authored-by: Ashley Mensah <ashleyamo982@gmail.com>
2026-02-13 19:37:43 +01:00
Maycon Santos
290fe2d8b9 [client/management/signal/relay] Update go.mod to use Go 1.24.10 and upgrade x/crypto dependencies (#4828)
Upgrade Go toolchain and golang.org/x/* deps to 1.24.10, standardize GitHub Actions to derive Go version from go.mod and adjust checkout ordering, raise WASM size limit to 55 MB, update FreeBSD tarball and gomobile refs, fix a few format-string/logging calls, treat usernames ending with $ as system accounts, and add Windows tests.
2025-11-22 10:10:18 +01:00
Maycon Santos
d817584f52 [misc] fix Windows client and management bench tests (#4424)
Windows tests had too many directories, causing issues to the payload via psexec.

Also migrated all checked benchmarks to send data to grafana.
2025-08-31 17:19:56 +02:00
Viktor Liu
73ce746ba7 [misc] Rename CI client tests (#3366) 2025-02-21 19:07:43 +01:00
Zoltan Papp
1ffa519387 [client,relay] Add QUIC support (#2962) 2025-01-15 16:28:19 +01:00
Maycon Santos
287ae81195 [misc] split tests with management and rest (#3051)
optimize go cache for tests
2024-12-14 21:18:46 +01:00
Maycon Santos
4c130a0291 Update Go version to 1.23 (#2588) 2024-09-12 13:46:28 +02:00
Maycon Santos
afb9673bc4 [misc] Update core github actions (#2584) 2024-09-11 21:49:05 +02:00
Maycon Santos
3875c29f6b Revert "Rollback new routing functionality (#1805)" (#1813)
This reverts commit 9f32ccd453.
2024-04-08 18:56:52 +02:00
Viktor Liu
9f32ccd453 Rollback new routing functionality (#1805) 2024-04-05 20:38:49 +02:00
Viktor Liu
7938295190 Feature/exit nodes - Windows and macOS support (#1726) 2024-04-03 11:11:46 +02:00
Maycon Santos
fc7c1e397f Disable force jsonfile variable (#1611)
This enables windows management tests

Added another DNS server to the dns server tests
2024-03-15 10:50:02 +01:00
Yury Gargay
0fbf72434e Make SQLite default for new installations (#1529)
* Make SQLite default for new installations

* if var is not set, return empty string

this allows getStoreEngineFromDatadir to detect json store files

---------

Co-authored-by: Maycon Santos <mlsmaycon@gmail.com>
2024-02-20 15:06:32 +01:00
Zoltan Papp
5242851ecc Use cached wintun zip package in github workflows (#1448) 2024-01-09 10:21:53 +01:00
pascal-fischer
5de4acf2fe Integrate Rosenpass (#1153)
This PR aims to integrate Rosenpass with NetBird. It adds a manager for Rosenpass that starts a Rosenpass server and handles the managed peers. It uses the cunicu/go-rosenpass implementation. Rosenpass will then negotiate a pre-shared key every 2 minutes and apply it to the wireguard connection.

The Feature can be enabled by setting a flag during the netbird up --enable-rosenpass command.

If two peers are both support and have the Rosenpass feature enabled they will create a post-quantum secure connection. If one of the peers or both don't have this feature enabled or are running an older version that does not have this feature yet, the NetBird client will fall back to a plain Wireguard connection without pre-shared keys for those connections (keeping Rosenpass negotiation for the rest).

Additionally, this PR includes an update of all Github Actions workflows to use go version 1.21.0 as this is a requirement for the integration.

---------

Co-authored-by: braginini <bangvalo@gmail.com>
Co-authored-by: Maycon Santos <mlsmaycon@gmail.com>
2024-01-08 12:25:35 +01:00
Bethuel Mmbaga
ee6be58a67 Fix update script's failure to update netbird-ui in binary installation (#1218)
Resolve the problem with the update script that prevents netbird-ui from updating during binary installation.

Introduce the variable UPDATE_NETBIRD. Now we can upgrade the binary installation with

A function stop_running_netbird_ui has been added which checks if NetBird UI is currently running. If so, it stops the UI to allow the application update process to proceed smoothly. This was necessary to prevent conflicts or errors during updates if the UI was running.


---------

Co-authored-by: Maycon Santos <mlsmaycon@gmail.com>
2023-10-19 17:47:39 +02:00
Yury Gargay
46f5f148da Move StoreKind under own StoreConfig configuration and rename to Engine (#1219)
* Move StoreKind under own StoreConfig configuration parameter

* Rename StoreKind option to Engine

* Rename StoreKind internal methods and types to Engine

* Add template engine value test

---------

Co-authored-by: Maycon Santos <mlsmaycon@gmail.com>
2023-10-16 11:19:39 +02:00
Yury Gargay
32880c56a4 Implement SQLite Store using gorm and relational approach (#1065)
Restructure data handling for improved performance and flexibility. 
Introduce 'G'-prefixed fields to represent Gorm relations, simplifying resource management. 
Eliminate complexity in lookup tables for enhanced query and write speed. 
Enable independent operations on data structures, requiring adjustments in the Store interface and Account Manager.
2023-10-12 15:42:36 +02:00
Zoltan Papp
7f5e1c623e Use forked Wireguard-go for custom bind (#823)
Update go version to 1.20
Use forked wireguard-go repo because of custom Bind implementation
2023-04-27 17:50:45 +02:00
Misha Bragin
2eeed55c18 Bind implementation (#779)
This PR adds supports for the WireGuard userspace implementation
using Bind interface from wireguard-go. 
The newly introduced ICEBind struct implements Bind with UDPMux-based
structs from pion/ice to handle hole punching using ICE.
The core implementation was taken from StdBind of wireguard-go.

The result is a single WireGuard port that is used for host and server reflexive candidates. 
Relay candidates are still handled separately and will be integrated in the following PRs.

ICEBind checks the incoming packets for being STUN or WireGuard ones
and routes them to UDPMux (to handle hole punching) or to WireGuard  respectively.
2023-04-13 17:00:01 +02:00
Maycon Santos
d2d5d4b4b9 Update go version (#603)
Removed ioctl code and remove exception from lint action
2022-12-04 13:22:21 +01:00
Maycon Santos
e8d82c1bd3 Feature/dns-server (#537)
Adding DNS server for client

Updated the API with new fields

Added custom zone object for peer's DNS resolution
2022-11-03 18:39:37 +01:00
Maycon Santos
c88e6a7342 Run tests only on branch main and on pull requests (#492)
* Use reusable workflow and control push and pr test exec

* use format

* use path ref

* Run tests on push to main and on PRs
2022-10-03 00:17:16 +05:00
Maycon Santos
e95f0f7acb Support 32 bit (#374)
Add build for 32 bits linux

improved windows test time
2022-07-01 10:42:38 +02:00
Maycon Santos
1aafc15607 Update self hosting scripts (#367)
split setup.env with example and base

add setup.env to .gitignore to avoid overwrite from new versions

Added test workflow for docker-compose 
and validated configure.sh generated variables
2022-06-24 14:50:14 +02:00
Maycon Santos
97ab8f4c34 Updating Go 1.18 and Wireguard dependencies (#282) 2022-03-22 14:59:17 +01:00
Maycon Santos
1cd1e84290 Run tests in serial and update multi-peer test (#269)
Updates test workflows with serial execution to avoid collision 
of ports and resource names.

Also, used -exec sudo flag for UNIX tests and removed not-needed
 limits configuration on Linux and added a 5 minutes timeout.

Updated the multi-peer tests in the client/internal/engine_test.go
 to provide proper validation when creating or starting 
a peer engine instance fails.

As some operations of the tests running on windows
 are slow, we will experiment with disabling the Defender before 
restoring cache and checkout a repository, then we reenable 
it to run the tests.

disabled extra logs for windows interface
2022-03-16 11:02:06 +01:00
Maycon Santos
0739038d51 Fix unstable parallel tests (#202)
* update interface tests and configuration messages

* little debug

* little debug on both errors

* print all devs

* list of devices

* debug func

* handle interface close

* debug socks

* debug socks

* if ports match

* use random assigned ports

* remove unused const

* close management client connection when stopping engine

* GracefulStop when management clients are closed

* enable workflows on PRs too

* remove iface_test debug code
2022-01-25 09:40:28 +01:00
Maycon Santos
64f2d295a8 Refactor Interface package and update windows driver (#192)
* script to generate syso files

* test wireguard-windows driver package

* set int log

* add windows test

* add windows test

* verbose bash

* use cd

* move checkout

* exit 0

* removed tty flag

* artifact path

* fix tags and add cache

* fix cache

* fix cache

* test dir

* restore artifacts in the root

* try dll file

* try dll file

* copy dll

* typo in copy dll

* compile test

* checkout first

* updated cicd

* fix add address issue and gen GUID

* psexec typo

* accept eula

* mod tidy before tests

* regular test exec and verbose test with psexec

* test all

* return WGInterface Interface

* use WgIfaceName and timeout after 30 seconds

* different ports and validate connect 2 peers

* Use time.After for timeout and close interface

* Use time.After for testing connect peers

* WG Interface struct

* Update engine and parse address

* refactor Linux create and assignAddress

* NewWGIface and configuration methods

* Update proxy with interface methods

* update up command test

* resolve lint warnings

* remove psexec test

* close copied files

* add goos before build

* run tests on mac,windows and linux

* cache by testing os

* run on push

* fix indentation

* adjust test timeouts

* remove parallel flag

* mod tidy before test

* ignore syso files

* removed functions and renamed vars

* different IPs for connect peers test

* Generate syso with DLL

* Single Close method

* use port from test constant

* test: remove wireguard interfaces after finishing engine test

* use load_wgnt_from_rsrc

Co-authored-by: braginini <bangvalo@gmail.com>
2022-01-17 14:01:58 +01:00