Adds Relay.HasURLs/AllURLs helpers so all relay-enabled checks use the
same definition ("any URL is configured, regardless of whether it came
from Addresses or Endpoints"), and a Normalize step run at startup that:
- Drops unknown transport identifiers (anything outside ws/quic/wt),
reporting them via a warning log so misconfigurations surface loudly
instead of being silently advertised to clients.
- De-duplicates endpoint URLs and per-endpoint transports.
- Preserves empty Transports lists — that's the valid "unknown, let the
client try whatever it supports" signal, distinct from a typo'd list.
The two relay-token guard sites in server.go switch to HasURLs so a
deployment with only Endpoints (no Addresses) actually issues tokens;
before this change those branches would skip token generation. The
startup log now prints the merged URL list plus per-endpoint transport
hints, matching what clients will receive.
Two follow-ups to the WebTransport/ALPN-mux landing:
Deployment templates publish UDP alongside TCP for the relay so the
single ALPN-multiplexed socket can serve raw QUIC and WebTransport
clients on the same port as the existing WebSocket transport.
- docker-compose.yml.tmpl: adds the matching `/udp` mapping; the relay
was already binding both stacks, the host port just wasn't published.
- docker-compose.yml.tmpl.traefik: WebTransport is the awkward case —
Traefik can't proxy WT sessions, so the relay container now publishes
UDP/443 directly and obtains its own Let's Encrypt cert (separate
volume), while the TCP /relay route stays behind Traefik unchanged so
WS-only clients keep working.
Management server learned to advertise per-relay transport hints:
- Config gains an optional `Endpoints []{URL, Transports}` block on the
Relay section, mirrored to clients as RelayConfig.endpoints.
- `Addresses` is still emitted as RelayConfig.urls so older agents keep
working unchanged.
- A single BuildRelayConfigProto helper is the only place that builds
the proto, called from both toNetbirdConfig and the token push paths.
The GeoDNS case is operator-asserted, not probed: a single URL fans out
to several physical relays, and the Transports list must already be the
intersection of what every backend supports. Documented on the config
struct — if any backend behind a hostname can't speak h3, the operator
drops "wt" from that hostname's list and no client tries it there.
When closing go routines and handling peer disconnect, we should avoid canceling the flow due to parent gRPC context cancellation.
This change triggers disconnection handling with a context that is not bound to the parent gRPC cancellation.
* Add support for legacy IDP cache environment variable
* Centralize cache store creation to reuse a single Redis connection pool
Each cache consumer (IDP cache, token store, PKCE store, secrets manager,
EDR validator) was independently calling NewStore, creating separate Redis
clients with their own connection pools — up to 1400 potential connections
from a single management server process.
Introduce a shared CacheStore() singleton on BaseServer that creates one
store at boot and injects it into all consumers. Consumer constructors now
receive a store.StoreInterface instead of creating their own.
For Redis mode, all consumers share one connection pool (1000 max conns).
For in-memory mode, all consumers share one GoCache instance.
* Update management-integrations module to latest version
* sync go.sum
* Export `GetAddrFromEnv` to allow reuse across packages
* Update management-integrations module version in go.mod and go.sum
* Update management-integrations module version in go.mod and go.sum
Remove client secret from gRPC auth flow. The secret was originally included to support providers like Google Workspace that don't offer a proper PKCE flow, but this is no longer necessary with the embedded IdP. Deployments using such providers should migrate to the embedded IdP instead.
The internal Target model uses a plain bool for ProxyProtocol,
which was always serialized to the API response as false even
when not configured. Only set the API field when true so it
gets omitted via omitempty when unset.
Auto-update logic moved out of the UI into a dedicated updatemanager.Manager service that runs in the connection layer. The
UI no longer polls or checks for updates independently.
The update manager supports three modes driven by the management server's auto-update policy:
No policy set by mgm: checks GitHub for the latest version and notifies the user (previous behavior, now centralized)
mgm enforces update: the "About" menu triggers installation directly instead of just downloading the file — user still initiates the action
mgm forces update: installation proceeds automatically without user interaction
updateManager lifecycle is now owned by daemon, giving the daemon server direct control via a new TriggerUpdate RPC
Introduces EngineServices struct to group external service dependencies passed to NewEngine, reducing its argument count from 11 to 4
* Network map now defaults to compacted mode at startup; environment parsing issues yield clearer warnings and disabling compacted mode is logged.
* **Bug Fixes**
* DNS enablement and nameserver selection now correctly respect group membership, reducing incorrect DNS assignments.
* **Refactor**
* Internal routing and firewall rule generation streamlined for more consistent rule IDs and safer peer handling.
* **Performance**
* Minor memory and slice allocation improvements for peer/group processing.
* **New Features**
* Access logs now include bytes_upload and bytes_download (API and schemas updated, fields required).
* Certificate issuance duration is now recorded as a metric.
* **Refactor**
* Metrics switched from Prometheus client to OpenTelemetry-backed meters; health endpoint now exposes OpenMetrics via OTLP exporter.
* **Tests**
* Metric tests updated to use OpenTelemetry Prometheus exporter and MeterProvider.
The combined server was using the hostname from exposedAddress for both
singleAccountModeDomain and dnsDomain, causing fresh installs to get
the wrong domain and existing installs to break if the config changed.
Add resolveDomains() to BaseServer that reads domain from the store:
- Fresh install (0 accounts): uses "netbird.selfhosted" default
- Existing install: reads persisted domain from the account in DB
- Store errors: falls back to default safely
The combined server opts in via AutoResolveDomains flag, while the
standalone management server is unaffected.
The expose tracker used sync.Map for in-memory TTL tracking of active expose sessions, which broke and lost all sessions on restart.
Replace with SQL-backed operations that reuse the existing meta_last_renewed_at column:
- Add store methods: RenewEphemeralService, GetExpiredEphemeralServices, CountEphemeralServicesByPeer, EphemeralServiceExists
- Move duplicate/limit checks inside a transaction with row-level locking (SELECT ... FOR UPDATE) to prevent concurrent bypass
- Reaper re-checks expiry under row lock to avoid deleting a just-renewed service and prevent duplicate event emission
- Add composite index on (source, source_peer) for efficient queries
- Batch-limit and column-select the reaper query to avoid DB/GC spikes
- Filter out malformed rows with empty source_peer