logoutFromProfile failed hard when the management server returned NotFound
(peer already deleted from the dashboard), blocking both profile logout and
profile removal. Treat NotFound as success: the peer is already gone, so
the deregistration is already satisfied.
Also drop the user-side per-profile state file on logout. The account email is
sourced from <profile>.state.json (written by the CLI after SSO login), which
the root daemon can't reach, so logout left a stale email showing in the UI.
Connection.Logout now removes it from the UI process after a successful logout;
the next SSO login recreates it.
On minimal window managers (fluxbox et al.; the in-process XEmbed-tray
path) the WM neither centers small windows nor restores their position
across a hide -> show round-trip, so the main, Settings, and dialog
windows opened in the top-left corner instead of centered.
These windows are created Hidden, so Wails' Linux/GTK4 backend skips its
post-Show centering pass (gated on !Hidden) and InitialPosition has no
effect on an unrealized window. Re-center from Go after Show, gated on
the minimal-WM environment via a recenterOnShow predicate (set to
xembedTrayAvailable on Linux, nil on macOS/Windows where the WM handles
placement). centerWhenReady polls from a background goroutine until the
move actually lands -- Center() moves via raw X11, which no-ops while the
GdkSurface is still nil and GTK4 realizes it asynchronously after Show().
Also reorder xembed_host_linux.go so the static helpers (xembedTrayAvailable,
goMenuItemClicked) sit at the end, after the constructor and methods.
Previously the SyncResponse was persisted to syncStore before
updateNetworkMap() ran. If applying the network map failed, the engine
persisted state it never applied, so GetLatestSyncResponse() could return
stale/unapplied state. Move the persistence into the post-apply success
path so the persisted response always reflects what the engine applied.
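
The ordering can be sketched as follows. syncEngine and its string fields are hypothetical simplifications (the real engine stores a SyncResponse in syncStore); only the apply-then-persist order is the point:

```go
package main

import "errors"

// syncEngine is a hypothetical, stripped-down engine: "applied" is what the
// engine runs with, "persisted" is what a later read of the store returns.
type syncEngine struct {
	applied   string
	persisted string
}

// updateNetworkMap stands in for the real apply step; it rejects empty input
// so the failure path can be exercised.
func (e *syncEngine) updateNetworkMap(resp string) error {
	if resp == "" {
		return errors.New("empty network map")
	}
	e.applied = resp
	return nil
}

// handleSync persists the response only after it was applied successfully.
func (e *syncEngine) handleSync(resp string) error {
	if err := e.updateNetworkMap(resp); err != nil {
		return err // nothing persisted, so the store still matches e.applied
	}
	e.persisted = resp
	return nil
}
```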
The Profiles and Exit Node submenus (and the About version/Update rows)
stopped reflecting changes on KDE/Plasma: after the first profile switch
the menu froze on its initial snapshot, and "Manage Profiles" — plus the
profile rows themselves — stopped responding to clicks entirely.
Root cause (confirmed via dbus-monitor): Plasma's StatusNotifierItem host
caches a submenu's layout the first time it is opened (GetLayout for that
submenu id) and never re-fetches it on a LayoutUpdated(parent=0) signal.
The old submenu.Clear()+Add() repaint allocated fresh monotonic item ids
each time but reused the same submenu container id, so Plasma kept showing
the stale snapshot and, on click, sent the stale ids back — which the
rebuilt itemMap no longer knew, silently no-op'ing the click.
Fix: route every dynamic tray-menu change through a new relayoutMenu that
rebuilds the whole tree (buildMenu + repaint cached state + a single
SetMenu), allocating brand-new submenu container ids. Plasma treats those
as unseen and re-queries them on next open, fixing both the stale paint
and the dead clicks. loadProfiles/refreshExitNodes now cache their rows
and drive relayoutMenu; the update row goes through a new onMenuChange
hook; the daemon-version row relayouts too. relayoutMenu is serialised by
menuMu and the fill*Submenu helpers are pure UI (no fetch, no SetMenu) so
it never recurses. The whole-tree SetMenu also subsumes the prior darwin
detached-NSMenu workaround.
* Adds a heuristic to detect an edge case on Linux where logrotate is configured as a separate service to rotate log files, which would mangle our client log files. If we detect a logrotate configuration for netbird, we disable our own rotation.
* Adds new env var to disable log rotation: NB_LOG_DISABLE_ROTATION
* Adds compressed and plain logrotate files to debug bundle.
* Replaces lumberjack with timberjack (maintained fork with bug fixes and extra features).
* Clarifies which daemon version is running in the bundle stats.
* Changes client service status logging to go to the console
The daemon emits no dedicated profile-changed RPC event, and a profile
add/remove doesn't move the connection status, so the UI's SubscribeStatus
path never fired for CLI-driven `netbird profile add|remove` (and the tray's
iconChanged guard would swallow it anyway). The tray menu and the React
profile list stayed stale until the next status-string transition.
AddProfile/RemoveProfile now publish a marked INFO/SYSTEM event over
SubscribeEvents (metadata kind=profile-list-changed, empty userMessage so it
stays silent). The UI's dispatchSystemEvent recognises the marker and
re-emits the existing EventProfileChanged, which the tray's loadProfiles and
React's ProfileContext.refresh already subscribe to — so both surfaces
refresh from a single signal that originates in the shared daemon handler
(covering both CLI and UI-initiated removals). No proto change.
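
A rough sketch of the marker mechanism, assuming the event reduces to a user message plus a string metadata map. The systemEvent type and helper names below are hypothetical; only the kind=profile-list-changed marker and the empty userMessage come from the change itself:

```go
package main

// systemEvent is a hypothetical reduction of the daemon's INFO/SYSTEM event.
type systemEvent struct {
	UserMessage string
	Metadata    map[string]string
}

const profileListChanged = "profile-list-changed"

// newProfileListChangedEvent builds the silent marker event: no user message,
// only the metadata kind.
func newProfileListChangedEvent() systemEvent {
	return systemEvent{Metadata: map[string]string{"kind": profileListChanged}}
}

// dispatchSystemEvent re-emits the marker as a profile-changed signal and
// reports whether it recognised the event. Reading a nil map is safe in Go.
func dispatchSystemEvent(ev systemEvent, emitProfileChanged func()) bool {
	if ev.Metadata["kind"] != profileListChanged {
		return false
	}
	emitProfileChanged()
	return true
}
```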
Also drop a stray, build-breaking `app.Updater` line in main.go.
startLogin held both guards (the module-level loginInFlight and the
caller's React-level loginGuard) across the post-failure errorDialog
await. The native Windows MessageBox disables its parent for its whole
lifetime while the main window's WindowClosing hook hides instead of
closing, so the dialog promise can outlive the click — and even a clean
dismissal kept the guards held until the promise settled. Until then
every later Connect click and tray trigger-login was silently dropped at
the guard check, so the only way back was a client restart.
Release both guards the instant the flow itself settles, before the
dialog: startLogin now takes an onSettled callback fired in its finally
(driveLogin releases loginGuard through it), and the errorDialog await
moved out of the try/finally so no guard is ever gated on the dialog.
windows:build:console hardcoded -tags production, which disables the
WebKit/WebView2 DevTools inspector — so there was no way to get a
console-attached Windows binary with the frontend JS console reachable.
Mirror build:native's DEV handling: DEV=true drops the production tag
while keeping the console subsystem (no -H windowsgui).
The informational status row at the top of the tray menu was disabled on
Linux, which painted the connection-status indicator greyed-out. Enable
it on Linux so the row renders at full opacity; it has no OnClick handler
so clicking it remains a no-op.
* [infrastructure] allow docker image overrides for getting started
Make dashboard and server image configurations overridable via environment variables
* [infrastructure] update Traefik gRPC rule to include ProxyService PathPrefix
* make Traefik and CrowdSec images configurable via environment variables
After a successful WaitSSOLogin the daemon deliberately stays in
StatusNeedsLogin, and after a mid-session expiry (peer kicked out by the
management server) the engine tears down with clientRunning == false. In
both cases the caller's Up takes the fresh-start branch, which only
accepted StatusIdle and rejected NeedsLogin with
"up already in progress: current status NeedsLogin".
This forced a second Up to actually connect (CLI: re-run `netbird up`;
GUI: click Connect again). Treat NeedsLogin as a legitimate fresh-start
entry state and reset it to Idle before starting the engine, so the
first Up after login drives Connecting -> Connected directly.
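
The fresh-start check can be sketched like this. The status type and prepareFreshStart are hypothetical names; the accepted states and the error text follow the description above:

```go
package main

import "fmt"

type status string

const (
	statusIdle       status = "Idle"
	statusNeedsLogin status = "NeedsLogin"
	statusConnecting status = "Connecting"
)

// prepareFreshStart accepts Idle and NeedsLogin as entry states and resets
// both to Idle before the engine starts; any other state is rejected.
func prepareFreshStart(current status) (status, error) {
	switch current {
	case statusIdle, statusNeedsLogin:
		return statusIdle, nil
	default:
		return current, fmt.Errorf("up already in progress: current status %s", current)
	}
}
```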
* Persist sync response via pluggable store (disk on iOS)
The latest Management sync response (which carries the network map) was
kept in memory for debug bundle generation. On memory-constrained
platforms like iOS the network map can be large enough to matter.
Introduce a syncstore package with a Store interface and two backends:
a memory backend (the previous behavior) and a disk backend that
serializes the response to a file in the state directory. The backend
is selected per-platform at build time: disk on iOS, memory elsewhere.
The disk store clears any leftover file on construction so a fresh
store never reads stale data from an earlier run (e.g. another
profile's network map).
In the engine, drop the separate persistSyncResponse bool: the store is
only instantiated while persistence is enabled, and its presence is
what marks persistence as active. The store is also cleared on engine
close so the file does not linger on disk.
* syncstore: silence nilnil linter on "nothing stored" returns
Get returns (nil, nil) to signal that nothing is stored, which is part
of the Store contract and preserves the original behaviour. Annotate
both backends with //nolint:nilnil so golangci-lint does not flag it.
* syncstore: hold syncRespMux for the whole store Set/Get
Both handleSync and GetLatestSyncResponse snapshotted e.syncStore under
the read lock and then released it before calling Set/Get. That allowed
SetSyncResponsePersistence(false) or engine close to clear the store
mid-call. In particular a concurrent Clear()+nil followed by a late
Set could re-create the file that was just removed, defeating the
leak/lingering protection.
Hold syncRespMux for the duration of the store operation in both spots
so the store cannot be cleared while a Set/Get is in flight.
* syncstore: avoid StateDir "." when state path is empty
On mobile the state path may be empty (the engine tolerates a missing
state file). filepath.Dir("") returns ".", which would make a
disk-backed syncstore write into the working directory instead of
letting NewDiskStore fall back to os.TempDir().
Only set engineConfig.StateDir when path is non-empty.
* Refactor to use a common checker for development version
* Adds commit sha to development version for cobra command only
Leave dashboard unaffected
* Adjust for "v0.31.1-dev" test case
which must be considered pre-release
* Drop synthetic "dev"/"0.50.0-dev" firewall feature-gate fixtures
These test cases encoded the loose strings.Contains(v, "dev")
semantics inherited from peerSupportedFirewallFeatures, but
NetbirdVersion() never produces those values — only the literal
"development" (and now "development-<sha>[-dirty]") ever flows
through the wire. The agent owns the semantics of an ephemeral
development build, so the tests should exercise the strings we
actually emit.
Replaced with development, development-<sha> and
development-<sha>-dirty cases that match the HasPrefix("development")
predicate introduced upstream.
* Remove nonexistent tests on wire format
The sha and dirty flag are added only when the CLI asks for the version.
Account versions are unaffected and can only strictly match "development"
* Adds tests for IsDevelopmentVersion
WebKitGTK's accelerated GL compositor crashes with a SIGSEGV inside
g_application_run on some Intel setups, hitting Mesa anv/i965 code paths
for DRM format modifiers that aren't implemented (FINISHME: YUV
colorspace / multi-planar formats). Disabling the DMA-BUF renderer alone
doesn't cover the GL compositor, so the crash survived that workaround.
Set WEBKIT_DISABLE_COMPOSITING_MODE=1 in init() (skipped if the user
already set it) to force CPU rendering, which is fine for a UI this
small and sidesteps the broken modifier path.
Split Linux panel-theme detection into two files and fix the KDE case
where the tray icon picked the wrong mono variant.
The freedesktop Settings portal's color-scheme reports the *global*
light/dark preference, but the KDE panel is painted from the
Complementary colour group, which can be dark even when the global
scheme is Light. The tray sits on the panel, so keying its black/white
mono icon off the portal value alone gave the wrong contrast on KDE.
Changes:
- tray_theme_linux.go keeps the dark/light decision; on KDE it now
reads the user's kdeglobals [Colors:Complementary] BackgroundNormal
to determine the actual panel luminance, falling back to the portal
color-scheme / GTK_THEME chain elsewhere.
- tray_theme_watcher_linux.go (new) owns the live half: a private
session-bus connection for the portal SettingChanged signal plus an
fsnotify watch on kdeglobals, repainting the tray on a panel-theme
flip.
- tray_theme_linux_test.go (new) covers the kdeglobals Complementary
parse against the KDE test-VM's real file layout.
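
A minimal sketch of the kdeglobals read, assuming BackgroundNormal holds a decimal "r,g,b" triple. The function name, the Rec. 601 luma weights, and the mid-grey threshold of 128 are assumptions for illustration, not necessarily what tray_theme_linux.go does:

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// complementaryIsDark scans kdeglobals content for BackgroundNormal inside
// [Colors:Complementary] and reports (dark, found). found is false when the
// key is missing or malformed, so the caller can fall back to another source.
func complementaryIsDark(kdeglobals string) (dark bool, found bool) {
	inSection := false
	sc := bufio.NewScanner(strings.NewReader(kdeglobals))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "[") {
			inSection = line == "[Colors:Complementary]"
			continue
		}
		if !inSection {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok || strings.TrimSpace(key) != "BackgroundNormal" {
			continue
		}
		parts := strings.Split(val, ",")
		if len(parts) < 3 {
			return false, false
		}
		var rgb [3]float64
		for i := range rgb {
			n, err := strconv.Atoi(strings.TrimSpace(parts[i]))
			if err != nil {
				return false, false
			}
			rgb[i] = float64(n)
		}
		// Rec. 601 luma on the 0-255 scale; below mid-grey counts as dark.
		luma := 0.299*rgb[0] + 0.587*rgb[1] + 0.114*rgb[2]
		return luma < 128, true
	}
	return false, false
}
```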
* fix(proxy): gate tunnel-peer fast-path on inbound listener marker
forwardWithTunnelPeer previously accepted any RFC1918 / ULA / CGNAT
source IP, so a public client whose address happened to fall in those
ranges could bypass the configured operator auth scheme by colliding
with a known tunnel IP. The fast-path is now gated on
TunnelLookupFromContext(r.Context()) being present — that context value
is attached only by the per-account inbound (overlay) listener, so the
host-facing listener never enters this branch.
Tests updated to reflect the new requirement: requests that don't
carry the inbound marker now fall through to the regular auth flow.
* fix(proxy): harden inbound listener resource + startup-ctx handling
Correctness fixes on the per-account inbound path, with tests:
- Close the logrus ErrorLog PipeWriter on tearDown. WriterLevel hands
back an *io.PipeWriter backed by a pipe + scanner goroutine that the
caller owns; the two writers per account (https + plain) were never
closed, leaking the pipe and goroutine on every teardown.
- Run the post-Start hooks on context.Background(). runClientStartup
is launched in a goroutine from AddPeer and was inheriting the
caller's request-scoped ctx, so a cancelled request could abort the
inbound bring-up or fail the management status notification. The
tail is split into notifyClientReady so the contract is testable.
Tests cover the PipeWriter close behaviour and assert the readyHandler
+ NotifyStatus calls receive a non-cancelled background context.
* feat(proxy): short-circuit peer-own-target loops with 421
When a peer that hosts the target of a private service dials its own
service URL the request was being looped through the proxy and back
over WireGuard to the same peer — twice the WG round-trip for no
benefit, with no signal to the caller that something was wrong.
Add isSelfTargetLoop to ReverseProxy.ServeHTTP: when the request
arrived on the per-account overlay listener (IsOverlayOrigin) and the
source tunnel IP matches the target host, refuse the request with 421
Misdirected Request and a body pointing the operator at the backend
directly.
The gate is scoped to overlay origin so requests on the public
listener that happen to share a source IP with the target host are
forwarded normally.
* fix(management): private-service validation + tunnel-IP lookup semantics
- Require an explicit port for L4 cluster targets. validateL4Target
exempted TargetTypeCluster from the port check, but buildPathMappings
serializes every L4 target via net.JoinHostPort(host, port) — port=0
shipped a ":0" upstream. Cluster targets use the same Host/Port
fields, so the same requirement applies.
- GetPeerByIP returns NotFound on a tunnel-IP miss instead of mapping
every error to Internal. The proxy's ValidateTunnelPeer probes IPs
that legitimately aren't in the roster; the miss is expected and now
distinguishable from a real store failure.
- Thread ctx into getClusterCapability's gorm query so a cancelled
request doesn't keep the store busy.
Tests updated for the L4-cluster port requirement and the GetPeerByIP
NotFound path.
* fix(client): include offlinePeers in PeerStateByIP lookup
ReplaceOfflinePeers moves peers into d.offlinePeers but PeerStateByIP
only scanned d.peers. Callers (the local DNS filter via
localPeerConnectivity, embed.Client.IdentityForIP used by the
proxy's tunnel-peer validator) were treating known-but-offline peers
as unknown, which:
- causes the DNS filter to keep returning records pointing at peers
that have no live tunnel, AND
- makes the proxy's local-roster check deny a request from such a
peer rather than letting the cached management RPC carry the
authorisation decision.
Search both slices in PeerStateByIP. Adds a unit test for the IPv4
and IPv6 offline-match paths.
* fix(rest): reject empty Delete path params in reverse-proxy clients
ReverseProxyClustersAPI.Delete and ReverseProxyTokensAPI.Delete passed
the path parameter into url.PathEscape without an empty check.
PathEscape("") returns "" which collapses the request onto the
collection endpoint ("/api/reverse-proxies/clusters/" /
"/api/reverse-proxies/proxy-tokens/"), so a buggy caller's delete with no
id reached a routable URL with surprising semantics (typically 405).
Short-circuit with a typed error before the request is built. Tests
mount a handler on the collection path that fails the test if hit, so
the regression is impossible to reintroduce silently.
* chore(api,ci,docs,test): private-service schema, proto-check, fixups
Non-functional cleanups and contract/CI hardening around the
private-service work:
API schema (openapi.yml):
- Require a non-empty access_groups and mode=http when private=true,
on both Service and ServiceRequest, mirroring
validatePrivateRequirements. mode stays optional-but-constrained
(empty defaults to http server-side), matching runtime.
CI (proto-version-check.yml):
- Cover renamed .pb.go files (read base via previous_filename).
- Match protoc-gen-go-grpc version headers (optional "- " prefix and
-gen-go-grpc suffix) so grpc-generated files are in scope.
Docs / comments:
- Reword Config field docs to say defaults are applied at Server.Start
(initDefaults), not New.
- Rename the obsolete --private-inbound flag to --private across
comments and the proto doc.
Pre-existing test fixups surfaced by review:
- Repair the integration-tagged validate_session_test.go (SignToken
signature growth + new Manager interface methods).
- Fix the CI-skip boolean precedence so Windows isn't skipped
unconditionally.
- Guard the router.HTTPListener type assertion with comma-ok.
* fix(proxy): background ctx for already-started AddPeer notification
The earlier ctx fix covered the async runClientStartup path but missed
the synchronous branch: when a service is added to an already-started
client, AddPeer called NotifyStatus with the caller's request-scoped
ctx. A cancelled request/stream could drop the connected notification
to management. Use context.Background() here too, matching
notifyClientReady.
Extends TestNetBird_AddPeer_ExistingStartedClient_NotifiesStatus to
pass a pre-cancelled caller ctx and assert the notification still ran
on a non-cancelled context.
* use the cmd context for roundtripper
Two Linux packaging issues, both surfaced by the netbird-ui deb/rpm
built from .goreleaser_ui.yaml.
1) License / vendor metadata
----------------------------
The nfpm entries for the UI package set only maintainer/description/
homepage, leaving the License and Vendor RPM/DEB tags empty. KDE
Discover (and GNOME Software) then render the package as
"Licenses: Unknown" / "Unknown author", with a scary license-warning
popup on install. The daemon's main .goreleaser.yaml already set
license (commit #5659) but never vendor, and the UI config was skipped
entirely.
Fix: add `license: BSD-3-Clause` + `vendor: NetBird` to both UI nfpm
entries (deb + rpm), and `vendor: NetBird` to the daemon's deb + rpm
entries for consistency. BSD-3-Clause is correct for client/ui — the
repo is BSD-3-Clause except management/, signal/, relay/, combined/
(AGPLv3), none of which the UI touches.
2) KDE Wayland window/taskbar icon
----------------------------------
On KDE Plasma 6 under Wayland the app launched with the generic Wails
icon in the window titlebar and the taskbar / Alt-Tab switcher, even
though /usr/share/pixmaps/netbird.png (the launcher icon, resolved from
the desktop entry's `Icon=netbird`) was correct.
Root cause is how a Wayland compositor decides a window's icon. Unlike
X11 there is no per-window _NET_WM_ICON the app can push at runtime —
GTK4 even removed gtk_window_set_icon, so the embedded assets/netbird.png
the binary carries is simply ignored. Instead the compositor matches the
window's Wayland **app_id** to an installed .desktop file and uses that
entry's `Icon=` key.
The app_id is not "netbird": Wails hardcodes it as `org.wails.<name>`
(pkg/application/linux_cgo.go: `fmt.Sprintf("org.wails.%s", name)`, name
= sanitized Options.Name), yielding **org.wails.netbird**. There is no
Wails option to override the prefix. Verified the live value on the
Fedora-40 / KDE 6.3 / Wayland test VM by dumping workspace.windowList()
via a KWin script:
js: ZZZWIN org.wails.netbird ## NetBird
KDE needs two things to associate the running window with our desktop
entry and thus paint our icon:
- the desktop entry's basename should equal the app_id, so the
titlebar decoration (which looks the entry up by app_id ->
<app_id>.desktop) finds it, and
- a `StartupWMClass=<app_id>` line, which the taskbar / task switcher
use to map the surface to the entry.
Fix (no Wails fork needed — app_id stays org.wails.netbird, which the
user never sees; only Name=NetBird and the icon are visible):
- install the desktop file as `org.wails.netbird.desktop` instead of
`netbird.desktop` (both deb and rpm contents in .goreleaser_ui.yaml)
- add `StartupWMClass=org.wails.netbird` to
client/ui/build/linux/netbird.desktop
`Icon=netbird` and the pixmaps/netbird.png payload are unchanged — they
were already correct. Confirmed on the test VM that both the titlebar
and taskbar/Alt-Tab now show the NetBird icon.
The XEmbed tray (panel) can come up after the autostarted UI on minimal
WMs, so the single startup probe added in #6320 could miss a tray that
appears a second or two later, leaving the icon silently absent. Re-probe
for a ~10s grace period in a goroutine, claiming the watcher as soon as a
tray shows up; back off cleanly if none ever appears (headless/Wayland).
MarkManagement{Connected,Disconnected} and MarkSignal{Connected,
Disconnected} fired notifyStateChange unconditionally. The connect
goroutine re-marks the same state on every health-check cycle, so a
steady "connected -> connected" re-mark pushed a full SubscribeStatus
snapshot to every consumer each time — flooding the desktop UI (and its
tray) with identical Connected snapshots.
Guard each with an early return when neither the state nor the error
actually changed, so only real transitions wake SubscribeStatus
subscribers. The notifier already deduplicates, so collapsing both calls
under one guard is safe.
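
A sketch of the early-return guard, assuming the state reduces to a connected flag plus an error. connState and mark are hypothetical names. Comparing errors with == is a simplification that is only safe when their dynamic types are comparable:

```go
package main

// connState is a hypothetical reduction of the management/signal state.
type connState struct {
	connected bool
	err       error
	notify    func()
}

// mark updates the state and notifies subscribers only on a real transition;
// re-marking the same state is a no-op.
func (s *connState) mark(connected bool, err error) {
	if s.connected == connected && s.err == err {
		return
	}
	s.connected, s.err = connected, err
	s.notify()
}
```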
Status(GetFullPeerStatus=true) RPCs trigger a full health probe
(network round-trips to management, signal and the relays). The
desktop UI issues these frequently and concurrently, and a burst of
parallel Get() calls each fired its own probe — the lastProbe guard
was unprotected against concurrent access and only advanced when every
component was healthy, so a sustained unhealthy state (e.g. relay down)
disabled the throttle entirely and let every call re-probe.
Extract the throttle/single-flight policy into probeThrottle:
- single-flight: only one probe runs at a time; concurrent callers
that piled up while it ran share its result instead of each
launching another, even when that probe failed.
- throttle: lastOK only advances on a fully successful probe, so
while anything is unhealthy callers keep probing frequently and
notice recovery quickly (preserved from the original design).
RunHealthProbes now takes a context so a caller that gives up (e.g. a
Status RPC whose client disconnected) cancels the in-flight STUN/TURN
probe instead of letting it run to its per-component timeout. The
engine's own lifetime ctx still applies independently.
Linux now shows monochrome (black/white silhouette) tray icons instead
of the colored orange PNGs, matching the macOS template look. Since
Wails' Linux SNI backend ignores SetDarkModeIcon (its setDarkModeIcon
just calls setIcon, last-write-wins) and the SNI spec carries no panel
light/dark hint, the panel color scheme is detected in-process and the
black-vs-white silhouette is chosen in iconForState, pushed via a single
SetIcon.
Detection order (tray_theme_linux.go): freedesktop Settings portal
(org.freedesktop.appearance/color-scheme) -> GTK_THEME env (:dark
suffix) -> default dark. A SettingChanged subscription repaints live on
theme flips. macOS (template) and Windows (colored) paths are unchanged.
Icons are 48x48 mono PNGs (3% margin) generated from the macOS
silhouettes.
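
The detection order can be sketched as a fallback chain. panelIsDark and the portal callback are hypothetical. Treating a GTK_THEME without the :dark suffix as light is an assumption the description above does not confirm:

```go
package main

import "strings"

// panelIsDark applies the order: portal color-scheme first, then the
// GTK_THEME ":dark" suffix, then a default of dark. portal returns ok=false
// when the Settings portal is unavailable or has no preference.
func panelIsDark(portal func() (dark bool, ok bool), gtkTheme string) bool {
	if portal != nil {
		if dark, ok := portal(); ok {
			return dark
		}
	}
	if gtkTheme != "" {
		// Assumption: a set GTK_THEME without ":dark" means a light theme.
		return strings.HasSuffix(gtkTheme, ":dark")
	}
	return true
}
```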
WebKitGTK crashes at startup when its bubblewrap sandbox can't create an
unprivileged user namespace (bwrap: setting up uid map: Permission denied
-> Failed to fully launch dbus-proxy -> panic in webkit_web_view_load_uri).
This happens in containers/VMs and on Ubuntu 24.04+ where AppArmor
restricts unprivileged user namespaces. Detect that the kernel blocks
userns via procfs and set WEBKIT_DISABLE_SANDBOX_THIS_IS_DANGEROUS so the
UI stays usable; honor an explicit user override either way.