Compare commits

6 Commits

Author SHA1 Message Date
Zoltán Papp
ef0601b8f4 [client/ios] Add Auth.Stop() to cancel an in-progress interactive login
The iOS PKCE login runs in the main-app process, decoupled from the network
extension (the extension's client context is torn down on login-required, which
would otherwise kill the WaitToken goroutine before the OAuth callback arrives).
Because it is decoupled, nothing aborted the flow when the user dismissed the
browser without logging in: WaitToken kept its loopback HTTP server bound to the
redirect port until the flow expired, so the next connect stalled trying to bind
the same port.

Make the Auth context cancellable and add Auth.Stop(), which cancels it. Cancelling
unblocks WaitToken, whose deferred server.Shutdown frees the port immediately. This
mirrors how Android's stopEngine() aborts login via the engine context.

NewAuthWithConfig now also derives a cancellable context; its only iOS caller uses
LoginSync (no interactive server), so behaviour is unchanged there.
2026-06-19 17:39:00 +02:00
Zoltan Papp
6df0175607 [client] Add IsLoginRequiredCached for iOS mobile client (#6447)
Expose a network-free login-required check backed by the in-memory status
recorder. Unlike IsLoginRequired(), which creates a fresh auth client and
performs a blocking network call, IsLoginRequiredCached() reports whether the
LAST observed management error was an auth failure (PermissionDenied/
InvalidArgument).

This lets the iOS connection listener detect a mid-session token expiry from
within onDisconnected during teardown without blocking on a slow or
unavailable network.
2026-06-16 16:15:19 +02:00
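
As a rough illustration of the network-free check described above, the sketch below keeps the last observed management error code in memory and answers from that cache. Everything here is an assumption made for the example: `code` is a local stand-in for gRPC status codes, and `statusRecorder`, `markManagementError`, and `isLoginRequiredCached` are hypothetical names, not the actual recorder API.

```go
package main

import (
	"fmt"
	"sync"
)

// code is a hypothetical stand-in for gRPC status codes.
type code int

const (
	codeOK code = iota
	codeUnavailable
	codePermissionDenied
	codeInvalidArgument
)

// statusRecorder (hypothetical) remembers only the last management error code.
type statusRecorder struct {
	mu           sync.RWMutex
	lastMgmtCode code
}

// markManagementError records the most recent management result.
func (s *statusRecorder) markManagementError(c code) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastMgmtCode = c
}

// isLoginRequiredCached reports whether the last observed management error
// was an auth failure. It never touches the network.
func (s *statusRecorder) isLoginRequiredCached() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.lastMgmtCode == codePermissionDenied || s.lastMgmtCode == codeInvalidArgument
}

func main() {
	rec := &statusRecorder{}
	rec.markManagementError(codeUnavailable)
	fmt.Println("after network error:", rec.isLoginRequiredCached())
	rec.markManagementError(codePermissionDenied)
	fmt.Println("after auth failure:", rec.isLoginRequiredCached())
}
```

Because the answer comes from memory, a caller such as a disconnect handler can use it during teardown without risking a blocking call.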
Zoltan Papp
3c23700e56 [client] Add iOS debug bundle support in Go (#6270)
* Add iOS debug bundle support in Go

Thread cacheDir through NewClient -> RunOniOS -> MobileDependency.TempDir
so the iOS client can pass its sandbox-writable cache directory for
debug bundle zip file creation instead of os.TempDir().

Move log collection into platform-dispatched addPlatformLog():
- iOS: adds the file-based Go client log (with rotation, stderr/stdout
  companions and anonymization handled by addLogfile) plus the Swift app
  log (swift-log.log) written by the iOS app into the same log directory
- Other non-Android platforms: existing file-based log + systemd fallback

Narrow the debug_nonandroid.go build tag to !android && !ios so iOS no
longer attempts the systemd journal fallback.

Add a DebugBundle() entry point to the iOS Go client that generates a
bundle, uploads it and returns the upload key. It works with or without
a running engine: when the engine is up it reuses the live config, sync
response and client metrics; otherwise it loads the config from disk (or
the preloaded tvOS config). Guard the live config/ConnectClient behind a
state mutex since DebugBundle may run on a different thread.

* Include the iOS state file in the debug bundle

addStateFile() resolved the state path via ServiceManager.GetStatePath(),
which on iOS points at a hard-coded default that does not exist in the app
sandbox, so the state file was silently skipped.

Add an optional StatePath to GeneratorDependencies and use it when set,
falling back to the ServiceManager default otherwise. The iOS DebugBundle
passes the client's actual state file path (the App Group profile state),
matching the Android bundle which includes the state file.

* ios: enable sync response persistence for debug bundle

Turn on sync response persistence before starting the engine so
DebugBundle can include the network map. On iOS the store is disk-backed
(see syncstore) to keep the map out of the constrained process memory.

* ios: pass log file path through NewClient constructor (#6393)

Add logFilePath field to Client struct and expose it as a parameter
in NewClient so callers provide the Go log path at construction time.
Wire it into DebugBundle via GeneratorDependencies.LogPath so the
debug bundle includes client.log and swift-log.log regardless of
whether the bundle is triggered by the app or the management server.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* ios: pass log file path to engine for remote debug bundles

RunOniOS started the engine with an empty LogPath, so EngineConfig.LogPath
was never set. Management-triggered (jobs) debug bundles read the log path
from the engine config, so they collected no client logs (client.log,
rotated logs, swift-log.log). The GUI path was unaffected because it passes
c.logFilePath directly to the bundle generator.

Thread c.logFilePath through RunOniOS into the engine config so remote
bundles include the client logs too.

---------

Co-authored-by: evgeniyChepelev <68751844+evgeniyChepelev@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-16 15:54:46 +02:00
Pascal Fischer
38ad2b67e8 [proxy] fix context for udprelay (#6444)
2026-06-16 14:41:17 +02:00
Pascal Fischer
01aa49433e [management] delete targets when deleting exposed service (#6442)
2026-06-16 14:33:24 +02:00
Zoltan Papp
08a2b63675 [client] propagate exit-node deselect to synthesized v6 (::/0) route (#6296)
* [client] propagate exit-node deselect to synthesized v6 (::/0) route

When a client deselects an IPv4 exit node, the auto-generated IPv6 default
route (::/0) was still selected and pushed onto the tunnel interface, even
though the user disabled the exit node. On an exit node without a real IPv6
egress this blackholes IPv6 traffic, and because clients prefer IPv6 (Happy
Eyeballs) it can break general connectivity.

Root cause: the synthesized v6 route gets a different NetID than its v4 base
(base + "-v6"). The route selector keys deselects by NetID and defaults
unknown NetIDs to selected, so the "-v6" entry was never matched by the v4
deselect. The effectiveNetID() mirror that solves exactly this is used by
HasUserSelectionForRoute and FilterSelectedExitNodes, but categorizeUserSelection
called the raw IsSelected(), bypassing it and mis-categorizing the v6 pair as
user-selected.

Add RouteSelector.IsSelectedForExitNode(), which applies effectiveNetID before
the selection check, and use it in categorizeUserSelection. IsSelected() is left
untouched so non-exit code paths don't make unrelated "*-v6" routes inherit v4
state. Adds regression tests for the v4/v6 deselect mirror and explicit-v6
override.

* [client] add DIAG logging to trace exit-node v6 (::/0) route filtering

Temporary diagnostics to find why a deselected v4 exit node's synthesized
::/0 route still reaches the tunnel. Logs the full install path: incoming
client networks, route-selector state before/after the management-driven
update, what updateExitNodeSelections deselects/selects, and per-route
KEEP/SKIP/DROP decisions in FilterSelectedExitNodes and applyExitNodeFilter.
To be reverted once the real root cause is confirmed from a client log.

* [client] clear orphaned v6 exit selection when v4 pair is toggled

Root cause of the leaking ::/0 route, confirmed from client logs: the
synthesized "-v6" exit route could stay explicitly selected in the persisted
route-selector state while its v4 base was deselected (selected=[...-v6],
deselected=[...v4base]). Because the v6 entry then has its own explicit state,
effectiveNetID stops mirroring the v4 base, so FilterSelectedExitNodes keeps
::/0 and it is installed on the tunnel even though the user disabled the exit
node. This happened because the iOS SDK's deselect only pairs the "-v6" sibling
via ExpandV6ExitPairs when the v6 route is present in the current routesMap; a
deselect at a moment it wasn't expanded left the v6 selection orphaned.

Fix at the selector write path so it is independent of routesMap timing: when a
v4 exit NetID is selected or deselected, clear any orphaned explicit state on
its "-v6" sibling (clearPairedV6Locked), unless the sibling is part of the same
batch (the deliberate ExpandV6ExitPairs case). The v6 then falls back to
inheriting the v4 base via effectiveNetID, so a v4 deselect also drops ::/0 and
a v4 select brings both back.

Adds regression tests: a stale explicit v6 selection is cleared by a later v4
deselect, and an explicit v6 select made in the same batch is preserved.

* [ios] compute route connection status in the bridge

The iOS bridge exposed a route's Network as a possibly comma-joined string
("0.0.0.0/0, ::/0" for a merged exit node) but no connection status, forcing
the UI to infer status by string-matching that joined value against peer
routes — which never matched for the merged exit node, leaving it stuck as
not-connected. Android already computes status in the core (findBestRoutePeer).

Mirror that here: add a Status field to RoutesSelectionInfo and compute it from
the connected peers' route tables, matching the route's primary prefix, a merged
exit node's extra v6 prefix, or a dynamic route's domain pattern (the key the
route manager records). The UI can now read the status directly.

* [client] remove exit-node v6 DIAG logging and tidy routeselector

Drop the temporary DIAG diagnostics added to trace the leaking ::/0 route
(the root cause is fixed and confirmed). Also reorganize routeselector.go so
the exit-node helpers (clearPairedV6Locked, isExitNode) sit next to the
exit-node code paths and MarshalJSON/UnmarshalJSON are grouped together.

* [client] mirror v4 exit selection onto v6 pair at write time

The synthesized "-v6" exit route shares its v4 base's NetID plus a "-v6"
suffix. Selection state was reconciled at read time via effectiveNetID, a
mirror that could only be applied on exit-node code paths, which forced a
parallel IsSelectedForExitNode() alongside IsSelected() and a clearPairedV6Locked()
orphan cleanup on every toggle. That machinery still missed the case observed
in the field: a persisted state with the v4 base deselected but its "-v6"
sibling explicitly selected (orphaned). Because effectiveNetID returns the v6
entry itself once it carries explicit state, and clearPairedV6Locked only fires
on a live toggle, the loaded orphan survived and the ::/0 route leaked onto the
tunnel despite the exit node being disabled, breaking IPv6 (Happy Eyeballs).

Treat the v4/v6 exit pair as a single toggle and keep state consistent at write
time instead. RouteSelector.SyncPairedSelection forces the "-v6" entry to match
its v4 base unconditionally, resetting any orphaned explicit state. The route
manager, which knows the route prefixes, computes the pairs (V6ExitMergeSet) and
calls it from updateRouteSelectorFromManagement before selection is read, so both
collectExitNodeInfo and FilterSelectedExitNodes see consistent state, including
pairs loaded from persisted selector state.

This removes effectiveNetID, IsSelectedForExitNode and clearPairedV6Locked; the
selector is literal again and no longer needs the "exit-node paths only" caveat.
HasUserSelectionForRoute and applyExitNodeFilter use the raw NetID.

Adds a selector test for SyncPairedSelection (including the orphaned-v6 case) and
a route-manager test reproducing the persisted-orphan scenario from the field log.

* [client] add DIAG logging to trace v6 exit-pair mirror

The write-time mirror did not eliminate the leak in field testing. Re-add the
DIAG diagnostics around the exit-node selection flow to capture a fresh trace:

- UpdateRoutes: incoming client networks, selector state before/after the
  management update, and the networks remaining after FilterSelectedExitNodes.
- mirrorV6ExitPairSelections: the NetIDs present in this update and the v6 pairs
  V6ExitMergeSet derives from them (reveals whether the v4 base and its ::/0 pair
  are present in the same update so the pair can be matched).
- SyncPairedSelection: the base/paired state before and after the sync.
- FilterSelectedExitNodes / applyExitNodeFilter: per-route SKIP/KEEP/DROP and the
  selection lookups behind each decision.
- updateExitNodeSelections / logExitNodeUpdate: categorization and deselect set.

Temporary; to be removed once the root cause is confirmed.

* [client] remove v6 exit-pair mirror DIAG logging

Drop the temporary DIAG diagnostics added to trace the v4/v6 exit-pair mirror.
The field log confirmed the write-time mirror keeps the pair consistent (the
::/0 route is only ever applied alongside its v4 base and is dropped on deselect),
so the diagnostics are no longer needed.
2026-06-16 12:27:58 +02:00
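
The orphaned-v6 scenario and the write-time mirror from the commit above can be walked through on a simplified selector. This is a sketch under stated assumptions: `selector`, `netID`, and `orphanedState` are hypothetical, locking and the deselect-all flag are omitted, and `isSelected` is reduced to "selected unless explicitly deselected". The body of `syncPairedSelection` follows the logic shown for `SyncPairedSelection` in this diff.

```go
package main

import "fmt"

type netID string

// selector is a hypothetical, lock-free reduction of the route selector.
type selector struct {
	selected   map[netID]struct{}
	deselected map[netID]struct{}
}

// isSelected treats unknown IDs as selected (simplifying assumption).
func (s *selector) isSelected(id netID) bool {
	_, off := s.deselected[id]
	return !off
}

// syncPairedSelection forces pairedID's explicit state to match baseID's,
// first dropping whatever explicit state pairedID carried.
func (s *selector) syncPairedSelection(baseID, pairedID netID) {
	_, baseSelected := s.selected[baseID]
	_, baseDeselected := s.deselected[baseID]

	delete(s.selected, pairedID)
	delete(s.deselected, pairedID)

	switch {
	case baseSelected:
		s.selected[pairedID] = struct{}{}
	case baseDeselected:
		s.deselected[pairedID] = struct{}{}
	}
}

// orphanedState builds the persisted state described in the commit message:
// v4 base deselected, "-v6" sibling still explicitly selected.
func orphanedState(v4, v6 netID) *selector {
	return &selector{
		selected:   map[netID]struct{}{v6: {}},
		deselected: map[netID]struct{}{v4: {}},
	}
}

func main() {
	const v4, v6 = netID("exit"), netID("exit-v6")
	s := orphanedState(v4, v6)
	fmt.Println("v6 selected before sync:", s.isSelected(v6))
	s.syncPairedSelection(v4, v6)
	fmt.Println("v6 selected after sync:", s.isSelected(v6))
}
```

Because the sync overwrites the paired entry unconditionally, it also repairs state loaded from disk, which a cleanup that only runs on a live toggle cannot do.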
22 changed files with 640 additions and 272 deletions

View File

@@ -279,11 +279,9 @@ func (c *Client) Start(startCtx context.Context) error {
select {
case <-startCtx.Done():
// ConnectClient.Stop now cancels its own run context and waits for the
// run loop to tear the engine down, so this cancel() is no longer
// required to break the deadlock and could be removed. It is kept as a
// defensive belt-and-suspenders: cancelling the parent context first
// guarantees the run loop is unblocked even if Stop's contract regresses.
// Cancel the client context before stopping: Engine.Start blocks on the
// signal stream while holding the engine mutex and only unblocks on
// cancellation. Stopping first would deadlock on that mutex.
cancel()
if stopErr := client.Stop(); stopErr != nil {
return fmt.Errorf("stop error after context done. Stop error: %w. Context done: %w", stopErr, startCtx.Err())

View File

@@ -11,7 +11,6 @@ import (
"runtime/debug"
"strings"
"sync"
"sync/atomic"
"time"
"github.com/cenkalti/backoff/v4"
@@ -55,10 +54,6 @@ var androidRunOverride func(c *ConnectClient, runningChan chan struct{}, logPath
type ConnectClient struct {
ctx context.Context
runCancel context.CancelFunc
runExited chan struct{}
runOnce sync.Once
runStarted atomic.Bool
config *profilemanager.Config
statusRecorder *peer.Status
@@ -75,14 +70,8 @@ func NewConnectClient(
config *profilemanager.Config,
statusRecorder *peer.Status,
) *ConnectClient {
// Derive the run context here so Stop owns the cancel that unblocks the run
// loop. runCancel is set once at construction, so Stop can call it without
// racing the run loop's startup. Callers therefore need not cancel before Stop.
runCtx, runCancel := context.WithCancel(ctx)
return &ConnectClient{
ctx: runCtx,
runCancel: runCancel,
runExited: make(chan struct{}),
ctx: ctx,
config: config,
statusRecorder: statusRecorder,
engineMutex: sync.Mutex{},
@@ -129,6 +118,8 @@ func (c *ConnectClient) RunOniOS(
networkChangeListener listener.NetworkChangeListener,
dnsManager dns.IosDnsManager,
stateFilePath string,
cacheDir string,
logFilePath string,
) error {
// Set GC percent to 5% to reduce memory usage as iOS only allows 50MB of memory for the extension.
debug.SetGCPercent(5)
@@ -138,16 +129,12 @@ func (c *ConnectClient) RunOniOS(
NetworkChangeListener: networkChangeListener,
DnsManager: dnsManager,
StateFilePath: stateFilePath,
TempDir: cacheDir,
}
return c.run(mobileDependency, nil, "")
return c.run(mobileDependency, nil, logFilePath)
}
func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan struct{}, logPath string) error {
// Mark the loop as started and signal exit on return so Stop can wait for
// the loop to finish (and skip the wait if the loop never ran).
c.runStarted.Store(true)
defer c.runOnce.Do(func() { close(c.runExited) })
defer func() {
if r := recover(); r != nil {
rec := c.statusRecorder
@@ -303,7 +290,7 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
log.Debug(err)
if s, ok := gstatus.FromError(err); ok && (s.Code() == codes.PermissionDenied) {
state.Set(StatusNeedsLogin)
c.runCancel()
_ = c.Stop()
return backoff.Permanent(wrapErr(err)) // unrecoverable error
}
return wrapErr(err)
@@ -423,10 +410,14 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
c.engine = nil
c.engineMutex.Unlock()
log.Infof("ensuring wg interface is removed, Netbird engine context cancelled")
// todo: consider to remove this condition. Is not thread safe.
// We should always call Stop(), but we need to verify that it is idempotent
if engine.wgInterface != nil {
log.Infof("ensuring %s is removed, Netbird engine context cancelled", engine.wgInterface.Name())
if err := engine.Stop(); err != nil {
log.Errorf("Failed to stop engine: %v", err)
if err := engine.Stop(); err != nil {
log.Errorf("Failed to stop engine: %v", err)
}
}
c.statusRecorder.ClientTeardown()
@@ -442,12 +433,12 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
}
c.statusRecorder.ClientStart()
err = backoff.Retry(operation, backoff.WithContext(backOff, c.ctx))
err = backoff.Retry(operation, backOff)
if err != nil {
log.Debugf("exiting client retry loop due to unrecoverable error: %s", err)
if s, ok := gstatus.FromError(err); ok && (s.Code() == codes.PermissionDenied) {
state.Set(StatusNeedsLogin)
c.runCancel()
_ = c.Stop()
}
return err
}
@@ -525,9 +516,11 @@ func (c *ConnectClient) Status() StatusType {
}
func (c *ConnectClient) Stop() error {
c.runCancel()
if c.runStarted.Load() {
<-c.runExited
engine := c.Engine()
if engine != nil {
if err := engine.Stop(); err != nil {
return fmt.Errorf("stop engine: %w", err)
}
}
return nil
}

View File

@@ -250,6 +250,7 @@ type BundleGenerator struct {
syncResponse *mgmProto.SyncResponse
logPath string
tempDir string
statePath string
cpuProfile []byte
capturePath string
refreshStatus func() // Optional callback to refresh status before bundle generation
@@ -276,6 +277,7 @@ type GeneratorDependencies struct {
SyncResponse *mgmProto.SyncResponse
LogPath string
TempDir string // Directory for temporary bundle zip files. If empty, os.TempDir() is used.
StatePath string // Path to the state file. If empty, the ServiceManager default path is used.
CPUProfile []byte
CapturePath string
RefreshStatus func()
@@ -299,6 +301,7 @@ func NewBundleGenerator(deps GeneratorDependencies, cfg BundleConfig) *BundleGen
syncResponse: deps.SyncResponse,
logPath: deps.LogPath,
tempDir: deps.TempDir,
statePath: deps.StatePath,
cpuProfile: deps.CPUProfile,
capturePath: deps.CapturePath,
refreshStatus: deps.RefreshStatus,
@@ -850,8 +853,11 @@ func (g *BundleGenerator) maskSecrets() {
}
func (g *BundleGenerator) addStateFile() error {
sm := profilemanager.NewServiceManager("")
path := sm.GetStatePath()
path := g.statePath
if path == "" {
sm := profilemanager.NewServiceManager("")
path = sm.GetStatePath()
}
if path == "" {
return nil
}

View File

@@ -0,0 +1,36 @@
//go:build ios
package debug
import (
"path/filepath"
log "github.com/sirupsen/logrus"
)
// swiftLogFile is the Swift app log written by the iOS app into the same log
// directory as the Go client log, so it can be collected into the bundle.
const swiftLogFile = "swift-log.log"
// addPlatformLog collects logs for the iOS debug bundle. iOS has no logcat or
// systemd journal, so we rely on file-based logs. addLogfile handles the Go
// client log (logPath) with rotation, the stderr/stdout companions and
// anonymization. The iOS app writes its own Swift log into the same directory,
// so we add it alongside the Go log.
func (g *BundleGenerator) addPlatformLog() error {
if err := g.addLogfile(); err != nil {
return err
}
if g.logPath == "" {
return nil
}
swiftLogPath := filepath.Join(filepath.Dir(g.logPath), swiftLogFile)
if err := g.addSingleLogfile(swiftLogPath, swiftLogFile); err != nil {
// The Swift log is best-effort: the app may not have written it yet.
log.Warnf("failed to add %s to debug bundle: %v", swiftLogFile, err)
}
return nil
}

View File

@@ -1,4 +1,4 @@
//go:build !android
//go:build !android && !ios
package debug

View File

@@ -86,8 +86,6 @@ const (
var ErrResetConnection = fmt.Errorf("reset connection")
var ErrEngineAlreadyStarted = errors.New("engine already started")
type EngineConfig struct {
WgPort int
WgIfaceName string
@@ -201,8 +199,6 @@ type Engine struct {
ctx context.Context
cancel context.CancelFunc
started bool
wgInterface WGIface
udpMux *udpmux.UniversalUDPMuxDefault
@@ -283,15 +279,9 @@ func NewEngine(
services EngineServices,
mobileDep MobileDependency,
) *Engine {
// The engine is single-use: a fresh instance is built per connection
// cycle (see Client.run), so the run context is created once here rather
// than in Start.
ctx, cancel := context.WithCancel(clientCtx)
engine := &Engine{
clientCtx: clientCtx,
clientCancel: clientCancel,
ctx: ctx,
cancel: cancel,
signal: services.SignalClient,
signaler: peer.NewSignaler(services.SignalClient, config.WgPrivateKey),
mgmClient: services.MgmClient,
@@ -324,34 +314,8 @@ func (e *Engine) Stop() error {
log.Debugf("tried stopping engine that is nil")
return nil
}
e.cancel()
e.syncMsgMux.Lock()
e.stopLocked()
e.syncMsgMux.Unlock()
timeout := e.calculateShutdownTimeout()
log.Debugf("waiting for goroutines to finish with timeout: %v", timeout)
shutdownCtx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
if err := waitWithContext(shutdownCtx, &e.shutdownWg); err != nil {
log.Warnf("shutdown timeout exceeded after %v, some goroutines may still be running", timeout)
}
log.Infof("stopped Netbird Engine")
return nil
}
// stopLocked tears down everything Start may have brought up, in the order
// teardown requires (DNS before the interface goes down, flow manager after).
// The caller must hold syncMsgMux. It is shared by Stop and by Start's failure
// path, so a partially-initialized engine is cleaned up the same way; every
// step is nil-guarded. It does not wait on shutdownWg — the caller does that
// after releasing the lock, since the goroutines also take syncMsgMux.
func (e *Engine) stopLocked() {
if e.connMgr != nil {
e.connMgr.Close()
}
@@ -402,6 +366,10 @@ func (e *Engine) stopLocked() {
// so dbus and friends don't complain because of a missing interface
e.stopDNSServer()
if e.cancel != nil {
e.cancel()
}
e.jobExecutorWG.Wait() // block until job goroutines finish
e.close()
@@ -420,6 +388,21 @@ func (e *Engine) stopLocked() {
if err := e.stateManager.PersistState(context.Background()); err != nil {
log.Errorf("failed to persist state: %v", err)
}
e.syncMsgMux.Unlock()
timeout := e.calculateShutdownTimeout()
log.Debugf("waiting for goroutines to finish with timeout: %v", timeout)
shutdownCtx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
if err := waitWithContext(shutdownCtx, &e.shutdownWg); err != nil {
log.Warnf("shutdown timeout exceeded after %v, some goroutines may still be running", timeout)
}
log.Infof("stopped Netbird Engine")
return nil
}
// calculateShutdownTimeout returns shutdown timeout: 10s base + 100ms per peer, capped at 30s.
@@ -457,38 +440,18 @@ func waitWithContext(ctx context.Context, wg *sync.WaitGroup) error {
// Start creates a new WireGuard tunnel interface and listens to events from Signal and Management services
// Connections to remote peers are not established here.
// However, they will be established once an event with a list of peers to connect to will be received from Management Service
func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL) (err error) {
func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL) error {
e.syncMsgMux.Lock()
defer e.syncMsgMux.Unlock()
// The engine is single-use. Reject a duplicate start and a start on an
// already-stopped engine (run context cancelled).
if e.started {
return ErrEngineAlreadyStarted
}
if ctxErr := e.ctx.Err(); ctxErr != nil {
return fmt.Errorf("engine already stopped: %w", ctxErr)
}
e.started = true
// Tear down any partially-initialized state on a failed start. Cancel the
// run context first so goroutines started before the failure (connMgr,
// srWatcher, monitors) unwind, then stopLocked mirrors Stop's teardown (we
// already hold syncMsgMux), cleaning up route/DNS/flow/state managers too,
// not just what close() covers.
defer func() {
if err != nil {
e.cancel()
e.stopLocked()
}
}()
if err = iface.ValidateMTU(e.config.MTU); err != nil {
if err := iface.ValidateMTU(e.config.MTU); err != nil {
return fmt.Errorf("invalid MTU configuration: %w", err)
}
if e.cancel != nil {
e.cancel()
}
e.ctx, e.cancel = context.WithCancel(e.clientCtx)
e.exposeManager = expose.NewManager(e.ctx, e.mgmClient)
wgIface, err := e.newWgIface()
@@ -522,11 +485,13 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
initialRoutes, dnsConfig, dnsFeatureFlag, err := e.readInitialSettings()
if err != nil {
e.close()
return fmt.Errorf("read initial settings: %w", err)
}
dnsServer, err := e.newDnsServer(dnsConfig)
if err != nil {
e.close()
return fmt.Errorf("create dns server: %w", err)
}
e.dnsServer = dnsServer
@@ -561,6 +526,7 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
if err = e.wgInterfaceCreate(); err != nil {
log.Errorf("failed creating tunnel interface %s: [%s]", e.config.WgIfaceName, err.Error())
e.close()
return fmt.Errorf("create wg interface: %w", err)
}
@@ -569,6 +535,7 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
}
if err := e.createFirewall(); err != nil {
e.close()
return err
}
@@ -580,6 +547,7 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
e.udpMux, err = e.wgInterface.Up()
if err != nil {
log.Errorf("failed to pull up wgInterface [%s]: %s", e.wgInterface.Name(), err.Error())
e.close()
return fmt.Errorf("up wg interface: %w", err)
}
@@ -604,7 +572,9 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
e.acl = acl.NewDefaultManager(e.firewall)
}
if err := e.dnsServer.Initialize(); err != nil {
err = e.dnsServer.Initialize()
if err != nil {
e.close()
return fmt.Errorf("initialize dns server: %w", err)
}
@@ -616,9 +586,7 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
e.srWatcher = guard.NewSRWatcher(e.signal, e.relayManager, e.mobileDep.IFaceDiscover, iceCfg)
e.srWatcher.Start(peer.IsForceRelayed())
if err = e.receiveSignalEvents(); err != nil {
return err
}
e.receiveSignalEvents()
e.receiveManagementEvents()
e.receiveJobEvents()
@@ -670,6 +638,7 @@ func (e *Engine) createFirewall() error {
func (e *Engine) initFirewall() error {
if err := e.routeManager.SetFirewall(e.firewall); err != nil {
e.close()
return fmt.Errorf("set firewall: %w", err)
}
@@ -1729,7 +1698,7 @@ func (e *Engine) createPeerConn(pubKey string, allowedIPs []netip.Prefix, agentV
}
// receiveSignalEvents connects to the Signal Service event stream to negotiate connection with remote peers
func (e *Engine) receiveSignalEvents() error {
func (e *Engine) receiveSignalEvents() {
e.shutdownWg.Add(1)
go func() {
defer e.shutdownWg.Done()
@@ -1793,12 +1762,7 @@ func (e *Engine) receiveSignalEvents() error {
}
}()
// todo: consider to remove this blocker. I do not see benefit to block the Start operations
e.signal.WaitStreamConnected(e.ctx)
if err := e.ctx.Err(); err != nil {
return fmt.Errorf("wait for signal stream: %w", err)
}
return nil
e.signal.WaitStreamConnected()
}
func (e *Engine) parseNATExternalIPMappings() []string {

View File

@@ -247,7 +247,7 @@ func TestEngine_SSH(t *testing.T) {
return
}
ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
@@ -426,7 +426,7 @@ func TestEngine_UpdateNetworkMap(t *testing.T) {
return
}
ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
@@ -638,7 +638,7 @@ func TestEngine_Sync(t *testing.T) {
return
}
ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// feed updates to Engine via mocked Management client
@@ -817,7 +817,7 @@ func TestEngine_UpdateNetworkMapWithRoutes(t *testing.T) {
return
}
ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
wgIfaceName := fmt.Sprintf("utun%d", 104+n)
@@ -1024,7 +1024,7 @@ func TestEngine_UpdateNetworkMapWithDNSUpdate(t *testing.T) {
return
}
ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
wgIfaceName := fmt.Sprintf("utun%d", 104+n)

View File

@@ -9,6 +9,7 @@ import (
"net/url"
"runtime"
"slices"
"strings"
"sync"
"sync/atomic"
"time"
@@ -700,6 +701,8 @@ func resolveURLsToIPs(urls []string) []net.IP {
// updateRouteSelectorFromManagement updates the route selector based on the isSelected status from the management server
func (m *DefaultManager) updateRouteSelectorFromManagement(clientRoutes route.HAMap) {
m.mirrorV6ExitPairSelections(clientRoutes)
// An explicit user "deselect all" must not be overridden by management auto-apply.
// Auto-applying an exit node here would call SelectRoutes, which clears the
// deselect-all flag and re-enables every route the user turned off.
@@ -716,6 +719,24 @@ func (m *DefaultManager) updateRouteSelectorFromManagement(clientRoutes route.HA
m.logExitNodeUpdate(exitNodeInfo)
}
// mirrorV6ExitPairSelections keeps every synthesized "-v6" exit route's selection
// consistent with its v4 base. The v4/v6 exit pair is a single toggle, so the v6
// entry always follows the base: deselecting the v4 exit node also drops its ::/0
// pair, and any stale (orphaned) explicit selection on the v6 entry is reset. This
// runs before selection is read so both collectExitNodeInfo and FilterSelectedExitNodes
// see consistent state, including pairs loaded from persisted selector state.
func (m *DefaultManager) mirrorV6ExitPairSelections(clientRoutes route.HAMap) {
routesByNetID := make(map[route.NetID][]*route.Route, len(clientRoutes))
for haID, routes := range clientRoutes {
routesByNetID[haID.NetID()] = routes
}
for v6ID := range route.V6ExitMergeSet(routesByNetID) {
baseID := route.NetID(strings.TrimSuffix(string(v6ID), route.V6ExitSuffix))
m.routeSelector.SyncPairedSelection(baseID, v6ID)
}
}
type exitNodeInfo struct {
allIDs []route.NetID
selectedByManagement []route.NetID

View File

@@ -0,0 +1,47 @@
package routemanager
import (
"net/netip"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/netbirdio/netbird/client/internal/routeselector"
"github.com/netbirdio/netbird/route"
)
// TestUpdateRouteSelectorFromManagement_MirrorsV6ExitPair reproduces the bug seen
// in netbird-engine.log: persisted selector state has the v4 exit node deselected
// but its synthesized "-v6" pair explicitly selected (orphaned), so the ::/0 route
// leaked onto the tunnel. The management update must mirror the v4 deselect onto the
// v6 pair so FilterSelectedExitNodes drops it.
func TestUpdateRouteSelectorFromManagement_MirrorsV6ExitPair(t *testing.T) {
const (
v4ID = route.NetID("Exit Node (raspberrypi)")
v6ID = route.NetID("Exit Node (raspberrypi)-v6")
)
all := []route.NetID{v4ID, v6ID}
rs := routeselector.NewRouteSelector()
// Orphan the v6 selection: select the pair, then deselect only the v4 base.
require.NoError(t, rs.SelectRoutes([]route.NetID{v4ID, v6ID}, true, all))
require.NoError(t, rs.DeselectRoutes([]route.NetID{v4ID}, all))
require.True(t, rs.IsSelected(v6ID), "precondition: orphaned v6 selection survives v4 deselect")
m := &DefaultManager{routeSelector: rs}
v4Route := &route.Route{NetID: v4ID, Network: netip.MustParsePrefix("0.0.0.0/0")}
v6Route := &route.Route{NetID: v6ID, Network: netip.MustParsePrefix("::/0")}
clientRoutes := route.HAMap{
"Exit Node (raspberrypi)|0.0.0.0/0": {v4Route},
"Exit Node (raspberrypi)-v6|::/0": {v6Route},
}
m.updateRouteSelectorFromManagement(clientRoutes)
assert.False(t, rs.IsSelected(v6ID), "v6 pair must follow the v4 base deselect after the management update")
filtered := rs.FilterSelectedExitNodes(clientRoutes)
assert.Empty(t, filtered, "deselected v4 exit node must not leak its ::/0 pair onto the tunnel")
}

View File

@@ -4,7 +4,6 @@ import (
"encoding/json"
"fmt"
"slices"
"strings"
"sync"
"github.com/hashicorp/go-multierror"
@@ -132,6 +131,33 @@ func (rs *RouteSelector) IsSelected(routeID route.NetID) bool {
return rs.isSelectedLocked(routeID)
}
// SyncPairedSelection forces pairedID's explicit selection state to match baseID's,
// so a synthesized "-v6" exit route always follows its v4 base: selecting or
// deselecting the v4 exit node governs the ::/0 pair, and any stale (orphaned)
// explicit state on the v6 entry is reset. The v4/v6 exit pair is treated as a single
// toggle, so the v6 entry carries no independent selection of its own.
func (rs *RouteSelector) SyncPairedSelection(baseID, pairedID route.NetID) {
rs.mu.Lock()
defer rs.mu.Unlock()
if rs.deselectAll {
return
}
_, baseSelected := rs.selectedRoutes[baseID]
_, baseDeselected := rs.deselectedRoutes[baseID]
delete(rs.selectedRoutes, pairedID)
delete(rs.deselectedRoutes, pairedID)
switch {
case baseSelected:
rs.selectedRoutes[pairedID] = struct{}{}
case baseDeselected:
rs.deselectedRoutes[pairedID] = struct{}{}
}
}
// FilterSelected removes unselected routes from the provided map.
func (rs *RouteSelector) FilterSelected(routes route.HAMap) route.HAMap {
rs.mu.RLock()
@@ -151,14 +177,13 @@ func (rs *RouteSelector) FilterSelected(routes route.HAMap) route.HAMap {
}
// HasUserSelectionForRoute returns true if the user has explicitly selected or deselected this route.
// Intended for exit-node code paths: a v6 exit-node pair (e.g. "MyExit-v6") with no explicit state of
// its own inherits its v4 base's state, so legacy persisted selections that predate v6 pairing
// transparently apply to the synthesized v6 entry.
// The lookup is literal; v4/v6 exit pairs are kept consistent at write time via SyncPairedSelection,
// so a synthesized "-v6" entry carries the same explicit state as its v4 base.
func (rs *RouteSelector) HasUserSelectionForRoute(routeID route.NetID) bool {
rs.mu.RLock()
defer rs.mu.RUnlock()
return rs.hasUserSelectionForRouteLocked(rs.effectiveNetID(routeID))
return rs.hasUserSelectionForRouteLocked(routeID)
}
func (rs *RouteSelector) FilterSelectedExitNodes(routes route.HAMap) route.HAMap {
@@ -187,83 +212,6 @@ func (rs *RouteSelector) FilterSelectedExitNodes(routes route.HAMap) route.HAMap
return filtered
}
// effectiveNetID returns the v4 base for a "-v6" exit pair entry that has no explicit
// state of its own, so selections made on the v4 entry govern the v6 entry automatically.
// Only call this from exit-node-specific code paths: applying it to a non-exit "-v6" route
// would make it inherit unrelated v4 state. Must be called with rs.mu held.
func (rs *RouteSelector) effectiveNetID(id route.NetID) route.NetID {
name := string(id)
if !strings.HasSuffix(name, route.V6ExitSuffix) {
return id
}
if _, ok := rs.selectedRoutes[id]; ok {
return id
}
if _, ok := rs.deselectedRoutes[id]; ok {
return id
}
return route.NetID(strings.TrimSuffix(name, route.V6ExitSuffix))
}
func (rs *RouteSelector) isSelectedLocked(routeID route.NetID) bool {
if rs.deselectAll {
return false
}
_, deselected := rs.deselectedRoutes[routeID]
return !deselected
}
func (rs *RouteSelector) isDeselectedLocked(netID route.NetID) bool {
if rs.deselectAll {
return true
}
_, deselected := rs.deselectedRoutes[netID]
return deselected
}
func (rs *RouteSelector) hasUserSelectionForRouteLocked(routeID route.NetID) bool {
_, selected := rs.selectedRoutes[routeID]
_, deselected := rs.deselectedRoutes[routeID]
return selected || deselected
}
func isExitNode(rt []*route.Route) bool {
return len(rt) > 0 && (route.IsV4DefaultRoute(rt[0].Network) || route.IsV6DefaultRoute(rt[0].Network))
}
func (rs *RouteSelector) applyExitNodeFilter(
id route.HAUniqueID,
netID route.NetID,
rt []*route.Route,
out route.HAMap,
) {
// Exit-node path: apply the v4/v6 pair mirror so a deselect on the v4 base also
// drops the synthesized v6 entry that lacks its own explicit state.
effective := rs.effectiveNetID(netID)
if rs.hasUserSelectionForRouteLocked(effective) {
if rs.isSelectedLocked(effective) {
out[id] = rt
}
return
}
// no explicit selection for this route: defer to management's SkipAutoApply flag
sel := collectSelected(rt)
if len(sel) > 0 {
out[id] = sel
}
}
func collectSelected(rt []*route.Route) []*route.Route {
var sel []*route.Route
for _, r := range rt {
if !r.SkipAutoApply {
sel = append(sel, r)
}
}
return sel
}
// MarshalJSON implements the json.Marshaler interface
func (rs *RouteSelector) MarshalJSON() ([]byte, error) {
rs.mu.RLock()
@@ -317,3 +265,59 @@ func (rs *RouteSelector) UnmarshalJSON(data []byte) error {
return nil
}
func (rs *RouteSelector) isSelectedLocked(routeID route.NetID) bool {
if rs.deselectAll {
return false
}
_, deselected := rs.deselectedRoutes[routeID]
return !deselected
}
func (rs *RouteSelector) isDeselectedLocked(netID route.NetID) bool {
if rs.deselectAll {
return true
}
_, deselected := rs.deselectedRoutes[netID]
return deselected
}
func (rs *RouteSelector) hasUserSelectionForRouteLocked(routeID route.NetID) bool {
_, selected := rs.selectedRoutes[routeID]
_, deselected := rs.deselectedRoutes[routeID]
return selected || deselected
}
func (rs *RouteSelector) applyExitNodeFilter(
id route.HAUniqueID,
netID route.NetID,
rt []*route.Route,
out route.HAMap,
) {
if rs.hasUserSelectionForRouteLocked(netID) {
if rs.isSelectedLocked(netID) {
out[id] = rt
}
return
}
// no explicit selection for this route: defer to management's SkipAutoApply flag
sel := collectSelected(rt)
if len(sel) > 0 {
out[id] = sel
}
}
func isExitNode(rt []*route.Route) bool {
return len(rt) > 0 && (route.IsV4DefaultRoute(rt[0].Network) || route.IsV6DefaultRoute(rt[0].Network))
}
func collectSelected(rt []*route.Route) []*route.Route {
var sel []*route.Route
for _, r := range rt {
if !r.SkipAutoApply {
sel = append(sel, r)
}
}
return sel
}
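
The paired-selection sync that SyncPairedSelection performs earlier in this file can be sketched in isolation. This is a minimal sketch, not the RouteSelector itself: the `pairSelector` type, its string IDs, and its method names are hypothetical, and the mutex from the real type is left out to keep the example short.

```go
package main

// pairSelector is a hypothetical, simplified stand-in for RouteSelector.
// It keeps explicit user choices in two sets and has no locking.
type pairSelector struct {
	deselectAll bool
	selected    map[string]struct{}
	deselected  map[string]struct{}
}

func newPairSelector() *pairSelector {
	return &pairSelector{
		selected:   map[string]struct{}{},
		deselected: map[string]struct{}{},
	}
}

// syncPaired forces paired to carry the same explicit state as base.
// Any stale explicit state on paired is dropped first, so an orphaned
// selection cannot survive a deselect of the base.
func (s *pairSelector) syncPaired(base, paired string) {
	if s.deselectAll {
		return // nothing is written while everything is deselected
	}
	_, baseSelected := s.selected[base]
	_, baseDeselected := s.deselected[base]

	delete(s.selected, paired)
	delete(s.deselected, paired)

	switch {
	case baseSelected:
		s.selected[paired] = struct{}{}
	case baseDeselected:
		s.deselected[paired] = struct{}{}
	}
}

// isSelected is a literal lookup: an entry is selected unless it is
// explicitly deselected or deselectAll is set.
func (s *pairSelector) isSelected(id string) bool {
	if s.deselectAll {
		return false
	}
	_, deselected := s.deselected[id]
	return !deselected
}

// hasExplicitState reports whether id was explicitly selected or deselected.
func (s *pairSelector) hasExplicitState(id string) bool {
	_, sel := s.selected[id]
	_, desel := s.deselected[id]
	return sel || desel
}
```

With this shape, lookups never need to know about the "-v6" suffix. Only the caller that knows two IDs form a pair invokes the sync, which is the design the test comments in this diff describe.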


@@ -330,39 +330,73 @@ func TestRouteSelector_FilterSelectedExitNodes(t *testing.T) {
assert.Len(t, filtered, 0) // No routes should be selected
}
// TestRouteSelector_V6ExitPairInherits covers the v4/v6 exit-node pair selection
// mirror. The mirror is scoped to exit-node code paths: HasUserSelectionForRoute
// and FilterSelectedExitNodes resolve a "-v6" entry without explicit state to its
// v4 base, so legacy persisted selections that predate v6 pairing transparently
// apply to the synthesized v6 entry. General lookups (IsSelected, FilterSelected)
// stay literal so unrelated routes named "*-v6" don't inherit unrelated state.
func TestRouteSelector_V6ExitPairInherits(t *testing.T) {
// TestRouteSelector_V6ExitPairSync covers SyncPairedSelection, which keeps a v4
// exit node and its synthesized "-v6" counterpart consistent. The selector itself
// is literal and never infers a v6 entry's state from its v4 base; callers that know
// the pairing (exit-node code paths) call SyncPairedSelection to force the v6 entry
// to follow the base, treating the pair as a single toggle.
func TestRouteSelector_V6ExitPairSync(t *testing.T) {
all := []route.NetID{"exit1", "exit1-v6", "exit2", "exit2-v6", "corp", "corp-v6"}
t.Run("HasUserSelectionForRoute mirrors deselected v4 base", func(t *testing.T) {
t.Run("selector lookups stay literal without sync", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
assert.True(t, rs.HasUserSelectionForRoute("exit1-v6"), "v6 pair sees v4 base's user selection")
// The selector does not pair-resolve: the v6 entry is independent until synced.
assert.False(t, rs.HasUserSelectionForRoute("exit1-v6"), "v6 entry has no state of its own")
assert.True(t, rs.IsSelected("exit1-v6"), "unsynced v6 entry stays selected by default")
// unrelated v6 with no v4 base touched is unaffected
assert.False(t, rs.HasUserSelectionForRoute("exit2-v6"))
// A route literally named "exit1-something" must never pair-resolve either.
assert.False(t, rs.HasUserSelectionForRoute("exit1-something"))
})
t.Run("IsSelected stays literal for non-exit lookups", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
require.NoError(t, rs.DeselectRoutes([]route.NetID{"corp"}, all))
// A non-exit route literally named "corp-v6" must not inherit "corp"'s state
// via the mirror; the mirror only applies in exit-node code paths.
assert.False(t, rs.IsSelected("corp"))
assert.True(t, rs.IsSelected("corp-v6"), "non-exit *-v6 routes must not inherit unrelated v4 state")
})
t.Run("explicit v6 state overrides v4 base in filter", func(t *testing.T) {
t.Run("sync mirrors deselected v4 base onto v6", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
rs.SyncPairedSelection("exit1", "exit1-v6")
assert.False(t, rs.IsSelected("exit1"))
assert.False(t, rs.IsSelected("exit1-v6"), "v6 pair follows v4 base deselect")
assert.True(t, rs.HasUserSelectionForRoute("exit1-v6"), "v6 carries explicit deselect after sync")
})
t.Run("sync mirrors selected v4 base onto v6", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
require.NoError(t, rs.SelectRoutes([]route.NetID{"exit1"}, false, all))
rs.SyncPairedSelection("exit1", "exit1-v6")
assert.True(t, rs.IsSelected("exit1"))
assert.True(t, rs.IsSelected("exit1-v6"), "v6 pair follows v4 base select")
})
t.Run("sync clears v6 state when base has no explicit selection", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
require.NoError(t, rs.SelectRoutes([]route.NetID{"exit1-v6"}, true, all))
require.True(t, rs.HasUserSelectionForRoute("exit1-v6"))
rs.SyncPairedSelection("exit1", "exit1-v6")
assert.False(t, rs.HasUserSelectionForRoute("exit1-v6"),
"v6 explicit state is cleared so it follows management like its base")
})
// Regression for the observed bug (see netbird-engine.log): persisted state has
// the v4 base deselected but the v6 sibling explicitly selected (orphaned). The
// sync must reset the orphan so the ::/0 route does not leak onto the tunnel.
t.Run("sync clears orphaned explicit v6 selection on deselected base", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
// Prior state: both explicitly selected, then only the v4 base deselected,
// leaving the v6 entry as a stale explicit selection.
require.NoError(t, rs.SelectRoutes([]route.NetID{"exit1", "exit1-v6"}, true, all))
require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
require.True(t, rs.IsSelected("exit1-v6"), "precondition: orphaned v6 selection")
rs.SyncPairedSelection("exit1", "exit1-v6")
assert.False(t, rs.IsSelected("exit1-v6"), "orphaned v6 selection reset to follow v4 deselect")
v4Route := &route.Route{NetID: "exit1", Network: netip.MustParsePrefix("0.0.0.0/0")}
v6Route := &route.Route{NetID: "exit1-v6", Network: netip.MustParsePrefix("::/0")}
@@ -370,23 +404,14 @@ func TestRouteSelector_V6ExitPairInherits(t *testing.T) {
"exit1|0.0.0.0/0": {v4Route},
"exit1-v6|::/0": {v6Route},
}
filtered := rs.FilterSelectedExitNodes(routes)
assert.NotContains(t, filtered, route.HAUniqueID("exit1|0.0.0.0/0"))
assert.Contains(t, filtered, route.HAUniqueID("exit1-v6|::/0"), "explicit v6 select wins over v4 base")
assert.Empty(t, filtered, "deselecting v4 base must drop the v6 pair even if it was explicitly selected before")
})
t.Run("non-v6-suffix routes unaffected", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
// A route literally named "exit1-something" must not pair-resolve.
assert.False(t, rs.HasUserSelectionForRoute("exit1-something"))
})
t.Run("filter v6 paired with deselected v4 base", func(t *testing.T) {
t.Run("filter drops synced v6 pair of deselected v4 base", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
rs.SyncPairedSelection("exit1", "exit1-v6")
v4Route := &route.Route{NetID: "exit1", Network: netip.MustParsePrefix("0.0.0.0/0")}
v6Route := &route.Route{NetID: "exit1-v6", Network: netip.MustParsePrefix("::/0")}
@@ -399,6 +424,15 @@ func TestRouteSelector_V6ExitPairInherits(t *testing.T) {
assert.Empty(t, filtered, "deselecting v4 base must also drop the v6 pair")
})
t.Run("deselectAll makes sync a no-op", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
rs.DeselectAllRoutes()
rs.SyncPairedSelection("exit1", "exit1-v6")
assert.False(t, rs.HasUserSelectionForRoute("exit1-v6"), "sync must not write explicit state under deselectAll")
})
t.Run("non-exit *-v6 routes pass through FilterSelectedExitNodes", func(t *testing.T) {
rs := routeselector.NewRouteSelector()
require.NoError(t, rs.DeselectRoutes([]route.NetID{"corp"}, all))


@@ -17,6 +17,7 @@ import (
"github.com/netbirdio/netbird/client/internal"
"github.com/netbirdio/netbird/client/internal/auth"
"github.com/netbirdio/netbird/client/internal/debug"
"github.com/netbirdio/netbird/client/internal/dns"
"github.com/netbirdio/netbird/client/internal/listener"
"github.com/netbirdio/netbird/client/internal/peer"
@@ -25,6 +26,7 @@ import (
"github.com/netbirdio/netbird/formatter"
"github.com/netbirdio/netbird/route"
"github.com/netbirdio/netbird/shared/management/domain"
types "github.com/netbirdio/netbird/upload-server/types"
)
// ConnectionListener exports the internal Listener for mobile
@@ -54,6 +56,7 @@ type selectRoute struct {
Network netip.Prefix
Domains domain.List
Selected bool
Status string
extraNetworks []netip.Prefix
}
@@ -65,6 +68,8 @@ func init() {
type Client struct {
cfgFile string
stateFile string
cacheDir string
logFilePath string
recorder *peer.Status
ctxCancel context.CancelFunc
ctxCancelLock *sync.Mutex
@@ -75,16 +80,21 @@ type Client struct {
onHostDnsFn func([]string)
dnsManager dns.IosDnsManager
loginComplete bool
connectClient *internal.ConnectClient
// preloadedConfig holds config loaded from JSON (used on tvOS where file writes are blocked)
preloadedConfig *profilemanager.Config
stateMu sync.RWMutex
connectClient *internal.ConnectClient
config *profilemanager.Config
}
// NewClient instantiates a new Client
func NewClient(cfgFile, stateFile, deviceName string, osVersion string, osName string, networkChangeListener NetworkChangeListener, dnsManager DnsManager) *Client {
func NewClient(cfgFile, stateFile, cacheDir, logFilePath, deviceName string, osVersion string, osName string, networkChangeListener NetworkChangeListener, dnsManager DnsManager) *Client {
return &Client{
cfgFile: cfgFile,
stateFile: stateFile,
cacheDir: cacheDir,
logFilePath: logFilePath,
deviceName: deviceName,
osName: osName,
osVersion: osVersion,
@@ -161,8 +171,13 @@ func (c *Client) Run(fd int32, interfaceName string, envList *EnvList) error {
c.onHostDnsFn = func([]string) {}
cfg.WgIface = interfaceName
c.connectClient = internal.NewConnectClient(ctx, cfg, c.recorder)
return c.connectClient.RunOniOS(fd, c.networkChangeListener, c.dnsManager, c.stateFile)
connectClient := internal.NewConnectClient(ctx, cfg, c.recorder)
c.setState(cfg, connectClient)
// Persist the latest sync response so DebugBundle can include the network
// map. On iOS this is backed by disk to keep it out of the constrained
// process memory (see the syncstore package).
connectClient.SetSyncResponsePersistence(true)
return connectClient.RunOniOS(fd, c.networkChangeListener, c.dnsManager, c.stateFile, c.cacheDir, c.logFilePath)
}
// Stop the internal client and free the resources
@@ -174,6 +189,84 @@ func (c *Client) Stop() {
}
c.ctxCancel()
c.setState(nil, nil)
}
// DebugBundle generates a debug bundle, uploads it and returns the upload key.
// It works with or without a running engine: when the engine is up it reuses
// the live config, sync response and client metrics; otherwise it loads the
// config from disk (or the preloaded tvOS config).
func (c *Client) DebugBundle(anonymize bool) (string, error) {
cfg, cc := c.stateSnapshot()
// If the engine hasn't been started, load config so we can reach management.
if cfg == nil {
if c.preloadedConfig != nil {
cfg = c.preloadedConfig
} else {
var err error
// Use DirectUpdateOrCreateConfig to avoid atomic file operations
// (temp file + rename) blocked by the tvOS sandbox.
cfg, err = profilemanager.DirectUpdateOrCreateConfig(profilemanager.ConfigInput{
ConfigPath: c.cfgFile,
StateFilePath: c.stateFile,
})
if err != nil {
return "", fmt.Errorf("load config: %w", err)
}
}
}
deps := debug.GeneratorDependencies{
InternalConfig: cfg,
StatusRecorder: c.recorder,
TempDir: c.cacheDir,
StatePath: c.stateFile,
LogPath: c.logFilePath,
}
if cc != nil {
resp, err := cc.GetLatestSyncResponse()
if err != nil {
log.Warnf("get latest sync response: %v", err)
}
deps.SyncResponse = resp
if e := cc.Engine(); e != nil {
if cm := e.GetClientMetrics(); cm != nil {
deps.ClientMetrics = cm
}
}
}
bundleGenerator := debug.NewBundleGenerator(
deps,
debug.BundleConfig{
Anonymize: anonymize,
IncludeSystemInfo: true,
},
)
path, err := bundleGenerator.Generate()
if err != nil {
return "", fmt.Errorf("generate debug bundle: %w", err)
}
defer func() {
if err := os.Remove(path); err != nil {
log.Errorf("failed to remove debug bundle file: %v", err)
}
}()
uploadCtx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
defer cancel()
key, err := debug.UploadDebugBundle(uploadCtx, types.DefaultBundleURL, cfg.ManagementURL.String(), path)
if err != nil {
return "", fmt.Errorf("upload debug bundle: %w", err)
}
log.Infof("debug bundle uploaded with key %s", key)
return key, nil
}
// SetTraceLogLevel configures the logger to use trace level
@@ -227,6 +320,16 @@ func (c *Client) RemoveConnectionListener() {
c.recorder.RemoveConnectionListener()
}
// IsLoginRequiredCached reports whether the LAST observed management error was an
// auth failure (PermissionDenied/InvalidArgument), using the in-memory status
// recorder. Unlike IsLoginRequired() it performs NO network call, so it is safe to
// call from the connection listener during teardown (e.g. onDisconnected) without
// blocking on a slow or unavailable network. Returns false while connected to
// management or when the last error was not auth-related.
func (c *Client) IsLoginRequiredCached() bool {
return c.recorder.IsLoginRequired()
}
func (c *Client) IsLoginRequired() bool {
var ctx context.Context
//nolint
@@ -354,11 +457,12 @@ func (c *Client) ClearLoginComplete() {
}
func (c *Client) GetRoutesSelectionDetails() (*RoutesSelectionDetails, error) {
if c.connectClient == nil {
_, connectClient := c.stateSnapshot()
if connectClient == nil {
return nil, fmt.Errorf("not connected")
}
engine := c.connectClient.Engine()
engine := connectClient.Engine()
if engine == nil {
return nil, fmt.Errorf("not connected")
}
@@ -377,9 +481,57 @@ func (c *Client) GetRoutesSelectionDetails() (*RoutesSelectionDetails, error) {
routes := buildSelectRoutes(routesMap, routeSelector.IsSelected, v6ExitMerged)
resolvedDomains := c.recorder.GetResolvedDomainsStates()
// Compute each route's connection status in the core (mirroring the Android
// bridge), so the UI doesn't have to infer it by string-matching the joined
// Network value against peer routes. For a merged exit node the status reflects
// whichever of the v4/v6 prefixes is served by a connected peer; for dynamic
// (DNS) routes the peer route key is the domain pattern (see dynamic.Route.String).
connectedRoutes := c.connectedRouteSet()
for _, r := range routes {
r.Status = routeStatus(r, connectedRoutes)
}
return prepareRouteSelectionDetails(routes, resolvedDomains), nil
}
// connectedRouteSet returns the set of route keys (as strings) currently served by a
// connected peer, gathered across all connected peers' route tables. The keys match
// what the route manager records: a prefix string for static routes (e.g. "0.0.0.0/0")
// and the domain pattern for dynamic routes (e.g. "*.example.com").
func (c *Client) connectedRouteSet() map[string]struct{} {
connected := map[string]struct{}{}
for _, p := range c.recorder.GetFullStatus().Peers {
if p.ConnStatus != peer.StatusConnected {
continue
}
for r := range p.GetRoutes() {
connected[r] = struct{}{}
}
}
return connected
}
// routeStatus reports "Connected" if any of the route's keys is served by a connected
// peer: the primary Network prefix, an extra v6 network of a merged exit node, or the
// domain pattern for a dynamic DNS route. Otherwise "Idle".
func routeStatus(r *selectRoute, connectedRoutes map[string]struct{}) string {
keys := make([]string, 0, 1+len(r.extraNetworks))
if len(r.Domains) > 0 {
keys = append(keys, r.Domains.SafeString())
} else {
keys = append(keys, r.Network.String())
}
for _, extra := range r.extraNetworks {
keys = append(keys, extra.String())
}
for _, k := range keys {
if _, ok := connectedRoutes[k]; ok {
return peer.StatusConnected.String()
}
}
return peer.StatusIdle.String()
}
func buildSelectRoutes(routesMap map[route.NetID][]*route.Route, isSelected func(route.NetID) bool, v6Merged map[route.NetID]struct{}) []*selectRoute {
var routes []*selectRoute
for id, rt := range routesMap {
@@ -462,6 +614,7 @@ func prepareRouteSelectionDetails(routes []*selectRoute, resolvedDomains map[dom
Network: netStr,
Domains: &domainDetails,
Selected: r.Selected,
Status: r.Status,
})
}
@@ -470,11 +623,12 @@ func prepareRouteSelectionDetails(routes []*selectRoute, resolvedDomains map[dom
}
func (c *Client) SelectRoute(id string) error {
if c.connectClient == nil {
_, connectClient := c.stateSnapshot()
if connectClient == nil {
return fmt.Errorf("not connected")
}
engine := c.connectClient.Engine()
engine := connectClient.Engine()
if engine == nil {
return fmt.Errorf("not connected")
}
@@ -500,10 +654,11 @@ func (c *Client) SelectRoute(id string) error {
}
func (c *Client) DeselectRoute(id string) error {
if c.connectClient == nil {
_, connectClient := c.stateSnapshot()
if connectClient == nil {
return fmt.Errorf("not connected")
}
engine := c.connectClient.Engine()
engine := connectClient.Engine()
if engine == nil {
return fmt.Errorf("not connected")
}
@@ -527,6 +682,22 @@ func (c *Client) DeselectRoute(id string) error {
return nil
}
// setState stores the running engine state so DebugBundle can reuse the live
// config and ConnectClient. It is cleared on Stop.
func (c *Client) setState(cfg *profilemanager.Config, cc *internal.ConnectClient) {
c.stateMu.Lock()
defer c.stateMu.Unlock()
c.config = cfg
c.connectClient = cc
}
// stateSnapshot returns the current config and ConnectClient under the lock.
func (c *Client) stateSnapshot() (*profilemanager.Config, *internal.ConnectClient) {
c.stateMu.RLock()
defer c.stateMu.RUnlock()
return c.config, c.connectClient
}
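
The setState/stateSnapshot pair replaces direct reads of `c.connectClient` with a single locked read into local variables. A minimal sketch of that pattern follows, assuming hypothetical placeholder types (`engineConfig`, `engineHandle`, `stateHolder`) in place of the real Config and ConnectClient.

```go
package main

import "sync"

// engineConfig and engineHandle are hypothetical placeholders; they are not
// NetBird types.
type engineConfig struct{ managementURL string }
type engineHandle struct{ name string }

// stateHolder guards a config/handle pair with a read-write mutex.
type stateHolder struct {
	mu     sync.RWMutex
	config *engineConfig
	handle *engineHandle
}

// set replaces both values atomically with respect to snapshot.
func (h *stateHolder) set(cfg *engineConfig, eh *engineHandle) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.config = cfg
	h.handle = eh
}

// snapshot returns both values as they were at one point in time.
func (h *stateHolder) snapshot() (*engineConfig, *engineHandle) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.config, h.handle
}

// describe mirrors the caller pattern: take one snapshot, nil-check the
// local copy, then use only that copy.
func describe(h *stateHolder) string {
	_, eh := h.snapshot()
	if eh == nil {
		return "not connected"
	}
	return eh.name
}
```

Because the nil check and the later use both go through the local copy, a concurrent `set(nil, nil)` between the two cannot turn a passed check into a nil dereference.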
func formatDuration(d time.Duration) string {
ds := d.String()
dotIndex := strings.Index(ds, ".")


@@ -36,6 +36,7 @@ type URLOpener interface {
// Auth can register or log in a new client
type Auth struct {
ctx context.Context
cancel context.CancelFunc
config *profilemanager.Config
cfgPath string
}
@@ -51,8 +52,19 @@ func NewAuth(cfgPath string, mgmURL string) (*Auth, error) {
return nil, err
}
// Use a cancellable context so Stop() can abort an in-progress interactive
// login. The PKCE flow's WaitToken blocks (and keeps its loopback HTTP server
// bound to a port) until the OAuth callback arrives or the flow expires;
// cancelling the context unblocks WaitToken, which then shuts that server down
// and frees the port for the next login attempt. iOS runs login in the main-app
// process (decoupled from the network extension), so without this the server
// lingers after the user dismisses the browser and the next connect stalls
// trying to bind the same port.
ctx, cancel := context.WithCancel(context.Background())
return &Auth{
ctx: context.Background(),
ctx: ctx,
cancel: cancel,
config: cfg,
cfgPath: cfgPath,
}, nil
@@ -60,12 +72,24 @@ func NewAuth(cfgPath string, mgmURL string) (*Auth, error) {
// NewAuthWithConfig instantiates Auth based on an existing config
func NewAuthWithConfig(ctx context.Context, config *profilemanager.Config) *Auth {
ctx, cancel := context.WithCancel(ctx)
return &Auth{
ctx: ctx,
cancel: cancel,
config: config,
}
}
// Stop aborts an in-progress interactive login started via Login/LoginWithDeviceName.
// It cancels the auth context, which unblocks the PKCE WaitToken and shuts down its
// loopback HTTP server, freeing the redirect port. Safe to call multiple times and
// safe to call when no login is running.
func (a *Auth) Stop() {
if a.cancel != nil {
a.cancel()
}
}
// SaveConfigIfSSOSupported tests connectivity with the management server by retrieving the server device flow info.
// If it returns flow info, it saves the configuration and returns true. If it gets a codes.NotFound, SSO
// is not supported and it returns false without saving the configuration. For other errors it returns false.


@@ -20,6 +20,7 @@ type RoutesSelectionInfo struct {
Network string
Domains *DomainDetails
Selected bool
Status string
}
type DomainCollection interface {


@@ -988,10 +988,6 @@ func (s *Server) cleanupConnection() error {
return nil
}
// TODO: consider calling s.connectClient.Stop() instead of engine.Stop().
// actCancel() lets the run loop stop the engine too, so both stop it
// concurrently; ConnectClient.Stop cancels and waits for the run loop,
// making the run loop the sole owner of engine shutdown.
if engine != nil {
if err := engine.Stop(); err != nil {
return err


@@ -918,6 +918,10 @@ func (m *Manager) DeleteAllServices(ctx context.Context, accountID, userID strin
}
for _, svc := range services {
if err = transaction.DeleteServiceTargets(ctx, accountID, svc.ID); err != nil {
return fmt.Errorf("failed to delete service targets: %w", err)
}
if err = transaction.DeleteService(ctx, accountID, svc.ID); err != nil {
return fmt.Errorf("failed to delete service: %w", err)
}
@@ -1270,6 +1274,10 @@ func (m *Manager) deletePeerService(ctx context.Context, accountID, peerID, serv
return status.Errorf(status.PermissionDenied, "cannot delete service exposed by another peer")
}
if err = transaction.DeleteServiceTargets(ctx, accountID, serviceID); err != nil {
return fmt.Errorf("delete service targets: %w", err)
}
if err = transaction.DeleteService(ctx, accountID, serviceID); err != nil {
return fmt.Errorf("delete service: %w", err)
}
@@ -1319,6 +1327,10 @@ func (m *Manager) deleteExpiredPeerService(ctx context.Context, accountID, peerI
return nil
}
if err = transaction.DeleteServiceTargets(ctx, accountID, serviceID); err != nil {
return fmt.Errorf("delete service targets: %w", err)
}
if err = transaction.DeleteService(ctx, accountID, serviceID); err != nil {
return fmt.Errorf("delete service: %w", err)
}


@@ -458,6 +458,9 @@ func TestDeletePeerService_SourcePeerValidation(t *testing.T) {
txMock.EXPECT().
GetServiceByID(ctx, store.LockingStrengthUpdate, accountID, serviceID).
Return(newEphemeralService(), nil)
txMock.EXPECT().
DeleteServiceTargets(ctx, accountID, serviceID).
Return(nil)
txMock.EXPECT().
DeleteService(ctx, accountID, serviceID).
Return(nil)
@@ -560,6 +563,9 @@ func TestDeletePeerService_SourcePeerValidation(t *testing.T) {
txMock.EXPECT().
GetServiceByID(ctx, store.LockingStrengthUpdate, accountID, serviceID).
Return(newEphemeralService(), nil)
txMock.EXPECT().
DeleteServiceTargets(ctx, accountID, serviceID).
Return(nil)
txMock.EXPECT().
DeleteService(ctx, accountID, serviceID).
Return(nil)
@@ -604,6 +610,9 @@ func TestDeletePeerService_SourcePeerValidation(t *testing.T) {
txMock.EXPECT().
GetServiceByID(ctx, store.LockingStrengthUpdate, accountID, serviceID).
Return(newEphemeralService(), nil)
txMock.EXPECT().
DeleteServiceTargets(ctx, accountID, serviceID).
Return(nil)
txMock.EXPECT().
DeleteService(ctx, accountID, serviceID).
Return(nil)
@@ -1192,6 +1201,67 @@ func TestDeleteService_DeletesTargets(t *testing.T) {
assert.Len(t, targets, 0, "All targets should be deleted when service is deleted")
}
func TestDeleteExpiredPeerService_DeletesTargets(t *testing.T) {
ctx := context.Background()
mgr, testStore := setupIntegrationTest(t)
resp, err := mgr.CreateServiceFromPeer(ctx, testAccountID, testPeerID, &rpservice.ExposeServiceRequest{
Port: 8080,
Mode: "http",
})
require.NoError(t, err)
svcID := resolveServiceIDByDomain(t, testStore, resp.Domain)
targets, err := testStore.GetTargetsByServiceID(ctx, store.LockingStrengthNone, testAccountID, svcID)
require.NoError(t, err)
require.Len(t, targets, 1, "ephemeral peer-exposed service should have exactly one persisted target before reaping")
expireEphemeralService(t, testStore, testAccountID, resp.Domain)
err = mgr.deleteExpiredPeerService(ctx, testAccountID, testPeerID, svcID)
require.NoError(t, err)
_, err = testStore.GetServiceByDomain(ctx, resp.Domain)
require.Error(t, err, "expired peer-exposed service should be deleted")
s, ok := status.FromError(err)
require.True(t, ok)
assert.Equal(t, status.NotFound, s.Type())
targets, err = testStore.GetTargetsByServiceID(ctx, store.LockingStrengthNone, testAccountID, svcID)
require.NoError(t, err)
assert.Len(t, targets, 0, "orphaned target rows must be deleted when an expired peer-exposed service is reaped")
}
func TestDeleteServiceFromPeer_DeletesTargets(t *testing.T) {
ctx := context.Background()
mgr, testStore := setupIntegrationTest(t)
resp, err := mgr.CreateServiceFromPeer(ctx, testAccountID, testPeerID, &rpservice.ExposeServiceRequest{
Port: 8080,
Mode: "http",
})
require.NoError(t, err)
svcID := resolveServiceIDByDomain(t, testStore, resp.Domain)
targets, err := testStore.GetTargetsByServiceID(ctx, store.LockingStrengthNone, testAccountID, svcID)
require.NoError(t, err)
require.Len(t, targets, 1, "ephemeral peer-exposed service should have exactly one persisted target before stopping")
err = mgr.StopServiceFromPeer(ctx, testAccountID, testPeerID, svcID)
require.NoError(t, err)
_, err = testStore.GetServiceByDomain(ctx, resp.Domain)
require.Error(t, err, "stopped peer-exposed service should be deleted")
s, ok := status.FromError(err)
require.True(t, ok)
assert.Equal(t, status.NotFound, s.Type())
targets, err = testStore.GetTargetsByServiceID(ctx, store.LockingStrengthNone, testAccountID, svcID)
require.NoError(t, err)
assert.Len(t, targets, 0, "orphaned target rows must be deleted when a peer stops its exposed service")
}
func TestValidateProtocolChange(t *testing.T) {
tests := []struct {
name string


@@ -1989,7 +1989,7 @@ func (s *Server) addUDPRelay(ctx context.Context, mapping *proto.ProxyMapping, t
"service_id": svcID,
})
relay := udprelay.New(ctx, udprelay.RelayConfig{
relay := udprelay.New(s.portRouterContext(ctx), udprelay.RelayConfig{
Logger: entry,
Listener: listener,
Target: targetAddress,


@@ -33,7 +33,7 @@ type Client interface {
Receive(ctx context.Context, msgHandler func(msg *proto.Message) error) error
Ready() bool
IsHealthy() bool
WaitStreamConnected(context.Context)
WaitStreamConnected()
SendToStream(msg *proto.EncryptedMessage) error
Send(msg *proto.Message) error
SetOnReconnectedListener(func())


@@ -65,10 +65,7 @@ var _ = Describe("GrpcClient", func() {
return
}
}()
ctxA, cancelA := context.WithTimeout(context.Background(), 5*time.Second)
defer cancelA()
clientA.WaitStreamConnected(ctxA)
Expect(clientA.StreamConnected()).To(BeTrue())
clientA.WaitStreamConnected()
// connect PeerB to Signal
keyB, _ := wgtypes.GenerateKey()
@@ -94,10 +91,7 @@ var _ = Describe("GrpcClient", func() {
}
}()
ctxB, cancelB := context.WithTimeout(context.Background(), 5*time.Second)
defer cancelB()
clientB.WaitStreamConnected(ctxB)
Expect(clientB.StreamConnected()).To(BeTrue())
clientB.WaitStreamConnected()
// PeerA initiates ping-pong
err := clientA.Send(&sigProto.Message{
@@ -135,10 +129,8 @@ var _ = Describe("GrpcClient", func() {
return
}
}()
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
client.WaitStreamConnected(ctx)
Expect(client.StreamConnected()).To(BeTrue())
client.WaitStreamConnected()
Expect(client).NotTo(BeNil())
})
})


@@ -213,6 +213,15 @@ func (c *GrpcClient) notifyStreamConnected() {
}
}
func (c *GrpcClient) getStreamStatusChan() <-chan struct{} {
c.mux.Lock()
defer c.mux.Unlock()
if c.connectedCh == nil {
c.connectedCh = make(chan struct{})
}
return c.connectedCh
}
func (c *GrpcClient) connect(ctx context.Context, key string) (proto.SignalExchange_ConnectStreamClient, error) {
c.stream = nil
@@ -273,24 +282,14 @@ func (c *GrpcClient) IsHealthy() bool {
}
// WaitStreamConnected waits until the client is connected to the Signal stream
func (c *GrpcClient) WaitStreamConnected(ctx context.Context) {
// Check the status and obtain the wait channel atomically: otherwise
// notifyStreamConnected could flip the status and close/clear the channel
// between the check and the channel creation, leaving us waiting forever on
// a stale channel.
c.mux.Lock()
func (c *GrpcClient) WaitStreamConnected() {
if c.status == StreamConnected {
c.mux.Unlock()
return
}
if c.connectedCh == nil {
c.connectedCh = make(chan struct{})
}
ch := c.connectedCh
c.mux.Unlock()
ch := c.getStreamStatusChan()
select {
case <-ctx.Done():
case <-c.ctx.Done():
case <-ch:
}


@@ -55,7 +55,7 @@ func (sm *MockClient) Ready() bool {
return sm.ReadyFunc()
}
func (sm *MockClient) WaitStreamConnected(context.Context) {
func (sm *MockClient) WaitStreamConnected() {
if sm.WaitStreamConnectedFunc == nil {
return
}