[client] clean up all started components on Start failure

Start's failure defer only called close(), which covers the wg interface, firewall, rosenpass and port forwarding but leaves connMgr, srWatcher, the route/DNS/flow/state managers and the monitor goroutines running. A late failure (e.g. the context-cancelled check after the signal stream wait) therefore leaked them.

Extract Stop's locked teardown into stopLocked (the caller holds syncMsgMux; stopLocked does not wait on shutdownWg) and call it from both Stop and Start's defer. The defer also cancels the run context first so that goroutines started before the failure unwind. Teardown order is unchanged.
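
The cleanup-on-failure pattern described above can be sketched in isolation. This is a minimal illustration, not the actual Engine code: the `engine` type, its `started` slice and the `failLate` parameter are hypothetical stand-ins, and only the shape (named error result, deferred cancel plus stopLocked while the mutex is still held) follows the commit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// engine is a hypothetical stand-in for the real Engine; it models only the
// fields needed to show the cleanup pattern.
type engine struct {
	mu      sync.Mutex
	ctx     context.Context
	cancel  context.CancelFunc
	started []string // names of components brought up so far (illustrative)
}

func newEngine() *engine {
	ctx, cancel := context.WithCancel(context.Background())
	return &engine{ctx: ctx, cancel: cancel}
}

// stopLocked tears down whatever Start managed to bring up.
// The caller must hold e.mu.
func (e *engine) stopLocked() {
	e.started = nil
}

// Start uses a named result so the deferred function observes every failure,
// including late ones. failLate is a hypothetical switch that simulates a
// failure after some components are already running.
func (e *engine) Start(failLate bool) (err error) {
	e.mu.Lock()
	defer e.mu.Unlock()

	// Registered after the Unlock defer, so it runs first (LIFO) and the
	// mutex is still held when stopLocked is called.
	defer func() {
		if err != nil {
			e.cancel()     // let already-started goroutines unwind
			e.stopLocked() // same teardown path that Stop uses
		}
	}()

	e.started = append(e.started, "connMgr", "srWatcher")
	if failLate {
		return errors.New("simulated late failure")
	}
	return nil
}

func main() {
	e := newEngine()
	err := e.Start(true)
	fmt.Println(err, len(e.started), e.ctx.Err())
}
```

Because the explicit `return` assigns the named result before deferred functions run, the defer sees the error even on paths that return a locally scoped `err`.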
[client] abort Start if context cancelled while waiting for signal stream
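
A minimal sketch of the abort check named in this title, under stated assumptions: `waitStreamConnected`, `awaitSignalStream` and `demo` are hypothetical helpers, and the `connected` channel stands in for whatever the signal client uses internally. Only the shape (a context-aware wait followed by a `ctx.Err()` check) is taken from the diff.

```go
package main

import (
	"context"
	"fmt"
)

// waitStreamConnected is a hypothetical context-aware wait: it returns when
// the stream reports connected or when ctx is done, whichever comes first.
func waitStreamConnected(ctx context.Context, connected <-chan struct{}) {
	select {
	case <-connected:
	case <-ctx.Done():
	}
}

// awaitSignalStream turns a cancelled wait into an error so the caller can
// abort its start sequence instead of continuing with a dead context.
func awaitSignalStream(ctx context.Context, connected <-chan struct{}) error {
	waitStreamConnected(ctx, connected)
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("wait for signal stream: %w", err)
	}
	return nil
}

// demo runs both outcomes: a context cancelled while the stream never
// connects, and a connected stream with a live context.
func demo() (cancelledErr, connectedErr error) {
	cancelledCtx, cancel := context.WithCancel(context.Background())
	cancel()
	cancelledErr = awaitSignalStream(cancelledCtx, make(chan struct{}))

	liveCtx, liveCancel := context.WithCancel(context.Background())
	defer liveCancel()
	connected := make(chan struct{})
	close(connected)
	connectedErr = awaitSignalStream(liveCtx, connected)
	return cancelledErr, connectedErr
}

func main() {
	cancelledErr, connectedErr := demo()
	fmt.Println(cancelledErr)
	fmt.Println(connectedErr)
}
```

Returning the wrapped context error lets the caller distinguish a cancelled start from other failures with `errors.Is`.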
2026-06-20 06:49:55 +00:00 · 2026-06-16 15:14:47 +02:00 · 2026-06-16 15:03:29 +02:00 · 2026-06-16 14:51:17 +02:00 · 2026-06-16 14:47:07 +02:00 · 2026-06-16 14:42:35 +02:00
22 changed files with 272 additions and 640 deletions
--- a/client/embed/embed.go
+++ b/client/embed/embed.go
@@ -279,9 +279,11 @@ func (c *Client) Start(startCtx context.Context) error {

 	select {
 	case <-startCtx.Done():
-		// Cancel the client context before stopping: Engine.Start blocks on the
-		// signal stream while holding the engine mutex and only unblocks on
-		// cancellation. Stopping first would deadlock on that mutex.
+		// ConnectClient.Stop now cancels its own run context and waits for the
+		// run loop to tear the engine down, so this cancel() is no longer
+		// required to break the deadlock and could be removed. It is kept as a
+		// defensive belt-and-suspenders: cancelling the parent context first
+		// guarantees the run loop is unblocked even if Stop's contract regresses.
 		cancel()
 		if stopErr := client.Stop(); stopErr != nil {
 			return fmt.Errorf("stop error after context done. Stop error: %w. Context done: %w", stopErr, startCtx.Err())
--- a/client/internal/connect.go
+++ b/client/internal/connect.go
@@ -11,6 +11,7 @@ import (
 	"runtime/debug"
 	"strings"
 	"sync"
+	"sync/atomic"
 	"time"

 	"github.com/cenkalti/backoff/v4"
@@ -54,6 +55,10 @@ var androidRunOverride func(c *ConnectClient, runningChan chan struct{}, logPath

 type ConnectClient struct {
 	ctx            context.Context
+	runCancel      context.CancelFunc
+	runExited      chan struct{}
+	runOnce        sync.Once
+	runStarted     atomic.Bool
 	config         *profilemanager.Config
 	statusRecorder *peer.Status

@@ -70,8 +75,14 @@ func NewConnectClient(
 	config *profilemanager.Config,
 	statusRecorder *peer.Status,
 ) *ConnectClient {
+	// Derive the run context here so Stop owns the cancel that unblocks the run
+	// loop. runCancel is set once at construction, so Stop can call it without
+	// racing the run loop's startup. Callers therefore need not cancel before Stop.
+	runCtx, runCancel := context.WithCancel(ctx)
 	return &ConnectClient{
-		ctx:            ctx,
+		ctx:            runCtx,
+		runCancel:      runCancel,
+		runExited:      make(chan struct{}),
 		config:         config,
 		statusRecorder: statusRecorder,
 		engineMutex:    sync.Mutex{},
@@ -118,8 +129,6 @@ func (c *ConnectClient) RunOniOS(
 	networkChangeListener listener.NetworkChangeListener,
 	dnsManager dns.IosDnsManager,
 	stateFilePath string,
-	cacheDir string,
-	logFilePath string,
 ) error {
 	// Set GC percent to 5% to reduce memory usage as iOS only allows 50MB of memory for the extension.
 	debug.SetGCPercent(5)
@@ -129,12 +138,16 @@ func (c *ConnectClient) RunOniOS(
 		NetworkChangeListener: networkChangeListener,
 		DnsManager:            dnsManager,
 		StateFilePath:         stateFilePath,
-		TempDir:               cacheDir,
 	}
-	return c.run(mobileDependency, nil, logFilePath)
+	return c.run(mobileDependency, nil, "")
 }

 func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan struct{}, logPath string) error {
+	// Mark the loop as started and signal exit on return so Stop can wait for
+	// the loop to finish (and skip the wait if the loop never ran).
+	c.runStarted.Store(true)
+	defer c.runOnce.Do(func() { close(c.runExited) })
+
 	defer func() {
 		if r := recover(); r != nil {
 			rec := c.statusRecorder
@@ -290,7 +303,7 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
 			log.Debug(err)
 			if s, ok := gstatus.FromError(err); ok && (s.Code() == codes.PermissionDenied) {
 				state.Set(StatusNeedsLogin)
-				_ = c.Stop()
+				c.runCancel()
 				return backoff.Permanent(wrapErr(err)) // unrecoverable error
 			}
 			return wrapErr(err)
@@ -410,14 +423,10 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
 		c.engine = nil
 		c.engineMutex.Unlock()

-		// todo: consider to remove this condition. Is not thread safe.
-		// We should always call Stop(), but we need to verify that it is idempotent
-		if engine.wgInterface != nil {
-			log.Infof("ensuring %s is removed, Netbird engine context cancelled", engine.wgInterface.Name())
+		log.Infof("ensuring wg interface is removed, Netbird engine context cancelled")

-			if err := engine.Stop(); err != nil {
-				log.Errorf("Failed to stop engine: %v", err)
-			}
+		if err := engine.Stop(); err != nil {
+			log.Errorf("Failed to stop engine: %v", err)
 		}
 		c.statusRecorder.ClientTeardown()

@@ -433,12 +442,12 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
 	}

 	c.statusRecorder.ClientStart()
-	err = backoff.Retry(operation, backOff)
+	err = backoff.Retry(operation, backoff.WithContext(backOff, c.ctx))
 	if err != nil {
 		log.Debugf("exiting client retry loop due to unrecoverable error: %s", err)
 		if s, ok := gstatus.FromError(err); ok && (s.Code() == codes.PermissionDenied) {
 			state.Set(StatusNeedsLogin)
-			_ = c.Stop()
+			c.runCancel()
 		}
 		return err
 	}
@@ -516,11 +525,9 @@ func (c *ConnectClient) Status() StatusType {
 }

 func (c *ConnectClient) Stop() error {
-	engine := c.Engine()
-	if engine != nil {
-		if err := engine.Stop(); err != nil {
-			return fmt.Errorf("stop engine: %w", err)
-		}
+	c.runCancel()
+	if c.runStarted.Load() {
+		<-c.runExited
 	}
 	return nil
 }
--- a/client/internal/debug/debug.go
+++ b/client/internal/debug/debug.go
@@ -250,7 +250,6 @@ type BundleGenerator struct {
 	syncResponse   *mgmProto.SyncResponse
 	logPath        string
 	tempDir        string
-	statePath      string
 	cpuProfile     []byte
 	capturePath    string
 	refreshStatus  func() // Optional callback to refresh status before bundle generation
@@ -277,7 +276,6 @@ type GeneratorDependencies struct {
 	SyncResponse   *mgmProto.SyncResponse
 	LogPath        string
 	TempDir        string // Directory for temporary bundle zip files. If empty, os.TempDir() is used.
-	StatePath      string // Path to the state file. If empty, the ServiceManager default path is used.
 	CPUProfile     []byte
 	CapturePath    string
 	RefreshStatus  func()
@@ -301,7 +299,6 @@ func NewBundleGenerator(deps GeneratorDependencies, cfg BundleConfig) *BundleGen
 		syncResponse:   deps.SyncResponse,
 		logPath:        deps.LogPath,
 		tempDir:        deps.TempDir,
-		statePath:      deps.StatePath,
 		cpuProfile:     deps.CPUProfile,
 		capturePath:    deps.CapturePath,
 		refreshStatus:  deps.RefreshStatus,
@@ -853,11 +850,8 @@ func (g *BundleGenerator) maskSecrets() {
 }

 func (g *BundleGenerator) addStateFile() error {
-	path := g.statePath
-	if path == "" {
-		sm := profilemanager.NewServiceManager("")
-		path = sm.GetStatePath()
-	}
+	sm := profilemanager.NewServiceManager("")
+	path := sm.GetStatePath()
 	if path == "" {
 		return nil
 	}
--- a/client/internal/debug/debug_ios.go
+++ b/client/internal/debug/debug_ios.go
@@ -1,36 +0,0 @@
-//go:build ios
-
-package debug
-
-import (
-	"path/filepath"
-
-	log "github.com/sirupsen/logrus"
-)
-
-// swiftLogFile is the Swift app log written by the iOS app into the same log
-// directory as the Go client log, so it can be collected into the bundle.
-const swiftLogFile = "swift-log.log"
-
-// addPlatformLog collects logs for the iOS debug bundle. iOS has no logcat or
-// systemd journal, so we rely on file-based logs. addLogfile handles the Go
-// client log (logPath) with rotation, the stderr/stdout companions and
-// anonymization. The iOS app writes its own Swift log into the same directory,
-// so we add it alongside the Go log.
-func (g *BundleGenerator) addPlatformLog() error {
-	if err := g.addLogfile(); err != nil {
-		return err
-	}
-
-	if g.logPath == "" {
-		return nil
-	}
-
-	swiftLogPath := filepath.Join(filepath.Dir(g.logPath), swiftLogFile)
-	if err := g.addSingleLogfile(swiftLogPath, swiftLogFile); err != nil {
-		// The Swift log is best-effort: the app may not have written it yet.
-		log.Warnf("failed to add %s to debug bundle: %v", swiftLogFile, err)
-	}
-
-	return nil
-}
--- a/client/internal/debug/debug_nonandroid.go
+++ b/client/internal/debug/debug_nonandroid.go
@@ -1,4 +1,4 @@
-//go:build !android && !ios
+//go:build !android

 package debug

--- a/client/internal/engine.go
+++ b/client/internal/engine.go
@@ -86,6 +86,8 @@ const (

 var ErrResetConnection = fmt.Errorf("reset connection")

+var ErrEngineAlreadyStarted = errors.New("engine already started")
+
 type EngineConfig struct {
 	WgPort      int
 	WgIfaceName string
@@ -199,6 +201,8 @@ type Engine struct {
 	ctx    context.Context
 	cancel context.CancelFunc

+	started bool
+
 	wgInterface WGIface

 	udpMux *udpmux.UniversalUDPMuxDefault
@@ -279,9 +283,15 @@ func NewEngine(
 	services EngineServices,
 	mobileDep MobileDependency,
 ) *Engine {
+	// The engine is single-use: a fresh instance is built per connection
+	// cycle (see Client.run), so the run context is created once here rather
+	// than in Start.
+	ctx, cancel := context.WithCancel(clientCtx)
 	engine := &Engine{
 		clientCtx:          clientCtx,
 		clientCancel:       clientCancel,
+		ctx:                ctx,
+		cancel:             cancel,
 		signal:             services.SignalClient,
 		signaler:           peer.NewSignaler(services.SignalClient, config.WgPrivateKey),
 		mgmClient:          services.MgmClient,
@@ -314,8 +324,34 @@ func (e *Engine) Stop() error {
 		log.Debugf("tried stopping engine that is nil")
 		return nil
 	}
+	e.cancel()
 	e.syncMsgMux.Lock()

+	e.stopLocked()
+
+	e.syncMsgMux.Unlock()
+
+	timeout := e.calculateShutdownTimeout()
+	log.Debugf("waiting for goroutines to finish with timeout: %v", timeout)
+	shutdownCtx, cancel := context.WithTimeout(context.Background(), timeout)
+	defer cancel()
+
+	if err := waitWithContext(shutdownCtx, &e.shutdownWg); err != nil {
+		log.Warnf("shutdown timeout exceeded after %v, some goroutines may still be running", timeout)
+	}
+
+	log.Infof("stopped Netbird Engine")
+
+	return nil
+}
+
+// stopLocked tears down everything Start may have brought up, in the order
+// teardown requires (DNS before the interface goes down, flow manager after).
+// The caller must hold syncMsgMux. It is shared by Stop and by Start's failure
+// path, so a partially-initialized engine is cleaned up the same way; every
+// step is nil-guarded. It does not wait on shutdownWg — the caller does that
+// after releasing the lock, since the goroutines also take syncMsgMux.
+func (e *Engine) stopLocked() {
 	if e.connMgr != nil {
 		e.connMgr.Close()
 	}
@@ -366,10 +402,6 @@ func (e *Engine) Stop() error {
 	// so dbus and friends don't complain because of a missing interface
 	e.stopDNSServer()

-	if e.cancel != nil {
-		e.cancel()
-	}
-
 	e.jobExecutorWG.Wait() // block until job goroutines finish

 	e.close()
@@ -388,21 +420,6 @@ func (e *Engine) Stop() error {
 	if err := e.stateManager.PersistState(context.Background()); err != nil {
 		log.Errorf("failed to persist state: %v", err)
 	}
-
-	e.syncMsgMux.Unlock()
-
-	timeout := e.calculateShutdownTimeout()
-	log.Debugf("waiting for goroutines to finish with timeout: %v", timeout)
-	shutdownCtx, cancel := context.WithTimeout(context.Background(), timeout)
-	defer cancel()
-
-	if err := waitWithContext(shutdownCtx, &e.shutdownWg); err != nil {
-		log.Warnf("shutdown timeout exceeded after %v, some goroutines may still be running", timeout)
-	}
-
-	log.Infof("stopped Netbird Engine")
-
-	return nil
 }

 // calculateShutdownTimeout returns shutdown timeout: 10s base + 100ms per peer, capped at 30s.
@@ -440,18 +457,38 @@ func waitWithContext(ctx context.Context, wg *sync.WaitGroup) error {
 // Start creates a new WireGuard tunnel interface and listens to events from Signal and Management services
 // Connections to remote peers are not established here.
 // However, they will be established once an event with a list of peers to connect to will be received from Management Service
-func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL) error {
+func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL) (err error) {
 	e.syncMsgMux.Lock()
 	defer e.syncMsgMux.Unlock()

-	if err := iface.ValidateMTU(e.config.MTU); err != nil {
+	// The engine is single-use. Reject a duplicate start and a start on an
+	// already-stopped engine (run context cancelled).
+	if e.started {
+		return ErrEngineAlreadyStarted
+	}
+
+	if ctxErr := e.ctx.Err(); ctxErr != nil {
+		return fmt.Errorf("engine already stopped: %w", ctxErr)
+	}
+
+	e.started = true
+
+	// Tear down any partially-initialized state on a failed start. Cancel the
+	// run context first so goroutines started before the failure (connMgr,
+	// srWatcher, monitors) unwind, then stopLocked mirrors Stop's teardown (we
+	// already hold syncMsgMux), cleaning up route/DNS/flow/state managers too,
+	// not just what close() covers.
+	defer func() {
+		if err != nil {
+			e.cancel()
+			e.stopLocked()
+		}
+	}()
+
+	if err = iface.ValidateMTU(e.config.MTU); err != nil {
 		return fmt.Errorf("invalid MTU configuration: %w", err)
 	}

-	if e.cancel != nil {
-		e.cancel()
-	}
-	e.ctx, e.cancel = context.WithCancel(e.clientCtx)
 	e.exposeManager = expose.NewManager(e.ctx, e.mgmClient)

 	wgIface, err := e.newWgIface()
@@ -485,13 +522,11 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)

 	initialRoutes, dnsConfig, dnsFeatureFlag, err := e.readInitialSettings()
 	if err != nil {
-		e.close()
 		return fmt.Errorf("read initial settings: %w", err)
 	}

 	dnsServer, err := e.newDnsServer(dnsConfig)
 	if err != nil {
-		e.close()
 		return fmt.Errorf("create dns server: %w", err)
 	}
 	e.dnsServer = dnsServer
@@ -526,7 +561,6 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)

 	if err = e.wgInterfaceCreate(); err != nil {
 		log.Errorf("failed creating tunnel interface %s: [%s]", e.config.WgIfaceName, err.Error())
-		e.close()
 		return fmt.Errorf("create wg interface: %w", err)
 	}

@@ -535,7 +569,6 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
 	}

 	if err := e.createFirewall(); err != nil {
-		e.close()
 		return err
 	}

@@ -547,7 +580,6 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
 	e.udpMux, err = e.wgInterface.Up()
 	if err != nil {
 		log.Errorf("failed to pull up wgInterface [%s]: %s", e.wgInterface.Name(), err.Error())
-		e.close()
 		return fmt.Errorf("up wg interface: %w", err)
 	}

@@ -572,9 +604,7 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
 		e.acl = acl.NewDefaultManager(e.firewall)
 	}

-	err = e.dnsServer.Initialize()
-	if err != nil {
-		e.close()
+	if err := e.dnsServer.Initialize(); err != nil {
 		return fmt.Errorf("initialize dns server: %w", err)
 	}

@@ -586,7 +616,9 @@ func (e *Engine) Start(netbirdConfig *mgmProto.NetbirdConfig, mgmtURL *url.URL)
 	e.srWatcher = guard.NewSRWatcher(e.signal, e.relayManager, e.mobileDep.IFaceDiscover, iceCfg)
 	e.srWatcher.Start(peer.IsForceRelayed())

-	e.receiveSignalEvents()
+	if err = e.receiveSignalEvents(); err != nil {
+		return err
+	}
 	e.receiveManagementEvents()
 	e.receiveJobEvents()

@@ -638,7 +670,6 @@ func (e *Engine) createFirewall() error {

 func (e *Engine) initFirewall() error {
 	if err := e.routeManager.SetFirewall(e.firewall); err != nil {
-		e.close()
 		return fmt.Errorf("set firewall: %w", err)
 	}

@@ -1698,7 +1729,7 @@ func (e *Engine) createPeerConn(pubKey string, allowedIPs []netip.Prefix, agentV
 }

 // receiveSignalEvents connects to the Signal Service event stream to negotiate connection with remote peers
-func (e *Engine) receiveSignalEvents() {
+func (e *Engine) receiveSignalEvents() error {
 	e.shutdownWg.Add(1)
 	go func() {
 		defer e.shutdownWg.Done()
@@ -1762,7 +1793,12 @@ func (e *Engine) receiveSignalEvents() {
 		}
 	}()

-	e.signal.WaitStreamConnected()
+	// todo: consider to remove this blocker. I do not see benefit to block the Start operations
+	e.signal.WaitStreamConnected(e.ctx)
+	if err := e.ctx.Err(); err != nil {
+		return fmt.Errorf("wait for signal stream: %w", err)
+	}
+	return nil
 }

 func (e *Engine) parseNATExternalIPMappings() []string {
--- a/client/internal/engine_test.go
+++ b/client/internal/engine_test.go
@@ -247,7 +247,7 @@ func TestEngine_SSH(t *testing.T) {
 		return
 	}

-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
 	defer cancel()

 	relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
@@ -426,7 +426,7 @@ func TestEngine_UpdateNetworkMap(t *testing.T) {
 		return
 	}

-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
 	defer cancel()

 	relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
@@ -638,7 +638,7 @@ func TestEngine_Sync(t *testing.T) {
 		return
 	}

-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
 	defer cancel()

 	// feed updates to Engine via mocked Management client
@@ -817,7 +817,7 @@ func TestEngine_UpdateNetworkMapWithRoutes(t *testing.T) {
 				return
 			}

-			ctx, cancel := context.WithCancel(context.Background())
+			ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
 			defer cancel()

 			wgIfaceName := fmt.Sprintf("utun%d", 104+n)
@@ -1024,7 +1024,7 @@ func TestEngine_UpdateNetworkMapWithDNSUpdate(t *testing.T) {
 				return
 			}

-			ctx, cancel := context.WithCancel(context.Background())
+			ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
 			defer cancel()

 			wgIfaceName := fmt.Sprintf("utun%d", 104+n)
--- a/client/internal/routemanager/manager.go
+++ b/client/internal/routemanager/manager.go
@@ -9,7 +9,6 @@ import (
 	"net/url"
 	"runtime"
 	"slices"
-	"strings"
 	"sync"
 	"sync/atomic"
 	"time"
@@ -701,8 +700,6 @@ func resolveURLsToIPs(urls []string) []net.IP {

 // updateRouteSelectorFromManagement updates the route selector based on the isSelected status from the management server
 func (m *DefaultManager) updateRouteSelectorFromManagement(clientRoutes route.HAMap) {
-	m.mirrorV6ExitPairSelections(clientRoutes)
-
 	// An explicit user "deselect all" must not be overridden by management auto-apply.
 	// Auto-applying an exit node here would call SelectRoutes, which clears the
 	// deselect-all flag and re-enables every route the user turned off.
@@ -719,24 +716,6 @@ func (m *DefaultManager) updateRouteSelectorFromManagement(clientRoutes route.HA
 	m.logExitNodeUpdate(exitNodeInfo)
 }

-// mirrorV6ExitPairSelections keeps every synthesized "-v6" exit route's selection
-// consistent with its v4 base. The v4/v6 exit pair is a single toggle, so the v6
-// entry always follows the base: deselecting the v4 exit node also drops its ::/0
-// pair, and any stale (orphaned) explicit selection on the v6 entry is reset. This
-// runs before selection is read so both collectExitNodeInfo and FilterSelectedExitNodes
-// see consistent state, including pairs loaded from persisted selector state.
-func (m *DefaultManager) mirrorV6ExitPairSelections(clientRoutes route.HAMap) {
-	routesByNetID := make(map[route.NetID][]*route.Route, len(clientRoutes))
-	for haID, routes := range clientRoutes {
-		routesByNetID[haID.NetID()] = routes
-	}
-
-	for v6ID := range route.V6ExitMergeSet(routesByNetID) {
-		baseID := route.NetID(strings.TrimSuffix(string(v6ID), route.V6ExitSuffix))
-		m.routeSelector.SyncPairedSelection(baseID, v6ID)
-	}
-}
-
 type exitNodeInfo struct {
 	allIDs               []route.NetID
 	selectedByManagement []route.NetID
--- a/client/internal/routemanager/manager_v6exit_test.go
+++ b/client/internal/routemanager/manager_v6exit_test.go
@@ -1,47 +0,0 @@
-package routemanager
-
-import (
-	"net/netip"
-	"testing"
-
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-
-	"github.com/netbirdio/netbird/client/internal/routeselector"
-	"github.com/netbirdio/netbird/route"
-)
-
-// TestUpdateRouteSelectorFromManagement_MirrorsV6ExitPair reproduces the bug seen
-// in netbird-engine.log: persisted selector state has the v4 exit node deselected
-// but its synthesized "-v6" pair explicitly selected (orphaned), so the ::/0 route
-// leaked onto the tunnel. The management update must mirror the v4 deselect onto the
-// v6 pair so FilterSelectedExitNodes drops it.
-func TestUpdateRouteSelectorFromManagement_MirrorsV6ExitPair(t *testing.T) {
-	const (
-		v4ID = route.NetID("Exit Node (raspberrypi)")
-		v6ID = route.NetID("Exit Node (raspberrypi)-v6")
-	)
-	all := []route.NetID{v4ID, v6ID}
-
-	rs := routeselector.NewRouteSelector()
-	// Orphan the v6 selection: select the pair, then deselect only the v4 base.
-	require.NoError(t, rs.SelectRoutes([]route.NetID{v4ID, v6ID}, true, all))
-	require.NoError(t, rs.DeselectRoutes([]route.NetID{v4ID}, all))
-	require.True(t, rs.IsSelected(v6ID), "precondition: orphaned v6 selection survives v4 deselect")
-
-	m := &DefaultManager{routeSelector: rs}
-
-	v4Route := &route.Route{NetID: v4ID, Network: netip.MustParsePrefix("0.0.0.0/0")}
-	v6Route := &route.Route{NetID: v6ID, Network: netip.MustParsePrefix("::/0")}
-	clientRoutes := route.HAMap{
-		"Exit Node (raspberrypi)|0.0.0.0/0": {v4Route},
-		"Exit Node (raspberrypi)-v6|::/0":   {v6Route},
-	}
-
-	m.updateRouteSelectorFromManagement(clientRoutes)
-
-	assert.False(t, rs.IsSelected(v6ID), "v6 pair must follow the v4 base deselect after the management update")
-
-	filtered := rs.FilterSelectedExitNodes(clientRoutes)
-	assert.Empty(t, filtered, "deselected v4 exit node must not leak its ::/0 pair onto the tunnel")
-}
--- a/client/internal/routeselector/routeselector.go
+++ b/client/internal/routeselector/routeselector.go
@@ -4,6 +4,7 @@ import (
 	"encoding/json"
 	"fmt"
 	"slices"
+	"strings"
 	"sync"

 	"github.com/hashicorp/go-multierror"
@@ -131,33 +132,6 @@ func (rs *RouteSelector) IsSelected(routeID route.NetID) bool {
 	return rs.isSelectedLocked(routeID)
 }

-// SyncPairedSelection forces pairedID's explicit selection state to match baseID's,
-// so a synthesized "-v6" exit route always follows its v4 base: selecting or
-// deselecting the v4 exit node governs the ::/0 pair, and any stale (orphaned)
-// explicit state on the v6 entry is reset. The v4/v6 exit pair is treated as a single
-// toggle, so the v6 entry carries no independent selection of its own.
-func (rs *RouteSelector) SyncPairedSelection(baseID, pairedID route.NetID) {
-	rs.mu.Lock()
-	defer rs.mu.Unlock()
-
-	if rs.deselectAll {
-		return
-	}
-
-	_, baseSelected := rs.selectedRoutes[baseID]
-	_, baseDeselected := rs.deselectedRoutes[baseID]
-
-	delete(rs.selectedRoutes, pairedID)
-	delete(rs.deselectedRoutes, pairedID)
-
-	switch {
-	case baseSelected:
-		rs.selectedRoutes[pairedID] = struct{}{}
-	case baseDeselected:
-		rs.deselectedRoutes[pairedID] = struct{}{}
-	}
-}
-
 // FilterSelected removes unselected routes from the provided map.
 func (rs *RouteSelector) FilterSelected(routes route.HAMap) route.HAMap {
 	rs.mu.RLock()
@@ -177,13 +151,14 @@ func (rs *RouteSelector) FilterSelected(routes route.HAMap) route.HAMap {
 }

 // HasUserSelectionForRoute returns true if the user has explicitly selected or deselected this route.
-// The lookup is literal; v4/v6 exit pairs are kept consistent at write time via SyncPairedSelection,
-// so a synthesized "-v6" entry carries the same explicit state as its v4 base.
+// Intended for exit-node code paths: a v6 exit-node pair (e.g. "MyExit-v6") with no explicit state of
+// its own inherits its v4 base's state, so legacy persisted selections that predate v6 pairing
+// transparently apply to the synthesized v6 entry.
 func (rs *RouteSelector) HasUserSelectionForRoute(routeID route.NetID) bool {
 	rs.mu.RLock()
 	defer rs.mu.RUnlock()

-	return rs.hasUserSelectionForRouteLocked(routeID)
+	return rs.hasUserSelectionForRouteLocked(rs.effectiveNetID(routeID))
 }

 func (rs *RouteSelector) FilterSelectedExitNodes(routes route.HAMap) route.HAMap {
@@ -212,6 +187,83 @@ func (rs *RouteSelector) FilterSelectedExitNodes(routes route.HAMap) route.HAMap
 	return filtered
 }

+// effectiveNetID returns the v4 base for a "-v6" exit pair entry that has no explicit
+// state of its own, so selections made on the v4 entry govern the v6 entry automatically.
+// Only call this from exit-node-specific code paths: applying it to a non-exit "-v6" route
+// would make it inherit unrelated v4 state. Must be called with rs.mu held.
+func (rs *RouteSelector) effectiveNetID(id route.NetID) route.NetID {
+	name := string(id)
+	if !strings.HasSuffix(name, route.V6ExitSuffix) {
+		return id
+	}
+	if _, ok := rs.selectedRoutes[id]; ok {
+		return id
+	}
+	if _, ok := rs.deselectedRoutes[id]; ok {
+		return id
+	}
+	return route.NetID(strings.TrimSuffix(name, route.V6ExitSuffix))
+}
+
+func (rs *RouteSelector) isSelectedLocked(routeID route.NetID) bool {
+	if rs.deselectAll {
+		return false
+	}
+	_, deselected := rs.deselectedRoutes[routeID]
+	return !deselected
+}
+
+func (rs *RouteSelector) isDeselectedLocked(netID route.NetID) bool {
+	if rs.deselectAll {
+		return true
+	}
+	_, deselected := rs.deselectedRoutes[netID]
+	return deselected
+}
+
+func (rs *RouteSelector) hasUserSelectionForRouteLocked(routeID route.NetID) bool {
+	_, selected := rs.selectedRoutes[routeID]
+	_, deselected := rs.deselectedRoutes[routeID]
+	return selected || deselected
+}
+
+func isExitNode(rt []*route.Route) bool {
+	return len(rt) > 0 && (route.IsV4DefaultRoute(rt[0].Network) || route.IsV6DefaultRoute(rt[0].Network))
+}
+
+func (rs *RouteSelector) applyExitNodeFilter(
+	id route.HAUniqueID,
+	netID route.NetID,
+	rt []*route.Route,
+	out route.HAMap,
+) {
+	// Exit-node path: apply the v4/v6 pair mirror so a deselect on the v4 base also
+	// drops the synthesized v6 entry that lacks its own explicit state.
+	effective := rs.effectiveNetID(netID)
+	if rs.hasUserSelectionForRouteLocked(effective) {
+		if rs.isSelectedLocked(effective) {
+			out[id] = rt
+		}
+		return
+	}
+
+	// no explicit selection for this route: defer to management's SkipAutoApply flag
+	sel := collectSelected(rt)
+	if len(sel) > 0 {
+		out[id] = sel
+	}
+}
+
+func collectSelected(rt []*route.Route) []*route.Route {
+	var sel []*route.Route
+	for _, r := range rt {
+		if !r.SkipAutoApply {
+			sel = append(sel, r)
+		}
+	}
+	return sel
+}
+
 // MarshalJSON implements the json.Marshaler interface
 func (rs *RouteSelector) MarshalJSON() ([]byte, error) {
 	rs.mu.RLock()
@@ -265,59 +317,3 @@ func (rs *RouteSelector) UnmarshalJSON(data []byte) error {

 	return nil
 }
-
-func (rs *RouteSelector) isSelectedLocked(routeID route.NetID) bool {
-	if rs.deselectAll {
-		return false
-	}
-	_, deselected := rs.deselectedRoutes[routeID]
-	return !deselected
-}
-
-func (rs *RouteSelector) isDeselectedLocked(netID route.NetID) bool {
-	if rs.deselectAll {
-		return true
-	}
-	_, deselected := rs.deselectedRoutes[netID]
-	return deselected
-}
-
-func (rs *RouteSelector) hasUserSelectionForRouteLocked(routeID route.NetID) bool {
-	_, selected := rs.selectedRoutes[routeID]
-	_, deselected := rs.deselectedRoutes[routeID]
-	return selected || deselected
-}
-
-func (rs *RouteSelector) applyExitNodeFilter(
-	id route.HAUniqueID,
-	netID route.NetID,
-	rt []*route.Route,
-	out route.HAMap,
-) {
-	if rs.hasUserSelectionForRouteLocked(netID) {
-		if rs.isSelectedLocked(netID) {
-			out[id] = rt
-		}
-		return
-	}
-
-	// no explicit selection for this route: defer to management's SkipAutoApply flag
-	sel := collectSelected(rt)
-	if len(sel) > 0 {
-		out[id] = sel
-	}
-}
-
-func isExitNode(rt []*route.Route) bool {
-	return len(rt) > 0 && (route.IsV4DefaultRoute(rt[0].Network) || route.IsV6DefaultRoute(rt[0].Network))
-}
-
-func collectSelected(rt []*route.Route) []*route.Route {
-	var sel []*route.Route
-	for _, r := range rt {
-		if !r.SkipAutoApply {
-			sel = append(sel, r)
-		}
-	}
-	return sel
-}
--- a/client/internal/routeselector/routeselector_test.go
+++ b/client/internal/routeselector/routeselector_test.go
@@ -330,73 +330,39 @@ func TestRouteSelector_FilterSelectedExitNodes(t *testing.T) {
 	assert.Len(t, filtered, 0) // No routes should be selected
 }

-// TestRouteSelector_V6ExitPairSync covers SyncPairedSelection, which keeps a v4
-// exit node and its synthesized "-v6" counterpart consistent. The selector itself
-// is literal and never infers a v6 entry's state from its v4 base; callers that know
-// the pairing (exit-node code paths) call SyncPairedSelection to force the v6 entry
-// to follow the base, treating the pair as a single toggle.
-func TestRouteSelector_V6ExitPairSync(t *testing.T) {
+// TestRouteSelector_V6ExitPairInherits covers the v4/v6 exit-node pair selection
+// mirror. The mirror is scoped to exit-node code paths: HasUserSelectionForRoute
+// and FilterSelectedExitNodes resolve a "-v6" entry without explicit state to its
+// v4 base, so legacy persisted selections that predate v6 pairing transparently
+// apply to the synthesized v6 entry. General lookups (IsSelected, FilterSelected)
+// stay literal so unrelated routes named "*-v6" don't inherit unrelated state.
+func TestRouteSelector_V6ExitPairInherits(t *testing.T) {
 	all := []route.NetID{"exit1", "exit1-v6", "exit2", "exit2-v6", "corp", "corp-v6"}

-	t.Run("selector lookups stay literal without sync", func(t *testing.T) {
+	t.Run("HasUserSelectionForRoute mirrors deselected v4 base", func(t *testing.T) {
 		rs := routeselector.NewRouteSelector()
 		require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))

-		// The selector does not pair-resolve: the v6 entry is independent until synced.
-		assert.False(t, rs.HasUserSelectionForRoute("exit1-v6"), "v6 entry has no state of its own")
-		assert.True(t, rs.IsSelected("exit1-v6"), "unsynced v6 entry stays selected by default")
+		assert.True(t, rs.HasUserSelectionForRoute("exit1-v6"), "v6 pair sees v4 base's user selection")

-		// A route literally named "exit1-something" must never pair-resolve either.
-		assert.False(t, rs.HasUserSelectionForRoute("exit1-something"))
+		// unrelated v6 with no v4 base touched is unaffected
+		assert.False(t, rs.HasUserSelectionForRoute("exit2-v6"))
 	})

-	t.Run("sync mirrors deselected v4 base onto v6", func(t *testing.T) {
+	t.Run("IsSelected stays literal for non-exit lookups", func(t *testing.T) {
+		rs := routeselector.NewRouteSelector()
+		require.NoError(t, rs.DeselectRoutes([]route.NetID{"corp"}, all))
+
+		// A non-exit route literally named "corp-v6" must not inherit "corp"'s state
+		// via the mirror; the mirror only applies in exit-node code paths.
+		assert.False(t, rs.IsSelected("corp"))
+		assert.True(t, rs.IsSelected("corp-v6"), "non-exit *-v6 routes must not inherit unrelated v4 state")
+	})
+
+	t.Run("explicit v6 state overrides v4 base in filter", func(t *testing.T) {
 		rs := routeselector.NewRouteSelector()
 		require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
-
-		rs.SyncPairedSelection("exit1", "exit1-v6")
-
-		assert.False(t, rs.IsSelected("exit1"))
-		assert.False(t, rs.IsSelected("exit1-v6"), "v6 pair follows v4 base deselect")
-		assert.True(t, rs.HasUserSelectionForRoute("exit1-v6"), "v6 carries explicit deselect after sync")
-	})
-
-	t.Run("sync mirrors selected v4 base onto v6", func(t *testing.T) {
-		rs := routeselector.NewRouteSelector()
-		require.NoError(t, rs.SelectRoutes([]route.NetID{"exit1"}, false, all))
-
-		rs.SyncPairedSelection("exit1", "exit1-v6")
-
-		assert.True(t, rs.IsSelected("exit1"))
-		assert.True(t, rs.IsSelected("exit1-v6"), "v6 pair follows v4 base select")
-	})
-
-	t.Run("sync clears v6 state when base has no explicit selection", func(t *testing.T) {
-		rs := routeselector.NewRouteSelector()
 		require.NoError(t, rs.SelectRoutes([]route.NetID{"exit1-v6"}, true, all))
-		require.True(t, rs.HasUserSelectionForRoute("exit1-v6"))
-
-		rs.SyncPairedSelection("exit1", "exit1-v6")
-
-		assert.False(t, rs.HasUserSelectionForRoute("exit1-v6"),
-			"v6 explicit state is cleared so it follows management like its base")
-	})
-
-	// Regression for the observed bug (see netbird-engine.log): persisted state has
-	// the v4 base deselected but the v6 sibling explicitly selected (orphaned). The
-	// sync must reset the orphan so the ::/0 route does not leak onto the tunnel.
-	t.Run("sync clears orphaned explicit v6 selection on deselected base", func(t *testing.T) {
-		rs := routeselector.NewRouteSelector()
-
-		// Prior state: both explicitly selected, then only the v4 base deselected,
-		// leaving the v6 entry as a stale explicit selection.
-		require.NoError(t, rs.SelectRoutes([]route.NetID{"exit1", "exit1-v6"}, true, all))
-		require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
-		require.True(t, rs.IsSelected("exit1-v6"), "precondition: orphaned v6 selection")
-
-		rs.SyncPairedSelection("exit1", "exit1-v6")
-
-		assert.False(t, rs.IsSelected("exit1-v6"), "orphaned v6 selection reset to follow v4 deselect")

 		v4Route := &route.Route{NetID: "exit1", Network: netip.MustParsePrefix("0.0.0.0/0")}
 		v6Route := &route.Route{NetID: "exit1-v6", Network: netip.MustParsePrefix("::/0")}
@@ -404,14 +370,23 @@ func TestRouteSelector_V6ExitPairSync(t *testing.T) {
 			"exit1|0.0.0.0/0": {v4Route},
 			"exit1-v6|::/0":   {v6Route},
 		}
+
 		filtered := rs.FilterSelectedExitNodes(routes)
-		assert.Empty(t, filtered, "deselecting v4 base must drop the v6 pair even if it was explicitly selected before")
+		assert.NotContains(t, filtered, route.HAUniqueID("exit1|0.0.0.0/0"))
+		assert.Contains(t, filtered, route.HAUniqueID("exit1-v6|::/0"), "explicit v6 select wins over v4 base")
 	})

-	t.Run("filter drops synced v6 pair of deselected v4 base", func(t *testing.T) {
+	t.Run("non-v6-suffix routes unaffected", func(t *testing.T) {
+		rs := routeselector.NewRouteSelector()
+		require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
+
+		// A route literally named "exit1-something" must not pair-resolve.
+		assert.False(t, rs.HasUserSelectionForRoute("exit1-something"))
+	})
+
+	t.Run("filter v6 paired with deselected v4 base", func(t *testing.T) {
 		rs := routeselector.NewRouteSelector()
 		require.NoError(t, rs.DeselectRoutes([]route.NetID{"exit1"}, all))
-		rs.SyncPairedSelection("exit1", "exit1-v6")

 		v4Route := &route.Route{NetID: "exit1", Network: netip.MustParsePrefix("0.0.0.0/0")}
 		v6Route := &route.Route{NetID: "exit1-v6", Network: netip.MustParsePrefix("::/0")}
@@ -424,15 +399,6 @@ func TestRouteSelector_V6ExitPairSync(t *testing.T) {
 		assert.Empty(t, filtered, "deselecting v4 base must also drop the v6 pair")
 	})

-	t.Run("deselectAll makes sync a no-op", func(t *testing.T) {
-		rs := routeselector.NewRouteSelector()
-		rs.DeselectAllRoutes()
-
-		rs.SyncPairedSelection("exit1", "exit1-v6")
-
-		assert.False(t, rs.HasUserSelectionForRoute("exit1-v6"), "sync must not write explicit state under deselectAll")
-	})
-
 	t.Run("non-exit *-v6 routes pass through FilterSelectedExitNodes", func(t *testing.T) {
 		rs := routeselector.NewRouteSelector()
 		require.NoError(t, rs.DeselectRoutes([]route.NetID{"corp"}, all))
--- a/client/ios/NetBirdSDK/client.go
+++ b/client/ios/NetBirdSDK/client.go
@@ -17,7 +17,6 @@ import (

 	"github.com/netbirdio/netbird/client/internal"
 	"github.com/netbirdio/netbird/client/internal/auth"
-	"github.com/netbirdio/netbird/client/internal/debug"
 	"github.com/netbirdio/netbird/client/internal/dns"
 	"github.com/netbirdio/netbird/client/internal/listener"
 	"github.com/netbirdio/netbird/client/internal/peer"
@@ -26,7 +25,6 @@ import (
 	"github.com/netbirdio/netbird/formatter"
 	"github.com/netbirdio/netbird/route"
 	"github.com/netbirdio/netbird/shared/management/domain"
-	types "github.com/netbirdio/netbird/upload-server/types"
 )

 // ConnectionListener export internal Listener for mobile
@@ -56,7 +54,6 @@ type selectRoute struct {
 	Network       netip.Prefix
 	Domains       domain.List
 	Selected      bool
-	Status        string
 	extraNetworks []netip.Prefix
 }

@@ -68,8 +65,6 @@ func init() {
 type Client struct {
 	cfgFile               string
 	stateFile             string
-	cacheDir              string
-	logFilePath           string
 	recorder              *peer.Status
 	ctxCancel             context.CancelFunc
 	ctxCancelLock         *sync.Mutex
@@ -80,21 +75,16 @@ type Client struct {
 	onHostDnsFn           func([]string)
 	dnsManager            dns.IosDnsManager
 	loginComplete         bool
+	connectClient         *internal.ConnectClient
 	// preloadedConfig holds config loaded from JSON (used on tvOS where file writes are blocked)
 	preloadedConfig *profilemanager.Config
-
-	stateMu       sync.RWMutex
-	connectClient *internal.ConnectClient
-	config        *profilemanager.Config
 }

 // NewClient instantiate a new Client
-func NewClient(cfgFile, stateFile, cacheDir, logFilePath, deviceName string, osVersion string, osName string, networkChangeListener NetworkChangeListener, dnsManager DnsManager) *Client {
+func NewClient(cfgFile, stateFile, deviceName string, osVersion string, osName string, networkChangeListener NetworkChangeListener, dnsManager DnsManager) *Client {
 	return &Client{
 		cfgFile:               cfgFile,
 		stateFile:             stateFile,
-		cacheDir:              cacheDir,
-		logFilePath:           logFilePath,
 		deviceName:            deviceName,
 		osName:                osName,
 		osVersion:             osVersion,
@@ -171,13 +161,8 @@ func (c *Client) Run(fd int32, interfaceName string, envList *EnvList) error {
 	c.onHostDnsFn = func([]string) {}
 	cfg.WgIface = interfaceName

-	connectClient := internal.NewConnectClient(ctx, cfg, c.recorder)
-	c.setState(cfg, connectClient)
-	// Persist the latest sync response so DebugBundle can include the network
-	// map. On iOS this is backed by disk to keep it out of the constrained
-	// process memory (see the syncstore package).
-	connectClient.SetSyncResponsePersistence(true)
-	return connectClient.RunOniOS(fd, c.networkChangeListener, c.dnsManager, c.stateFile, c.cacheDir, c.logFilePath)
+	c.connectClient = internal.NewConnectClient(ctx, cfg, c.recorder)
+	return c.connectClient.RunOniOS(fd, c.networkChangeListener, c.dnsManager, c.stateFile)
 }

 // Stop the internal client and free the resources
@@ -189,84 +174,6 @@ func (c *Client) Stop() {
 	}

 	c.ctxCancel()
-	c.setState(nil, nil)
-}
-
-// DebugBundle generates a debug bundle, uploads it and returns the upload key.
-// It works with or without a running engine: when the engine is up it reuses
-// the live config, sync response and client metrics; otherwise it loads the
-// config from disk (or the preloaded tvOS config).
-func (c *Client) DebugBundle(anonymize bool) (string, error) {
-	cfg, cc := c.stateSnapshot()
-
-	// If the engine hasn't been started, load config so we can reach management.
-	if cfg == nil {
-		if c.preloadedConfig != nil {
-			cfg = c.preloadedConfig
-		} else {
-			var err error
-			// Use DirectUpdateOrCreateConfig to avoid atomic file operations
-			// (temp file + rename) blocked by the tvOS sandbox.
-			cfg, err = profilemanager.DirectUpdateOrCreateConfig(profilemanager.ConfigInput{
-				ConfigPath:    c.cfgFile,
-				StateFilePath: c.stateFile,
-			})
-			if err != nil {
-				return "", fmt.Errorf("load config: %w", err)
-			}
-		}
-	}
-
-	deps := debug.GeneratorDependencies{
-		InternalConfig: cfg,
-		StatusRecorder: c.recorder,
-		TempDir:        c.cacheDir,
-		StatePath:      c.stateFile,
-		LogPath:        c.logFilePath,
-	}
-
-	if cc != nil {
-		resp, err := cc.GetLatestSyncResponse()
-		if err != nil {
-			log.Warnf("get latest sync response: %v", err)
-		}
-		deps.SyncResponse = resp
-
-		if e := cc.Engine(); e != nil {
-			if cm := e.GetClientMetrics(); cm != nil {
-				deps.ClientMetrics = cm
-			}
-		}
-	}
-
-	bundleGenerator := debug.NewBundleGenerator(
-		deps,
-		debug.BundleConfig{
-			Anonymize:         anonymize,
-			IncludeSystemInfo: true,
-		},
-	)
-
-	path, err := bundleGenerator.Generate()
-	if err != nil {
-		return "", fmt.Errorf("generate debug bundle: %w", err)
-	}
-	defer func() {
-		if err := os.Remove(path); err != nil {
-			log.Errorf("failed to remove debug bundle file: %v", err)
-		}
-	}()
-
-	uploadCtx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
-	defer cancel()
-
-	key, err := debug.UploadDebugBundle(uploadCtx, types.DefaultBundleURL, cfg.ManagementURL.String(), path)
-	if err != nil {
-		return "", fmt.Errorf("upload debug bundle: %w", err)
-	}
-
-	log.Infof("debug bundle uploaded with key %s", key)
-	return key, nil
 }

 // SetTraceLogLevel configure the logger to trace level
@@ -320,16 +227,6 @@ func (c *Client) RemoveConnectionListener() {
 	c.recorder.RemoveConnectionListener()
 }

-// IsLoginRequiredCached reports whether the LAST observed management error was an
-// auth failure (PermissionDenied/InvalidArgument), using the in-memory status
-// recorder. Unlike IsLoginRequired() it performs NO network call, so it is safe to
-// call from the connection listener during teardown (e.g. onDisconnected) without
-// blocking on a slow or unavailable network. Returns false while connected to
-// management or when the last error was not auth-related.
-func (c *Client) IsLoginRequiredCached() bool {
-	return c.recorder.IsLoginRequired()
-}
-
 func (c *Client) IsLoginRequired() bool {
 	var ctx context.Context
 	//nolint
@@ -457,12 +354,11 @@ func (c *Client) ClearLoginComplete() {
 }

 func (c *Client) GetRoutesSelectionDetails() (*RoutesSelectionDetails, error) {
-	_, connectClient := c.stateSnapshot()
-	if connectClient == nil {
+	if c.connectClient == nil {
 		return nil, fmt.Errorf("not connected")
 	}

-	engine := connectClient.Engine()
+	engine := c.connectClient.Engine()
 	if engine == nil {
 		return nil, fmt.Errorf("not connected")
 	}
@@ -481,57 +377,9 @@ func (c *Client) GetRoutesSelectionDetails() (*RoutesSelectionDetails, error) {
 	routes := buildSelectRoutes(routesMap, routeSelector.IsSelected, v6ExitMerged)
 	resolvedDomains := c.recorder.GetResolvedDomainsStates()

-	// Compute each route's connection status in the core (mirroring the Android
-	// bridge), so the UI doesn't have to infer it by string-matching the joined
-	// Network value against peer routes. For a merged exit node the status reflects
-	// whichever of the v4/v6 prefixes is served by a connected peer; for dynamic
-	// (DNS) routes the peer route key is the domain pattern (see dynamic.Route.String).
-	connectedRoutes := c.connectedRouteSet()
-	for _, r := range routes {
-		r.Status = routeStatus(r, connectedRoutes)
-	}
-
 	return prepareRouteSelectionDetails(routes, resolvedDomains), nil
 }

-// connectedRouteSet returns the set of route keys (as strings) currently served by a
-// connected peer, gathered across all connected peers' route tables. The keys match
-// what the route manager records: a prefix string for static routes (e.g. "0.0.0.0/0")
-// and the domain pattern for dynamic routes (e.g. "*.example.com").
-func (c *Client) connectedRouteSet() map[string]struct{} {
-	connected := map[string]struct{}{}
-	for _, p := range c.recorder.GetFullStatus().Peers {
-		if p.ConnStatus != peer.StatusConnected {
-			continue
-		}
-		for r := range p.GetRoutes() {
-			connected[r] = struct{}{}
-		}
-	}
-	return connected
-}
-
-// routeStatus reports "Connected" if any of the route's keys is served by a connected
-// peer: the primary Network prefix, an extra v6 network of a merged exit node, or the
-// domain pattern for a dynamic DNS route. Otherwise "Idle".
-func routeStatus(r *selectRoute, connectedRoutes map[string]struct{}) string {
-	keys := make([]string, 0, 1+len(r.extraNetworks))
-	if len(r.Domains) > 0 {
-		keys = append(keys, r.Domains.SafeString())
-	} else {
-		keys = append(keys, r.Network.String())
-	}
-	for _, extra := range r.extraNetworks {
-		keys = append(keys, extra.String())
-	}
-	for _, k := range keys {
-		if _, ok := connectedRoutes[k]; ok {
-			return peer.StatusConnected.String()
-		}
-	}
-	return peer.StatusIdle.String()
-}
-
 func buildSelectRoutes(routesMap map[route.NetID][]*route.Route, isSelected func(route.NetID) bool, v6Merged map[route.NetID]struct{}) []*selectRoute {
 	var routes []*selectRoute
 	for id, rt := range routesMap {
@@ -614,7 +462,6 @@ func prepareRouteSelectionDetails(routes []*selectRoute, resolvedDomains map[dom
 			Network:  netStr,
 			Domains:  &domainDetails,
 			Selected: r.Selected,
-			Status:   r.Status,
 		})
 	}

@@ -623,12 +470,11 @@ func prepareRouteSelectionDetails(routes []*selectRoute, resolvedDomains map[dom
 }

 func (c *Client) SelectRoute(id string) error {
-	_, connectClient := c.stateSnapshot()
-	if connectClient == nil {
+	if c.connectClient == nil {
 		return fmt.Errorf("not connected")
 	}

-	engine := connectClient.Engine()
+	engine := c.connectClient.Engine()
 	if engine == nil {
 		return fmt.Errorf("not connected")
 	}
@@ -654,11 +500,10 @@ func (c *Client) SelectRoute(id string) error {
 }

 func (c *Client) DeselectRoute(id string) error {
-	_, connectClient := c.stateSnapshot()
-	if connectClient == nil {
+	if c.connectClient == nil {
 		return fmt.Errorf("not connected")
 	}
-	engine := connectClient.Engine()
+	engine := c.connectClient.Engine()
 	if engine == nil {
 		return fmt.Errorf("not connected")
 	}
@@ -682,22 +527,6 @@ func (c *Client) DeselectRoute(id string) error {
 	return nil
 }

-// setState stores the running engine state so DebugBundle can reuse the live
-// config and ConnectClient. It is cleared on Stop.
-func (c *Client) setState(cfg *profilemanager.Config, cc *internal.ConnectClient) {
-	c.stateMu.Lock()
-	defer c.stateMu.Unlock()
-	c.config = cfg
-	c.connectClient = cc
-}
-
-// stateSnapshot returns the current config and ConnectClient under the lock.
-func (c *Client) stateSnapshot() (*profilemanager.Config, *internal.ConnectClient) {
-	c.stateMu.RLock()
-	defer c.stateMu.RUnlock()
-	return c.config, c.connectClient
-}
-
 func formatDuration(d time.Duration) string {
 	ds := d.String()
 	dotIndex := strings.Index(ds, ".")
--- a/client/ios/NetBirdSDK/login.go
+++ b/client/ios/NetBirdSDK/login.go
@@ -36,7 +36,6 @@ type URLOpener interface {
 // Auth can register or login new client
 type Auth struct {
 	ctx     context.Context
-	cancel  context.CancelFunc
 	config  *profilemanager.Config
 	cfgPath string
 }
@@ -52,19 +51,8 @@ func NewAuth(cfgPath string, mgmURL string) (*Auth, error) {
 		return nil, err
 	}

-	// Use a cancellable context so Stop() can abort an in-progress interactive
-	// login. The PKCE flow's WaitToken blocks (and keeps its loopback HTTP server
-	// bound to a port) until the OAuth callback arrives or the flow expires;
-	// cancelling the context unblocks WaitToken, which then shuts that server down
-	// and frees the port for the next login attempt. iOS runs login in the main-app
-	// process (decoupled from the network extension), so without this the server
-	// lingers after the user dismisses the browser and the next connect stalls
-	// trying to bind the same port.
-	ctx, cancel := context.WithCancel(context.Background())
-
 	return &Auth{
-		ctx:     ctx,
-		cancel:  cancel,
+		ctx:     context.Background(),
 		config:  cfg,
 		cfgPath: cfgPath,
 	}, nil
@@ -72,24 +60,12 @@ func NewAuth(cfgPath string, mgmURL string) (*Auth, error) {

 // NewAuthWithConfig instantiate Auth based on existing config
 func NewAuthWithConfig(ctx context.Context, config *profilemanager.Config) *Auth {
-	ctx, cancel := context.WithCancel(ctx)
 	return &Auth{
 		ctx:    ctx,
-		cancel: cancel,
 		config: config,
 	}
 }

-// Stop aborts an in-progress interactive login started via Login/LoginWithDeviceName.
-// It cancels the auth context, which unblocks the PKCE WaitToken and shuts down its
-// loopback HTTP server, freeing the redirect port. Safe to call multiple times and
-// safe to call when no login is running.
-func (a *Auth) Stop() {
-	if a.cancel != nil {
-		a.cancel()
-	}
-}
-
 // SaveConfigIfSSOSupported test the connectivity with the management server by retrieving the server device flow info.
 // If it returns a flow info than save the configuration and return true. If it gets a codes.NotFound, it means that SSO
 // is not supported and returns false without saving the configuration. For other errors return false.
--- a/client/ios/NetBirdSDK/routes.go
+++ b/client/ios/NetBirdSDK/routes.go
@@ -20,7 +20,6 @@ type RoutesSelectionInfo struct {
 	Network  string
 	Domains  *DomainDetails
 	Selected bool
-	Status   string
 }

 type DomainCollection interface {
--- a/client/server/server.go
+++ b/client/server/server.go
@@ -988,6 +988,10 @@ func (s *Server) cleanupConnection() error {
 		return nil
 	}

+	// TODO: consider calling s.connectClient.Stop() instead of engine.Stop().
+	// actCancel() lets the run loop stop the engine too, so both stop it
+	// concurrently; ConnectClient.Stop cancels and waits for the run loop,
+	// making the run loop the sole owner of engine shutdown.
 	if engine != nil {
 		if err := engine.Stop(); err != nil {
 			return err
--- a/management/internals/modules/reverseproxy/service/manager/manager.go
+++ b/management/internals/modules/reverseproxy/service/manager/manager.go
@@ -918,10 +918,6 @@ func (m *Manager) DeleteAllServices(ctx context.Context, accountID, userID strin
 		}

 		for _, svc := range services {
-			if err = transaction.DeleteServiceTargets(ctx, accountID, svc.ID); err != nil {
-				return fmt.Errorf("failed to delete service targets: %w", err)
-			}
-
 			if err = transaction.DeleteService(ctx, accountID, svc.ID); err != nil {
 				return fmt.Errorf("failed to delete service: %w", err)
 			}
@@ -1274,10 +1270,6 @@ func (m *Manager) deletePeerService(ctx context.Context, accountID, peerID, serv
 			return status.Errorf(status.PermissionDenied, "cannot delete service exposed by another peer")
 		}

-		if err = transaction.DeleteServiceTargets(ctx, accountID, serviceID); err != nil {
-			return fmt.Errorf("delete service targets: %w", err)
-		}
-
 		if err = transaction.DeleteService(ctx, accountID, serviceID); err != nil {
 			return fmt.Errorf("delete service: %w", err)
 		}
@@ -1327,10 +1319,6 @@ func (m *Manager) deleteExpiredPeerService(ctx context.Context, accountID, peerI
 			return nil
 		}

-		if err = transaction.DeleteServiceTargets(ctx, accountID, serviceID); err != nil {
-			return fmt.Errorf("delete service targets: %w", err)
-		}
-
 		if err = transaction.DeleteService(ctx, accountID, serviceID); err != nil {
 			return fmt.Errorf("delete service: %w", err)
 		}
--- a/management/internals/modules/reverseproxy/service/manager/manager_test.go
+++ b/management/internals/modules/reverseproxy/service/manager/manager_test.go
@@ -458,9 +458,6 @@ func TestDeletePeerService_SourcePeerValidation(t *testing.T) {
 				txMock.EXPECT().
 					GetServiceByID(ctx, store.LockingStrengthUpdate, accountID, serviceID).
 					Return(newEphemeralService(), nil)
-				txMock.EXPECT().
-					DeleteServiceTargets(ctx, accountID, serviceID).
-					Return(nil)
 				txMock.EXPECT().
 					DeleteService(ctx, accountID, serviceID).
 					Return(nil)
@@ -563,9 +560,6 @@ func TestDeletePeerService_SourcePeerValidation(t *testing.T) {
 				txMock.EXPECT().
 					GetServiceByID(ctx, store.LockingStrengthUpdate, accountID, serviceID).
 					Return(newEphemeralService(), nil)
-				txMock.EXPECT().
-					DeleteServiceTargets(ctx, accountID, serviceID).
-					Return(nil)
 				txMock.EXPECT().
 					DeleteService(ctx, accountID, serviceID).
 					Return(nil)
@@ -610,9 +604,6 @@ func TestDeletePeerService_SourcePeerValidation(t *testing.T) {
 				txMock.EXPECT().
 					GetServiceByID(ctx, store.LockingStrengthUpdate, accountID, serviceID).
 					Return(newEphemeralService(), nil)
-				txMock.EXPECT().
-					DeleteServiceTargets(ctx, accountID, serviceID).
-					Return(nil)
 				txMock.EXPECT().
 					DeleteService(ctx, accountID, serviceID).
 					Return(nil)
@@ -1201,67 +1192,6 @@ func TestDeleteService_DeletesTargets(t *testing.T) {
 	assert.Len(t, targets, 0, "All targets should be deleted when service is deleted")
 }

-func TestDeleteExpiredPeerService_DeletesTargets(t *testing.T) {
-	ctx := context.Background()
-	mgr, testStore := setupIntegrationTest(t)
-
-	resp, err := mgr.CreateServiceFromPeer(ctx, testAccountID, testPeerID, &rpservice.ExposeServiceRequest{
-		Port: 8080,
-		Mode: "http",
-	})
-	require.NoError(t, err)
-
-	svcID := resolveServiceIDByDomain(t, testStore, resp.Domain)
-
-	targets, err := testStore.GetTargetsByServiceID(ctx, store.LockingStrengthNone, testAccountID, svcID)
-	require.NoError(t, err)
-	require.Len(t, targets, 1, "ephemeral peer-exposed service should have exactly one persisted target before reaping")
-
-	expireEphemeralService(t, testStore, testAccountID, resp.Domain)
-	err = mgr.deleteExpiredPeerService(ctx, testAccountID, testPeerID, svcID)
-	require.NoError(t, err)
-
-	_, err = testStore.GetServiceByDomain(ctx, resp.Domain)
-	require.Error(t, err, "expired peer-exposed service should be deleted")
-	s, ok := status.FromError(err)
-	require.True(t, ok)
-	assert.Equal(t, status.NotFound, s.Type())
-
-	targets, err = testStore.GetTargetsByServiceID(ctx, store.LockingStrengthNone, testAccountID, svcID)
-	require.NoError(t, err)
-	assert.Len(t, targets, 0, "orphaned target rows must be deleted when an expired peer-exposed service is reaped")
-}
-
-func TestDeleteServiceFromPeer_DeletesTargets(t *testing.T) {
-	ctx := context.Background()
-	mgr, testStore := setupIntegrationTest(t)
-
-	resp, err := mgr.CreateServiceFromPeer(ctx, testAccountID, testPeerID, &rpservice.ExposeServiceRequest{
-		Port: 8080,
-		Mode: "http",
-	})
-	require.NoError(t, err)
-
-	svcID := resolveServiceIDByDomain(t, testStore, resp.Domain)
-
-	targets, err := testStore.GetTargetsByServiceID(ctx, store.LockingStrengthNone, testAccountID, svcID)
-	require.NoError(t, err)
-	require.Len(t, targets, 1, "ephemeral peer-exposed service should have exactly one persisted target before stopping")
-
-	err = mgr.StopServiceFromPeer(ctx, testAccountID, testPeerID, svcID)
-	require.NoError(t, err)
-
-	_, err = testStore.GetServiceByDomain(ctx, resp.Domain)
-	require.Error(t, err, "stopped peer-exposed service should be deleted")
-	s, ok := status.FromError(err)
-	require.True(t, ok)
-	assert.Equal(t, status.NotFound, s.Type())
-
-	targets, err = testStore.GetTargetsByServiceID(ctx, store.LockingStrengthNone, testAccountID, svcID)
-	require.NoError(t, err)
-	assert.Len(t, targets, 0, "orphaned target rows must be deleted when a peer stops its exposed service")
-}
-
 func TestValidateProtocolChange(t *testing.T) {
 	tests := []struct {
 		name    string
--- a/proxy/server.go
+++ b/proxy/server.go
@@ -1989,7 +1989,7 @@ func (s *Server) addUDPRelay(ctx context.Context, mapping *proto.ProxyMapping, t
 		"service_id":  svcID,
 	})

-	relay := udprelay.New(s.portRouterContext(ctx), udprelay.RelayConfig{
+	relay := udprelay.New(ctx, udprelay.RelayConfig{
 		Logger:      entry,
 		Listener:    listener,
 		Target:      targetAddress,
--- a/shared/signal/client/client.go
+++ b/shared/signal/client/client.go
@@ -33,7 +33,7 @@ type Client interface {
 	Receive(ctx context.Context, msgHandler func(msg *proto.Message) error) error
 	Ready() bool
 	IsHealthy() bool
-	WaitStreamConnected()
+	WaitStreamConnected(context.Context)
 	SendToStream(msg *proto.EncryptedMessage) error
 	Send(msg *proto.Message) error
 	SetOnReconnectedListener(func())
--- a/shared/signal/client/client_test.go
+++ b/shared/signal/client/client_test.go
@@ -65,7 +65,10 @@ var _ = Describe("GrpcClient", func() {
 						return
 					}
 				}()
-				clientA.WaitStreamConnected()
+				ctxA, cancelA := context.WithTimeout(context.Background(), 5*time.Second)
+				defer cancelA()
+				clientA.WaitStreamConnected(ctxA)
+				Expect(clientA.StreamConnected()).To(BeTrue())

 				// connect PeerB to Signal
 				keyB, _ := wgtypes.GenerateKey()
@@ -91,7 +94,10 @@ var _ = Describe("GrpcClient", func() {
 					}
 				}()

-				clientB.WaitStreamConnected()
+				ctxB, cancelB := context.WithTimeout(context.Background(), 5*time.Second)
+				defer cancelB()
+				clientB.WaitStreamConnected(ctxB)
+				Expect(clientB.StreamConnected()).To(BeTrue())

 				// PeerA initiates ping-pong
 				err := clientA.Send(&sigProto.Message{
@@ -129,8 +135,10 @@ var _ = Describe("GrpcClient", func() {
 						return
 					}
 				}()
-				client.WaitStreamConnected()
-				Expect(client).NotTo(BeNil())
+				ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+				defer cancel()
+				client.WaitStreamConnected(ctx)
+				Expect(client.StreamConnected()).To(BeTrue())
 			})
 		})

--- a/shared/signal/client/grpc.go
+++ b/shared/signal/client/grpc.go
@@ -213,15 +213,6 @@ func (c *GrpcClient) notifyStreamConnected() {
 	}
 }

-func (c *GrpcClient) getStreamStatusChan() <-chan struct{} {
-	c.mux.Lock()
-	defer c.mux.Unlock()
-	if c.connectedCh == nil {
-		c.connectedCh = make(chan struct{})
-	}
-	return c.connectedCh
-}
-
 func (c *GrpcClient) connect(ctx context.Context, key string) (proto.SignalExchange_ConnectStreamClient, error) {
 	c.stream = nil

@@ -282,14 +273,24 @@ func (c *GrpcClient) IsHealthy() bool {
 }

 // WaitStreamConnected waits until the client is connected to the Signal stream
-func (c *GrpcClient) WaitStreamConnected() {
-
+func (c *GrpcClient) WaitStreamConnected(ctx context.Context) {
+	// Check the status and obtain the wait channel atomically: otherwise
+	// notifyStreamConnected could flip the status and close/clear the channel
+	// between the check and the channel creation, leaving us waiting forever on
+	// a stale channel.
+	c.mux.Lock()
 	if c.status == StreamConnected {
+		c.mux.Unlock()
 		return
 	}
+	if c.connectedCh == nil {
+		c.connectedCh = make(chan struct{})
+	}
+	ch := c.connectedCh
+	c.mux.Unlock()

-	ch := c.getStreamStatusChan()
 	select {
+	case <-ctx.Done():
 	case <-c.ctx.Done():
 	case <-ch:
 	}
--- a/shared/signal/client/mock.go
+++ b/shared/signal/client/mock.go
@@ -55,7 +55,7 @@ func (sm *MockClient) Ready() bool {
 	return sm.ReadyFunc()
 }

-func (sm *MockClient) WaitStreamConnected() {
+func (sm *MockClient) WaitStreamConnected(context.Context) {
 	if sm.WaitStreamConnectedFunc == nil {
 		return
 	}
Author	SHA1	Message	Date
Zoltán Papp	f4e2836d3a	[client] clean up all started components on Start failure Start's failure defer only called close(), which covers the wg interface, firewall, rosenpass and port forwarding but leaves connMgr, srWatcher, route/DNS/flow/state managers and the monitor goroutines running. A late failure (e.g. the context-cancelled check after the signal stream) thus leaked them. Extract Stop's locked teardown into stopLocked (caller holds syncMsgMux, does not wait on shutdownWg) and call it from both Stop and Start's defer. The defer also cancels the run context first so goroutines started before the failure unwind. Teardown order is unchanged.	2026-06-16 15:14:47 +02:00
Zoltán Papp	3190347849	[client] abort Start if context cancelled while waiting for signal stream receiveSignalEvents blocks in WaitStreamConnected until the signal stream connects or the context is cancelled. If Stop cancelled e.ctx while Start was parked there, Start kept going: it started the remaining subsystems on a cancelled context and marked a shutting-down engine as started. Return the context error from receiveSignalEvents and propagate it from Start, so the deferred cleanup runs and the cancellation reaches the caller.	2026-06-16 15:03:29 +02:00
Zoltán Papp	90af9dd8ae	[client] fix WaitStreamConnected stale-channel race The StreamConnected check and the wait-channel creation took the mutex separately, so notifyStreamConnected could set the status and close/clear connectedCh in between: the waiter then created a fresh channel nobody would ever close and blocked forever. Also, the status read was unlocked while notify wrote it under the mutex (a data race). Do the check and the channel fetch in one locked section; drop the now-unused getStreamStatusChan helper. Pre-existing bug, not introduced by this branch.	2026-06-16 14:51:17 +02:00
Zoltán Papp	5cf865b243	[client] bound WaitStreamConnected in signal client tests The tests waited on WaitStreamConnected with context.Background() and the client's own context was also Background, so a stream that never connects would hang until the suite timeout. Pass a 5s timeout context and assert StreamConnected afterwards so the tests fail fast with a clear reason.	2026-06-16 14:47:07 +02:00
Zoltán Papp	67b362b4a4	[client] interrupt connect backoff on context cancel The run loop retried with a raw ExponentialBackOff, so a backoff sleep ignored context cancellation. Now that ConnectClient.Stop waits for the run loop to exit, a cancel landing during a sleep would block Stop for the full interval (up to MaxInterval). Wrap the backoff with the run context so Retry returns promptly on cancel; the retry budget itself (MaxElapsedTime) is unchanged.	2026-06-16 14:42:35 +02:00
Zoltán Papp	32fccdeede	[client] init context state in engine tests Engine tests built the engine context with context.WithCancel(context.Background()), omitting CtxInitState. Now that the run context is created in the constructor, the wgIfaceMonitor goroutine can reach triggerClientRestart during teardown, which calls CtxGetState and panics on the missing state. Real entry points (up, embed, service) always CtxInitState; only the tests skipped it.	2026-06-16 14:35:44 +02:00
Zoltán Papp	98c71d7913	[client] fix Start/Stop race by making the run loop own engine shutdown ConnectClient.Stop stopped the engine directly while the run loop's backoff cycle could still be starting an engine, so Engine.close raced Engine.Start (e.g. firewall setup reading wgInterface while close nils it). embed.Client.Start's rollback only avoided a deadlock by cancelling before Stop; the race itself remained and was caught by -race. Make the run loop the sole owner of engine shutdown: derive the run context in NewConnectClient, and have Stop cancel it and wait for the loop to exit (skipping the wait when the loop never ran) instead of calling engine.Stop. The loop now always stops the engine on its way out, dropping the unsynchronised wgInterface check it used to guard that call. Self-calls from within the loop use runCancel to avoid waiting on themselves. embed keeps a defensive pre-Stop cancel(); the daemon's cleanupConnection gets a TODO to adopt Stop() rather than stopping the engine in parallel.	2026-06-16 13:57:41 +02:00
Zoltán Papp	002e0b036f	[client] let engine context unblock WaitStreamConnected WaitStreamConnected only watched the signal client's own context, which derives from the parent engineCtx rather than the engine's run context. A Start blocked here (signal stream not yet up) could therefore not be released by Engine.Stop, since Stop only cancels the engine's run context. Pass a context into WaitStreamConnected and select on it too, and have the engine pass e.ctx, so Stop cancelling e.ctx unblocks a parked Start. Update the Client interface, the mock, and callers accordingly.	2026-06-16 13:11:46 +02:00
Zoltán Papp	c370c72d93	[client] make Engine single-use and guard against double Start Create the run context once in NewEngine instead of in Start. This keeps e.cancel valid for the engine's whole lifetime, so Stop can cancel a Start that is blocked waiting on the network while holding syncMsgMux: Stop now cancels before taking the lock, unblocking that Start so it can release the mutex. Reject re-entry into Start: a non-nil wgInterface means a prior Start already ran (ErrEngineAlreadyStarted), and a cancelled run context means the engine was stopped (ErrEngineAlreadyStopped). Both checks run before the cleanup defer so a duplicate call cannot tear down the running engine's state.	2026-06-16 13:11:46 +02:00
Zoltán Papp	5895a39380	[client] always clean up on Engine.Start failure via defer The rosenpass init paths (NewManager/Run) returned without calling e.close(), leaking the WireGuard interface and other partially initialized state on failure. Per-branch cleanup was easy to miss when adding new early returns. Convert Start to a named error return and tear down via a single defer that calls e.close() whenever err != nil, removing the scattered per-branch close() calls (including the redundant one in initFirewall).	2026-06-16 11:54:08 +02:00