[client] Refresh signal receive liveness when worker handoff drains (#6594)

[client] report management unhealthy while Sync stream is failing (#6575)
* fix(mgm): report management unhealthy while Sync stream is failing

  The health probe (IsHealthy) only checked the gRPC transport and a GetServerKey call. GetServerKey succeeds even when the peer cannot sync (e.g. the server returns "settings not found"), so the probe kept marking management Connected while the Sync stream failed in a tight retry loop, pinning the status to "Connected" forever despite no sync ever succeeding.

  Track the last Sync stream error and have IsHealthy consult it, so a healthy transport is no longer enough to report the connection healthy.

* fix(mgm): record disconnected state when sync stream setup fails

  The connectToSyncStream failure path in handleSyncStream returned early without updating syncStreamErr, so the client could still report healthy even when stream setup failed. Mirror the receiveUpdatesEvents error path by calling notifyDisconnected and setSyncStreamDisconnected.
2026-06-29 11:19:56 +00:00 · 2026-06-29 12:16:47 +02:00 · 2026-06-29 11:28:58 +02:00 · 2026-06-29 11:24:25 +02:00 · 2026-06-29 11:02:02 +02:00 · 2026-06-29 09:19:01 +02:00
280 changed files with 4996 additions and 41079 deletions
--- a/.github/workflows/check-license-dependencies.yml
+++ b/.github/workflows/check-license-dependencies.yml
@@ -64,7 +64,7 @@ jobs:
          persist-credentials: false

      - name: Set up Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: true
--- a/.github/workflows/golang-test-darwin.yml
+++ b/.github/workflows/golang-test-darwin.yml
@@ -21,13 +21,13 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false

      - name: Cache Go modules
-        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: ~/go/pkg/mod
          key: macos-gotest-${{ hashFiles('**/go.sum') }}
@@ -45,7 +45,7 @@ jobs:
        run: git --no-pager diff --exit-code

      - name: Test
-        run: NETBIRD_STORE_ENGINE=${{ matrix.store }} CI=true go test -coverprofile=coverage.txt -tags=devcert -exec 'sudo --preserve-env=CI,NETBIRD_STORE_ENGINE' -timeout 5m -p 1 $(go list ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined)
+        run: NETBIRD_STORE_ENGINE=${{ matrix.store }} CI=true go test -coverprofile=coverage.txt -tags 'devcert privileged' -exec 'sudo --preserve-env=CI,NETBIRD_STORE_ENGINE' -timeout 5m -p 1 $(go list ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined -e /client/testutil/privileged)

      - name: Upload coverage reports to Codecov
        uses: codecov/codecov-action@fb8b3582c8e4def4969c97caa2f19720cb33a72f #v7.0.0
--- a/.github/workflows/golang-test-freebsd.yml
+++ b/.github/workflows/golang-test-freebsd.yml
@@ -48,14 +48,14 @@ jobs:
            export PATH=$PATH:/usr/local/go/bin:$HOME/go/bin
            time go build -o netbird client/main.go
            # check all component except management, since we do not support management server on freebsd
-            time go test -timeout 1m -failfast ./base62/...
+            time go test -tags privileged -timeout 1m -failfast ./base62/...
            # NOTE: without -p1 `client/internal/dns` will fail because of `listen udp4 :33100: bind: address already in use`
-            time go test -timeout 8m -failfast -v -p 1 ./client/...
-            time go test -timeout 1m -failfast ./dns/...
-            time go test -timeout 1m -failfast ./encryption/...
-            time go test -timeout 1m -failfast ./formatter/...
-            time go test -timeout 1m -failfast ./client/iface/...
-            time go test -timeout 1m -failfast ./route/...
-            time go test -timeout 1m -failfast ./sharedsock/...
-            time go test -timeout 1m -failfast ./util/...
-            time go test -timeout 1m -failfast ./version/...
+            time go test -tags privileged -timeout 8m -failfast -v -p 1 ./client/...
+            time go test -tags privileged -timeout 1m -failfast ./dns/...
+            time go test -tags privileged -timeout 1m -failfast ./encryption/...
+            time go test -tags privileged -timeout 1m -failfast ./formatter/...
+            time go test -tags privileged -timeout 1m -failfast ./client/iface/...
+            time go test -tags privileged -timeout 1m -failfast ./route/...
+            time go test -tags privileged -timeout 1m -failfast ./sharedsock/...
+            time go test -tags privileged -timeout 1m -failfast ./util/...
+            time go test -tags privileged -timeout 1m -failfast ./version/...
--- a/.github/workflows/golang-test-linux.yml
+++ b/.github/workflows/golang-test-linux.yml
@@ -30,7 +30,7 @@ jobs:
              - 'management/**'

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -41,7 +41,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        id: cache
        with:
          path: |
@@ -124,7 +124,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -135,7 +135,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
@@ -158,7 +158,7 @@ jobs:
        run: git --no-pager diff --exit-code

      - name: Test
-        run: CGO_ENABLED=1 GOARCH=${{ matrix.arch }} CI=true go test -coverprofile=coverage.txt -tags devcert -exec 'sudo' -timeout 10m -p 1 $(go list ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined)
+        run: CGO_ENABLED=1 GOARCH=${{ matrix.arch }} CI=true go test -coverprofile=coverage.txt -tags devcert -timeout 10m -p 1 $(go list ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined)

      - name: Upload coverage reports to Codecov
        if: matrix.arch == 'amd64'
@@ -180,7 +180,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -192,7 +192,7 @@ jobs:
          echo "modcache_dir=$(go env GOMODCACHE)" >> $GITHUB_OUTPUT

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        id: cache-restore
        with:
          path: |
@@ -229,7 +229,7 @@ jobs:
            sh -c ' \
              apk update; apk add --no-cache \
                ca-certificates iptables ip6tables dbus dbus-dev libpcap-dev build-base; \
-              go test -buildvcs=false -tags devcert -v -timeout 10m -p 1 $(go list -buildvcs=false ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined -e /client/ui -e /upload-server)
+              go test -buildvcs=false -tags "devcert privileged" -v -timeout 10m -p 1 $(go list -buildvcs=false ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined -e /client/ui -e /upload-server -e /client/testutil/privileged)
            '

  test_relay:
@@ -251,7 +251,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -266,7 +266,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
@@ -311,7 +311,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -325,7 +325,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
@@ -368,7 +368,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -383,7 +383,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
@@ -429,7 +429,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -440,7 +440,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
@@ -534,7 +534,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -545,7 +545,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
@@ -629,7 +629,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -640,7 +640,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
@@ -699,7 +699,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
@@ -710,7 +710,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
--- a/.github/workflows/golang-test-windows.yml
+++ b/.github/workflows/golang-test-windows.yml
@@ -23,7 +23,7 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        id: go
        with:
          go-version-file: "go.mod"
@@ -35,7 +35,7 @@ jobs:
          echo "modcache=$(go env GOMODCACHE)" >> $env:GITHUB_ENV

      - name: Cache Go modules
-        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ${{ env.cache }}
@@ -68,7 +68,7 @@ jobs:
        run: |
          $packages = go list ./... | Where-Object { $_ -notmatch '/management' } | Where-Object { $_ -notmatch '/relay' } | Where-Object { $_ -notmatch '/signal' } | Where-Object { $_ -notmatch '/proxy' } | Where-Object { $_ -notmatch '/combined' }
          $goExe = "C:\hostedtoolcache\windows\go\${{ steps.go.outputs.go-version }}\x64\bin\go.exe"
-          $cmd = "$goExe test -tags=devcert -timeout 10m -p 1 $($packages -join ' ') > test-out.txt 2>&1"
+          $cmd = "$goExe test -tags `"devcert privileged`" -timeout 10m -p 1 $($packages -join ' ') > test-out.txt 2>&1"
          Set-Content -Path "${{ github.workspace }}\run-tests.cmd" -Value $cmd

      - name: test
--- a/.github/workflows/golangci-lint.yml
+++ b/.github/workflows/golangci-lint.yml
@@ -21,7 +21,7 @@ jobs:
      - name: codespell
        uses: codespell-project/actions-codespell@8f01853be192eb0f849a5c7d721450e7a467c579 # v2.2
        with:
-          ignore_words_list: erro,clienta,hastable,iif,groupd,testin,groupe,cros,ans,deriver,te,userA,ede,additionals,flate,recordin,unparseable
+          ignore_words_list: erro,clienta,hastable,iif,groupd,testin,groupe,cros,ans,deriver,te,userA,ede,additionals
          skip: go.mod,go.sum,**/proxy/web/**
  golangci:
    strategy:
@@ -48,7 +48,7 @@ jobs:
        run: |
          ! awk '/const \(/,/)/{print $0}' management/server/activity/codes.go | grep -o '= [0-9]*' | sort | uniq -d | grep .
      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
--- a/.github/workflows/mobile-build-validation.yml
+++ b/.github/workflows/mobile-build-validation.yml
@@ -20,7 +20,7 @@ jobs:
        with:
          persist-credentials: false
      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
      - name: Setup Android SDK
@@ -28,13 +28,13 @@ jobs:
        with:
          cmdline-tools-version: 8512546
      - name: Setup Java
-        uses: actions/setup-java@ad2b38190b15e4d6bdf0c97fb4fca8412226d287
+        uses: actions/setup-java@1bcf9fb12cf4aa7d266a90ae39939e61372fe520
        with:
          java-version: "11"
          distribution: "adopt"
      - name: NDK Cache
        id: ndk-cache
-        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: /usr/local/lib/android/sdk/ndk
          key: ndk-cache-23.1.7779620
@@ -58,7 +58,7 @@ jobs:
        with:
          persist-credentials: false
      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
      - name: install gomobile
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -166,12 +166,12 @@ jobs:
          fi

      - name: Set up Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
      - name: Cache Go modules
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache/restore@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ~/go/pkg/mod
@@ -374,12 +374,12 @@ jobs:
          fi

      - name: Set up Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
      - name: Cache Go modules
-        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ~/go/pkg/mod
@@ -469,12 +469,12 @@ jobs:
          fetch-depth: 0 # It is required for GoReleaser to work properly
          persist-credentials: false
      - name: Set up Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
          cache: false
      - name: Cache Go modules
-        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: |
            ~/go/pkg/mod
--- a/.github/workflows/test-infrastructure-files.yml
+++ b/.github/workflows/test-infrastructure-files.yml
@@ -73,12 +73,12 @@ jobs:
          persist-credentials: false

      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"

      - name: Cache Go modules
-        uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+        uses: actions/cache@2c8a9bd7457de244a408f35966fab2fb45fda9c8 # v6.0.0
        with:
          path: ~/go/pkg/mod
          key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
--- a/.github/workflows/wasm-build-validation.yml
+++ b/.github/workflows/wasm-build-validation.yml
@@ -23,7 +23,7 @@ jobs:
        with:
          persist-credentials: false
      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
      - name: Install dependencies
@@ -48,7 +48,7 @@ jobs:
        with:
          persist-credentials: false
      - name: Install Go
-        uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
+        uses: actions/setup-go@924ae3a1cded613372ab5595356fb5720e22ba16 # v6.5.0
        with:
          go-version-file: "go.mod"
      - name: Build Wasm client
--- a/14
+++ b/14
@@ -1,4 +1,4 @@
-.PHONY: lint lint-all lint-install setup-hooks
+.PHONY: lint lint-all lint-install setup-hooks test-unit test-privileged
 GOLANGCI_LINT := $(shell pwd)/bin/golangci-lint

 # Install golangci-lint locally if needed
@@ -25,3 +25,15 @@ setup-hooks:
 	@git config core.hooksPath .githooks
 	@chmod +x .githooks/pre-push
 	@echo "✅ Git hooks configured! Pre-push will now run 'make lint'"
+
+# Host-safe unit tests: excludes the privileged-tagged tests (root / system-mutating).
+# Runs as a normal user with no sudo and leaves host networking untouched.
+test-unit:
+	@go test -tags devcert -timeout 10m ./...
+
+# Privileged suite: runs the `privileged`-tagged tests inside a --privileged
+# --cap-add=NET_ADMIN container via the ory/dockertest harness. Requires Docker.
+# Narrow the run with env vars, e.g.:
+#   PRIV_RUN=TestNftablesManager PRIV_PKGS=./client/firewall/nftables/... make test-privileged
+test-privileged:
+	@go test -tags 'devcert privileged' -timeout 30m -run TestRunPrivilegedSuiteInDocker -v ./client/testutil/privileged/...
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@
  <br/>
  <br/>
  <strong>
-    🚀 <a href="https://careers.netbird.io">We are hiring! Join us at careers.netbird.io</a>
+    🚀 <a href="https://netbird.io/careers">We are hiring! Join us at https://netbird.io/careers</a>
  </strong>
 </p>

--- a/client/cmd/service_privileged_test.go
+++ b/client/cmd/service_privileged_test.go
@@ -0,0 +1,196 @@
+//go:build privileged
+
+package cmd
+
+import (
+	"context"
+	"fmt"
+	"os"
+	"runtime"
+	"testing"
+	"time"
+
+	"github.com/kardianos/service"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+const (
+	serviceStartTimeout = 10 * time.Second
+	serviceStopTimeout  = 5 * time.Second
+	statusPollInterval  = 500 * time.Millisecond
+)
+
+// waitForServiceStatus waits for service to reach expected status with timeout
+func waitForServiceStatus(expectedStatus service.Status, timeout time.Duration) (bool, error) {
+	cfg, err := newSVCConfig()
+	if err != nil {
+		return false, err
+	}
+
+	ctxSvc, cancel := context.WithCancel(context.Background())
+	defer cancel()
+
+	s, err := newSVC(newProgram(ctxSvc, cancel), cfg)
+	if err != nil {
+		return false, err
+	}
+
+	ctx, timeoutCancel := context.WithTimeout(context.Background(), timeout)
+	defer timeoutCancel()
+
+	ticker := time.NewTicker(statusPollInterval)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ctx.Done():
+			return false, fmt.Errorf("timeout waiting for service status %v", expectedStatus)
+		case <-ticker.C:
+			status, err := s.Status()
+			if err != nil {
+				// Continue polling on transient errors
+				continue
+			}
+			if status == expectedStatus {
+				return true, nil
+			}
+		}
+	}
+}
+
+// TestServiceLifecycle tests the complete service lifecycle
+func TestServiceLifecycle(t *testing.T) {
+	// TODO: Add support for Windows and macOS
+	if runtime.GOOS != "linux" && runtime.GOOS != "freebsd" {
+		t.Skipf("Skipping service lifecycle test on unsupported OS: %s", runtime.GOOS)
+	}
+
+	if os.Getenv("CONTAINER") == "true" {
+		t.Skip("Skipping service lifecycle test in container environment")
+	}
+
+	originalServiceName := serviceName
+	serviceName = "netbirdtest" + fmt.Sprintf("%d", time.Now().Unix())
+	defer func() {
+		serviceName = originalServiceName
+	}()
+
+	tempDir := t.TempDir()
+	configPath = fmt.Sprintf("%s/netbird-test-config.json", tempDir)
+	logLevel = "info"
+	daemonAddr = fmt.Sprintf("unix://%s/netbird-test.sock", tempDir)
+
+	// Ensure cleanup even if a subtest fails and Stop/Uninstall subtests don't run.
+	t.Cleanup(func() {
+		cfg, err := newSVCConfig()
+		if err != nil {
+			t.Errorf("cleanup: create service config: %v", err)
+			return
+		}
+		ctxSvc, cancel := context.WithCancel(context.Background())
+		defer cancel()
+		s, err := newSVC(newProgram(ctxSvc, cancel), cfg)
+		if err != nil {
+			t.Errorf("cleanup: create service: %v", err)
+			return
+		}
+
+		// If the subtests already cleaned up, there's nothing to do.
+		if _, err := s.Status(); err != nil {
+			return
+		}
+
+		if err := s.Stop(); err != nil {
+			t.Errorf("cleanup: stop service: %v", err)
+		}
+		if err := s.Uninstall(); err != nil {
+			t.Errorf("cleanup: uninstall service: %v", err)
+		}
+	})
+
+	ctx := context.Background()
+
+	t.Run("Install", func(t *testing.T) {
+		installCmd.SetContext(ctx)
+		err := installCmd.RunE(installCmd, []string{})
+		require.NoError(t, err)
+
+		cfg, err := newSVCConfig()
+		require.NoError(t, err)
+
+		ctxSvc, cancel := context.WithCancel(context.Background())
+		defer cancel()
+
+		s, err := newSVC(newProgram(ctxSvc, cancel), cfg)
+		require.NoError(t, err)
+
+		status, err := s.Status()
+		assert.NoError(t, err)
+		assert.NotEqual(t, service.StatusUnknown, status)
+	})
+
+	t.Run("Start", func(t *testing.T) {
+		startCmd.SetContext(ctx)
+		err := startCmd.RunE(startCmd, []string{})
+		require.NoError(t, err)
+
+		running, err := waitForServiceStatus(service.StatusRunning, serviceStartTimeout)
+		require.NoError(t, err)
+		assert.True(t, running)
+	})
+
+	t.Run("Restart", func(t *testing.T) {
+		restartCmd.SetContext(ctx)
+		err := restartCmd.RunE(restartCmd, []string{})
+		require.NoError(t, err)
+
+		running, err := waitForServiceStatus(service.StatusRunning, serviceStartTimeout)
+		require.NoError(t, err)
+		assert.True(t, running)
+	})
+
+	t.Run("Reconfigure", func(t *testing.T) {
+		originalLogLevel := logLevel
+		logLevel = "debug"
+		defer func() {
+			logLevel = originalLogLevel
+		}()
+
+		reconfigureCmd.SetContext(ctx)
+		err := reconfigureCmd.RunE(reconfigureCmd, []string{})
+		require.NoError(t, err)
+
+		running, err := waitForServiceStatus(service.StatusRunning, serviceStartTimeout)
+		require.NoError(t, err)
+		assert.True(t, running)
+	})
+
+	t.Run("Stop", func(t *testing.T) {
+		stopCmd.SetContext(ctx)
+		err := stopCmd.RunE(stopCmd, []string{})
+		require.NoError(t, err)
+
+		stopped, err := waitForServiceStatus(service.StatusStopped, serviceStopTimeout)
+		require.NoError(t, err)
+		assert.True(t, stopped)
+	})
+
+	t.Run("Uninstall", func(t *testing.T) {
+		uninstallCmd.SetContext(ctx)
+		err := uninstallCmd.RunE(uninstallCmd, []string{})
+		require.NoError(t, err)
+
+		cfg, err := newSVCConfig()
+		require.NoError(t, err)
+
+		ctxSvc, cancel := context.WithCancel(context.Background())
+		defer cancel()
+
+		s, err := newSVC(newProgram(ctxSvc, cancel), cfg)
+		require.NoError(t, err)
+
+		_, err = s.Status()
+		assert.Error(t, err)
+	})
+}
--- a/client/cmd/service_test.go
+++ b/client/cmd/service_test.go
@@ -1,16 +1,12 @@
 package cmd

 import (
-	"context"
-	"fmt"
 	"os"
 	"os/signal"
 	"runtime"
 	"syscall"
 	"testing"
-	"time"

-	"github.com/kardianos/service"
 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
 )
@@ -31,186 +27,6 @@ func TestMain(m *testing.M) {
 	os.Exit(m.Run())
 }

-const (
-	serviceStartTimeout = 10 * time.Second
-	serviceStopTimeout  = 5 * time.Second
-	statusPollInterval  = 500 * time.Millisecond
-)
-
-// waitForServiceStatus waits for service to reach expected status with timeout
-func waitForServiceStatus(expectedStatus service.Status, timeout time.Duration) (bool, error) {
-	cfg, err := newSVCConfig()
-	if err != nil {
-		return false, err
-	}
-
-	ctxSvc, cancel := context.WithCancel(context.Background())
-	defer cancel()
-
-	s, err := newSVC(newProgram(ctxSvc, cancel), cfg)
-	if err != nil {
-		return false, err
-	}
-
-	ctx, timeoutCancel := context.WithTimeout(context.Background(), timeout)
-	defer timeoutCancel()
-
-	ticker := time.NewTicker(statusPollInterval)
-	defer ticker.Stop()
-
-	for {
-		select {
-		case <-ctx.Done():
-			return false, fmt.Errorf("timeout waiting for service status %v", expectedStatus)
-		case <-ticker.C:
-			status, err := s.Status()
-			if err != nil {
-				// Continue polling on transient errors
-				continue
-			}
-			if status == expectedStatus {
-				return true, nil
-			}
-		}
-	}
-}
-
-// TestServiceLifecycle tests the complete service lifecycle
-func TestServiceLifecycle(t *testing.T) {
-	// TODO: Add support for Windows and macOS
-	if runtime.GOOS != "linux" && runtime.GOOS != "freebsd" {
-		t.Skipf("Skipping service lifecycle test on unsupported OS: %s", runtime.GOOS)
-	}
-
-	if os.Getenv("CONTAINER") == "true" {
-		t.Skip("Skipping service lifecycle test in container environment")
-	}
-
-	originalServiceName := serviceName
-	serviceName = "netbirdtest" + fmt.Sprintf("%d", time.Now().Unix())
-	defer func() {
-		serviceName = originalServiceName
-	}()
-
-	tempDir := t.TempDir()
-	configPath = fmt.Sprintf("%s/netbird-test-config.json", tempDir)
-	logLevel = "info"
-	daemonAddr = fmt.Sprintf("unix://%s/netbird-test.sock", tempDir)
-
-	// Ensure cleanup even if a subtest fails and Stop/Uninstall subtests don't run.
-	t.Cleanup(func() {
-		cfg, err := newSVCConfig()
-		if err != nil {
-			t.Errorf("cleanup: create service config: %v", err)
-			return
-		}
-		ctxSvc, cancel := context.WithCancel(context.Background())
-		defer cancel()
-		s, err := newSVC(newProgram(ctxSvc, cancel), cfg)
-		if err != nil {
-			t.Errorf("cleanup: create service: %v", err)
-			return
-		}
-
-		// If the subtests already cleaned up, there's nothing to do.
-		if _, err := s.Status(); err != nil {
-			return
-		}
-
-		if err := s.Stop(); err != nil {
-			t.Errorf("cleanup: stop service: %v", err)
-		}
-		if err := s.Uninstall(); err != nil {
-			t.Errorf("cleanup: uninstall service: %v", err)
-		}
-	})
-
-	ctx := context.Background()
-
-	t.Run("Install", func(t *testing.T) {
-		installCmd.SetContext(ctx)
-		err := installCmd.RunE(installCmd, []string{})
-		require.NoError(t, err)
-
-		cfg, err := newSVCConfig()
-		require.NoError(t, err)
-
-		ctxSvc, cancel := context.WithCancel(context.Background())
-		defer cancel()
-
-		s, err := newSVC(newProgram(ctxSvc, cancel), cfg)
-		require.NoError(t, err)
-
-		status, err := s.Status()
-		assert.NoError(t, err)
-		assert.NotEqual(t, service.StatusUnknown, status)
-	})
-
-	t.Run("Start", func(t *testing.T) {
-		startCmd.SetContext(ctx)
-		err := startCmd.RunE(startCmd, []string{})
-		require.NoError(t, err)
-
-		running, err := waitForServiceStatus(service.StatusRunning, serviceStartTimeout)
-		require.NoError(t, err)
-		assert.True(t, running)
-	})
-
-	t.Run("Restart", func(t *testing.T) {
-		restartCmd.SetContext(ctx)
-		err := restartCmd.RunE(restartCmd, []string{})
-		require.NoError(t, err)
-
-		running, err := waitForServiceStatus(service.StatusRunning, serviceStartTimeout)
-		require.NoError(t, err)
-		assert.True(t, running)
-	})
-
-	t.Run("Reconfigure", func(t *testing.T) {
-		originalLogLevel := logLevel
-		logLevel = "debug"
-		defer func() {
-			logLevel = originalLogLevel
-		}()
-
-		reconfigureCmd.SetContext(ctx)
-		err := reconfigureCmd.RunE(reconfigureCmd, []string{})
-		require.NoError(t, err)
-
-		running, err := waitForServiceStatus(service.StatusRunning, serviceStartTimeout)
-		require.NoError(t, err)
-		assert.True(t, running)
-	})
-
-	t.Run("Stop", func(t *testing.T) {
-		stopCmd.SetContext(ctx)
-		err := stopCmd.RunE(stopCmd, []string{})
-		require.NoError(t, err)
-
-		stopped, err := waitForServiceStatus(service.StatusStopped, serviceStopTimeout)
-		require.NoError(t, err)
-		assert.True(t, stopped)
-	})
-
-	t.Run("Uninstall", func(t *testing.T) {
-		uninstallCmd.SetContext(ctx)
-		err := uninstallCmd.RunE(uninstallCmd, []string{})
-		require.NoError(t, err)
-
-		cfg, err := newSVCConfig()
-		require.NoError(t, err)
-
-		ctxSvc, cancel := context.WithCancel(context.Background())
-		defer cancel()
-
-		s, err := newSVC(newProgram(ctxSvc, cancel), cfg)
-		require.NoError(t, err)
-
-		_, err = s.Status()
-		assert.Error(t, err)
-	})
-}
-
 // TestServiceEnvVars tests environment variable parsing
 func TestServiceEnvVars(t *testing.T) {
 	tests := []struct {
--- a/client/firewall/iptables/manager_linux_test.go
+++ b/client/firewall/iptables/manager_linux_test.go
@@ -1,3 +1,5 @@
+//go:build privileged
+
 package iptables

 import (
--- a/client/firewall/iptables/router_linux_test.go
+++ b/client/firewall/iptables/router_linux_test.go
@@ -1,4 +1,4 @@
-//go:build !android
+//go:build !android && privileged

 package iptables

--- a/client/firewall/nftables/manager_linux_test.go
+++ b/client/firewall/nftables/manager_linux_test.go
@@ -1,3 +1,5 @@
+//go:build privileged
+
 package nftables

 import (
--- a/client/firewall/nftables/router_linux_test.go
+++ b/client/firewall/nftables/router_linux_test.go
@@ -1,4 +1,4 @@
-//go:build !android
+//go:build !android && privileged

 package nftables

--- a/client/iface/iface_test.go
+++ b/client/iface/iface_test.go
@@ -1,3 +1,5 @@
+//go:build privileged
+
 package iface

 import (
--- a/client/iface/wgproxy/proxy_linux_test.go
+++ b/client/iface/wgproxy/proxy_linux_test.go
@@ -1,4 +1,4 @@
-//go:build linux && !android
+//go:build linux && !android && privileged

 package wgproxy

--- a/client/iface/wgproxy/proxy_seed_test.go
+++ b/client/iface/wgproxy/proxy_seed_test.go
@@ -1,4 +1,4 @@
-//go:build !linux
+//go:build !linux || !privileged

 package wgproxy

--- a/client/iface/wgproxy/redirect_test.go
+++ b/client/iface/wgproxy/redirect_test.go
@@ -1,4 +1,4 @@
-//go:build linux && !android
+//go:build linux && !android && privileged

 package wgproxy

@@ -26,64 +26,6 @@ func compareUDPAddr(addr1, addr2 net.Addr) bool {
 	return udpAddr1.IP.Equal(udpAddr2.IP) && udpAddr1.Port == udpAddr2.Port
 }

-// TestRedirectAs_eBPF_IPv4 tests RedirectAs with eBPF proxy using IPv4 addresses
-func TestRedirectAs_eBPF_IPv4(t *testing.T) {
-	wgPort := 51850
-	ebpfProxy := ebpf.NewWGEBPFProxy(wgPort, 1280)
-	if err := ebpfProxy.Listen(); err != nil {
-		t.Fatalf("failed to initialize ebpf proxy: %v", err)
-	}
-	defer func() {
-		if err := ebpfProxy.Free(); err != nil {
-			t.Errorf("failed to free ebpf proxy: %v", err)
-		}
-	}()
-
-	proxy := ebpf.NewProxyWrapper(ebpfProxy)
-
-	// NetBird UDP address of the remote peer
-	nbAddr := &net.UDPAddr{
-		IP:   net.ParseIP("100.108.111.177"),
-		Port: 38746,
-	}
-
-	p2pEndpoint := &net.UDPAddr{
-		IP:   net.ParseIP("192.168.0.56"),
-		Port: 51820,
-	}
-
-	testRedirectAs(t, proxy, wgPort, nbAddr, p2pEndpoint)
-}
-
-// TestRedirectAs_eBPF_IPv6 tests RedirectAs with eBPF proxy using IPv6 addresses
-func TestRedirectAs_eBPF_IPv6(t *testing.T) {
-	wgPort := 51851
-	ebpfProxy := ebpf.NewWGEBPFProxy(wgPort, 1280)
-	if err := ebpfProxy.Listen(); err != nil {
-		t.Fatalf("failed to initialize ebpf proxy: %v", err)
-	}
-	defer func() {
-		if err := ebpfProxy.Free(); err != nil {
-			t.Errorf("failed to free ebpf proxy: %v", err)
-		}
-	}()
-
-	proxy := ebpf.NewProxyWrapper(ebpfProxy)
-
-	// NetBird UDP address of the remote peer
-	nbAddr := &net.UDPAddr{
-		IP:   net.ParseIP("100.108.111.177"),
-		Port: 38746,
-	}
-
-	p2pEndpoint := &net.UDPAddr{
-		IP:   net.ParseIP("fe80::56"),
-		Port: 51820,
-	}
-
-	testRedirectAs(t, proxy, wgPort, nbAddr, p2pEndpoint)
-}
-
 // TestRedirectAs_UDP_IPv4 tests RedirectAs with UDP proxy using IPv4 addresses
 func TestRedirectAs_UDP_IPv4(t *testing.T) {
 	wgPort := 51852
@@ -256,6 +198,64 @@ func testRedirectAs(t *testing.T, proxy Proxy, wgPort int, nbAddr, p2pEndpoint *
 	}
 }

+// TestRedirectAs_eBPF_IPv4 tests RedirectAs with eBPF proxy using IPv4 addresses
+func TestRedirectAs_eBPF_IPv4(t *testing.T) {
+	wgPort := 51850
+	ebpfProxy := ebpf.NewWGEBPFProxy(wgPort, 1280)
+	if err := ebpfProxy.Listen(); err != nil {
+		t.Fatalf("failed to initialize ebpf proxy: %v", err)
+	}
+	defer func() {
+		if err := ebpfProxy.Free(); err != nil {
+			t.Errorf("failed to free ebpf proxy: %v", err)
+		}
+	}()
+
+	proxy := ebpf.NewProxyWrapper(ebpfProxy)
+
+	// NetBird UDP address of the remote peer
+	nbAddr := &net.UDPAddr{
+		IP:   net.ParseIP("100.108.111.177"),
+		Port: 38746,
+	}
+
+	p2pEndpoint := &net.UDPAddr{
+		IP:   net.ParseIP("192.168.0.56"),
+		Port: 51820,
+	}
+
+	testRedirectAs(t, proxy, wgPort, nbAddr, p2pEndpoint)
+}
+
+// TestRedirectAs_eBPF_IPv6 tests RedirectAs with eBPF proxy using IPv6 addresses
+func TestRedirectAs_eBPF_IPv6(t *testing.T) {
+	wgPort := 51851
+	ebpfProxy := ebpf.NewWGEBPFProxy(wgPort, 1280)
+	if err := ebpfProxy.Listen(); err != nil {
+		t.Fatalf("failed to initialize ebpf proxy: %v", err)
+	}
+	defer func() {
+		if err := ebpfProxy.Free(); err != nil {
+			t.Errorf("failed to free ebpf proxy: %v", err)
+		}
+	}()
+
+	proxy := ebpf.NewProxyWrapper(ebpfProxy)
+
+	// NetBird UDP address of the remote peer
+	nbAddr := &net.UDPAddr{
+		IP:   net.ParseIP("100.108.111.177"),
+		Port: 38746,
+	}
+
+	p2pEndpoint := &net.UDPAddr{
+		IP:   net.ParseIP("fe80::56"),
+		Port: 51820,
+	}
+
+	testRedirectAs(t, proxy, wgPort, nbAddr, p2pEndpoint)
+}
+
 // TestRedirectAs_Multiple_Switches tests switching between multiple endpoints
 func TestRedirectAs_Multiple_Switches(t *testing.T) {
 	wgPort := 51856
--- a/client/internal/dns/resutil/resolve.go
+++ b/client/internal/dns/resutil/resolve.go
@@ -8,6 +8,7 @@ import (
 	"errors"
 	"net"
 	"net/netip"
+	"slices"
 	"strings"

 	"github.com/miekg/dns"
@@ -167,7 +168,10 @@ func getRcodeForNotFound(ctx context.Context, r resolver, domain string, origina
 	case dns.TypeA:
 		alternativeNetwork = "ip6"
 	default:
-		return dns.RcodeNameError
+		// Non-address types reach LookupIP only unexpectedly; without an
+		// address pair to probe we cannot prove the name is absent, so answer
+		// NODATA rather than a poisoning NXDOMAIN.
+		return dns.RcodeSuccess
 	}

 	if _, err := r.LookupNetIP(ctx, alternativeNetwork, domain); err != nil {
@@ -184,6 +188,230 @@ func getRcodeForNotFound(ctx context.Context, r resolver, domain string, origina
 	return dns.RcodeSuccess
 }

+// RecordResolver is the host resolver surface used to forward non-address
+// record queries. net.DefaultResolver satisfies it.
+type RecordResolver interface {
+	LookupMX(ctx context.Context, name string) ([]*net.MX, error)
+	LookupTXT(ctx context.Context, name string) ([]string, error)
+	LookupNS(ctx context.Context, name string) ([]*net.NS, error)
+	LookupSRV(ctx context.Context, service, proto, name string) (string, []*net.SRV, error)
+	LookupCNAME(ctx context.Context, host string) (string, error)
+	LookupAddr(ctx context.Context, addr string) ([]string, error)
+}
+
+// LookupRecords resolves a non-address DNS record type through the host
+// resolver and returns the resource records and the DNS rcode. Types the host
+// resolver cannot answer (anything not covered by the net.Resolver Lookup*
+// methods) yield NODATA so that a routed name is never poisoned with NXDOMAIN
+// for an unsupported type.
+func LookupRecords(ctx context.Context, r RecordResolver, name string, qtype uint16, ttl uint32) ([]dns.RR, int) {
+	fqdn := dns.Fqdn(name)
+
+	switch qtype {
+	case dns.TypeMX:
+		return lookupMX(ctx, r, name, fqdn, ttl)
+	case dns.TypeTXT:
+		return lookupTXT(ctx, r, name, fqdn, ttl)
+	case dns.TypeNS:
+		return lookupNS(ctx, r, name, fqdn, ttl)
+	case dns.TypeSRV:
+		return lookupSRV(ctx, r, name, fqdn, ttl)
+	case dns.TypeCNAME:
+		return lookupCNAME(ctx, r, name, fqdn, ttl)
+	case dns.TypePTR:
+		return lookupPTR(ctx, r, name, fqdn, ttl)
+	default:
+		return nil, dns.RcodeSuccess
+	}
+}
+
+func recordHeader(fqdn string, rrtype uint16, ttl uint32) dns.RR_Header {
+	return dns.RR_Header{Name: fqdn, Rrtype: rrtype, Class: dns.ClassINET, Ttl: ttl}
+}
+
+func lookupMX(ctx context.Context, r RecordResolver, name, fqdn string, ttl uint32) ([]dns.RR, int) {
+	recs, err := r.LookupMX(ctx, name)
+	if err != nil {
+		return nil, rcodeForRecordError(err)
+	}
+	rrs := make([]dns.RR, 0, len(recs))
+	for _, mx := range recs {
+		rrs = append(rrs, &dns.MX{
+			Hdr:        recordHeader(fqdn, dns.TypeMX, ttl),
+			Preference: mx.Pref,
+			Mx:         dns.Fqdn(mx.Host),
+		})
+	}
+	return rrs, dns.RcodeSuccess
+}
+
+func lookupTXT(ctx context.Context, r RecordResolver, name, fqdn string, ttl uint32) ([]dns.RR, int) {
+	recs, err := r.LookupTXT(ctx, name)
+	if err != nil {
+		return nil, rcodeForRecordError(err)
+	}
+	rrs := make([]dns.RR, 0, len(recs))
+	for _, txt := range recs {
+		rrs = append(rrs, &dns.TXT{
+			Hdr: recordHeader(fqdn, dns.TypeTXT, ttl),
+			Txt: chunkTXT(txt),
+		})
+	}
+	return rrs, dns.RcodeSuccess
+}
+
+func lookupNS(ctx context.Context, r RecordResolver, name, fqdn string, ttl uint32) ([]dns.RR, int) {
+	recs, err := r.LookupNS(ctx, name)
+	if err != nil {
+		return nil, rcodeForRecordError(err)
+	}
+	rrs := make([]dns.RR, 0, len(recs))
+	for _, ns := range recs {
+		rrs = append(rrs, &dns.NS{
+			Hdr: recordHeader(fqdn, dns.TypeNS, ttl),
+			Ns:  dns.Fqdn(ns.Host),
+		})
+	}
+	return rrs, dns.RcodeSuccess
+}
+
+func lookupSRV(ctx context.Context, r RecordResolver, name, fqdn string, ttl uint32) ([]dns.RR, int) {
+	_, recs, err := r.LookupSRV(ctx, "", "", name)
+	if err != nil {
+		return nil, rcodeForRecordError(err)
+	}
+	rrs := make([]dns.RR, 0, len(recs))
+	for _, srv := range recs {
+		rrs = append(rrs, &dns.SRV{
+			Hdr:      recordHeader(fqdn, dns.TypeSRV, ttl),
+			Priority: srv.Priority,
+			Weight:   srv.Weight,
+			Port:     srv.Port,
+			Target:   dns.Fqdn(srv.Target),
+		})
+	}
+	return rrs, dns.RcodeSuccess
+}
+
+func lookupCNAME(ctx context.Context, r RecordResolver, name, fqdn string, ttl uint32) ([]dns.RR, int) {
+	cname, err := r.LookupCNAME(ctx, name)
+	if err != nil {
+		return nil, rcodeForRecordError(err)
+	}
+	// LookupCNAME returns the queried name itself when the name resolves but
+	// has no CNAME record; that is a NODATA result, not a CNAME.
+	if strings.EqualFold(dns.Fqdn(cname), fqdn) {
+		return nil, dns.RcodeSuccess
+	}
+	return []dns.RR{&dns.CNAME{
+		Hdr:    recordHeader(fqdn, dns.TypeCNAME, ttl),
+		Target: dns.Fqdn(cname),
+	}}, dns.RcodeSuccess
+}
+
+func lookupPTR(ctx context.Context, r RecordResolver, name, fqdn string, ttl uint32) ([]dns.RR, int) {
+	addr, ok := ptrQueryAddr(name)
+	if !ok {
+		return nil, dns.RcodeSuccess
+	}
+	names, err := r.LookupAddr(ctx, addr)
+	if err != nil {
+		return nil, rcodeForRecordError(err)
+	}
+	rrs := make([]dns.RR, 0, len(names))
+	for _, n := range names {
+		rrs = append(rrs, &dns.PTR{
+			Hdr: recordHeader(fqdn, dns.TypePTR, ttl),
+			Ptr: dns.Fqdn(n),
+		})
+	}
+	return rrs, dns.RcodeSuccess
+}
+
+// ptrQueryAddr converts a reverse-DNS query name (in-addr.arpa or ip6.arpa)
+// into the address string expected by net.Resolver.LookupAddr. It reports false
+// when the name is not a well-formed reverse name.
+func ptrQueryAddr(qname string) (string, bool) {
+	name := strings.TrimSuffix(strings.ToLower(dns.Fqdn(qname)), ".")
+
+	switch {
+	case strings.HasSuffix(name, ".in-addr.arpa"):
+		return parseInAddrArpa(strings.TrimSuffix(name, ".in-addr.arpa"))
+	case strings.HasSuffix(name, ".ip6.arpa"):
+		return parseIP6Arpa(strings.TrimSuffix(name, ".ip6.arpa"))
+	default:
+		return "", false
+	}
+}
+
+// parseInAddrArpa turns the label portion of an in-addr.arpa name into an IPv4
+// address string, reporting false when it is not a well-formed reverse name.
+func parseInAddrArpa(labelPart string) (string, bool) {
+	labels := strings.Split(labelPart, ".")
+	if len(labels) != 4 {
+		return "", false
+	}
+	slices.Reverse(labels)
+	addr, err := netip.ParseAddr(strings.Join(labels, "."))
+	if err != nil || !addr.Is4() {
+		return "", false
+	}
+	return addr.String(), true
+}
+
+// parseIP6Arpa turns the nibble portion of an ip6.arpa name into an IPv6
+// address string, reporting false when it is not a well-formed reverse name.
+func parseIP6Arpa(nibblePart string) (string, bool) {
+	nibbles := strings.Split(nibblePart, ".")
+	if len(nibbles) != 32 {
+		return "", false
+	}
+	slices.Reverse(nibbles)
+	var sb strings.Builder
+	for i, n := range nibbles {
+		if i > 0 && i%4 == 0 {
+			sb.WriteByte(':')
+		}
+		sb.WriteString(n)
+	}
+	addr, err := netip.ParseAddr(sb.String())
+	if err != nil || !addr.Is6() {
+		return "", false
+	}
+	return addr.String(), true
+}
+
+// rcodeForRecordError maps a non-address lookup error to a DNS rcode. A
+// not-found result becomes NODATA rather than NXDOMAIN: net.DNSError.IsNotFound
+// does not distinguish a missing name from a name that exists only with records
+// of other types, so the name cannot be proven absent and must not be poisoned.
+func rcodeForRecordError(err error) int {
+	var dnsErr *net.DNSError
+	if errors.As(err, &dnsErr) && dnsErr.IsNotFound {
+		return dns.RcodeSuccess
+	}
+	return dns.RcodeServerFailure
+}
+
+// chunkTXT splits a TXT string into character-strings no longer than 255 bytes
+// so the record can be packed. The chunks form one TXT resource record.
+func chunkTXT(s string) []string {
+	const maxLen = 255
+	if len(s) <= maxLen {
+		return []string{s}
+	}
+
+	var chunks []string
+	for len(s) > maxLen {
+		chunks = append(chunks, s[:maxLen])
+		s = s[maxLen:]
+	}
+	if len(s) > 0 {
+		chunks = append(chunks, s)
+	}
+	return chunks
+}
+
 // FormatAnswers formats DNS resource records for logging.
 func FormatAnswers(answers []dns.RR) string {
 	if len(answers) == 0 {
--- a/client/internal/dns/resutil/resolve_test.go
+++ b/client/internal/dns/resutil/resolve_test.go
@@ -5,6 +5,7 @@ import (
 	"errors"
 	"net"
 	"net/netip"
+	"strings"
 	"testing"

 	"github.com/miekg/dns"
@@ -121,6 +122,164 @@ func TestLookupIP_DNSErrorNotIsNotFound(t *testing.T) {
 	assert.Equal(t, dns.RcodeServerFailure, result.Rcode, "upstream failure should map to SERVFAIL")
 }

+func TestPtrQueryAddr(t *testing.T) {
+	tests := []struct {
+		name   string
+		qname  string
+		want   string
+		wantOK bool
+	}{
+		{name: "ipv4", qname: "4.3.2.1.in-addr.arpa.", want: "1.2.3.4", wantOK: true},
+		{name: "ipv4 no trailing dot", qname: "1.0.0.127.in-addr.arpa", want: "127.0.0.1", wantOK: true},
+		{
+			name:   "ipv6",
+			qname:  "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa.",
+			want:   "2001:db8::1",
+			wantOK: true,
+		},
+		{name: "ipv4 wrong label count", qname: "2.1.in-addr.arpa.", wantOK: false},
+		{name: "ipv6 wrong nibble count", qname: "1.0.ip6.arpa.", wantOK: false},
+		{name: "not a reverse name", qname: "example.com.", wantOK: false},
+		{name: "ipv4 bad octet", qname: "4.3.2.999.in-addr.arpa.", wantOK: false},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got, ok := ptrQueryAddr(tt.qname)
+			assert.Equal(t, tt.wantOK, ok, "parse success mismatch")
+			if tt.wantOK {
+				assert.Equal(t, tt.want, got, "parsed address mismatch")
+			}
+		})
+	}
+}
+
+type mockRecordResolver struct {
+	mx    []*net.MX
+	txt   []string
+	ns    []*net.NS
+	srv   []*net.SRV
+	cname string
+	ptr   []string
+	err   error
+}
+
+func (m *mockRecordResolver) LookupMX(context.Context, string) ([]*net.MX, error) {
+	return m.mx, m.err
+}
+func (m *mockRecordResolver) LookupTXT(context.Context, string) ([]string, error) {
+	return m.txt, m.err
+}
+func (m *mockRecordResolver) LookupNS(context.Context, string) ([]*net.NS, error) {
+	return m.ns, m.err
+}
+func (m *mockRecordResolver) LookupSRV(context.Context, string, string, string) (string, []*net.SRV, error) {
+	return "", m.srv, m.err
+}
+func (m *mockRecordResolver) LookupCNAME(context.Context, string) (string, error) {
+	return m.cname, m.err
+}
+func (m *mockRecordResolver) LookupAddr(context.Context, string) ([]string, error) {
+	return m.ptr, m.err
+}
+
+func TestLookupRecords(t *testing.T) {
+	notFound := &net.DNSError{IsNotFound: true, Name: "example.com."}
+
+	t.Run("MX success", func(t *testing.T) {
+		r := &mockRecordResolver{mx: []*net.MX{{Host: "mail.example.com.", Pref: 10}}}
+		rrs, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypeMX, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		require.Len(t, rrs, 1)
+		assert.Equal(t, "mail.example.com.", rrs[0].(*dns.MX).Mx)
+	})
+
+	t.Run("TXT short string is one character-string", func(t *testing.T) {
+		r := &mockRecordResolver{txt: []string{"v=spf1 -all"}}
+		rrs, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypeTXT, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		require.Len(t, rrs, 1)
+		assert.Equal(t, []string{"v=spf1 -all"}, rrs[0].(*dns.TXT).Txt)
+	})
+
+	t.Run("TXT chunks long strings", func(t *testing.T) {
+		long := strings.Repeat("a", 300)
+		r := &mockRecordResolver{txt: []string{long}}
+		rrs, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypeTXT, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		require.Len(t, rrs, 1)
+		txt := rrs[0].(*dns.TXT).Txt
+		require.Len(t, txt, 2, "300-byte string should split into two character-strings")
+		assert.Equal(t, 255, len(txt[0]))
+		assert.Equal(t, 45, len(txt[1]))
+	})
+
+	t.Run("NS success", func(t *testing.T) {
+		r := &mockRecordResolver{ns: []*net.NS{{Host: "ns1.example.com."}}}
+		rrs, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypeNS, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		require.Len(t, rrs, 1)
+		assert.Equal(t, "ns1.example.com.", rrs[0].(*dns.NS).Ns)
+	})
+
+	t.Run("SRV success", func(t *testing.T) {
+		r := &mockRecordResolver{srv: []*net.SRV{{Target: "sip.example.com.", Port: 5060}}}
+		rrs, rcode := LookupRecords(context.Background(), r, "_sip._tcp.example.com.", dns.TypeSRV, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		require.Len(t, rrs, 1)
+		assert.Equal(t, uint16(5060), rrs[0].(*dns.SRV).Port)
+	})
+
+	t.Run("CNAME success", func(t *testing.T) {
+		r := &mockRecordResolver{cname: "target.example.com."}
+		rrs, rcode := LookupRecords(context.Background(), r, "www.example.com.", dns.TypeCNAME, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		require.Len(t, rrs, 1)
+		assert.Equal(t, "target.example.com.", rrs[0].(*dns.CNAME).Target)
+	})
+
+	t.Run("CNAME equal to name is NODATA", func(t *testing.T) {
+		r := &mockRecordResolver{cname: "example.com."}
+		rrs, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypeCNAME, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		assert.Empty(t, rrs, "self-referential CNAME is NODATA")
+	})
+
+	t.Run("PTR success", func(t *testing.T) {
+		r := &mockRecordResolver{ptr: []string{"host.example.com."}}
+		rrs, rcode := LookupRecords(context.Background(), r, "4.3.2.1.in-addr.arpa.", dns.TypePTR, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		require.Len(t, rrs, 1)
+		assert.Equal(t, "host.example.com.", rrs[0].(*dns.PTR).Ptr)
+	})
+
+	t.Run("PTR malformed name is NODATA", func(t *testing.T) {
+		r := &mockRecordResolver{}
+		rrs, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypePTR, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		assert.Empty(t, rrs)
+	})
+
+	t.Run("not found is NODATA never NXDOMAIN", func(t *testing.T) {
+		r := &mockRecordResolver{err: notFound}
+		_, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypeMX, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode, "missing record must not poison the name")
+	})
+
+	t.Run("server failure maps to SERVFAIL", func(t *testing.T) {
+		r := &mockRecordResolver{err: &net.DNSError{Err: "server misbehaving", IsTemporary: true}}
+		_, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypeMX, 300)
+		assert.Equal(t, dns.RcodeServerFailure, rcode)
+	})
+
+	t.Run("unsupported type is NODATA", func(t *testing.T) {
+		r := &mockRecordResolver{}
+		rrs, rcode := LookupRecords(context.Background(), r, "example.com.", dns.TypeCAA, 300)
+		assert.Equal(t, dns.RcodeSuccess, rcode)
+		assert.Empty(t, rrs)
+	})
+}
+
 func TestStripOPT(t *testing.T) {
 	rm := &dns.Msg{
 		Extra: []dns.RR{
--- a/client/internal/dns/server_privileged_test.go
+++ b/client/internal/dns/server_privileged_test.go
@@ -0,0 +1,485 @@
+//go:build privileged
+
+package dns
+
+import (
+	"context"
+	"fmt"
+	"net/netip"
+	"os"
+	"testing"
+
+	"github.com/golang/mock/gomock"
+	"github.com/miekg/dns"
+	"github.com/stretchr/testify/assert"
+	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
+
+	"github.com/netbirdio/netbird/client/iface"
+	pfmock "github.com/netbirdio/netbird/client/iface/mocks"
+	"github.com/netbirdio/netbird/client/iface/wgaddr"
+	"github.com/netbirdio/netbird/client/internal/dns/local"
+	"github.com/netbirdio/netbird/client/internal/dns/test"
+	"github.com/netbirdio/netbird/client/internal/peer"
+	"github.com/netbirdio/netbird/client/internal/stdnet"
+	nbdns "github.com/netbirdio/netbird/dns"
+)
+
+func TestUpdateDNSServer(t *testing.T) {
+
+	nameServers := []nbdns.NameServer{
+		{
+			IP:     netip.MustParseAddr("8.8.8.8"),
+			NSType: nbdns.UDPNameServerType,
+			Port:   53,
+		},
+		{
+			IP:     netip.MustParseAddr("8.8.4.4"),
+			NSType: nbdns.UDPNameServerType,
+			Port:   53,
+		},
+	}
+
+	testCases := []struct {
+		name                string
+		initUpstreamMap     []handlerWrapper
+		initLocalZones      []nbdns.CustomZone
+		initSerial          uint64
+		inputSerial         uint64
+		inputUpdate         nbdns.Config
+		shouldFail          bool
+		expectedUpstreamMap []handlerWrapper
+		expectedLocalQs     []dns.Question
+	}{
+		{
+			name:            "Initial Config Should Succeed",
+			initUpstreamMap: nil,
+			initSerial:      0,
+			inputSerial:     1,
+			inputUpdate: nbdns.Config{
+				ServiceEnable: true,
+				CustomZones: []nbdns.CustomZone{
+					{
+						Domain:  "netbird.cloud",
+						Records: zoneRecords,
+					},
+				},
+				NameServerGroups: []*nbdns.NameServerGroup{
+					{
+						Domains:     []string{"netbird.io"},
+						NameServers: nameServers,
+					},
+					{
+						NameServers: nameServers,
+						Primary:     true,
+					},
+				},
+			},
+			expectedUpstreamMap: []handlerWrapper{
+				{
+					domain:   "netbird.io",
+					priority: PriorityUpstream,
+				},
+				{
+					domain:   "netbird.cloud",
+					priority: PriorityLocal,
+				},
+				{
+					domain:   nbdns.RootZone,
+					priority: PriorityDefault,
+				},
+			},
+			expectedLocalQs: []dns.Question{{Name: "peera.netbird.cloud.", Qtype: dns.TypeA, Qclass: dns.ClassINET}},
+		},
+		{
+			name:           "New Config Should Succeed",
+			initLocalZones: []nbdns.CustomZone{{Domain: "netbird.cloud", Records: []nbdns.SimpleRecord{{Name: "netbird.cloud", Type: 1, Class: nbdns.DefaultClass, TTL: 300, RData: "10.0.0.1"}}}},
+			initUpstreamMap: []handlerWrapper{
+				{
+					domain:   "netbird.cloud",
+					handler:  &mockHandler{},
+					priority: PriorityUpstream,
+				},
+			},
+			initSerial:  0,
+			inputSerial: 1,
+			inputUpdate: nbdns.Config{
+				ServiceEnable: true,
+				CustomZones: []nbdns.CustomZone{
+					{
+						Domain:  "netbird.cloud",
+						Records: zoneRecords,
+					},
+				},
+				NameServerGroups: []*nbdns.NameServerGroup{
+					{
+						Domains:     []string{"netbird.io"},
+						NameServers: nameServers,
+					},
+				},
+			},
+			expectedUpstreamMap: []handlerWrapper{
+				{
+					domain:   "netbird.io",
+					priority: PriorityUpstream,
+				},
+				{
+					domain:   "netbird.cloud",
+					priority: PriorityLocal,
+				},
+			},
+			expectedLocalQs: []dns.Question{{Name: zoneRecords[0].Name, Qtype: 1, Qclass: 1}},
+		},
+		{
+			name:            "Smaller Config Serial Should Be Skipped",
+			initLocalZones:  []nbdns.CustomZone{},
+			initUpstreamMap: nil,
+			initSerial:      2,
+			inputSerial:     1,
+			shouldFail:      true,
+		},
+		{
+			name:            "Empty NS Group Domain Or Not Primary Element Should Fail",
+			initLocalZones:  []nbdns.CustomZone{},
+			initUpstreamMap: nil,
+			initSerial:      0,
+			inputSerial:     1,
+			inputUpdate: nbdns.Config{
+				ServiceEnable: true,
+				CustomZones: []nbdns.CustomZone{
+					{
+						Domain:  "netbird.cloud",
+						Records: zoneRecords,
+					},
+				},
+				NameServerGroups: []*nbdns.NameServerGroup{
+					{
+						NameServers: nameServers,
+					},
+				},
+			},
+			shouldFail: true,
+		},
+		{
+			name:            "Invalid NS Group Nameservers list Should Fail",
+			initLocalZones:  []nbdns.CustomZone{},
+			initUpstreamMap: nil,
+			initSerial:      0,
+			inputSerial:     1,
+			inputUpdate: nbdns.Config{
+				ServiceEnable: true,
+				CustomZones: []nbdns.CustomZone{
+					{
+						Domain:  "netbird.cloud",
+						Records: zoneRecords,
+					},
+				},
+				NameServerGroups: []*nbdns.NameServerGroup{
+					{
+						NameServers: nameServers,
+					},
+				},
+			},
+			shouldFail: true,
+		},
+		{
+			name:            "Invalid Custom Zone Records list Should Skip",
+			initLocalZones:  []nbdns.CustomZone{},
+			initUpstreamMap: nil,
+			initSerial:      0,
+			inputSerial:     1,
+			inputUpdate: nbdns.Config{
+				ServiceEnable: true,
+				CustomZones: []nbdns.CustomZone{
+					{
+						Domain: "netbird.cloud",
+					},
+				},
+				NameServerGroups: []*nbdns.NameServerGroup{
+					{
+						NameServers: nameServers,
+						Primary:     true,
+					},
+				},
+			},
+			expectedUpstreamMap: []handlerWrapper{{
+				domain:   ".",
+				priority: PriorityDefault,
+			}},
+		},
+		{
+			name:           "Empty Config Should Succeed and Clean Maps",
+			initLocalZones: []nbdns.CustomZone{{Domain: "netbird.cloud", Records: []nbdns.SimpleRecord{{Name: "netbird.cloud", Type: int(dns.TypeA), Class: nbdns.DefaultClass, TTL: 300, RData: "10.0.0.1"}}}},
+			initUpstreamMap: []handlerWrapper{
+				{
+					domain:   zoneRecords[0].Name,
+					handler:  &mockHandler{},
+					priority: PriorityUpstream,
+				},
+			},
+			initSerial:          0,
+			inputSerial:         1,
+			inputUpdate:         nbdns.Config{ServiceEnable: true},
+			expectedUpstreamMap: nil,
+			expectedLocalQs:     []dns.Question{},
+		},
+		{
+			name:           "Disabled Service Should clean map",
+			initLocalZones: []nbdns.CustomZone{{Domain: "netbird.cloud", Records: []nbdns.SimpleRecord{{Name: "netbird.cloud", Type: int(dns.TypeA), Class: nbdns.DefaultClass, TTL: 300, RData: "10.0.0.1"}}}},
+			initUpstreamMap: []handlerWrapper{
+				{
+					domain:   zoneRecords[0].Name,
+					handler:  &mockHandler{},
+					priority: PriorityUpstream,
+				},
+			},
+			initSerial:          0,
+			inputSerial:         1,
+			inputUpdate:         nbdns.Config{ServiceEnable: false},
+			expectedUpstreamMap: nil,
+			expectedLocalQs:     []dns.Question{},
+		},
+	}
+
+	for n, testCase := range testCases {
+		t.Run(testCase.name, func(t *testing.T) {
+			privKey, _ := wgtypes.GenerateKey()
+			newNet, err := stdnet.NewNet(context.Background(), nil)
+			if err != nil {
+				t.Fatal(err)
+			}
+
+			opts := iface.WGIFaceOpts{
+				IFaceName:    fmt.Sprintf("utun230%d", n),
+				Address:      wgaddr.MustParseWGAddress(fmt.Sprintf("100.66.100.%d/32", n+1)),
+				WGPort:       33100,
+				WGPrivKey:    privKey.String(),
+				MTU:          iface.DefaultMTU,
+				TransportNet: newNet,
+			}
+
+			wgIface, err := iface.NewWGIFace(opts)
+			if err != nil {
+				t.Fatal(err)
+			}
+			err = wgIface.Create()
+			if err != nil {
+				t.Fatal(err)
+			}
+			defer func() {
+				err = wgIface.Close()
+				if err != nil {
+					t.Log(err)
+				}
+			}()
+			dnsServer, err := NewDefaultServer(context.Background(), DefaultServerConfig{
+				WgInterface:    wgIface,
+				CustomAddress:  "",
+				StatusRecorder: peer.NewRecorder("mgm"),
+				StateManager:   nil,
+				DisableSys:     false,
+			})
+			if err != nil {
+				t.Fatal(err)
+			}
+			err = dnsServer.Initialize()
+			if err != nil {
+				t.Fatal(err)
+			}
+			defer func() {
+				err = dnsServer.hostManager.restoreHostDNS()
+				if err != nil {
+					t.Log(err)
+				}
+			}()
+
+			dnsServer.dnsMuxHandlers = testCase.initUpstreamMap
+			dnsServer.localResolver.Update(testCase.initLocalZones)
+			dnsServer.updateSerial = testCase.initSerial
+
+			err = dnsServer.UpdateDNSServer(testCase.inputSerial, testCase.inputUpdate)
+			if err != nil {
+				if testCase.shouldFail {
+					return
+				}
+				t.Fatalf("update dns server should not fail, got error: %v", err)
+			}
+
+			if len(dnsServer.dnsMuxHandlers) != len(testCase.expectedUpstreamMap) {
+				t.Fatalf("update upstream failed, map size is different than expected, want %d, got %d", len(testCase.expectedUpstreamMap), len(dnsServer.dnsMuxHandlers))
+			}
+
+			for _, expected := range testCase.expectedUpstreamMap {
+				found := false
+				for _, got := range dnsServer.dnsMuxHandlers {
+					if got.domain == expected.domain && got.priority == expected.priority {
+						found = true
+						break
+					}
+				}
+				if !found {
+					t.Fatalf("update upstream failed, handler for domain=%s priority=%d not found in dnsMuxHandlers: %#v", expected.domain, expected.priority, dnsServer.dnsMuxHandlers)
+				}
+			}
+
+			var responseMSG *dns.Msg
+			responseWriter := &test.MockResponseWriter{
+				WriteMsgFunc: func(m *dns.Msg) error {
+					responseMSG = m
+					return nil
+				},
+			}
+			for _, q := range testCase.expectedLocalQs {
+				dnsServer.localResolver.ServeDNS(responseWriter, &dns.Msg{
+					Question: []dns.Question{q},
+				})
+			}
+
+			if len(testCase.expectedLocalQs) > 0 {
+				assert.NotNil(t, responseMSG, "response message should not be nil")
+				assert.Equal(t, dns.RcodeSuccess, responseMSG.Rcode, "response code should be success")
+				assert.NotEmpty(t, responseMSG.Answer, "response message should have answers")
+			}
+		})
+	}
+}
+
+func TestDNSFakeResolverHandleUpdates(t *testing.T) {
+	ov := os.Getenv("NB_WG_KERNEL_DISABLED")
+	defer t.Setenv("NB_WG_KERNEL_DISABLED", ov)
+
+	t.Setenv("NB_WG_KERNEL_DISABLED", "true")
+	newNet, err := stdnet.NewNet(context.Background(), []string{"utun2301"})
+	if err != nil {
+		t.Errorf("create stdnet: %v", err)
+		return
+	}
+
+	privKey, _ := wgtypes.GeneratePrivateKey()
+	opts := iface.WGIFaceOpts{
+		IFaceName:    "utun2301",
+		Address:      wgaddr.MustParseWGAddress("100.66.100.1/32"),
+		WGPort:       33100,
+		WGPrivKey:    privKey.String(),
+		MTU:          iface.DefaultMTU,
+		TransportNet: newNet,
+	}
+	wgIface, err := iface.NewWGIFace(opts)
+	if err != nil {
+		t.Errorf("build wireguard interface: %v", err)
+		return
+	}
+
+	err = wgIface.Create()
+	if err != nil {
+		t.Errorf("create and init wireguard interface: %v", err)
+		return
+	}
+	defer func() {
+		if err = wgIface.Close(); err != nil {
+			t.Logf("close wireguard interface: %v", err)
+		}
+	}()
+
+	ctrl := gomock.NewController(t)
+	defer ctrl.Finish()
+
+	packetfilter := pfmock.NewMockPacketFilter(ctrl)
+	packetfilter.EXPECT().FilterOutbound(gomock.Any(), gomock.Any()).AnyTimes()
+	packetfilter.EXPECT().SetUDPPacketHook(gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes()
+	packetfilter.EXPECT().SetTCPPacketHook(gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes()
+
+	if err := wgIface.SetFilter(packetfilter); err != nil {
+		t.Errorf("set packet filter: %v", err)
+		return
+	}
+
+	dnsServer, err := NewDefaultServer(context.Background(), DefaultServerConfig{
+		WgInterface:    wgIface,
+		CustomAddress:  "",
+		StatusRecorder: peer.NewRecorder("mgm"),
+		StateManager:   nil,
+		DisableSys:     false,
+	})
+	if err != nil {
+		t.Errorf("create DNS server: %v", err)
+		return
+	}
+
+	err = dnsServer.Initialize()
+	if err != nil {
+		t.Errorf("run DNS server: %v", err)
+		return
+	}
+	defer func() {
+		if err = dnsServer.hostManager.restoreHostDNS(); err != nil {
+			t.Logf("restore DNS settings on the host: %v", err)
+			return
+		}
+	}()
+
+	dnsServer.dnsMuxHandlers = []handlerWrapper{
+		{
+			domain:   zoneRecords[0].Name,
+			handler:  &local.Resolver{},
+			priority: PriorityUpstream,
+		},
+	}
+	dnsServer.localResolver.Update([]nbdns.CustomZone{{Domain: "netbird.cloud", Records: []nbdns.SimpleRecord{{Name: "netbird.cloud", Type: int(dns.TypeA), Class: nbdns.DefaultClass, TTL: 300, RData: "10.0.0.1"}}}})
+	dnsServer.updateSerial = 0
+
+	nameServers := []nbdns.NameServer{
+		{
+			IP:     netip.MustParseAddr("8.8.8.8"),
+			NSType: nbdns.UDPNameServerType,
+			Port:   53,
+		},
+		{
+			IP:     netip.MustParseAddr("8.8.4.4"),
+			NSType: nbdns.UDPNameServerType,
+			Port:   53,
+		},
+	}
+
+	update := nbdns.Config{
+		ServiceEnable: true,
+		CustomZones: []nbdns.CustomZone{
+			{
+				Domain:  "netbird.cloud",
+				Records: zoneRecords,
+			},
+		},
+		NameServerGroups: []*nbdns.NameServerGroup{
+			{
+				Domains:     []string{"netbird.io"},
+				NameServers: nameServers,
+			},
+			{
+				NameServers: nameServers,
+				Primary:     true,
+			},
+		},
+	}
+
+	// Start the server with regular configuration
+	if err := dnsServer.UpdateDNSServer(1, update); err != nil {
+		t.Fatalf("update dns server should not fail, got error: %v", err)
+		return
+	}
+
+	update2 := update
+	update2.ServiceEnable = false
+	// Disable the server, stop the listener
+	if err := dnsServer.UpdateDNSServer(2, update2); err != nil {
+		t.Fatalf("update dns server should not fail, got error: %v", err)
+		return
+	}
+
+	update3 := update2
+	update3.NameServerGroups = update3.NameServerGroups[:1]
+	// The service still receives updates while disabled, so check that the
+	// internal state is handled correctly
+	if err := dnsServer.UpdateDNSServer(3, update3); err != nil {
+		t.Fatalf("update dns server should not fail, got error: %v", err)
+		return
+	}
+}
--- a/client/internal/dns/server_test.go
+++ b/client/internal/dns/server_test.go
@@ -10,7 +10,6 @@ import (
 	"testing"
 	"time"

-	"github.com/golang/mock/gomock"
 	"github.com/miekg/dns"
 	log "github.com/sirupsen/logrus"
 	"github.com/stretchr/testify/assert"
@@ -23,7 +22,6 @@ import (
 	"github.com/netbirdio/netbird/client/iface"
 	"github.com/netbirdio/netbird/client/iface/configurer"
 	"github.com/netbirdio/netbird/client/iface/device"
-	pfmock "github.com/netbirdio/netbird/client/iface/mocks"
 	"github.com/netbirdio/netbird/client/iface/wgaddr"
 	"github.com/netbirdio/netbird/client/internal/dns/local"
 	"github.com/netbirdio/netbird/client/internal/dns/test"
@@ -104,466 +102,6 @@ func init() {
 	formatter.SetTextFormatter(log.StandardLogger())
 }

-func TestUpdateDNSServer(t *testing.T) {
-
-	nameServers := []nbdns.NameServer{
-		{
-			IP:     netip.MustParseAddr("8.8.8.8"),
-			NSType: nbdns.UDPNameServerType,
-			Port:   53,
-		},
-		{
-			IP:     netip.MustParseAddr("8.8.4.4"),
-			NSType: nbdns.UDPNameServerType,
-			Port:   53,
-		},
-	}
-
-	testCases := []struct {
-		name                string
-		initUpstreamMap     []handlerWrapper
-		initLocalZones      []nbdns.CustomZone
-		initSerial          uint64
-		inputSerial         uint64
-		inputUpdate         nbdns.Config
-		shouldFail          bool
-		expectedUpstreamMap []handlerWrapper
-		expectedLocalQs     []dns.Question
-	}{
-		{
-			name:            "Initial Config Should Succeed",
-			initUpstreamMap: nil,
-			initSerial:      0,
-			inputSerial:     1,
-			inputUpdate: nbdns.Config{
-				ServiceEnable: true,
-				CustomZones: []nbdns.CustomZone{
-					{
-						Domain:  "netbird.cloud",
-						Records: zoneRecords,
-					},
-				},
-				NameServerGroups: []*nbdns.NameServerGroup{
-					{
-						Domains:     []string{"netbird.io"},
-						NameServers: nameServers,
-					},
-					{
-						NameServers: nameServers,
-						Primary:     true,
-					},
-				},
-			},
-			expectedUpstreamMap: []handlerWrapper{
-				{
-					domain:   "netbird.io",
-					priority: PriorityUpstream,
-				},
-				{
-					domain:   "netbird.cloud",
-					priority: PriorityLocal,
-				},
-				{
-					domain:   nbdns.RootZone,
-					priority: PriorityDefault,
-				},
-			},
-			expectedLocalQs: []dns.Question{{Name: "peera.netbird.cloud.", Qtype: dns.TypeA, Qclass: dns.ClassINET}},
-		},
-		{
-			name:           "New Config Should Succeed",
-			initLocalZones: []nbdns.CustomZone{{Domain: "netbird.cloud", Records: []nbdns.SimpleRecord{{Name: "netbird.cloud", Type: 1, Class: nbdns.DefaultClass, TTL: 300, RData: "10.0.0.1"}}}},
-			initUpstreamMap: []handlerWrapper{
-				{
-					domain:   "netbird.cloud",
-					handler:  &mockHandler{},
-					priority: PriorityUpstream,
-				},
-			},
-			initSerial:  0,
-			inputSerial: 1,
-			inputUpdate: nbdns.Config{
-				ServiceEnable: true,
-				CustomZones: []nbdns.CustomZone{
-					{
-						Domain:  "netbird.cloud",
-						Records: zoneRecords,
-					},
-				},
-				NameServerGroups: []*nbdns.NameServerGroup{
-					{
-						Domains:     []string{"netbird.io"},
-						NameServers: nameServers,
-					},
-				},
-			},
-			expectedUpstreamMap: []handlerWrapper{
-				{
-					domain:   "netbird.io",
-					priority: PriorityUpstream,
-				},
-				{
-					domain:   "netbird.cloud",
-					priority: PriorityLocal,
-				},
-			},
-			expectedLocalQs: []dns.Question{{Name: zoneRecords[0].Name, Qtype: 1, Qclass: 1}},
-		},
-		{
-			name:            "Smaller Config Serial Should Be Skipped",
-			initLocalZones:  []nbdns.CustomZone{},
-			initUpstreamMap: nil,
-			initSerial:      2,
-			inputSerial:     1,
-			shouldFail:      true,
-		},
-		{
-			name:            "Empty NS Group Domain Or Not Primary Element Should Fail",
-			initLocalZones:  []nbdns.CustomZone{},
-			initUpstreamMap: nil,
-			initSerial:      0,
-			inputSerial:     1,
-			inputUpdate: nbdns.Config{
-				ServiceEnable: true,
-				CustomZones: []nbdns.CustomZone{
-					{
-						Domain:  "netbird.cloud",
-						Records: zoneRecords,
-					},
-				},
-				NameServerGroups: []*nbdns.NameServerGroup{
-					{
-						NameServers: nameServers,
-					},
-				},
-			},
-			shouldFail: true,
-		},
-		{
-			name:            "Invalid NS Group Nameservers list Should Fail",
-			initLocalZones:  []nbdns.CustomZone{},
-			initUpstreamMap: nil,
-			initSerial:      0,
-			inputSerial:     1,
-			inputUpdate: nbdns.Config{
-				ServiceEnable: true,
-				CustomZones: []nbdns.CustomZone{
-					{
-						Domain:  "netbird.cloud",
-						Records: zoneRecords,
-					},
-				},
-				NameServerGroups: []*nbdns.NameServerGroup{
-					{
-						NameServers: nameServers,
-					},
-				},
-			},
-			shouldFail: true,
-		},
-		{
-			name:            "Invalid Custom Zone Records list Should Skip",
-			initLocalZones:  []nbdns.CustomZone{},
-			initUpstreamMap: nil,
-			initSerial:      0,
-			inputSerial:     1,
-			inputUpdate: nbdns.Config{
-				ServiceEnable: true,
-				CustomZones: []nbdns.CustomZone{
-					{
-						Domain: "netbird.cloud",
-					},
-				},
-				NameServerGroups: []*nbdns.NameServerGroup{
-					{
-						NameServers: nameServers,
-						Primary:     true,
-					},
-				},
-			},
-			expectedUpstreamMap: []handlerWrapper{{
-				domain:   ".",
-				priority: PriorityDefault,
-			}},
-		},
-		{
-			name:           "Empty Config Should Succeed and Clean Maps",
-			initLocalZones: []nbdns.CustomZone{{Domain: "netbird.cloud", Records: []nbdns.SimpleRecord{{Name: "netbird.cloud", Type: int(dns.TypeA), Class: nbdns.DefaultClass, TTL: 300, RData: "10.0.0.1"}}}},
-			initUpstreamMap: []handlerWrapper{
-				{
-					domain:   zoneRecords[0].Name,
-					handler:  &mockHandler{},
-					priority: PriorityUpstream,
-				},
-			},
-			initSerial:          0,
-			inputSerial:         1,
-			inputUpdate:         nbdns.Config{ServiceEnable: true},
-			expectedUpstreamMap: nil,
-			expectedLocalQs:     []dns.Question{},
-		},
-		{
-			name:           "Disabled Service Should clean map",
-			initLocalZones: []nbdns.CustomZone{{Domain: "netbird.cloud", Records: []nbdns.SimpleRecord{{Name: "netbird.cloud", Type: int(dns.TypeA), Class: nbdns.DefaultClass, TTL: 300, RData: "10.0.0.1"}}}},
-			initUpstreamMap: []handlerWrapper{
-				{
-					domain:   zoneRecords[0].Name,
-					handler:  &mockHandler{},
-					priority: PriorityUpstream,
-				},
-			},
-			initSerial:          0,
-			inputSerial:         1,
-			inputUpdate:         nbdns.Config{ServiceEnable: false},
-			expectedUpstreamMap: nil,
-			expectedLocalQs:     []dns.Question{},
-		},
-	}
-
-	for n, testCase := range testCases {
-		t.Run(testCase.name, func(t *testing.T) {
-			privKey, _ := wgtypes.GenerateKey()
-			newNet, err := stdnet.NewNet(context.Background(), nil)
-			if err != nil {
-				t.Fatal(err)
-			}
-
-			opts := iface.WGIFaceOpts{
-				IFaceName:    fmt.Sprintf("utun230%d", n),
-				Address:      wgaddr.MustParseWGAddress(fmt.Sprintf("100.66.100.%d/32", n+1)),
-				WGPort:       33100,
-				WGPrivKey:    privKey.String(),
-				MTU:          iface.DefaultMTU,
-				TransportNet: newNet,
-			}
-
-			wgIface, err := iface.NewWGIFace(opts)
-			if err != nil {
-				t.Fatal(err)
-			}
-			err = wgIface.Create()
-			if err != nil {
-				t.Fatal(err)
-			}
-			defer func() {
-				err = wgIface.Close()
-				if err != nil {
-					t.Log(err)
-				}
-			}()
-			dnsServer, err := NewDefaultServer(context.Background(), DefaultServerConfig{
-				WgInterface:    wgIface,
-				CustomAddress:  "",
-				StatusRecorder: peer.NewRecorder("mgm"),
-				StateManager:   nil,
-				DisableSys:     false,
-			})
-			if err != nil {
-				t.Fatal(err)
-			}
-			err = dnsServer.Initialize()
-			if err != nil {
-				t.Fatal(err)
-			}
-			defer func() {
-				err = dnsServer.hostManager.restoreHostDNS()
-				if err != nil {
-					t.Log(err)
-				}
-			}()
-
-			dnsServer.dnsMuxHandlers = testCase.initUpstreamMap
-			dnsServer.localResolver.Update(testCase.initLocalZones)
-			dnsServer.updateSerial = testCase.initSerial
-
-			err = dnsServer.UpdateDNSServer(testCase.inputSerial, testCase.inputUpdate)
-			if err != nil {
-				if testCase.shouldFail {
-					return
-				}
-				t.Fatalf("update dns server should not fail, got error: %v", err)
-			}
-
-			if len(dnsServer.dnsMuxHandlers) != len(testCase.expectedUpstreamMap) {
-				t.Fatalf("update upstream failed, map size is different than expected, want %d, got %d", len(testCase.expectedUpstreamMap), len(dnsServer.dnsMuxHandlers))
-			}
-
-			for _, expected := range testCase.expectedUpstreamMap {
-				found := false
-				for _, got := range dnsServer.dnsMuxHandlers {
-					if got.domain == expected.domain && got.priority == expected.priority {
-						found = true
-						break
-					}
-				}
-				if !found {
-					t.Fatalf("update upstream failed, handler for domain=%s priority=%d not found in dnsMuxHandlers: %#v", expected.domain, expected.priority, dnsServer.dnsMuxHandlers)
-				}
-			}
-
-			var responseMSG *dns.Msg
-			responseWriter := &test.MockResponseWriter{
-				WriteMsgFunc: func(m *dns.Msg) error {
-					responseMSG = m
-					return nil
-				},
-			}
-			for _, q := range testCase.expectedLocalQs {
-				dnsServer.localResolver.ServeDNS(responseWriter, &dns.Msg{
-					Question: []dns.Question{q},
-				})
-			}
-
-			if len(testCase.expectedLocalQs) > 0 {
-				assert.NotNil(t, responseMSG, "response message should not be nil")
-				assert.Equal(t, dns.RcodeSuccess, responseMSG.Rcode, "response code should be success")
-				assert.NotEmpty(t, responseMSG.Answer, "response message should have answers")
-			}
-		})
-	}
-}
-
-func TestDNSFakeResolverHandleUpdates(t *testing.T) {
-	ov := os.Getenv("NB_WG_KERNEL_DISABLED")
-	defer t.Setenv("NB_WG_KERNEL_DISABLED", ov)
-
-	t.Setenv("NB_WG_KERNEL_DISABLED", "true")
-	newNet, err := stdnet.NewNet(context.Background(), []string{"utun2301"})
-	if err != nil {
-		t.Errorf("create stdnet: %v", err)
-		return
-	}
-
-	privKey, _ := wgtypes.GeneratePrivateKey()
-	opts := iface.WGIFaceOpts{
-		IFaceName:    "utun2301",
-		Address:      wgaddr.MustParseWGAddress("100.66.100.1/32"),
-		WGPort:       33100,
-		WGPrivKey:    privKey.String(),
-		MTU:          iface.DefaultMTU,
-		TransportNet: newNet,
-	}
-	wgIface, err := iface.NewWGIFace(opts)
-	if err != nil {
-		t.Errorf("build interface wireguard: %v", err)
-		return
-	}
-
-	err = wgIface.Create()
-	if err != nil {
-		t.Errorf("create and init wireguard interface: %v", err)
-		return
-	}
-	defer func() {
-		if err = wgIface.Close(); err != nil {
-			t.Logf("close wireguard interface: %v", err)
-		}
-	}()
-
-	ctrl := gomock.NewController(t)
-	defer ctrl.Finish()
-
-	packetfilter := pfmock.NewMockPacketFilter(ctrl)
-	packetfilter.EXPECT().FilterOutbound(gomock.Any(), gomock.Any()).AnyTimes()
-	packetfilter.EXPECT().SetUDPPacketHook(gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes()
-	packetfilter.EXPECT().SetTCPPacketHook(gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes()
-
-	if err := wgIface.SetFilter(packetfilter); err != nil {
-		t.Errorf("set packet filter: %v", err)
-		return
-	}
-
-	dnsServer, err := NewDefaultServer(context.Background(), DefaultServerConfig{
-		WgInterface:    wgIface,
-		CustomAddress:  "",
-		StatusRecorder: peer.NewRecorder("mgm"),
-		StateManager:   nil,
-		DisableSys:     false,
-	})
-	if err != nil {
-		t.Errorf("create DNS server: %v", err)
-		return
-	}
-
-	err = dnsServer.Initialize()
-	if err != nil {
-		t.Errorf("run DNS server: %v", err)
-		return
-	}
-	defer func() {
-		if err = dnsServer.hostManager.restoreHostDNS(); err != nil {
-			t.Logf("restore DNS settings on the host: %v", err)
-			return
-		}
-	}()
-
-	dnsServer.dnsMuxHandlers = []handlerWrapper{
-		{
-			domain:   zoneRecords[0].Name,
-			handler:  &local.Resolver{},
-			priority: PriorityUpstream,
-		},
-	}
-	dnsServer.localResolver.Update([]nbdns.CustomZone{{Domain: "netbird.cloud", Records: []nbdns.SimpleRecord{{Name: "netbird.cloud", Type: int(dns.TypeA), Class: nbdns.DefaultClass, TTL: 300, RData: "10.0.0.1"}}}})
-	dnsServer.updateSerial = 0
-
-	nameServers := []nbdns.NameServer{
-		{
-			IP:     netip.MustParseAddr("8.8.8.8"),
-			NSType: nbdns.UDPNameServerType,
-			Port:   53,
-		},
-		{
-			IP:     netip.MustParseAddr("8.8.4.4"),
-			NSType: nbdns.UDPNameServerType,
-			Port:   53,
-		},
-	}
-
-	update := nbdns.Config{
-		ServiceEnable: true,
-		CustomZones: []nbdns.CustomZone{
-			{
-				Domain:  "netbird.cloud",
-				Records: zoneRecords,
-			},
-		},
-		NameServerGroups: []*nbdns.NameServerGroup{
-			{
-				Domains:     []string{"netbird.io"},
-				NameServers: nameServers,
-			},
-			{
-				NameServers: nameServers,
-				Primary:     true,
-			},
-		},
-	}
-
-	// Start the server with regular configuration
-	if err := dnsServer.UpdateDNSServer(1, update); err != nil {
-		t.Fatalf("update dns server should not fail, got error: %v", err)
-		return
-	}
-
-	update2 := update
-	update2.ServiceEnable = false
-	// Disable the server, stop the listener
-	if err := dnsServer.UpdateDNSServer(2, update2); err != nil {
-		t.Fatalf("update dns server should not fail, got error: %v", err)
-		return
-	}
-
-	update3 := update2
-	update3.NameServerGroups = update3.NameServerGroups[:1]
-	// But service still get updates and we checking that we handle
-	// internal state in the right way
-	if err := dnsServer.UpdateDNSServer(3, update3); err != nil {
-		t.Fatalf("update dns server should not fail, got error: %v", err)
-		return
-	}
-}
-
 func TestDNSServerStartStop(t *testing.T) {
 	testCases := []struct {
 		name     string
--- a/client/internal/dnsfwd/forwarder.go
+++ b/client/internal/dnsfwd/forwarder.go
@@ -37,6 +37,12 @@ const (

 type resolver interface {
 	LookupNetIP(ctx context.Context, network, host string) ([]netip.Addr, error)
+	LookupMX(ctx context.Context, name string) ([]*net.MX, error)
+	LookupTXT(ctx context.Context, name string) ([]string, error)
+	LookupNS(ctx context.Context, name string) ([]*net.NS, error)
+	LookupSRV(ctx context.Context, service, proto, name string) (string, []*net.SRV, error)
+	LookupCNAME(ctx context.Context, host string) (string, error)
+	LookupAddr(ctx context.Context, addr string) ([]string, error)
 }

 type firewaller interface {
@@ -210,12 +216,6 @@ func (f *DNSForwarder) handleDNSQuery(logger *log.Entry, w dns.ResponseWriter, q
 		qname, dns.TypeToString[question.Qtype], dns.ClassToString[question.Qclass])

 	resp := query.SetReply(query)
-	network := resutil.NetworkForQtype(question.Qtype)
-	if network == "" {
-		resp.Rcode = dns.RcodeNotImplemented
-		f.writeResponse(logger, w, resp, qname, startTime)
-		return
-	}

 	mostSpecificResId, matchingEntries := f.getMatchingEntries(strings.TrimSuffix(qname, "."))
 	if mostSpecificResId == "" {
@@ -227,9 +227,46 @@ func (f *DNSForwarder) handleDNSQuery(logger *log.Entry, w dns.ResponseWriter, q
 	ctx, cancel := context.WithTimeout(context.Background(), upstreamTimeout)
 	defer cancel()

+	reqHasEdns := query.IsEdns0() != nil
+
+	switch question.Qtype {
+	case dns.TypeA, dns.TypeAAAA:
+		f.handleAddressQuery(ctx, logger, w, resp, mostSpecificResId, matchingEntries, reqHasEdns, startTime)
+	case dns.TypeMX, dns.TypeTXT, dns.TypeNS, dns.TypeSRV, dns.TypeCNAME, dns.TypePTR:
+		f.handleRecordQuery(ctx, logger, w, resp, startTime)
+	default:
+		// The domain is routed here, so any other type is answered NODATA
+		// (NOERROR, empty answer) rather than falling back to a resolver that
+		// would poison the name with NXDOMAIN. The Extended DNS Error lets a
+		// client tell this capability-driven NODATA apart from an
+		// authoritative one. The OPT pseudo-record must not appear unless the
+		// query advertised EDNS0.
+		if reqHasEdns {
+			attachEDE(resp, dns.ExtendedErrorCodeNotSupported, "netbird forwarder: unsupported query type")
+		}
+		f.writeResponse(logger, w, resp, qname, startTime)
+	}
+}
+
+// handleAddressQuery resolves A/AAAA queries, programs the firewall sets and
+// resolved-IP state, and caches the answer for resilience on upstream failure.
+func (f *DNSForwarder) handleAddressQuery(
+	ctx context.Context,
+	logger *log.Entry,
+	w dns.ResponseWriter,
+	resp *dns.Msg,
+	mostSpecificResId route.ResID,
+	matchingEntries []*ForwarderEntry,
+	reqHasEdns bool,
+	startTime time.Time,
+) {
+	question := resp.Question[0]
+	qname := strings.ToLower(question.Name)
+
+	network := resutil.NetworkForQtype(question.Qtype)
 	result := resutil.LookupIP(ctx, f.resolver, network, qname, question.Qtype)
 	if result.Err != nil {
-		f.handleDNSError(ctx, logger, w, question, resp, qname, result, query.IsEdns0() != nil, startTime)
+		f.handleDNSError(ctx, logger, w, question, resp, qname, result, reqHasEdns, startTime)
 		return
 	}

@@ -240,6 +277,25 @@ func (f *DNSForwarder) handleDNSQuery(logger *log.Entry, w dns.ResponseWriter, q
 	f.writeResponse(logger, w, resp, qname, startTime)
 }

+// handleRecordQuery resolves non-address record types (MX, TXT, NS, SRV,
+// CNAME, PTR) through the host resolver. Missing records are answered NODATA so
+// the routed name is never poisoned with NXDOMAIN.
+func (f *DNSForwarder) handleRecordQuery(
+	ctx context.Context,
+	logger *log.Entry,
+	w dns.ResponseWriter,
+	resp *dns.Msg,
+	startTime time.Time,
+) {
+	question := resp.Question[0]
+	qname := strings.ToLower(question.Name)
+
+	records, rcode := resutil.LookupRecords(ctx, f.resolver, qname, question.Qtype, f.ttl)
+	resp.Rcode = rcode
+	resp.Answer = append(resp.Answer, records...)
+	f.writeResponse(logger, w, resp, qname, startTime)
+}
+
 func (f *DNSForwarder) writeResponse(logger *log.Entry, w dns.ResponseWriter, resp *dns.Msg, qname string, startTime time.Time) {
 	if err := w.WriteMsg(resp); err != nil {
 		logger.Errorf("failed to write DNS response: %v", err)
--- a/client/internal/dnsfwd/forwarder_test.go
+++ b/client/internal/dnsfwd/forwarder_test.go
@@ -133,6 +133,41 @@ func (m *MockResolver) LookupNetIP(ctx context.Context, network, host string) ([
 	return args.Get(0).([]netip.Addr), args.Error(1)
 }

+func (m *MockResolver) LookupMX(ctx context.Context, name string) ([]*net.MX, error) {
+	args := m.Called(ctx, name)
+	recs, _ := args.Get(0).([]*net.MX)
+	return recs, args.Error(1)
+}
+
+func (m *MockResolver) LookupTXT(ctx context.Context, name string) ([]string, error) {
+	args := m.Called(ctx, name)
+	recs, _ := args.Get(0).([]string)
+	return recs, args.Error(1)
+}
+
+func (m *MockResolver) LookupNS(ctx context.Context, name string) ([]*net.NS, error) {
+	args := m.Called(ctx, name)
+	recs, _ := args.Get(0).([]*net.NS)
+	return recs, args.Error(1)
+}
+
+func (m *MockResolver) LookupSRV(ctx context.Context, service, proto, name string) (string, []*net.SRV, error) {
+	args := m.Called(ctx, service, proto, name)
+	recs, _ := args.Get(1).([]*net.SRV)
+	return args.String(0), recs, args.Error(2)
+}
+
+func (m *MockResolver) LookupCNAME(ctx context.Context, host string) (string, error) {
+	args := m.Called(ctx, host)
+	return args.String(0), args.Error(1)
+}
+
+func (m *MockResolver) LookupAddr(ctx context.Context, addr string) ([]string, error) {
+	args := m.Called(ctx, addr)
+	recs, _ := args.Get(0).([]string)
+	return recs, args.Error(1)
+}
+
 func TestDNSForwarder_SubdomainAccessLogic(t *testing.T) {
 	tests := []struct {
 		name             string
@@ -545,12 +580,15 @@ func TestDNSForwarder_MultipleIPsInSingleUpdate(t *testing.T) {
 }

 func TestDNSForwarder_ResponseCodes(t *testing.T) {
+	// A type with no net.Resolver Lookup method (CAA) must answer NODATA
+	// (NOERROR, empty) rather than NXDOMAIN/NOTIMP to avoid poisoning the name.
 	tests := []struct {
 		name         string
 		queryType    uint16
 		queryDomain  string
 		configured   string
 		expectedCode int
+		expectEDE    bool
 		description  string
 	}{
 		{
@@ -562,28 +600,13 @@ func TestDNSForwarder_ResponseCodes(t *testing.T) {
 			description:  "RFC compliant REFUSED for unauthorized queries",
 		},
 		{
-			name:         "unsupported query type returns NOTIMP",
-			queryType:    dns.TypeMX,
+			name:         "unsupported query type returns NODATA",
+			queryType:    dns.TypeCAA,
 			queryDomain:  "example.com",
 			configured:   "example.com",
-			expectedCode: dns.RcodeNotImplemented,
-			description:  "RFC compliant NOTIMP for unsupported types",
-		},
-		{
-			name:         "CNAME query returns NOTIMP",
-			queryType:    dns.TypeCNAME,
-			queryDomain:  "example.com",
-			configured:   "example.com",
-			expectedCode: dns.RcodeNotImplemented,
-			description:  "CNAME queries not supported",
-		},
-		{
-			name:         "TXT query returns NOTIMP",
-			queryType:    dns.TypeTXT,
-			queryDomain:  "example.com",
-			configured:   "example.com",
-			expectedCode: dns.RcodeNotImplemented,
-			description:  "TXT queries not supported",
+			expectedCode: dns.RcodeSuccess,
+			expectEDE:    true,
+			description:  "Unsupported types answer NODATA, not NXDOMAIN/NOTIMP",
 		},
 	}

@@ -599,6 +622,7 @@ func TestDNSForwarder_ResponseCodes(t *testing.T) {

 			query := &dns.Msg{}
 			query.SetQuestion(dns.Fqdn(tt.queryDomain), tt.queryType)
+			query.SetEdns0(dns.DefaultMsgSize, false)

 			// Capture the written response
 			var writtenResp *dns.Msg
@@ -614,10 +638,213 @@ func TestDNSForwarder_ResponseCodes(t *testing.T) {
 			// Check the response written to the writer
 			require.NotNil(t, writtenResp, "Expected response to be written")
 			assert.Equal(t, tt.expectedCode, writtenResp.Rcode, tt.description)
+			assert.Empty(t, writtenResp.Answer, "Non-address response should carry no answers")
+
+			if tt.expectEDE {
+				require.NotNil(t, writtenResp.IsEdns0(), "EDNS0 client should get an OPT in the reply")
+				assert.True(t, hasEDE(writtenResp, dns.ExtendedErrorCodeNotSupported),
+					"unsupported type NODATA should carry EDE Not Supported")
+			}
 		})
 	}
 }

+func hasEDE(m *dns.Msg, code uint16) bool {
+	opt := m.IsEdns0()
+	if opt == nil {
+		return false
+	}
+	for _, o := range opt.Option {
+		if ede, ok := o.(*dns.EDNS0_EDE); ok && ede.InfoCode == code {
+			return true
+		}
+	}
+	return false
+}
+
+func TestDNSForwarder_RecordQueries(t *testing.T) {
+	notFound := &net.DNSError{IsNotFound: true, Name: "example.com"}
+
+	t.Run("MX records are forwarded", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "example.com")
+
+		mockResolver.On("LookupMX", mock.Anything, "example.com.").
+			Return([]*net.MX{{Host: "mail.example.com.", Pref: 10}}, nil).Once()
+
+		resp := runRecordQuery(t, forwarder, "example.com", dns.TypeMX)
+		require.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		require.Len(t, resp.Answer, 1)
+		mx, ok := resp.Answer[0].(*dns.MX)
+		require.True(t, ok, "answer should be an MX record")
+		assert.Equal(t, uint16(10), mx.Preference)
+		assert.Equal(t, "mail.example.com.", mx.Mx)
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("missing MX is NODATA not NXDOMAIN", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "example.com")
+
+		// A not-found cannot prove the name is absent (it may exist with only
+		// other record types), so it must answer NODATA, never NXDOMAIN.
+		mockResolver.On("LookupMX", mock.Anything, "example.com.").
+			Return(nil, notFound).Once()
+
+		resp := runRecordQuery(t, forwarder, "example.com", dns.TypeMX)
+		assert.Equal(t, dns.RcodeSuccess, resp.Rcode, "missing record must be NODATA")
+		assert.Empty(t, resp.Answer)
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("NS records are forwarded", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "example.com")
+
+		mockResolver.On("LookupNS", mock.Anything, "example.com.").
+			Return([]*net.NS{{Host: "ns1.example.com."}}, nil).Once()
+
+		resp := runRecordQuery(t, forwarder, "example.com", dns.TypeNS)
+		require.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		require.Len(t, resp.Answer, 1)
+		ns, ok := resp.Answer[0].(*dns.NS)
+		require.True(t, ok, "answer should be an NS record")
+		assert.Equal(t, "ns1.example.com.", ns.Ns)
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("missing NS is NODATA", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "example.com")
+
+		mockResolver.On("LookupNS", mock.Anything, "example.com.").
+			Return(nil, notFound).Once()
+
+		resp := runRecordQuery(t, forwarder, "example.com", dns.TypeNS)
+		assert.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		assert.Empty(t, resp.Answer)
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("SRV records are forwarded", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "_sip._tcp.example.com")
+
+		mockResolver.On("LookupSRV", mock.Anything, "", "", "_sip._tcp.example.com.").
+			Return("", []*net.SRV{{Target: "sip.example.com.", Port: 5060, Priority: 10, Weight: 5}}, nil).Once()
+
+		resp := runRecordQuery(t, forwarder, "_sip._tcp.example.com", dns.TypeSRV)
+		require.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		require.Len(t, resp.Answer, 1)
+		srv, ok := resp.Answer[0].(*dns.SRV)
+		require.True(t, ok, "answer should be an SRV record")
+		assert.Equal(t, "sip.example.com.", srv.Target)
+		assert.Equal(t, uint16(5060), srv.Port)
+		assert.Equal(t, uint16(10), srv.Priority)
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("missing SRV is NODATA", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "_sip._tcp.example.com")
+
+		mockResolver.On("LookupSRV", mock.Anything, "", "", "_sip._tcp.example.com.").
+			Return("", nil, notFound).Once()
+
+		resp := runRecordQuery(t, forwarder, "_sip._tcp.example.com", dns.TypeSRV)
+		assert.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		assert.Empty(t, resp.Answer)
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("TXT records are forwarded", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "example.com")
+
+		mockResolver.On("LookupTXT", mock.Anything, "example.com.").
+			Return([]string{"v=spf1 -all"}, nil).Once()
+
+		resp := runRecordQuery(t, forwarder, "example.com", dns.TypeTXT)
+		require.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		require.Len(t, resp.Answer, 1)
+		txt, ok := resp.Answer[0].(*dns.TXT)
+		require.True(t, ok, "answer should be a TXT record")
+		assert.Equal(t, []string{"v=spf1 -all"}, txt.Txt)
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("CNAME record is forwarded", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "www.example.com")
+
+		mockResolver.On("LookupCNAME", mock.Anything, "www.example.com.").
+			Return("target.example.com.", nil).Once()
+
+		resp := runRecordQuery(t, forwarder, "www.example.com", dns.TypeCNAME)
+		require.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		require.Len(t, resp.Answer, 1)
+		cname, ok := resp.Answer[0].(*dns.CNAME)
+		require.True(t, ok, "answer should be a CNAME record")
+		assert.Equal(t, "target.example.com.", cname.Target)
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("CNAME equal to the name is NODATA", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "example.com")
+
+		// No CNAME exists: LookupCNAME echoes the queried name back.
+		mockResolver.On("LookupCNAME", mock.Anything, "example.com.").
+			Return("example.com.", nil).Once()
+
+		resp := runRecordQuery(t, forwarder, "example.com", dns.TypeCNAME)
+		assert.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		assert.Empty(t, resp.Answer, "self-referential CNAME means no CNAME record")
+		mockResolver.AssertExpectations(t)
+	})
+
+	t.Run("PTR record is forwarded", func(t *testing.T) {
+		mockResolver := &MockResolver{}
+		forwarder := newRecordTestForwarder(t, mockResolver, "*.in-addr.arpa")
+
+		// The reverse name is parsed back to the address LookupAddr expects.
+		mockResolver.On("LookupAddr", mock.Anything, "1.2.3.4").
+			Return([]string{"host.example.com."}, nil).Once()
+
+		resp := runRecordQuery(t, forwarder, "4.3.2.1.in-addr.arpa", dns.TypePTR)
+		require.Equal(t, dns.RcodeSuccess, resp.Rcode)
+		require.Len(t, resp.Answer, 1)
+		ptr, ok := resp.Answer[0].(*dns.PTR)
+		require.True(t, ok, "answer should be a PTR record")
+		assert.Equal(t, "host.example.com.", ptr.Ptr)
+		mockResolver.AssertExpectations(t)
+	})
+}
+
+func newRecordTestForwarder(t *testing.T, r resolver, configured string) *DNSForwarder {
+	t.Helper()
+	forwarder := NewDNSForwarder(netip.MustParseAddrPort("127.0.0.1:0"), 300, nil, &peer.Status{}, nil)
+	forwarder.resolver = r
+
+	d, err := domain.FromString(configured)
+	require.NoError(t, err)
+	forwarder.UpdateDomains([]*ForwarderEntry{{Domain: d, ResID: "test-res"}})
+	return forwarder
+}
+
+func runRecordQuery(t *testing.T, forwarder *DNSForwarder, qname string, qtype uint16) *dns.Msg {
+	t.Helper()
+	query := &dns.Msg{}
+	query.SetQuestion(dns.Fqdn(qname), qtype)
+
+	mockWriter := &test.MockResponseWriter{}
+	forwarder.handleDNSQuery(log.NewEntry(log.StandardLogger()), mockWriter, query, time.Now())
+
+	resp := mockWriter.GetLastResponse()
+	require.NotNil(t, resp, "expected response to be written")
+	return resp
+}
+
 func TestDNSForwarder_UpstreamFailureEDE(t *testing.T) {
 	tests := []struct {
 		name        string
--- a/client/internal/engine.go
+++ b/client/internal/engine.go
@@ -895,6 +895,16 @@ func (e *Engine) handleAutoUpdateVersion(autoUpdateSettings *mgmProto.AutoUpdate
 	e.updateManager.SetVersion(autoUpdateSettings.Version, autoUpdateSettings.AlwaysUpdate)
 }

+// phase times a sync sub-phase: it returns a function that records the elapsed
+// duration when called. Starting the timer at the call site keeps inter-phase
+// glue code out of the measurement.
+func (e *Engine) phase(name string) func() {
+	start := time.Now()
+	return func() {
+		e.clientMetrics.RecordSyncPhase(e.ctx, name, time.Since(start))
+	}
+}
+
 func (e *Engine) handleSync(update *mgmProto.SyncResponse) error {
 	started := time.Now()
 	defer func() {
@@ -914,7 +924,10 @@ func (e *Engine) handleSync(update *mgmProto.SyncResponse) error {
 		e.handleAutoUpdateVersion(update.NetworkMap.PeerConfig.AutoUpdate)
 	}

-	if err := e.updateNetbirdConfig(update.GetNetbirdConfig()); err != nil {
+	done := e.phase("netbird_config")
+	err := e.updateNetbirdConfig(update.GetNetbirdConfig())
+	done()
+	if err != nil {
 		return err
 	}

@@ -928,11 +941,16 @@ func (e *Engine) handleSync(update *mgmProto.SyncResponse) error {
 		return nil
 	}

-	if err := e.updateChecksIfNew(update.Checks); err != nil {
+	done = e.phase("checks")
+	err = e.updateChecksIfNew(update.Checks)
+	done()
+	if err != nil {
 		return err
 	}

+	done = e.phase("persist")
 	e.persistSyncResponse(update)
+	done()

 	// only apply new changes and ignore old ones
 	if err := e.updateNetworkMap(nm); err != nil {
@@ -1066,7 +1084,7 @@ func (e *Engine) updateChecksIfNew(checks []*mgmProto.Checks) error {
 	}
 	e.checks = checks

-	info, err := system.GetInfoWithChecks(e.ctx, checks)
+	info, err := system.GetInfoWithChecks(e.ctx, checks, e.overlayAddresses()...)
 	if err != nil {
 		log.Warnf("failed to get system info with checks: %v", err)
 		info = system.GetInfo(e.ctx)
@@ -1097,6 +1115,20 @@ func (e *Engine) updateChecksIfNew(checks []*mgmProto.Checks) error {
 	return nil
 }

+// overlayAddresses returns our own WireGuard overlay addresses (IPv4 and, when
+// set, IPv6) so they can be excluded from the reported network addresses;
+// otherwise the interface coming and going churns the peer meta on the management server.
+func (e *Engine) overlayAddresses() []netip.Addr {
+	var ips []netip.Addr
+	if e.config.WgAddr.IP.IsValid() {
+		ips = append(ips, e.config.WgAddr.IP)
+	}
+	if e.config.WgAddr.HasIPv6() {
+		ips = append(ips, e.config.WgAddr.IPv6)
+	}
+	return ips
+}
+
 func (e *Engine) updateConfig(conf *mgmProto.PeerConfig) error {
 	if e.wgInterface == nil {
 		return errors.New("wireguard interface is not initialized")
@@ -1240,7 +1272,7 @@ func (e *Engine) receiveManagementEvents() {
 	e.shutdownWg.Add(1)
 	go func() {
 		defer e.shutdownWg.Done()
-		info, err := system.GetInfoWithChecks(e.ctx, e.checks)
+		info, err := system.GetInfoWithChecks(e.ctx, e.checks, e.overlayAddresses()...)
 		if err != nil {
 			log.Warnf("failed to get system info with checks: %v", err)
 			info = system.GetInfo(e.ctx)
@@ -1357,13 +1389,16 @@ func (e *Engine) updateNetworkMap(networkMap *mgmProto.NetworkMap) error {

 	dnsConfig := toDNSConfig(protoDNSConfig, e.wgInterface.Address())

+	done := e.phase("dns_server")
 	if err := e.dnsServer.UpdateDNSServer(serial, dnsConfig); err != nil {
 		log.Errorf("failed to update dns server, err: %v", err)
 	}
+	done()

 	e.routeManager.SetDNSForwarderPort(dnsConfig.ForwarderPort)

 	// apply routes first, route related actions might depend on routing being enabled
+	done = e.phase("routes_classify")
 	routes := toRoutes(networkMap.GetRoutes())
 	serverRoutes, clientRoutes := e.routeManager.ClassifyRoutes(routes)

@@ -1372,29 +1407,60 @@ func (e *Engine) updateNetworkMap(networkMap *mgmProto.NetworkMap) error {
 		e.connMgr.UpdateRouteHAMap(clientRoutes)
 		log.Debugf("updated lazy connection manager with %d HA groups", len(clientRoutes))
 	}
+	done()

+	done = e.phase("routes_apply")
 	dnsRouteFeatureFlag := toDNSFeatureFlag(networkMap)
 	if err := e.routeManager.UpdateRoutes(serial, serverRoutes, clientRoutes, dnsRouteFeatureFlag); err != nil {
 		log.Errorf("failed to update routes: %v", err)
 	}
+	done()

+	done = e.phase("filtering")
 	if e.acl != nil {
 		e.acl.ApplyFiltering(networkMap, dnsRouteFeatureFlag)
 	}
+	done()

+	done = e.phase("dns_forwarder")
 	fwdEntries := toRouteDomains(e.config.WgPrivateKey.PublicKey().String(), routes)
 	e.updateDNSForwarder(dnsRouteFeatureFlag, fwdEntries)
+	done()

 	// Ingress forward rules
+	done = e.phase("forward_rules")
 	forwardingRules, err := e.updateForwardRules(networkMap.GetForwardingRules())
 	if err != nil {
 		log.Errorf("failed to update forward rules, err: %v", err)
 	}
+	done()

 	log.Debugf("got peers update from Management Service, total peers to connect to = %d", len(networkMap.GetRemotePeers()))

+	done = e.phase("offline_peers")
 	e.updateOfflinePeers(networkMap.GetOfflinePeers())
+	done()

+	remotePeers, err := e.reconcilePeers(networkMap)
+	if err != nil {
+		return err
+	}
+
+	// The exclude list must be set after the peers are added; otherwise the manager cannot look up the peers' parameters in the store.
+	done = e.phase("lazy_exclude")
+	excludedLazyPeers := e.toExcludedLazyPeers(forwardingRules, remotePeers)
+	e.connMgr.SetExcludeList(e.ctx, excludedLazyPeers)
+	done()
+
+	e.networkSerial = serial
+
+	return nil
+}
+
+// reconcilePeers applies the remote peer list from the network map (removing,
+// modifying and adding peers, then updating SSH config) and returns the remote
+// peers with our own peer filtered out, for use by later sync steps.
+func (e *Engine) reconcilePeers(networkMap *mgmProto.NetworkMap) ([]*mgmProto.RemotePeerConfig, error) {
 	// Filter out own peer from the remote peers list
 	localPubKey := e.config.WgPrivateKey.PublicKey().String()
 	remotePeers := make([]*mgmProto.RemotePeerConfig, 0, len(networkMap.GetRemotePeers()))
@@ -1409,42 +1475,43 @@ func (e *Engine) updateNetworkMap(networkMap *mgmProto.NetworkMap) error {
 		err := e.removeAllPeers()
 		e.statusRecorder.FinishPeerListModifications()
 		if err != nil {
-			return err
+			return nil, err
 		}
-	} else {
-		err := e.removePeers(remotePeers)
-		if err != nil {
-			return err
-		}
-
-		err = e.modifyPeers(remotePeers)
-		if err != nil {
-			return err
-		}
-
-		err = e.addNewPeers(remotePeers)
-		if err != nil {
-			return err
-		}
-
-		e.statusRecorder.FinishPeerListModifications()
-
-		e.updatePeerSSHHostKeys(remotePeers)
-
-		if err := e.updateSSHClientConfig(remotePeers); err != nil {
-			log.Warnf("failed to update SSH client config: %v", err)
-		}
-
-		e.updateSSHServerAuth(networkMap.GetSshAuth())
+		return remotePeers, nil
 	}

-	// must set the exclude list after the peers are added. Without it the manager can not figure out the peers parameters from the store
-	excludedLazyPeers := e.toExcludedLazyPeers(forwardingRules, remotePeers)
-	e.connMgr.SetExcludeList(e.ctx, excludedLazyPeers)
+	done := e.phase("removed_peers")
+	err := e.removePeers(remotePeers)
+	done()
+	if err != nil {
+		return nil, err
+	}

-	e.networkSerial = serial
+	done = e.phase("modified_peers")
+	err = e.modifyPeers(remotePeers)
+	done()
+	if err != nil {
+		return nil, err
+	}

-	return nil
+	done = e.phase("added_peers")
+	err = e.addNewPeers(remotePeers)
+	done()
+	if err != nil {
+		return nil, err
+	}
+
+	e.statusRecorder.FinishPeerListModifications()
+
+	e.updatePeerSSHHostKeys(remotePeers)
+
+	if err := e.updateSSHClientConfig(remotePeers); err != nil {
+		log.Warnf("failed to update SSH client config: %v", err)
+	}
+
+	e.updateSSHServerAuth(networkMap.GetSshAuth())
+
+	return remotePeers, nil
 }

 func toDNSFeatureFlag(networkMap *mgmProto.NetworkMap) bool {
--- a/client/internal/engine_privileged_test.go
+++ b/client/internal/engine_privileged_test.go
@@ -0,0 +1,565 @@
+//go:build privileged
+
+package internal
+
+import (
+	"context"
+	"fmt"
+	"net"
+	"runtime"
+	"strings"
+	"sync"
+	"testing"
+	"time"
+
+	"github.com/golang/mock/gomock"
+	"github.com/google/uuid"
+	log "github.com/sirupsen/logrus"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	"go.opentelemetry.io/otel"
+	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
+	"google.golang.org/grpc"
+	"google.golang.org/grpc/keepalive"
+
+	"github.com/netbirdio/netbird/client/iface"
+	"github.com/netbirdio/netbird/client/iface/device"
+	"github.com/netbirdio/netbird/client/iface/wgaddr"
+	"github.com/netbirdio/netbird/client/internal/dns"
+	"github.com/netbirdio/netbird/client/internal/peer"
+	nbssh "github.com/netbirdio/netbird/client/ssh"
+	"github.com/netbirdio/netbird/client/system"
+	nbdns "github.com/netbirdio/netbird/dns"
+	"github.com/netbirdio/netbird/management/internals/controllers/network_map/controller"
+	"github.com/netbirdio/netbird/management/internals/controllers/network_map/update_channel"
+	"github.com/netbirdio/netbird/management/internals/modules/peers"
+	"github.com/netbirdio/netbird/management/internals/modules/peers/ephemeral/manager"
+	"github.com/netbirdio/netbird/management/internals/server/config"
+	nbgrpc "github.com/netbirdio/netbird/management/internals/shared/grpc"
+	"github.com/netbirdio/netbird/management/server"
+	"github.com/netbirdio/netbird/management/server/activity"
+	nbcache "github.com/netbirdio/netbird/management/server/cache"
+	"github.com/netbirdio/netbird/management/server/groups"
+	"github.com/netbirdio/netbird/management/server/integrations/integrated_validator/validator"
+	"github.com/netbirdio/netbird/management/server/integrations/port_forwarding"
+	"github.com/netbirdio/netbird/management/server/job"
+	"github.com/netbirdio/netbird/management/server/permissions"
+	"github.com/netbirdio/netbird/management/server/settings"
+	"github.com/netbirdio/netbird/management/server/store"
+	"github.com/netbirdio/netbird/management/server/telemetry"
+	"github.com/netbirdio/netbird/management/server/types"
+	mgmt "github.com/netbirdio/netbird/shared/management/client"
+	mgmtProto "github.com/netbirdio/netbird/shared/management/proto"
+	relayClient "github.com/netbirdio/netbird/shared/relay/client"
+	signal "github.com/netbirdio/netbird/shared/signal/client"
+	"github.com/netbirdio/netbird/shared/signal/proto"
+	signalServer "github.com/netbirdio/netbird/signal/server"
+	"github.com/netbirdio/netbird/util"
+)
+
+func TestEngine_SSH(t *testing.T) {
+	key, err := wgtypes.GeneratePrivateKey()
+	if err != nil {
+		t.Fatal(err)
+		return
+	}
+
+	sshKey, err := nbssh.GeneratePrivateKey(nbssh.ED25519)
+	if err != nil {
+		t.Fatal(err)
+		return
+	}
+
+	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
+	defer cancel()
+
+	relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
+	engine := NewEngine(
+		ctx, cancel,
+		&EngineConfig{
+			WgIfaceName:      "utun101",
+			WgAddr:           wgaddr.MustParseWGAddress("100.64.0.1/24"),
+			WgPrivateKey:     key,
+			WgPort:           33100,
+			ServerSSHAllowed: true,
+			MTU:              iface.DefaultMTU,
+			SSHKey:           sshKey,
+		},
+		EngineServices{
+			SignalClient:   &signal.MockClient{},
+			MgmClient:      &mgmt.MockClient{},
+			RelayManager:   relayMgr,
+			StatusRecorder: peer.NewRecorder("https://mgm"),
+		},
+		MobileDependency{},
+	)
+
+	engine.dnsServer = &dns.MockServer{
+		UpdateDNSServerFunc: func(serial uint64, update nbdns.Config) error { return nil },
+	}
+
+	err = engine.Start(nil, nil)
+	require.NoError(t, err)
+
+	defer func() {
+		err := engine.Stop()
+		if err != nil {
+			return
+		}
+	}()
+
+	peerWithSSH := &mgmtProto.RemotePeerConfig{
+		WgPubKey:   "MNHf3Ma6z6mdLbriAJbqhX7+nM/B71lgw2+91q3LfhU=",
+		AllowedIps: []string{"100.64.0.21/24"},
+		SshConfig: &mgmtProto.SSHConfig{
+			SshPubKey: []byte("ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFATYCqaQw/9id1Qkq3n16JYhDhXraI6Pc1fgB8ynEfQ"),
+		},
+	}
+
+	// SSH server is not enabled so SSH config of a remote peer should be ignored
+	networkMap := &mgmtProto.NetworkMap{
+		Serial:             6,
+		PeerConfig:         nil,
+		RemotePeers:        []*mgmtProto.RemotePeerConfig{peerWithSSH},
+		RemotePeersIsEmpty: false,
+	}
+
+	err = engine.updateNetworkMap(networkMap)
+	require.NoError(t, err)
+
+	assert.Nil(t, engine.sshServer)
+
+	// SSH server is enabled, therefore SSH config should be applied
+	networkMap = &mgmtProto.NetworkMap{
+		Serial: 7,
+		PeerConfig: &mgmtProto.PeerConfig{Address: "100.64.0.1/24",
+			SshConfig: &mgmtProto.SSHConfig{
+				SshEnabled: true,
+				JwtConfig: &mgmtProto.JWTConfig{
+					Issuer:       "test-issuer",
+					Audience:     "test-audience",
+					KeysLocation: "test-keys",
+					MaxTokenAge:  3600,
+				},
+			}},
+		RemotePeers:        []*mgmtProto.RemotePeerConfig{peerWithSSH},
+		RemotePeersIsEmpty: false,
+	}
+
+	err = engine.updateNetworkMap(networkMap)
+	require.NoError(t, err)
+
+	time.Sleep(250 * time.Millisecond)
+	assert.NotNil(t, engine.sshServer)
+
+	// now remove peer
+	networkMap = &mgmtProto.NetworkMap{
+		Serial:             8,
+		RemotePeers:        []*mgmtProto.RemotePeerConfig{},
+		RemotePeersIsEmpty: false,
+	}
+
+	err = engine.updateNetworkMap(networkMap)
+	require.NoError(t, err)
+
+	// time.Sleep(250 * time.Millisecond)
+	assert.NotNil(t, engine.sshServer)
+
+	// now disable SSH server
+	networkMap = &mgmtProto.NetworkMap{
+		Serial: 9,
+		PeerConfig: &mgmtProto.PeerConfig{Address: "100.64.0.1/24",
+			SshConfig: &mgmtProto.SSHConfig{SshEnabled: false}},
+		RemotePeers:        []*mgmtProto.RemotePeerConfig{peerWithSSH},
+		RemotePeersIsEmpty: false,
+	}
+
+	err = engine.updateNetworkMap(networkMap)
+	require.NoError(t, err)
+
+	assert.Nil(t, engine.sshServer)
+}
+
+func TestEngine_Sync(t *testing.T) {
+	key, err := wgtypes.GeneratePrivateKey()
+	if err != nil {
+		t.Fatal(err)
+		return
+	}
+
+	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
+	defer cancel()
+
+	// feed updates to Engine via mocked Management client
+	updates := make(chan *mgmtProto.SyncResponse)
+	defer close(updates)
+	syncFunc := func(ctx context.Context, info *system.Info, msgHandler func(msg *mgmtProto.SyncResponse) error) error {
+		for msg := range updates {
+			err := msgHandler(msg)
+			if err != nil {
+				t.Fatal(err)
+			}
+		}
+		return nil
+	}
+	relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
+	engine := NewEngine(ctx, cancel, &EngineConfig{
+		WgIfaceName:  "utun103",
+		WgAddr:       wgaddr.MustParseWGAddress("100.64.0.1/24"),
+		WgPrivateKey: key,
+		WgPort:       33100,
+		MTU:          iface.DefaultMTU,
+	}, EngineServices{
+		SignalClient:   &signal.MockClient{},
+		MgmClient:      &mgmt.MockClient{SyncFunc: syncFunc},
+		RelayManager:   relayMgr,
+		StatusRecorder: peer.NewRecorder("https://mgm"),
+	}, MobileDependency{})
+	engine.ctx = ctx
+
+	engine.dnsServer = &dns.MockServer{
+		UpdateDNSServerFunc: func(serial uint64, update nbdns.Config) error { return nil },
+	}
+
+	defer func() {
+		err := engine.Stop()
+		if err != nil {
+			return
+		}
+	}()
+
+	err = engine.Start(nil, nil)
+	if err != nil {
+		t.Fatal(err)
+		return
+	}
+
+	peer1 := &mgmtProto.RemotePeerConfig{
+		WgPubKey:   "RRHf3Ma6z6mdLbriAJbqhX7+nM/B71lgw2+91q3LfhU=",
+		AllowedIps: []string{"100.64.0.10/24"},
+	}
+	peer2 := &mgmtProto.RemotePeerConfig{
+		WgPubKey:   "LLHf3Ma6z6mdLbriAJbqhX9+nM/B71lgw2+91q3LlhU=",
+		AllowedIps: []string{"100.64.0.11/24"},
+	}
+	peer3 := &mgmtProto.RemotePeerConfig{
+		WgPubKey:   "GGHf3Ma6z6mdLbriAJbqhX9+nM/B71lgw2+91q3LlhU=",
+		AllowedIps: []string{"100.64.0.12/24"},
+	}
+	// 1st update with 3 peers and a serial larger than the current serial of the engine => apply update
+	updates <- &mgmtProto.SyncResponse{
+		NetworkMap: &mgmtProto.NetworkMap{
+			Serial:             10,
+			PeerConfig:         nil,
+			RemotePeers:        []*mgmtProto.RemotePeerConfig{peer1, peer2, peer3},
+			RemotePeersIsEmpty: false,
+		},
+	}
+
+	timeout := time.After(time.Second * 2)
+	for {
+		select {
+		case <-timeout:
+			t.Fatalf("timeout while waiting for test to finish")
+			return
+		default:
+		}
+
+		if getPeers(engine) == 3 && engine.networkSerial == 10 {
+			break
+		}
+	}
+}
+
+func TestEngine_MultiplePeers(t *testing.T) {
+	// log.SetLevel(log.DebugLevel)
+
+	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
+	defer cancel()
+
+	sigServer, signalAddr, err := startSignal(t)
+	if err != nil {
+		t.Fatal(err)
+		return
+	}
+	defer sigServer.Stop()
+	mgmtServer, mgmtAddr, err := startManagement(t, t.TempDir(), "../testdata/store.sql")
+	if err != nil {
+		t.Fatal(err)
+		return
+	}
+	defer mgmtServer.GracefulStop()
+
+	setupKey := "A2C8E62B-38F5-4553-B31E-DD66C696CEBB"
+
+	mu := sync.Mutex{}
+	engines := []*Engine{}
+	numPeers := 10
+	wg := sync.WaitGroup{}
+	wg.Add(numPeers)
+	// create and start peers
+	for i := 0; i < numPeers; i++ {
+		j := i
+		go func() {
+			engine, err := createEngine(ctx, cancel, setupKey, j, mgmtAddr, signalAddr)
+			if err != nil {
+				wg.Done()
+				t.Errorf("unable to create the engine for peer %d with error %v", j, err)
+				return
+			}
+			engine.dnsServer = &dns.MockServer{}
+			mu.Lock()
+			defer mu.Unlock()
+			guid := fmt.Sprintf("{%s}", uuid.New().String())
+			device.CustomWindowsGUIDString = strings.ToLower(guid)
+			err = engine.Start(nil, nil)
+			if err != nil {
+				t.Errorf("unable to start engine for peer %d with error %v", j, err)
+				wg.Done()
+				return
+			}
+			engines = append(engines, engine)
+			wg.Done()
+		}()
+	}
+
+	// wait until all have been created and started
+	wg.Wait()
+	if len(engines) != numPeers {
+		t.Fatal("not all peers were started")
+	}
+	// check whether every peer has the expected number of connected peers
+
+	expectedConnected := numPeers * (numPeers - 1)
+
+	// adjust according to timeouts
+	timeout := 50 * time.Second
+	timeoutChan := time.After(timeout)
+	ticker := time.NewTicker(time.Second)
+	defer ticker.Stop()
+loop:
+	for {
+		select {
+		case <-timeoutChan:
+			t.Fatalf("waiting for expected connections timeout after %s", timeout.String())
+			break loop
+		case <-ticker.C:
+			totalConnected := 0
+			for _, engine := range engines {
+				totalConnected += getConnectedPeers(engine)
+			}
+			if totalConnected == expectedConnected {
+				log.Infof("total connected=%d", totalConnected)
+				break loop
+			}
+			log.Infof("total connected=%d", totalConnected)
+		}
+	}
+	// cleanup test
+	for n, peerEngine := range engines {
+		t.Logf("stopping peer with interface %s from multipeer test, loopIndex %d", peerEngine.wgInterface.Name(), n)
+		errStop := peerEngine.mgmClient.Close()
+		if errStop != nil {
+			log.Infoln("got error trying to close management clients from engine: ", errStop)
+		}
+		errStop = peerEngine.Stop()
+		if errStop != nil {
+			log.Infoln("got error trying to close testing peers engine: ", errStop)
+		}
+	}
+}
+
+var (
+	kaep = keepalive.EnforcementPolicy{
+		MinTime:             15 * time.Second,
+		PermitWithoutStream: true,
+	}
+
+	kasp = keepalive.ServerParameters{
+		MaxConnectionIdle:     15 * time.Second,
+		MaxConnectionAgeGrace: 5 * time.Second,
+		Time:                  5 * time.Second,
+		Timeout:               2 * time.Second,
+	}
+)
+
+func createEngine(ctx context.Context, cancel context.CancelFunc, setupKey string, i int, mgmtAddr string, signalAddr string) (*Engine, error) {
+	key, err := wgtypes.GeneratePrivateKey()
+	if err != nil {
+		return nil, err
+	}
+	mgmtClient, err := mgmt.NewClient(ctx, mgmtAddr, key, false)
+	if err != nil {
+		return nil, err
+	}
+	signalClient, err := signal.NewClient(ctx, signalAddr, key, false)
+	if err != nil {
+		return nil, err
+	}
+
+	info := system.GetInfo(ctx)
+	resp, err := mgmtClient.Register(setupKey, "", info, nil, nil)
+	if err != nil {
+		return nil, err
+	}
+
+	var ifaceName string
+	if runtime.GOOS == "darwin" {
+		ifaceName = fmt.Sprintf("utun1%d", i)
+	} else {
+		ifaceName = fmt.Sprintf("wt%d", i)
+	}
+
+	wgPort := 33100 + i
+	conf := &EngineConfig{
+		WgIfaceName:  ifaceName,
+		WgAddr:       wgaddr.MustParseWGAddress(resp.PeerConfig.Address),
+		WgPrivateKey: key,
+		WgPort:       wgPort,
+		MTU:          iface.DefaultMTU,
+	}
+
+	relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
+	e, err := NewEngine(ctx, cancel, conf, EngineServices{
+		SignalClient:   signalClient,
+		MgmClient:      mgmtClient,
+		RelayManager:   relayMgr,
+		StatusRecorder: peer.NewRecorder("https://mgm"),
+	}, MobileDependency{}), nil
+	e.ctx = ctx
+	return e, err
+}
+
+func startSignal(t *testing.T) (*grpc.Server, string, error) {
+	t.Helper()
+
+	s := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(kaep), grpc.KeepaliveParams(kasp))
+
+	lis, err := net.Listen("tcp", "localhost:0")
+	if err != nil {
+		log.Fatalf("failed to listen: %v", err)
+	}
+
+	srv, err := signalServer.NewServer(context.Background(), otel.Meter(""))
+	require.NoError(t, err)
+	proto.RegisterSignalExchangeServer(s, srv)
+
+	go func() {
+		if err = s.Serve(lis); err != nil {
+			log.Fatalf("failed to serve: %v", err)
+		}
+	}()
+
+	return s, lis.Addr().String(), nil
+}
+
+func startManagement(t *testing.T, dataDir, testFile string) (*grpc.Server, string, error) {
+	t.Helper()
+
+	config := &config.Config{
+		Stuns:      []*config.Host{},
+		TURNConfig: &config.TURNConfig{},
+		Relay: &config.Relay{
+			Addresses:      []string{"127.0.0.1:1234"},
+			CredentialsTTL: util.Duration{Duration: time.Hour},
+			Secret:         "222222222222222222",
+		},
+		Signal: &config.Host{
+			Proto: "http",
+			URI:   "localhost:10000",
+		},
+		Datadir:    dataDir,
+		HttpConfig: nil,
+	}
+
+	lis, err := net.Listen("tcp", "localhost:0")
+	if err != nil {
+		return nil, "", err
+	}
+	s := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(kaep), grpc.KeepaliveParams(kasp))
+
+	store, cleanUp, err := store.NewTestStoreFromSQL(context.Background(), testFile, config.Datadir)
+	if err != nil {
+		return nil, "", err
+	}
+	t.Cleanup(cleanUp)
+
+	eventStore := &activity.InMemoryEventStore{}
+	if err != nil {
+		return nil, "", err
+	}
+
+	permissionsManager := permissions.NewManager(store)
+	peersManager := peers.NewManager(store, permissionsManager)
+	jobManager := job.NewJobManager(nil, store, peersManager)
+
+	cacheStore, err := nbcache.NewStore(context.Background(), 100*time.Millisecond, 300*time.Millisecond, 100)
+	if err != nil {
+		return nil, "", err
+	}
+
+	ia, _ := validator.NewIntegratedValidator(context.Background(), peersManager, nil, eventStore, cacheStore)
+
+	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+	require.NoError(t, err)
+
+	ctrl := gomock.NewController(t)
+	t.Cleanup(ctrl.Finish)
+	settingsMockManager := settings.NewMockManager(ctrl)
+	settingsMockManager.EXPECT().
+		GetSettings(gomock.Any(), gomock.Any(), gomock.Any()).
+		Return(&types.Settings{}, nil).
+		AnyTimes()
+	settingsMockManager.EXPECT().
+		GetExtraSettings(gomock.Any(), gomock.Any()).
+		Return(&types.ExtraSettings{}, nil).
+		AnyTimes()
+
+	groupsManager := groups.NewManagerMock()
+
+	updateManager := update_channel.NewPeersUpdateManager(metrics)
+	requestBuffer := server.NewAccountRequestBuffer(context.Background(), store)
+	networkMapController := controller.NewController(context.Background(), store, metrics, updateManager, requestBuffer, server.MockIntegratedValidator{}, settingsMockManager, "netbird.selfhosted", port_forwarding.NewControllerMock(), manager.NewEphemeralManager(store, peersManager), config)
+	accountManager, err := server.BuildManager(context.Background(), config, store, networkMapController, jobManager, nil, "", eventStore, nil, false, ia, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false, cacheStore)
+	if err != nil {
+		return nil, "", err
+	}
+
+	secretsManager, err := nbgrpc.NewTimeBasedAuthSecretsManager(updateManager, config.TURNConfig, config.Relay, settingsMockManager, groupsManager)
+	if err != nil {
+		return nil, "", err
+	}
+	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController, nil, nil)
+	if err != nil {
+		return nil, "", err
+	}
+	mgmtProto.RegisterManagementServiceServer(s, mgmtServer)
+	go func() {
+		if err = s.Serve(lis); err != nil {
+			log.Fatalf("failed to serve: %v", err)
+		}
+	}()
+
+	return s, lis.Addr().String(), nil
+}
+
+// getConnectedPeers returns the number of peers in the store whose connection is currently established
+func getConnectedPeers(e *Engine) int {
+	e.syncMsgMux.Lock()
+	defer e.syncMsgMux.Unlock()
+	i := 0
+	for _, id := range e.peerStore.PeersPubKey() {
+		conn, _ := e.peerStore.PeerConn(id)
+		if conn.IsConnected() {
+			i++
+		}
+	}
+	return i
+}
+
+func getPeers(e *Engine) int {
+	e.syncMsgMux.Lock()
+	defer e.syncMsgMux.Unlock()
+
+	return len(e.peerStore.PeersPubKey())
+}
--- a/client/internal/engine_test.go
+++ b/client/internal/engine_test.go
@@ -6,37 +6,18 @@ import (
 	"net"
 	"net/netip"
 	"os"
-	"runtime"
 	"strings"
 	"sync"
 	"testing"
 	"time"

-	"github.com/golang/mock/gomock"
-	"github.com/google/uuid"
-	log "github.com/sirupsen/logrus"
 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
-	"go.opentelemetry.io/otel"
 	wgdevice "golang.zx2c4.com/wireguard/device"
 	"golang.zx2c4.com/wireguard/tun/netstack"
 	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
-	"google.golang.org/grpc"
-	"google.golang.org/grpc/keepalive"

 	"github.com/netbirdio/netbird/client/internal/stdnet"
-	"github.com/netbirdio/netbird/management/server/job"
-
-	"github.com/netbirdio/netbird/management/server/integrations/integrated_validator/validator"
-
-	"github.com/netbirdio/netbird/management/internals/controllers/network_map/controller"
-	"github.com/netbirdio/netbird/management/internals/controllers/network_map/update_channel"
-	"github.com/netbirdio/netbird/management/internals/modules/peers"
-	"github.com/netbirdio/netbird/management/internals/modules/peers/ephemeral/manager"
-	nbgrpc "github.com/netbirdio/netbird/management/internals/shared/grpc"
-
-	"github.com/netbirdio/netbird/management/internals/server/config"
-	"github.com/netbirdio/netbird/management/server/groups"

 	"github.com/netbirdio/netbird/client/iface"
 	"github.com/netbirdio/netbird/client/iface/configurer"
@@ -50,18 +31,7 @@ import (
 	icemaker "github.com/netbirdio/netbird/client/internal/peer/ice"
 	"github.com/netbirdio/netbird/client/internal/profilemanager"
 	"github.com/netbirdio/netbird/client/internal/routemanager"
-	nbssh "github.com/netbirdio/netbird/client/ssh"
-	"github.com/netbirdio/netbird/client/system"
 	nbdns "github.com/netbirdio/netbird/dns"
-	"github.com/netbirdio/netbird/management/server"
-	"github.com/netbirdio/netbird/management/server/activity"
-	nbcache "github.com/netbirdio/netbird/management/server/cache"
-	"github.com/netbirdio/netbird/management/server/integrations/port_forwarding"
-	"github.com/netbirdio/netbird/management/server/permissions"
-	"github.com/netbirdio/netbird/management/server/settings"
-	"github.com/netbirdio/netbird/management/server/store"
-	"github.com/netbirdio/netbird/management/server/telemetry"
-	"github.com/netbirdio/netbird/management/server/types"
 	"github.com/netbirdio/netbird/monotime"
 	"github.com/netbirdio/netbird/route"
 	mgmt "github.com/netbirdio/netbird/shared/management/client"
@@ -69,25 +39,9 @@ import (
 	"github.com/netbirdio/netbird/shared/netiputil"
 	relayClient "github.com/netbirdio/netbird/shared/relay/client"
 	signal "github.com/netbirdio/netbird/shared/signal/client"
-	"github.com/netbirdio/netbird/shared/signal/proto"
-	signalServer "github.com/netbirdio/netbird/signal/server"
 	"github.com/netbirdio/netbird/util"
 )

-var (
-	kaep = keepalive.EnforcementPolicy{
-		MinTime:             15 * time.Second,
-		PermitWithoutStream: true,
-	}
-
-	kasp = keepalive.ServerParameters{
-		MaxConnectionIdle:     15 * time.Second,
-		MaxConnectionAgeGrace: 5 * time.Second,
-		Time:                  5 * time.Second,
-		Timeout:               2 * time.Second,
-	}
-)
-
 type MockWGIface struct {
 	CreateFunc                 func() error
 	CreateOnAndroidFunc        func(routeRange []string, ip string, domains []string) error
@@ -234,129 +188,6 @@ func TestMain(m *testing.M) {
 	os.Exit(code)
 }

-func TestEngine_SSH(t *testing.T) {
-	key, err := wgtypes.GeneratePrivateKey()
-	if err != nil {
-		t.Fatal(err)
-		return
-	}
-
-	sshKey, err := nbssh.GeneratePrivateKey(nbssh.ED25519)
-	if err != nil {
-		t.Fatal(err)
-		return
-	}
-
-	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
-	defer cancel()
-
-	relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
-	engine := NewEngine(
-		ctx, cancel,
-		&EngineConfig{
-			WgIfaceName:      "utun101",
-			WgAddr:           wgaddr.MustParseWGAddress("100.64.0.1/24"),
-			WgPrivateKey:     key,
-			WgPort:           33100,
-			ServerSSHAllowed: true,
-			MTU:              iface.DefaultMTU,
-			SSHKey:           sshKey,
-		},
-		EngineServices{
-			SignalClient:   &signal.MockClient{},
-			MgmClient:      &mgmt.MockClient{},
-			RelayManager:   relayMgr,
-			StatusRecorder: peer.NewRecorder("https://mgm"),
-		},
-		MobileDependency{},
-	)
-
-	engine.dnsServer = &dns.MockServer{
-		UpdateDNSServerFunc: func(serial uint64, update nbdns.Config) error { return nil },
-	}
-
-	err = engine.Start(nil, nil)
-	require.NoError(t, err)
-
-	defer func() {
-		err := engine.Stop()
-		if err != nil {
-			return
-		}
-	}()
-
-	peerWithSSH := &mgmtProto.RemotePeerConfig{
-		WgPubKey:   "MNHf3Ma6z6mdLbriAJbqhX7+nM/B71lgw2+91q3LfhU=",
-		AllowedIps: []string{"100.64.0.21/24"},
-		SshConfig: &mgmtProto.SSHConfig{
-			SshPubKey: []byte("ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFATYCqaQw/9id1Qkq3n16JYhDhXraI6Pc1fgB8ynEfQ"),
-		},
-	}
-
-	// SSH server is not enabled so SSH config of a remote peer should be ignored
-	networkMap := &mgmtProto.NetworkMap{
-		Serial:             6,
-		PeerConfig:         nil,
-		RemotePeers:        []*mgmtProto.RemotePeerConfig{peerWithSSH},
-		RemotePeersIsEmpty: false,
-	}
-
-	err = engine.updateNetworkMap(networkMap)
-	require.NoError(t, err)
-
-	assert.Nil(t, engine.sshServer)
-
-	// SSH server is enabled, therefore SSH config should be applied
-	networkMap = &mgmtProto.NetworkMap{
-		Serial: 7,
-		PeerConfig: &mgmtProto.PeerConfig{Address: "100.64.0.1/24",
-			SshConfig: &mgmtProto.SSHConfig{
-				SshEnabled: true,
-				JwtConfig: &mgmtProto.JWTConfig{
-					Issuer:       "test-issuer",
-					Audience:     "test-audience",
-					KeysLocation: "test-keys",
-					MaxTokenAge:  3600,
-				},
-			}},
-		RemotePeers:        []*mgmtProto.RemotePeerConfig{peerWithSSH},
-		RemotePeersIsEmpty: false,
-	}
-
-	err = engine.updateNetworkMap(networkMap)
-	require.NoError(t, err)
-
-	time.Sleep(250 * time.Millisecond)
-	assert.NotNil(t, engine.sshServer)
-
-	// now remove peer
-	networkMap = &mgmtProto.NetworkMap{
-		Serial:             8,
-		RemotePeers:        []*mgmtProto.RemotePeerConfig{},
-		RemotePeersIsEmpty: false,
-	}
-
-	err = engine.updateNetworkMap(networkMap)
-	require.NoError(t, err)
-
-	// time.Sleep(250 * time.Millisecond)
-	assert.NotNil(t, engine.sshServer)
-
-	// now disable SSH server
-	networkMap = &mgmtProto.NetworkMap{
-		Serial: 9,
-		PeerConfig: &mgmtProto.PeerConfig{Address: "100.64.0.1/24",
-			SshConfig: &mgmtProto.SSHConfig{SshEnabled: false}},
-		RemotePeers:        []*mgmtProto.RemotePeerConfig{peerWithSSH},
-		RemotePeersIsEmpty: false,
-	}
-
-	err = engine.updateNetworkMap(networkMap)
-	require.NoError(t, err)
-
-	assert.Nil(t, engine.sshServer)
-}
-
 func TestEngine_SSHUpdateLogic(t *testing.T) {
 	// Test that SSH server start/stop logic works based on config
 	engine := &Engine{
@@ -631,97 +462,6 @@ func TestEngine_UpdateNetworkMap(t *testing.T) {
 	}
 }

-func TestEngine_Sync(t *testing.T) {
-	key, err := wgtypes.GeneratePrivateKey()
-	if err != nil {
-		t.Fatal(err)
-		return
-	}
-
-	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
-	defer cancel()
-
-	// feed updates to Engine via mocked Management client
-	updates := make(chan *mgmtProto.SyncResponse)
-	defer close(updates)
-	syncFunc := func(ctx context.Context, info *system.Info, msgHandler func(msg *mgmtProto.SyncResponse) error) error {
-		for msg := range updates {
-			err := msgHandler(msg)
-			if err != nil {
-				t.Fatal(err)
-			}
-		}
-		return nil
-	}
-	relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
-	engine := NewEngine(ctx, cancel, &EngineConfig{
-		WgIfaceName:  "utun103",
-		WgAddr:       wgaddr.MustParseWGAddress("100.64.0.1/24"),
-		WgPrivateKey: key,
-		WgPort:       33100,
-		MTU:          iface.DefaultMTU,
-	}, EngineServices{
-		SignalClient:   &signal.MockClient{},
-		MgmClient:      &mgmt.MockClient{SyncFunc: syncFunc},
-		RelayManager:   relayMgr,
-		StatusRecorder: peer.NewRecorder("https://mgm"),
-	}, MobileDependency{})
-	engine.ctx = ctx
-
-	engine.dnsServer = &dns.MockServer{
-		UpdateDNSServerFunc: func(serial uint64, update nbdns.Config) error { return nil },
-	}
-
-	defer func() {
-		err := engine.Stop()
-		if err != nil {
-			return
-		}
-	}()
-
-	err = engine.Start(nil, nil)
-	if err != nil {
-		t.Fatal(err)
-		return
-	}
-
-	peer1 := &mgmtProto.RemotePeerConfig{
-		WgPubKey:   "RRHf3Ma6z6mdLbriAJbqhX7+nM/B71lgw2+91q3LfhU=",
-		AllowedIps: []string{"100.64.0.10/24"},
-	}
-	peer2 := &mgmtProto.RemotePeerConfig{
-		WgPubKey:   "LLHf3Ma6z6mdLbriAJbqhX9+nM/B71lgw2+91q3LlhU=",
-		AllowedIps: []string{"100.64.0.11/24"},
-	}
-	peer3 := &mgmtProto.RemotePeerConfig{
-		WgPubKey:   "GGHf3Ma6z6mdLbriAJbqhX9+nM/B71lgw2+91q3LlhU=",
-		AllowedIps: []string{"100.64.0.12/24"},
-	}
-	// 1st update with just 1 peer and serial larger than the current serial of the engine => apply update
-	updates <- &mgmtProto.SyncResponse{
-		NetworkMap: &mgmtProto.NetworkMap{
-			Serial:             10,
-			PeerConfig:         nil,
-			RemotePeers:        []*mgmtProto.RemotePeerConfig{peer1, peer2, peer3},
-			RemotePeersIsEmpty: false,
-		},
-	}
-
-	timeout := time.After(time.Second * 2)
-	for {
-		select {
-		case <-timeout:
-			t.Fatalf("timeout while waiting for test to finish")
-			return
-		default:
-		}
-
-		if getPeers(engine) == 3 && engine.networkSerial == 10 {
-			break
-		}
-	}
-}
-
 func TestEngine_UpdateNetworkMapWithRoutes(t *testing.T) {
 	testCases := []struct {
 		name                 string
@@ -1105,104 +845,6 @@ func TestEngine_UpdateNetworkMapWithDNSUpdate(t *testing.T) {
 	}
 }

-func TestEngine_MultiplePeers(t *testing.T) {
-	// log.SetLevel(log.DebugLevel)
-
-	ctx, cancel := context.WithCancel(CtxInitState(context.Background()))
-	defer cancel()
-
-	sigServer, signalAddr, err := startSignal(t)
-	if err != nil {
-		t.Fatal(err)
-		return
-	}
-	defer sigServer.Stop()
-	mgmtServer, mgmtAddr, err := startManagement(t, t.TempDir(), "../testdata/store.sql")
-	if err != nil {
-		t.Fatal(err)
-		return
-	}
-	defer mgmtServer.GracefulStop()
-
-	setupKey := "A2C8E62B-38F5-4553-B31E-DD66C696CEBB"
-
-	mu := sync.Mutex{}
-	engines := []*Engine{}
-	numPeers := 10
-	wg := sync.WaitGroup{}
-	wg.Add(numPeers)
-	// create and start peers
-	for i := 0; i < numPeers; i++ {
-		j := i
-		go func() {
-			engine, err := createEngine(ctx, cancel, setupKey, j, mgmtAddr, signalAddr)
-			if err != nil {
-				wg.Done()
-				t.Errorf("unable to create the engine for peer %d with error %v", j, err)
-				return
-			}
-			engine.dnsServer = &dns.MockServer{}
-			mu.Lock()
-			defer mu.Unlock()
-			guid := fmt.Sprintf("{%s}", uuid.New().String())
-			device.CustomWindowsGUIDString = strings.ToLower(guid)
-			err = engine.Start(nil, nil)
-			if err != nil {
-				t.Errorf("unable to start engine for peer %d with error %v", j, err)
-				wg.Done()
-				return
-			}
-			engines = append(engines, engine)
-			wg.Done()
-		}()
-	}
-
-	// wait until all have been created and started
-	wg.Wait()
-	if len(engines) != numPeers {
-		t.Fatal("not all peers was started")
-	}
-	// check whether all the peer have expected peers connected
-
-	expectedConnected := numPeers * (numPeers - 1)
-
-	// adjust according to timeouts
-	timeout := 50 * time.Second
-	timeoutChan := time.After(timeout)
-	ticker := time.NewTicker(time.Second)
-	defer ticker.Stop()
-loop:
-	for {
-		select {
-		case <-timeoutChan:
-			t.Fatalf("waiting for expected connections timeout after %s", timeout.String())
-			break loop
-		case <-ticker.C:
-			totalConnected := 0
-			for _, engine := range engines {
-				totalConnected += getConnectedPeers(engine)
-			}
-			if totalConnected == expectedConnected {
-				log.Infof("total connected=%d", totalConnected)
-				break loop
-			}
-			log.Infof("total connected=%d", totalConnected)
-		}
-	}
-	// cleanup test
-	for n, peerEngine := range engines {
-		t.Logf("stopping peer with interface %s from multipeer test, loopIndex %d", peerEngine.wgInterface.Name(), n)
-		errStop := peerEngine.mgmClient.Close()
-		if errStop != nil {
-			log.Infoln("got error trying to close management clients from engine: ", errStop)
-		}
-		errStop = peerEngine.Stop()
-		if errStop != nil {
-			log.Infoln("got error trying to close testing peers engine: ", errStop)
-		}
-	}
-}
-
 func Test_ParseNATExternalIPMappings(t *testing.T) {
 	ifaceList, err := net.Interfaces()
 	if err != nil {
@@ -1526,187 +1168,6 @@ func TestCompareNetIPLists(t *testing.T) {
 	}
 }

-func createEngine(ctx context.Context, cancel context.CancelFunc, setupKey string, i int, mgmtAddr string, signalAddr string) (*Engine, error) {
-	key, err := wgtypes.GeneratePrivateKey()
-	if err != nil {
-		return nil, err
-	}
-	mgmtClient, err := mgmt.NewClient(ctx, mgmtAddr, key, false)
-	if err != nil {
-		return nil, err
-	}
-	signalClient, err := signal.NewClient(ctx, signalAddr, key, false)
-	if err != nil {
-		return nil, err
-	}
-
-	info := system.GetInfo(ctx)
-	resp, err := mgmtClient.Register(setupKey, "", info, nil, nil)
-	if err != nil {
-		return nil, err
-	}
-
-	var ifaceName string
-	if runtime.GOOS == "darwin" {
-		ifaceName = fmt.Sprintf("utun1%d", i)
-	} else {
-		ifaceName = fmt.Sprintf("wt%d", i)
-	}
-
-	wgPort := 33100 + i
-	conf := &EngineConfig{
-		WgIfaceName:  ifaceName,
-		WgAddr:       wgaddr.MustParseWGAddress(resp.PeerConfig.Address),
-		WgPrivateKey: key,
-		WgPort:       wgPort,
-		MTU:          iface.DefaultMTU,
-	}
-
-	relayMgr := relayClient.NewManager(ctx, nil, key.PublicKey().String(), iface.DefaultMTU)
-	e, err := NewEngine(ctx, cancel, conf, EngineServices{
-		SignalClient:   signalClient,
-		MgmClient:      mgmtClient,
-		RelayManager:   relayMgr,
-		StatusRecorder: peer.NewRecorder("https://mgm"),
-	}, MobileDependency{}), nil
-	e.ctx = ctx
-	return e, err
-}
-
-func startSignal(t *testing.T) (*grpc.Server, string, error) {
-	t.Helper()
-
-	s := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(kaep), grpc.KeepaliveParams(kasp))
-
-	lis, err := net.Listen("tcp", "localhost:0")
-	if err != nil {
-		log.Fatalf("failed to listen: %v", err)
-	}
-
-	srv, err := signalServer.NewServer(context.Background(), otel.Meter(""))
-	require.NoError(t, err)
-	proto.RegisterSignalExchangeServer(s, srv)
-
-	go func() {
-		if err = s.Serve(lis); err != nil {
-			log.Fatalf("failed to serve: %v", err)
-		}
-	}()
-
-	return s, lis.Addr().String(), nil
-}
-
-func startManagement(t *testing.T, dataDir, testFile string) (*grpc.Server, string, error) {
-	t.Helper()
-
-	config := &config.Config{
-		Stuns:      []*config.Host{},
-		TURNConfig: &config.TURNConfig{},
-		Relay: &config.Relay{
-			Addresses:      []string{"127.0.0.1:1234"},
-			CredentialsTTL: util.Duration{Duration: time.Hour},
-			Secret:         "222222222222222222",
-		},
-		Signal: &config.Host{
-			Proto: "http",
-			URI:   "localhost:10000",
-		},
-		Datadir:    dataDir,
-		HttpConfig: nil,
-	}
-
-	lis, err := net.Listen("tcp", "localhost:0")
-	if err != nil {
-		return nil, "", err
-	}
-	s := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(kaep), grpc.KeepaliveParams(kasp))
-
-	store, cleanUp, err := store.NewTestStoreFromSQL(context.Background(), testFile, config.Datadir)
-	if err != nil {
-		return nil, "", err
-	}
-	t.Cleanup(cleanUp)
-
-	eventStore := &activity.InMemoryEventStore{}
-	if err != nil {
-		return nil, "", err
-	}
-
-	permissionsManager := permissions.NewManager(store)
-	peersManager := peers.NewManager(store, permissionsManager)
-	jobManager := job.NewJobManager(nil, store, peersManager)
-
-	cacheStore, err := nbcache.NewStore(context.Background(), 100*time.Millisecond, 300*time.Millisecond, 100)
-	if err != nil {
-		return nil, "", err
-	}
-
-	ia, _ := validator.NewIntegratedValidator(context.Background(), peersManager, nil, eventStore, cacheStore)
-
-	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
-	require.NoError(t, err)
-
-	ctrl := gomock.NewController(t)
-	t.Cleanup(ctrl.Finish)
-	settingsMockManager := settings.NewMockManager(ctrl)
-	settingsMockManager.EXPECT().
-		GetSettings(gomock.Any(), gomock.Any(), gomock.Any()).
-		Return(&types.Settings{}, nil).
-		AnyTimes()
-	settingsMockManager.EXPECT().
-		GetExtraSettings(gomock.Any(), gomock.Any()).
-		Return(&types.ExtraSettings{}, nil).
-		AnyTimes()
-
-	groupsManager := groups.NewManagerMock()
-
-	updateManager := update_channel.NewPeersUpdateManager(metrics)
-	requestBuffer := server.NewAccountRequestBuffer(context.Background(), store)
-	networkMapController := controller.NewController(context.Background(), store, metrics, updateManager, requestBuffer, server.MockIntegratedValidator{}, settingsMockManager, "netbird.selfhosted", port_forwarding.NewControllerMock(), manager.NewEphemeralManager(store, peersManager), config)
-	accountManager, err := server.BuildManager(context.Background(), config, store, networkMapController, jobManager, nil, "", eventStore, nil, false, ia, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManager, false, cacheStore)
-	if err != nil {
-		return nil, "", err
-	}
-
-	secretsManager, err := nbgrpc.NewTimeBasedAuthSecretsManager(updateManager, config.TURNConfig, config.Relay, settingsMockManager, groupsManager)
-	if err != nil {
-		return nil, "", err
-	}
-	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController, nil, nil)
-	if err != nil {
-		return nil, "", err
-	}
-	mgmtProto.RegisterManagementServiceServer(s, mgmtServer)
-	go func() {
-		if err = s.Serve(lis); err != nil {
-			log.Fatalf("failed to serve: %v", err)
-		}
-	}()
-
-	return s, lis.Addr().String(), nil
-}
-
-// getConnectedPeers returns a connection Status or nil if peer connection wasn't found
-func getConnectedPeers(e *Engine) int {
-	e.syncMsgMux.Lock()
-	defer e.syncMsgMux.Unlock()
-	i := 0
-	for _, id := range e.peerStore.PeersPubKey() {
-		conn, _ := e.peerStore.PeerConn(id)
-		if conn.IsConnected() {
-			i++
-		}
-	}
-	return i
-}
-
-func getPeers(e *Engine) int {
-	e.syncMsgMux.Lock()
-	defer e.syncMsgMux.Unlock()
-
-	return len(e.peerStore.PeersPubKey())
-}
-
 func mustEncodePrefix(t *testing.T, p netip.Prefix) []byte {
 	t.Helper()
 	b, err := netiputil.EncodePrefix(p)
--- a/client/internal/lazyconn/activity/listener_bind.go
+++ b/client/internal/lazyconn/activity/listener_bind.go
@@ -119,10 +119,6 @@ func (d *BindListener) ReadPackets() {
 	}

 	d.peerCfg.Log.Debugf("removing lazy endpoint for peer %s", d.peerCfg.PublicKey)
-	if err := d.wgIface.RemovePeer(d.peerCfg.PublicKey); err != nil {
-		d.peerCfg.Log.Errorf("failed to remove endpoint: %s", err)
-	}
-
 	_ = d.lazyConn.Close()
 	d.bind.RemoveEndpoint(d.fakeIP)
 	d.done.Done()
--- a/client/internal/metrics/influxdb.go
+++ b/client/internal/metrics/influxdb.go
@@ -120,6 +120,30 @@ func (m *influxDBMetrics) RecordSyncDuration(_ context.Context, agentInfo AgentI
 	m.trimLocked()
 }

+func (m *influxDBMetrics) RecordSyncPhase(_ context.Context, agentInfo AgentInfo, phase string, duration time.Duration) {
+	tags := fmt.Sprintf("deployment_type=%s,version=%s,os=%s,arch=%s,peer_id=%s,phase=%s",
+		agentInfo.DeploymentType.String(),
+		agentInfo.Version,
+		agentInfo.OS,
+		agentInfo.Arch,
+		agentInfo.peerID,
+		phase,
+	)
+
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	m.samples = append(m.samples, influxSample{
+		measurement: "netbird_sync_phase",
+		tags:        tags,
+		fields: map[string]float64{
+			"duration_seconds": duration.Seconds(),
+		},
+		timestamp: time.Now(),
+	})
+	m.trimLocked()
+}
+
 func (m *influxDBMetrics) RecordLoginDuration(_ context.Context, agentInfo AgentInfo, duration time.Duration, success bool) {
 	result := "success"
 	if !success {
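Note on `RecordSyncPhase` in `influxdb.go`: the sample it appends carries a measurement name, a pre-joined tag string, a field map, and a timestamp. A minimal sketch of how such a sample could map onto one InfluxDB line-protocol line follows. The `sample` type and `lineProtocol` function are stand-ins defined here for illustration; the serializer the client actually uses is not shown in this diff and may differ (for example in escaping or field ordering).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// sample mirrors the shape of the influxSample literal in the diff;
// the type itself is a stand-in defined for this sketch.
type sample struct {
	measurement string
	tags        string
	fields      map[string]float64
	timestamp   time.Time
}

// lineProtocol renders one sample as "measurement,tags fields timestamp".
// Field keys are sorted so the output is deterministic. Tag and field values
// are assumed to need no escaping (no spaces, commas, or equals signs).
func lineProtocol(s sample) string {
	keys := make([]string, 0, len(s.fields))
	for k := range s.fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%g", k, s.fields[k]))
	}
	return fmt.Sprintf("%s,%s %s %d",
		s.measurement, s.tags, strings.Join(parts, ","), s.timestamp.UnixNano())
}

// exampleSample builds a sync-phase sample with made-up tag values.
func exampleSample() sample {
	return sample{
		measurement: "netbird_sync_phase",
		tags:        "os=linux,arch=amd64,phase=routes_apply",
		fields:      map[string]float64{"duration_seconds": 0.25},
		timestamp:   time.Unix(0, 1700000000000000000),
	}
}

func main() {
	fmt.Println(lineProtocol(exampleSample()))
}
```

Because `phase` is a tag rather than a field, each phase becomes its own series, which is what lets a dashboard group durations by phase.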
--- a/client/internal/metrics/infra/README.md
+++ b/client/internal/metrics/infra/README.md
@@ -78,6 +78,25 @@ Tags:
 - `os`: Operating system (linux, darwin, windows, android, ios, etc.)
 - `arch`: CPU architecture (amd64, arm64, etc.)

+### Sync Phase Timing
+
+Measurement: `netbird_sync_phase`
+
+Breaks down where time goes inside a single sync, so the total `netbird_sync` duration can be attributed to the sub-step that dominates.
+
+| Field | Description |
+|-------|-------------|
+| `duration_seconds` | Time spent in one sub-phase of sync processing |
+
+Tags:
+- `phase`: the sub-phase — `netbird_config`, `checks`, `persist`, `dns_server`, `routes_classify`, `routes_apply`, `filtering`, `dns_forwarder`, `forward_rules`, `offline_peers`, `removed_peers`, `modified_peers`, `added_peers`, `lazy_exclude`
+- `deployment_type`: "cloud" | "selfhosted" | "unknown"
+- `version`: NetBird version string
+- `os`: Operating system (linux, darwin, windows, android, ios, etc.)
+- `arch`: CPU architecture (amd64, arm64, etc.)
+
+**Note:** this is wall-time per phase — it includes both CPU work and time spent waiting on locks. A slow phase points to *where* the time goes, not *why*; pair it with lock-wait metrics to tell contention apart from real work.
+
 ### Login Duration

 Measurement: `netbird_login`
@@ -191,4 +210,52 @@ docker compose exec influxdb influx query \

 # Check ingest server health
 curl http://localhost:8087/health
-```
+```
+
+## Analyzing a Debug Bundle
+
+Metrics collection is always on, so every debug bundle ships a `metrics.txt` in InfluxDB line protocol — a timestamped time series of all recorded events (sync durations, sync phases, connection stages, login). You can replay it into the local stack and graph it, without a running client.
+
+The bundle's `metrics.txt` is a rolling window (capped at 5 days / ~20k samples, see [Buffer Limits](#buffer-limits)). For a connection incident the relevant window is short (connection setup takes seconds), so a bundle captured during the issue is enough.
+
+### 1. Start the stack
+
+```bash
+# From this directory (client/internal/metrics/infra)
+INFLUXDB_ADMIN_TOKEN=admin123 INFLUXDB_ADMIN_PASSWORD=admin123 GRAFANA_ADMIN_PASSWORD=admin123 \
+  docker compose up -d
+```
+
+(The `admin123` values are throwaway local credentials, fine for offline analysis.)
+
+### 2. Clear any previous data
+
+So you only see this bundle:
+
+```bash
+docker exec influxdb influx delete --org netbird --bucket metrics --token admin123 \
+  --start 1970-01-01T00:00:00Z --stop 2100-01-01T00:00:00Z
+```
+
+### 3. Import the bundle's metrics.txt
+
+InfluxDB is not exposed on the host, so import inside the container:
+
+```bash
+docker cp /path/to/bundle/metrics.txt influxdb:/tmp/m.txt
+docker exec influxdb influx write --org netbird --bucket metrics --precision ns \
+  --token admin123 --file /tmp/m.txt
+```
+
+Re-importing the same file is idempotent (same measurement+tags+timestamp overwrites).
+
+### 4. View the dashboards
+
+Grafana is available at http://localhost:3001 (login `admin` / `admin123`) with the datasource pre-provisioned:
+
+- **Where sync time goes:** http://localhost:3001/d/netbird-sync-phases/netbird-sync-phases-where-time-goes
+- **General client metrics:** http://localhost:3001/d/netbird-influxdb-metrics
+
+**Set the time range** to cover the bundle's timestamps (e.g. "Last 7 days" or an absolute range matching when the bundle was taken) — with the default short range the panels look empty.
+
+Bundles are distinguishable by the `version` tag. To compare several bundles side by side, add a tag at import time (e.g. `sed 's/^netbird_\([a-z_]*\),/netbird_\1,bundle=mycase,/' metrics.txt`).
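Note on the README's tagging step: the `sed` expression inserts a `bundle=...` tag directly after the measurement name on every line. The same transformation can be sketched in Go for cases where the import is scripted. `addBundleTag` and `measurementPrefix` are hypothetical names, and the sketch assumes the tag value is a plain identifier that needs no line-protocol escaping.

```go
package main

import (
	"fmt"
	"regexp"
)

// measurementPrefix matches a measurement name such as "netbird_sync_phase"
// plus the comma that starts the tag set, like the sed pattern in the README.
var measurementPrefix = regexp.MustCompile(`^(netbird_[a-z_]*),`)

// addBundleTag inserts bundle=<name> as the first tag of a line-protocol line.
// Lines that do not start with a netbird_ measurement are returned unchanged.
func addBundleTag(line, name string) string {
	loc := measurementPrefix.FindStringSubmatchIndex(line)
	if loc == nil {
		return line
	}
	// loc[1] is the offset just past the comma that follows the measurement.
	return line[:loc[1]] + "bundle=" + name + "," + line[loc[1]:]
}

func main() {
	fmt.Println(addBundleTag("netbird_sync,os=linux duration_seconds=1 17", "mycase"))
}
```

Splicing by index instead of using a replacement template avoids surprises if the tag value ever contains a `$`.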
--- a/client/internal/metrics/metrics.go
+++ b/client/internal/metrics/metrics.go
@@ -56,6 +56,9 @@ type metricsImplementation interface {
 	// RecordSyncDuration records how long it took to process a sync message
 	RecordSyncDuration(ctx context.Context, agentInfo AgentInfo, duration time.Duration)

+	// RecordSyncPhase records how long a single sub-phase of sync processing took
+	RecordSyncPhase(ctx context.Context, agentInfo AgentInfo, phase string, duration time.Duration)
+
 	// RecordLoginDuration records how long the login to management took
 	RecordLoginDuration(ctx context.Context, agentInfo AgentInfo, duration time.Duration, success bool)

@@ -127,6 +130,18 @@ func (c *ClientMetrics) RecordSyncDuration(ctx context.Context, duration time.Du
 	c.impl.RecordSyncDuration(ctx, agentInfo, duration)
 }

+// RecordSyncPhase records the duration of a single sub-phase of sync processing
+func (c *ClientMetrics) RecordSyncPhase(ctx context.Context, phase string, duration time.Duration) {
+	if c == nil {
+		return
+	}
+	c.mu.RLock()
+	agentInfo := c.agentInfo
+	c.mu.RUnlock()
+
+	c.impl.RecordSyncPhase(ctx, agentInfo, phase, duration)
+}
+
 // RecordLoginDuration records how long the login to management server took
 func (c *ClientMetrics) RecordLoginDuration(ctx context.Context, duration time.Duration, success bool) {
 	if c == nil {
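Note on `ClientMetrics.RecordSyncPhase`: the method takes a phase name and a duration, so a caller has to measure each sub-step itself. One way to do that is a small wrapper, sketched below. `timePhase`, `phaseRecorder`, and `memRecorder` are hypothetical names introduced for this illustration; the diff does not show how the engine actually instruments its phases.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// phaseRecorder is the subset of the metrics API used here. The method
// signature matches ClientMetrics.RecordSyncPhase from the diff.
type phaseRecorder interface {
	RecordSyncPhase(ctx context.Context, phase string, duration time.Duration)
}

// timePhase runs fn and reports its wall-clock duration under the given phase
// name, whether or not fn fails. It is a hypothetical helper.
func timePhase(ctx context.Context, rec phaseRecorder, phase string, fn func() error) error {
	start := time.Now()
	err := fn()
	rec.RecordSyncPhase(ctx, phase, time.Since(start))
	return err
}

// memRecorder collects phase names in memory so the example can print them.
type memRecorder struct {
	phases []string
}

func (m *memRecorder) RecordSyncPhase(_ context.Context, phase string, _ time.Duration) {
	m.phases = append(m.phases, phase)
}

// runExample times two dummy phases and returns the recorded phase names.
func runExample() []string {
	rec := &memRecorder{}
	ctx := context.Background()
	_ = timePhase(ctx, rec, "dns_server", func() error { return nil })
	_ = timePhase(ctx, rec, "routes_apply", func() error { return nil })
	return rec.phases
}

func main() {
	fmt.Println(runExample())
}
```

Measured this way, the duration is wall time, which matches the README's caveat that a slow phase may reflect lock waiting rather than real work.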
--- a/client/internal/metrics/push_test.go
+++ b/client/internal/metrics/push_test.go
@@ -70,6 +70,9 @@ func (m *mockMetrics) RecordConnectionStages(_ context.Context, _ AgentInfo, _ s
 func (m *mockMetrics) RecordSyncDuration(_ context.Context, _ AgentInfo, _ time.Duration) {
 }

+func (m *mockMetrics) RecordSyncPhase(_ context.Context, _ AgentInfo, _ string, _ time.Duration) {
+}
+
 func (m *mockMetrics) RecordLoginDuration(_ context.Context, _ AgentInfo, _ time.Duration, _ bool) {
 }

--- a/client/internal/peer/handshaker.go
+++ b/client/internal/peer/handshaker.go
@@ -195,14 +195,14 @@ func (h *Handshaker) sendOffer() error {
 	}

 	offer := h.buildOfferAnswer()
-	h.log.Infof("sending offer with serial: %s", offer.SessionIDString())
+	h.log.Debugf("sending offer with serial: %s", offer.SessionIDString())

 	return h.signaler.SignalOffer(offer, h.config.Key)
 }

 func (h *Handshaker) sendAnswer() error {
 	answer := h.buildOfferAnswer()
-	h.log.Infof("sending answer with serial: %s", answer.SessionIDString())
+	h.log.Debugf("sending answer with serial: %s", answer.SessionIDString())

 	return h.signaler.SignalAnswer(answer, h.config.Key)
 }
--- a/client/internal/peer/status.go
+++ b/client/internal/peer/status.go
@@ -192,6 +192,7 @@ func (s *StatusChangeSubscription) Events() chan map[string]RouterState {
 // Pure read methods take RLock; anything that mutates state takes Lock.
 type Status struct {
 	mux                   sync.RWMutex
+	muxRelays             sync.RWMutex
 	peers                 map[string]State
 	ipToKey               map[string]string
 	changeNotify          map[string]map[string]*StatusChangeSubscription // map[peerID]map[subscriptionID]*StatusChangeSubscription
@@ -244,8 +245,8 @@ func NewRecorder(mgmAddress string) *Status {
 }

 func (d *Status) SetRelayMgr(manager *relayClient.Manager) {
-	d.mux.Lock()
-	defer d.mux.Unlock()
+	d.muxRelays.Lock()
+	defer d.muxRelays.Unlock()
 	d.relayMgr = manager
 }

@@ -906,8 +907,8 @@ func (d *Status) MarkSignalConnected() {
 }

 func (d *Status) UpdateRelayStates(relayResults []relay.ProbeResult) {
-	d.mux.Lock()
-	defer d.mux.Unlock()
+	d.muxRelays.Lock()
+	defer d.muxRelays.Unlock()
 	d.relayStates = relayResults
 }

@@ -1018,24 +1019,26 @@ func (d *Status) GetSignalState() SignalState {

 // GetRelayStates returns the stun/turn/permanent relay states
 func (d *Status) GetRelayStates() []relay.ProbeResult {
-	d.mux.RLock()
-	defer d.mux.RUnlock()
+	d.muxRelays.RLock()
 	if d.relayMgr == nil {
-		return d.relayStates
+		defer d.muxRelays.RUnlock()
+		return slices.Clone(d.relayStates)
 	}

+	relayMgr := d.relayMgr
 	// extend the list of stun, turn servers with the relay server connections
 	relayStates := slices.Clone(d.relayStates)
+	d.muxRelays.RUnlock()

-	states := d.relayMgr.RelayStates()
+	states := relayMgr.RelayStates()
 	if len(states) == 0 {
 		// no relay connection tracked yet; surface configured servers as
 		// unavailable with the real reconnect error when known
 		err := relayClient.ErrRelayClientNotConnected
-		if connErr := d.relayMgr.RelayConnectError(); connErr != nil {
+		if connErr := relayMgr.RelayConnectError(); connErr != nil {
 			err = connErr
 		}
-		for _, r := range d.relayMgr.ServerURLs() {
+		for _, r := range relayMgr.ServerURLs() {
 			relayStates = append(relayStates, relay.ProbeResult{
 				URI: r,
 				Err: err,
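Note on `GetRelayStates` in `status.go`: the change copies the manager pointer and the state slice while holding the read lock, releases the lock, and only then calls into the relay manager. A reduced sketch of that pattern is below. The `status` type, `relaySource` interface, and `fixedSource` are simplified stand-ins, not the real types.

```go
package main

import (
	"fmt"
	"slices"
	"sync"
)

// relaySource stands in for the relay manager; it is a hypothetical interface.
type relaySource interface {
	States() []string
}

type status struct {
	mu     sync.RWMutex
	mgr    relaySource
	states []string
}

// relayStates snapshots what it needs under the read lock, releases the lock,
// and only then calls into mgr, so a slow mgr cannot stall writers on mu.
func (s *status) relayStates() []string {
	s.mu.RLock()
	mgr := s.mgr
	out := slices.Clone(s.states) // the caller gets its own copy
	s.mu.RUnlock()

	if mgr == nil {
		return out
	}
	return append(out, mgr.States()...)
}

// fixedSource is a trivial relaySource used by the example.
type fixedSource []string

func (f fixedSource) States() []string { return f }

// example builds a status with one stored state and one manager state.
func example() []string {
	s := &status{states: []string{"stun:a"}, mgr: fixedSource{"relay:b"}}
	return s.relayStates()
}

func main() {
	fmt.Println(example())
}
```

Returning a clone on every path also means callers can no longer mutate the slice held by the status struct.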
--- a/client/internal/profilemanager/config.go
+++ b/client/internal/profilemanager/config.go
@@ -433,7 +433,7 @@ func (config *Config) apply(input ConfigInput) (updated bool, err error) {
 		updated = true
 	}

-	if input.ServerSSHAllowed != nil && *input.ServerSSHAllowed != *config.ServerSSHAllowed {
+	if input.ServerSSHAllowed != nil && (config.ServerSSHAllowed == nil || *input.ServerSSHAllowed != *config.ServerSSHAllowed) {
 		if *input.ServerSSHAllowed {
 			log.Infof("enabling SSH server")
 		} else {
--- a/client/internal/profilemanager/config_test.go
+++ b/client/internal/profilemanager/config_test.go
@@ -242,6 +242,35 @@ func TestWireguardPortDefaultVsExplicit(t *testing.T) {
 	}
 }

+func TestUpdateConfigServerSSHAllowedNotSet(t *testing.T) {
+	// Configs written before ServerSSHAllowed was introduced lack the field and
+	// unmarshal to nil. Supplying the SSH server flag on top of such a config must
+	// apply the value instead of panicking on a nil pointer dereference.
+	tests := []struct {
+		name  string
+		input *bool
+		want  bool
+	}{
+		{"enable", util.True(), true},
+		{"disable", util.False(), false},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			configPath := filepath.Join(t.TempDir(), "config.json")
+			require.NoError(t, os.WriteFile(configPath, []byte("{}"), 0600))
+
+			config, err := UpdateConfig(ConfigInput{
+				ConfigPath:       configPath,
+				ServerSSHAllowed: tt.input,
+			})
+			require.NoError(t, err)
+			require.NotNil(t, config.ServerSSHAllowed, "ServerSSHAllowed should be set from input")
+			assert.Equal(t, tt.want, *config.ServerSSHAllowed)
+		})
+	}
+}
+
 func TestUpdateOldManagementURL(t *testing.T) {
 	origProber := newMgmProber
 	newMgmProber = func(_ context.Context, _ string, _ wgtypes.Key, _ bool) (mgmProber, error) {
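Note on the `ServerSSHAllowed` fix: the old condition dereferenced the stored pointer without checking it, which panics for configs that predate the field. The corrected comparison can be isolated as a small predicate, sketched here. `boolChanged` and `ptr` are hypothetical helpers; the real code keeps the check inline in `apply`.

```go
package main

import "fmt"

// boolChanged reports whether input should overwrite current. A nil input
// means "not supplied"; a nil current means the field was never stored.
func boolChanged(input, current *bool) bool {
	if input == nil {
		return false
	}
	// Check current for nil before dereferencing it.
	return current == nil || *input != *current
}

// ptr returns a pointer to b.
func ptr(b bool) *bool { return &b }

func main() {
	fmt.Println(boolChanged(ptr(false), nil))      // stored value missing
	fmt.Println(boolChanged(ptr(true), ptr(true))) // same value
	fmt.Println(boolChanged(nil, ptr(true)))       // nothing supplied
}
```

The first case is the one the new test exercises: an explicit `false` on top of an empty config still counts as a change.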
--- a/client/internal/routemanager/dnsinterceptor/handler.go
+++ b/client/internal/routemanager/dnsinterceptor/handler.go
@@ -226,12 +226,11 @@ func (d *DnsInterceptor) ServeDNS(w dns.ResponseWriter, r *dns.Msg) {
 		return
 	}

-	// pass if non A/AAAA query
-	if r.Question[0].Qtype != dns.TypeA && r.Question[0].Qtype != dns.TypeAAAA {
-		d.continueToNextHandler(w, r, logger, "non A/AAAA query")
-		return
-	}
-
+	// All query types for an intercepted domain are forwarded to the peer's
+	// DNS forwarder, which owns the name. Falling through to the system
+	// resolver would let it answer NXDOMAIN for a name it isn't authoritative
+	// for, poisoning the whole name (including the A/AAAA records the route
+	// does serve). The forwarder answers NODATA for types it cannot resolve.
 	d.mu.RLock()
 	peerKey := d.currentPeerKey
 	d.mu.RUnlock()
@@ -293,19 +292,6 @@ func (d *DnsInterceptor) writeDNSError(w dns.ResponseWriter, r *dns.Msg, logger
 	}
 }

-// continueToNextHandler signals the handler chain to try the next handler
-func (d *DnsInterceptor) continueToNextHandler(w dns.ResponseWriter, r *dns.Msg, logger *log.Entry, reason string) {
-	logger.Tracef("continuing to next handler for domain=%s reason=%s", r.Question[0].Name, reason)
-
-	resp := new(dns.Msg)
-	resp.SetRcode(r, dns.RcodeNameError)
-	// Set Zero bit to signal handler chain to continue
-	resp.MsgHdr.Zero = true
-	if err := w.WriteMsg(resp); err != nil {
-		logger.Errorf("failed writing DNS continue response: %v", err)
-	}
-}
-
 func (d *DnsInterceptor) getUpstreamIP(peerKey string) (netip.Addr, error) {
 	peerAllowedIP, exists := d.peerStore.AllowedIP(peerKey)
 	if !exists {
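Note on the `ServeDNS` change: the new comment argues that an NXDOMAIN answer for a non-A/AAAA query can poison the whole name, while NODATA does not. The toy model below illustrates the difference. It assumes a resolver that caches NXDOMAIN per name and NODATA (NOERROR with an empty answer section) per name and type; `negCache` and its methods are illustrative only, and real resolvers also track TTLs and other details.

```go
package main

import "fmt"

const (
	rcodeSuccess   = 0 // NOERROR
	rcodeNameError = 3 // NXDOMAIN
)

const (
	typeA  uint16 = 1
	typeMX uint16 = 15
)

type key struct {
	name  string
	qtype uint16
}

// negCache is a deliberately simplified model of resolver negative caching.
type negCache struct {
	nxdomain map[string]bool
	nodata   map[key]bool
}

func newNegCache() *negCache {
	return &negCache{nxdomain: map[string]bool{}, nodata: map[key]bool{}}
}

// store records a negative answer. NXDOMAIN says the name does not exist, so
// it is keyed by name alone; NODATA is keyed by name and query type.
func (c *negCache) store(name string, qtype uint16, rcode, answers int) {
	switch {
	case rcode == rcodeNameError:
		c.nxdomain[name] = true
	case rcode == rcodeSuccess && answers == 0:
		c.nodata[key{name, qtype}] = true
	}
}

// blocked reports whether a cached negative answer covers this query.
func (c *negCache) blocked(name string, qtype uint16) bool {
	return c.nxdomain[name] || c.nodata[key{name, qtype}]
}

// aBlockedAfterMX reports whether an A lookup is blocked after an MX query
// for the same name came back with the given rcode and no answers.
func aBlockedAfterMX(rcode int) bool {
	c := newNegCache()
	c.store("app.example.", typeMX, rcode, 0)
	return c.blocked("app.example.", typeA)
}

func main() {
	fmt.Println(aBlockedAfterMX(rcodeNameError)) // NXDOMAIN covers every type
	fmt.Println(aBlockedAfterMX(rcodeSuccess))   // NODATA covers only MX
}
```

In this model an NXDOMAIN for the MX query also blocks the later A lookup, which is the failure the change avoids by sending every query type to the forwarder.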
--- a/client/internal/routemanager/manager_test.go
+++ b/client/internal/routemanager/manager_test.go
@@ -1,3 +1,5 @@
+//go:build privileged
+
 package routemanager

 import (
--- a/client/internal/routemanager/systemops/rt_tables_linux_test.go
+++ b/client/internal/routemanager/systemops/rt_tables_linux_test.go
@@ -0,0 +1,69 @@
+//go:build linux && !android
+
+package systemops
+
+import (
+	"fmt"
+	"os"
+	"strings"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestEntryExists(t *testing.T) {
+	tempDir := t.TempDir()
+	tempFilePath := fmt.Sprintf("%s/rt_tables", tempDir)
+
+	content := []string{
+		"1000 reserved",
+		fmt.Sprintf("%d %s", NetbirdVPNTableID, NetbirdVPNTableName),
+		"9999 other_table",
+	}
+	require.NoError(t, os.WriteFile(tempFilePath, []byte(strings.Join(content, "\n")), 0644))
+
+	file, err := os.Open(tempFilePath)
+	require.NoError(t, err)
+	defer func() {
+		assert.NoError(t, file.Close())
+	}()
+
+	tests := []struct {
+		name        string
+		id          int
+		shouldExist bool
+		err         error
+	}{
+		{
+			name:        "ExistsWithNetbirdPrefix",
+			id:          7120,
+			shouldExist: true,
+			err:         nil,
+		},
+		{
+			name:        "ExistsWithDifferentName",
+			id:          1000,
+			shouldExist: true,
+			err:         ErrTableIDExists,
+		},
+		{
+			name:        "DoesNotExist",
+			id:          1234,
+			shouldExist: false,
+			err:         nil,
+		},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			exists, err := entryExists(file, tc.id)
+			if tc.err != nil {
+				assert.ErrorIs(t, err, tc.err)
+			} else {
+				assert.NoError(t, err)
+			}
+			assert.Equal(t, tc.shouldExist, exists)
+		})
+	}
+}
--- a/client/internal/routemanager/systemops/systemops_bsd_privileged_test.go
+++ b/client/internal/routemanager/systemops/systemops_bsd_privileged_test.go
@@ -0,0 +1,191 @@
+//go:build (darwin || dragonfly || freebsd || netbsd || openbsd) && privileged
+
+package systemops
+
+import (
+	"fmt"
+	"net"
+	"net/netip"
+	"os/exec"
+	"regexp"
+	"runtime"
+	"strings"
+	"sync"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func init() {
+	testCases = append(testCases, []testCase{
+		{
+			name:              "To more specific route without custom dialer via vpn",
+			expectedInterface: expectedVPNint,
+			dialer:            &net.Dialer{},
+			expectedPacket:    createPacketExpectation("100.64.0.1", 12345, "10.10.0.2", 53),
+		},
+	}...)
+}
+
+func TestConcurrentRoutes(t *testing.T) {
+	baseIP := netip.MustParseAddr("192.0.2.0")
+
+	var intf *net.Interface
+	var nexthop Nexthop
+
+	_, intf = setupDummyInterface(t)
+	nexthop = Nexthop{netip.Addr{}, intf}
+
+	r := New(nil, nil)
+
+	var wg sync.WaitGroup
+	for i := 0; i < 1024; i++ {
+		wg.Add(1)
+		go func(ip netip.Addr) {
+			defer wg.Done()
+			prefix := netip.PrefixFrom(ip, 32)
+			if err := r.addToRouteTable(prefix, nexthop); err != nil {
+				t.Errorf("Failed to add route for %s: %v", prefix, err)
+			}
+		}(baseIP)
+		baseIP = baseIP.Next()
+	}
+
+	wg.Wait()
+
+	baseIP = netip.MustParseAddr("192.0.2.0")
+
+	for i := 0; i < 1024; i++ {
+		wg.Add(1)
+		go func(ip netip.Addr) {
+			defer wg.Done()
+			prefix := netip.PrefixFrom(ip, 32)
+			if err := r.removeFromRouteTable(prefix, nexthop); err != nil {
+				t.Errorf("Failed to remove route for %s: %v", prefix, err)
+			}
+		}(baseIP)
+		baseIP = baseIP.Next()
+	}
+
+	wg.Wait()
+}
+
+func createAndSetupDummyInterface(t *testing.T, intf string, ipAddressCIDR string) string {
+	t.Helper()
+
+	if runtime.GOOS == "darwin" {
+		err := exec.Command("ifconfig", intf, "alias", ipAddressCIDR).Run()
+		require.NoError(t, err, "Failed to create loopback alias")
+
+		t.Cleanup(func() {
+			err := exec.Command("ifconfig", intf, ipAddressCIDR, "-alias").Run()
+			assert.NoError(t, err, "Failed to remove loopback alias")
+		})
+
+		return intf
+	}
+
+	prefix, err := netip.ParsePrefix(ipAddressCIDR)
+	require.NoError(t, err, "Failed to parse prefix")
+
+	netIntf, err := net.InterfaceByName(intf)
+	require.NoError(t, err, "Failed to get interface by name")
+
+	nexthop := Nexthop{netip.Addr{}, netIntf}
+
+	r := New(nil, nil)
+	err = r.addToRouteTable(prefix, nexthop)
+	require.NoError(t, err, "Failed to add route to table")
+
+	t.Cleanup(func() {
+		err := r.removeFromRouteTable(prefix, nexthop)
+		assert.NoError(t, err, "Failed to remove route from table")
+	})
+
+	return intf
+}
+
+func addDummyRoute(t *testing.T, dstCIDR string, gw netip.Addr, _ string) {
+	t.Helper()
+
+	var originalNexthop net.IP
+	if dstCIDR == "0.0.0.0/0" {
+		var err error
+		originalNexthop, err = fetchOriginalGateway()
+		if err != nil {
+			t.Logf("Failed to fetch original gateway: %v", err)
+		}
+
+		if output, err := exec.Command("route", "delete", "-net", dstCIDR).CombinedOutput(); err != nil {
+			t.Logf("Failed to delete route: %v, output: %s", err, output)
+		}
+	}
+
+	t.Cleanup(func() {
+		if originalNexthop != nil {
+			err := exec.Command("route", "add", "-net", dstCIDR, originalNexthop.String()).Run()
+			assert.NoError(t, err, "Failed to restore original route")
+		}
+	})
+
+	err := exec.Command("route", "add", "-net", dstCIDR, gw.String()).Run()
+	require.NoError(t, err, "Failed to add route")
+
+	t.Cleanup(func() {
+		err := exec.Command("route", "delete", "-net", dstCIDR).Run()
+		assert.NoError(t, err, "Failed to remove route")
+	})
+}
+
+func fetchOriginalGateway() (net.IP, error) {
+	output, err := exec.Command("route", "-n", "get", "default").CombinedOutput()
+	if err != nil {
+		return nil, err
+	}
+
+	matches := regexp.MustCompile(`gateway: (\S+)`).FindStringSubmatch(string(output))
+	if len(matches) == 0 {
+		return nil, fmt.Errorf("gateway not found")
+	}
+
+	return net.ParseIP(matches[1]), nil
+}
+
+// setupDummyInterface creates a dummy tun interface for FreeBSD route testing
+func setupDummyInterface(t *testing.T) (netip.Addr, *net.Interface) {
+	t.Helper()
+
+	if runtime.GOOS == "darwin" {
+		return netip.AddrFrom4([4]byte{192, 168, 1, 2}), &net.Interface{Name: "lo0"}
+	}
+
+	output, err := exec.Command("ifconfig", "tun", "create").CombinedOutput()
+	require.NoError(t, err, "Failed to create tun interface: %s", string(output))
+
+	tunName := strings.TrimSpace(string(output))
+
+	output, err = exec.Command("ifconfig", tunName, "192.168.1.1", "netmask", "255.255.0.0", "192.168.1.2", "up").CombinedOutput()
+	require.NoError(t, err, "Failed to configure tun interface: %s", string(output))
+
+	intf, err := net.InterfaceByName(tunName)
+	require.NoError(t, err, "Failed to get interface by name")
+
+	t.Cleanup(func() {
+		if err := exec.Command("ifconfig", tunName, "destroy").Run(); err != nil {
+			t.Logf("Failed to destroy tun interface %s: %v", tunName, err)
+		}
+	})
+
+	return netip.AddrFrom4([4]byte{192, 168, 1, 2}), intf
+}
+
+func setupDummyInterfacesAndRoutes(t *testing.T) {
+	t.Helper()
+
+	defaultDummy := createAndSetupDummyInterface(t, expectedExternalInt, "192.168.0.1/24")
+	addDummyRoute(t, "0.0.0.0/0", netip.AddrFrom4([4]byte{192, 168, 0, 1}), defaultDummy)
+
+	otherDummy := createAndSetupDummyInterface(t, expectedInternalInt, "192.168.1.1/24")
+	addDummyRoute(t, "10.0.0.0/8", netip.AddrFrom4([4]byte{192, 168, 1, 1}), otherDummy)
+}
--- a/client/internal/routemanager/systemops/systemops_bsd_test.go
+++ b/client/internal/routemanager/systemops/systemops_bsd_test.go
@@ -3,79 +3,24 @@
 package systemops

 import (
-	"fmt"
-	"net"
-	"net/netip"
-	"os/exec"
-	"regexp"
-	"runtime"
-	"strings"
-	"sync"
 	"testing"

 	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
 	"golang.org/x/net/route"
 )

+// Interface names used by the shared routing test fixtures. Kept untagged (no
+// privileged build tag) so the non-privileged test files in this package compile.
+//
+//nolint:unused // consumed by the privileged-tagged routing tests
 var expectedVPNint = "utun100"
+
+//nolint:unused // consumed by the privileged-tagged routing tests
 var expectedExternalInt = "lo0"
+
+//nolint:unused // consumed by the privileged-tagged routing tests
 var expectedInternalInt = "lo0"

-func init() {
-	testCases = append(testCases, []testCase{
-		{
-			name:              "To more specific route without custom dialer via vpn",
-			expectedInterface: expectedVPNint,
-			dialer:            &net.Dialer{},
-			expectedPacket:    createPacketExpectation("100.64.0.1", 12345, "10.10.0.2", 53),
-		},
-	}...)
-}
-
-func TestConcurrentRoutes(t *testing.T) {
-	baseIP := netip.MustParseAddr("192.0.2.0")
-
-	var intf *net.Interface
-	var nexthop Nexthop
-
-	_, intf = setupDummyInterface(t)
-	nexthop = Nexthop{netip.Addr{}, intf}
-
-	r := New(nil, nil)
-
-	var wg sync.WaitGroup
-	for i := 0; i < 1024; i++ {
-		wg.Add(1)
-		go func(ip netip.Addr) {
-			defer wg.Done()
-			prefix := netip.PrefixFrom(ip, 32)
-			if err := r.addToRouteTable(prefix, nexthop); err != nil {
-				t.Errorf("Failed to add route for %s: %v", prefix, err)
-			}
-		}(baseIP)
-		baseIP = baseIP.Next()
-	}
-
-	wg.Wait()
-
-	baseIP = netip.MustParseAddr("192.0.2.0")
-
-	for i := 0; i < 1024; i++ {
-		wg.Add(1)
-		go func(ip netip.Addr) {
-			defer wg.Done()
-			prefix := netip.PrefixFrom(ip, 32)
-			if err := r.removeFromRouteTable(prefix, nexthop); err != nil {
-				t.Errorf("Failed to remove route for %s: %v", prefix, err)
-			}
-		}(baseIP)
-		baseIP = baseIP.Next()
-	}
-
-	wg.Wait()
-}
-
 func TestBits(t *testing.T) {
 	tests := []struct {
 		name    string
@@ -122,122 +67,3 @@ func TestBits(t *testing.T) {
 		})
 	}
 }
-
-func createAndSetupDummyInterface(t *testing.T, intf string, ipAddressCIDR string) string {
-	t.Helper()
-
-	if runtime.GOOS == "darwin" {
-		err := exec.Command("ifconfig", intf, "alias", ipAddressCIDR).Run()
-		require.NoError(t, err, "Failed to create loopback alias")
-
-		t.Cleanup(func() {
-			err := exec.Command("ifconfig", intf, ipAddressCIDR, "-alias").Run()
-			assert.NoError(t, err, "Failed to remove loopback alias")
-		})
-
-		return intf
-	}
-
-	prefix, err := netip.ParsePrefix(ipAddressCIDR)
-	require.NoError(t, err, "Failed to parse prefix")
-
-	netIntf, err := net.InterfaceByName(intf)
-	require.NoError(t, err, "Failed to get interface by name")
-
-	nexthop := Nexthop{netip.Addr{}, netIntf}
-
-	r := New(nil, nil)
-	err = r.addToRouteTable(prefix, nexthop)
-	require.NoError(t, err, "Failed to add route to table")
-
-	t.Cleanup(func() {
-		err := r.removeFromRouteTable(prefix, nexthop)
-		assert.NoError(t, err, "Failed to remove route from table")
-	})
-
-	return intf
-}
-
-func addDummyRoute(t *testing.T, dstCIDR string, gw netip.Addr, _ string) {
-	t.Helper()
-
-	var originalNexthop net.IP
-	if dstCIDR == "0.0.0.0/0" {
-		var err error
-		originalNexthop, err = fetchOriginalGateway()
-		if err != nil {
-			t.Logf("Failed to fetch original gateway: %v", err)
-		}
-
-		if output, err := exec.Command("route", "delete", "-net", dstCIDR).CombinedOutput(); err != nil {
-			t.Logf("Failed to delete route: %v, output: %s", err, output)
-		}
-	}
-
-	t.Cleanup(func() {
-		if originalNexthop != nil {
-			err := exec.Command("route", "add", "-net", dstCIDR, originalNexthop.String()).Run()
-			assert.NoError(t, err, "Failed to restore original route")
-		}
-	})
-
-	err := exec.Command("route", "add", "-net", dstCIDR, gw.String()).Run()
-	require.NoError(t, err, "Failed to add route")
-
-	t.Cleanup(func() {
-		err := exec.Command("route", "delete", "-net", dstCIDR).Run()
-		assert.NoError(t, err, "Failed to remove route")
-	})
-}
-
-func fetchOriginalGateway() (net.IP, error) {
-	output, err := exec.Command("route", "-n", "get", "default").CombinedOutput()
-	if err != nil {
-		return nil, err
-	}
-
-	matches := regexp.MustCompile(`gateway: (\S+)`).FindStringSubmatch(string(output))
-	if len(matches) == 0 {
-		return nil, fmt.Errorf("gateway not found")
-	}
-
-	return net.ParseIP(matches[1]), nil
-}
-
-// setupDummyInterface creates a dummy tun interface for FreeBSD route testing
-func setupDummyInterface(t *testing.T) (netip.Addr, *net.Interface) {
-	t.Helper()
-
-	if runtime.GOOS == "darwin" {
-		return netip.AddrFrom4([4]byte{192, 168, 1, 2}), &net.Interface{Name: "lo0"}
-	}
-
-	output, err := exec.Command("ifconfig", "tun", "create").CombinedOutput()
-	require.NoError(t, err, "Failed to create tun interface: %s", string(output))
-
-	tunName := strings.TrimSpace(string(output))
-
-	output, err = exec.Command("ifconfig", tunName, "192.168.1.1", "netmask", "255.255.0.0", "192.168.1.2", "up").CombinedOutput()
-	require.NoError(t, err, "Failed to configure tun interface: %s", string(output))
-
-	intf, err := net.InterfaceByName(tunName)
-	require.NoError(t, err, "Failed to get interface by name")
-
-	t.Cleanup(func() {
-		if err := exec.Command("ifconfig", tunName, "destroy").Run(); err != nil {
-			t.Logf("Failed to destroy tun interface %s: %v", tunName, err)
-		}
-	})
-
-	return netip.AddrFrom4([4]byte{192, 168, 1, 2}), intf
-}
-
-func setupDummyInterfacesAndRoutes(t *testing.T) {
-	t.Helper()
-
-	defaultDummy := createAndSetupDummyInterface(t, expectedExternalInt, "192.168.0.1/24")
-	addDummyRoute(t, "0.0.0.0/0", netip.AddrFrom4([4]byte{192, 168, 0, 1}), defaultDummy)
-
-	otherDummy := createAndSetupDummyInterface(t, expectedInternalInt, "192.168.1.1/24")
-	addDummyRoute(t, "10.0.0.0/8", netip.AddrFrom4([4]byte{192, 168, 1, 1}), otherDummy)
-}
--- a/client/internal/routemanager/systemops/systemops_dialer_test.go
+++ b/client/internal/routemanager/systemops/systemops_dialer_test.go
@@ -0,0 +1,17 @@
+//go:build !android && !ios
+
+package systemops
+
+import (
+	"context"
+	"net"
+)
+
+// dialer is shared by the per-platform routing test cases. Kept untagged (no
+// privileged build tag) so the non-privileged test files compile on every platform.
+//
+//nolint:unused // consumed by the privileged-tagged routing tests
+type dialer interface {
+	Dial(network, address string) (net.Conn, error)
+	DialContext(ctx context.Context, network, address string) (net.Conn, error)
+}
--- a/client/internal/routemanager/systemops/systemops_generic_test.go
+++ b/client/internal/routemanager/systemops/systemops_generic_test.go
@@ -1,4 +1,4 @@
-//go:build !android && !ios
+//go:build !android && !ios && privileged

 package systemops

@@ -26,11 +26,6 @@ import (
 	nbnet "github.com/netbirdio/netbird/client/net"
 )

-type dialer interface {
-	Dial(network, address string) (net.Conn, error)
-	DialContext(ctx context.Context, network, address string) (net.Conn, error)
-}
-
 func TestAddVPNRoute(t *testing.T) {
 	testCases := []struct {
 		name        string
@@ -515,125 +510,3 @@ func setupTestEnv(t *testing.T) {
 	// unique route in vpn table
 	setupRouteAndCleanup(t, r, netip.MustParsePrefix("172.16.0.0/12"), intf)
 }
-
-func TestIsVpnRoute(t *testing.T) {
-	tests := []struct {
-		name           string
-		addr           string
-		vpnRoutes      []string
-		localRoutes    []string
-		expectedVpn    bool
-		expectedPrefix netip.Prefix
-	}{
-		{
-			name:           "Match in VPN routes",
-			addr:           "192.168.1.1",
-			vpnRoutes:      []string{"192.168.1.0/24"},
-			localRoutes:    []string{"10.0.0.0/8"},
-			expectedVpn:    true,
-			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
-		},
-		{
-			name:           "Match in local routes",
-			addr:           "10.1.1.1",
-			vpnRoutes:      []string{"192.168.1.0/24"},
-			localRoutes:    []string{"10.0.0.0/8"},
-			expectedVpn:    false,
-			expectedPrefix: netip.MustParsePrefix("10.0.0.0/8"),
-		},
-		{
-			name:           "No match",
-			addr:           "172.16.0.1",
-			vpnRoutes:      []string{"192.168.1.0/24"},
-			localRoutes:    []string{"10.0.0.0/8"},
-			expectedVpn:    false,
-			expectedPrefix: netip.Prefix{},
-		},
-		{
-			name:           "Default route ignored",
-			addr:           "192.168.1.1",
-			vpnRoutes:      []string{"0.0.0.0/0", "192.168.1.0/24"},
-			localRoutes:    []string{"10.0.0.0/8"},
-			expectedVpn:    true,
-			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
-		},
-		{
-			name:           "Default route matches but ignored",
-			addr:           "172.16.1.1",
-			vpnRoutes:      []string{"0.0.0.0/0", "192.168.1.0/24"},
-			localRoutes:    []string{"10.0.0.0/8"},
-			expectedVpn:    false,
-			expectedPrefix: netip.Prefix{},
-		},
-		{
-			name:           "Longest prefix match local",
-			addr:           "192.168.1.1",
-			vpnRoutes:      []string{"192.168.0.0/16"},
-			localRoutes:    []string{"192.168.1.0/24"},
-			expectedVpn:    false,
-			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
-		},
-		{
-			name:           "Longest prefix match local multiple",
-			addr:           "192.168.0.1",
-			vpnRoutes:      []string{"192.168.0.0/16", "192.168.0.0/25", "192.168.0.0/27"},
-			localRoutes:    []string{"192.168.0.0/24", "192.168.0.0/26", "192.168.0.0/28"},
-			expectedVpn:    false,
-			expectedPrefix: netip.MustParsePrefix("192.168.0.0/28"),
-		},
-		{
-			name:           "Longest prefix match vpn",
-			addr:           "192.168.1.1",
-			vpnRoutes:      []string{"192.168.1.0/24"},
-			localRoutes:    []string{"192.168.0.0/16"},
-			expectedVpn:    true,
-			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
-		},
-		{
-			name:           "Longest prefix match vpn multiple",
-			addr:           "192.168.0.1",
-			vpnRoutes:      []string{"192.168.0.0/16", "192.168.0.0/25", "192.168.0.0/27"},
-			localRoutes:    []string{"192.168.0.0/24", "192.168.0.0/26"},
-			expectedVpn:    true,
-			expectedPrefix: netip.MustParsePrefix("192.168.0.0/27"),
-		},
-		{
-			name:           "Duplicate prefix in both",
-			addr:           "192.168.1.1",
-			vpnRoutes:      []string{"192.168.1.0/24"},
-			localRoutes:    []string{"192.168.1.0/24"},
-			expectedVpn:    false,
-			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
-		},
-	}
-
-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			addr, err := netip.ParseAddr(tt.addr)
-			if err != nil {
-				t.Fatalf("Failed to parse address %s: %v", tt.addr, err)
-			}
-
-			var vpnRoutes, localRoutes []netip.Prefix
-			for _, route := range tt.vpnRoutes {
-				prefix, err := netip.ParsePrefix(route)
-				if err != nil {
-					t.Fatalf("Failed to parse VPN route %s: %v", route, err)
-				}
-				vpnRoutes = append(vpnRoutes, prefix)
-			}
-
-			for _, route := range tt.localRoutes {
-				prefix, err := netip.ParsePrefix(route)
-				if err != nil {
-					t.Fatalf("Failed to parse local route %s: %v", route, err)
-				}
-				localRoutes = append(localRoutes, prefix)
-			}
-
-			isVpn, matchedPrefix := isVpnRoute(addr, vpnRoutes, localRoutes)
-			assert.Equal(t, tt.expectedVpn, isVpn, "isVpnRoute should return expectedVpn value")
-			assert.Equal(t, tt.expectedPrefix, matchedPrefix, "isVpnRoute should return expectedVpn prefix")
-		})
-	}
-}
--- a/client/internal/routemanager/systemops/systemops_isvpnroute_test.go
+++ b/client/internal/routemanager/systemops/systemops_isvpnroute_test.go
@@ -0,0 +1,132 @@
+//go:build !android && !ios
+
+package systemops
+
+import (
+	"net/netip"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestIsVpnRoute(t *testing.T) {
+	tests := []struct {
+		name           string
+		addr           string
+		vpnRoutes      []string
+		localRoutes    []string
+		expectedVpn    bool
+		expectedPrefix netip.Prefix
+	}{
+		{
+			name:           "Match in VPN routes",
+			addr:           "192.168.1.1",
+			vpnRoutes:      []string{"192.168.1.0/24"},
+			localRoutes:    []string{"10.0.0.0/8"},
+			expectedVpn:    true,
+			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
+		},
+		{
+			name:           "Match in local routes",
+			addr:           "10.1.1.1",
+			vpnRoutes:      []string{"192.168.1.0/24"},
+			localRoutes:    []string{"10.0.0.0/8"},
+			expectedVpn:    false,
+			expectedPrefix: netip.MustParsePrefix("10.0.0.0/8"),
+		},
+		{
+			name:           "No match",
+			addr:           "172.16.0.1",
+			vpnRoutes:      []string{"192.168.1.0/24"},
+			localRoutes:    []string{"10.0.0.0/8"},
+			expectedVpn:    false,
+			expectedPrefix: netip.Prefix{},
+		},
+		{
+			name:           "Default route ignored",
+			addr:           "192.168.1.1",
+			vpnRoutes:      []string{"0.0.0.0/0", "192.168.1.0/24"},
+			localRoutes:    []string{"10.0.0.0/8"},
+			expectedVpn:    true,
+			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
+		},
+		{
+			name:           "Default route matches but ignored",
+			addr:           "172.16.1.1",
+			vpnRoutes:      []string{"0.0.0.0/0", "192.168.1.0/24"},
+			localRoutes:    []string{"10.0.0.0/8"},
+			expectedVpn:    false,
+			expectedPrefix: netip.Prefix{},
+		},
+		{
+			name:           "Longest prefix match local",
+			addr:           "192.168.1.1",
+			vpnRoutes:      []string{"192.168.0.0/16"},
+			localRoutes:    []string{"192.168.1.0/24"},
+			expectedVpn:    false,
+			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
+		},
+		{
+			name:           "Longest prefix match local multiple",
+			addr:           "192.168.0.1",
+			vpnRoutes:      []string{"192.168.0.0/16", "192.168.0.0/25", "192.168.0.0/27"},
+			localRoutes:    []string{"192.168.0.0/24", "192.168.0.0/26", "192.168.0.0/28"},
+			expectedVpn:    false,
+			expectedPrefix: netip.MustParsePrefix("192.168.0.0/28"),
+		},
+		{
+			name:           "Longest prefix match vpn",
+			addr:           "192.168.1.1",
+			vpnRoutes:      []string{"192.168.1.0/24"},
+			localRoutes:    []string{"192.168.0.0/16"},
+			expectedVpn:    true,
+			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
+		},
+		{
+			name:           "Longest prefix match vpn multiple",
+			addr:           "192.168.0.1",
+			vpnRoutes:      []string{"192.168.0.0/16", "192.168.0.0/25", "192.168.0.0/27"},
+			localRoutes:    []string{"192.168.0.0/24", "192.168.0.0/26"},
+			expectedVpn:    true,
+			expectedPrefix: netip.MustParsePrefix("192.168.0.0/27"),
+		},
+		{
+			name:           "Duplicate prefix in both",
+			addr:           "192.168.1.1",
+			vpnRoutes:      []string{"192.168.1.0/24"},
+			localRoutes:    []string{"192.168.1.0/24"},
+			expectedVpn:    false,
+			expectedPrefix: netip.MustParsePrefix("192.168.1.0/24"),
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			addr, err := netip.ParseAddr(tt.addr)
+			if err != nil {
+				t.Fatalf("Failed to parse address %s: %v", tt.addr, err)
+			}
+
+			var vpnRoutes, localRoutes []netip.Prefix
+			for _, route := range tt.vpnRoutes {
+				prefix, err := netip.ParsePrefix(route)
+				if err != nil {
+					t.Fatalf("Failed to parse VPN route %s: %v", route, err)
+				}
+				vpnRoutes = append(vpnRoutes, prefix)
+			}
+
+			for _, route := range tt.localRoutes {
+				prefix, err := netip.ParsePrefix(route)
+				if err != nil {
+					t.Fatalf("Failed to parse local route %s: %v", route, err)
+				}
+				localRoutes = append(localRoutes, prefix)
+			}
+
+			isVpn, matchedPrefix := isVpnRoute(addr, vpnRoutes, localRoutes)
+			assert.Equal(t, tt.expectedVpn, isVpn, "isVpnRoute should return expectedVpn value")
+			assert.Equal(t, tt.expectedPrefix, matchedPrefix, "isVpnRoute should return expectedVpn prefix")
+		})
+	}
+}
--- a/client/internal/routemanager/systemops/systemops_linux_test.go
+++ b/client/internal/routemanager/systemops/systemops_linux_test.go
@@ -1,13 +1,10 @@
-//go:build !android
+//go:build linux && !android && privileged

 package systemops

 import (
 	"errors"
-	"fmt"
 	"net"
-	"os"
-	"strings"
 	"syscall"
 	"testing"

@@ -18,10 +15,6 @@ import (
 	"github.com/netbirdio/netbird/client/internal/routemanager/vars"
 )

-var expectedVPNint = "wgtest0"
-var expectedExternalInt = "dummyext0"
-var expectedInternalInt = "dummyint0"
-
 func init() {
 	testCases = append(testCases, []testCase{
 		{
@@ -33,62 +26,6 @@ func init() {
 	}...)
 }

-func TestEntryExists(t *testing.T) {
-	tempDir := t.TempDir()
-	tempFilePath := fmt.Sprintf("%s/rt_tables", tempDir)
-
-	content := []string{
-		"1000 reserved",
-		fmt.Sprintf("%d %s", NetbirdVPNTableID, NetbirdVPNTableName),
-		"9999 other_table",
-	}
-	require.NoError(t, os.WriteFile(tempFilePath, []byte(strings.Join(content, "\n")), 0644))
-
-	file, err := os.Open(tempFilePath)
-	require.NoError(t, err)
-	defer func() {
-		assert.NoError(t, file.Close())
-	}()
-
-	tests := []struct {
-		name        string
-		id          int
-		shouldExist bool
-		err         error
-	}{
-		{
-			name:        "ExistsWithNetbirdPrefix",
-			id:          7120,
-			shouldExist: true,
-			err:         nil,
-		},
-		{
-			name:        "ExistsWithDifferentName",
-			id:          1000,
-			shouldExist: true,
-			err:         ErrTableIDExists,
-		},
-		{
-			name:        "DoesNotExist",
-			id:          1234,
-			shouldExist: false,
-			err:         nil,
-		},
-	}
-
-	for _, tc := range tests {
-		t.Run(tc.name, func(t *testing.T) {
-			exists, err := entryExists(file, tc.id)
-			if tc.err != nil {
-				assert.ErrorIs(t, err, tc.err)
-			} else {
-				assert.NoError(t, err)
-			}
-			assert.Equal(t, tc.shouldExist, exists)
-		})
-	}
-}
-
 func createAndSetupDummyInterface(t *testing.T, interfaceName, ipAddressCIDR string) string {
 	t.Helper()

--- a/client/internal/routemanager/systemops/systemops_routing_data_linux_test.go
+++ b/client/internal/routemanager/systemops/systemops_routing_data_linux_test.go
@@ -0,0 +1,15 @@
+//go:build linux && !android
+
+package systemops
+
+// Interface names used by the shared routing test fixtures. Kept untagged (no
+// privileged build tag) so the non-privileged test files in this package compile.
+//
+//nolint:unused // consumed by the privileged-tagged routing tests
+var expectedVPNint = "wgtest0"
+
+//nolint:unused // consumed by the privileged-tagged routing tests
+var expectedExternalInt = "dummyext0"
+
+//nolint:unused // consumed by the privileged-tagged routing tests
+var expectedInternalInt = "dummyint0"
--- a/client/internal/routemanager/systemops/systemops_routing_data_test.go
+++ b/client/internal/routemanager/systemops/systemops_routing_data_test.go
@@ -0,0 +1,83 @@
+//go:build (linux && !android) || (darwin && !ios) || freebsd || openbsd || netbsd || dragonfly
+
+package systemops
+
+import (
+	"net"
+
+	nbnet "github.com/netbirdio/netbird/client/net"
+)
+
+// Shared, non-privileged routing test fixtures. The privileged TestRouting (and its
+// per-platform init() appenders) consume these; they live here so the unprivileged
+// BSD/darwin test files compile without the privileged build tag.
+
+type PacketExpectation struct {
+	SrcIP   net.IP
+	DstIP   net.IP
+	SrcPort int
+	DstPort int
+	UDP     bool
+	TCP     bool
+}
+
+//nolint:unused // consumed by the privileged-tagged routing tests
+type testCase struct {
+	name              string
+	expectedInterface string
+	dialer            dialer
+	expectedPacket    PacketExpectation
+}
+
+//nolint:unused // consumed by the privileged-tagged routing tests
+var testCases = []testCase{
+	{
+		name:              "To external host without custom dialer via vpn",
+		expectedInterface: expectedVPNint,
+		dialer:            &net.Dialer{},
+		expectedPacket:    createPacketExpectation("100.64.0.1", 12345, "192.0.2.1", 53),
+	},
+	{
+		name:              "To external host with custom dialer via physical interface",
+		expectedInterface: expectedExternalInt,
+		dialer:            nbnet.NewDialer(),
+		expectedPacket:    createPacketExpectation("192.168.0.1", 12345, "192.0.2.1", 53),
+	},
+
+	{
+		name:              "To duplicate internal route with custom dialer via physical interface",
+		expectedInterface: expectedInternalInt,
+		dialer:            nbnet.NewDialer(),
+		expectedPacket:    createPacketExpectation("192.168.1.1", 12345, "10.0.0.2", 53),
+	},
+	{
+		name:              "To duplicate internal route without custom dialer via physical interface", // local route takes precedence
+		expectedInterface: expectedInternalInt,
+		dialer:            &net.Dialer{},
+		expectedPacket:    createPacketExpectation("192.168.1.1", 12345, "10.0.0.2", 53),
+	},
+
+	{
+		name:              "To unique vpn route with custom dialer via physical interface",
+		expectedInterface: expectedExternalInt,
+		dialer:            nbnet.NewDialer(),
+		expectedPacket:    createPacketExpectation("192.168.0.1", 12345, "172.16.0.2", 53),
+	},
+	{
+		name:              "To unique vpn route without custom dialer via vpn",
+		expectedInterface: expectedVPNint,
+		dialer:            &net.Dialer{},
+		expectedPacket:    createPacketExpectation("100.64.0.1", 12345, "172.16.0.2", 53),
+	},
+}
+
+//nolint:unused // consumed by the privileged-tagged routing tests
+func createPacketExpectation(srcIP string, srcPort int, dstIP string, dstPort int) PacketExpectation {
+	return PacketExpectation{
+		SrcIP:   net.ParseIP(srcIP),
+		DstIP:   net.ParseIP(dstIP),
+		SrcPort: srcPort,
+		DstPort: dstPort,
+		UDP:     true,
+	}
+}
--- a/client/internal/routemanager/systemops/systemops_unix_test.go
+++ b/client/internal/routemanager/systemops/systemops_unix_test.go
@@ -1,4 +1,4 @@
-//go:build (linux && !android) || (darwin && !ios) || freebsd || openbsd || netbsd || dragonfly
+//go:build ((linux && !android) || (darwin && !ios) || freebsd || openbsd || netbsd || dragonfly) && privileged

 package systemops

@@ -20,63 +20,6 @@ import (
 	nbnet "github.com/netbirdio/netbird/client/net"
 )

-type PacketExpectation struct {
-	SrcIP   net.IP
-	DstIP   net.IP
-	SrcPort int
-	DstPort int
-	UDP     bool
-	TCP     bool
-}
-
-type testCase struct {
-	name              string
-	expectedInterface string
-	dialer            dialer
-	expectedPacket    PacketExpectation
-}
-
-var testCases = []testCase{
-	{
-		name:              "To external host without custom dialer via vpn",
-		expectedInterface: expectedVPNint,
-		dialer:            &net.Dialer{},
-		expectedPacket:    createPacketExpectation("100.64.0.1", 12345, "192.0.2.1", 53),
-	},
-	{
-		name:              "To external host with custom dialer via physical interface",
-		expectedInterface: expectedExternalInt,
-		dialer:            nbnet.NewDialer(),
-		expectedPacket:    createPacketExpectation("192.168.0.1", 12345, "192.0.2.1", 53),
-	},
-
-	{
-		name:              "To duplicate internal route with custom dialer via physical interface",
-		expectedInterface: expectedInternalInt,
-		dialer:            nbnet.NewDialer(),
-		expectedPacket:    createPacketExpectation("192.168.1.1", 12345, "10.0.0.2", 53),
-	},
-	{
-		name:              "To duplicate internal route without custom dialer via physical interface", // local route takes precedence
-		expectedInterface: expectedInternalInt,
-		dialer:            &net.Dialer{},
-		expectedPacket:    createPacketExpectation("192.168.1.1", 12345, "10.0.0.2", 53),
-	},
-
-	{
-		name:              "To unique vpn route with custom dialer via physical interface",
-		expectedInterface: expectedExternalInt,
-		dialer:            nbnet.NewDialer(),
-		expectedPacket:    createPacketExpectation("192.168.0.1", 12345, "172.16.0.2", 53),
-	},
-	{
-		name:              "To unique vpn route without custom dialer via vpn",
-		expectedInterface: expectedVPNint,
-		dialer:            &net.Dialer{},
-		expectedPacket:    createPacketExpectation("100.64.0.1", 12345, "172.16.0.2", 53),
-	},
-}
-
 func TestRouting(t *testing.T) {
 	nbnet.Init()
 	for _, tc := range testCases {
@@ -102,16 +45,6 @@ func TestRouting(t *testing.T) {
 	}
 }

-func createPacketExpectation(srcIP string, srcPort int, dstIP string, dstPort int) PacketExpectation {
-	return PacketExpectation{
-		SrcIP:   net.ParseIP(srcIP),
-		DstIP:   net.ParseIP(dstIP),
-		SrcPort: srcPort,
-		DstPort: dstPort,
-		UDP:     true,
-	}
-}
-
 func startPacketCapture(t *testing.T, intf, filter string) *pcap.Handle {
 	t.Helper()

--- a/client/internal/routemanager/systemops/systemops_windows_test.go
+++ b/client/internal/routemanager/systemops/systemops_windows_test.go
@@ -1,3 +1,5 @@
+//go:build windows && privileged
+
 package systemops

 import (
--- a/client/internal/routemanager/systemops/v6route_bsd_test.go
+++ b/client/internal/routemanager/systemops/v6route_bsd_test.go
@@ -11,6 +11,8 @@ import (
 // ensureIPv6DefaultRoute installs an IPv6 default route via the loopback
 // interface so route lookups for global IPv6 prefixes resolve in environments
 // without v6 connectivity. If a default already exists it is left alone.
+//
+//nolint:unused // consumed by the privileged-tagged routing tests
 func ensureIPv6DefaultRoute(t *testing.T) {
 	t.Helper()

--- a/client/internal/routemanager/systemops/v6route_linux_test.go
+++ b/client/internal/routemanager/systemops/v6route_linux_test.go
@@ -1,4 +1,4 @@
-//go:build linux && !android
+//go:build linux && !android && privileged

 package systemops

--- a/client/internal/routemanager/systemops/v6route_windows_test.go
+++ b/client/internal/routemanager/systemops/v6route_windows_test.go
@@ -8,11 +8,14 @@ import (
 	"testing"
 )

+//nolint:unused // consumed by the privileged-tagged routing tests
 const loopbackIfaceWindows = "Loopback Pseudo-Interface 1"

 // ensureIPv6DefaultRoute installs an IPv6 default route via the loopback
 // interface so route lookups for global IPv6 prefixes resolve in environments
 // without v6 connectivity. If a default already exists it is left alone.
+//
+//nolint:unused // consumed by the privileged-tagged routing tests
 func ensureIPv6DefaultRoute(t *testing.T) {
 	t.Helper()

--- a/client/server/server_privileged_test.go
+++ b/client/server/server_privileged_test.go
@@ -0,0 +1,235 @@
+//go:build privileged
+
+package server
+
+import (
+	"context"
+	"net"
+	"os/user"
+	"testing"
+	"time"
+
+	"github.com/golang/mock/gomock"
+	"github.com/stretchr/testify/require"
+	"go.opentelemetry.io/otel"
+
+	"github.com/netbirdio/netbird/management/server/integrations/integrated_validator/validator"
+
+	"github.com/netbirdio/netbird/management/internals/controllers/network_map/controller"
+	"github.com/netbirdio/netbird/management/internals/controllers/network_map/update_channel"
+	"github.com/netbirdio/netbird/management/internals/modules/peers"
+	"github.com/netbirdio/netbird/management/internals/modules/peers/ephemeral/manager"
+	nbgrpc "github.com/netbirdio/netbird/management/internals/shared/grpc"
+	"github.com/netbirdio/netbird/management/server/job"
+
+	"github.com/netbirdio/netbird/management/internals/server/config"
+	"github.com/netbirdio/netbird/management/server/groups"
+
+	log "github.com/sirupsen/logrus"
+	"google.golang.org/grpc"
+	"google.golang.org/grpc/keepalive"
+
+	"github.com/netbirdio/netbird/client/internal"
+	"github.com/netbirdio/netbird/client/internal/peer"
+	"github.com/netbirdio/netbird/client/internal/profilemanager"
+	"github.com/netbirdio/netbird/management/server"
+	"github.com/netbirdio/netbird/management/server/activity"
+	nbcache "github.com/netbirdio/netbird/management/server/cache"
+	"github.com/netbirdio/netbird/management/server/integrations/port_forwarding"
+	"github.com/netbirdio/netbird/management/server/permissions"
+	"github.com/netbirdio/netbird/management/server/settings"
+	"github.com/netbirdio/netbird/management/server/store"
+	"github.com/netbirdio/netbird/management/server/telemetry"
+	mgmtProto "github.com/netbirdio/netbird/shared/management/proto"
+	"github.com/netbirdio/netbird/shared/signal/proto"
+	signalServer "github.com/netbirdio/netbird/signal/server"
+)
+
+var (
+	kaep = keepalive.EnforcementPolicy{
+		MinTime:             15 * time.Second,
+		PermitWithoutStream: true,
+	}
+
+	kasp = keepalive.ServerParameters{
+		MaxConnectionIdle:     15 * time.Second,
+		MaxConnectionAgeGrace: 5 * time.Second,
+		Time:                  5 * time.Second,
+		Timeout:               2 * time.Second,
+	}
+)
+
+// TestConnectWithRetryRuns checks that connectWithRetryRuns retries according to the intervals specified via environment variables.
+// It starts a mock management server (see startManagement) and counts the Login calls it receives to capture the number of retries.
+func TestConnectWithRetryRuns(t *testing.T) {
+	// start the signal server
+	_, signalAddr, err := startSignal(t)
+	if err != nil {
+		t.Fatalf("failed to start signal server: %v", err)
+	}
+
+	counter := 0
+	// start the management server
+	_, mgmtAddr, err := startManagement(t, signalAddr, &counter)
+	if err != nil {
+		t.Fatalf("failed to start management server: %v", err)
+	}
+
+	ctx := internal.CtxInitState(context.Background())
+
+	ctx, cancel := context.WithDeadline(ctx, time.Now().Add(30*time.Second))
+	defer cancel()
+	// create new server
+	ic := profilemanager.ConfigInput{
+		ManagementURL: "http://" + mgmtAddr,
+		ConfigPath:    t.TempDir() + "/test-profile.json",
+	}
+
+	config, err := profilemanager.UpdateOrCreateConfig(ic)
+	if err != nil {
+		t.Fatalf("failed to create config: %v", err)
+	}
+
+	currUser, err := user.Current()
+	require.NoError(t, err)
+
+	pm := profilemanager.ServiceManager{}
+	err = pm.SetActiveProfileState(&profilemanager.ActiveProfileState{
+		ID:       "test-profile",
+		Username: currUser.Username,
+	})
+	if err != nil {
+		t.Fatalf("failed to set active profile state: %v", err)
+	}
+
+	s := New(ctx, "debug", "", false, false, false, false)
+
+	s.config = config
+
+	s.statusRecorder = peer.NewRecorder(config.ManagementURL.String())
+	t.Setenv(retryInitialIntervalVar, "1s")
+	t.Setenv(maxRetryIntervalVar, "2s")
+	t.Setenv(maxRetryTimeVar, "5s")
+	t.Setenv(retryMultiplierVar, "1")
+
+	s.connectWithRetryRuns(ctx, config, s.statusRecorder, nil, nil)
+	if counter < 3 {
+		t.Fatalf("expected counter > 2, got %d", counter)
+	}
+}
+
+type mockServer struct {
+	mgmtProto.ManagementServiceServer
+	counter *int
+}
+
+func (m *mockServer) Login(ctx context.Context, req *mgmtProto.EncryptedMessage) (*mgmtProto.EncryptedMessage, error) {
+	*m.counter++
+	return m.ManagementServiceServer.Login(ctx, req)
+}
+
+func startManagement(t *testing.T, signalAddr string, counter *int) (*grpc.Server, string, error) {
+	t.Helper()
+	dataDir := t.TempDir()
+
+	config := &config.Config{
+		Stuns:      []*config.Host{},
+		TURNConfig: &config.TURNConfig{},
+		Signal: &config.Host{
+			Proto: "http",
+			URI:   signalAddr,
+		},
+		Datadir:    dataDir,
+		HttpConfig: nil,
+	}
+
+	lis, err := net.Listen("tcp", "localhost:0")
+	if err != nil {
+		return nil, "", err
+	}
+	s := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(kaep), grpc.KeepaliveParams(kasp))
+	store, cleanUp, err := store.NewTestStoreFromSQL(context.Background(), "", config.Datadir)
+	if err != nil {
+		return nil, "", err
+	}
+	t.Cleanup(cleanUp)
+
+	eventStore := &activity.InMemoryEventStore{}
+	if err != nil {
+		return nil, "", err
+	}
+
+	ctrl := gomock.NewController(t)
+	t.Cleanup(ctrl.Finish)
+
+	permissionsManagerMock := permissions.NewMockManager(ctrl)
+	peersManager := peers.NewManager(store, permissionsManagerMock)
+	settingsManagerMock := settings.NewMockManager(ctrl)
+
+	jobManager := job.NewJobManager(nil, store, peersManager)
+
+	cacheStore, err := nbcache.NewStore(context.Background(), 100*time.Millisecond, 300*time.Millisecond, 100)
+	if err != nil {
+		return nil, "", err
+	}
+
+	ia, _ := validator.NewIntegratedValidator(context.Background(), peersManager, settingsManagerMock, eventStore, cacheStore)
+
+	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
+	require.NoError(t, err)
+
+	settingsMockManager := settings.NewMockManager(ctrl)
+	groupsManager := groups.NewManagerMock()
+
+	requestBuffer := server.NewAccountRequestBuffer(context.Background(), store)
+	peersUpdateManager := update_channel.NewPeersUpdateManager(metrics)
+	networkMapController := controller.NewController(context.Background(), store, metrics, peersUpdateManager, requestBuffer, server.MockIntegratedValidator{}, settingsMockManager, "netbird.selfhosted", port_forwarding.NewControllerMock(), manager.NewEphemeralManager(store, peersManager), config)
+	accountManager, err := server.BuildManager(context.Background(), config, store, networkMapController, jobManager, nil, "", eventStore, nil, false, ia, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManagerMock, false, cacheStore)
+	if err != nil {
+		return nil, "", err
+	}
+
+	secretsManager, err := nbgrpc.NewTimeBasedAuthSecretsManager(peersUpdateManager, config.TURNConfig, config.Relay, settingsMockManager, groupsManager)
+	if err != nil {
+		return nil, "", err
+	}
+	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController, nil, nil)
+	if err != nil {
+		return nil, "", err
+	}
+	mock := &mockServer{
+		ManagementServiceServer: mgmtServer,
+		counter:                 counter,
+	}
+	mgmtProto.RegisterManagementServiceServer(s, mock)
+	go func() {
+		if err = s.Serve(lis); err != nil {
+			log.Fatalf("failed to serve: %v", err)
+		}
+	}()
+
+	return s, lis.Addr().String(), nil
+}
+
+func startSignal(t *testing.T) (*grpc.Server, string, error) {
+	t.Helper()
+
+	s := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(kaep), grpc.KeepaliveParams(kasp))
+
+	lis, err := net.Listen("tcp", "localhost:0")
+	if err != nil {
+		return nil, "", err
+	}
+
+	srv, err := signalServer.NewServer(context.Background(), otel.Meter(""))
+	require.NoError(t, err)
+	proto.RegisterSignalExchangeServer(s, srv)
+
+	go func() {
+		if err = s.Serve(lis); err != nil {
+			log.Fatalf("failed to serve: %v", err)
+		}
+	}()
+
+	return s, lis.Addr().String(), nil
+}
--- a/client/server/server_test.go
+++ b/client/server/server_test.go
@@ -2,124 +2,22 @@ package server

 import (
 	"context"
-	"net"
 	"net/url"
 	"os/user"
 	"path/filepath"
 	"testing"
 	"time"

-	"github.com/golang/mock/gomock"
-	"github.com/stretchr/testify/require"
-	"go.opentelemetry.io/otel"
-
-	"github.com/netbirdio/netbird/management/server/integrations/integrated_validator/validator"
-
-	"github.com/netbirdio/netbird/management/internals/controllers/network_map/controller"
-	"github.com/netbirdio/netbird/management/internals/controllers/network_map/update_channel"
-	"github.com/netbirdio/netbird/management/internals/modules/peers"
-	"github.com/netbirdio/netbird/management/internals/modules/peers/ephemeral/manager"
-	nbgrpc "github.com/netbirdio/netbird/management/internals/shared/grpc"
-	"github.com/netbirdio/netbird/management/server/job"
-
-	"github.com/netbirdio/netbird/management/internals/server/config"
-	"github.com/netbirdio/netbird/management/server/groups"
-
 	log "github.com/sirupsen/logrus"
 	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 	"google.golang.org/grpc"
-	"google.golang.org/grpc/keepalive"

 	"github.com/netbirdio/netbird/client/internal"
-	"github.com/netbirdio/netbird/client/internal/peer"
 	"github.com/netbirdio/netbird/client/internal/profilemanager"
 	daemonProto "github.com/netbirdio/netbird/client/proto"
-	"github.com/netbirdio/netbird/management/server"
-	"github.com/netbirdio/netbird/management/server/activity"
-	nbcache "github.com/netbirdio/netbird/management/server/cache"
-	"github.com/netbirdio/netbird/management/server/integrations/port_forwarding"
-	"github.com/netbirdio/netbird/management/server/permissions"
-	"github.com/netbirdio/netbird/management/server/settings"
-	"github.com/netbirdio/netbird/management/server/store"
-	"github.com/netbirdio/netbird/management/server/telemetry"
-	mgmtProto "github.com/netbirdio/netbird/shared/management/proto"
-	"github.com/netbirdio/netbird/shared/signal/proto"
-	signalServer "github.com/netbirdio/netbird/signal/server"
 )

-var (
-	kaep = keepalive.EnforcementPolicy{
-		MinTime:             15 * time.Second,
-		PermitWithoutStream: true,
-	}
-
-	kasp = keepalive.ServerParameters{
-		MaxConnectionIdle:     15 * time.Second,
-		MaxConnectionAgeGrace: 5 * time.Second,
-		Time:                  5 * time.Second,
-		Timeout:               2 * time.Second,
-	}
-)
-
-// TestConnectWithRetryRuns checks that the connectWithRetry function runs and runs the retries according to the times specified via environment variables
-// we will use a management server started via to simulate the server and capture the number of retries
-func TestConnectWithRetryRuns(t *testing.T) {
-	// start the signal server
-	_, signalAddr, err := startSignal(t)
-	if err != nil {
-		t.Fatalf("failed to start signal server: %v", err)
-	}
-
-	counter := 0
-	// start the management server
-	_, mgmtAddr, err := startManagement(t, signalAddr, &counter)
-	if err != nil {
-		t.Fatalf("failed to start management server: %v", err)
-	}
-
-	ctx := internal.CtxInitState(context.Background())
-
-	ctx, cancel := context.WithDeadline(ctx, time.Now().Add(30*time.Second))
-	defer cancel()
-	// create new server
-	ic := profilemanager.ConfigInput{
-		ManagementURL: "http://" + mgmtAddr,
-		ConfigPath:    t.TempDir() + "/test-profile.json",
-	}
-
-	config, err := profilemanager.UpdateOrCreateConfig(ic)
-	if err != nil {
-		t.Fatalf("failed to create config: %v", err)
-	}
-
-	currUser, err := user.Current()
-	require.NoError(t, err)
-
-	pm := profilemanager.ServiceManager{}
-	err = pm.SetActiveProfileState(&profilemanager.ActiveProfileState{
-		ID:       "test-profile",
-		Username: currUser.Username,
-	})
-	if err != nil {
-		t.Fatalf("failed to set active profile state: %v", err)
-	}
-
-	s := New(ctx, "debug", "", false, false, false, false)
-
-	s.config = config
-
-	s.statusRecorder = peer.NewRecorder(config.ManagementURL.String())
-	t.Setenv(retryInitialIntervalVar, "1s")
-	t.Setenv(maxRetryIntervalVar, "2s")
-	t.Setenv(maxRetryTimeVar, "5s")
-	t.Setenv(retryMultiplierVar, "1")
-
-	s.connectWithRetryRuns(ctx, config, s.statusRecorder, nil, nil)
-	if counter < 3 {
-		t.Fatalf("expected counter > 2, got %d", counter)
-	}
-}
-
 func TestServer_Up(t *testing.T) {
 	tempDir := t.TempDir()
 	origDefaultProfileDir := profilemanager.DefaultConfigPathDir
@@ -259,119 +157,3 @@ func TestServer_SubcribeEvents(t *testing.T) {

 	assert.NoError(t, err)
 }
-
-type mockServer struct {
-	mgmtProto.ManagementServiceServer
-	counter *int
-}
-
-func (m *mockServer) Login(ctx context.Context, req *mgmtProto.EncryptedMessage) (*mgmtProto.EncryptedMessage, error) {
-	*m.counter++
-	return m.ManagementServiceServer.Login(ctx, req)
-}
-
-func startManagement(t *testing.T, signalAddr string, counter *int) (*grpc.Server, string, error) {
-	t.Helper()
-	dataDir := t.TempDir()
-
-	config := &config.Config{
-		Stuns:      []*config.Host{},
-		TURNConfig: &config.TURNConfig{},
-		Signal: &config.Host{
-			Proto: "http",
-			URI:   signalAddr,
-		},
-		Datadir:    dataDir,
-		HttpConfig: nil,
-	}
-
-	lis, err := net.Listen("tcp", "localhost:0")
-	if err != nil {
-		return nil, "", err
-	}
-	s := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(kaep), grpc.KeepaliveParams(kasp))
-	store, cleanUp, err := store.NewTestStoreFromSQL(context.Background(), "", config.Datadir)
-	if err != nil {
-		return nil, "", err
-	}
-	t.Cleanup(cleanUp)
-
-	eventStore := &activity.InMemoryEventStore{}
-	if err != nil {
-		return nil, "", err
-	}
-
-	ctrl := gomock.NewController(t)
-	t.Cleanup(ctrl.Finish)
-
-	permissionsManagerMock := permissions.NewMockManager(ctrl)
-	peersManager := peers.NewManager(store, permissionsManagerMock)
-	settingsManagerMock := settings.NewMockManager(ctrl)
-
-	jobManager := job.NewJobManager(nil, store, peersManager)
-
-	cacheStore, err := nbcache.NewStore(context.Background(), 100*time.Millisecond, 300*time.Millisecond, 100)
-	if err != nil {
-		return nil, "", err
-	}
-
-	ia, _ := validator.NewIntegratedValidator(context.Background(), peersManager, settingsManagerMock, eventStore, cacheStore)
-
-	metrics, err := telemetry.NewDefaultAppMetrics(context.Background())
-	require.NoError(t, err)
-
-	settingsMockManager := settings.NewMockManager(ctrl)
-	groupsManager := groups.NewManagerMock()
-
-	requestBuffer := server.NewAccountRequestBuffer(context.Background(), store)
-	peersUpdateManager := update_channel.NewPeersUpdateManager(metrics)
-	networkMapController := controller.NewController(context.Background(), store, metrics, peersUpdateManager, requestBuffer, server.MockIntegratedValidator{}, settingsMockManager, "netbird.selfhosted", port_forwarding.NewControllerMock(), manager.NewEphemeralManager(store, peersManager), config)
-	accountManager, err := server.BuildManager(context.Background(), config, store, networkMapController, jobManager, nil, "", eventStore, nil, false, ia, metrics, port_forwarding.NewControllerMock(), settingsMockManager, permissionsManagerMock, false, cacheStore)
-	if err != nil {
-		return nil, "", err
-	}
-
-	secretsManager, err := nbgrpc.NewTimeBasedAuthSecretsManager(peersUpdateManager, config.TURNConfig, config.Relay, settingsMockManager, groupsManager)
-	if err != nil {
-		return nil, "", err
-	}
-	mgmtServer, err := nbgrpc.NewServer(config, accountManager, settingsMockManager, jobManager, secretsManager, nil, nil, &server.MockIntegratedValidator{}, networkMapController, nil, nil)
-	if err != nil {
-		return nil, "", err
-	}
-	mock := &mockServer{
-		ManagementServiceServer: mgmtServer,
-		counter:                 counter,
-	}
-	mgmtProto.RegisterManagementServiceServer(s, mock)
-	go func() {
-		if err = s.Serve(lis); err != nil {
-			log.Fatalf("failed to serve: %v", err)
-		}
-	}()
-
-	return s, lis.Addr().String(), nil
-}
-
-func startSignal(t *testing.T) (*grpc.Server, string, error) {
-	t.Helper()
-
-	s := grpc.NewServer(grpc.KeepaliveEnforcementPolicy(kaep), grpc.KeepaliveParams(kasp))
-
-	lis, err := net.Listen("tcp", "localhost:0")
-	if err != nil {
-		log.Fatalf("failed to listen: %v", err)
-	}
-
-	srv, err := signalServer.NewServer(context.Background(), otel.Meter(""))
-	require.NoError(t, err)
-	proto.RegisterSignalExchangeServer(s, srv)
-
-	go func() {
-		if err = s.Serve(lis); err != nil {
-			log.Fatalf("failed to serve: %v", err)
-		}
-	}()
-
-	return s, lis.Addr().String(), nil
-}
--- a/client/ssh/client/client_privileged_test.go
+++ b/client/ssh/client/client_privileged_test.go
@@ -0,0 +1,118 @@
+//go:build privileged
+
+package client
+
+import (
+	"context"
+	"errors"
+	"runtime"
+	"strings"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	cryptossh "golang.org/x/crypto/ssh"
+
+	"github.com/netbirdio/netbird/client/ssh/testutil"
+)
+
+func TestSSHClient_CommandExecution(t *testing.T) {
+	if runtime.GOOS == "windows" && testutil.IsCI() {
+		t.Skip("Skipping Windows command execution tests in CI due to S4U authentication issues")
+	}
+
+	server, _, client := setupTestSSHServerAndClient(t)
+	defer func() {
+		err := server.Stop()
+		require.NoError(t, err)
+	}()
+	defer func() {
+		err := client.Close()
+		assert.NoError(t, err)
+	}()
+
+	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+	defer cancel()
+
+	t.Run("ExecuteCommand captures output", func(t *testing.T) {
+		output, err := client.ExecuteCommand(ctx, "echo hello")
+		assert.NoError(t, err)
+		assert.Contains(t, string(output), "hello")
+	})
+
+	t.Run("ExecuteCommandWithIO streams output", func(t *testing.T) {
+		err := client.ExecuteCommandWithIO(ctx, "echo world")
+		assert.NoError(t, err)
+	})
+
+	t.Run("commands with flags work", func(t *testing.T) {
+		output, err := client.ExecuteCommand(ctx, "echo -n test_flag")
+		assert.NoError(t, err)
+		assert.Equal(t, "test_flag", strings.TrimSpace(string(output)))
+	})
+
+	t.Run("non-zero exit codes don't return errors", func(t *testing.T) {
+		var testCmd string
+		if runtime.GOOS == "windows" {
+			testCmd = "echo hello | Select-String notfound"
+		} else {
+			testCmd = "echo 'hello' | grep 'notfound'"
+		}
+		_, err := client.ExecuteCommand(ctx, testCmd)
+		assert.NoError(t, err)
+	})
+}
+
+func TestSSHClient_ContextCancellation(t *testing.T) {
+	server, serverAddr, _ := setupTestSSHServerAndClient(t)
+	defer func() {
+		err := server.Stop()
+		require.NoError(t, err)
+	}()
+
+	t.Run("connection with short timeout", func(t *testing.T) {
+		ctx, cancel := context.WithTimeout(context.Background(), 1*time.Millisecond)
+		defer cancel()
+
+		currentUser := testutil.GetTestUsername(t)
+		_, err := Dial(ctx, serverAddr, currentUser, DialOptions{
+			InsecureSkipVerify: true,
+		})
+		if err != nil {
+			// Accept context deadline/cancel errors, falling back to matching "timeout" in the error text
+			assert.True(t,
+				errors.Is(err, context.DeadlineExceeded) ||
+					errors.Is(err, context.Canceled) ||
+					strings.Contains(err.Error(), "timeout"),
+				"Expected timeout-related error, got: %v", err)
+		}
+	})
+
+	t.Run("command execution cancellation", func(t *testing.T) {
+		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		defer cancel()
+		currentUser := testutil.GetTestUsername(t)
+		client, err := Dial(ctx, serverAddr, currentUser, DialOptions{
+			InsecureSkipVerify: true,
+		})
+		require.NoError(t, err)
+		defer func() {
+			if err := client.Close(); err != nil {
+				t.Logf("client close error: %v", err)
+			}
+		}()
+
+		cmdCtx, cmdCancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
+		defer cmdCancel()
+
+		err = client.ExecuteCommandWithPTY(cmdCtx, "sleep 10")
+		if err != nil {
+			var exitMissingErr *cryptossh.ExitMissingError
+			isValidCancellation := errors.Is(err, context.DeadlineExceeded) ||
+				errors.Is(err, context.Canceled) ||
+				errors.As(err, &exitMissingErr)
+			assert.True(t, isValidCancellation, "Should handle command cancellation properly")
+		}
+	})
+}
--- a/client/ssh/client/client_test.go
+++ b/client/ssh/client/client_test.go
@@ -15,7 +15,6 @@ import (

 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
-	cryptossh "golang.org/x/crypto/ssh"

 	"github.com/netbirdio/netbird/client/ssh"
 	sshserver "github.com/netbirdio/netbird/client/ssh/server"
@@ -78,53 +77,6 @@ func TestSSHClient_DialWithKey(t *testing.T) {
 	assert.NotNil(t, client.client)
 }

-func TestSSHClient_CommandExecution(t *testing.T) {
-	if runtime.GOOS == "windows" && testutil.IsCI() {
-		t.Skip("Skipping Windows command execution tests in CI due to S4U authentication issues")
-	}
-
-	server, _, client := setupTestSSHServerAndClient(t)
-	defer func() {
-		err := server.Stop()
-		require.NoError(t, err)
-	}()
-	defer func() {
-		err := client.Close()
-		assert.NoError(t, err)
-	}()
-
-	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
-	defer cancel()
-
-	t.Run("ExecuteCommand captures output", func(t *testing.T) {
-		output, err := client.ExecuteCommand(ctx, "echo hello")
-		assert.NoError(t, err)
-		assert.Contains(t, string(output), "hello")
-	})
-
-	t.Run("ExecuteCommandWithIO streams output", func(t *testing.T) {
-		err := client.ExecuteCommandWithIO(ctx, "echo world")
-		assert.NoError(t, err)
-	})
-
-	t.Run("commands with flags work", func(t *testing.T) {
-		output, err := client.ExecuteCommand(ctx, "echo -n test_flag")
-		assert.NoError(t, err)
-		assert.Equal(t, "test_flag", strings.TrimSpace(string(output)))
-	})
-
-	t.Run("non-zero exit codes don't return errors", func(t *testing.T) {
-		var testCmd string
-		if runtime.GOOS == "windows" {
-			testCmd = "echo hello | Select-String notfound"
-		} else {
-			testCmd = "echo 'hello' | grep 'notfound'"
-		}
-		_, err := client.ExecuteCommand(ctx, testCmd)
-		assert.NoError(t, err)
-	})
-}
-
 func TestSSHClient_ConnectionHandling(t *testing.T) {
 	server, serverAddr, _ := setupTestSSHServerAndClient(t)
 	defer func() {
@@ -154,59 +106,6 @@ func TestSSHClient_ConnectionHandling(t *testing.T) {
 	}
 }

-func TestSSHClient_ContextCancellation(t *testing.T) {
-	server, serverAddr, _ := setupTestSSHServerAndClient(t)
-	defer func() {
-		err := server.Stop()
-		require.NoError(t, err)
-	}()
-
-	t.Run("connection with short timeout", func(t *testing.T) {
-		ctx, cancel := context.WithTimeout(context.Background(), 1*time.Millisecond)
-		defer cancel()
-
-		currentUser := testutil.GetTestUsername(t)
-		_, err := Dial(ctx, serverAddr, currentUser, DialOptions{
-			InsecureSkipVerify: true,
-		})
-		if err != nil {
-			// Check for actual timeout-related errors rather than string matching
-			assert.True(t,
-				errors.Is(err, context.DeadlineExceeded) ||
-					errors.Is(err, context.Canceled) ||
-					strings.Contains(err.Error(), "timeout"),
-				"Expected timeout-related error, got: %v", err)
-		}
-	})
-
-	t.Run("command execution cancellation", func(t *testing.T) {
-		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
-		defer cancel()
-		currentUser := testutil.GetTestUsername(t)
-		client, err := Dial(ctx, serverAddr, currentUser, DialOptions{
-			InsecureSkipVerify: true,
-		})
-		require.NoError(t, err)
-		defer func() {
-			if err := client.Close(); err != nil {
-				t.Logf("client close error: %v", err)
-			}
-		}()
-
-		cmdCtx, cmdCancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
-		defer cmdCancel()
-
-		err = client.ExecuteCommandWithPTY(cmdCtx, "sleep 10")
-		if err != nil {
-			var exitMissingErr *cryptossh.ExitMissingError
-			isValidCancellation := errors.Is(err, context.DeadlineExceeded) ||
-				errors.Is(err, context.Canceled) ||
-				errors.As(err, &exitMissingErr)
-			assert.True(t, isValidCancellation, "Should handle command cancellation properly")
-		}
-	})
-}
-
 func TestSSHClient_NoAuthMode(t *testing.T) {
 	hostKey, err := ssh.GeneratePrivateKey(ssh.ED25519)
 	require.NoError(t, err)
--- a/client/ssh/proxy/proxy_privileged_test.go
+++ b/client/ssh/proxy/proxy_privileged_test.go
@@ -0,0 +1,423 @@
+//go:build privileged
+
+package proxy
+
+import (
+	"bytes"
+	"context"
+	"crypto/rand"
+	"crypto/rsa"
+	"encoding/base64"
+	"encoding/json"
+	"io"
+	"math/big"
+	"net"
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"runtime"
+	"strconv"
+	"testing"
+	"time"
+
+	"github.com/golang-jwt/jwt/v5"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	cryptossh "golang.org/x/crypto/ssh"
+
+	nbssh "github.com/netbirdio/netbird/client/ssh"
+	sshauth "github.com/netbirdio/netbird/client/ssh/auth"
+	"github.com/netbirdio/netbird/client/ssh/server"
+	"github.com/netbirdio/netbird/client/ssh/testutil"
+	nbjwt "github.com/netbirdio/netbird/shared/auth/jwt"
+	sshuserhash "github.com/netbirdio/netbird/shared/sshauth"
+)
+
+func (m *mockDaemon) setJWTToken(token string) {
+	m.impl.jwtToken = token
+}
+
+func TestSSHProxy_Connect(t *testing.T) {
+	if testing.Short() {
+		t.Skip("Skipping integration test in short mode")
+	}
+
+	// TODO: Windows test times out - user switching and command execution tested on Linux
+	if runtime.GOOS == "windows" {
+		t.Skip("Skipping on Windows - covered by Linux tests")
+	}
+
+	const (
+		issuer   = "https://test-issuer.example.com"
+		audience = "test-audience"
+	)
+
+	jwksServer, privateKey, jwksURL := setupJWKSServer(t)
+	defer jwksServer.Close()
+
+	hostKey, err := nbssh.GeneratePrivateKey(nbssh.ED25519)
+	require.NoError(t, err)
+	hostPubKey, err := nbssh.GeneratePublicKey(hostKey)
+	require.NoError(t, err)
+
+	serverConfig := &server.Config{
+		HostKeyPEM: hostKey,
+		JWT: &server.JWTConfig{
+			Issuer:       issuer,
+			Audiences:    []string{audience},
+			KeysLocation: jwksURL,
+		},
+	}
+	sshServer := server.New(serverConfig)
+	sshServer.SetAllowRootLogin(true)
+
+	// Configure SSH authorization for the test user
+	testUsername := testutil.GetTestUsername(t)
+	testJWTUser := "test-username"
+	testUserHash, err := sshuserhash.HashUserID(testJWTUser)
+	require.NoError(t, err)
+
+	authConfig := &sshauth.Config{
+		UserIDClaim:     sshauth.DefaultUserIDClaim,
+		AuthorizedUsers: []sshuserhash.UserIDHash{testUserHash},
+		MachineUsers: map[string][]uint32{
+			testUsername: {0}, // Index 0 in AuthorizedUsers
+		},
+	}
+	sshServer.UpdateSSHAuth(authConfig)
+
+	sshServerAddr := server.StartTestServer(t, sshServer)
+	defer func() { _ = sshServer.Stop() }()
+
+	mockDaemon := startMockDaemon(t)
+	defer mockDaemon.stop()
+
+	host, portStr, err := net.SplitHostPort(sshServerAddr)
+	require.NoError(t, err)
+	port, err := strconv.Atoi(portStr)
+	require.NoError(t, err)
+
+	mockDaemon.setHostKey(host, hostPubKey)
+
+	validToken := generateValidJWT(t, privateKey, issuer, audience, testJWTUser)
+	mockDaemon.setJWTToken(validToken)
+
+	proxyInstance, err := New(mockDaemon.addr, host, port, io.Discard, nil)
+	require.NoError(t, err)
+
+	clientConn, proxyConn := net.Pipe()
+	defer func() { _ = clientConn.Close() }()
+
+	origStdin := os.Stdin
+	origStdout := os.Stdout
+	defer func() {
+		os.Stdin = origStdin
+		os.Stdout = origStdout
+	}()
+
+	stdinReader, stdinWriter, err := os.Pipe()
+	require.NoError(t, err)
+	stdoutReader, stdoutWriter, err := os.Pipe()
+	require.NoError(t, err)
+
+	os.Stdin = stdinReader
+	os.Stdout = stdoutWriter
+
+	go func() {
+		_, _ = io.Copy(stdinWriter, proxyConn)
+	}()
+	go func() {
+		_, _ = io.Copy(proxyConn, stdoutReader)
+	}()
+
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+
+	connectErrCh := make(chan error, 1)
+	go func() {
+		connectErrCh <- proxyInstance.Connect(ctx)
+	}()
+
+	sshConfig := &cryptossh.ClientConfig{
+		User:            testutil.GetTestUsername(t),
+		Auth:            []cryptossh.AuthMethod{},
+		HostKeyCallback: cryptossh.InsecureIgnoreHostKey(),
+		Timeout:         3 * time.Second,
+	}
+
+	sshClientConn, chans, reqs, err := cryptossh.NewClientConn(clientConn, "test", sshConfig)
+	require.NoError(t, err, "Should connect to proxy server")
+	defer func() { _ = sshClientConn.Close() }()
+
+	sshClient := cryptossh.NewClient(sshClientConn, chans, reqs)
+
+	session, err := sshClient.NewSession()
+	require.NoError(t, err, "Should create session through full proxy to backend")
+
+	outputCh := make(chan []byte, 1)
+	errCh := make(chan error, 1)
+	go func() {
+		output, err := session.Output("echo hello-from-proxy")
+		outputCh <- output
+		errCh <- err
+	}()
+
+	select {
+	case output := <-outputCh:
+		err := <-errCh
+		require.NoError(t, err, "Command should execute successfully through proxy")
+		assert.Contains(t, string(output), "hello-from-proxy", "Should receive command output through proxy")
+	case <-time.After(3 * time.Second):
+		t.Fatal("Command execution timed out")
+	}
+
+	_ = session.Close()
+	_ = sshClient.Close()
+	_ = clientConn.Close()
+	cancel()
+}
+
+// TestSSHProxy_CommandQuoting verifies that the proxy preserves shell quoting
+// when forwarding commands to the backend. This is critical for tools like
+// Ansible that send commands such as:
+//
+//	/bin/sh -c '( umask 77 && mkdir -p ... ) && sleep 0'
+//
+// The single quotes must be preserved so the backend shell receives the
+// subshell expression as a single argument to -c.
+func TestSSHProxy_CommandQuoting(t *testing.T) {
+	if testing.Short() {
+		t.Skip("Skipping integration test in short mode")
+	}
+
+	sshClient, cleanup := setupProxySSHClient(t)
+	defer cleanup()
+
+	// These commands simulate what the SSH protocol delivers as exec payloads.
+	// When a user types: ssh host '/bin/sh -c "( echo hello )"'
+	// the local shell strips the outer single quotes, and the SSH exec request
+	// contains the raw string: /bin/sh -c "( echo hello )"
+	//
+	// The proxy must forward this string verbatim. Using session.Command()
+	// (shlex.Split + strings.Join) strips the inner double quotes, breaking
+	// the command on the backend.
+	tests := []struct {
+		name    string
+		command string
+		expect  string
+	}{
+		{
+			name:    "subshell_in_double_quotes",
+			command: `/bin/sh -c "( echo from-subshell ) && echo outer"`,
+			expect:  "from-subshell\nouter\n",
+		},
+		{
+			name:    "printf_with_special_chars",
+			command: `/bin/sh -c "printf '%s\n' 'hello world'"`,
+			expect:  "hello world\n",
+		},
+		{
+			name:    "nested_command_substitution",
+			command: `/bin/sh -c "echo $(echo nested)"`,
+			expect:  "nested\n",
+		},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			session, err := sshClient.NewSession()
+			require.NoError(t, err)
+			defer func() { _ = session.Close() }()
+
+			var stderrBuf bytes.Buffer
+			session.Stderr = &stderrBuf
+
+			outputCh := make(chan []byte, 1)
+			errCh := make(chan error, 1)
+			go func() {
+				output, err := session.Output(tc.command)
+				outputCh <- output
+				errCh <- err
+			}()
+
+			select {
+			case output := <-outputCh:
+				err := <-errCh
+				if stderrBuf.Len() > 0 {
+					t.Logf("stderr: %s", stderrBuf.String())
+				}
+				require.NoError(t, err, "command should succeed: %s", tc.command)
+				assert.Equal(t, tc.expect, string(output), "output mismatch for: %s", tc.command)
+			case <-time.After(5 * time.Second):
+				t.Fatalf("command timed out: %s", tc.command)
+			}
+		})
+	}
+}
+
+// setupProxySSHClient creates a full proxy test environment and returns
+// an SSH client connected through the proxy to a backend NetBird SSH server.
+func setupProxySSHClient(t *testing.T) (*cryptossh.Client, func()) {
+	t.Helper()
+
+	const (
+		issuer   = "https://test-issuer.example.com"
+		audience = "test-audience"
+	)
+
+	jwksServer, privateKey, jwksURL := setupJWKSServer(t)
+
+	hostKey, err := nbssh.GeneratePrivateKey(nbssh.ED25519)
+	require.NoError(t, err)
+	hostPubKey, err := nbssh.GeneratePublicKey(hostKey)
+	require.NoError(t, err)
+
+	serverConfig := &server.Config{
+		HostKeyPEM: hostKey,
+		JWT: &server.JWTConfig{
+			Issuer:       issuer,
+			Audiences:    []string{audience},
+			KeysLocation: jwksURL,
+		},
+	}
+	sshServer := server.New(serverConfig)
+	sshServer.SetAllowRootLogin(true)
+
+	testUsername := testutil.GetTestUsername(t)
+	testJWTUser := "test-username"
+	testUserHash, err := sshuserhash.HashUserID(testJWTUser)
+	require.NoError(t, err)
+
+	authConfig := &sshauth.Config{
+		UserIDClaim:     sshauth.DefaultUserIDClaim,
+		AuthorizedUsers: []sshuserhash.UserIDHash{testUserHash},
+		MachineUsers: map[string][]uint32{
+			testUsername: {0},
+		},
+	}
+	sshServer.UpdateSSHAuth(authConfig)
+
+	sshServerAddr := server.StartTestServer(t, sshServer)
+
+	mockDaemon := startMockDaemon(t)
+
+	host, portStr, err := net.SplitHostPort(sshServerAddr)
+	require.NoError(t, err)
+	port, err := strconv.Atoi(portStr)
+	require.NoError(t, err)
+
+	mockDaemon.setHostKey(host, hostPubKey)
+
+	validToken := generateValidJWT(t, privateKey, issuer, audience, testJWTUser)
+	mockDaemon.setJWTToken(validToken)
+
+	proxyInstance, err := New(mockDaemon.addr, host, port, io.Discard, nil)
+	require.NoError(t, err)
+
+	origStdin := os.Stdin
+	origStdout := os.Stdout
+
+	stdinReader, stdinWriter, err := os.Pipe()
+	require.NoError(t, err)
+	stdoutReader, stdoutWriter, err := os.Pipe()
+	require.NoError(t, err)
+
+	os.Stdin = stdinReader
+	os.Stdout = stdoutWriter
+
+	clientConn, proxyConn := net.Pipe()
+
+	go func() { _, _ = io.Copy(stdinWriter, proxyConn) }()
+	go func() { _, _ = io.Copy(proxyConn, stdoutReader) }()
+
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+
+	go func() {
+		_ = proxyInstance.Connect(ctx)
+	}()
+
+	sshConfig := &cryptossh.ClientConfig{
+		User:            testutil.GetTestUsername(t),
+		Auth:            []cryptossh.AuthMethod{},
+		HostKeyCallback: cryptossh.InsecureIgnoreHostKey(),
+		Timeout:         5 * time.Second,
+	}
+
+	sshClientConn, chans, reqs, err := cryptossh.NewClientConn(clientConn, "test", sshConfig)
+	require.NoError(t, err)
+
+	client := cryptossh.NewClient(sshClientConn, chans, reqs)
+
+	cleanupFn := func() {
+		_ = client.Close()
+		_ = clientConn.Close()
+		cancel()
+		os.Stdin = origStdin
+		os.Stdout = origStdout
+		_ = sshServer.Stop()
+		mockDaemon.stop()
+		jwksServer.Close()
+	}
+
+	return client, cleanupFn
+}
+
+func setupJWKSServer(t *testing.T) (*httptest.Server, *rsa.PrivateKey, string) {
+	t.Helper()
+	privateKey, jwksJSON := generateTestJWKS(t)
+
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.Header().Set("Content-Type", "application/json")
+		if _, err := w.Write(jwksJSON); err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+		}
+	}))
+
+	return server, privateKey, server.URL
+}
+
+func generateTestJWKS(t *testing.T) (*rsa.PrivateKey, []byte) {
+	t.Helper()
+	privateKey, err := rsa.GenerateKey(rand.Reader, 2048)
+	require.NoError(t, err)
+
+	publicKey := &privateKey.PublicKey
+	n := publicKey.N.Bytes()
+	e := publicKey.E
+
+	jwk := nbjwt.JSONWebKey{
+		Kty: "RSA",
+		Kid: "test-key-id",
+		Use: "sig",
+		N:   base64.RawURLEncoding.EncodeToString(n),
+		E:   base64.RawURLEncoding.EncodeToString(big.NewInt(int64(e)).Bytes()),
+	}
+
+	jwks := nbjwt.Jwks{
+		Keys: []nbjwt.JSONWebKey{jwk},
+	}
+
+	jwksJSON, err := json.Marshal(jwks)
+	require.NoError(t, err)
+
+	return privateKey, jwksJSON
+}
+
+func generateValidJWT(t *testing.T, privateKey *rsa.PrivateKey, issuer, audience string, user string) string {
+	t.Helper()
+	claims := jwt.MapClaims{
+		"iss": issuer,
+		"aud": audience,
+		"sub": user,
+		"exp": time.Now().Add(time.Hour).Unix(),
+		"iat": time.Now().Unix(),
+	}
+
+	token := jwt.NewWithClaims(jwt.SigningMethodRS256, claims)
+	token.Header["kid"] = "test-key-id"
+
+	tokenString, err := token.SignedString(privateKey)
+	require.NoError(t, err)
+
+	return tokenString
+}
--- a/client/ssh/proxy/proxy_test.go
+++ b/client/ssh/proxy/proxy_test.go
@@ -1,25 +1,12 @@
 package proxy

 import (
-	"bytes"
 	"context"
-	"crypto/rand"
-	"crypto/rsa"
-	"encoding/base64"
-	"encoding/json"
 	"fmt"
-	"io"
-	"math/big"
 	"net"
-	"net/http"
-	"net/http/httptest"
 	"os"
-	"runtime"
-	"strconv"
 	"testing"
-	"time"

-	"github.com/golang-jwt/jwt/v5"
 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
 	cryptossh "golang.org/x/crypto/ssh"
@@ -28,11 +15,7 @@ import (

 	"github.com/netbirdio/netbird/client/proto"
 	nbssh "github.com/netbirdio/netbird/client/ssh"
-	sshauth "github.com/netbirdio/netbird/client/ssh/auth"
-	"github.com/netbirdio/netbird/client/ssh/server"
 	"github.com/netbirdio/netbird/client/ssh/testutil"
-	nbjwt "github.com/netbirdio/netbird/shared/auth/jwt"
-	sshuserhash "github.com/netbirdio/netbird/shared/sshauth"
 )

 func TestMain(m *testing.M) {
@@ -106,331 +89,6 @@ func TestSSHProxy_verifyHostKey(t *testing.T) {
 	})
 }

-func TestSSHProxy_Connect(t *testing.T) {
-	if testing.Short() {
-		t.Skip("Skipping integration test in short mode")
-	}
-
-	// TODO: Windows test times out - user switching and command execution tested on Linux
-	if runtime.GOOS == "windows" {
-		t.Skip("Skipping on Windows - covered by Linux tests")
-	}
-
-	const (
-		issuer   = "https://test-issuer.example.com"
-		audience = "test-audience"
-	)
-
-	jwksServer, privateKey, jwksURL := setupJWKSServer(t)
-	defer jwksServer.Close()
-
-	hostKey, err := nbssh.GeneratePrivateKey(nbssh.ED25519)
-	require.NoError(t, err)
-	hostPubKey, err := nbssh.GeneratePublicKey(hostKey)
-	require.NoError(t, err)
-
-	serverConfig := &server.Config{
-		HostKeyPEM: hostKey,
-		JWT: &server.JWTConfig{
-			Issuer:       issuer,
-			Audiences:    []string{audience},
-			KeysLocation: jwksURL,
-		},
-	}
-	sshServer := server.New(serverConfig)
-	sshServer.SetAllowRootLogin(true)
-
-	// Configure SSH authorization for the test user
-	testUsername := testutil.GetTestUsername(t)
-	testJWTUser := "test-username"
-	testUserHash, err := sshuserhash.HashUserID(testJWTUser)
-	require.NoError(t, err)
-
-	authConfig := &sshauth.Config{
-		UserIDClaim:     sshauth.DefaultUserIDClaim,
-		AuthorizedUsers: []sshuserhash.UserIDHash{testUserHash},
-		MachineUsers: map[string][]uint32{
-			testUsername: {0}, // Index 0 in AuthorizedUsers
-		},
-	}
-	sshServer.UpdateSSHAuth(authConfig)
-
-	sshServerAddr := server.StartTestServer(t, sshServer)
-	defer func() { _ = sshServer.Stop() }()
-
-	mockDaemon := startMockDaemon(t)
-	defer mockDaemon.stop()
-
-	host, portStr, err := net.SplitHostPort(sshServerAddr)
-	require.NoError(t, err)
-	port, err := strconv.Atoi(portStr)
-	require.NoError(t, err)
-
-	mockDaemon.setHostKey(host, hostPubKey)
-
-	validToken := generateValidJWT(t, privateKey, issuer, audience, testJWTUser)
-	mockDaemon.setJWTToken(validToken)
-
-	proxyInstance, err := New(mockDaemon.addr, host, port, io.Discard, nil)
-	require.NoError(t, err)
-
-	clientConn, proxyConn := net.Pipe()
-	defer func() { _ = clientConn.Close() }()
-
-	origStdin := os.Stdin
-	origStdout := os.Stdout
-	defer func() {
-		os.Stdin = origStdin
-		os.Stdout = origStdout
-	}()
-
-	stdinReader, stdinWriter, err := os.Pipe()
-	require.NoError(t, err)
-	stdoutReader, stdoutWriter, err := os.Pipe()
-	require.NoError(t, err)
-
-	os.Stdin = stdinReader
-	os.Stdout = stdoutWriter
-
-	go func() {
-		_, _ = io.Copy(stdinWriter, proxyConn)
-	}()
-	go func() {
-		_, _ = io.Copy(proxyConn, stdoutReader)
-	}()
-
-	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
-	defer cancel()
-
-	connectErrCh := make(chan error, 1)
-	go func() {
-		connectErrCh <- proxyInstance.Connect(ctx)
-	}()
-
-	sshConfig := &cryptossh.ClientConfig{
-		User:            testutil.GetTestUsername(t),
-		Auth:            []cryptossh.AuthMethod{},
-		HostKeyCallback: cryptossh.InsecureIgnoreHostKey(),
-		Timeout:         3 * time.Second,
-	}
-
-	sshClientConn, chans, reqs, err := cryptossh.NewClientConn(clientConn, "test", sshConfig)
-	require.NoError(t, err, "Should connect to proxy server")
-	defer func() { _ = sshClientConn.Close() }()
-
-	sshClient := cryptossh.NewClient(sshClientConn, chans, reqs)
-
-	session, err := sshClient.NewSession()
-	require.NoError(t, err, "Should create session through full proxy to backend")
-
-	outputCh := make(chan []byte, 1)
-	errCh := make(chan error, 1)
-	go func() {
-		output, err := session.Output("echo hello-from-proxy")
-		outputCh <- output
-		errCh <- err
-	}()
-
-	select {
-	case output := <-outputCh:
-		err := <-errCh
-		require.NoError(t, err, "Command should execute successfully through proxy")
-		assert.Contains(t, string(output), "hello-from-proxy", "Should receive command output through proxy")
-	case <-time.After(3 * time.Second):
-		t.Fatal("Command execution timed out")
-	}
-
-	_ = session.Close()
-	_ = sshClient.Close()
-	_ = clientConn.Close()
-	cancel()
-}
-
-// TestSSHProxy_CommandQuoting verifies that the proxy preserves shell quoting
-// when forwarding commands to the backend. This is critical for tools like
-// Ansible that send commands such as:
-//
-//	/bin/sh -c '( umask 77 && mkdir -p ... ) && sleep 0'
-//
-// The single quotes must be preserved so the backend shell receives the
-// subshell expression as a single argument to -c.
-func TestSSHProxy_CommandQuoting(t *testing.T) {
-	if testing.Short() {
-		t.Skip("Skipping integration test in short mode")
-	}
-
-	sshClient, cleanup := setupProxySSHClient(t)
-	defer cleanup()
-
-	// These commands simulate what the SSH protocol delivers as exec payloads.
-	// When a user types: ssh host '/bin/sh -c "( echo hello )"'
-	// the local shell strips the outer single quotes, and the SSH exec request
-	// contains the raw string: /bin/sh -c "( echo hello )"
-	//
-	// The proxy must forward this string verbatim. Using session.Command()
-	// (shlex.Split + strings.Join) strips the inner double quotes, breaking
-	// the command on the backend.
-	tests := []struct {
-		name    string
-		command string
-		expect  string
-	}{
-		{
-			name:    "subshell_in_double_quotes",
-			command: `/bin/sh -c "( echo from-subshell ) && echo outer"`,
-			expect:  "from-subshell\nouter\n",
-		},
-		{
-			name:    "printf_with_special_chars",
-			command: `/bin/sh -c "printf '%s\n' 'hello world'"`,
-			expect:  "hello world\n",
-		},
-		{
-			name:    "nested_command_substitution",
-			command: `/bin/sh -c "echo $(echo nested)"`,
-			expect:  "nested\n",
-		},
-	}
-
-	for _, tc := range tests {
-		t.Run(tc.name, func(t *testing.T) {
-			session, err := sshClient.NewSession()
-			require.NoError(t, err)
-			defer func() { _ = session.Close() }()
-
-			var stderrBuf bytes.Buffer
-			session.Stderr = &stderrBuf
-
-			outputCh := make(chan []byte, 1)
-			errCh := make(chan error, 1)
-			go func() {
-				output, err := session.Output(tc.command)
-				outputCh <- output
-				errCh <- err
-			}()
-
-			select {
-			case output := <-outputCh:
-				err := <-errCh
-				if stderrBuf.Len() > 0 {
-					t.Logf("stderr: %s", stderrBuf.String())
-				}
-				require.NoError(t, err, "command should succeed: %s", tc.command)
-				assert.Equal(t, tc.expect, string(output), "output mismatch for: %s", tc.command)
-			case <-time.After(5 * time.Second):
-				t.Fatalf("command timed out: %s", tc.command)
-			}
-		})
-	}
-}
-
-// setupProxySSHClient creates a full proxy test environment and returns
-// an SSH client connected through the proxy to a backend NetBird SSH server.
-func setupProxySSHClient(t *testing.T) (*cryptossh.Client, func()) {
-	t.Helper()
-
-	const (
-		issuer   = "https://test-issuer.example.com"
-		audience = "test-audience"
-	)
-
-	jwksServer, privateKey, jwksURL := setupJWKSServer(t)
-
-	hostKey, err := nbssh.GeneratePrivateKey(nbssh.ED25519)
-	require.NoError(t, err)
-	hostPubKey, err := nbssh.GeneratePublicKey(hostKey)
-	require.NoError(t, err)
-
-	serverConfig := &server.Config{
-		HostKeyPEM: hostKey,
-		JWT: &server.JWTConfig{
-			Issuer:       issuer,
-			Audiences:    []string{audience},
-			KeysLocation: jwksURL,
-		},
-	}
-	sshServer := server.New(serverConfig)
-	sshServer.SetAllowRootLogin(true)
-
-	testUsername := testutil.GetTestUsername(t)
-	testJWTUser := "test-username"
-	testUserHash, err := sshuserhash.HashUserID(testJWTUser)
-	require.NoError(t, err)
-
-	authConfig := &sshauth.Config{
-		UserIDClaim:     sshauth.DefaultUserIDClaim,
-		AuthorizedUsers: []sshuserhash.UserIDHash{testUserHash},
-		MachineUsers: map[string][]uint32{
-			testUsername: {0},
-		},
-	}
-	sshServer.UpdateSSHAuth(authConfig)
-
-	sshServerAddr := server.StartTestServer(t, sshServer)
-
-	mockDaemon := startMockDaemon(t)
-
-	host, portStr, err := net.SplitHostPort(sshServerAddr)
-	require.NoError(t, err)
-	port, err := strconv.Atoi(portStr)
-	require.NoError(t, err)
-
-	mockDaemon.setHostKey(host, hostPubKey)
-
-	validToken := generateValidJWT(t, privateKey, issuer, audience, testJWTUser)
-	mockDaemon.setJWTToken(validToken)
-
-	proxyInstance, err := New(mockDaemon.addr, host, port, io.Discard, nil)
-	require.NoError(t, err)
-
-	origStdin := os.Stdin
-	origStdout := os.Stdout
-
-	stdinReader, stdinWriter, err := os.Pipe()
-	require.NoError(t, err)
-	stdoutReader, stdoutWriter, err := os.Pipe()
-	require.NoError(t, err)
-
-	os.Stdin = stdinReader
-	os.Stdout = stdoutWriter
-
-	clientConn, proxyConn := net.Pipe()
-
-	go func() { _, _ = io.Copy(stdinWriter, proxyConn) }()
-	go func() { _, _ = io.Copy(proxyConn, stdoutReader) }()
-
-	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-
-	go func() {
-		_ = proxyInstance.Connect(ctx)
-	}()
-
-	sshConfig := &cryptossh.ClientConfig{
-		User:            testutil.GetTestUsername(t),
-		Auth:            []cryptossh.AuthMethod{},
-		HostKeyCallback: cryptossh.InsecureIgnoreHostKey(),
-		Timeout:         5 * time.Second,
-	}
-
-	sshClientConn, chans, reqs, err := cryptossh.NewClientConn(clientConn, "test", sshConfig)
-	require.NoError(t, err)
-
-	client := cryptossh.NewClient(sshClientConn, chans, reqs)
-
-	cleanupFn := func() {
-		_ = client.Close()
-		_ = clientConn.Close()
-		cancel()
-		os.Stdin = origStdin
-		os.Stdout = origStdout
-		_ = sshServer.Stop()
-		mockDaemon.stop()
-		jwksServer.Close()
-	}
-
-	return client, cleanupFn
-}
-
 type mockDaemonServer struct {
 	proto.UnimplementedDaemonServiceServer
 	hostKeys map[string][]byte
@@ -492,10 +150,6 @@ func (m *mockDaemon) setHostKey(addr string, pubKey []byte) {
 	m.impl.hostKeys[addr] = pubKey
 }

-func (m *mockDaemon) setJWTToken(token string) {
-	m.impl.jwtToken = token
-}
-
 func (m *mockDaemon) stop() {
 	if m.server != nil {
 		m.server.Stop()
@@ -508,63 +162,3 @@ func mustParsePublicKey(t *testing.T, pubKeyBytes []byte) cryptossh.PublicKey {
 	require.NoError(t, err)
 	return pubKey
 }
-
-func setupJWKSServer(t *testing.T) (*httptest.Server, *rsa.PrivateKey, string) {
-	t.Helper()
-	privateKey, jwksJSON := generateTestJWKS(t)
-
-	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-		w.Header().Set("Content-Type", "application/json")
-		if _, err := w.Write(jwksJSON); err != nil {
-			http.Error(w, err.Error(), http.StatusInternalServerError)
-		}
-	}))
-
-	return server, privateKey, server.URL
-}
-
-func generateTestJWKS(t *testing.T) (*rsa.PrivateKey, []byte) {
-	t.Helper()
-	privateKey, err := rsa.GenerateKey(rand.Reader, 2048)
-	require.NoError(t, err)
-
-	publicKey := &privateKey.PublicKey
-	n := publicKey.N.Bytes()
-	e := publicKey.E
-
-	jwk := nbjwt.JSONWebKey{
-		Kty: "RSA",
-		Kid: "test-key-id",
-		Use: "sig",
-		N:   base64.RawURLEncoding.EncodeToString(n),
-		E:   base64.RawURLEncoding.EncodeToString(big.NewInt(int64(e)).Bytes()),
-	}
-
-	jwks := nbjwt.Jwks{
-		Keys: []nbjwt.JSONWebKey{jwk},
-	}
-
-	jwksJSON, err := json.Marshal(jwks)
-	require.NoError(t, err)
-
-	return privateKey, jwksJSON
-}
-
-func generateValidJWT(t *testing.T, privateKey *rsa.PrivateKey, issuer, audience string, user string) string {
-	t.Helper()
-	claims := jwt.MapClaims{
-		"iss": issuer,
-		"aud": audience,
-		"sub": user,
-		"exp": time.Now().Add(time.Hour).Unix(),
-		"iat": time.Now().Unix(),
-	}
-
-	token := jwt.NewWithClaims(jwt.SigningMethodRS256, claims)
-	token.Header["kid"] = "test-key-id"
-
-	tokenString, err := token.SignedString(privateKey)
-	require.NoError(t, err)
-
-	return tokenString
-}
--- a/client/ssh/server/executor_unix_privileged_test.go
+++ b/client/ssh/server/executor_unix_privileged_test.go
@@ -0,0 +1,66 @@
+//go:build unix && privileged
+
+package server
+
+import (
+	"context"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestPrivilegeDropper_CreateExecutorCommand(t *testing.T) {
+	pd := NewPrivilegeDropper()
+
+	config := ExecutorConfig{
+		UID:        1000,
+		GID:        1000,
+		Groups:     []uint32{1000, 1001},
+		WorkingDir: "/home/testuser",
+		Shell:      "/bin/bash",
+		Command:    "ls -la",
+	}
+
+	cmd, err := pd.CreateExecutorCommand(context.Background(), config)
+	require.NoError(t, err)
+	require.NotNil(t, cmd)
+
+	// Verify the command is calling netbird ssh exec
+	assert.Contains(t, cmd.Args, "ssh")
+	assert.Contains(t, cmd.Args, "exec")
+	assert.Contains(t, cmd.Args, "--uid")
+	assert.Contains(t, cmd.Args, "1000")
+	assert.Contains(t, cmd.Args, "--gid")
+	assert.Contains(t, cmd.Args, "1000")
+	assert.Contains(t, cmd.Args, "--groups")
+	assert.Contains(t, cmd.Args, "1000")
+	assert.Contains(t, cmd.Args, "1001")
+	assert.Contains(t, cmd.Args, "--working-dir")
+	assert.Contains(t, cmd.Args, "/home/testuser")
+	assert.Contains(t, cmd.Args, "--shell")
+	assert.Contains(t, cmd.Args, "/bin/bash")
+	assert.Contains(t, cmd.Args, "--cmd")
+	assert.Contains(t, cmd.Args, "ls -la")
+}
+
+func TestPrivilegeDropper_CreateExecutorCommandInteractive(t *testing.T) {
+	pd := NewPrivilegeDropper()
+
+	config := ExecutorConfig{
+		UID:        1000,
+		GID:        1000,
+		Groups:     []uint32{1000},
+		WorkingDir: "/home/testuser",
+		Shell:      "/bin/bash",
+		Command:    "",
+	}
+
+	cmd, err := pd.CreateExecutorCommand(context.Background(), config)
+	require.NoError(t, err)
+	require.NotNil(t, cmd)
+
+	// Verify no command mode (command is empty so no --cmd flag)
+	assert.NotContains(t, cmd.Args, "--cmd")
+	assert.NotContains(t, cmd.Args, "--interactive")
+}
--- a/client/ssh/server/executor_unix_test.go
+++ b/client/ssh/server/executor_unix_test.go
@@ -73,61 +73,6 @@ func TestPrivilegeDropper_ValidatePrivileges(t *testing.T) {
 	}
 }

-func TestPrivilegeDropper_CreateExecutorCommand(t *testing.T) {
-	pd := NewPrivilegeDropper()
-
-	config := ExecutorConfig{
-		UID:        1000,
-		GID:        1000,
-		Groups:     []uint32{1000, 1001},
-		WorkingDir: "/home/testuser",
-		Shell:      "/bin/bash",
-		Command:    "ls -la",
-	}
-
-	cmd, err := pd.CreateExecutorCommand(context.Background(), config)
-	require.NoError(t, err)
-	require.NotNil(t, cmd)
-
-	// Verify the command is calling netbird ssh exec
-	assert.Contains(t, cmd.Args, "ssh")
-	assert.Contains(t, cmd.Args, "exec")
-	assert.Contains(t, cmd.Args, "--uid")
-	assert.Contains(t, cmd.Args, "1000")
-	assert.Contains(t, cmd.Args, "--gid")
-	assert.Contains(t, cmd.Args, "1000")
-	assert.Contains(t, cmd.Args, "--groups")
-	assert.Contains(t, cmd.Args, "1000")
-	assert.Contains(t, cmd.Args, "1001")
-	assert.Contains(t, cmd.Args, "--working-dir")
-	assert.Contains(t, cmd.Args, "/home/testuser")
-	assert.Contains(t, cmd.Args, "--shell")
-	assert.Contains(t, cmd.Args, "/bin/bash")
-	assert.Contains(t, cmd.Args, "--cmd")
-	assert.Contains(t, cmd.Args, "ls -la")
-}
-
-func TestPrivilegeDropper_CreateExecutorCommandInteractive(t *testing.T) {
-	pd := NewPrivilegeDropper()
-
-	config := ExecutorConfig{
-		UID:        1000,
-		GID:        1000,
-		Groups:     []uint32{1000},
-		WorkingDir: "/home/testuser",
-		Shell:      "/bin/bash",
-		Command:    "",
-	}
-
-	cmd, err := pd.CreateExecutorCommand(context.Background(), config)
-	require.NoError(t, err)
-	require.NotNil(t, cmd)
-
-	// Verify no command mode (command is empty so no --cmd flag)
-	assert.NotContains(t, cmd.Args, "--cmd")
-	assert.NotContains(t, cmd.Args, "--interactive")
-}
-
 // TestPrivilegeDropper_ActualPrivilegeDrop tests actual privilege dropping
 // This test requires root privileges and will be skipped if not running as root
 func TestPrivilegeDropper_ActualPrivilegeDrop(t *testing.T) {
--- a/client/system/info.go
+++ b/client/system/info.go
@@ -3,6 +3,7 @@ package system
 import (
 	"context"
 	"net/netip"
+	"slices"
 	"strings"

 	log "github.com/sirupsen/logrus"
@@ -121,6 +122,23 @@ func (i *Info) SetFlags(
 	}
 }

+// removeAddresses drops network addresses whose IP matches any of the given
+// addresses, regardless of prefix length. Used to exclude the NetBird overlay
+// address, which otherwise churns the meta as the interface comes and goes.
+func (i *Info) removeAddresses(ips ...netip.Addr) {
+	if len(ips) == 0 {
+		return
+	}
+	filtered := i.NetworkAddresses[:0]
+	for _, addr := range i.NetworkAddresses {
+		if slices.Contains(ips, addr.NetIP.Addr()) {
+			continue
+		}
+		filtered = append(filtered, addr)
+	}
+	i.NetworkAddresses = filtered
+}
+
 // extractUserAgent extracts Netbird's agent (client) name and version from the outgoing context
 func extractUserAgent(ctx context.Context) string {
 	md, hasMeta := metadata.FromOutgoingContext(ctx)
@@ -147,7 +165,9 @@ func extractDeviceName(ctx context.Context, defaultName string) string {
 }

 // GetInfoWithChecks retrieves and parses the system information with applied checks.
-func GetInfoWithChecks(ctx context.Context, checks []*proto.Checks) (*Info, error) {
+// excludeIPs are dropped from the reported network addresses (e.g. our own
+// WireGuard overlay address, which otherwise churns the peer meta).
+func GetInfoWithChecks(ctx context.Context, checks []*proto.Checks, excludeIPs ...netip.Addr) (*Info, error) {
 	log.Debugf("gathering system information with checks: %d", len(checks))
 	processCheckPaths := make([]string, 0)
 	for _, check := range checks {
@@ -162,6 +182,7 @@ func GetInfoWithChecks(ctx context.Context, checks []*proto.Checks) (*Info, erro

 	info := GetInfo(ctx)
 	info.Files = files
+	info.removeAddresses(excludeIPs...)

 	log.Debugf("all system information gathered successfully")
 	return info, nil
--- a/client/system/info_test.go
+++ b/client/system/info_test.go
@@ -2,6 +2,7 @@ package system

 import (
 	"context"
+	"net/netip"
 	"testing"

 	"github.com/stretchr/testify/assert"
@@ -43,3 +44,42 @@ func Test_NetAddresses(t *testing.T) {
 		t.Errorf("no network addresses found")
 	}
 }
+
+func TestInfo_RemoveAddresses(t *testing.T) {
+	addr := func(cidr string) NetworkAddress {
+		return NetworkAddress{NetIP: netip.MustParsePrefix(cidr)}
+	}
+
+	info := &Info{
+		NetworkAddresses: []NetworkAddress{
+			addr("192.168.1.7/24"),
+			addr("100.76.70.97/32"),                          // overlay v4 (host mask /32)
+			addr("2001:818:c51b:4800:845:a65d:ae6f:623f/64"), // real global v6
+			addr("fd00:1234::1/64"),                          // overlay v6
+		},
+	}
+
+	// Overlay addresses as the engine knows them, with a different mask (/16, /64).
+	info.removeAddresses(
+		netip.MustParseAddr("100.76.70.97"),
+		netip.MustParseAddr("fd00:1234::1"),
+	)
+
+	want := []string{"192.168.1.7/24", "2001:818:c51b:4800:845:a65d:ae6f:623f/64"}
+	if len(info.NetworkAddresses) != len(want) {
+		t.Fatalf("got %d addresses, want %d: %v", len(info.NetworkAddresses), len(want), info.NetworkAddresses)
+	}
+	for i, w := range want {
+		if got := info.NetworkAddresses[i].NetIP.String(); got != w {
+			t.Errorf("address[%d] = %s, want %s", i, got, w)
+		}
+	}
+}
+
+func TestInfo_RemoveAddresses_NoOp(t *testing.T) {
+	info := &Info{NetworkAddresses: []NetworkAddress{{NetIP: netip.MustParsePrefix("10.0.0.1/24")}}}
+	info.removeAddresses()
+	if len(info.NetworkAddresses) != 1 {
+		t.Errorf("expected no change with empty input, got %v", info.NetworkAddresses)
+	}
+}
--- a/client/system/network_addr.go
+++ b/client/system/network_addr.go
@@ -46,7 +46,9 @@ func toNetworkAddress(address net.Addr, mac string) (NetworkAddress, bool) {
 	if !ok {
 		return NetworkAddress{}, false
 	}
-	if ipNet.IP.IsLoopback() {
+	// Skip link-local and multicast: they carry no routable peer info and the
+	// IPv6 link-local of a flapping NIC churns the meta on every up/down.
+	if ipNet.IP.IsLoopback() || ipNet.IP.IsLinkLocalUnicast() || ipNet.IP.IsMulticast() {
 		return NetworkAddress{}, false
 	}
 	prefix, err := netip.ParsePrefix(ipNet.String())
--- a/client/system/network_addr_test.go
+++ b/client/system/network_addr_test.go
@@ -0,0 +1,45 @@
+//go:build !ios
+
+package system
+
+import (
+	"net"
+	"testing"
+)
+
+func mustIPNet(t *testing.T, cidr string) *net.IPNet {
+	t.Helper()
+	ip, ipNet, err := net.ParseCIDR(cidr)
+	if err != nil {
+		t.Fatalf("parse %q: %v", cidr, err)
+	}
+	ipNet.IP = ip
+	return ipNet
+}
+
+func TestToNetworkAddress_Filtering(t *testing.T) {
+	const mac = "c8:4b:d6:b6:04:ac"
+
+	tests := []struct {
+		name string
+		cidr string
+		want bool
+	}{
+		{"ipv4 global", "10.65.16.181/23", true},
+		{"ipv6 global", "2620:52:0:4110:102d:6a98:ee75:8b92/64", true},
+		{"ipv4 loopback", "127.0.0.1/8", false},
+		{"ipv6 loopback", "::1/128", false},
+		{"ipv6 link-local", "fe80::871:4c25:23d7:2529/64", false},
+		{"ipv4 link-local", "169.254.1.2/16", false},
+		{"ipv6 multicast", "ff02::1/128", false},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			_, got := toNetworkAddress(mustIPNet(t, tt.cidr), mac)
+			if got != tt.want {
+				t.Errorf("toNetworkAddress(%s) ok = %v, want %v", tt.cidr, got, tt.want)
+			}
+		})
+	}
+}
--- a/client/testutil/privileged/runner_test.go
+++ b/client/testutil/privileged/runner_test.go
@@ -0,0 +1,196 @@
+//go:build privileged && (linux || darwin)
+
+// Package privileged provides a self-hosting harness that runs the repo's
+// privileged-tagged test suite inside a --privileged --cap-add=NET_ADMIN
+// container, so developers can exercise the root/system-mutating tests on a
+// non-root host with a single `go test` invocation.
+package privileged
+
+import (
+	"bytes"
+	"context"
+	"fmt"
+	"os"
+	"os/exec"
+	"path/filepath"
+	"strings"
+	"testing"
+	"time"
+
+	"github.com/moby/moby/api/types/container"
+	"github.com/ory/dockertest/v4"
+)
+
+// containerImage / containerTag match the image used by the CI privileged job
+// (.github/workflows/golang-test-linux.yml, test_client_on_docker).
+const (
+	containerImage = "golang"
+	containerTag   = "1.25-alpine"
+)
+
+const (
+	containerWorkdir    = "/app"
+	containerGoCache    = "/root/.cache/go-build"
+	containerGoModCache = "/go/pkg/mod"
+)
+
+// alpinePackages are the build/runtime deps the privileged tests need, mirroring
+// the CI container setup.
+const alpinePackages = "ca-certificates iptables ip6tables dbus dbus-dev libpcap-dev build-base"
+
+// privilegedTestPackages is the package list the suite runs, excluding the
+// server-side trees and UI/upload helpers, matching the CI Docker job's filter.
+const privilegedTestPackages = `go list -buildvcs=false ./... | grep -v -e /management -e /signal -e /relay -e /proxy -e /combined -e /client/ui -e /upload-server`
+
+// testWriter forwards container output to the test log line by line.
+type testWriter struct{ t *testing.T }
+
+func (w testWriter) Write(p []byte) (int, error) {
+	for _, line := range strings.Split(strings.TrimRight(string(p), "\n"), "\n") {
+		w.t.Log(line)
+	}
+	return len(p), nil
+}
+
+// TestRunPrivilegedSuiteInDocker spins up a privileged container, mounts the repo,
+// and runs `go test -tags 'devcert privileged'` inside it. When already running
+// inside that container (DOCKER_CI=true) it returns immediately so the real
+// privileged tests in the suite execute in place instead of recursing.
+func TestRunPrivilegedSuiteInDocker(t *testing.T) {
+	if os.Getenv("DOCKER_CI") == "true" {
+		t.Skip("inside privileged container, skipping container spawn; privileged tests run in place")
+	}
+
+	repoRoot, err := findRepoRoot()
+	if err != nil {
+		t.Fatalf("locate repo root: %v", err)
+	}
+	goCache, goModCache := hostGoCaches(t)
+
+	// dockertest reads DOCKER_HOST; point it at the active context's socket when
+	// the default one is absent (macOS Docker Desktop, Colima, OrbStack).
+	if host := dockerHost(); host != "" {
+		t.Setenv("DOCKER_HOST", host)
+	}
+
+	// NewPoolT registers container cleanup via t.Cleanup automatically.
+	pool := dockertest.NewPoolT(t, "", dockertest.WithMaxWait(30*time.Minute))
+
+	// Keep the container alive so the suite runs via Exec, which yields a clean
+	// exit code (the v4 Resource API exposes no container wait/exit-code).
+	resource := pool.RunT(t, containerImage,
+		dockertest.WithTag(containerTag),
+		dockertest.WithWorkingDir(containerWorkdir),
+		dockertest.WithMounts([]string{
+			repoRoot + ":" + containerWorkdir,
+			goCache + ":" + containerGoCache,
+			goModCache + ":" + containerGoModCache,
+		}),
+		dockertest.WithEnv([]string{
+			"CGO_ENABLED=1",
+			"CI=true",
+			"DOCKER_CI=true",
+			"CONTAINER=true",
+			"GOCACHE=" + containerGoCache,
+			"GOMODCACHE=" + containerGoModCache,
+		}),
+		dockertest.WithCmd([]string{"sleep", "infinity"}),
+		dockertest.WithHostConfig(func(hc *container.HostConfig) {
+			hc.Privileged = true
+			hc.CapAdd = []string{"NET_ADMIN"}
+		}),
+		dockertest.WithoutReuse(),
+	)
+
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
+	defer cancel()
+
+	result, err := resource.Exec(ctx, []string{"sh", "-c", buildTestScript()})
+	if err != nil {
+		t.Fatalf("run privileged suite in container: %v", err)
+	}
+
+	w := testWriter{t}
+	_, _ = w.Write([]byte(result.StdOut))
+	_, _ = w.Write([]byte(result.StdErr))
+
+	if result.ExitCode != 0 {
+		t.Fatalf("privileged test suite failed in container (exit code %d)", result.ExitCode)
+	}
+}
+
+// findRepoRoot walks up from the test's working directory to the module root.
+func findRepoRoot() (string, error) {
+	dir, err := os.Getwd()
+	if err != nil {
+		return "", err
+	}
+	for {
+		if _, statErr := os.Stat(filepath.Join(dir, "go.mod")); statErr == nil {
+			return dir, nil
+		}
+		parent := filepath.Dir(dir)
+		if parent == dir {
+			return "", fmt.Errorf("go.mod not found above %s", dir)
+		}
+		dir = parent
+	}
+}
+
+// dockerHost returns a DOCKER_HOST override when the default socket is missing.
+// An empty result means the caller should leave DOCKER_HOST untouched (it is
+// already set, or the default unix socket exists). When neither is present
+// (common on macOS Docker Desktop, Colima and OrbStack, which use a per-user
+// socket), it resolves the active docker context's endpoint.
+func dockerHost() string {
+	if os.Getenv("DOCKER_HOST") != "" {
+		return ""
+	}
+	if _, err := os.Stat("/var/run/docker.sock"); err == nil {
+		return ""
+	}
+
+	out, err := exec.Command("docker", "context", "inspect", "-f", "{{.Endpoints.docker.Host}}").Output()
+	if err != nil {
+		return ""
+	}
+	return strings.TrimSpace(string(out))
+}
+
+// hostGoCaches resolves the host GOCACHE/GOMODCACHE so the container reuses the
+// existing build/module cache for speed.
+func hostGoCaches(t *testing.T) (string, string) {
+	t.Helper()
+	return goEnv(t, "GOCACHE"), goEnv(t, "GOMODCACHE")
+}
+
+func goEnv(t *testing.T, key string) string {
+	t.Helper()
+	var out bytes.Buffer
+	cmd := exec.Command("go", "env", key)
+	cmd.Stdout = &out
+	if err := cmd.Run(); err != nil {
+		t.Fatalf("go env %s: %v", key, err)
+	}
+	return strings.TrimSpace(out.String())
+}
+
+// buildTestScript builds the in-container command. PRIV_PKGS overrides the package
+// list (default: the full filtered set); PRIV_RUN adds a -run test-name filter.
+// Both empty reproduces the full privileged suite.
+func buildTestScript() string {
+	pkgs := privilegedTestPackages + " | xargs"
+	if p := os.Getenv("PRIV_PKGS"); p != "" {
+		pkgs = "echo " + p + " | xargs"
+	}
+
+	runFilter := ""
+	if r := os.Getenv("PRIV_RUN"); r != "" {
+		runFilter = "-run '" + r + "' "
+	}
+
+	return fmt.Sprintf(
+		"apk update >/dev/null && apk add --no-cache %s >/dev/null && %s go test -buildvcs=false -tags 'devcert privileged' %s-v -timeout 20m -p 1",
+		alpinePackages, pkgs, runFilter,
+	)
+}
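Reviewer note (illustrative, not part of the patch): `buildTestScript` composes the in-container command from two optional environment overrides. The sketch below restates that composition as a pure function so the resulting command line can be inspected without Docker. `testScript`, `defaultPkgs` and the shortened command (no `apk` step, no `-buildvcs=false`) are simplifications assumed for illustration.

```go
package main

import "fmt"

// defaultPkgs is a simplified stand-in for privilegedTestPackages.
const defaultPkgs = "go list ./..."

// testScript is a hypothetical pure variant of buildTestScript: the two
// overrides arrive as arguments instead of being read from the environment.
func testScript(pkgsOverride, run string) string {
	pkgs := defaultPkgs + " | xargs"
	if pkgsOverride != "" {
		pkgs = "echo " + pkgsOverride + " | xargs"
	}
	runFilter := ""
	if run != "" {
		// The trailing space separates the filter from the following -v flag.
		runFilter = "-run '" + run + "' "
	}
	return fmt.Sprintf("%s go test -tags 'devcert privileged' %s-v -timeout 20m -p 1", pkgs, runFilter)
}

func main() {
	fmt.Println(testScript("", ""))
	fmt.Println(testScript("./client/ssh/...", "TestPrivilegeDropper"))
}
```

Both overrides are spliced into a shell command unquoted or single-quoted, so a value containing a single quote or shell metacharacters would change the command. That is probably acceptable for a developer-only harness, but it is worth keeping in mind when setting `PRIV_PKGS` or `PRIV_RUN`.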
--- a/combined/Dockerfile.multistage
+++ b/combined/Dockerfile.multistage
@@ -5,16 +5,12 @@ WORKDIR /app
 RUN apt-get update && apt-get install -y gcc libc6-dev git && rm -rf /var/lib/apt/lists/*

 COPY go.mod go.sum ./
-RUN --mount=type=cache,target=/go/pkg/mod go mod download
+RUN go mod download

 COPY . .

-# Build with version info from git (matching goreleaser ldflags).
-# BuildKit cache mounts persist the module + build caches across image builds,
-# so a source change recompiles incrementally instead of from scratch.
-RUN --mount=type=cache,target=/go/pkg/mod \
-    --mount=type=cache,target=/root/.cache/go-build \
-    CGO_ENABLED=1 GOOS=linux go build \
+# Build with version info from git (matching goreleaser ldflags)
+RUN CGO_ENABLED=1 GOOS=linux go build \
    -ldflags="-s -w \
    -X github.com/netbirdio/netbird/version.version=$(git describe --tags --always --dirty 2>/dev/null || echo 'dev') \
    -X main.commit=$(git rev-parse --short HEAD 2>/dev/null || echo 'unknown') \
--- a/docs/agent-networks/00-overview.md
+++ b/docs/agent-networks/00-overview.md
@@ -1,109 +0,0 @@
-# Agent Networks — overview
-
-Single-entry point. Feature scope, the module map, and the cross-cutting
-topics worth keeping in mind, with links into every per-module guide.
-
-## TL;DR
-
-Agent Networks introduces an **LLM-aware reverse-proxy middleware system**
-plus **account-level controls** (budget rules, log collection toggles,
-PII redaction). The management server synthesises a per-peer middleware
-chain that the proxy executes on every LLM request; the chain enforces
-quotas, injects identity, redacts PII, parses tokens/cost, and emits
-access-log entries. The dashboard exposes the surface as a single **AI
-Observability** page with four tabs.
-
-- **Backend** lives in this repo, primarily under
-  `management/server/agentnetwork`, `proxy/internal/middleware`, and
-  `proxy/internal/llm`, with wire contracts in `shared/management`.
-- **Dashboard** lives in the dashboard repo under
-  `src/modules/agent-network/` and `src/app/(dashboard)/agent-network/`.
-
-## Reading order
-
-| # | Doc | Why |
-|---|-----|-----|
-| 1 | [01-end-to-end-flows.md](01-end-to-end-flows.md) | Get the three big diagrams in your head first. |
-| 2 | [modules/10-shared-api.md](modules/10-shared-api.md) | Wire contracts — every other module either produces or consumes these. |
-| 3 | [modules/21-management-agentnetwork.md](modules/21-management-agentnetwork.md) | The largest module; everything the proxy executes originates here. |
-| 4 | [modules/30-proxy-middleware-framework.md](modules/30-proxy-middleware-framework.md) | The generic plugin system on the proxy side. |
-| 5 | [modules/31-proxy-middleware-builtin.md](modules/31-proxy-middleware-builtin.md) | The 8 LLM middlewares that ride on the framework. |
-| 6 | Everything else in any order. | |
-
-## Module map
-
-11 modules. Each is described in detail in its own file under
-[`modules/`](modules/).
-
-| # | Module | Risk | BC impact |
-|---|--------|------|-----------|
-| 10 | [shared/api](modules/10-shared-api.md) — proto + OpenAPI | Low | Additive only |
-| 20 | [management/store](modules/20-management-store.md) — SQL persistence | Medium | Auto-migrate (additive) |
-| 21 | [management/agentnetwork](modules/21-management-agentnetwork.md) — domain layer + synthesizer | **High** | Additive |
-| 22 | [management/handlers + wiring](modules/22-management-handlers-wiring.md) — HTTP API + gRPC delivery | Medium | Additive |
-| 30 | [proxy/middleware-framework](modules/30-proxy-middleware-framework.md) — generic plugin system | High | Additive |
-| 31 | [proxy/middleware-builtin](modules/31-proxy-middleware-builtin.md) — 8 LLM middlewares | High | Additive |
-| 32 | [proxy/llm-parsers](modules/32-proxy-llm-parsers.md) — SDK adapters + pricing | Medium | Additive |
-| 33 | [proxy/runtime](modules/33-proxy-runtime.md) — translate + serve + access-log | High | Additive (touches hot path) |
-| 40 | [dashboard](modules/40-dashboard.md) — UI for everything above | Medium | Sidebar reshape |
-| 50 | [path-routed-providers](modules/50-path-routed-providers.md) — Vertex AI + Bedrock | Medium | Additive (new catalog entries) |
-
-The largest and highest-risk module is `management/agentnetwork`: it is
-the single writer of the middleware chain the proxy executes.
-
-## Cross-cutting topics
-
-These are the items most likely to bite production. Each is fully
-documented in the linked module guide.
-
-1. **Capture-pointer semantics** (`*bool` for `capture_prompt` and
-   `capture_completion`): nil = legacy emit, false = suppress, true =
-   emit. nil-vs-false must be handled at every JSON hop. See
-   [21-management-agentnetwork.md](modules/21-management-agentnetwork.md)
-   and [31-proxy-middleware-builtin.md](modules/31-proxy-middleware-builtin.md).
-2. **`ProxyMapping.Private` preservation** on per-proxy live updates.
-   Failure mode: `auth` skips `ValidateTunnelPeer` →
-   `CapturedData.UserGroups` empty → `llm_router` denies. See
-   [33-proxy-runtime.md](modules/33-proxy-runtime.md).
-3. **respInput carrying `UserEmail`/`UserGroups`/`UserGroupNames` onto
-   the response leg** in `reverseproxy.go`. Load-bearing wire that lets
-   `llm_limit_record` ship non-empty `group_ids` on `RecordLLMUsage`. See
-   [33-proxy-runtime.md](modules/33-proxy-runtime.md).
-4. **Min-wins all-must-pass budget rule semantics**. Every matching
-   rule's remaining quota must be > 0 for the request to proceed; one
-   exhausted rule blocks the whole call. Documented in
-   [21-management-agentnetwork.md](modules/21-management-agentnetwork.md)
-   and the `llm_limit_check` middleware in
-   [31-proxy-middleware-builtin.md](modules/31-proxy-middleware-builtin.md).
-5. **body-tap memory bounds**: per-direction 1 MiB cap, shared 256 MiB
-   budget, `LimitReader(r.Body, limit+1)` for truncation detection with
-   `replayReadCloser` fallback so upstream still sees the full body.
-   `cloneInputFor` deep-copies the body up to 16 times per chain — a
-   perf hot-spot. See
-   [30-proxy-middleware-framework.md](modules/30-proxy-middleware-framework.md).
-6. **UpstreamRewrite.AuthHeader bypasses the header denylist**
-   deliberately. The runtime consumer only unpacks it via the
-   trusted upstream-build path. See
-   [30-proxy-middleware-framework.md](modules/30-proxy-middleware-framework.md).
-7. **`disable_access_log` default-false semantics**: the synth target
-   sets it true, all other targets leave it false. See
-   [10-shared-api.md](modules/10-shared-api.md).
-8. **String-typed `decision` / `deny_code`** on
-   `CheckLLMPolicyLimitsResponse` — would benefit from enum pinning
-   before external consumers integrate. See
-   [10-shared-api.md](modules/10-shared-api.md).
-
-## Explicit non-goals
-
-- **Reaper / GC pass over stale synth services** — designed but cut from
-  scope.
-- **URL-sync for tab state on AI Observability** — read path is wired
-  (`?tab=`) but write path isn't. Future work.
- **CI golden-file regen-and-diff for `types.gen.go` /
-  `proxy_service.pb.go`** — would catch codegen drift; not yet in place.
-
-## Where to read the code
-
-Per-module file scopes are listed in each module guide. Behaviour is
-covered by Go tests co-located with each package (and an end-to-end
-chain integration test under `proxy/internal/proxy`).
--- a/docs/agent-networks/01-end-to-end-flows.md
+++ b/docs/agent-networks/01-end-to-end-flows.md
@@ -1,217 +0,0 @@
-# End-to-end flows
-
-Three cross-module mermaid diagrams. Each per-module guide repeats the
-slice that's relevant to its own scope — these are the canonical
-top-down views.
-
- [Flow A — Config → runtime (synth + deliver)](#flow-a--config--runtime-synth--deliver)
- [Flow B — Request lifecycle through the LLM chain](#flow-b--request-lifecycle-through-the-llm-chain)
- [Flow C — Budget rule feedback loop](#flow-c--budget-rule-feedback-loop)
-
---
-
-## Flow A — Config → runtime (synth + deliver)
-
-How an operator's change to a Provider, Policy, Guardrail, Budget Rule,
-or Settings record ends up as live middleware on a peer's proxy.
-
-```mermaid
-sequenceDiagram
-    autonumber
-    actor Op as Operator
-    participant UI as Dashboard
-    participant HTTP as management/handlers
-    participant Mgr as agentnetwork.Manager
-    participant Store as management/store (SQL)
-    participant Ctl as network_map.Controller
-    participant Synth as agentnetwork.SynthesizeServices
-    participant Grpc as management gRPC
-    participant Proxy as netbird-proxy
-    participant Xlate as middleware_translate
-    participant Chain as middleware.Chain
-
-    Op->>UI: edit provider/policy/budget/settings
-    UI->>HTTP: REST PUT/POST /api/agent-network/*
-    HTTP->>Mgr: SaveProvider / SavePolicy / SaveBudgetRule / SaveSettings
-    Mgr->>Store: persist (gorm)
-    Mgr-->>Ctl: account change event (Network-Map dirty)
-    loop per connected peer
-        Ctl->>Synth: SynthesizeServices(ctx, store, accountID)
-        Synth->>Store: load providers, policies, guardrails, budget rules, settings
-        Synth-->>Synth: build per-peer Service list
-        Note over Synth: each Service has a middleware<br/>chain with capture_prompt /<br/>capture_completion / redact_pii<br/>baked from account settings
-        Synth-->>Ctl: []rpservice.Service
-        Ctl->>Grpc: NetworkMap push (services + middleware configs)
-    end
-    Grpc-->>Proxy: NetworkMap stream
-    Proxy->>Xlate: translate proto MiddlewareConfig → runtime Spec
-    Xlate->>Chain: register / replace per-service chain
-    Note over Chain: chain replacement is live<br/>(no proxy restart, in-flight<br/>requests unaffected)
-```
-
-**Notes on the diagram**
-
- The `network_map.Controller` synthesises on every push, not on a
-  timer. A single config change costs O(connected peers × policies ×
-  providers) per push. See [`modules/22-management-handlers-wiring.md`](modules/22-management-handlers-wiring.md).
- `SynthesizeServices` is the single source of truth for the wire
-  format the proxy executes. Anything the proxy does that the
-  synthesiser didn't request is a bug. See
-  [`modules/21-management-agentnetwork.md`](modules/21-management-agentnetwork.md).
- The translate step (step 12) is the only place that knows the
-  middleware-ID strings on the proxy side. It must reject unknown IDs;
-  silently dropping middlewares would create a security gap (e.g.
-  missing `llm_limit_check` ⇒ unbounded spend). See
-  [`modules/33-proxy-runtime.md`](modules/33-proxy-runtime.md).
-
---
-
-## Flow B — Request lifecycle through the LLM chain
-
-What happens when an agent on the client peer sends a chat-completion /
-messages request through the synthesised reverse-proxy.
-
-```mermaid
-sequenceDiagram
-    autonumber
-    actor Agent as Agent (local)
-    participant Px as netbird-proxy
-    participant Auth as auth middleware
-    participant Map as service-mapping
-    participant Req as llm_request_parser
-    participant Rt as llm_router
-    participant Chk as llm_limit_check
-    participant Inj as llm_identity_inject
-    participant Grd as llm_guardrail
-    participant Up as upstream LLM
-    participant Resp as llm_response_parser
-    participant Cost as cost_meter
-    participant Rec as llm_limit_record
-    participant Log as access-log
-    participant MgmtGrpc as management gRPC
-
-    Agent->>Px: POST /v1/chat/completions  (OpenAI / Anthropic)
-    Px->>Auth: identify peer (user, groups)
-    Auth->>Map: resolve service from Host + path
-    Map-->>Req: dispatch chain in slot order
-
-    Req->>Req: parse body → provider, model, prompt, token estimate
-    Note over Req: capture_prompt gates raw_prompt<br/>capture (nil = legacy emit,<br/>false = drop, true = emit)
-    Req->>Rt: pass metadata
-    Rt->>Chk: route to upstream candidate
-
-    Chk->>MgmtGrpc: CheckLLMPolicyLimits(provider, model, est_tokens, groups, user)
-    MgmtGrpc-->>Chk: decision = allow / deny + deny_code
-    alt decision == deny
-        Chk-->>Log: emit access-log with deny_code<br/>(if EnableLogCollection)
-        Chk-->>Agent: 429 (or 403 per deny_code)
-    else decision == allow
-        Chk->>Inj: continue
-        Inj->>Inj: inject NetBird identity headers per provider config
-        Inj->>Grd: continue
-        Grd->>Grd: enforce model allowlist
-        Grd->>Up: forward (over WireGuard)
-        Up-->>Resp: response (JSON or SSE stream)
-        Resp->>Resp: parse usage tokens, completion
-        Note over Resp: capture_completion gates raw<br/>completion capture
-        Resp->>Cost: tokens
-        Cost->>Cost: lookup pricing.yaml + compute cost
-        Cost->>Rec: tokens + cost
-        Rec->>MgmtGrpc: RecordLLMUsage(provider, model, prompt_t, completion_t, cost, groups, user)
-        Rec-->>Log: emit access-log entry<br/>(if EnableLogCollection)
-        Log-->>Agent: 200 + body (streamed if SSE)
-    end
-```
-
-**Notes on the diagram**
-
- The chain runs in synth-defined order. Re-ordering middlewares
-  changes invariants: `llm_limit_check` must run before the upstream
-  forward so a denied request never hits upstream, and `llm_limit_record` must
-  pair with `llm_limit_check` so a successful check is always recorded
-  (or the rate-limit semantics break). See
-  [`modules/31-proxy-middleware-builtin.md`](modules/31-proxy-middleware-builtin.md).
- `llm_guardrail` is also where PII redaction happens
-  (`redact_pii = settings.RedactPii`). Phones, emails, credit cards,
-  PII names — see `redact.go` for the full set. See
-  [`modules/31-proxy-middleware-builtin.md`](modules/31-proxy-middleware-builtin.md).
- SSE streaming requires special handling on the response side; the
-  parser must handle partial chunks without buffering the whole
-  stream. See [`modules/32-proxy-llm-parsers.md`](modules/32-proxy-llm-parsers.md).
- Access-log emission is gated on `settings.EnableLogCollection`. With
-  it OFF, neither the deny nor the allow leg writes an entry — the
-  chain still runs (budget rules are still enforced) but no audit trail
-  is kept. See
-  [`modules/33-proxy-runtime.md`](modules/33-proxy-runtime.md).
-
---
-
-## Flow C — Budget rule feedback loop
-
-How an account's budget rules tighten ceilings on every request and how
-consumption flows back into the dashboard.
-
-```mermaid
-flowchart LR
-    subgraph Operator
-      DashBud[Dashboard Budget Settings tab]
-    end
-    subgraph Mgmt[Management]
-      Save[POST/PUT /api/agent-network/budget-rules]
-      Store[(SQL store)]
-      Synth[SynthesizeServices]
-      Check[CheckLLMPolicyLimits RPC]
-      Rec[RecordLLMUsage RPC]
-      Cons[/api/agent-network/consumption]
-    end
-    subgraph Proxy[Proxy]
-      Chk[llm_limit_check]
-      RecMw[llm_limit_record]
-    end
-    subgraph DashView[Dashboard Budget Dashboard tab]
-      Panel[AgentConsumptionPanel]
-    end
-
-    DashBud -->|create / update rules| Save
-    Save --> Store
-    Store --> Synth
-    Synth -->|push synth-services to peer| Proxy
-
-    Chk -->|per request| Check
-    Check -->|aggregate matching rules<br/>min-wins all-must-pass| Store
-    Check -->|allow / deny| Chk
-
-    RecMw -->|post-response| Rec
-    Rec -->|tokens + cost + groups + user| Store
-
-    Store -->|read counters| Cons
-    Cons --> Panel
-```
-
-**Notes on the diagram**
-
- **min-wins all-must-pass** is the core semantic. A budget rule binds
-  to (group set, user set) with a (window, ceiling). At check time,
-  every rule that matches the caller is evaluated; if ANY rule has
-  zero remaining quota the request is denied. This is the most
-  surprising semantic for operators — see the invariants section of
-  [`modules/21-management-agentnetwork.md`](modules/21-management-agentnetwork.md).
- The proxy never makes its own budget decisions. It always asks
-  management via `CheckLLMPolicyLimits` and reports back via
-  `RecordLLMUsage`. This keeps account-wide accounting in one place
-  and avoids per-proxy drift.
- `RecordLLMUsage` must carry `group_ids` and `user_id` so the
-  decrement hits the right rule(s). The wire that carries those
-  fields onto the response leg is `respInput` in `reverseproxy.go`. See
-  [`modules/33-proxy-runtime.md`](modules/33-proxy-runtime.md).
- The dashboard's Budget Dashboard tab polls
-  `/api/agent-network/consumption` — not gRPC, not WebSocket. Poll
-  interval lives in `AgentConsumptionPanel.tsx`. See
-  [`modules/40-dashboard.md`](modules/40-dashboard.md).
-
---
-
-## Cross-references
-
- Per-module guides: [`modules/`](modules/)
- Overview + module map: [`00-overview.md`](00-overview.md)
--- a/docs/agent-networks/README.md
+++ b/docs/agent-networks/README.md
@@ -1,66 +0,0 @@
-# Agent Networks — architecture documentation
-
-A self-contained set of documents describing the agent-networks feature:
-an LLM-aware reverse-proxy middleware system plus account-level controls
-(budget rules, log collection toggles, PII redaction). The management
-server synthesises a per-peer middleware chain that the proxy executes on
-every LLM request.
-
-## What to read first
-
-1. **[00-overview.md](00-overview.md)** — the single entry point. Feature
-   scope, the module map, and the cross-cutting topics worth keeping in
-   mind, with links to every per-module guide.
-2. **[01-end-to-end-flows.md](01-end-to-end-flows.md)** — three
-   high-level mermaid diagrams: config-to-runtime synth/delivery,
-   per-request lifecycle through the LLM chain, and the budget-rule
-   feedback loop.
-3. **Per-module guides** under `modules/` — one file per package. Each
-   describes the module boundary, the file-level layout, its own flow
-   diagrams, the public contracts, the invariants it relies on, and the
-   areas worth the closest attention.
-
-## Directory layout
-
-```
-docs/agent-networks/
-├── README.md                              # you are here
-├── 00-overview.md                         # feature summary + module map
-├── 01-end-to-end-flows.md                 # cross-module mermaid diagrams
-└── modules/
-    ├── 10-shared-api.md                   # proto + OpenAPI wire contracts
-    ├── 20-management-store.md             # SQL persistence layer
-    ├── 21-management-agentnetwork.md      # domain layer + synthesizer (largest)
-    ├── 22-management-handlers-wiring.md   # HTTP API + gRPC delivery
-    ├── 30-proxy-middleware-framework.md   # generic plugin system
-    ├── 31-proxy-middleware-builtin.md     # 8 LLM-aware middlewares
-    ├── 32-proxy-llm-parsers.md            # OpenAI/Anthropic/Bedrock SDKs + pricing
-    ├── 33-proxy-runtime.md                # translate + serve + access-log
-    ├── 40-dashboard.md                    # UI for everything above (lives in the dashboard repo)
-    └── 50-path-routed-providers.md        # Vertex AI + Bedrock (path-routed, keyfile:: creds, /bedrock prefix)
-```
-
-The `40-dashboard.md` module documents code that lives in the **dashboard
-repo**, not in this repo. The guide is co-located here so backend readers
-see the full picture in one place.
-
-## How the per-module guides are structured
-
-Every `modules/*.md` follows the same template so the docs are easy to
-scan:
-
- **Module boundary** — what this package owns; where it sits in the stack.
- **Files** — path / role.
- **Architecture & flow** — one or more mermaid diagrams.
- **Public contracts** — function signatures, gRPC messages, JSON shapes.
- **Invariants** — semantic guarantees the module relies on or enforces.
- **Things to scrutinize** — split by correctness / security /
-  concurrency / backward-compat / performance / observability.
- **Test coverage** — the test files that lock down behaviour in this
-  module.
- **Known limitations / non-goals** — what is intentionally out of scope.
- **Cross-references** — upstream/downstream module links + the
-  end-to-end flow + the overview.
-
-See [00-overview.md](00-overview.md) for the module map and the
-cross-cutting topics.
--- a/docs/agent-networks/modules/10-shared-api.md
+++ b/docs/agent-networks/modules/10-shared-api.md
@@ -1,105 +0,0 @@
-# shared/api — wire contracts (proto + OpenAPI)
-
-> **Risk level:** Medium — wire-format surface that every other module pins against; backward-compat hinges on field-number discipline more than on logic correctness.
-> **Backward-compat impact:** Additive only (new proto fields use unallocated numbers, new RPCs default to `Unimplemented`, new OpenAPI schemas/paths are append-only; no existing field/RPC/schema removed or renumbered).
-
-## Module boundary
-This module owns the cross-process contract surface between management, proxy, and dashboard. Two artefacts: `shared/management/proto/proxy_service.proto` (management↔proxy gRPC) and `shared/management/http/api/openapi.yml` (dashboard/CLI↔management REST). Both have generated companions checked in (`proxy_service.pb.go`, `proxy_service_grpc.pb.go`, `types.gen.go`) which must travel in lockstep with their sources. `shared/management/status/error.go` is in scope only for the four new typed `NotFound` constructors that the new HTTP handlers return.
-
-Everything downstream — `management/agentnetwork`, `management/server/http/handlers/*`, `proxy/internal/*`, the dashboard SDK — consumes these types verbatim. The concern here is wire stability and codegen reproducibility, not behaviour: behaviour is covered in the management and proxy module guides.
-
-`management.proto` and `signalexchange.proto` are unchanged. `status/error.go` only receives four additive constructors (lines 208-227); no existing error types are reshaped.
-
-## Files
-| Path | Role |
-| ---- | ---- |
-| `shared/management/proto/proxy_service.proto` | Source of truth: 2 new RPCs, 1 new message group (`MiddlewareConfig` + slot enum), additive fields on `PathTargetOptions`, `AccessLog`, `RecordLLMUsageRequest` |
-| `shared/management/proto/proxy_service.pb.go` | Generated (protoc-gen-go) |
-| `shared/management/proto/proxy_service_grpc.pb.go` | Generated; adds `CheckLLMPolicyLimits` + `RecordLLMUsage` client/server stubs and `UnimplementedProxyServiceServer` defaults |
-| `shared/management/http/api/openapi.yml` | 15 new `AgentNetwork*` schemas, 9 new path groups under `/api/agent-network/*` |
-| `shared/management/http/api/types.gen.go` | Generated (oapi-codegen; see codegen note below) |
-| `shared/management/status/error.go` | Four `NotFound` constructors for the new resource kinds (lines 208-227) |
-
-## Architecture & flow
-```mermaid
-sequenceDiagram
-    participant Dash as Dashboard / CLI
-    participant Mgmt as management (HTTP+gRPC)
-    participant Px as proxy
-
-    Note over Dash,Mgmt: REST (OpenAPI / types.gen.go)
-    Dash->>Mgmt: PUT /api/agent-network/providers (AgentNetworkProviderRequest)
-    Dash->>Mgmt: PUT /api/agent-network/settings (AgentNetworkSettingsRequest)
-    Dash->>Mgmt: GET /api/agent-network/consumption -> [AgentNetworkConsumption]
-
-    Note over Mgmt,Px: gRPC ProxyService (proxy_service.proto)
-    Mgmt-->>Px: SyncMappingsResponse{ ProxyMapping.path[*].options.middlewares,<br/>agent_network, disable_access_log, capture_* }
-    Px->>Mgmt: CheckLLMPolicyLimits(account, user, groups, provider, model)
-    Mgmt-->>Px: decision=allow|deny + selected_policy_id + attribution_group_id + window_seconds
-    Px->>Mgmt: RecordLLMUsage(account, user, group_id, group_ids, window_seconds, tokens, cost)
-    Px->>Mgmt: SendAccessLog(AccessLog{ agent_network=true })
-```
-
-The proto changes split into three independent slices: (1) **mapping enrichment** — `PathTargetOptions` grows fields 8-13 so management can ship middleware configs, capture limits, and the agent-network / log-suppression flags down to the proxy without a second RPC; (2) **two new request/response RPCs** (`CheckLLMPolicyLimits`, `RecordLLMUsage`) for per-LLM-request budget arbitration; (3) **observability tag** — `AccessLog.agent_network` so management can route logs to the right surface.
-
-The OpenAPI side is a thin CRUD surface — every resource (`Provider`, `Policy`, `Guardrail`, `BudgetRule`, `Settings`) follows the same `GET-list / POST / GET / PUT / DELETE` pattern, plus a read-only `/consumption` listing and a catalog endpoint. The `*Request` variants drop server-controlled fields (id, timestamps). `AgentNetworkBudgetRule` deliberately reuses `AgentNetworkPolicyLimits` to keep wire-shape parity with policies.
-
-## Public contracts added
- gRPC RPCs (`proxy_service.proto:52-57`): `CheckLLMPolicyLimits(CheckLLMPolicyLimitsRequest) → CheckLLMPolicyLimitsResponse`, `RecordLLMUsage(RecordLLMUsageRequest) → RecordLLMUsageResponse`. Both unary; default `UnimplementedProxyServiceServer` returns `codes.Unimplemented` (`proxy_service_grpc.pb.go:283-289`).
- New messages (`proxy_service.proto:145-175,448-502`): `MiddlewareConfig`, `MiddlewareSlot` enum, `CheckLLMPolicyLimitsRequest`/`Response`, `RecordLLMUsageRequest`/`Response`.
- New `PathTargetOptions` fields 8-13 (`proxy_service.proto:124-140`): `capture_max_request_bytes`, `capture_max_response_bytes`, `capture_content_types`, `middlewares`, `agent_network`, `disable_access_log`. All default-false / zero; pre-existing fields 1-7 byte-for-byte unchanged.
- `AccessLog.agent_network = 18` (`proxy_service.proto:258-261`).
- `RecordLLMUsageRequest.group_ids = 8` (`proxy_service.proto:496-498`) — so the record path can fan out to every applicable budget rule's window without a re-lookup.
- 15 new OpenAPI component schemas (`openapi.yml:5072-5829`): `AgentNetworkProvider[Request|Model]`, `AgentNetworkCatalog{Model,Provider,IdentityInjection,HeaderPairInjection,JSONMetadataInjection,ExtraHeader}`, `AgentNetworkPolicy[Request|TokenLimit|BudgetLimit|Limits]`, `AgentNetworkGuardrail[Checks|Request]`, `AgentNetworkConsumption`, `AgentNetworkSettings[Request]`, `AgentNetworkBudgetRule[Request]`.
- 9 new path groups (`openapi.yml:12797-13460`): `/api/agent-network/{consumption,settings,budget-rules,budget-rules/{ruleId},catalog/providers,providers,providers/{providerId},policies,policies/{policyId},guardrails,guardrails/{guardrailId}}`.
- Four typed NotFound errors (`shared/management/status/error.go:208-227`).
-
-## Invariants
- **Field-number monotonicity.** Every new proto field uses a previously-unallocated number in its message: `PathTargetOptions` 8-13 (was 1-7), `AccessLog` 18 (was 1-17), `RecordLLMUsageRequest` 8. Field numbers are scoped per message, so the pre-existing `SendStatusUpdateRequest.inbound_listener = 50` (which reserves 50+ for observability extensions there) cannot conflict with 8 on `RecordLLMUsageRequest`.
- **Old proxies stay compatible.** Old management never sends `disable_access_log`/`middlewares`/`agent_network` (zero value → existing behaviour); old proxies that don't decode these fields just drop them silently (proto3 unknown-field semantics) — log emission stays on. No pre-existing field number changed: the proto change is insertions only.
- **Old management stays compatible.** The two new RPCs are registered on the same `management.ProxyService` descriptor; a new proxy calling them against an old management server gets `codes.Unimplemented` from the unimplemented embed (`proxy_service_grpc.pb.go:283-289`), which is the same fallback pattern `SyncMappings` already documents (`proxy_service.proto:20-21`).
- **OpenAPI shapes are append-only.** New schemas are placed at the end of `components.schemas` (line 5072+); new paths at the end of `paths` (line 12797+). No existing schema's `required` list, enum, or property type was changed.
- **`*Request` vs response asymmetry.** Read shapes (`AgentNetworkProvider`, `AgentNetworkPolicy`, `AgentNetworkGuardrail`, `AgentNetworkSettings`, `AgentNetworkBudgetRule`) require `created_at`/`updated_at`; the matching `*Request` shapes do not — server fills them. `AgentNetworkProviderRequest.api_key` is write-only (`openapi.yml:5158-5161` "never returned in responses"); reviewers should confirm the response schema (5072-5138) actually omits `api_key`.
-
-## Things to scrutinize
-### Correctness
- `RecordLLMUsageRequest` carries both `group_id` (singular, the attribution group — field 3) and `group_ids` (plural, full membership — field 8). `b22d5a181` adds field 8 to drive account-budget fan-out; double-check that consumers can't accidentally key counters on the wrong one. Field comments at `proxy_service.proto:489-491` and `496-498` distinguish them but it's the kind of subtle thing a follow-up commit might collapse.
- `PathTargetOptions.disable_access_log` is the only field whose default-false meaning **changes semantics** on the proxy side: false → log (status quo), true → suppress. Synthesizer sets `DisableAccessLog = !settings.EnableLogCollection`, so a missing/default settings row yields `EnableLogCollection=false → DisableAccessLog=true → suppressed`. Worth confirming downstream (`agentnetwork.synthesizer`) that operator-defined private services never inherit this flag — the proto field default protects them, but only if synth code is explicit.
- `CheckLLMPolicyLimitsResponse.decision` is a free-form `string` (`proxy_service.proto:471`) rather than an enum. Only documented values are "allow" / "deny". An enum would prevent typo drift; consider before this RPC ships to external consumers.
- `deny_code` (`proxy_service.proto:478-481`) is documented as "a stable label" but is also a free string. Pin the allowed set somewhere observable to the proxy.
-
-### Security
- `AgentNetworkProvider.api_key` MUST be write-only. Schema split (request has it at line 5158; response omits it) looks correct, but a regression here leaks the upstream provider credential to every dashboard reader. Check that the handler explicitly zeros it on the response path.
- `extra_values` / `identity_header_*` headers on `AgentNetworkProvider` get stamped onto upstream requests. Description at `openapi.yml:5099` says "values not declared by the catalog are ignored at synth time" — a contract this module documents but the synthesizer must enforce. Confirm the synth module honours it.
- Cluster + subdomain on `AgentNetworkSettings` are documented immutable (`openapi.yml:5686-5694`) and the `AgentNetworkSettingsRequest` (lines 5733-5752) doesn't accept them. Verify the `PUT /api/agent-network/settings` handler can't be tricked by extra JSON keys (`additionalProperties: false` is not declared on the schema, and OpenAPI defaults to permissive when it is absent).
-
-### Backward compatibility
- The proto change is field-number additive: every previously numbered field keeps the same name + type, and the change is insertions only (no deletions in `proxy_service.proto`), so this holds at the source-text level.
- `proxy_service_grpc.pb.go` adds two RPC handlers and registers them in `ProxyService_ServiceDesc.Methods` (lines 543-552). The existing entries are unchanged and order-preserving — gRPC method dispatch is name-keyed, so order doesn't matter, but reviewing the diff (no method renamed/dropped) is still worth a glance.
- OpenAPI 3.0 doesn't have a built-in deprecation flow for paths; if any client tooling iterates `paths.*`, the additive routes shouldn't break it, but generated SDKs (especially the dashboard's) need a regen to gain access to `AgentNetwork*`.
-
-### Codegen pinning
- `generate.sh` (`shared/management/http/api/generate.sh:14`) installs `oapi-codegen@latest` rather than a pinned version. **This is a reproducibility gap** — re-running the script later may produce a different `types.gen.go`. Either pin the version in `generate.sh` (e.g. `@v2.7.0`) or document the pin in a `tools.go`.
- proto codegen has the protoc / protoc-gen-go version stamped in the generated file header (`proxy_service.pb.go:3-4`).
- Regenerate locally and confirm zero diff against the committed `types.gen.go` / `proxy_service.pb.go`.
-
-## Test coverage
-| Test file | Locks down |
-| --------- | ---------- |
-| None in this scope | The proto and OpenAPI sources are tested transitively by the handler tests (`shared/management/http/handlers/agentnetwork/...`) and by the synthesizer/manager tests (`management/server/agentnetwork/...`). No round-trip serialisation test exists in the `proto/` or `api/` packages themselves. |
-| `shared/management/proto/*_test.go` | (absent) |
-| `shared/management/http/api/*_test.go` | (absent) |
-
-Acceptable for codegen artefacts, but a single golden-file test that re-runs `oapi-codegen` and `protoc` in CI and diffs against the checked-in files would close the reproducibility gap noted above.
-
-## Known limitations / explicit non-goals
- **No deprecation surface.** Old fields/RPCs are kept silently; there is no `[deprecated = true]` annotation on anything. Acceptable here because nothing is being removed.
- **No proto-side validation.** Numeric ranges (e.g. `window_seconds >= 60`, `cost_usd >= 0`, capture-byte clamps) are enforced in the OpenAPI schema via `minimum:` and inside Go code by the proxy/management, but `proto3` itself can't express them; downstream is expected to validate every message.
- **`MiddlewareConfig.config_json` is `bytes`** (`proxy_service.proto:163`) — opaque to the proto layer. Schema validity is the middleware factory's problem. This is a deliberate tradeoff (per the comment at 161-162) but worth flagging: a corrupted/malicious config_json can only fail at proxy apply time, not at the wire-decode step.
- **No catalog endpoint schema for the catalog itself**: the catalog data is served by `GET /api/agent-network/catalog/providers`, which returns `[AgentNetworkCatalogProvider]` (`openapi.yml:13024`), but the catalog source of truth lives in `management/server/agentnetwork/catalog`, not here.
- The reaper / GC design was cut from scope; no reaper-related types appear here.
-
-## Cross-references
- Downstream: [management/store](20-management-store.md), [management/agentnetwork](21-management-agentnetwork.md), [management/handlers + wiring](22-management-handlers-wiring.md), [proxy/runtime](33-proxy-runtime.md)
- End-to-end flow: [../01-end-to-end-flows.md](../01-end-to-end-flows.md)
- Top-level: [../00-overview.md](../00-overview.md)
--- a/docs/agent-networks/modules/20-management-store.md
+++ b/docs/agent-networks/modules/20-management-store.md
@@ -1,112 +0,0 @@
-# management/store — persistence for agent-network entities
-
-> **Risk level:** Medium — six brand-new tables behind AutoMigrate, one upsert-counter table that runs on the request hot path, and one column carrying an encrypted secret.
-> **Backward-compat impact:** Additive (six new tables created by AutoMigrate; the `Store` interface gains 23 methods, but no existing column/index is touched).
-
-## Module boundary
-
-This module is the persistence layer for the Agent Network feature. Everything the management server stores about LLM proxying — providers, policies, guardrails, the per-account settings row, a usage-counter table written on every proxied LLM request, and the account-budget rules — flows through the methods added to `store.Store`. The module owns six tables, six entity types from `management/server/agentnetwork/types`, and a single hot-path upsert (`IncrementAgentNetworkConsumption`) consumed by the proxy fleet.
-
-Out of scope here: the catalog of provider definitions (compiled-in, no DB), the synthesizer/manager built on top of these CRUDs (covered in [21-management-agentnetwork.md](21-management-agentnetwork.md)), and the HTTP handlers that translate API requests into Save/Delete calls.
-
-## Files
-
-| Path | Role |
-| ---- | ---- |
-| `management/server/store/sql_store_agentnetwork.go` | gorm implementations of all 23 store methods |
-| `management/server/store/sql_store_agentnetwork_budgetrule_test.go` | round-trip + account-scoping coverage against a real sqlite store |
-| `management/server/store/sql_store.go` | one import, six entities appended to the `AutoMigrate` slice (sql_store.go:40, sql_store.go:141-142) |
-| `management/server/store/store.go` | 23 methods added to the `Store` interface (store.go:328-354) |
-| `management/server/store/store_mock_agentnetwork.go` | mockgen output for the new interface surface |
-
-## Tables added / migrations
-
-All six tables are created by `db.AutoMigrate` invoked from `NewSqlStore` at sql_store.go:133-143. There is no hand-rolled SQL migration script — the schema is whatever GORM derives from the struct tags.
-
- `agent_network_providers` — `Provider.TableName()` at provider.go:76. PK `id`, index on `account_id`, named index `idx_agent_network_provider` on `provider_id`. Carries an at-rest-encrypted `api_key` and ed25519 `session_private_key` (provider.go:35,56). `extra_values` and `models` are JSON blobs (`serializer:json`).
- `agent_network_policies` — `Policy.TableName()` at policy.go:70. PK `id`, index on `account_id`. JSON columns: `source_groups`, `destination_provider_ids`, `guardrail_ids`, `limits`.
- `agent_network_guardrails` — `Guardrail.TableName()` at guardrail.go:41. PK `id`, index on `account_id`. JSON `checks`.
- `agent_network_settings` — `Settings.TableName()` at settings.go:33. PK `account_id` (one row per account), named index `idx_agent_network_settings_cluster_subdomain` on `subdomain` only — the index name implies a composite, but only one column is tagged.
- `agent_network_consumption` — `Consumption.TableName()` at consumption.go:46. Composite PK across `(account_id, dim_kind, dim_id, window_seconds, window_start_utc)` — the same tuple the upsert keys on.
- `agent_network_budget_rules` — `AccountBudgetRule.TableName()` at budgetrule.go:35. PK `id`, index on `account_id`. JSON `target_groups`, `target_users`, `limits`.
-
-## CRUD surface added
-
-Provider, Policy, Guardrail, BudgetRule follow the same pattern: `Get<Kind>ByID`, `GetAccount<Kind>` (list), `Save<Kind>` (upsert), `Delete<Kind>`, with account-scoping enforced by the existing `accountAndIDQueryCondition` / `accountIDCondition` constants (sql_store.go:59-62). Provider additionally exposes `GetAllAgentNetworkProviders` (cross-account, used by the synthesizer). Settings exposes `Get`/`GetByCluster`/`Save` (no delete — one row per account, created on first save). Consumption exposes the upsert `Increment`, a point `Get`, and a cross-window `List`.
-
-## Architecture & flow
-
-```mermaid
-flowchart LR
-    handlers["HTTP handlers<br/>(management/server/agentnetwork)"] -->|Save/Delete| iface["Store interface<br/>store.go:328-354"]
-    manager["agentnetwork.Manager"] -->|Get*| iface
-    synth["synthesizer<br/>(global)"] -->|GetAllAgentNetworkProviders| iface
-    proxy["proxy fleet<br/>(hot path)"] -->|IncrementAgentNetworkConsumption| iface
-    iface --> sql["SqlStore methods<br/>sql_store_agentnetwork.go"]
-    iface -.gomock.-> mock["MockStore<br/>store_mock_agentnetwork.go"]
-    sql --> gorm["gorm.DB"]
-    gorm --> tables[("6 tables<br/>agent_network_*")]
-    sql --> enc["crypt.FieldEncrypt<br/>(provider only)"]
-```
-
-Reads decrypt provider secrets in-place; writes do `provider.Copy().EncryptSensitiveData(...)` before `db.Save` so the caller's in-memory object keeps the plaintext `api_key` (sql_store_agentnetwork.go:88-102). Every list/get takes a `LockingStrength` and applies `clause.Locking{Strength: ...}` when non-`None` — matching the rest of the store. The upsert path uses `clause.OnConflict` with `gorm.Expr` server-side increments so concurrent proxy nodes converge without read-modify-write races (sql_store_agentnetwork.go:321-335).
-
-## Invariants enforced at the store layer
-
- **Account scoping.** Every entity-by-ID method keys on `account_id = ? and id = ?`; no cross-tenant leak path through the API is reachable as long as callers always pass the auth'd `accountID` (sql_store_agentnetwork.go:70,141,201,429).
- **NotFound mapping.** `gorm.ErrRecordNotFound` is translated to typed `status.NewAgentNetwork*NotFoundError`; `Delete*` returns NotFound when `RowsAffected == 0` (sql_store_agentnetwork.go:111-113,171-173,231-233,461-463).
- **Provider secret encryption at rest.** `SaveAgentNetworkProvider` always encrypts before persist; `Get*` always decrypts after read. The plaintext `api_key` never reaches the DB through this layer (sql_store_agentnetwork.go:31,54,80,90).
- **Consumption monotonicity.** The upsert only ever issues `col = col + ?` for the three counter columns — no decrement path exists (sql_store_agentnetwork.go:330-332).
- **Window alignment is the caller's responsibility.** The store stamps `WindowStartUTC` as-passed; alignment to epoch happens in `types.WindowStart` at consumption.go:51-58.
- **Settings has no Delete.** Intentional — one row per account, created on first save; the row sticks around for the account lifetime.
-
-## Things to scrutinize
-
-### Correctness
- `SaveAgentNetworkProvider` saves the copy (sql_store_agentnetwork.go:95). The caller's in-memory pointer therefore keeps plaintext `api_key` and any `CreatedAt`/`UpdatedAt` gorm autofills land on the copy, not the original. Callers that need synced timestamps must re-fetch.
- `IncrementAgentNetworkConsumption`'s `Create` provides initial counter values (`TokensInput: tokensIn`, etc.) in the row, and on conflict the assignments add the same deltas to the existing values. The insert-vs-update arithmetic is consistent. Cross-check that no engine in use (sqlite, postgres, mysql) silently rejects the `OnConflict` clause — GORM emits engine-specific SQL but `ON DUPLICATE KEY UPDATE` (mysql) vs `ON CONFLICT (...)` (sqlite/postgres) need their unique constraint to match the composite PK on `agent_network_consumption`; it does, by construction.
- `IncrementAgentNetworkConsumption` writes `updated_at: time.Now().UTC()` literally inside the assignments map (sql_store_agentnetwork.go:333) — fine, but it's a Go-side timestamp captured at call time, not a DB-side `now()`. Acceptable for an audit field.
- `GetAgentNetworkConsumption` returns a zero-valued non-nil row on `ErrRecordNotFound` (sql_store_agentnetwork.go:364-371). Document or rename: a typed sentinel error would be more orthodox, and as written callers must know that a missing row is not reported as an error.
-
-### Concurrency / transactions
- Hot-path `IncrementAgentNetworkConsumption` runs outside any explicit transaction; concurrency safety relies entirely on the DB serialising the `ON CONFLICT` upsert against the composite PK. This is correct for postgres and mysql; for sqlite it serialises behind the single writer.
- `SaveAgentNetworkSettings` is a blind upsert with no version/etag, so concurrent writes from two operators resolve as last-write-wins on the collection-toggle flags (settings.go:23-25). Acceptable for admin-curated state but worth flagging.
- `Save*Provider` uses `db.Save` on a struct with a PK already set — GORM emits UPDATE or INSERT based on row existence. No upsert clause is attached, so a race between two creates with the same generated `xid` (vanishingly unlikely) would surface as a PK violation.
-
-### Migration safety
- All six tables ride `AutoMigrate` (sql_store.go:141-142). AutoMigrate is additive: new columns get added, but it never drops columns nor narrows types. Three `bool` columns on `agent_network_settings` (`EnableLogCollection`, `EnablePromptCollection`, `RedactPii`) default to false at the GORM/DDL layer for existing rows; the test at sql_store_agentnetwork_budgetrule_test.go:83-112 locks that down on a fresh sqlite. Verify postgres/mysql produce the same default.
- The named index `idx_agent_network_settings_cluster_subdomain` on settings.go:15 is declared on only `subdomain`. Either the cluster column also needs `gorm:"index:idx_agent_network_settings_cluster_subdomain"` to make it composite, or the name is misleading.
- The named index `idx_agent_network_provider` on `Provider.ProviderID` (provider.go:30) is *not* unique and not scoped to account — two providers in the same account with the same `provider_id` are permitted at the DB layer; uniqueness, if any, must live above the store.
-
-### Backward compatibility
- Net additive. No removed methods, no renamed columns, no schema change to existing tables. Existing deployments running a prior binary continue to work; the first boot of the new binary creates the six tables.
- The `Store` interface grows by 23 methods (store.go:330-354); any non-mock external implementer of `store.Store` will fail to compile. The repo only has `SqlStore` + `MockStore`, both updated.
-
-### Performance (indexes, N+1)
- All by-account list queries hit the `idx_account_id` per-table index. No N+1: list methods return the full slice in one query.
- `GetAgentNetworkSettingsByCluster` (sql_store_agentnetwork.go:263-277) does a full table scan on `cluster` because the column has no index. Tolerable for the bootstrap label generator (one-shot at provisioning) but worth noting if the call moves onto a hot path.
- `ListAgentNetworkConsumption` returns every row ever recorded for the account (sql_store_agentnetwork.go:382-400) — unbounded growth, no `LIMIT`, no time filter. With one row per (dim, window) per request burst, this table grows fastest of the six; a retention job + a paginated list method are obvious follow-ups.
-
-## Test coverage
-
-| Test file | Locks down |
-| --------- | ---------- |
-| `sql_store_agentnetwork_budgetrule_test.go::TestAgentNetworkBudgetRule_RealStore_RoundTrip` | full save → reload of `AccountBudgetRule` including the JSON-serialised `PolicyLimits`, target slices, double-delete returns NotFound (lines 18-59) |
-| `sql_store_agentnetwork_budgetrule_test.go::TestAgentNetworkBudgetRule_RealStore_ScopedByAccount` | cross-account isolation for budget rules (lines 63-78) |
-| `sql_store_agentnetwork_budgetrule_test.go::TestAgentNetworkSettings_RealStore_CollectionTogglesRoundTrip` | collection toggles default off, survive save/reload at the set values (lines 83-112) |
-
-Gap: there is no store-level test for providers (encryption round-trip), policies, guardrails, or `IncrementAgentNetworkConsumption` (concurrent upsert, window-key uniqueness). The consumption upsert is the most performance-sensitive method in this module and the only one without a real-sqlite test.
-
-## Known limitations / explicit non-goals
-
- No retention / GC for `agent_network_consumption`.
- No `Delete` for `Settings` (one row per account, cleared with the account).
- No DB-engine-specific tuning — the same struct tags drive sqlite, mysql, postgres.
- Provider `extra_values` and `models` are JSON blobs; querying inside them is not supported by design.
- `GetAgentNetworkConsumption` "not-found = zero row" contract is convenient but unconventional.
-
-## Cross-references
-
- Upstream: [shared/api](10-shared-api.md), [management/agentnetwork](21-management-agentnetwork.md)
- End-to-end flow: [../01-end-to-end-flows.md](../01-end-to-end-flows.md)
- Top-level: [../00-overview.md](../00-overview.md)
--- a/docs/agent-networks/modules/21-management-agentnetwork.md
+++ b/docs/agent-networks/modules/21-management-agentnetwork.md
@@ -1,225 +0,0 @@
-# management/agentnetwork — domain layer + synth pipeline
-
-> **Risk level:** High — central business logic + budget enforcement + the source of every middleware-chain change the proxy executes.
-> **Backward-compat impact:** Additive within the agent-network surface; one **behavioural difference for opted-out accounts** in parser capture (the capture flag is stamped explicitly false instead of being absent — see capture-pointer semantics below). Non-agent-network proxy services are untouched (the synth chain only ships on `agent-net-svc-*` targets).
-
-## Module boundary
-
-`management/server/agentnetwork` owns every agent-network entity (providers, policies, guardrails, account budget rules, per-account settings, consumption rows) and **translates them into the in-memory `*rpservice.Service` that the reverse-proxy controller turns into `proto.ProxyMapping`s and pushes to clusters**. It is the *only* writer of the agent-network middleware chain.
-
-Inside the package: `manager.go` is the CRUD + permissions-gated facade; `synthesizer.go` walks settings + providers + policies + guardrails and emits the per-account service plus every middleware's JSON config; `policyselect.go` runs per-request attribution (min-wins account ceiling, then "drain bigger pool first"); `reconcile.go` diffs successive synth outputs and emits precise Create/Update/Delete proxy-mapping updates plus a peer-map refresh. `labelgen/` mints DNS-safe subdomain labels; `catalog/` is the static provider catalogue; `types/` carries gorm entity structs. The `_realstack_test.go` files in the parent `management/server/` directory exercise the manager + network-map controller end-to-end with no mocks.
-
-## Files
-
-| Path | Role |
-| ---- | ---- |
-| `agentnetwork/manager.go` | Manager interface + CRUD + permission gates + bootstrap-settings + reconcile trigger |
-| `agentnetwork/synthesizer.go` | Settings/policy → wire-format synthesis; sole writer of the proxy middleware chain |
-| `agentnetwork/policyselect.go` | Per-request policy attribution + account-budget ceiling (min-wins) |
-| `agentnetwork/reconcile.go` | Per-account synth diff vs in-memory cache → Create/Update/Delete |
-| `agentnetwork/catalog/catalog.go` | Static provider catalogue (auth headers, identity-injection shapes) |
-| `agentnetwork/labelgen/{labelgen,words}.go` | DNS-safe subdomain picker + curated wordlist |
-| `agentnetwork/types/provider.go` | Provider entity + APIKey + Models + ExtraValues + SessionKeys |
-| `agentnetwork/types/policy.go` | Policy entity + `PolicyLimits` (token + budget) |
-| `agentnetwork/types/guardrail.go` | Guardrail entity (`ModelAllowlist`, `PromptCapture`) |
-| `agentnetwork/types/budgetrule.go` | `AccountBudgetRule` (reuses `PolicyLimits`) |
-| `agentnetwork/types/settings.go` | Per-account `Settings` (Cluster, Subdomain, 3 toggles) |
-| `agentnetwork/types/consumption.go` | `Consumption` row + `WindowStart` aligner |
-| `agentnetwork/{synthesizer,policyselect,reconcile,wire_shape}_*test.go` | See test coverage table |
-| `agentnetwork/types/consumption_test.go` | `WindowStart` alignment proofs |
-| `agentnetwork/labelgen/labelgen_test.go` | Deterministic picks + exhaustion + fallback |
-| `management/server/agentnetwork_realstack_test.go` | No-mock provider CRUD → network-map fan-out |
-| `management/server/agentnetwork_budgetrule_realstack_test.go` | No-mock budget-rule CRUD + settings preserve-immutable |
-
-## Architecture & flow
-
-### Synthesis (settings/policy → wire format)
-
-```mermaid
-flowchart TD
-    A[Mutation: provider/policy/guardrail/settings] --> B[managerImpl.reconcile accountID]
-    B --> C{proxyController nil?}
-    C -- yes --> D[accountManager.UpdateAccountPeers only]
-    C -- no --> E[SynthesizeServices]
-    E --> F[loadSettings — NotFound returns ok=false, no synth]
-    F --> G[filterEnabledProviders sorted by CreatedAt]
-    G --> H[filterEnabledPolicies]
-    H --> I[backfillProviderSessionKeys if missing]
-    I --> J[indexProviderGroups: providerID -> sorted source groups]
-    J --> K[buildRouterConfigJSON drops orphan providers]
-    J --> L[buildIdentityInjectConfigJSON per catalog entry]
-    H --> M[mergeGuardrails: union allowlist, OR redact]
-    M --> N[applyAccountCollectionControls account toggle = SOLE capture control]
-    N --> O[marshalGuardrailConfig]
-    K --> P[buildMiddlewareChain 8 middleware entries]
-    L --> P
-    O --> P
-    P --> Q[buildAccountService: AccessGroups=union source groups, noop.invalid target]
-    Q --> R[reconcile.diffMappings vs cache]
-    R --> S[SendServiceUpdateToCluster CREATE/MODIFY/REMOVE]
-    R --> T[accountManager.UpdateAccountPeers — fans synth ACLs into network map]
-```
-
-### Budget rule resolution (min-wins, group+user bound)
-
-```mermaid
-flowchart TD
-    A[SelectPolicyForRequest in] --> B[checkAccountBudget — runs FIRST, independent of policies]
-    B --> C[GetAccountAgentNetworkBudgetRules]
-    C --> D{for each enabled rule}
-    D --> E{budgetRuleApplies?}
-    E -- no --> D
-    E -- yes --> F[attrGroup = lowestIntersect TargetGroups, in.GroupIDs]
-    F --> G{Token cap enabled?}
-    G -- yes --> H[evalTokenCap user dim + group dim]
-    H --> I{exhausted?}
-    I -- yes --> J[DENY: llm_account.token_cap_exceeded - STOP]
-    I -- no --> K{Budget cap enabled?}
-    G -- no --> K
-    K -- yes --> L[evalBudgetCap user dim + group dim]
-    L --> M{exhausted?}
-    M -- yes --> N[DENY: llm_account.budget_cap_exceeded - STOP]
-    M -- no --> D
-    K -- no --> D
-    D --> O[All rules passed -> fall through to per-policy selection]
-```
-
-Key invariant: **rules are checked sequentially and ANY exhausted rule denies (all-must-pass / min-wins).** Untargeted rules (`len(TargetGroups)==0 && len(TargetUsers)==0`) apply to every caller (`policyselect.go:393`).
-
-### Policy selection (per-peer, per-request)
-
-```mermaid
-flowchart TD
-    A[Account-budget gate passed] --> B[GetAccountAgentNetworkPolicies]
-    B --> C[filterApplicablePolicies enabled + provider match + group intersect]
-    C --> D{candidates empty?}
-    D -- yes --> E[Allow, empty SelectedPolicyID]
-    D -- no --> F[scoreCandidates -> scoreOne per policy]
-    F --> G[scoreOne: attrGroup + window]
-    G --> H{any cap exhausted?}
-    H -- yes --> I[Drop policy; record last deny code]
-    H -- no --> K[Keep as live candidate]
-    F --> L{live candidates exist?}
-    L -- no --> M[Deny with last exhaustion code]
-    L -- yes --> N[Sort: uncapped wins -> larger group token -> group budget -> user token -> user budget -> oldest CreatedAt]
-    N --> O[winner = scored 0]
-    O --> P[Allow + SelectedPolicyID + AttributionGroupID + WindowSeconds]
-```
-
-End-to-end: a mutation calls `managerImpl.reconcile(ctx, accountID)` (`manager.go:205,239,...`). Reconcile defers an `accountManager.UpdateAccountPeers` so the network-map controller re-runs and `injectAllProxyPolicies` picks up the new access groups; with a `proxyController` wired, it re-synthesizes the service, diffs against `reconcileCache[accountID]` (guarded by `reconcileMu`), and emits proto mappings to the cluster derived from the mapping's domain (`reconcile.go:120`). Synthesis is stateless and idempotent. Sole persistent side effect: `backfillProviderSessionKeys` (`synthesizer.go:249`) mints ed25519 keys on legacy provider rows and writes them back.
-
-At request time the path is independent: the proxy calls `SelectPolicyForRequest` (`policyselect.go:56`); account-budget ceiling first, then per-policy scoring. Token + budget caps share `evalTokenCap` / `evalBudgetCap` — same primitive for account rules and policy limits, `label` differentiates the deny reason. After a served request, `RecordAccountBudgetUsage` (`policyselect.go:415`) fans deltas to every applicable rule's distinct `(dim_kind, dim_id, window)` tuple, deduplicating to prevent double-count when two rules share target+window.
-
-## Public contracts
-
- **Manager interface** (`manager.go:48-80`): CRUD for `Providers/Policies/Guardrails/BudgetRules`; `GetSettings/UpdateSettings` (cluster + subdomain immutable, only the three toggles mutate); `ListConsumption/RecordConsumption(account, kind, dimID, windowSec, in, out, USD)`; `RecordAccountBudgetUsage(account, user, groups, in, out, USD)`; `SelectPolicyForRequest(ctx, PolicySelectionInput) → *PolicySelectionResult{Allow, SelectedPolicyID, AttributionGroupID, WindowSeconds, DenyCode, DenyReason}`.
- **`PolicySelectionInput`** (`manager.go:85-90`): `{AccountID, UserID, GroupIDs, ProviderID}` — populated by the proxy from CapturedData + `llm_router` resolution.
- **Synthesized middleware chain** (`synthesizer.go:576-657`), order load-bearing — response slot runs reverse-of-slice:
-
-  | Slot | Idx | ID | ConfigJSON shape | CanMutate |
-  | --- | --- | --- | --- | --- |
-  | on_request | 0 | `llm_request_parser` | `{"capture_prompt": <bool>, "redact_pii"?: true}` | – |
-  | on_request | 1 | `llm_router` | `{"providers":[{id, models[], upstream_*, auth_header_*, allowed_group_ids[]}]}` | **true** |
-  | on_request | 2 | `llm_limit_check` | `{}` | – |
-  | on_request | 3 | `llm_identity_inject` | `{"providers":[{provider_id, header_pair?, json_metadata?, extra_headers?}]}` | **true** |
-  | on_request | 4 | `llm_guardrail` | `{"model_allowlist"?, "prompt_capture":{enabled,redact_pii}}` | – |
-  | on_response | 5 | `llm_limit_record` | `{}` (runs LAST at runtime) | – |
-  | on_response | 6 | `cost_meter` | `{}` | – |
-  | on_response | 7 | `llm_response_parser` | `{"capture_completion": <bool>, "redact_pii"?: true}` | – |
- **Synthesized service shape** (`synthesizer.go:739`): `Mode=HTTP`, `Private=true`, `Domain=<subdomain>.<cluster>`, `AccessGroups=unionSourceGroups(enabledPolicies)`, one `TargetTypeCluster` target with `Host=noop.invalid:443` (router rewrites per request), `Options.{DirectUpstream,AgentNetwork}=true`, `DisableAccessLog=!settings.EnableLogCollection`, `CaptureMax{Req,Resp}Bytes=1<<20`, `CaptureContentTypes=["application/json","text/event-stream"]`.
-
-## Invariants
-
- **Min-wins / all-must-pass for account budget rules** (`checkAccountBudget`, `policyselect.go:353`): every applicable enabled rule is checked; first exhausted cap denies. Untargeted rules bind every caller.
- **Account toggle is the SOLE control for capture enablement.** `applyAccountCollectionControls` (`synthesizer.go:701`) sets `merged.PromptCapture.Enabled = settings.EnablePromptCollection` *unconditionally*.
- **Capture-pointer semantics on parser configs** — see "Things to scrutinize" below.
- **`EnableLogCollection` ↔ `DisableAccessLog` is the only access-log toggle** (`synthesizer.go:770`). Default off ⇒ access log suppressed.
- **`RedactPii` flows verbatim to BOTH parsers** (`synthesizer.go:584-585`) and is OR'd into the merged guardrail (`synthesizer.go:706`).
- **Cluster and Subdomain are immutable on Settings.** `UpdateSettings` reloads existing row and overlays only the three toggles (`manager.go:558-561`).
- **Orphan providers (no enabled policy authorises them) NEVER reach the router** (`synthesizer.go:351-357`); skipped from `identity_inject` for symmetry.
- **Provider creation refuses empty `api_key`** (`manager.go:175`); **deletion refuses while any policy still references it** (`manager.go:265-273`).
- **Session keypair stability across provider edits** (`manager.go:226-228`) — server-managed, copied through every `UpdateProvider`, never API-surfaced.
-
-## Things to scrutinize
-
-### Correctness
-
- **Capture-pointer semantics — `*bool` vs `bool`.** Three states, owned by separate sides:
-  - **Wire JSON this module emits:** `buildParserConfigJSON` (`synthesizer.go:678-693`) *always* stamps the capture field. Agent-network targets ship `"capture_prompt": false` or `"capture_prompt": true` — never absent. Same for `"capture_completion"`. The happy-path test pins `{"capture_prompt":false}` (`synthesizer_test.go:174`).
-  - **Proxy-side parser config (consumer):** parsers decode into `*bool`. Matrix:
-    - `nil` (field absent) → **legacy default = emit**. Preserved for non-agent-network callers and pre-existing tests (the backward-compat hook).
-    - `false` (field present, value false) → **suppress emission entirely**. The behaviour for opted-out agent-network accounts. Without this, `enable_log_collection=true` + `enable_prompt_collection=false` would leak raw user input AND raw model output to the access log.
-    - `true` → emit normally.
-  - **Why the synth always stamps a value:** an agent-network mapping that omitted the field would hit the legacy "always emit" default and re-introduce the leak. The `json.Marshal` error fallback at `synthesizer.go:687` degrades to `{}`; a code comment claims the branch is unreachable, but if it ever fired it would re-introduce the same leak. Consider failing closed (return the literal `{"capture_prompt":false}`) instead.
- **`scoreCandidates` non-cumulative deny code.** Only the *last* exhausted policy's deny code survives (`policyselect.go:188-190`). Iteration order is the store's natural order. Auth signal is `len(scored)==0`, so the deny code is informational only; verify no UI depends on "first exhausted policy" semantics.
- **`effectiveWindowSeconds` token-wins tiebreak.** When both halves are enabled with different windows, token's window wins (`policyselect.go:482`). Verify `RecordLLMUsage` increments against the winning window only.
- **`RecordAccountBudgetUsage` dedup.** Two rules with the same `(kind, dim_id, window)` would double-count without the `tuples` map (`policyselect.go:434-449`). Key includes all three dimensions — correct.
- **Fail-closed on bad provider:** unknown catalog id (`synthesizer.go:794-796`) or empty API key (`synthesizer.go:801-803`) drops the **entire** account's synth, not just the bad provider. Confirm matches operator UX.
-
-### Security
-
- **Redact OR-merge:** merged `RedactPii` = account OR guardrail (`synthesizer.go:706`). **Parser-side flag is `settings.RedactPii` only, NOT the OR** — a guardrail-only opt-in does not propagate to parsers. Correct because the account toggle gates capture, but worth noting on the proxy side.
- **Group resolution must not leak across accounts.** Every store call carries `accountID` (`policyselect.go:73, 286, 298, 322, 334, 354`); `lowestIntersect` uses caller's claimed groups only (`policyselect.go:494`). Risk surface is upstream (handler populates `in.GroupIDs`).
- **`UpdateSettings` preserves immutable Cluster + Subdomain** (`manager.go:558`). A client can't rebind the cluster.
- **Provider session keypair backfill writes through `SaveAgentNetworkProvider`** (`synthesizer.go:256`) from a read-shaped call. Idempotent → worst case is a wasted write under concurrent reconcile + snapshot.
-
-### Concurrency
-
- **`reconcileMu`** guards `reconcileCache`. Lock window is narrow — compute diff inside, send outside (`reconcile.go:56-68`).
- **`labelRngMu`** guards `labelRng` because `math/rand.Source` is unsafe for concurrent use (`manager.go:638-640`).
- **Real-store tests** use `store.NewTestStoreFromSQL` with `t.TempDir()` per test — no shared state, no `t.Parallel()`.
- **`RecordAccountBudgetUsage` dedup `tuples` map is per-call;** concurrent calls fan out fully — correct (each request's tokens book once per applicable rule).
- **Deferred `UpdateAccountPeers` runs inline after the proxy push** (`reconcile.go:28-35`); a slow call stretches CRUD response time.
-
-### Backward compatibility
-
- **Capture-pointer semantics (restated):** non-agent-network callers see no field → legacy nil-default emit, identical to pre-PR. Agent-network targets always carry an explicit `capture_*` value.
- **`TestSynthesizeServices_HappyPath` was updated:** request-parser config moved from `{}` to `{"capture_prompt":false}` (`synthesizer_test.go:174`). External snapshot tests against synth output need updating.
- **`MergedGuardrails` retains zeroed `TokenLimits`/`Budget`/`Retention`** even though `Policy.Limits` carries the real values now; `llm_limit_check` is the authoritative enforcement. Comment at `synthesizer.go:940-948` calls this out.
-
-### Performance
-
- **`SynthesizeServices` runs on every controller tick / mutation reconcile.** Cost: 4 store reads + optional per-provider keypair backfill. Sort + index + merge are O(N log N) / O(P × G); dominant cost is JSON marshalling. No nested loops escape these dimensions.
- **`reconcile.diffMappings` is O(N + M)** with N=M=1 per account today — effectively constant.
- **`SynthesizeServicesForCluster`** (`synthesizer.go:71`) walks every account on a cluster; per-account failures are **swallowed** (`synthesizer.go:91-93`) so a single misconfigured account doesn't drop the cluster. Runs per proxy reconnect.
-
-### Observability
-
- **Activity codes:** `AgentNetwork{Provider,Policy,Guardrail,BudgetRule}{Created,Updated,Deleted}`; `AgentNetworkSettingsUpdated` with `log_collection/prompt_collection/redact_pii` payload (`manager.go:567-571`). **No activity code for `SelectPolicyForRequest` denies** — surfaced via proxy access log only (likely intentional given volume).
- **Deny codes** namespaced: `llm_policy.{token,budget}_cap_exceeded`, `llm_account.{token,budget}_cap_exceeded` (`policyselect.go:18-26`).
- **Reconcile failures are logged at warn and swallowed** (`reconcile.go:42-44`). Persistent synth failures (e.g. unknown catalog id) silently keep the proxy out of sync — consider a manager-level synth-health surface if this becomes a support burden.
-
-## Test coverage
-
-| Test file | Locks down |
-| --------- | ---------- |
-| `synthesizer_test.go` | Mock-store: `HappyPath` (8-mw chain ordering, `{"capture_prompt":false}` baseline); `No{Settings,Providers}`; `Disabled{Provider,Policy}_NoService`; `RouterConfigOrdering`; `PolicyCheckConfig_UnionsSourceGroups`; `OrphanProvider_HasEmptyAllowedGroups`; identity-inject for LiteLLM / Bifrost (overrides + partial disable) / Cloudflare / Portkey / Vercel / OpenRouter / generic non-customizable; `GuardrailMerge_AllowlistUnion_LimitsRestrictive`; `BackfillsMissingSessionKeys`; `HTTPUpstream_KeepsExplicitPort`; `UpstreamURLPath_FlowsToRouter`; `UnknownProviderID_FailsClosed`; `EmptyAPIKey_FailsClosed`. |
-| `synthesizer_realstore_test.go` | Real-sqlite: `SurvivesStatusToggle` reproduces the disable/re-enable 403 regression; `Reconcile_RealStore_PushesPrivateAfterStatusToggle` extends through reconcile push. |
-| `synthesizer_guardrail_realstore_test.go` | `PromptCaptureAccountIsSoleControl`; `PromptCaptureFlowsWhenAccountOptsIn`; `AccountRedactWithoutGuardrailRedact`; `NoGuardrail_CaptureOff`. |
-| `synthesizer_log_collection_realstore_test.go` | `LogCollection{Off_SuppressesAccessLog,On_PermitsAccessLog}` — verifies `DisableAccessLog` propagation through `ToProtoMapping`. |
-| `synthesizer_parser_redact_realstore_test.go` | **Capture-pointer regression suite:** `ParserConfigsCarryRedactPii`; `ParserConfigsSuppressCaptureWhenLogCollectionOnly` (log=on/prompt=off ⇒ both capture flags false); `ParserConfigsOmitRedactPiiWhenOff`. |
-| `policyselect_test.go` | Mock-store: `NoApplicablePolicies`; `AllowWithLowestGroupAttribution`; `LargerPoolWinsAcrossUsageLevels`; `StaysOnLargerPoolAfterPartialDrain`; `FallsThroughToSmallerPoolWhenLargerExhausted`; `TiebreakBy{LargerGroupPool,CreatedAt}`; `DeniesWhenAllExhausted`; `UncappedPolicyAlwaysWinsAgainstCapped`; `DisabledPolicyIgnored`; `StoreErrorPropagates`; `RejectsEmptyAccount`; `SharesGroupCounterAcrossPolicies`; `AntiFallThroughOnLowestGroup`; `BudgetOnlyExhaustionDenies`; `BudgetTighterThanTokenWins`. |
-| `policyselect_realstore_test.go` | Real-sqlite regression guard: `NoApplicablePolicies`; `AllowAndLowestGroupAttribution`; `LargerPoolWins_FallsThroughWhenExhausted`; `BudgetCapDenies`; `GroupCounterSharedAcrossPolicies`; `DisabledPolicyIgnored`. |
-| `policyselect_account_realstore_test.go` | Account budget rules: `AccountCeilingBindsEvenWithUncappedPolicy` (min-wins); `AccountGroupCeiling`; `AccountTargetUsersBindsOnlyThatUser`; `AccountRuleRecordsToOwnWindow`. |
-| `reconcile_test.go` | `FirstSynth_EmitsCreate`; `NoChange_EmitsNothingExtra` (re-push as Modified — verify desired); `PolicyRemoved_EmitsDelete`; `NilProxyController_NoOp`; `EmptyAccountID_NoOp`; `ClusterFromMapping`. |
-| `wire_shape_test.go` | `TestSynthesizedService_WireShape` — proto-shape lockdown via `ToProtoMapping`. Catches "service not matching" (mapping reaches proxy but no SNI/HTTP route). Asserts ID, Domain, Mode, AuthToken, `Private`, `Auth.Oidc=false`, one path `/` + `https://noop.invalid/`, 8 middlewares with correct slot enums, router config `auth_header_value="Bearer sk-test-key"`. |
-| `labelgen/labelgen_test.go` | `PickUnique_{DeterministicWithSeededRng,AvoidsTakenWordsWhenMostAreReserved,FallsBackWhenAllReserved}`; `UniqueWords_DropsDuplicates`. |
-| `types/consumption_test.go` | `WindowStart_{AlignedToUnixEpoch,WithinWindowConverges,AcrossWindowsDiverges,DifferentWindowsHaveDifferentBuckets,SubMinuteAndMinuteAlignment,ZeroWindowReturnsInputUTC}`. Bucket alignment so multi-node reads converge. |
-| `agentnetwork_realstack_test.go` | `ProviderCRUD_FansOutToProxyAndClientPeers` — no-mock end-to-end through real account manager + network-map + agentnetwork: provider create propagates the updated map to both proxy peer and client peer with the synth DNS surface. |
-| `agentnetwork_budgetrule_realstack_test.go` | `BudgetRuleCRUD_RealManager`; `UpdateSettings_PreservesImmutableAndTogglesCollection`. |
-
-## Known limitations / explicit non-goals
-
- **`MergedGuardrails.TokenLimits/Budget/Retention` emit at zero** (`synthesizer.go:940-948`); real enforcement is `Policy.Limits` via `llm_limit_check`. Future cleanup implied.
- **Session keys picked from first enabled provider by created_at** (`pickServiceSessionKeys`, `synthesizer.go:270`). Existing session cookies survive provider edits only while the first-by-CreatedAt provider stays in place. Document for operators.
- **Reconcile failures silently swallowed** (`reconcile.go:42-44`). Persistent failures keep the proxy out of sync until the next reconcile.
- **`scoreCandidates` exposes only the LAST exhaustion's deny code** when multiple policies are exhausted.
- **`bootstrapSettingsIfNeeded` failure is non-fatal to provider create** (`manager.go:200`): provider lands, synth is no-op until the next provider create retries the bootstrap.
- **Budget rules do not trigger a reconcile** (`manager.go:476-477`). Request-time evaluation only; new rules take effect on the next request without a proxy push.
-
-## Cross-references
-
- **Upstream:** [shared/api](10-shared-api.md), [management/store](20-management-store.md), reverseproxy `service`/`proxy`/`sessionkey` packages, `management/server/permissions` + `activity`.
- **Downstream:** [management/handlers (HTTP wiring)](22-management-handlers-wiring.md), [proxy/middleware-builtin](31-proxy-middleware-builtin.md), network-map controller (`injectAllProxyPolicies` fan-out).
- **End-to-end flow:** [../01-end-to-end-flows.md](../01-end-to-end-flows.md) — "Provider create → reconcile → proxy push → peer map refresh" and "request → policy select → record" diagrams.
- **Top-level:** [../00-overview.md](../00-overview.md)
--- a/docs/agent-networks/modules/22-management-handlers-wiring.md
+++ b/docs/agent-networks/modules/22-management-handlers-wiring.md
@@ -1,203 +0,0 @@
-# management/handlers + wiring — HTTP API + gRPC delivery
-
-> **Risk level:** Medium — the surface is mostly additive, but two changes are load-bearing: `injectAllProxyPolicies` runs on every per-peer compute, and `shallowCloneMapping` must round-trip `Private` (a missed field silently breaks every MODIFIED).
-> **Backward-compat impact:** Additive on the wire (new routes, new RPCs, new proto fields, new gorm column on `AccessLogEntry`). One management-internal break: `nbhttp.NewAPIHandler` gains a trailing `agentNetworkManager` parameter; `nil` is tolerated and silently skips route registration.
-
-## Module boundary
-
-This module is the seam between the public Agent Network HTTP API and the proxy fleet that serves agent traffic. North side: a `/api/agent-network/*` surface (providers, policies, guardrails, budget rules, settings, consumption) on the existing gorilla router, delegating to `agentnetwork.Manager`. Handlers are thin — they translate `api.*` ↔ `types.*`, validate shape, forward. RBAC and event emission stay inside the manager (`manager.go:680-682`).
-
-South side: `ProxyServiceServer` (`proxy.go`) learns to (a) ship synth services to a proxy on initial snapshot, (b) resolve agent-network domains in `getServiceByDomain` for OIDC/session/tunnel-peer flows, (c) gate LLM requests via `CheckLLMPolicyLimits` + `RecordLLMUsage`, (d) preserve `Private` through `shallowCloneMapping` so per-proxy live updates don't silently flip services public. The network_map controller prepends synth services to `account.Services` on every per-peer compute; `accesslogentry.go` gains an indexed `AgentNetwork` column so the dashboard can filter cheaply.
-
-## Files
-
-| Path | Role |
-| ---- | ---- |
-| `handlers/agentnetwork/providers_handler.go` | Catalog + provider CRUD + central `AddEndpoints` |
-| `handlers/agentnetwork/policies_handler.go` | Policy CRUD + shared `validatePolicy*` |
-| `handlers/agentnetwork/guardrails_handler.go` | Guardrail CRUD |
-| `handlers/agentnetwork/budget_handler.go` | Account-level budget rule CRUD |
-| `handlers/agentnetwork/settings_handler.go` | GET (200+`null` if unbootstrapped) + PUT toggles |
-| `handlers/agentnetwork/consumption_handler.go` | Read-only consumption rows |
-| `handlers/agentnetwork/handlers_test.go` | Real-store fixture; wire round-trip + validation |
-| `handlers/agentnetwork/budget_handler_test.go` | Budget-rule + settings toggles |
-| `server/http/handler.go` | New `agentNetworkManager` arg; conditional `AddEndpoints` |
-| `server/permissions/modules/module.go` | New `AgentNetwork` module key |
-| `internals/server/boot.go` | Wires synthesiser adapter + limits service into proxy server |
-| `internals/server/modules.go` | `AgentNetworkManager()` lazy-create node |
-| `internals/controllers/network_map/controller/controller.go` | `injectAllProxyPolicies` replaces 4 `InjectProxyPolicies` calls |
-| `internals/controllers/network_map/controller/repository.go` | `SynthesizeAgentNetworkServices` repo method |
-| `internals/modules/reverseproxy/service/service.go` | `MiddlewareConfig`, capture limits, `AgentNetwork`, `DisableAccessLog` + proto |
-| `internals/modules/reverseproxy/accesslogs/accesslogentry.go` | Indexed `AgentNetwork bool` from proto |
-| `internals/shared/grpc/proxy.go` | Synth wiring, 2 RPCs, domain fallback, `Private` in clone |
-| `internals/shared/grpc/proxy_clone_test.go` | Locks every `ProxyMapping` field minus `AuthToken` |
-| `server/activity/codes.go` | 13 new activity codes (125-137) |
-
-## HTTP routes added
-
-All routes inherit the platform's auth middleware. Perms enforced inside `agentnetwork.Manager.requirePermission` (`manager.go:680-682`) on `modules.AgentNetwork`. Permission column shows the `op` passed to `requirePermission` — read = `Read`, etc.
-
-| Method | Path | Perm | Handler |
-| ------ | ---- | ---- | ------- |
-| GET    | `/agent-network/catalog/providers` | authn only | `providers_handler.go:43` |
-| GET    | `/agent-network/providers` | read | `providers_handler.go:57` |
-| POST   | `/agent-network/providers` | create | `providers_handler.go:97` |
-| GET    | `/agent-network/providers/{providerId}` | read | `providers_handler.go:77` |
-| PUT    | `/agent-network/providers/{providerId}` | update | `providers_handler.go:132` |
-| DELETE | `/agent-network/providers/{providerId}` | delete | `providers_handler.go:172` |
-| GET    | `/agent-network/policies` | read | `policies_handler.go:32` |
-| POST   | `/agent-network/policies` | create | `policies_handler.go:72` |
-| GET    | `/agent-network/policies/{policyId}` | read | `policies_handler.go:52` |
-| PUT    | `/agent-network/policies/{policyId}` | update | `policies_handler.go:102` |
-| DELETE | `/agent-network/policies/{policyId}` | delete | `policies_handler.go:142` |
-| GET    | `/agent-network/guardrails` | read | `guardrails_handler.go:25` |
-| POST   | `/agent-network/guardrails` | create | `guardrails_handler.go:65` |
-| GET    | `/agent-network/guardrails/{guardrailId}` | read | `guardrails_handler.go:45` |
-| PUT    | `/agent-network/guardrails/{guardrailId}` | update | `guardrails_handler.go:95` |
-| DELETE | `/agent-network/guardrails/{guardrailId}` | delete | `guardrails_handler.go:135` |
-| GET    | `/agent-network/budget-rules` | read | `budget_handler.go:24` |
-| POST   | `/agent-network/budget-rules` | create | `budget_handler.go:64` |
-| GET    | `/agent-network/budget-rules/{ruleId}` | read | `budget_handler.go:44` |
-| PUT    | `/agent-network/budget-rules/{ruleId}` | update | `budget_handler.go:95` |
-| DELETE | `/agent-network/budget-rules/{ruleId}` | delete | `budget_handler.go:135` |
-| GET    | `/agent-network/settings` | read | `settings_handler.go:53` (200+`null` if no row) |
-| PUT    | `/agent-network/settings` | update | `settings_handler.go:27` |
-| GET    | `/agent-network/consumption` | read | `consumption_handler.go:21` |
-
-## gRPC RPCs added (or modified)
-
-| RPC | Direction | Trigger |
-| --- | --------- | ------- |
-| `CheckLLMPolicyLimits` | proxy→mgmt unary | Pre-flight gate; returns allow/deny, selected policy, attribution group, window, deny code+reason (`proxy.go:259-301`). `Unimplemented` when limits service is nil. |
-| `RecordLLMUsage` | proxy→mgmt unary | Post-flight write of tokens+cost against policy-window dimensions + every applicable account budget rule (`proxy.go:303-349`). `window_seconds==0` ⇒ no policy cap, only account fan-out runs. |
-| `GetMappingUpdate`/`SendServiceUpdate` (stream) | mgmt→proxy | Snapshot (`proxy.go:752-780`) now appends `SynthesizeServicesForCluster`. Live updates use `SendServiceUpdateToCluster` + `shallowCloneMapping`. |
-
-## Architecture & flow
-
-### HTTP request lifecycle
-
-```mermaid
-sequenceDiagram
-    participant DB as Dashboard
-    participant R as gorilla.Router (/api)
-    participant H as handler (agentnetwork)
-    participant M as agentnetwork.Manager
-    participant S as store.Store
-    participant AM as accountManager (StoreEvent)
-
-    DB->>R: POST /api/agent-network/providers
-    R->>H: createProvider (auth mw sets UserAuth)
-    H->>H: GetUserAuthFromContext + validate(req)
-    H->>M: CreateProvider(userID, provider, bootstrapCluster)
-    M->>M: requirePermission(AgentNetwork, Create)
-    M->>S: SaveAgentNetworkProvider
-    M->>AM: StoreEvent(AgentNetworkProviderCreated)
-    M-->>H: created provider
-    H-->>DB: 200 + api.AgentNetworkProvider JSON
-```
-
-### Synth-service delivery via gRPC
-
-```mermaid
-sequenceDiagram
-    participant P as Proxy
-    participant G as ProxyServiceServer
-    participant SM as service.Manager (persisted)
-    participant SA as synthesizerAdapter
-    participant AN as SynthesizeServicesForCluster
-    participant ST as store.Store
-
-    Note over P,G: Initial snapshot
-    P->>G: GetMappingUpdate (stream open)
-    G->>SM: GetServicesForCluster(conn.address)
-    SM-->>G: persisted []*Service
-    G->>SA: SynthesizeServicesForCluster(conn.address)
-    SA->>AN: SynthesizeServicesForCluster(store, clusterAddr)
-    AN->>ST: walk every account; read providers/policies/settings
-    AN-->>SA: in-memory []*Service
-    SA-->>G: []*Service
-    G->>P: response (persisted + synth)
-
-    Note over G,P: Per-request live update
-    G->>G: SendServiceUpdateToCluster(update, clusterAddr)
-    G->>G: shallowCloneMapping(update)   %% Private MUST survive
-    G->>P: response with single mapping
-```
-
-End-to-end: an HTTP write persists rows and emits an activity event; the manager then triggers `proxyController.SendServiceUpdate` so proxies re-render. **Of the two stream delivery paths, only the snapshot calls into the synthesiser**: on stream open it pulls persisted services, then appends synth services for the cluster, whereas a live update carries a single cloned mapping. Synth services are never persisted. Two callers outside stream delivery also reach the synthesiser. For OIDC/session/tunnel-peer flows, `getServiceByDomain` falls back to `SynthesizeServicesForCluster(clusterFromDomain(domain))` when the persisted lookup misses (`proxy.go:1763-1793`). Independently, the network_map per-peer compute prepends the same synth services to `account.Services` before `InjectProxyPolicies`.
-
-## Permissions model added
-
- `permissions/modules/module.go:22` adds `AgentNetwork Module = "agent_network"`, registered in `All` (`module.go:42`). Standard `operations.{Read,Create,Update,Delete}` matrix.
- Handlers don't call `permissionsManager` directly — they extract `UserAuth` and delegate to `agentnetwork.Manager`, which gates every mutation through `requirePermission` (`manager.go:168, 308, 549`, etc.). Confirm your role-set provider has `agent_network` rows for owner/admin/user/billing-admin before merging.
- `getCatalogProviders` (`providers_handler.go:43`) intentionally skips RBAC — catalog is global static data.
-
-## Activity codes added
-
-`activity/codes.go:244-274` adds Activities 125-137 + string/code mappings (`codes.go:428-444`), following `<domain>.<resource>.<action>` (e.g., `agent_network.provider.create`). Audit-log exporters / SIEM forwarders need to know the new codes.
-
-## Invariants
-
- **Synth services are never persisted.** Snapshot appends after `serviceManager.GetServicesForCluster` (`proxy.go:761-770`); network_map prepends before `InjectProxyPolicies` (`controller.go:117-126`).
- **`shallowCloneMapping` must round-trip every `ProxyMapping` field except `AuthToken`** — `proxy_clone_test.go:50-58` enforces via `gproto.Equal`. The bug it guards: a missing `Private` made every MODIFIED arrive `private=false`, the proxy skipped `ValidateTunnelPeer`, `UserGroups` stayed empty, `llm_router` denied `no_authorised_provider`; a restart "fixed" it because the snapshot uses the original mapping.
- **Limit-window floor is 60s** (`policies_handler.go:189-220`); enabled cap with both per-group and per-user at zero is rejected. Budget rules reuse the same validator (`budget_handler.go:170`).
- **Manager is optional at boot.** `NewAPIHandler` registers routes only when non-nil (`handler.go:129`); `ProxyServiceServer` returns `Unimplemented` from both RPCs when limits service is unwired (`proxy.go:262-265, 306-309`).
- **Settings GET on an unbootstrapped account returns 200 + `null`** (`settings_handler.go:65-72`) — not 404.
-
-## Things to scrutinize
-
-### Correctness
- **`injectAllProxyPolicies` runs on every per-peer compute**: `controller.go:163, 309, 415, 681`. `sendUpdateAccountPeers` is the target of the buffered fan-out — synth runs once per debounced account-update tick **and** once per direct `UpdateAccountPeer`. Cost is O(providers + policies × users-per-group) per account under `LockingStrengthNone`. No per-account synth cache — verify it fits the buffer interval for your largest tenant.
- **`clusterFromDomain` strips at the first `.`** (`proxy.go:1784-1792`). A zero-dot domain returns `""` and the synth call walks every account. Confirm no path reaches this with a malformed/internal domain.
- **Account-budget `RecordConsumption` fans out even when `window_seconds == 0`** (`proxy.go:341-348`) — intentional. Verify the proxy never sends `RecordLLMUsage` for a request that wasn't actually allowed.
-
-### Security
- Every handler extracts `UserAuth` via `nbcontext.GetUserAuthFromContext` before any work. Routes live behind the standard `/api` mux; bypass list is not extended.
- `CheckLLMPolicyLimits` / `RecordLLMUsage` ride the existing **proxy → mgmt** gRPC connection auth. No additional token check inside the RPCs — they trust the connection. Confirm the proxy-side token-verification interceptor in this package gates both.
- `RecordLLMUsage` only validates `account_id != ""` (`proxy.go:317-319`). A compromised proxy can attribute cost to any account in its cluster — was already true for prior RPCs but is louder now that data drives denials.
-
-### Concurrency
- `SetAgentNetworkSynthesizer` / `SetAgentNetworkLimitsService` write under `s.mu.Lock`; read paths copy the interface under read lock (`proxy.go:236-247, 260-263, 304-307`). Same pattern as existing `serviceManager`/`proxyController` setters.
- Manager writes use `LockingStrengthUpdate`; synth reads use `LockingStrengthNone` — read-after-write via the proxy snapshot can observe a stale view by up to one fan-out tick.
- Network_map controller is single-threaded per account; cross-account is parallel.
-
-### Backward compatibility
- `proxy_clone_test.go` is the regression net; any new `ProxyMapping` field must be cloned or explicitly nulled in the test.
- `AccessLogEntry` adds indexed `AgentNetwork bool` — implicit AutoMigrate; deploy story must handle table-rewrite cost on high-volume access-log tables.
- `TargetOptions` gains seven `omitempty` JSON fields (`service.go:69-94`); on-wire shape stays compatible. `targetOptionsToProto` tests all fields when deciding nil (`service.go:551-556`).
- `NewAPIHandler` signature changes — every caller must pass `agentNetworkManager`; `nil` is supported.
-
-### Observability
- 13 new activity codes via `accountManager.StoreEvent` in the manager — confirm dashboard's audit-log UI maps them.
- `AccessLogEntry.AgentNetwork` is indexed for the dashboard's agent-network log filter.
- New RPCs log at error level on store/selector failures (`proxy.go:284, 327, 332, 348`). Snapshot synth failures degrade to warnings — stream is not aborted (`proxy.go:765`).
-
-## Test coverage
-
-| Test | Locks down |
-| ---- | ---------- |
-| `handlers_test.go::TestPolicyHandler_WindowSecondsRoundTrip` | GET carries `window_seconds`; legacy `window_hours`/`window_days` absent. |
-| `handlers_test.go::TestPolicyHandler_RejectsSubMinuteWindow` | POST `<60s` returns 4xx. |
-| `handlers_test.go::TestConsumptionHandler_EmptyAccountReturnsArray` | `/consumption` returns `[]` — never null. |
-| `handlers_test.go::TestConsumptionHandler_PopulatedAccountListsRows` | RecordConsumption×2 surfaces both with correct tokens/cost/window. |
-| `budget_handler_test.go::TestBudgetRuleHandler_RoundTrip` | Targets + PolicyLimits shape round-trip. |
-| `budget_handler_test.go::TestBudgetRuleHandler_ListReturnsArray` | Empty-list shape. |
-| `budget_handler_test.go::TestBudgetRuleHandler_{RejectsMissingName,RejectsSubMinuteWindow}` | Validation rejections are 4xx. |
-| `budget_handler_test.go::TestSettingsHandler_GetExposesCollectionToggles` | All four toggles + computed `Endpoint`. |
-| `proxy_clone_test.go::TestShallowCloneMapping_PreservesAllFieldsExceptAuthToken` | Future-proofs clone; every field round-trips, `AuthToken` dropped. |
-
-Handler tests use a real sqlite store + real manager + always-allow permissions mock (`handlers_test.go:53-75`). Create/update/delete success paths flow through `accountManager.StoreEvent` which the fixture doesn't wire — covered by manager-level no-mock tests outside this module.
-
-## Known limitations / explicit non-goals
-
- No pagination on any list endpoint; no bulk endpoints.
- Synth result is not cached — every snapshot and every per-peer compute repeats the store walk.
- `getSettings` returning `200 + null` is a deliberate dashboard concession.
- No rate-limiting beyond the global `/api` rate limiter.
-
-## Cross-references
-
- Upstream: [shared/api](10-shared-api.md), [management/agentnetwork](21-management-agentnetwork.md), [management/store](20-management-store.md)
- Downstream: [proxy/runtime](33-proxy-runtime.md)
- End-to-end flow: [../01-end-to-end-flows.md](../01-end-to-end-flows.md)
- Top-level: [../00-overview.md](../00-overview.md)
--- a/docs/agent-networks/modules/30-proxy-middleware-framework.md
+++ b/docs/agent-networks/modules/30-proxy-middleware-framework.md
@@ -1,215 +0,0 @@
-# proxy/middleware-framework — generic plugin system
-
-> **Risk level:** **High** — every proxied request transits this chain. Budget exhaustion, panic recovery, or chain-close bugs hit the hot path for all targets, not just agent-network ones.
-> **Backward-compat impact:** Additive at the proxy. The `middleware` and `bodytap` packages are new (`proxy/internal/middleware/middleware.go:1`, `proxy/internal/middleware/bodytap/request.go:13`); existing proxy targets keep working until a chain is bound to them via `Manager.Rebuild`.
-
-The framework itself carries no LLM or agent-network domain knowledge; reviewing it requires none.
-
-## Module boundary
-
-This module is the **framework only**: slots, chains, registry, dispatcher, accumulator, body-tap, output filters. No middleware *implementation* lives here — those land in `proxy/internal/middleware/builtin/*` (covered in module 31). The package contract is:
-
-1. The proxy hands a `Manager` to its config-apply path. The synth pushes per-path `PathTargetBinding` lists (`proxy/internal/middleware/manager.go:26`) into `Manager.Rebuild`, which resolves each spec via the `Registry`/`Resolver` (`proxy/internal/middleware/registry.go:81-121`) and produces an immutable `Chain` keyed by `serviceID|pathID` (`proxy/internal/middleware/manager.go:410-412`).
-2. The reverse-proxy handler captures the request body via `bodytap.CaptureRequest`, calls `Chain.RunRequest`, applies returned mutations (already filtered by `chain.applyMutations`), forwards to the upstream behind a `bodytap.CapturingResponseWriter`, then calls `Chain.RunResponse` and `Chain.RunTerminal`.
-3. Middlewares are inert plugins that receive a deep-cloned `Input` and return an `Output` whose decision/mutations are clamped by the dispatcher's `filterOutput` (`proxy/internal/middleware/dispatcher.go:149-172`).
-
-Everything that crosses the framework boundary in either direction is value-typed and deep-copied — middlewares cannot mutate the live request directly, and the framework cannot inadvertently leak middleware-owned slices into the request hot path.
-
-## Files
-
-| Path | Role |
-| ---- | ---- |
-| `proxy/internal/middleware/middleware.go` | `Middleware` + `Factory` interfaces. |
-| `proxy/internal/middleware/types.go` | `Slot`, `FailMode`, `Decision`, all limit constants, `Input`/`Output`/`Mutations`/`UpstreamRewrite`/`AuthHeader` value types. |
-| `proxy/internal/middleware/spec.go` | Apply-time `Spec` (validated wire shape + runtime-injected fields) and `Clone`. |
-| `proxy/internal/middleware/registry.go` | `Registry` (factory map, RWMutex) and `Resolver` (Spec → bound `Middleware`). |
-| `proxy/internal/middleware/manager.go` | `Manager`, `chainTable` reverse index, `Rebuild`/`Invalidate*`, async chain close. |
-| `proxy/internal/middleware/chain.go` | `Chain.RunRequest`/`RunResponse`/`RunTerminal`, mutation gating, `cloneInputFor`. |
-| `proxy/internal/middleware/chain_test.go` | Metadata threading, LIFO response order, rewrite gating, UserGroups propagation, terminal accumulation. |
-| `proxy/internal/middleware/dispatcher.go` | Timeout/panic recovery, fail-mode, error classification, `filterOutput`. |
-| `proxy/internal/middleware/decision.go` | `RenderDenyResponse`, deny-code regex, status clamp. |
-| `proxy/internal/middleware/headerpolicy.go` | Compile-in header denylist + `FilterHeaderMutations`. |
-| `proxy/internal/middleware/bodypolicy.go` | `ValidateBodyReplace` / `ApplyBodyReplace` smuggling guards. |
-| `proxy/internal/middleware/keys.go` | Metadata key namespace constants. |
-| `proxy/internal/middleware/metadata.go` | `Accumulator` — allowlist, per-mw/per-request byte caps, redaction. |
-| `proxy/internal/middleware/metrics.go` | OTel instrument bundle (`proxy.middleware.*`). |
-| `proxy/internal/middleware/redaction.go` | `Scan` — PEM/JWT/AWS/bearer/Luhn-validated CC patterns. |
-| `proxy/internal/middleware/bodytap/request.go` | Capture + replay reader, `Budget` semaphore, bypass reason codes. |
-| `proxy/internal/middleware/bodytap/response.go` | `CapturingResponseWriter` (tee with `PassthroughWriter` for Flusher/Hijacker preservation). |
-
-## Slot model
-
-Three slots, declared per-middleware exactly once (`proxy/internal/middleware/types.go:27-41`):
-
- **`SlotOnRequest`** (`Slot=1`) — runs **before** the upstream call, in registration order. May `DecisionDeny`, may emit `Mutations` (header add/remove, body replace, `UpstreamRewrite`) when both `Spec.CanMutate` and `Middleware.MutationsSupported()` are true. May emit metadata. Each middleware in the slot sees metadata that earlier ones in the same slot just emitted (`proxy/internal/middleware/chain.go:144-178`) — this is how the framework gives middlewares an intra-slot side channel without a global bag.
- **`SlotOnResponse`** (`Slot=2`) — runs **after** the upstream returns, in **reverse** registration order. Cannot deny (clamped in `dispatcher.filterOutput`, `proxy/internal/middleware/dispatcher.go:153-157`). May still mutate response headers in principle, but the current chain only forwards `RewriteUpstream` from on_request, so on_response mutations are observe-only in practice. Threads the same per-slot metadata view as on_request.
- **`SlotTerminal`** (`Slot=3`) — runs **after** every on_response middleware has emitted, in registration order. Sees the full accumulated bag plus prior terminal emissions (`chain.go:221-245`). Cannot deny, cannot mutate (`dispatcher.go:168-170`). Designed for sinks (access log, metrics push, audit emitter).
-
-Splitting a feature across slots (e.g. "parse on the way out, ship on terminal") is the explicit architectural choice — `types.go:7-15` and `types.go:22-25` make it clear no middleware participates in more than one slot.
-
-## Architecture & flow
-
-### Chain dispatch
-
-```mermaid
-sequenceDiagram
-    autonumber
-    participant H as proxy HTTP handler
-    participant BT as bodytap.CaptureRequest
-    participant CH as Chain
-    participant DI as Dispatcher
-    participant MW as Middleware (per slot)
-    participant US as Upstream
-    participant CW as CapturingResponseWriter
-
-    H->>BT: CaptureRequest(r, cfg, budget)
-    BT-->>H: body[], truncated, release()
-    H->>CH: RunRequest(ctx, r, Input, Accumulator)
-    loop on_request, registration order
-        CH->>CH: cloneInputFor(in, OnRequest)
-        CH->>DI: Invoke(ctx, spec, mw, call)
-        DI->>MW: mw.Invoke(callCtx, in)
-        MW-->>DI: Output{decision, metadata, mutations?}
-        DI->>DI: filterOutput (clamp deny, gate mutations)
-        DI-->>CH: filtered Output
-        CH->>CH: Accumulator.Emit (allowlist + caps + redact)
-        alt DecisionDeny
-            CH-->>H: denied, merged, rewrite
-        else allow
-            CH->>CH: applyMutations(r, m) and capture rewrite
-        end
-    end
-    CH-->>H: nil, merged, rewrite
-    H->>US: ProxyRequest (with rewrite/mutations applied)
-    US-->>CW: bytes (streamed, tee'd into cap-bounded buf)
-    CW-->>H: passthrough complete
-    H->>CH: RunResponse(ctx, Input{RespBody:CW.Body(),...}, acc)
-    loop on_response, REVERSE order (LIFO)
-        CH->>DI: Invoke (same wrappers)
-    end
-    H->>CH: RunTerminal(ctx, Input{Metadata:full bag}, acc)
-    H->>BT: release() + CW.Release()
-```
-
-### Body-tap mechanics (request + response)
-
-```mermaid
-flowchart LR
-    subgraph req[Request capture — bodytap.CaptureRequest]
-        R0[r.Body] --> R1{cfg.MaxRequestBytes > 0?\nUpgrade absent?\nContent-Type allowed?\nCL <= cap?}
-        R1 -- no --> R2[bypass = reason\nbody = nil\nr.Body untouched]
-        R1 -- yes --> R3[Budget.Acquire(cap)]
-        R3 -- denied --> R4[bypass=BypassBudget]
-        R3 -- ok --> R5[io.LimitReader(r.Body, cap+1)\nio.ReadAll]
-        R5 --> R6{len > cap?}
-        R6 -- truncated --> R7[viewable = buf[:cap]\nr.Body = replayReadCloser{buf, tail}]
-        R6 -- whole --> R8[r.Body = NopCloser(bytes.Reader(buf))\nclose original]
-        R7 --> R9[(release captured\nbudget on req end)]
-        R8 --> R9
-    end
-
-    subgraph resp[Response capture — CapturingResponseWriter]
-        W0[client] -.-> CW[Write(p)]
-        CW --> P1[PassthroughWriter.Write(p)\n— bytes leave to client first]
-        P1 --> P2{!stopped?}
-        P2 -- yes --> P3{remaining = cap - buf.Len()}
-        P3 --> P4[buf.Write(p[:take])\nset truncated if take<n]
-        P2 -- no --> P5[silent drop into the tee\n(client write already done)]
-    end
-```
-
-The body-tap is the highest-leak-risk surface in this module; three details matter:
-
-1. **Request capture is "read-and-replay", not "read-and-forward".** When capture is not bypassed, `CaptureRequest` swaps `r.Body` for either a `bytes.Reader` (whole body fit) or a `replayReadCloser` that replays the captured prefix then drains the remaining stream from the original body (`bodytap/request.go:178-201`). This means the **upstream still sees the full body even when the tap truncates**. In the truncated branch the original `r.Body` is **not** closed at capture time: `replayReadCloser.Close()` closes the tail (`bodytap/request.go:199-201`), which is that same original reader, so a single close at request end is correct. Reviewers should still confirm the upstream proxy always reads to EOF (otherwise the tail is leaked).
-2. **Response capture is a write-through tee.** `CapturingResponseWriter.Write` forwards to the underlying writer **first** (`bodytap/response.go:116-117`), then tees into `buf` under its own mutex. Client never blocks on the tee. `Flusher`/`Hijacker` are preserved via the embedded `responsewriter.PassthroughWriter`. SSE/chunked streams flow through untouched; middlewares only see the bounded prefix.
-3. **Budget is a single shared semaphore.** `Manager` constructs one `bodytap.Budget` at startup (`manager.go:138-144`, default `256 MiB` from `bodytap/request.go:39`). Every capture pre-acquires its full `MaxRequestBytes` / `MaxResponseBytes` from the budget regardless of actual body size; that prevents a flood of small captures from collectively exceeding the cap, but it also means a misconfigured `MaxRequestBytes = 1 MiB` with 256 concurrent requests already exhausts the default budget. Reviewers should sanity-check the operator-facing defaults that ship with synth-service.
-
-The framework explicitly aborts capture (and increments `proxy.middleware.capture_bypass_total`) before reading the first byte when `Upgrade`/`Connection: upgrade` is set (`bodytap/request.go:120-125`), when the content-type isn't in the allowlist (`bodytap/request.go:126-128`), or when the advertised `Content-Length` already exceeds the cap (`bodytap/request.go:131-133`). This is the right place to make sure WebSocket upgrades and large file uploads never reach the buffer.
-
-## Public contracts
-
- **`Middleware` interface** (`middleware.go:14-36`): `ID()`, `Version()`, `Slot()`, `AcceptedContentTypes()`, `MetadataKeys()`, `MutationsSupported()`, `Invoke(ctx, *Input) (*Output, error)`, `Close()`. `MetadataKeys()` is the **closed set** the middleware is allowed to emit — the accumulator drops anything outside it (`metadata.go:71-75`). `Close` must be idempotent (called even when `Invoke` was never reached).
- **`Factory` interface** (`middleware.go:44-47`): `ID()`, `New(rawConfig []byte) (Middleware, error)`. `RawConfig` is opaque JSON bytes on the wire (`spec.go:6-12`); each factory owns its own typed config.
- **`Decision` type** (`types.go:59-69`): `Allow=0`, `Deny=1`, `Passthrough=2`. Default-zero is permissive — important because every middleware that omits `Decision` gets `Allow`. Dispatcher clamps `Deny` to `Passthrough` outside `SlotOnRequest` (`dispatcher.go:153-157`).
- **`Mutations`** (`types.go:196-201`): `HeadersAdd`/`HeadersRemove` (filtered through `headerpolicy.go`), `BodyReplace` (gated through `bodypolicy.go`), and `RewriteUpstream`. `RewriteUpstream` is **last-write-wins** within the on_request slot (`chain.go:170-172`, locked down by `TestChain_RunRequest_LatestRewriteWins`).
- **Metadata propagation keys** (`keys.go`): all keys live in a single file and follow `^[a-z][a-z0-9_-]*(\.[a-z0-9_-]*)+$` (`metadata.go:8`). Framework-injected error tagging uses `mw.<id>.error_kind` (`keys.go:81`) so operators can distinguish framework-emitted entries from middleware-emitted ones.
-
-## Invariants
-
- **Per-request context isolation.** `cloneInputFor` deep-copies every mutable field (`Headers`, `RespHeaders`, `Metadata`, `Body`, `RespBody`, `UserGroups`, `UserGroupNames`) before each invocation (`chain.go:286-308`). A misbehaving middleware that mutates `in.Headers` only corrupts its own copy.
- **Body-tap bounded by capture limit.** Request side uses `io.LimitReader(r.Body, limit+1)` (`bodytap/request.go:152`) — the `+1` is how the code detects truncation (`bodytap/request.go:160`); the surfaced buffer is sliced back down to `limit`. Response side stops teeing once `buf.Len() >= cap` (`bodytap/response.go:121-133`). Neither side can grow the buffer past the configured cap.
- **Headers/body redaction order.** Accumulator runs `Scan(value)` **before** counting cost (`metadata.go:81-82`), so the byte budgets are computed against post-redaction sizes. `Scan` order is PEM → JWT → AWS key → bearer → Luhn-validated CC (`redaction.go:25-51`) — the comment block in `redaction.go:8-13` is explicit that this is best-effort, not DLP.
- **No middleware can starve the chain.** Every invocation runs inside `context.WithTimeout(ctx, clampTimeout(spec.Timeout))` in a separate goroutine (`dispatcher.go:51-94`), with the deadline race-`select`ed against the result channel. A blocked middleware fires the timeout path, gets fail-mode'd, and `IncError(kind=timeout)`. Timeouts are clamped to `[10ms, 5s]` (`types.go:80-86`, `dispatcher.go:174-185`).
- **Panic recovery.** `recover()` captures the panic, logs only the type + a 4 KiB stack prefix (no panic value — avoids leaking secrets the middleware was processing), and produces a `panicError` that flows through fail-mode (`dispatcher.go:64-76`).
- **Chain immutability + atomic swap.** `chainTable` is cloned on every `Rebuild`/`Invalidate*` and swapped via `atomic.Pointer` (`manager.go:44-69`, `manager.go:221-300`). Readers (`ChainFor`) are lock-free; writers serialise on `writeMu`. The retired chain is `Close`-d in a background goroutine bounded by `chainCloseTimeout = 2 * MaxTimeout` (`manager.go:21-22`, `manager.go:326-346`), so in-flight invocations finish on the old chain after the swap.
-
-## Things to scrutinize
-
-### Correctness
-
- **Chain ordering deterministic from synth output?** `Manager.buildChain` iterates `b.Specs` in slice order and appends to `bound` (`manager.go:366-391`); `NewChain` then partitions by slot but **preserves slice order within each slot** (`chain.go:50-60`). So order on the wire = order observed at runtime. Synth must therefore emit specs in the intended execution order — there is no per-spec `Priority` field. Worth flagging.
- **Decision short-circuit semantics.** `RunRequest` returns immediately on `DecisionDeny` (`chain.go:164-167`) **with the metadata accumulated so far** plus the `denied.Metadata`. Callers that ignore `merged` on deny will lose framework-injected `mw.<id>.error_kind` entries. The proxy runtime is the only caller; confirm it always feeds `merged` into the access log on the deny path as well.
- **`UpstreamRewrite` `AuthHeader` bypass** (`types.go:218-235`). The `AuthHeader`/`StripHeaders` fields *intentionally* bypass the header denylist on the basis that the proxy itself rewrites auth. The denylist still blocks middleware-emitted `HeadersAdd: Authorization=...`. This is a delicate carve-out — review the runtime consumer to confirm only the trusted upstream-build path unpacks `AuthHeader`, never the generic `applyMutations` loop.
- **`replayReadCloser.Close` only closes the tail** (`bodytap/request.go:199-201`). The replay buffer doesn't own a resource, so this is correct, but it conflates "replay finished" with "underlying body closed". If a caller `Close()`s without reading to EOF, the original body is closed but the captured prefix is lost; harmless for the proxy path (upstream always reads to EOF) but worth a doc-comment.
-
-### Security
-
- **Body-tap memory bounds.** Discussed above — bounded by `MaxBodyCapBytes = 1 MiB` per direction (`types.go:77`) and the shared `Budget` (default 256 MiB). The concerning case is the **deep-copy in `cloneInputFor`** (`chain.go:300-306`): every middleware invocation gets its **own copy** of `Body` and `RespBody`. A chain of N middlewares with a 1 MiB body allocates N MiB of transient bytes per request. With `MaxMiddlewaresPerChain = 16` (`types.go:103`) that's up to 16 MiB extra per in-flight request. Worth pricing into the budget model.
- **Header redaction completeness.** `denyHeaders` (`headerpolicy.go:5-17`) covers the auth/forwarding family and framing (`Content-Length`, `Transfer-Encoding`, `Trailer`). `denyHeaderPrefixes` covers `X-Authenticated-*`, `X-Forwarded-*`, `X-Remote-*`, `X-NetBird-*`. Notably absent: `Range`, `If-Match`/`If-None-Match` (mutation could cause cache poisoning), `Origin`/`Referer`. Not necessarily wrong, but worth a deliberate decision.
- **Metadata key collisions across middlewares.** The accumulator has no cross-middleware uniqueness check; two middlewares with the same key in their allowlist can both emit it, and both copies land in `merged` (`metadata.go:51-99`). Downstream consumers must tolerate duplicates. Worth documenting.
- **Deny rendering.** `RenderDenyResponse` only allows codes matching `^[a-z][a-z0-9._-]{0,63}$` (`decision.go:9`), redacts/truncates message + detail values, caps `Details` at 8 entries (`decision.go:42-50`), and clamps the status to `[400,499]` excluding `401` (`decision.go:65-73`). The deny body type is fixed; middlewares cannot inject arbitrary JSON.
-
-### Concurrency
-
- **Per-request state vs shared state in factories.** Each `Factory.New` is called once per chain build; the returned `Middleware` instance is **shared across all requests** for that chain. `Invoke` must be reentrant. The framework does not enforce this — a buggy middleware that holds per-call state on the struct will silently race. Suggest a `// Invoke must be safe for concurrent use` doc on the interface.
- **`chainTable` clone-on-write** is correct, but `addChain`/`removeChain` mutate the *cloned* table before the swap (`manager.go:71-108`), and they're called under `writeMu`. Readers only ever see the post-swap pointer. Good.
- **`Chain.inflight` WaitGroup**. `Run*` does `Add(1)`/`Done()` (`chain.go:142-143`, `chain.go:194-195`, `chain.go:225-226`); `Close` waits on it bounded by ctx (`chain.go:75-85`). One concern: a *new* `RunRequest` can `Add(1)` *after* `Close` started waiting if the caller still holds a stale chain pointer. That is safe while the counter is already > 0 at `Wait` time, but the `sync.WaitGroup` contract requires an `Add` that starts from a zero counter to happen before `Wait`; an `Add` racing with `Wait` at zero is misuse and can panic. `Close` is documented one-shot, so single-`Wait` is fine, but callers must drop the chain reference before calling `Close`. Worth a code comment near `Close`.
- **Goroutine leaks.** `Dispatcher.Invoke` spawns one goroutine per call and *always* writes to a buffered (cap=1) channel (`dispatcher.go:62-76`), so even if the timeout fires the goroutine completes its send and exits. No leak.
- **`closeChainsAsync`** detaches retired chains into a goroutine (`manager.go:326-346`). If `Manager` is never GC'd this is fine, but there's no shutdown hook to wait on outstanding closes. Reviewers should confirm the proxy shutdown path explicitly drains in-flight requests before tearing down `Manager`, or accept that the last chain-close round may be cut short on exit.
-
-### Performance
-
- **Allocations per request.** `cloneInputFor` allocates new slices for `Headers`, `RespHeaders`, `Metadata`, `Body`, `RespBody`, `UserGroups`, `UserGroupNames` — once per middleware per request. For a typical 5-middleware chain on a 1 KiB body that's ~10 small slice allocs plus one `Body` copy each. Not a hot-path crisis, but `sync.Pool` for the per-call `Input` would be a natural follow-up.
- **Accumulator allocates a fresh `allowSet` per `Emit` call** (`metadata.go:55-58`). One per middleware per slot pass = up to 48 per request. Cheap, but worth noting.
- **Regex cost.** `Scan` runs five regex passes on every accepted metadata value (`redaction.go:25-51`). Bounded by `MaxMetadataValueBytes = 4 KiB` so worst case is small.
-
-### Observability
-
- **Per-middleware metrics.** `proxy.middleware.requests_total{middleware,target_id,outcome}` (`metrics.go:34-41`), `duration_ms`, `invocations_total`, `errors_total{kind}`, `metadata_rejected_total{reason}`, `header_mutation_blocked_total{header}`, `capture_bypass_total{reason}`. Comprehensive surface; operators can alert on `errors_total{kind=panic}` and `errors_total{kind=timeout}` separately. **Latency histogram is in milliseconds with default OTel buckets**: the defaults cover the 10ms–5s timeout range adequately, but a custom bucket set centred on 1–500ms would resolve the agent-network response-parser tail better.
- **Decision logs.** Panic logs (`dispatcher.go:69`) include `request_id`, type, and stack but not the panic value (safe). `Chain.Close` logs middleware-close errors at debug (`chain.go:91`). `applyMutations` logs body-replace rejections at warn (`chain.go:278`). No log on the deny path itself — by design, since the access-log terminal middleware is expected to record outcomes.
-
-## Test coverage
-
-| Test file | Locks down |
-| --------- | ---------- |
-| `proxy/internal/middleware/chain_test.go:77` | `RunRequest` threads metadata across on_request middlewares (regression for the "later mw can't see earlier mw's emissions" bug). |
-| `chain_test.go:110` | `RunResponse` reverse-order threading. |
-| `chain_test.go:142` | `cost_meter`-shaped scenario: response_parser registered after cost_meter still emits *before* cost_meter sees the bag (guards the `cost.skipped=missing_tokens` regression). |
-| `chain_test.go:178` | `UpstreamRewrite` last-write-wins. |
-| `chain_test.go:206` | No middleware emits → nil rewrite. |
-| `chain_test.go:224` | Rewrite filtered when `CanMutate=false`. |
-| `chain_test.go:245` | `Input.UserGroups` propagates verbatim through `cloneInputFor`. |
-| `chain_test.go:304` | Terminal middlewares see the full accumulated bag + prior terminal emissions. |
-
-**Gaps** worth raising with the author:
- No direct test for `Dispatcher.Invoke` timeout / panic / fail-mode behaviour at the framework level (covered indirectly by built-in tests, but a unit test pinning `errors_total{kind=...}` labels would be cheap insurance).
- No test for `bodytap.CaptureRequest` truncated replay (the upstream-sees-full-body invariant is exactly the kind of thing a regression would silently break).
- No test for `Budget` exhaustion behaviour under concurrency.
- No test for `Manager.InvalidateMiddleware` + `LiveServiceCheck` race (the auth-revocation race the comment at `manager.go:33-38` calls out is the load-bearing reason for `LiveServiceCheck`).
-
-## Known limitations / explicit non-goals
-
- **No middleware-to-middleware RPC.** Side-channel is metadata only.
- **No streaming body inspection.** Middlewares see a bounded prefix; SSE / chunked parsing happens against that prefix in the response middleware.
- **No per-spec priority.** Order is registration order in the spec slice.
- **No retry / circuit-breaker** on middleware errors. Fail-mode is binary (open/closed) and per-spec.
- **Mutations cannot rewrite the request URL path or query** — only `RewriteUpstream` can change scheme/host (+ optional path replacement, see `types.go:218-235`).
- **Redaction is best-effort.** Explicitly documented in `redaction.go:8-13`. Not a DLP solution.
-
-## Cross-references
-
- Upstream wire shape: [../modules/10-shared-api.md](10-shared-api.md) (Spec/RawConfig encoding from management).
- Built-in middlewares using this framework: [../modules/31-proxy-middleware-builtin.md](31-proxy-middleware-builtin.md).
- Runtime wiring (where `Manager`, `Chain`, and `bodytap` are consumed by the HTTP handler): [../modules/33-proxy-runtime.md](33-proxy-runtime.md).
- End-to-end request flow including capture + chain dispatch: [../01-end-to-end-flows.md](../01-end-to-end-flows.md).
- Top-level architecture: [../00-overview.md](../00-overview.md).
--- a/docs/agent-networks/modules/31-proxy-middleware-builtin.md
+++ b/docs/agent-networks/modules/31-proxy-middleware-builtin.md
@@ -1,365 +0,0 @@
-# proxy/middleware-builtin — the LLM chain
-
-The registry-mounted middleware set the proxy executes on every agent-network
-LLM request. The two highest-blast-radius areas are the **capture-pointer
-semantics** and the **limit_check ⇒ limit_record** record-once invariant.
-
-Sibling module: [32-proxy-llm-parsers.md](./32-proxy-llm-parsers.md) — the SDK
-adapters + pricing catalog this chain delegates to.
-
---
-
-## Module boundary
-
-This module is the registry-mounted middleware set the proxy executes on
-every agent-network LLM request. Each sub-package registers itself via
-`init()`
-([builtin.go:32–34](../../../proxy/internal/middleware/builtin/builtin.go));
-the proxy server anonymous-imports the set
-([all_test.go:11–19](../../../proxy/internal/middleware/builtin/all_test.go))
-so the registry is populated at boot. The chain is wired by the management
-synthesiser and executed by the framework
-(`proxy/internal/middleware/{chain,dispatcher,accumulator}.go` — both out
-of scope). Everything here reads from / writes to one envelope: the
-`middleware.KV` metadata bag plus `middleware.Mutations` for header/body
-rewrites.
-
-## The 8 middlewares
-
-| Name | Slot | Inputs (metadata read) | Outputs (metadata written) | Side effects |
-|---|---|---|---|---|
-| `llm_request_parser` | OnRequest | `Input.{URL,Body,BodyTruncated}` | `llm.{provider,model,stream,request_prompt_raw,capture_truncated}` | none |
-| `llm_router` | OnRequest | `llm.model`, `Input.{URL,UserGroups}` | `llm.{resolved_provider_id,authorising_groups}`, `llm_policy.{decision,reason}` | upstream rewrite + auth strip/inject |
-| `llm_limit_check` | OnRequest | `llm.{resolved_provider_id,model}`, `Input.{AccountID,UserID,UserGroups}` | `llm.{selected_policy_id,attribution_group_id,attribution_window_seconds}`, `llm_policy.{decision,reason}` | gRPC `CheckLLMPolicyLimits` |
-| `llm_identity_inject` | OnRequest | `llm.{resolved_provider_id,authorising_groups}`, `Input.{UserEmail,UserID,UserGroups,UserGroupNames}` | none | header strip/inject + optional body rewrite |
-| `llm_guardrail` | OnRequest | `llm.{model,request_prompt_raw}` | `llm_policy.{decision,reason}`, `llm.request_prompt` | none (model allowlist deny) |
-| `llm_response_parser` | OnResponse | `llm.provider`, `Input.{RespHeaders,RespBody,Status}` | `llm.{input,output,total,cached_input,cache_creation}_tokens`, `llm.response_completion` | none |
-| `cost_meter` | OnResponse | `llm.{provider,model}`, token buckets | `cost.usd_total` or `cost.skipped` | pricing lookup |
-| `llm_limit_record` | OnResponse | `llm.{attribution_group_id,attribution_window_seconds,input_tokens,output_tokens}`, `cost.usd_total` | none | gRPC `RecordLLMUsage` |
-
-[all_test.go:26–40](../../../proxy/internal/middleware/builtin/all_test.go)
-locks the ID set; adding or removing one is a conscious extension.
-
-## Files
-
-| File | LOC | Notes |
-|---|---:|---|
-| `builtin.go` | 86 | Registry + `FactoryContext` (ctx, data dir, meter, logger, mgmt client) |
-| `all_test.go` | 41 | Locks the 8-ID registry surface |
-| `agentnetwork_chain_integration_test.go` | 319 | Live sqlite + real gRPC bufconn; gate→recorder wire path |
-| `llm_request_parser/*` | 162 / 66 / 356 | Provider detection, body parse, prompt extraction with capture-pointer gating |
-| `llm_router/*` | 385 / 84 / 586 | Three-pass route selection (model → groups → path-prefix) |
-| `llm_limit_check/*` | 196 / 38 / 182 | Pre-flight `CheckLLMPolicyLimits` (2s, fail-open) |
-| `llm_identity_inject/*` | 440 / 108 / 666 | HeaderPair (LiteLLM) + JSONMetadata (Portkey) + ExtraHeaders |
-| `llm_guardrail/*` | 176 / 82 / 75 / 219 / 217 | Model allowlist + optional prompt capture with PII redaction |
-| `llm_response_parser/*` | 258 / 222 / 43 / 433 / 169 / 111 | Buffered + SSE accumulation; AWS event-stream accumulator (`streaming_bedrock.go`) for Bedrock; capture-pointer gates completion emit |
-| `cost_meter/*` | 181 / 84 / 439 | Token → USD via `proxy/internal/llm/pricing` |
-| `llm_limit_record/*` | 144 / 35 / 191 | Post-flight `RecordLLMUsage` (5s, debug-on-error) |
-
-## Per-middleware
-
-### llm_request_parser
-
-Detects the LLM provider via `llm.DetectParser` (URL sniff) or by name via
-`llm.ParserByName` when synthesiser stamps `provider_id`
-([middleware.go:96–99](../../../proxy/internal/middleware/builtin/llm_request_parser/middleware.go)).
-**Path-routed providers short-circuit first:** `parseVertexPath` and
-`parseBedrockPath` ([middleware.go:85–94](../../../proxy/internal/middleware/builtin/llm_request_parser/middleware.go))
-pull the model + vendor out of the URL before parser selection runs — Vertex
-from `/v1/projects/.../publishers/{pub}/models/{model}:{action}` (publisher →
-vendor via `vertexPublisherVendor`), Bedrock from `/model/{id}/{action}` with
-`normalizeBedrockModel` stripping the region prefix + version suffix. See
-[50-path-routed-providers.md](./50-path-routed-providers.md) for the full path
-grammar. For body-routed providers it decodes the body into `RequestFacts`
-(model + stream) and extracts the prompt. On
-`capture_prompt=true` (or absent — see capture-pointer semantics below) the
-prompt is run through `llm_guardrail.RedactPII` when `redact_pii=true` and
-truncated rune-safely to 3500 bytes
-([middleware.go:109–122](../../../proxy/internal/middleware/builtin/llm_request_parser/middleware.go)).
-**Key invariant:** redaction is parser-side, not guardrail-side — access-log
-reads `llm.request_prompt_raw` directly.
-
-### llm_router
-
-Three-pass route selection in `matchRoute`
-([middleware.go:241–300](../../../proxy/internal/middleware/builtin/llm_router/middleware.go)):
-filter by `Models` claim → vendor-pin (a vendor-tagged request never crosses to
-another vendor's route) → filter by `AllowedGroupIDs` intersection → model
-precedence over path → tie-break by longest `UpstreamPath` prefix match.
-Model-miss returns `llm_policy.model_not_routable`; known-but-unauthorised
-returns `llm_policy.no_authorised_provider`. **Key invariant:** auth-header
-strip+inject rides on `UpstreamRewrite.{StripHeaders,AuthHeader}`
-([middleware.go:606–646](../../../proxy/internal/middleware/builtin/llm_router/middleware.go))
-— NOT `HeadersAdd/HeadersRemove` — because the framework's mutation gate
-blocks `Authorization` on the generic header path.
-
-**Path-routed providers route before the model table.** `Invoke` checks
-`isVertexPath` / `isBedrockPath`
-([middleware.go:138–216](../../../proxy/internal/middleware/builtin/llm_router/middleware.go))
-ahead of the model lookup, so a path-carried model can't be claimed by a
-same-vendor body-routed provider. `matchPathRoute` enforces the route's `Models`
-allowlist (empty = catch-all) even though the model came from the URL.
-Three path-only behaviours:
- **Vertex unmeterable publisher** — when `llm_request_parser` emits no
-  `llm.provider` (e.g. Gemini/`google`), the router denies with
-  `llm_policy.unmeterable_publisher` (403) rather than forward it uncounted.
- **GCP token minting** — when the route carries `GCPServiceAccountKeyB64`
-  (set from a `keyfile::` api_key), `gcpBearer` mints + caches a short-lived
-  OAuth2 token per request instead of injecting a static value; a bad key or
-  unreachable token endpoint denies with `llm_policy.upstream_auth_failed`
-  (502). Bedrock uses its static bearer token directly (no minting).
- **`/bedrock` prefix** — an optional `/bedrock` gateway-namespace prefix is
-  accepted and stripped via `RewriteUpstream.StripPathPrefix` so the native
-  `/model/...` path reaches the upstream.
-
-Full treatment in [50-path-routed-providers.md](./50-path-routed-providers.md).
-
-### llm_limit_check
-
-Pre-flight gate. Reads `llm.resolved_provider_id`, calls
-`CheckLLMPolicyLimits` with a 2s context timeout
-([middleware.go:24, 97–106](../../../proxy/internal/middleware/builtin/llm_limit_check/middleware.go)),
-on allow stamps `llm.selected_policy_id`, `llm.attribution_group_id`,
-`llm.attribution_window_seconds`. **Key invariant:** fail-open. Nil
-`MgmtClient`, empty provider id, or RPC error returns `allowNoAttribution()`
-— management outage doesn't take down every LLM request. Operators audit via
-the access-log; a future flag may switch this to fail-closed.
-
-### llm_identity_inject
-
-Dispatches per-rule between LiteLLM-shaped `HeaderPair`
-([middleware.go:169](../../../proxy/internal/middleware/builtin/llm_identity_inject/middleware.go))
-and Portkey-shaped `JSONMetadata`
-([middleware.go:292](../../../proxy/internal/middleware/builtin/llm_identity_inject/middleware.go)).
-Identity is the peer's email (or `UserID` fallback); tags are the
-**authorising-groups intersection** emitted by `llm_router`, not the full
-`UserGroups` — a peer in 5 groups authorised under 1 only tags as that 1.
-**Anti-spoof:** every `HeadersAdd` is preceded by a `HeadersRemove` of the
-same name; the framework runs `Remove` before `Add` so client-supplied
-identity never reaches the upstream. Body-level inject (`tags_in_body`,
-`end_user_id_in_body`) is skipped on empty / truncated / non-JSON bodies so
-header attribution stays intact.
-
-### llm_guardrail
-
-Model allowlist deny + optional prompt-capture-with-redaction. Allowlist
-match is case-insensitive via `normaliseModel`; empty allowlist disables the
-check. Prompt capture reads `llm.request_prompt_raw` and emits
-`llm.request_prompt` only when `prompt_capture.enabled`
-([middleware.go:149–165](../../../proxy/internal/middleware/builtin/llm_guardrail/middleware.go)).
-**Key invariant:** `RedactPII` is the exported function the parsers call, so a
-single PII contract covers all three content keys (`llm.request_prompt_raw`,
-`llm.request_prompt`, `llm.response_completion`).
-
-### llm_response_parser
-
-Buffered and SSE paths share one `Invoke`
-([middleware.go:102–127](../../../proxy/internal/middleware/builtin/llm_response_parser/middleware.go)):
-content-type sniffing dispatches to `invokeBuffered` (JSON, status<400) or
-`invokeStreaming` (text/event-stream, partial bodies tolerated). Streaming
-delegates to `accumulateStream`
-([streaming.go:21–30](../../../proxy/internal/middleware/builtin/llm_response_parser/streaming.go))
-using `llm.NewScanner`. A third path, `accumulateBedrockStream`
-([streaming_bedrock.go](../../../proxy/internal/middleware/builtin/llm_response_parser/streaming_bedrock.go)),
-decodes the AWS binary event-stream (`application/vnd.amazon.eventstream`)
-returned by Bedrock's `-stream` actions — InvokeModel `chunk` frames wrap a
-base64 Anthropic event, Converse frames carry text + a trailing usage block.
-Cached / cache-creation buckets emit only when non-zero, preserving the existing
-token schema.
-
-### cost_meter
-
-Reads `llm.provider` + `llm.model` + token buckets, looks up per-1k rate via
-`pricing.Loader`, emits `cost.usd_total` or a closed-set `cost.skipped`
-reason (`missing_provider/model/tokens`, `unparseable_tokens`, `zero_tokens`,
-`unknown_model`). Loader's hot-reload goroutine is bound to proxy-lifetime
-context via `startReloader`. **Key invariant:** provider-shape switch lives
-in `pricing.Table.Cost` (sibling doc) — `cost_meter` stays provider-agnostic.
-
-### llm_limit_record
-
-Post-flight write. Always returns `DecisionAllow`; response has already been
-served so RPC errors mustn't surface (logged at `Debugf`). Skip-on-no-signal
-at line 81 (zero tokens + zero cost). **Key invariant:** the
-skip-on-missing-attribution guard at line 98 is a safety net independent of
-the framework's deny short-circuit — if the gate denied and the framework
-still runs the recorder, the recorder skips on absent
-`UserID`+`groupID`+`UserGroups` and no phantom counter materialises.
-
-## Full-chain diagram (canonical order)
-
-```mermaid
-flowchart TD
-    A[HTTP request] --> B[llm_request_parser<br/>OnRequest]
-    B -->|llm.provider, llm.model,<br/>llm.stream, llm.request_prompt_raw| C[llm_router<br/>OnRequest]
-    C -->|llm.resolved_provider_id,<br/>llm.authorising_groups,<br/>upstream rewrite + auth| D[llm_limit_check<br/>OnRequest]
-    D -->|deny path| Z1[403 llm_policy.*]
-    D -->|allow + llm.selected_policy_id,<br/>llm.attribution_group_id,<br/>llm.attribution_window_seconds| E[llm_identity_inject<br/>OnRequest]
-    E -->|header strip+inject<br/>+ optional body rewrite| F[llm_guardrail<br/>OnRequest]
-    F -->|deny: model_blocked| Z2[403 llm_policy.model_blocked]
-    F -->|allow + llm.request_prompt| G[upstream LLM call]
-    G --> H[llm_response_parser<br/>OnResponse]
-    H -->|llm.{input,output,total,cached_input,cache_creation}_tokens,<br/>llm.response_completion| I[cost_meter<br/>OnResponse]
-    I -->|cost.usd_total or cost.skipped| J[llm_limit_record<br/>OnResponse]
-    J --> K[response to client]
-```
-
-## limit_check ⇒ limit_record record-once invariant
-
-```mermaid
-sequenceDiagram
-    participant LC as llm_limit_check
-    participant M as management gRPC
-    participant U as upstream LLM
-    participant LR as llm_limit_record
-    participant DB as sqlite consumption table
-
-    LC->>M: CheckLLMPolicyLimits (2s)
-    alt allow
-        M-->>LC: selected_policy_id, attribution_group_id, window_s
-        LC->>U: stamps attribution metadata
-        U-->>LR: response + tokens (via llm_response_parser + cost_meter)
-        LR->>M: RecordLLMUsage (5s, debug-on-error)
-        M->>DB: increment (user, group, window) row
-    else deny
-        M-->>LC: llm_policy.token_cap_exceeded
-        Note over LR: framework short-circuits; even if invoked,<br/>recorder skips on absent UserID+groupID+UserGroups
-    else mgmt nil / rpc error
-        LC-->>LC: allowNoAttribution() — fail open
-        Note over LR: no window_s ⇒ recorder books only account-level<br/>budget rules (which run independently)
-    end
-```
-
-The integration test
-[agentnetwork_chain_integration_test.go](../../../proxy/internal/middleware/builtin/agentnetwork_chain_integration_test.go)
-exercises all three branches against a real sqlite store + bufconn gRPC —
-no mocks. Tests: `TestChain_AllowPath_StampsAttributionAndRecordsCounter`
-(line 130), `TestChain_DenyPath_GateRejectsAndNoConsumptionWritten` (line
-207), `TestChain_CapExhaustTransition` (line 265).
-
-## Public contracts (per-middleware JSON config)
-
-| Middleware | Config shape |
-|---|---|
-| `llm_request_parser` | `{provider_id?, redact_pii?, capture_prompt?: *bool}` ([factory.go:19–37](../../../proxy/internal/middleware/builtin/llm_request_parser/factory.go)) |
-| `llm_router` | `{providers: [{id, models, upstream_scheme, upstream_host, upstream_path?, auth_header_name, auth_header_value, allowed_group_ids}]}` |
-| `llm_limit_check` | `{}` — pulls `MgmtClient` from `FactoryContext` |
-| `llm_identity_inject` | `{providers: [{provider_id, header_pair?|json_metadata?, extra_headers?}]}` |
-| `llm_guardrail` | `{model_allowlist: []string, prompt_capture: {enabled, redact_pii}}` |
-| `llm_response_parser` | `{redact_pii?, capture_completion?: *bool}` |
-| `cost_meter` | `{pricing_path?}` (basename inside data-dir; defaults `pricing.yaml`) |
-| `llm_limit_record` | `{}` — same pattern as `llm_limit_check` |
-
-All factories accept empty / null / `{}` / whitespace as zero-value config;
-only structurally invalid JSON is rejected so misconfig surfaces at chain
-build time.
-
-## Invariants
-
-1. **limit_check ↔ limit_record paired.** They MUST appear together. Gate
-   stamps attribution metadata on the request leg; recorder reads it on the
-   response leg. If a chain contains only the recorder, the
-   skip-on-missing-attribution guard at
-   [llm_limit_record/middleware.go:81–87, 98–103](../../../proxy/internal/middleware/builtin/llm_limit_record/middleware.go)
-   keeps counters consistent but no enforcement runs. Only-gate means
-   counters never tick and headroom appears infinite.
-
-2. **`capture_prompt` / `capture_completion` pointer semantics.** Both are
-   `*bool`. `nil` = "preserve legacy emit" (back-compat default for
-   non-agent-network callers and pre-toggle tests). `false` = suppress the
-   key entirely (access-log row carries zero prompt / completion content).
-   `true` = emit. The synthesiser sets the pointer explicitly to the
-   account's `EnablePromptCollection` toggle. The handling lives
-   in [llm_request_parser/factory.go:55–61](../../../proxy/internal/middleware/builtin/llm_request_parser/factory.go)
-   and the symmetric [llm_response_parser/middleware.go:62–68](../../../proxy/internal/middleware/builtin/llm_response_parser/middleware.go);
-   a missing pointer must not be treated as `false` (that would suppress
-   capture for legacy non-agent-network callers).
-   `redact_pii` is an orthogonal `bool` controlling **form** of emitted
-   content, not whether it's emitted.
-
-3. **`redact_pii` is parser-side.** Both parsers import
-   `llm_guardrail.RedactPII` and run it BEFORE stamping the metadata bag.
-   Load-bearing because the access-log sink reads `llm.request_prompt_raw`
-   and `llm.response_completion` directly — by the time `llm_guardrail`
-   runs its own pass on `llm.request_prompt`, the raw key has already been
-   stamped. Tests: `TestInvoke_RedactPii_RedactsBeforeEmittingRawPrompt`,
-   `TestInvoke_RedactPii_RedactsCompletionBeforeEmit`.
-
-4. **Metadata allowlist enforcement.** Every middleware declares
-   `MetadataKeys()`. The framework accumulator drops any KV outside that
-   allowlist. When adding a new key, also extend the docstring in
-   `middleware/keys.go`.
-
-5. **Closed deny-code set.** All deny paths emit one of:
-   `llm_policy.model_not_routable`, `llm_policy.no_authorised_provider`,
-   `llm_policy.model_blocked`, `llm_policy.token_cap_exceeded`,
-   `llm_policy.unmeterable_publisher` (path-routed Vertex publisher with no
-   parser → 403), `llm_policy.upstream_auth_failed` (GCP token mint failure →
-   502), or the management-supplied code on `llm_limit_check`. These surface
-   verbatim; arbitrary middleware text never reaches the wire.
-
-## Things to scrutinise
-
-**Correctness.** `llm_router` model match treats an empty `Models` slice as
-"claim every model"
-([middleware.go:238–248](../../../proxy/internal/middleware/builtin/llm_router/middleware.go))
-for gateway-style providers — confirm no real provider record ships with an
-empty `Models` by accident. Path-prefix tie-break falls back to declaration
-order when no candidate prefix-matches, so the synthesiser must emit a
-deterministic order. `llm_limit_record` discards `strconv.ParseInt` errors
-([middleware.go:78–80](../../../proxy/internal/middleware/builtin/llm_limit_record/middleware.go))
-— relies on `llm_response_parser` always emitting parseable values; spot-check
-the streaming partial path on truncated bodies.
-
-**Security.** Auth headers must NEVER appear on `Mutations.HeadersAdd/Remove`
-for the router — a direct headers path would bypass the framework gate. The
-capture-pointer handling is the kind of place a bug ships PII to logs
-silently; every synthesiser config path must set the pointer explicitly.
-`llm_identity_inject` body inject silently skips on a
-non-object `metadata` field
-([middleware.go:262–270](../../../proxy/internal/middleware/builtin/llm_identity_inject/middleware.go))
-— header path still attributes, but body-level tag-budget enforcement
-doesn't run for that request.
-
-**Concurrency.** `cost_meter` shares a `pricing.Loader` via
-`atomic.Pointer[Table]`; readers always see a consistent table. Every
-middleware is a stateless value receiver. Integration test uses real bufconn
-gRPC — race detector is the meaningful bar.
-
-**Perf.** Hot path is `lookupKV` linear scan over <10 KVs; `cost_meter.Cost`
-is O(1); SSE accumulation is single-pass. No map allocation per call.
-
-**Observability.** Every deny stamps `llm_policy.decision=deny` and a
-matching `llm_policy.reason` — access-log can pivot on either.
-`llm_limit_record` only logs at `Debugf` on RPC failure
-([middleware.go:125–130](../../../proxy/internal/middleware/builtin/llm_limit_record/middleware.go));
-operators need an alternate signal (metric on `RecordLLMUsage` failures) for
-counter accuracy.
-
-## Test coverage
-
-| File | Tests | Notes |
-|---|---:|---|
-| `all_test.go` | 1 | Registry surface lock |
-| `agentnetwork_chain_integration_test.go` | 3 | Allow/deny/cap-exhaust vs live sqlite + bufconn gRPC |
-| `llm_request_parser/middleware_test.go` | 18 | `provider_id` bypass, redaction, capture-pointer, rune-safe truncation |
-| `llm_router/middleware_test.go` | 19 | Three-pass match, deny codes, path-prefix tie-break, header strip+inject |
-| `llm_limit_check/middleware_test.go` | 6 | Allow/deny, fail-open on nil mgmt / RPC error, attribution stamping |
-| `llm_identity_inject/middleware_test.go` | 28 | HeaderPair, JSONMetadata, ExtraHeaders, body inject, anti-spoof |
-| `llm_guardrail/middleware_test.go` | 15 | Allowlist case-insensitivity, prompt capture toggle, deny shape |
-| `llm_guardrail/redact_test.go` | 15 | Email, SSN, phone (E.164 + NA), bearer, IPv4; fixture-driven |
-| `llm_response_parser/middleware_test.go` | 18 | Buffered OAI+Anthro, capture-pointer, redact, truncation |
-| `llm_response_parser/streaming_test.go` | 7 | OAI usage frame, Anthro message_delta, truncated body best-effort |
-| `cost_meter/middleware_test.go` | 17 | Each skip reason, provider-shape, pricing loader integration |
-| `llm_limit_record/middleware_test.go` | 7 | Skip-on-no-signal, skip-on-missing-attribution, RPC failure swallowed |
-
-## Cross-references
-
- Sibling: [32-proxy-llm-parsers.md](./32-proxy-llm-parsers.md) — SDK adapters
-  + SSE framer + pricing loader.
- Path-routed providers (Vertex AI + Bedrock), `keyfile::` credential, GCP
-  token minting, `/bedrock` prefix:
-  [50-path-routed-providers.md](./50-path-routed-providers.md).
- Upstream config: `management/server/agentnetwork/synthesizer` (out of scope).
- Framework: `proxy/internal/middleware/{chain,dispatcher,accumulator,registry}.go`.
- Metadata key registry: `proxy/internal/middleware/keys.go`.
- gRPC surface: `proto.ProxyServiceClient.{CheckLLMPolicyLimits,RecordLLMUsage}`.
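Reviewer sketch for invariant 2 above (`capture_prompt` pointer semantics): a minimal Go illustration of the tri-state `*bool`, where `nil` preserves legacy emit, `false` suppresses, and `true` emits. The struct mirrors the documented config shape `{provider_id?, redact_pii?, capture_prompt?: *bool}`, but `parserConfig`, `parseConfig`, and `shouldEmitPrompt` are hypothetical names, not the code in `factory.go`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// parserConfig is a hypothetical mirror of the documented JSON config.
type parserConfig struct {
	ProviderID    string `json:"provider_id"`
	RedactPII     bool   `json:"redact_pii"`
	CapturePrompt *bool  `json:"capture_prompt"`
}

// parseConfig treats empty / null / {} / whitespace as zero-value config and
// rejects only structurally invalid JSON.
func parseConfig(raw []byte) (parserConfig, error) {
	var cfg parserConfig
	if len(bytes.TrimSpace(raw)) == 0 {
		return cfg, nil
	}
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

// shouldEmitPrompt implements the tri-state rule: a missing pointer must NOT
// be read as false, otherwise legacy callers would silently lose capture.
func shouldEmitPrompt(cfg parserConfig) bool {
	if cfg.CapturePrompt == nil {
		return true // preserve legacy emit
	}
	return *cfg.CapturePrompt
}

func main() {
	for _, raw := range []string{`{}`, `{"capture_prompt":false}`, `{"capture_prompt":true}`} {
		cfg, err := parseConfig([]byte(raw))
		if err != nil {
			fmt.Println("invalid config:", err)
			continue
		}
		fmt.Printf("%-26s emit=%v\n", raw, shouldEmitPrompt(cfg))
	}
}
```

A plain `bool` field could not distinguish "unset" from "explicitly off", which is why the pointer matters for the synthesiser's explicit `EnablePromptCollection` mapping.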
--- a/docs/agent-networks/modules/32-proxy-llm-parsers.md
+++ b/docs/agent-networks/modules/32-proxy-llm-parsers.md
@@ -1,392 +0,0 @@
-# proxy/llm-parsers — SDK adapters + pricing + SSE
-
-The runtime-agnostic LLM library. It covers the OpenAI Responses API
-(`/v1/responses`) and the older Chat Completions API (`/v1/chat/completions`),
-the Anthropic Messages API (`/v1/messages`), the SSE wire format (`event:` /
-`data:` lines, `\n\n` framing, CRLF tolerance), and per-provider token
-accounting (OpenAI's cached-prompt **subset** vs Anthropic's cache_read
-**additive** model). The pricing table's per-provider cost formula is the
-highest-leverage place a small bug would silently mis-bill operators.
-
-Sibling module: [31-proxy-middleware-builtin.md](./31-proxy-middleware-builtin.md)
-— the 8 middlewares that consume this package's parsers + pricing loader.
-
---
-
-## Module boundary
-
-`proxy/internal/llm` is the runtime-agnostic LLM library shared by every
-middleware that needs to understand provider-specific shapes. Its contents:
-
- `parser.go` — `Parser` interface, `Provider` enum, public factories
-  (`Parsers`, `DetectParser`, `ParserByName`).
- `openai.go` / `anthropic.go` / `bedrock.go` — per-provider `Parser` impls.
- `sse.go` — SSE scanner (`Scanner`, `Event`, `NewScanner`).
- `errors.go` — sentinels callers branch on with `errors.Is`.
- `pricing/` — embedded-default + hot-reload override table with
-  symlink-safe Unix loader (build-tagged stub elsewhere).
- `fixtures/` — captured request/response/stream bodies the tests replay.
-
-The package carries zero proxy-framework dependencies so the same parsers can
-be reused later by a WASM adapter
-([parser.go:1–6](../../../proxy/internal/llm/parser.go)).
-
-## Files
-
-| File | LOC | Notes |
-|---|---:|---|
-| `parser.go` | 104 | Interface + factories + `Provider{Unknown,OpenAI,Anthropic}` enum |
-| `openai.go` | 347 | Chat Completions + Completions + Responses API; cached_tokens subset |
-| `openai_test.go` | 222 | 11 tests; fixture replay + cached/Responses-API matrix |
-| `anthropic.go` | 172 | Messages + legacy `/v1/complete`; cache_read + cache_creation additive |
-| `anthropic_test.go` | 154 | 7 tests including streaming-extraction-skipped contract |
-| `bedrock.go` | 190 | AWS Bedrock InvokeModel (snake_case) + Converse (camelCase) response shapes; model lives in URL path |
-| `bedrock_test.go` | — | InvokeModel + Converse usage shapes; AWS event-stream content-type → `ErrStreamingUnsupported` on buffered `ParseResponse` |
-| `sse.go` | 117 | `bufio`-backed scanner; CRLF normalised; trailing-event handling |
-| `sse_test.go` | 175 | 12 tests; fixture replay + multiline + size limits |
-| `parser_test.go` | 53 | `Parsers()`, `DetectParser`, provider enum values |
-| `errors.go` | 31 | 6 sentinels: `Err{Unknown,Unsupported}Provider/Model`, `Err{NotLLM,Malformed}Response`, `ErrStreamingUnsupported`, `ErrMalformedRequest` |
-| `pricing/pricing.go` | 421 | `Loader`, `Table`, `Entry`; embedded defaults + atomic swap + mtime reload |
-| `pricing/pricing_unix.go` | 69 | `O_NOFOLLOW` + fstat-from-FD + 1 MiB cap |
-| `pricing/pricing_other.go` | 21 | Stub returning "not supported on this platform" |
-| `pricing/pricing_test.go` | 432 | 21 tests — symlink rejection, reload race, path traversal, oversize |
-| `pricing/defaults_pricing.yaml` | 85 | go:embed source of truth |
-| `fixtures/*` | 21–59 | OpenAI chat/responses/stream + Anthropic messages/stream + pricing starter |
-
-## Request body → parser dispatch
-
-```mermaid
-flowchart TD
-    A[HTTP request<br/>URL + JSON body] --> B{ParserByName?<br/>provider_id config set}
-    B -- yes --> P[matched Parser]
-    B -- no --> C[DetectParser]
-    C --> D{loop Parsers<br/>OpenAIParser, AnthropicParser, BedrockParser}
-    D -- DetectFromURL match --> P
-    D -- no match --> X[ok=false<br/>middleware skips]
-    P --> E[ParseRequest body]
-    E -->|err: ErrMalformedRequest| Y[middleware emits provider only]
-    E --> F[RequestFacts<br/>model + stream]
-    P --> G[ExtractPrompt body]
-    G --> H[joinMessages<br/>extractContentParts<br/>decodeStringOrJoin]
-    H --> I[prompt text<br/>or empty]
-    F --> J[stamps llm.model + llm.stream]
-    I --> K[stamps llm.request_prompt_raw<br/>subject to capture_prompt gate]
-```
-
-OpenAI's URL hints
-([openai.go:27–33](../../../proxy/internal/llm/openai.go)) include
-both `/v1/chat/completions` and the bare `/chat/completions` — the latter
-covers Cloudflare AI Gateway, which rewrites the canonical version segment.
-Anthropic's hints are `/v1/messages` and `/v1/complete`
-([anthropic.go:14–17](../../../proxy/internal/llm/anthropic.go)).
-Both implementations use case-insensitive substring matching so a proxy prefix
-strip / rewrite doesn't defeat detection.
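A minimal sketch of the case-insensitive substring match described above. The hint values come from the text; the helper name `detectFromURL` and the `openAIHints` variable are hypothetical stand-ins, not the package's actual identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// openAIHints copies the two OpenAI path hints named in the text.
// The variable name is hypothetical.
var openAIHints = []string{"/v1/chat/completions", "/chat/completions"}

// detectFromURL reports whether path contains any hint, ignoring case.
// Substring (not prefix) matching lets a gateway prefix sit in front.
func detectFromURL(path string, hints []string) bool {
	lower := strings.ToLower(path)
	for _, h := range hints {
		if strings.Contains(lower, strings.ToLower(h)) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(detectFromURL("/gateway/openai/Chat/Completions", openAIHints))
	fmt.Println(detectFromURL("/v1/embeddings", openAIHints))
}
```

Because the match is a substring test, a path with a gateway prefix still resolves to the OpenAI parser, while an unrelated endpoint does not.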
-
-`ParserByName` ([parser.go:93–103](../../../proxy/internal/llm/parser.go))
-is the **agent-network bypass**: the synthesiser knows which parser to use
-because it built the synth service from the catalog, so it stamps
-`provider_id` on the parser config and the middleware skips URL sniffing
-entirely. This is what makes the same parser set work whether the request
-flows to OpenAI direct, to LiteLLM, to Portkey, or to any gateway with a
-non-canonical URL shape.
-
-**Path-routed providers (Vertex AI, Bedrock) bypass both `ParserByName` and
-`DetectParser`.** The model and the parser surface live in the URL path, so the
-request middleware extracts them directly (`parseVertexPath` /
-`parseBedrockPath`) before the parser-selection step. For Vertex the publisher
-segment picks the parser (`anthropic` → Anthropic parser; `google`/Gemini →
-none, request denied as unmeterable). For Bedrock the dedicated `BedrockParser`
-handles the response. Full treatment in
-[50-path-routed-providers.md](./50-path-routed-providers.md).
-
-## Streaming response → SSE chunker → response parser → completion + token count
-
-```mermaid
-sequenceDiagram
-    participant U as upstream LLM
-    participant LR as llm_response_parser<br/>(OnResponse)
-    participant S as llm.NewScanner<br/>(SSE framer)
-    participant P as Parser-specific accumulator<br/>(accumulateOpenAIStream<br/>or accumulateAnthropicStream)
-
-    U-->>LR: text/event-stream<br/>(buffered prefix in RespBody)
-    LR->>S: NewScanner(bytes.NewReader(body))
-    loop until EOF or [DONE]
-        S-->>LR: Event{Type, Data}
-        LR->>P: dispatch per event.Type<br/>(OpenAI: data-only<br/>Anthropic: named events)
-        P-->>P: accumulate completion text<br/>track usage from final frame
-    end
-    P-->>LR: llm.Usage + completion string
-    LR->>LR: appendUsage stamps<br/>llm.{input,output,total,cached_input,cache_creation}_tokens
-    LR->>LR: truncateCompletion(3500 bytes, rune-safe)
-    LR->>LR: redactPII if redact_pii && captureCompletion
-```
-
-`Scanner.Next`
-([sse.go:44–87](../../../proxy/internal/llm/sse.go)) returns one
-event per `\n\n` boundary; multiple `data:` lines join with `\n`; comment lines
-(starting with `:`) are skipped per the SSE spec; a trailing event without a
-closing blank line is still returned before `io.EOF` so a server that closes
-the connection cleanly doesn't lose the last frame
-([sse.go:55–58](../../../proxy/internal/llm/sse.go)). CRLF is
-normalised in `trimEOL` so fixtures captured from live servers replay
-unchanged.
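The framing rules above can be sketched as follows. This is an illustration under stated assumptions, not the code in `sse.go`: the `event` type and the `readEvents` and `parseSSE` helpers are hypothetical, and the sketch omits the line-size cap that the real scanner enforces.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// event is a hypothetical stand-in for the package's Event type.
type event struct {
	Type string
	Data string
}

// readEvents frames an SSE stream: a blank line closes an event, multiple
// data: lines join with "\n", comment lines are skipped, and a trailing
// event without a closing blank line is still returned at EOF.
func readEvents(r io.Reader) ([]event, error) {
	br := bufio.NewReader(r)
	var (
		out     []event
		typ     string
		data    []string
		pending bool
	)
	flush := func() {
		if pending {
			out = append(out, event{Type: typ, Data: strings.Join(data, "\n")})
		}
		typ, data, pending = "", nil, false
	}
	for {
		line, err := br.ReadString('\n')
		if err != nil && err != io.EOF {
			return out, err
		}
		line = strings.TrimRight(line, "\r\n") // normalise CRLF and LF
		switch {
		case line == "":
			flush()
		case strings.HasPrefix(line, ":"):
			// comment line: ignored
		default:
			field, value, _ := strings.Cut(line, ":")
			value = strings.TrimPrefix(value, " ") // strip one space after ':'
			switch field {
			case "event":
				typ, pending = value, true
			case "data":
				data, pending = append(data, value), true
			}
		}
		if err == io.EOF {
			flush() // emit the trailing event, if any
			return out, nil
		}
	}
}

// parseSSE is a convenience wrapper for string input.
func parseSSE(s string) ([]event, error) {
	return readEvents(strings.NewReader(s))
}

func main() {
	evs, _ := parseSSE("event: ping\r\ndata: a\r\ndata: b\r\n\r\n: keepalive\n\ndata: tail")
	for _, e := range evs {
		fmt.Printf("%q %q\n", e.Type, e.Data)
	}
}
```

The last event in the sample input has no closing blank line, which exercises the trailing-event path.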
-
-## Per-provider
-
-### OpenAI
-
-[openai.go:54–67](../../../proxy/internal/llm/openai.go) defines
-`openAIRequest` with three prompt fields: `messages` (Chat Completions),
-`prompt` (legacy), `input` (Responses API). The decoder uses
-`json.RawMessage` so each shape is parsed lazily.
-
-`ParseResponse`
-([openai.go:117–146](../../../proxy/internal/llm/openai.go))
-accepts both naming conventions: Chat Completions returns
-`prompt_tokens`/`completion_tokens`, Responses API returns
-`input_tokens`/`output_tokens`. `pickInt64` prefers Responses-API names and
-falls back — same parser handles both endpoints without per-route config.
-`openAICachedTokens` mirrors the fallback for
-`input_tokens_details.cached_tokens` vs `prompt_tokens_details.cached_tokens`.
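One way to picture the naming fallback is the sketch below. It assumes pointer fields to tell "absent" from "zero"; the real `pickInt64` may decide differently, and `firstSet`, `usage`, and `parseUsage` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage carries both naming conventions. Pointer fields (an assumption of
// this sketch) distinguish a missing key from an explicit zero.
type usage struct {
	InputTokens      *int64 `json:"input_tokens"`
	OutputTokens     *int64 `json:"output_tokens"`
	PromptTokens     *int64 `json:"prompt_tokens"`
	CompletionTokens *int64 `json:"completion_tokens"`
}

// firstSet returns the first non-nil value, or 0 when none is present.
func firstSet(vals ...*int64) int64 {
	for _, v := range vals {
		if v != nil {
			return *v
		}
	}
	return 0
}

// parseUsage prefers Responses-API names and falls back to the
// Chat Completions names.
func parseUsage(body []byte) (in, out int64, err error) {
	var resp struct {
		Usage usage `json:"usage"`
	}
	if err = json.Unmarshal(body, &resp); err != nil {
		return 0, 0, err
	}
	u := resp.Usage
	return firstSet(u.InputTokens, u.PromptTokens),
		firstSet(u.OutputTokens, u.CompletionTokens), nil
}

func main() {
	in, out, _ := parseUsage([]byte(`{"usage":{"prompt_tokens":12,"completion_tokens":5}}`))
	fmt.Println(in, out)
	in, out, _ = parseUsage([]byte(`{"usage":{"input_tokens":7,"output_tokens":3}}`))
	fmt.Println(in, out)
}
```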
-
-**Key invariant:** `CachedInputTokens` for OpenAI is a SUBSET of
-`InputTokens`. The cost meter clamps to guard against malformed upstream
-responses where `cached > total`.
-
-### Anthropic
-
-[anthropic.go:37–49](../../../proxy/internal/llm/anthropic.go)
-defines `anthropicRequest` covering Messages API (`system` + `messages[]`)
-and legacy `/v1/complete` (`prompt` string). `ExtractPrompt` emits
-`system: <text>` first when present, then per-message `role: content`.
-
-`ParseResponse`
-([anthropic.go:82–104](../../../proxy/internal/llm/anthropic.go))
-fills three independent input-side token buckets: `InputTokens`,
-`CacheReadInputTokens`, `CacheCreationInputTokens`. The latter two are
-**additive** (not a subset of `InputTokens`). `TotalTokens` sums all four
-buckets (these three plus output tokens) so downstream dashboards render one
-"tokens" number without double-counting.
-
-`ExtractCompletion` walks `content[]` `{type, text}` parts and concatenates
-non-empty text with newlines, falling back to legacy `completion`.
-
-### Bedrock
-
-[bedrock.go](../../../proxy/internal/llm/bedrock.go) implements the
-`Parser` interface for the AWS Bedrock runtime. Bedrock is **path-routed**: the
-model lives in the URL (`/model/{id}/{action}`), so the request middleware
-extracts it (see [50-path-routed-providers.md](./50-path-routed-providers.md))
-and `ParseRequest` is a deliberate no-op. The parser's real work is on the
-response leg, covering both Bedrock body shapes:
-
- **InvokeModel** — vendor-native. Anthropic-on-Bedrock returns snake_case usage
-  (`input_tokens`, `output_tokens`, `cache_read_input_tokens`,
-  `cache_creation_input_tokens`) with the same additive cache buckets as
-  first-party Anthropic.
- **Converse** — unified camelCase (`inputTokens`, `outputTokens`,
-  `totalTokens`). `firstNonZero` folds the two naming conventions into one
-  `Usage`; when Converse omits `totalTokens` the parser sums the buckets.
-
-`ProviderName()` returns `"bedrock"` — its own `defaults_pricing.yaml` block,
-keyed by the **normalised** model id (region prefix + version suffix stripped by
-the request parser). `ParseResponse` returns `ErrStreamingUnsupported` for an
-AWS binary event-stream content-type (`application/vnd.amazon.eventstream`,
-`isAWSEventStream`) so the caller routes to the streaming accumulator instead.
-
-### SSE framing
-
-`Scanner` is `bufio`-backed, 64 KiB read buffer, 1 MiB max line so a
-malicious upstream can't blow process memory
-([sse.go:33–38, 97–100](../../../proxy/internal/llm/sse.go)).
-`splitField` strips one space after the `:` per the SSE spec. Documented
-`not safe for concurrent use`; every consumer creates a fresh scanner per
-response body. Streaming accumulators live in the middleware package
-([llm_response_parser/streaming.go](../../../proxy/internal/middleware/builtin/llm_response_parser/streaming.go))
-but use `llm.NewScanner` so the framing contract stays here.
-
-### Pricing catalog
-
-`Table.Cost`
-([pricing.go:129–174](../../../proxy/internal/llm/pricing/pricing.go))
-is the cost formula, the most security-relevant math in this module:
-
-| Provider | Formula |
-|---|---|
-| `openai` | `(inTokens − clamped) × InputPer1K + clamped × CachedInputPer1K + outTokens × OutputPer1K` where `clamped = min(cachedInput, inTokens)` |
-| `anthropic`, `bedrock` | `inTokens × InputPer1K + cachedInput × CacheReadPer1K + cacheCreation × CacheCreationPer1K + outTokens × OutputPer1K` |
-| default | `inTokens × InputPer1K + outTokens × OutputPer1K` |
-
-`bedrock` shares the Anthropic additive-cache formula
-([pricing.go:172–174](../../../proxy/internal/llm/pricing/pricing.go)):
-Anthropic-on-Bedrock reports the same additive cache buckets, while non-Anthropic
-Bedrock models (Nova, Llama) simply report zero in those buckets so cost reduces
-to `input + output`.
-
-Each per-bucket rate falls back to `InputPer1K` when zero — operators opt in
-to discounts by setting the field.
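The table and the fallback rule can be read as the sketch below. Two points are assumptions of the sketch rather than statements about `Table.Cost`: each term is divided by 1000 (as the `Per1K` field names suggest), and the zero-rate fallback is applied to the cache rates only. The `entry` type and helper names are hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for the pricing Entry type.
type entry struct {
	InputPer1K, OutputPer1K                              float64
	CachedInputPer1K, CacheReadPer1K, CacheCreationPer1K float64
}

// rateOr returns rate, or fallback when rate is zero (discount not set).
func rateOr(rate, fallback float64) float64 {
	if rate == 0 {
		return fallback
	}
	return rate
}

// per1K prices a token count at a per-1000-token rate (assumed unit).
func per1K(tokens int64, rate float64) float64 {
	return float64(tokens) / 1000 * rate
}

func cost(provider string, e entry, in, out, cached, creation int64) float64 {
	switch provider {
	case "openai":
		// Cached tokens are a subset of input: clamp before subtracting so a
		// malformed cached > input never yields a negative uncached count.
		clamped := cached
		if clamped > in {
			clamped = in
		}
		return per1K(in-clamped, e.InputPer1K) +
			per1K(clamped, rateOr(e.CachedInputPer1K, e.InputPer1K)) +
			per1K(out, e.OutputPer1K)
	case "anthropic", "bedrock":
		// Cache buckets are additive: priced on top of input tokens.
		return per1K(in, e.InputPer1K) +
			per1K(cached, rateOr(e.CacheReadPer1K, e.InputPer1K)) +
			per1K(creation, rateOr(e.CacheCreationPer1K, e.InputPer1K)) +
			per1K(out, e.OutputPer1K)
	default:
		return per1K(in, e.InputPer1K) + per1K(out, e.OutputPer1K)
	}
}

func main() {
	e := entry{InputPer1K: 1, OutputPer1K: 2, CachedInputPer1K: 0.5, CacheCreationPer1K: 4}
	fmt.Println(cost("openai", e, 1000, 500, 2000, 0))
	fmt.Println(cost("anthropic", e, 1000, 1000, 1000, 1000))
}
```

In the first call the malformed `cached = 2000` is clamped to the 1000 input tokens, so the uncached term is zero instead of negative. In the second call `CacheReadPer1K` is unset and falls back to the input rate.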
-
-`Loader`
-([pricing.go:212–268](../../../proxy/internal/llm/pricing/pricing.go))
-overlays an optional `pricing.yaml` from data-dir on top of the go:embed
-defaults. Atomic pointer swap means readers never observe a partial update.
-The mtime-poll reloader (30s default cadence) keeps the previous table on
-parse failure so cost annotation never goes blank during a botched edit.
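A sketch of the swap-on-success behaviour, assuming a toy `name=rate` format in place of YAML. The `loader`, `table`, and `parseToy` names are hypothetical; only the use of `atomic.Pointer` and the keep-previous-on-failure rule come from the text.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"sync/atomic"
)

// table is a hypothetical, immutable-after-publish pricing snapshot.
type table struct{ rates map[string]float64 }

type loader struct {
	current atomic.Pointer[table]
}

// parseToy parses a single "name=rate" entry (a stand-in for the YAML parse).
func parseToy(raw string) (*table, error) {
	name, val, ok := strings.Cut(raw, "=")
	if !ok {
		return nil, errors.New("malformed entry")
	}
	rate, err := strconv.ParseFloat(val, 64)
	if err != nil {
		return nil, err
	}
	return &table{rates: map[string]float64{name: rate}}, nil
}

// reload builds a complete new table first and publishes it with a single
// pointer store. On parse failure nothing is stored, so readers keep the
// previous table.
func (l *loader) reload(raw string) error {
	t, err := parseToy(raw)
	if err != nil {
		return err
	}
	l.current.Store(t)
	return nil
}

// get returns the current snapshot; nil until the first successful reload.
func (l *loader) get() *table { return l.current.Load() }

func main() {
	l := &loader{}
	fmt.Println(l.reload("gpt=1.5"))
	fmt.Println(l.reload("botched edit"))
	fmt.Println(l.get().rates["gpt"])
}
```

Readers only ever dereference a fully built table, which is why a partial update is never observable in this pattern.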
-
-`defaults_pricing.yaml` is the source of truth for built-in pricing.
-Operator overrides only carry the entries they want to change.
-
-## Public contracts
-
-**`Parser` interface**
-([parser.go:50–66](../../../proxy/internal/llm/parser.go)):
-
-```go
-type Parser interface {
-    Provider() Provider
-    ProviderName() string
-    DetectFromURL(path string) bool
-    ParseRequest(body []byte) (RequestFacts, error)
-    ParseResponse(status int, contentType string, body []byte) (Usage, error)
-    ExtractPrompt(body []byte) string
-    ExtractCompletion(status int, contentType string, body []byte) string
-}
-```
-
-Adding a provider means implementing this interface and appending to the
-slice returned by `Parsers()` ([parser.go:78–84](../../../proxy/internal/llm/parser.go)).
-Order matters: `DetectFromURL` ties resolve by registration order.
-`Parsers()` today returns `{OpenAIParser, AnthropicParser, BedrockParser}`.
-
-**`Provider` enum**
-([parser.go:8–18](../../../proxy/internal/llm/parser.go)):
-`ProviderUnknown = 0`, `ProviderOpenAI = 1`, `ProviderAnthropic = 2`,
-`ProviderBedrock = 3`. The numeric values are not persisted anywhere today,
-but treat them as wire-stable: new providers must take fresh numbers.
-
-**`Pricing` lookup**
-([pricing.go:129](../../../proxy/internal/llm/pricing/pricing.go)):
-
-```go
-func (t *Table) Cost(provider, model string, inTokens, outTokens, cachedInput, cacheCreation int64) (float64, bool)
-```
-
-Nil-safe: `t.Cost` on a nil receiver returns `(0, false)`
-([pricing.go:130–132](../../../proxy/internal/llm/pricing/pricing.go)).
-`ok=false` means provider or model is absent from the loaded table; the caller
-emits `cost.skipped=unknown_model`.
-
-## Invariants
-
-1. **Cross-platform pricing build.** `pricing_unix.go` carries the only
-   functional `loadPricing` (uses `syscall.O_NOFOLLOW` and `f.Stat()` on an
-   open descriptor — both Unix-only). `pricing_other.go` is a build-tag
-   fallback that returns `"not supported on this platform"`
-   ([pricing_other.go:14–16](../../../proxy/internal/llm/pricing/pricing_other.go)).
-   The proxy is Linux-only in production today; a Windows port needs an
-   equivalent path-as-handle implementation. Reviewers building on Windows
-   should expect this surface to return an error at startup if an override
-   file is configured.
-
-2. **SSE scanner handles partial chunks.** A buffered prefix that doesn't end
-   in `\n\n` still yields its accumulated event before `io.EOF`
-   ([sse.go:55–58](../../../proxy/internal/llm/sse.go)). Tests:
-   `TestSSEScanner_OpenAIFixture`, `TestSSEScanner_AnthropicFixture`,
-   `TestSSEScanner_MultilineData`, `TestSSEScanner_CRLF`. The streaming
-   accumulators ride on this: `accumulateAnthropicStream` and
-   `accumulateOpenAIStream` `break` on any scanner error to return partial
-   usage rather than aborting
-   ([streaming.go:68–73, 144–150](../../../proxy/internal/middleware/builtin/llm_response_parser/streaming.go)).
-
-3. **`defaults_pricing.yaml` is the source of truth.** Compiled into the
-   binary via `//go:embed`
-   ([pricing.go:29–30](../../../proxy/internal/llm/pricing/pricing.go)).
-   `DefaultTable()` parses once and panics on parse failure
-   ([pricing.go:42–49](../../../proxy/internal/llm/pricing/pricing.go))
-   — by design: a broken embedded YAML must not ship to production.
-
-4. **Loader path validation.** `resolveMiddlewareDataPath`
-   ([pricing.go:370–394](../../../proxy/internal/llm/pricing/pricing.go))
-   rejects absolute paths, traversal segments, and basenames that fail
-   `basenameRegex = ^[a-zA-Z0-9._-]+$`. The resolved path must remain
-   inside `baseDir` even after `filepath.Clean`. Tests:
-   `TestNewLoader_PathValidation`, `TestNewLoader_PathValidation_Extended`,
-   `TestNewLoader_SymlinkOutsideBaseDirRejected`, `TestNewLoader_SymlinkRejected`.
-
-5. **Unix loader symlink safety.** `O_NOFOLLOW` on open, `f.Stat()` on the
-   open descriptor (never re-stat by path), `info.Mode().IsRegular()` check,
-   `io.LimitReader(f, maxPricingBytes+1)` with a final size assertion
-   ([pricing_unix.go:25–57](../../../proxy/internal/llm/pricing/pricing_unix.go)).
-   A mid-read symlink swap is detected because the fstat is on the original
-   fd. Test: `TestNewLoader_RejectsOversizedFile_FixesM4`.
-
-6. **`yaml.NewDecoder(...).KnownFields(true)`**
-   ([pricing.go:397–398](../../../proxy/internal/llm/pricing/pricing.go))
-   rejects YAML files that carry fields not in the schema. A typo in an
-   operator override file fails loud instead of silently zeroing rates.
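The path checks in invariant 4 can be sketched as below. The regex is the one quoted in the text; the function name `resolveDataPath`, the error messages, and the exact order of checks are assumptions. Symlink handling is out of scope here, since that is the job of the `O_NOFOLLOW` loader in invariant 5.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

var basenameRegex = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)

// resolveDataPath joins rel onto baseDir after rejecting absolute paths,
// traversal segments, and unexpected basenames. The final containment
// check is defence in depth on the cleaned result.
func resolveDataPath(baseDir, rel string) (string, error) {
	if rel == "" || filepath.IsAbs(rel) {
		return "", errors.New("path must be relative and non-empty")
	}
	for _, seg := range strings.Split(filepath.ToSlash(rel), "/") {
		if seg == ".." {
			return "", errors.New("traversal segment not allowed")
		}
	}
	if !basenameRegex.MatchString(filepath.Base(rel)) {
		return "", errors.New("invalid basename")
	}
	base := filepath.Clean(baseDir)
	full := filepath.Join(base, rel) // Join also cleans the result
	inside, err := filepath.Rel(base, full)
	if err != nil || inside == "." || inside == ".." ||
		strings.HasPrefix(inside, ".."+string(filepath.Separator)) {
		return "", errors.New("path escapes base directory")
	}
	return full, nil
}

func main() {
	fmt.Println(resolveDataPath("/data", "pricing.yaml"))
	fmt.Println(resolveDataPath("/data", "../secret.yaml"))
}
```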
-
-## Things to scrutinise
-
-**Correctness.** Verify OpenAI cached-prompt clamp at
-[pricing.go:147–149](../../../proxy/internal/llm/pricing/pricing.go)
-short-circuits before subtraction. `Anthropic.TotalTokens` sums all four
-buckets (in + out + cache_read + cache_creation) — downstream dashboards
-need to know this differs from `input + output`.
-`OpenAIParser.ExtractPrompt` falls through `messages → input → prompt`; a
-request sending all three reports only `messages` (uncommon but worth
-noting).
-
-**Security.** `Scanner.maxLine = 1 MiB`; a 2 MiB single-line `data:` event
-makes `Scanner.Next` return an error, and both accumulators stop with partial
-usage. The pricing file's 1 MiB cap is orders of magnitude larger than any
-realistic file. Confirm new schema additions are mirrored in both
-`pricingFile` and `Entry`; otherwise `KnownFields(true)` will reject operator
-overrides that set the new field.
-
-**Concurrency.** `Loader.table` is `atomic.Pointer[Table]`; readers never
-block or see a torn table. `Loader.Reload` is one goroutine, cancelled via
-context (`TestLoader_ReloadBackgroundLoopCancellation`). `DefaultTable()`
-uses `sync.Once`. Per-call `Scanner` instances mean no shared state across
-concurrent response-parser calls.
-
-**Perf.** `Table.Cost` is two map lookups + multiplications, O(1).
-`Scanner.Next` is one `ReadString('\n')` per line. Pricing reload poll 30s.
-
-**Observability.** Reload failures count via `metric.Int64Counter` keyed
-`plugin`; warning log rate-limited at 5 min so a broken file doesn't flood.
-Parser errors return sentinels — middleware uses `errors.Is` to map to the
-right `cost.skipped` reason.
-
-## Test coverage
-
-| File | Tests | Coverage highlights |
-|---|---:|---|
-| `parser_test.go` | 3 | `Parsers()` shape lock, `DetectParser` URL matrix, provider enum stability |
-| `openai_test.go` | 11 | Chat Completions + Responses API + legacy `prompt`; cached-tokens subset for both naming conventions; fixture replays |
-| `anthropic_test.go` | 7 | Messages + legacy `/v1/complete`; streaming REJECTED on `ParseResponse` (must use scanner); fixture replays |
-| `sse_test.go` | 12 | Fixture replay both providers; multiline `data:`; CRLF; comment skip; trailing-event-without-blank-line; oversize rejection |
-| `pricing/pricing_test.go` | 21 | Provider-shape switch; cached-rate fallback; cached-clamp; symlink rejection (target outside basedir + symlink to file); path validation matrix; oversize rejection; reload-keeps-previous-on-parse-error; mtime change detection; goroutine cancellation |
-
-**Fixtures** ([proxy/internal/llm/fixtures/](../../../proxy/internal/llm/fixtures/)):
-`openai_chat_completion.json` (chat.completions with usage),
-`openai_responses.json` (Responses API shape),
-`openai_stream.txt` (3 deltas + usage + `[DONE]`),
-`anthropic_messages.json` (Messages API non-streaming),
-`anthropic_stream.txt` (full 7-event sequence: message_start →
-content_block_{start,delta×2,stop} → message_delta (usage) → message_stop),
-`pricing.yaml` (realistic-pricing starter for operator overrides).
-
-## Cross-references
-
- Sibling: [31-proxy-middleware-builtin.md](./31-proxy-middleware-builtin.md)
-  — the chain that calls `llm.Parsers()`, `llm.ParserByName`,
-  `llm.NewScanner`, `pricing.NewLoader`.
- Path-routed providers (Vertex AI + Bedrock), credential syntax, and the
-  Bedrock AWS event-stream accumulator:
-  [50-path-routed-providers.md](./50-path-routed-providers.md).
- Direct callers: `llm_request_parser/middleware.go:82–94`,
-  `llm_response_parser/middleware.go:113–123`,
-  `llm_response_parser/streaming.go:65, 142`, `cost_meter/factory.go:49–57`.
- Related elsewhere: the agent-network synthesiser stamping `provider_id`
-  is covered in the management-side module guide; proxy server boot +
-  `FactoryContext` construction is covered in the proxy-framework guide.
--- a/docs/agent-networks/modules/33-proxy-runtime.md
+++ b/docs/agent-networks/modules/33-proxy-runtime.md
@@ -1,194 +0,0 @@
-# proxy/runtime — translate + serve + log
-
-> **Risk level:** High — every config push from management is translated here, and the chain runs on every HTTP request to a synth target.
-> **Backward-compat impact:** Additive at the wire (`PathTargetOptions.middlewares`, `agent_network`, `disable_access_log`, capture caps) and on the proxy `Server` struct (`MiddlewareDataDir`, `MiddlewareCaptureBudgetBytes`). Non-agent-network targets stay on the no-middleware fast path.
-
-## Module boundary
-
-Turns the synth-service wire format from `ProxyService.SyncMappings`/`GetMappingUpdate` into in-process middleware chains and runs them on top of the existing `httputil.ReverseProxy`. Four concerns: (a) **translate** — `proto.MiddlewareConfig` → validated `middleware.Spec` (proxy/middleware_translate.go) + self-register the eight built-ins (proxy/middleware_register.go); (b) **boot + rebuild** — construct the `middleware.Manager`, share the OTel meter, install the live-service check, rebuild per-path chains on every `addMapping`/`modifyMapping` (proxy/server.go); (c) **serve** — resolve chain at request time, capture bodies under a global budget, invoke `RunRequest`/`RunResponse`/`RunTerminal`, render deny responses, apply `UpstreamRewrite` (proxy/internal/proxy/reverseproxy.go); (d) **log + tag** — emit access-log entries with the new `agent_network` flag, gate emission on `EnableLogCollection` via `DisableAccessLog` (proxy/internal/accesslog).
-
-**Inert for non-agent-network targets**: nil or empty chain → existing fast path (reverseproxy.go:127-139); `SuppressAccessLog` defaults false so the access-log middleware emits unchanged.
-
-## Files
-
-| Path | Role |
-| ---- | ---- |
-| proxy/middleware_translate.go | proto→Spec translation; slot/failmode/timeout mapping; caps |
-| proxy/middleware_translate_test.go | translator unit tests |
-| proxy/middleware_register.go | blank-imports the eight builtins for `init()` registration |
-| proxy/server.go | `initMiddlewareManager`, `rebuildMiddlewareChains`, `isLiveService`, `buildMiddlewareBindings`, new Server fields, `protoToMapping` stamps AgentNetwork/DisableAccessLog/CaptureConfig/Middlewares |
-| proxy/internal/proxy/reverseproxy.go | `WithMiddlewareManager`, chain dispatch, body capture, `applyUpstreamRewrite`/`Headers`, `buildRequestInput`, response-leg respInput identity fields |
-| proxy/internal/proxy/reverseproxy_test.go | `TestBuildRequestInput_PropagatesIdentityAndGroups` |
-| proxy/internal/proxy/context.go | `agentNetwork`, `suppressAccessLog`, `userGroupNames` on `CapturedData` |
-| proxy/internal/proxy/servicemapping.go | new `PathTarget` fields |
-| proxy/internal/proxy/agent_network_chain_realstack_test.go | end-to-end self-contained chain test |
-| proxy/internal/accesslog/logger.go | `logEntry.AgentNetwork` → `proto.AccessLog` |
-| proxy/internal/accesslog/middleware.go | reads `GetAgentNetwork()`; gates `l.log` on `!GetSuppressAccessLog()` |
-| proxy/internal/accesslog/middleware_test.go | suppress/default/preserves-usage assertions |
-| proxy/internal/auth/middleware_test.go | tunnel-peer group propagation contract |
-| proxy/internal/metrics/metrics.go | `Meter()` getter for the middleware manager |
-
-## Architecture & flow
-
-### Synth-service ingestion → translate → register → serve
-
-```mermaid
-flowchart TD
-    A[Management SyncMappings/GetMappingUpdate] --> B["processMappings\nserver.go:1492"]
-    B --> C{Mapping type}
-    C -->|CREATED| D["addMapping → setupHTTPMapping → updateMapping"]
-    C -->|MODIFIED| E["modifyMapping → cleanupMappingRoutes → setupHTTPMapping → updateMapping"]
-    C -->|REMOVED| F["removeMapping → cleanupMappingRoutes → invalidateMiddlewareChains"]
-    D --> G["protoToMapping\nserver.go:2181"]
-    E --> G
-    G --> H["translateMiddlewareConfigs\nmiddleware_translate.go:55"]
-    G --> I["translateMiddlewareCaptureConfig\nmiddleware_translate.go:18"]
-    H --> J["[]middleware.Spec on PathTarget"]
-    I --> K["*bodytap.Config on PathTarget"]
-    J --> L["proxy.AddMapping\nservicemapping.go:118"]
-    K --> L
-    L --> M["rebuildMiddlewareChains\nserver.go:2017 → Manager.Rebuild"]
-    F --> N["Manager.Invalidate(serviceID)"]
-```
-
-### Per-request lifecycle through the chain + accesslog
-
-```mermaid
-sequenceDiagram
-    autonumber
-    participant C as Client
-    participant M as accesslog.Middleware
-    participant A as auth.Middleware (Protect)
-    participant RP as ReverseProxy.ServeHTTP
-    participant CH as middleware.Chain
-    participant U as Upstream
-    C->>M: HTTP request
-    M->>M: NewCapturedData(requestID), WithCapturedData(ctx)
-    M->>A: next.ServeHTTP
-    A->>A: Private → ValidateTunnelPeer → stamp UserID/Email/Groups/GroupNames/AuthMethod
-    A->>RP: next.ServeHTTP
-    RP->>RP: findTargetForRequest → targetResult
-    RP->>RP: stamp ServiceID/AccountID/AgentNetwork/SuppressAccessLog on CapturedData
-    RP->>RP: resolveChain via Manager.ChainFor
-    alt chain == nil or Empty
-        RP->>U: httputil.ReverseProxy.ServeHTTP (fast path)
-    else chain non-empty
-        RP->>RP: bodytap.CaptureRequest (global budget)
-        RP->>CH: RunRequest
-        CH-->>RP: denyOutput? requestMeta + upstreamRewrite
-        alt deny
-            RP->>C: RenderDenyResponse
-        else allow
-            RP->>RP: capturingWriter + applyUpstreamRewrite/Headers
-            RP->>U: httputil.ReverseProxy.ServeHTTP(respWriter)
-            U-->>RP: response
-            RP->>CH: RunResponse (respInput carries UserGroups)
-            RP->>CH: RunTerminal (merged request+response metadata)
-        end
-    end
-    RP-->>M: handler returns
-    M->>M: build logEntry incl. AgentNetwork
-    alt SuppressAccessLog == true
-        M->>M: skip l.log; still trackUsage
-    else default
-        M->>M: l.log → goroutine SendAccessLog
-    end
-```
-
-### EnableLogCollection suppression path
-
-```mermaid
-flowchart LR
-    S["agentnetwork.Settings.EnableLogCollection"] --> B["synthesizer: target.DisableAccessLog = !EnableLogCollection"]
-    B --> P["proto PathTargetOptions.disable_access_log (field 13)"]
-    P --> T["protoToMapping reads GetDisableAccessLog()\nserver.go:2211"]
-    T --> M["PathTarget.DisableAccessLog\nservicemapping.go:47"]
-    M --> R["ServeHTTP: cd.SetSuppressAccessLog\nreverseproxy.go:106"]
-    R --> G["accesslog middleware: if !GetSuppressAccessLog l.log\nmiddleware.go:95"]
-    R --> U["trackUsage unconditional — bandwidth telemetry preserved"]
-```
-
-**Ingestion** lands as a `ProxyMapping` batch on `handleSyncMappingsStream`/`handleMappingStream`. `processMappings` dispatches to `addMapping`/`modifyMapping`/`removeMapping`; HTTP goes `setupHTTPMapping → updateMapping → protoToMapping`. `protoToMapping` (server.go:2181) is the single translation surface that materialises `[]middleware.Spec`, `*bodytap.Config`, `AgentNetwork`, `DisableAccessLog` onto each `PathTarget`; `updateMapping` finishes with `s.proxy.AddMapping(m)` (atomic swap under `mappingsMux`) and `s.rebuildMiddlewareChains(svcID, m)`.
-
-At **request time** the access-log middleware stamps `CapturedData`; the auth chain runs (Private services lift `peer_group_ids` from `ValidateTunnelPeer` — auth/middleware_test.go:322). `ReverseProxy.ServeHTTP` resolves the chain; nil or empty → original `httputil.ReverseProxy`, no body capture. When a chain matches, body is captured under the global budget, `RunRequest` produces an `UpstreamRewrite` (`llm_router` selects a provider, rewrites scheme/host/path, injects `Authorization`), and `RunResponse`+`RunTerminal` run after the upstream returns. The terminal slot sees the merged metadata bag — that's how `llm_limit_record` ships the consumption sample. The **access-log** addition: `logEntry.AgentNetwork` from `GetAgentNetwork()` onto `proto.AccessLog.AgentNetwork`; the gate at middleware.go:95 honors `EnableLogCollection`, skipping `l.log` but keeping `trackUsage` so bandwidth telemetry survives.
-
-## Public contracts touched
-
- `proxy.Server.MiddlewareDataDir` (string) — base dir for file-backed middleware config (server.go:238-241).
- `proxy.Server.MiddlewareCaptureBudgetBytes` (int64) — process-wide capture cap; defaults to 256 MiB (server.go:248-250).
- `proxy/internal/proxy.WithMiddlewareManager(*middleware.Manager) Option` — new option on `NewReverseProxy`; nil keeps the fast path (reverseproxy.go:48-56).
- `proxy/internal/proxy.PathTarget` adds `Middlewares`, `CaptureConfig`, `AgentNetwork`, `DisableAccessLog` (servicemapping.go:27-51), all zero-default.
- `proxy/internal/proxy.CapturedData` adds `agentNetwork`, `suppressAccessLog`, `userGroupNames` behind `sync.RWMutex`; slices deep-copied (context.go:47-66, 183-258).
- `accesslog.logEntry.AgentNetwork` + `proto.AccessLog.AgentNetwork` (logger.go:131, 268).
- `metrics.Metrics.Meter()` exposes the OTel meter for the middleware manager (metrics.go:53-58).
-
-## Invariants
-
- **Synth-service updates are live (no proxy restart).** Every `MODIFIED` flows through `modifyMapping → cleanupMappingRoutes` (invalidates chains) `→ setupHTTPMapping → updateMapping → rebuildMiddlewareChains`. **ProxyMapping.Private preservation:** the relevant logic lives in `management/internals/shared/grpc/proxy.go:shallowCloneMapping`, not this module, but it surfaces here — if a `MODIFIED` synth service arrives `private=false`, auth skips `ValidateTunnelPeer`, `CapturedData.UserGroups` stays empty, and `llm_router` denies with `llm_policy.no_authorised_provider` until a management restart re-pushes the snapshot. This module assumes `mapping.GetPrivate()` is correct on every batch.
- **`EnableLogCollection=false` suppresses access-log writes but middleware still runs.** Gate is one `if !cd.GetSuppressAccessLog()` immediately around `l.log(entry)` (middleware.go:95); `trackUsage` runs below the gate. Locked by `TestMiddleware_SuppressAccessLog_PreservesUsageTracking` (middleware_test.go:139).
- **`agent_network` flag on access-log entries is set when the chain processed the request.** Source `target.AgentNetwork`, stamped at reverseproxy.go:105, read at accesslog/middleware.go:86.
- **auth → builtin group propagation.** `Protect` writes `UserGroups`/`UserGroupNames`; `buildRequestInput` (reverseproxy.go:333) copies them into `middleware.Input`. The response-leg `respInput` (reverseproxy.go:196-223) also carries `UserEmail`/`UserGroups`/`UserGroupNames` — `llm_limit_record` needs `UserGroups` to ship `group_ids` so management's group-targeted budget rules match (comment at reverseproxy.go:211-215).
- **Empty chains stay on the fast path.** `ServeHTTP` skips body capture and the run sequence when `chain == nil || chain.Empty()` (reverseproxy.go:127).
- **Self-registration is the only way a builtin reaches the registry.** `middleware_register.go` blank-imports each builtin; `init()` adds the factory to `mwbuiltin.DefaultRegistry()`. Missing it → translator drops the entry with a warn (middleware_translate.go:97).
-
-## Things to scrutinize
-
-### Correctness
- **Translate edge cases** — drops on nil cfg, empty ID, unknown ID, UNSPECIFIED slot; each logs one warn; volume bounded by `MaxMiddlewaresPerChain`.
- **Re-translate without dropping in-flight requests** — `Manager.Rebuild` is the only call from `rebuildMiddlewareChains`. Reverse proxy reads `ChainFor` once per request (reverseproxy.go:327) and runs the captured `*Chain` for the whole request. Verify in module 30 that `Rebuild` swaps atomically.
- **ProxyMapping.Private preservation** — enforced management-side in `shallowCloneMapping`. Proxy-side regression catches: `TestProtect_PrivateService_TunnelPeerGroupsPropagate` + the integration test.
- **Body-capture cleanup** — `defer releaseBudget()` (reverseproxy.go:145) and `defer capturingWriter.Release()` (reverseproxy.go:180) must run on every return; confirm no future `return` lands between acquisition and defer.
- **`applyUpstreamRewrite` clones the URL** — `cloned := *orig` value-copies `*url.URL`; safe because overwritten fields are strings, not slices/maps (reverseproxy.go:285-292).
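A small sketch of why the value copy in the last bullet is safe. `rewriteUpstream` and `mustParse` are hypothetical helpers, not the proxy's `applyUpstreamRewrite`. The only pointer field in `url.URL` is `User`, and `url.Userinfo` is documented as immutable, so sharing it between the copy and the original is harmless.

```go
package main

import (
	"fmt"
	"net/url"
)

// rewriteUpstream returns a copy of orig with scheme, host, and path
// overridden. The caller's URL is never modified.
func rewriteUpstream(orig *url.URL, scheme, host, path string) *url.URL {
	cloned := *orig // value copy; the string fields are independent afterwards
	if scheme != "" {
		cloned.Scheme = scheme
	}
	if host != "" {
		cloned.Host = host
	}
	if path != "" {
		cloned.Path = path
		cloned.RawPath = "" // drop any stale encoded form of the old path
	}
	return &cloned
}

// mustParse is a helper for the example; it panics on an invalid URL.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	orig := mustParse("https://api.example.com/v1/chat?x=1")
	got := rewriteUpstream(orig, "http", "upstream.internal:8080", "/chat/completions")
	fmt.Println(orig.String())
	fmt.Println(got.String())
}
```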
-
-### Security
- **Translate validates every config** — registry membership rejects unknown IDs; UNSPECIFIED slot drops; ID-less drops; raw config copied (not aliased) at middleware_translate.go:109.
- **`AuthHeader`/`StripHeaders` only reachable via `UpstreamRewrite`** — regular mutation surface goes through the framework denylist (`Authorization`/`Cookie` blocked); only the router middleware can replace `Authorization` (reverseproxy.go:296-304). Confirm in module 30 nothing outside the proxy-trusted path populates `UpstreamRewrite.AuthHeader`.
- **`stampNetBirdIdentity` strips client-sent values first** (reverseproxy.go:742-743) — anti-spoof for `X-NetBird-User`/`X-NetBird-Groups`; control chars filtered; comma-bearing labels dropped (reverseproxy_test.go:1217/:1243/:1193).
- **Auth → group propagation** — `auth/middleware_test.go:322` and `:366` cover the contract. If auth ever stops calling `ValidateTunnelPeer` for Private services, every agent-network request silently denies.
-
-### Concurrency
- **Chain replacement under in-flight requests** — `findTargetForRequest` takes `mappingsMux.RLock`; `AddMapping` writes. `resolveChain` calls `ChainFor` once; even if `Rebuild` swaps mid-request, in-flight requests keep running on the captured pointer.
- **`CapturedData` mutation across slots** — accessors take `sync.RWMutex`; slices deep-copied on both Set and Get. Verify no caller mutates the returned slice expecting it to land back.
- **`Manager.Invalidate` race** — `removeMapping` invalidates after `cleanupMappingRoutes`; mapping read happens before chain resolution, so requests before invalidate run captured chains; later ones fail `findTargetForRequest`.
- **`Logger.log` goroutine** — `logSem` caps at `maxLogWorkers = 4096`; overflow → `dropped.Add(1)` + debug log. Middleware test uses a buffered channel and 150ms negative-assertion window — review whether 150ms holds on slow CI.
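
The bounded-worker behaviour described for `Logger.log` (a semaphore cap plus a drop counter on overflow) can be sketched like this. It is an illustration under assumptions: `boundedLogger`, `newBoundedLogger`, and `submit` are hypothetical names, and the cap is 1 here instead of 4096 so the overflow path is easy to trigger.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// boundedLogger caps concurrent workers with a buffered channel used as a
// semaphore and counts work that was dropped because the cap was reached.
type boundedLogger struct {
	sem     chan struct{}
	dropped atomic.Int64
	wg      sync.WaitGroup
}

func newBoundedLogger(maxWorkers int) *boundedLogger {
	return &boundedLogger{sem: make(chan struct{}, maxWorkers)}
}

// submit runs fn in a goroutine if a slot is free. When every slot is taken
// it increments the drop counter and returns false without blocking.
func (l *boundedLogger) submit(fn func()) bool {
	select {
	case l.sem <- struct{}{}:
	default:
		l.dropped.Add(1)
		return false
	}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		defer func() { <-l.sem }() // release the slot before signalling Done
		fn()
	}()
	return true
}

func main() {
	l := newBoundedLogger(1)
	release := make(chan struct{})
	fmt.Println(l.submit(func() { <-release })) // slot acquired
	fmt.Println(l.submit(func() {}))            // slot still held, so dropped
	close(release)
	l.wg.Wait()
	fmt.Println(l.dropped.Load())
}
```

Because the slot is acquired synchronously inside `submit`, a test can assert the drop deterministically instead of relying on a timing window.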
-
-### Backward compatibility
- **Non-agent-network services unaffected** — `protoToMapping` reads new fields only when `opts != nil`; defaults leave `Middlewares`/`CaptureConfig` nil → chain resolves nil → fast path. Existing `reverseproxy_test.go` (non-chain) still passes.
- **`disable_access_log` is proto field 13, default false** — every existing target unset; gate is no-op. Locked by `TestMiddleware_SuppressAccessLog_DefaultEmitsLog` (middleware_test.go:104).
- **`Server` additions optional** — 256 MiB default when `MiddlewareCaptureBudgetBytes ≤ 0` (server.go:1997-2000).
-
-### Performance
- **Translate cost per push** — O(n) with per-entry registry lookup and `config_json` copy; negligible vs. the upstream gRPC unmarshal.
- **Empty-chain hot path** — one `ChainFor` map lookup + one `chain.Empty()` check; no allocation delta vs. pre-PR.
- **Body capture buffer churn** — `bodytap.CaptureRequest` allocates `MaxRequestBytes` per chain-hitting request; `releaseBudget` ties allocation to the 256 MiB proxy-wide budget. Confirm in module 30 the budget is a hard cap.
-
-### Observability
- **Metrics** — `Metrics.Meter()` shared with `middleware.NewMetrics` (server.go:1990-1993) so middleware instruments land in the same prometheus exporter. No new metrics defined here.
- **Access-log accuracy** — every entry carries `AgentNetwork`; terminal-slot metadata merged into `CapturedData.Metadata` (reverseproxy.go:238-241).
- **Deny logs at `Infof`** (reverseproxy.go:170) — review whether `Info` is too noisy at high deny rates; consider Debug or rate-limit.
-
-## Test coverage
-
-| Test file | Locks down |
-| --------- | ---------- |
-| proxy/middleware_translate_test.go | Empty/nil → nil; field preservation; unknown ID skip; nil registry permissive; timeout clamping; fail-mode + slot incl. UNSPECIFIED-drop; empty-ID drop; truncation above + at `MaxMiddlewaresPerChain` |
-| proxy/internal/proxy/reverseproxy_test.go | Rewrite host/headers/cookies/query; trusted proxy; path forwarding; classifyProxyError; X-NetBird-User/Groups anti-spoof + CSV-join + control-char/comma rejection + fallback-to-ID; `TestBuildRequestInput_PropagatesIdentityAndGroups` (UserGroups/Email/GroupNames/AgentNetwork reach `middleware.Input`) |
-| proxy/internal/proxy/agent_network_chain_realstack_test.go | **The end-to-end integration test.** Drives a real agent-network request through `ReverseProxy.ServeHTTP` with the chain the synthesizer produces, against an in-process management gRPC (bufconn) backed by a real sqlite store + real `agentnetwork.Manager`, plus an `httptest` upstream; no external infrastructure or real LLM is needed. Guarantees: (1) response-leg `respInput` carries `UserGroups` so `llm_limit_record` ships non-empty `group_ids` and the admin-group consumption row increments; (2) `RedactPii=true` redacts both prompt and completion on captured metadata; (3) the full chain runs against a real management stack. **Lines 189-211 inline the proto→Spec mapping** instead of calling the proxy's private `translateMiddlewareConfig`; keep that inline mirror in sync with `proxy/middleware_translate.go` or the test silently diverges from production. |
-| proxy/internal/accesslog/middleware_test.go | `SuppressAccessLog=true` skips `SendAccessLog` (150ms negative wait); default emits one send (2s positive); usage tracking runs under suppression |
-| proxy/internal/auth/middleware_test.go | `TestProtect_PrivateService_TunnelPeerGroupsPropagate` proves `peer_group_ids` reach `CapturedData.UserGroups`; `TestProtect_PrivateService_TunnelPeerDenied` proves rejected peers 403 without reaching the handler |
-
-The integration test runs in a few seconds with no external infrastructure — exercising the real synthesizer, `Manager.Rebuild`, `ServeHTTP` dispatch, and `llm_limit_record` writing a real consumption row through the real `agentnetwork.Manager` over real gRPC.
-
-## Known limitations / explicit non-goals
-
- **Translator does not validate `RawConfig` JSON** — factory's job at `New([]byte)`. Confirm in module 30 that a per-binding factory failure doesn't poison the rest of the chain.
- **No throttle on management push rate** — every `MODIFIED` triggers `Manager.Rebuild`. Mitigation upstream.
- **Streaming responses (SSE)** — body capture is streaming-aware, but response-leg middleware runs only after the response completes; long SSE streams delay `llm_limit_record` until close.
- **OIDC-only path doesn't carry tunnel-peer groups** — agent-network synth services rely on the Private tunnel-peer path; JWT groups claim is the only carrier for non-Private OIDC.
- **`agent_network` flag on L4 entries** not added; HTTP-only.
- **`mw.capture.bypass_reason` metadata key** documented at reverseproxy.go:151,184; namespace this in module 30/31 to avoid collisions.
-
-## Cross-references
- Upstream: [shared/api](10-shared-api.md), [proxy/middleware-framework](30-proxy-middleware-framework.md), [proxy/middleware-builtin](31-proxy-middleware-builtin.md), [proxy/llm-parsers](32-proxy-llm-parsers.md)
- End-to-end flow: [../01-end-to-end-flows.md](../01-end-to-end-flows.md)
- Top-level: [../00-overview.md](../00-overview.md)
--- a/docs/agent-networks/modules/40-dashboard.md
+++ b/docs/agent-networks/modules/40-dashboard.md
@@ -1,228 +0,0 @@
-# dashboard — UI for agent-networks
-
-This module documents code that lives in the **dashboard repo** (under
-`src/modules/agent-network/` and `src/app/(dashboard)/agent-network/`), not
-in this repo. It is co-located here so backend readers see the full picture.
-
-> **Risk level:** Medium. The new surface is isolated under `src/modules/agent-network/` and `src/app/(dashboard)/agent-network/`, but it also reshapes the sidebar, splits `/peers`, renames `reverse-proxy/clusters` → `self-hosted-proxies`, and overlays the Control Center graph. Regressions here would be cross-cutting.
-> **Backward-compat impact:** Additive on the API side. Breaking on URL/navigation: `/peers` redirects to `/peers/devices` (src/app/(dashboard)/peers/page.tsx:7-15), `/reverse-proxy/clusters` was renamed to `/reverse-proxy/self-hosted-proxies`, the sidebar lost Access Control / Networks / Reverse Proxy / DNS / standalone Guardrails / Consumption / Activity (Navigation.tsx:165-171 — routes still resolve via URL), and the standalone `/agent-network/{access-log,consumption,global-controls}` routes are gone in favor of `/agent-network/observability`.
-
-## Module boundary
-
-The dashboard is the only place an operator interacts with agent-networks: provider catalog, configured providers, policies, guardrails, account-level budget rules, account settings (collection / redaction toggles), per-request access log, and consumption rollups all render, paginate, and edit here. Data flows in via SWR (`useFetchApi`) keyed by REST URL. One big context provider (`src/modules/agent-network/AIProvidersProvider.tsx`) aggregates five resources (providers, policies, guardrails, budget rules, settings) plus the proxy access-log stream filtered to `agent_network=true`, and exposes `add* / update* / toggle* / delete*` mutators that call through `useApiCall` and re-`mutate()` SWR. Pages mount the provider once at the top and compose presentational tables and modals beneath. The control-center page additionally fetches `/agent-network/{providers,policies}` directly (control-center/page.tsx:123-130) to overlay graph nodes.
-
-## What the UI delivers
-
- **AI Observability** page with four tabs: Access Logs, Budget Dashboard,
-  Budget Settings, Log Settings (replaces the standalone access-log,
-  consumption, and global-controls routes).
- **Providers** page: provider catalog + connect/edit wizard with per-vendor
-  copy (LiteLLM, Portkey, Bifrost, Cloudflare, Vercel, OpenRouter, custom).
- **Policies** page: group → provider authorization with per-policy Limits
-  (minute-granular windows) + guardrail attach.
- **Guardrails** page: reusable model-allowlist + prompt-capture sets.
- **Account controls**: Log Collection / Prompt Collection / Redact PII toggles.
- **Budget rules**: account-level rules reusing the policy Limits UI.
- **Control Center overlay**: provider + agent-policy nodes on the graph.
- **Navigation + peers reshaping**: peers split into Devices / Agents,
-  `reverse-proxy/clusters` renamed to `self-hosted-proxies`, sidebar
-  repackaged for agent-network focus.
-
-## Surface added
-
-### New pages
-
-| Route | Purpose | Backing module(s) |
-| ----- | ------- | ----------------- |
-| `/agent-network` | Redirect to `/agent-network/providers` | page.tsx:7-15 |
-| `/agent-network/providers` | List + connect providers; header surfaces per-account base URL | providers/page.tsx + AgentProvidersTable + AIProviderModal |
-| `/agent-network/policies` | Group → Provider authorization with per-policy Limits + Guardrail attach | policies/page.tsx + AgentPoliciesTable + AgentPolicyModal |
-| `/agent-network/guardrails` | Reusable guardrail sets (model allowlist + prompt capture) | guardrails/page.tsx + AgentGuardrailsTable + AgentGuardrailModal |
-| `/agent-network/observability` | Tabs: Access Logs / Budget Dashboard / Budget Settings / Log Settings | observability/page.tsx |
-| `/peers/devices`, `/peers/agents` | Split of `/peers`, shared via `PeersListView` keyed by `kind` | peers/{devices,agents}/page.tsx |
-| `/reverse-proxy/self-hosted-proxies` | Renamed from `clusters` | self-hosted-proxies/page.tsx |
-
-Removed in favor of `/agent-network/observability`: `/agent-network/access-log`, `/agent-network/consumption`, `/agent-network/global-controls`.
-
-### New modules under src/modules/agent-network
-
-| File | Role |
-| ---- | ---- |
-| AIProvidersProvider.tsx (~1158 LOC) | Aggregates every agent-network resource via SWR; normalises snake↔camel; exposes mutators; holds wizard-open state |
-| AIProviderModal.tsx (~1268 LOC) | Connect / edit provider wizard with per-vendor copy (Bifrost, Portkey, LiteLLM, Cloudflare, Vercel, OpenRouter, custom) |
-| AIProviderLogo + useProviderCatalog | Catalog-driven brand swatch + SWR hook over `/agent-network/catalog/providers` |
-| AgentPoliciesTable + AgentPolicyModal + AgentPolicyGuardrailsTab + AgentPolicyLimitsTab | Policies; modal has 3 tabs (Rule, Limits, Guardrails) |
-| AgentGuardrailsTable + AgentGuardrailModal + AgentGuardrailBrowseModal + AgentGuardrailChecksCell | Guardrails CRUD + attach-from-policy |
-| AgentBudgetRulesTable + AgentBudgetRuleModal | Account-level budget rules; modal reuses AgentPolicyLimitsTab verbatim |
-| AgentAccountControlsCard | Three account-wide toggles (Log Collection / Prompt Collection / Redact PII) |
-| AgentAccessLogTable + AgentAccessLogExpandedRow | Access log on `/events/proxy?agent_network=true` |
-| AgentConsumptionPanel + AgentConsumptionTable | Token + cost panel: charts + counter table |
-| table/AgentProvidersTable + AgentProviderActionCell | Providers table + per-row actions |
-| data/mockData.ts | Domain types and a few residual `MOCK_*` constants (see scrutinize) |
-
-### Touched non-agent-network areas
-
- **control-center**: agent-network overlay (provider + agent-policy nodes); removed the All Networks dropdown; hid the Networks tab in FlowSelector (FlowSelector.tsx:9-14 — enum value kept so `?tab=networks` still type-checks); wrapped `ControlCenterView` in `AIProvidersProvider` (page.tsx:73-83); `agentPolicyNode` clicks routed to a separate state slot (page.tsx:1871-1874). New node renderers: nodes/ProviderNode.tsx, nodes/AgentPolicyNode.tsx (registered at utils/nodes.ts:21-22).
- **peers**: Split into Devices and Agents sub-routes; shared via `PeersListView` keyed by `kind` (PeersListView.tsx:24-95). New compact-toolbar `UserFilterSelector` (users/UserFilterSelector.tsx).
- **reverse-proxy**: Folder rename `clusters/` → `self-hosted-proxies/`; deleted `ClustersFeaturesCell.tsx`, `ClusterTypeIndicator.tsx`; new ReverseProxyClusterTargetSelector for cluster target type; Private toggle on target modal; body-capture knobs removed; new ReverseProxyEventExpandedRow.
- **events**: `ReverseProxyEventsUserCell` rewritten with user + peer fallback (ReverseProxyEventsUserCell.tsx:14-21), shared with the access-log table.
- **navigation**: Full repackaging in Navigation.tsx — Agent Network items flattened (no collapsible parent), distinct icons per item; Access Control, Networks, Reverse Proxy, DNS, standalone Guardrails, Consumption, Activity removed (still URL-reachable, per lines 165-171).
-
-## Architecture & flow
-
-### Page → Provider → Table/Modal hierarchy
-
-```mermaid
-graph TD
-  Nav[Navigation.tsx]
-  Nav --> ProvidersPage[/agent-network/providers/]
-  Nav --> PoliciesPage[/agent-network/policies/]
-  Nav --> GuardrailsPage[/agent-network/guardrails/]
-  Nav --> ObsPage[/agent-network/observability/]
-
-  ProvidersPage --> AIPP1[AIProvidersProvider]
-  PoliciesPage --> AIPP2[AIProvidersProvider]
-  GuardrailsPage --> AIPP3[AIProvidersProvider]
-  ObsPage --> AIPP4[AIProvidersProvider]
-  ObsPage -.wraps.-> GroupsProvider
-  ObsPage -.wraps.-> PeersProvider
-
-  AIPP1 --> ProvTable[AgentProvidersTable]
-  ProvTable --> ProvModal[AIProviderModal]
-  AIPP2 --> PolTable[AgentPoliciesTable]
-  PolTable --> PolModal[AgentPolicyModal]
-  PolModal --> PolGuardTab[AgentPolicyGuardrailsTab]
-  PolModal --> PolLimitsTab[AgentPolicyLimitsTab]
-  PolGuardTab --> GuardBrowse[AgentGuardrailBrowseModal]
-  PolGuardTab --> GuardModal[AgentGuardrailModal]
-  AIPP3 --> GuardTable[AgentGuardrailsTable]
-  GuardTable --> GuardModal
-  AIPP4 --> Tabs[Tabs]
-  Tabs --> AccessLog[AgentAccessLogTable]
-  Tabs --> Consumption[AgentConsumptionPanel]
-  Tabs --> BudgetRules[AgentBudgetRulesTable]
-  Tabs --> AccountCtl[AgentAccountControlsCard]
-  BudgetRules --> BudgetModal[AgentBudgetRuleModal]
-  BudgetModal -.reuses.-> PolLimitsTab
-```
-
-### AI Observability tab page
-
-```mermaid
-graph LR
-  Page[AIObservabilityPage] --> RA[RestrictedAccess<br/>permission.services.read]
-  RA --> GP[GroupsProvider]
-  GP --> PP[PeersProvider]
-  PP --> AIP[AIProvidersProvider]
-  AIP --> Tabs[Tabs / TabsList]
-  Tabs --> T1[Access Logs<br/>AgentAccessLogTable]
-  Tabs --> T2[Budget Dashboard<br/>AgentConsumptionPanel]
-  Tabs --> T3[Budget Settings<br/>AgentBudgetRulesTable]
-  Tabs --> T4[Log Settings<br/>AgentAccountControlsCard]
-  T1 -.GET.-> EP[/events/proxy?agent_network=true/]
-  T2 -.GET poll 5s.-> CONS[/agent-network/consumption/]
-  T3 -.GET/PUT.-> BR[/agent-network/budget-rules/]
-  T4 -.GET/PUT.-> ST[/agent-network/settings/]
-```
-
-### Data fetch path
-
-```mermaid
-graph TD
-  Page[Page component] --> Prov[AIProvidersProvider]
-  Prov -->|useFetchApi| SWR[(SWR cache<br/>key = URL)]
-  SWR -.GET.-> P[/agent-network/providers/]
-  SWR -.GET.-> POL[/agent-network/policies/]
-  SWR -.GET.-> G[/agent-network/guardrails/]
-  SWR -.GET.-> BR[/agent-network/budget-rules/]
-  SWR -.GET ignoreError.-> ST[/agent-network/settings/]
-  SWR -.GET.-> CAT[/agent-network/catalog/providers/]
-  SWR -.GET pageSize=100.-> EVT[/events/proxy agent_network=true/]
-  Prov --> Mut[useApiCall.post/put/del]
-  Mut -.on success.-> MutateSWR[SWR mutate keys]
-  Prov --> Children[Tables / Modals via useAIProviders]
-```
-
-Every list view reaches management through SWR over `/api/agent-network/*`. The provider context maps snake-case payloads to camelCase domain types (`fromAPI`, `policyFromAPI`, `guardrailFromAPI`, `budgetRuleFromAPI`, `settingsFromAPI`, `accessLogFromAPI`; AIProvidersProvider.tsx:138-562) and back via matching `*ToRequest` adaptors. The access log piggy-backs on `/events/proxy` with `agent_network=true&page_size=100` (lines 707-709) and decodes LLM-specific fields from per-event `metadata`. Group IDs on events are resolved to current names through the surrounding GroupsProvider catalog (lines 515-521, 717-731), so no extra round trip is needed. Mutators run `*ToRequest`, await `useApiCall.post/put/del`, call SWR `mutate()`, then `notify`. Errors are caught and surfaced via `notify`, so no exceptions escape into render. The Connect Provider modal's open state lives in the provider itself (`isWizardOpen` at lines 732-735) so the providers-page empty-state CTA and the table's + button share one modal. Control-center re-fetches `/agent-network/{providers,policies}` directly on top of `AIProvidersProvider`; SWR de-dupes the requests, but the code path is harder to reason about.
-
-## Public contracts consumed
-
- `GET/POST /api/agent-network/providers`, `PUT/DELETE /:id`
- `GET/POST /api/agent-network/policies`, `PUT/DELETE /:id`
- `GET/POST /api/agent-network/guardrails`, `PUT/DELETE /:id`
- `GET/POST /api/agent-network/budget-rules`, `PUT/DELETE /:id`
- `GET/PUT /api/agent-network/settings` (ignoreError-tolerant; 404 = not yet bootstrapped — auto-bootstrap on first provider create via `bootstrap_cluster` field — AIProvidersProvider.tsx:737-760)
- `GET /api/agent-network/catalog/providers` (read-only declarative; backend owns vendor list, IDs, brand colors, models, extra_headers, identity_injection — useProviderCatalog.ts:6-95)
- `GET /api/agent-network/consumption` (polled every 5s on Budget Dashboard — ConsumptionPanel.tsx:53,65-71)
- `GET /api/events/proxy?agent_network=true&page_size=100` (shared with Proxy Events)
- `permission?.services?.read` gates every agent-network route via RestrictedAccess.
-
-`AIProviderId` is a closed union in dashboard types (data/mockData.ts:8-21) but the converter tolerates anything the backend ships — unknown ids fall through to `"custom"` (AIProvidersProvider.tsx:497-506). Catalog values are pure read-through: anything declared in `extra_headers` renders in the modal automatically, copy keyed by header name (`EXTRA_HEADER_UI` in AIProviderModal.tsx:61-89), labeled-fallback for unknown ones.
-
-## Invariants
-
- Provider context wrap order on user-attribution pages: `GroupsProvider > PeersProvider > AIProvidersProvider` (observability/page.tsx:87-89). Reverse it and access-log group resolution silently drops names.
- Every agent-network route checks `permission?.services?.read` via `RestrictedAccess` (observability/page.tsx:85, providers/page.tsx:184, policies/page.tsx:53, guardrails/page.tsx:55).
- Modal `key={open ? 1 : 0}` pattern is used to force unmount/remount on close so internal `useState` resets between edits (AgentBudgetRuleModal.tsx:60, AgentPolicyModal.tsx:66). Removing this would leak prior-row state into a new-row session.
- `mockData.ts` is the canonical home for ALL agent-network domain types; `MOCK_*` constants must never reach a production code path. One leak remains (below).
-
-## Things to scrutinize
-
-### Correctness
-
- **Tab-state URL hand-off is one-way.** observability/page.tsx:53-58 reads `?tab=` on mount (despite the file comment at line 28 saying URL hand-off is future) but `setTab` does NOT push back, so reload preserves the chosen tab only if it came in via the link. Inconsistent with control-center (page.tsx:1817-1831).
- **Provider overlay runs only in `applySingleGroupView` / `applyPeerView`** (control-center/page.tsx:557, 1159-1166). User view does NOT show providers — if agent-network is a primary lens, that's a gap.
- **Two useEffects race to invalidate the control-center layout.** page.tsx:1655-1657 drops `layoutInitialized` when `agentPolicies` / `agentProviders` arrive; the main effect (1786-1799) also lists them as deps. Functional but fragile — watch for flash-of-empty-graph.
- **`updateProvider` / `updatePolicy` / `updateBudgetRule` use `??` on `enabled`** (AIProvidersProvider.tsx:784, 859, 1018). Toggle paths are safe, and an explicit `enabled: false` is honored because `??` falls back only on `null` / `undefined`. The risk is a caller that omits `enabled` expecting the resource to end up off: it gets `existing.enabled` instead. Audit modal callers.
- **Form validation in modals is minimal.** Window-seconds picker — mockData.ts:209-215 documents "minimum 60 — one minute" but there is no matching UI guard in PolicyLimitsTab; the backend validator is the enforcement point.
-
-### Security
-
- **No client-side enforcement claims** — every cap, allowlist, and toggle is display + edit; proxy is the source of truth for deny decisions (AccessLogTable.tsx:177-191 renders backend-emitted `denyReason` as-is).
- **Prompt display is gated by what the backend stamps.** When `enable_prompt_collection` is OFF the proxy must not put prompt/completion into event metadata; the dashboard renders whatever it gets verbatim (AccessLogTable lines 532-534, AccessLogExpandedRow.tsx:42-57). No UI filter on top of backend collection switches.
- Account Controls disables `Redact PII` when `Prompt Collection` is off (AgentAccountControlsCard.tsx:122) and clears it on off-transition (line 100), but relies on backend to enforce the same gate at write — confirm PUT handler rejects `redact_pii=true && enable_prompt_collection=false`.
- **Bifrost identity-header overrides**: empty-string vs nil semantics documented in AIProvidersProvider.tsx:772-781 ("omitted = preserve, empty = explicit clear"). Mishandling could leak group attribution to a header the operator thought disabled. Focused read of Bifrost code path in AIProviderModal.tsx recommended.
-
-### Accessibility
-
- Observability TabsList (observability/page.tsx:96-113) uses the shared Tabs component — should inherit Radix roving-tabindex. All four TabsTriggers carry only icon + text, no `aria-label`; fine because text is visible.
- Modal focus traps are inherited from the shared Modal; agent-network modals don't override them. Quick keyboard pass recommended.
- `EndpointBadge` Copy button (providers/page.tsx:66-76) has an `aria-label`, good.
-
-### Performance
-
- `AgentConsumptionPanel` polls `/agent-network/consumption` every 5s (ConsumptionPanel.tsx:53,70). Tab switches unmount the panel, so the poll stops — verify in network panel.
- `AgentAccessLogTable` is hard-capped at 100 rows via `page_size=100` (AIProvidersProvider.tsx:707-709). Server-side pagination is future work; high-traffic tenants miss everything past row 100 — known limitation.
- Observability page mounts providers ONCE at page level (observability/page.tsx:87-89); tab switches keep SWR cache hot. Moving the provider mount inside `TabsContent` would re-fetch the access log on every switch.
-
-### Visual consistency
-
- The observability tab style mirrors peers/page.tsx. Outer Tabs `pt-4 pb-0 mb-0`, TabsList `px-8` (observability/page.tsx:94-96) — confirm chrome height matches so the page doesn't visually jump.
- Sidebar: `Boxes` for Providers, `AccessControlIcon` for Policies, `TelescopeIcon` for AI Observability (Navigation.tsx:113,120,133). Reusing `AccessControlIcon` makes Policies look identical to the (now hidden) Access Control item — if Access Control ever comes back, they collide.
- `AgentNetworkIcon` is used in breadcrumbs on every agent-network page but NOT in the sidebar (per-page icons instead). Deliberate departure — record so it doesn't get reverted.
-
-## Test coverage
-
- **Cypress**: One file (`cypress/e2e/test.cy.ts`) covering only the install-page copy-to-clipboard flow. NOTHING covers agent-network UI.
- **Component / unit tests**: `src/utils/version.test.ts` is the only `.test.*` file in the repo. The agent-network modules ship without component tests.
- Data-cy hooks exist on key controls: `save-account-controls` (AgentAccountControlsCard.tsx:71), `enable-log-collection`, `enable-prompt-collection`, `redact-pii`, plus existing `data-cy={policy.name}` / `data-cy={provider.name}` on ActiveInactiveRow. Sufficient hooks for Cypress flows; none written yet.
- **Tooling gap (pre-existing):** `npm run lint` (`next lint`) is broken in Next 16 — the `lint` subcommand was removed from the Next CLI in 16.x, so the dashboard effectively has no working lint gate. The fix is to add either a flat-config `eslint .` script or wire ESLint via an explicit `eslint-config-next` invocation.
-
-## Known limitations / explicit non-goals
-
- **`data/mockData.ts` still contains `MOCK_GROUPS`, `MOCK_PROVIDERS`, `MOCK_PEERS`.** Only `MOCK_GROUPS` is referenced from production: AgentPoliciesTable.tsx:45,76 uses it as a name-lookup fallback when a policy references a group ID the real GroupsProvider doesn't know about. `MOCK_PROVIDERS` / `MOCK_PEERS` are unreferenced and safe to delete. The file carries a blanket `/* eslint-disable */`, so dead-code warnings don't flag them.
- **Tab-state URL hand-off on observability page is one-way** (read-only).
- **Access log hard-capped at 100 rows**; no server-side pagination.
- **No optimistic updates.** All mutations are round-trip; failures roll back via SWR revalidation.
- **`FlowView.NETWORKS` retained but hidden** from FlowSelector (FlowSelector.tsx:9-14). Old `?tab=networks` links still route to the hidden view because `applyNetworksView` still runs.
- **Redirects are not query-preserving** — `router.replace("/peers/devices")` (peers/page.tsx:13) strips any incoming filter params.
- **Control-center cross-fetches** `/agent-network/{providers,policies}` directly on top of `AIProvidersProvider`. Could be collapsed.
- **Sidebar permanently hides Access Control, Networks, Reverse Proxy, standalone Guardrails, DNS, Activity, Consumption.** Routes still resolve via URL (Navigation.tsx:165-171); intentional.
-
-## Cross-references
-
- Upstream API contracts: [shared/api](10-shared-api.md)
- Backend persistence: [management/store](20-management-store.md)
- Backend handler wiring: [management/handlers + wiring](22-management-handlers-wiring.md)
- End-to-end flow narrative: [../01-end-to-end-flows.md](../01-end-to-end-flows.md)
- Top-level overview: [../00-overview.md](../00-overview.md)
--- a/docs/agent-networks/modules/50-path-routed-providers.md
+++ b/docs/agent-networks/modules/50-path-routed-providers.md
@@ -1,251 +0,0 @@
-# path-routed providers — Vertex AI + Bedrock
-
-This guide pulls the **path-routed** provider story together in one place
-because it crosses the catalog, the synthesiser, the request parser, and the
-router. The relevant building blocks are the `llm_router` /
-`llm_request_parser` middlewares
-([31-proxy-middleware-builtin.md](31-proxy-middleware-builtin.md)), the
-per-provider parser surface ([32-proxy-llm-parsers.md](32-proxy-llm-parsers.md)),
-and the synthesiser's catalog → `ProviderRoute` mapping
-([21-management-agentnetwork.md](21-management-agentnetwork.md)).
-
-Sibling modules: [31-proxy-middleware-builtin.md](31-proxy-middleware-builtin.md)
-(router + request parser) and [32-proxy-llm-parsers.md](32-proxy-llm-parsers.md)
-(Bedrock parser + pricing).
-
---
-
-## What "path-routed" means
-
-Most catalog providers carry the model in the request **body** (`{"model": …}`),
-so `llm_router` selects an upstream by matching the model name against each
-provider's `Models` claim. Two providers instead carry the model in the **URL
-path**, so they are routed by path before the model/vendor table is consulted:
-
-| Catalog id | Style flag | Request path shape |
-|---|---|---|
-| `vertex_ai_api` | `IsVertexPathStyle` → `ProviderRoute.Vertex` | `/v1/projects/{project}/locations/{region}/publishers/{publisher}/models/{model}:{action}` |
-| `bedrock_api` | `IsBedrockPathStyle` → `ProviderRoute.Bedrock` | `/model/{modelId}/{action}` (optionally behind `/bedrock`) |
-
-The catalog declares the style with
-[`catalog.IsVertexPathStyle` / `catalog.IsBedrockPathStyle`](../../../management/server/agentnetwork/catalog/catalog.go)
-and the synthesiser copies the result onto the router route as the `Vertex` /
-`Bedrock` booleans
-([synthesizer.go:450-451](../../../management/server/agentnetwork/synthesizer.go)).
-On the request leg `llm_router.Invoke` dispatches `isVertexPath` / `isBedrockPath`
-**before** the model lookup
-([llm_router/middleware.go:138-216](../../../proxy/internal/middleware/builtin/llm_router/middleware.go))
-so a model the parser extracted from the path can't be claimed by a same-vendor
-*body-routed* provider (e.g. `claude-*` on `api.anthropic.com`).
-
-## Google Vertex AI (`vertex_ai_api`)
-
-### Catalog entry
-
-`KindProvider`, parser surface left unset on the catalog entry — the request
-parser picks the parser from the URL **publisher** segment, not from
-`ParserID`. Upstream host is `<region>-aiplatform.googleapis.com`
-(`https://aiplatform.googleapis.com` for the `global` location). The catalog
-lists the Claude-on-Vertex lineup (`claude-opus-4-*`, `claude-sonnet-4-*`,
-`claude-haiku-4-5`, `claude-fable-5`) at the same per-token rates as the
-first-party Anthropic entry
-([catalog.go:333-363](../../../management/server/agentnetwork/catalog/catalog.go)).
-
-### Credential — service-account OAuth (`keyfile::`)
-
-Vertex does **not** accept a static API key. The operator sets the provider
-`api_key` to:
-
-```
-keyfile::<base64 of the GCP service-account JSON key>
-```
-
-The synthesiser recognises the `keyfile::` prefix in `providerAuthHeader`
-([synthesizer.go:897-903](../../../management/server/agentnetwork/synthesizer.go)),
-emits **no** static auth value, and carries the base64 key material on the
-route as `GCPServiceAccountKeyB64`
-([factory.go:56-61](../../../proxy/internal/middleware/builtin/llm_router/factory.go)).
-At request time the router mints a short-lived OAuth2 access token from the key
-(cloud-platform scope) and injects `Authorization: Bearer <access-token>` —
-never the key itself
-([llm_router/middleware.go:621-692](../../../proxy/internal/middleware/builtin/llm_router/middleware.go)):
-
- One auto-refreshing `oauth2.TokenSource` is cached per key (keyed by a
-  SHA-256 of the base64 material), so token minting happens once and refreshes
-  amortise across requests.
- Mint / refresh is bounded by a 10s timeout HTTP client (`gcpTokenTimeout`) so
-  a slow Google token endpoint can't hang the request.
- A malformed key or an unreachable token endpoint fails the request with
-  `llm_policy.upstream_auth_failed` at HTTP **502** (an upstream problem, not a
-  policy denial) — see `denyUpstreamAuth`.
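
The credential handling above (recognise the `keyfile::` prefix, decode the key material, cache one token source per key under a SHA-256-derived key) can be sketched with the standard library alone. Everything named here is an assumption for illustration: `splitKeyfile`, `decodeKey`, and `tokenCacheKey` are hypothetical helpers, standard (padded) base64 is assumed, and the actual OAuth2 minting step is left out because it depends on an external package.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

const keyfilePrefix = "keyfile::"

// splitKeyfile reports whether apiKey uses the keyfile:: form and, if so,
// returns the base64 payload that follows the prefix.
func splitKeyfile(apiKey string) (string, bool) {
	if !strings.HasPrefix(apiKey, keyfilePrefix) {
		return "", false
	}
	return strings.TrimPrefix(apiKey, keyfilePrefix), true
}

// decodeKey turns the base64 payload back into the service-account JSON.
// Assumption: standard padded base64.
func decodeKey(b64 string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, fmt.Errorf("decode service-account key: %w", err)
	}
	if len(raw) == 0 {
		return nil, errors.New("empty service-account key")
	}
	return raw, nil
}

// tokenCacheKey derives a map key from the base64 material so the secret
// itself is never used as a cache key.
func tokenCacheKey(b64 string) string {
	sum := sha256.Sum256([]byte(b64))
	return hex.EncodeToString(sum[:])
}

func main() {
	payload := base64.StdEncoding.EncodeToString([]byte(`{"type":"service_account"}`))
	b64, ok := splitKeyfile(keyfilePrefix + payload)
	fmt.Println(ok)
	raw, err := decodeKey(b64)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
	fmt.Println(len(tokenCacheKey(b64))) // hex-encoded SHA-256 length
}
```

A decode failure in a sketch like this corresponds to the "malformed key" case, which the router reports as an upstream auth failure and not as a policy denial.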
-
-### Metering — Anthropic-on-Vertex only
-
-The request parser extracts `{publisher, model, action}` from the path
-(`parseVertexPath`, [llm_request_parser/middleware.go:237-263](../../../proxy/internal/middleware/builtin/llm_request_parser/middleware.go)),
-strips the `@version` suffix from the model, and maps the publisher to a parser
-surface via `vertexPublisherVendor`:
-
- `anthropic` → `llm.provider="anthropic"` → metered through the Anthropic
-  parser, priced under the **`anthropic`** block in `defaults_pricing.yaml`
-  (the parser emits the standard Anthropic provider label, so Vertex Claude
-  reuses first-party Anthropic prices).
- `openai` → `llm.provider="openai"` (reserved; not in the catalog lineup
-  today).
- anything else (notably `google` / Gemini) → empty vendor → **no parser**.
-
-**Gemini is intentionally denied as unmeterable.** When the parser emits no
-`llm.provider` for a Vertex publisher, `llm_router` returns
-`llm_policy.unmeterable_publisher` (403) rather than forwarding the request
-uncounted — serving it would bypass token / budget metering
-([llm_router/middleware.go:144-162, 712-728](../../../proxy/internal/middleware/builtin/llm_router/middleware.go)).
-A Gemini parser would lift this restriction; until then the `google` publisher
-is omitted from the catalog.
-
-> Caveat: cross-region inference profiles in `eu` / `apac` carry a ~10% price
-> premium that the base per-token rates do **not** model — cost annotations for
-> those regions read low. Operators who need exact regional billing override
-> the affected entries in `pricing.yaml`.
-
-## AWS Bedrock (`bedrock_api`)
-
-### Catalog entry
-
-`KindProvider`, upstream host `bedrock-runtime.<region>.amazonaws.com`. Metered
-models are the Anthropic-on-Bedrock lineup (`anthropic.claude-*`) plus Amazon
-Nova and Llama 3.3 entries
-([catalog.go:300-332](../../../management/server/agentnetwork/catalog/catalog.go)).
-Anthropic-on-Bedrock reuses the first-party Claude prices (with additive cache
-buckets); Nova / Llama report no cache, so cost is `input + output`.
-
-### Credential — static bearer token
-
-Bedrock uses the **AWS Bedrock API key** as a static bearer. The operator sets
-the provider `api_key` directly (no `keyfile::` prefix); the catalog template
-is `Authorization: Bearer ${API_KEY}`
-([catalog.go:306-307](../../../management/server/agentnetwork/catalog/catalog.go)).
-No token minting — the synthesiser substitutes the key into the template and
-the router injects the resulting `Authorization` header after stripping inbound
-vendor auth (including client-supplied AWS SigV4 material: `X-Amz-Date`,
-`X-Amz-Security-Token`, `X-Amz-Content-Sha256`, see `strippedAuthHeaders`).
-
-### Model id form — cross-region inference profiles
-
-Bedrock model ids in the request path must be the cross-region
-**inference-profile** form, e.g.
-`eu.anthropic.claude-sonnet-4-5-20250929-v1:0`. The bare
-`anthropic.claude-…` id is rejected by AWS. `normalizeBedrockModel`
-([llm_request_parser/middleware.go:398-414](../../../proxy/internal/middleware/builtin/llm_request_parser/middleware.go))
-strips the region prefix (`us.` / `eu.` / `apac.` / `global.`), an optional ARN
-wrapper, and the `-YYYYMMDD-vN[:N]` version/throughput suffix so the normalised
-id (`anthropic.claude-sonnet-4-5`) matches the catalog/pricing key.
-
-### Supported endpoints + actions
-
-`/model/{modelId}/{action}` where action ∈ `invoke`,
-`invoke-with-response-stream`, `converse`, `converse-stream`
-([llm_request_parser/middleware.go:363-390](../../../proxy/internal/middleware/builtin/llm_request_parser/middleware.go)).
-`invoke` / `converse` are non-streaming; the `-stream` actions set the streaming
-flag.
-
-- **InvokeModel** body uses the vendor-native shape — for Anthropic that means
-  `"anthropic_version":"bedrock-2023-05-31"` and snake_case usage with additive
-  cache buckets.
-- **Converse** uses the unified camelCase shape with a precomputed `totalTokens`.
-- The `BedrockParser` reads both shapes on the response leg
-  ([bedrock.go](../../../proxy/internal/llm/bedrock.go)); the request parser
-  doesn't need to distinguish them (`ParseRequest` is a no-op — model + stream
-  come from the path).
-
-### Streaming — AWS binary event-stream
-
-The `-stream` actions return `application/vnd.amazon.eventstream` (the AWS
-binary event-stream framing), and streaming **is metered**.
-`accumulateBedrockStream`
-([llm_response_parser/streaming_bedrock.go](../../../proxy/internal/middleware/builtin/llm_response_parser/streaming_bedrock.go))
-decodes the frames with `aws-sdk-go-v2/aws/protocol/eventstream`:
-
-- InvokeModel `chunk` frames wrap a base64 `{"bytes":…}` payload carrying a
-  vendor-native (Anthropic) stream event — folded through the shared Anthropic
-  stream accumulator.
-- Converse `contentBlockDelta` frames carry text; the trailing `metadata` frame
-  carries the final usage block.
-- A truncated stream (cut at the body-tap capture cap) decodes best-effort:
-  frames up to the cut are applied and partial usage is returned.
-
-### Optional `/bedrock` gateway-namespace prefix
-
-Clients may place an optional `/bedrock` prefix before the native path
-(`/bedrock/model/{modelId}/{action}`) to disambiguate Bedrock from other
-providers that also use `/model/...`. Both the request parser
-(`trimBedrockNamespace`) and the router (`splitBedrockNamespace`) accept it.
-When the prefix is present, the router sets
-`RewriteUpstream.StripPathPrefix = "/bedrock"` so the **native** path
-(`/model/...`) is what reaches `bedrock-runtime.<region>.amazonaws.com`
-([llm_router/middleware.go:168-184, 320-348](../../../proxy/internal/middleware/builtin/llm_router/middleware.go)).
-
-## Model allowlist on path-routed providers
-
-Because the model lives in the URL rather than the body, a path-routed provider
-credential could otherwise be used for any model the upstream supports. The
-router still enforces the route's `Models` allowlist via `matchPathRoute`
-([llm_router/middleware.go:370-416](../../../proxy/internal/middleware/builtin/llm_router/middleware.go)):
-
-1. Filter to routes of the matching style (`Vertex` / `Bedrock`).
-2. Filter to routes whose `AllowedGroupIDs` authorise the caller's groups
-   (else `no_authorised_provider`).
-3. Filter to routes that **claim the requested model**. As with body-routed
-   providers, an **empty `Models` list = catch-all** (serve any model);
-   a non-empty list serves only the listed models (else `model_not_routable`).
-4. Multiple survivors disambiguate by longest `UpstreamPath` prefix match.
-
-So an operator who lists explicit models on a Vertex/Bedrock provider gets a
-hard allowlist; an operator who leaves `Models` empty accepts every model the
-upstream serves (still subject to the unmeterable-publisher gate on Vertex).
-
-Model-less OpenAI endpoints (`GET /v1/models`) are **never** routed to a
-Vertex/Bedrock provider — `matchModelless` skips path-routed routes
-([llm_router/middleware.go:427-462](../../../proxy/internal/middleware/builtin/llm_router/middleware.go))
-so a model-listing call can't be rewritten onto an upstream that would 404 it.
-
-## Catalog ↔ pricing cross-check
-
-Catalog prices and context windows are cross-checked against LiteLLM's
-`model_prices_and_context_window.json`. The proxy's embedded
-`defaults_pricing.yaml` covers **every metered first-party model** the catalog
-enumerates — guarded by
-`TestDefaultTable_FirstPartyModelCoverage`
-([pricing/defaults_coverage_test.go](../../../proxy/internal/llm/pricing/defaults_coverage_test.go)),
-which fails if a catalog model has no embedded price. Bedrock entries are keyed
-by the **normalised** id the request parser emits (region prefix + version
-suffix stripped). Vertex Claude carries no Bedrock-style prefix, so it prices
-straight off the `anthropic` block.
-
-## Things to scrutinise
-
-**Security.** The Vertex service-account key is never forwarded — only a minted
-short-lived bearer. Confirm the key material stays out of access logs (it lives
-on `ProviderRoute.GCPServiceAccountKeyB64`, not in any emitted metadata key).
-The unmeterable-publisher deny is the only thing standing between an
-operator-misconfigured Vertex provider and unmetered Gemini traffic; verify
-`vertexPublisherVendor` stays conservative (deny by default for unknown
-publishers).
-
-**Correctness.** `normalizeBedrockModel` is the join between the wire id and the
-pricing key — a model that normalises to something not in `defaults_pricing.yaml`
-meters at `cost.skipped=unknown_model` rather than failing the request. The
-`/bedrock` prefix strip must run on both the parser side (so the model is
-extracted) and the router side (so the upstream path is native); a regression in
-either silently breaks the other.
-
-**Metering caveats.** eu/apac cross-region Bedrock + Vertex profiles carry a
-~10% premium not modelled by base pricing — flagged in both the catalog comment
-and `defaults_pricing.yaml`. Operators needing exact regional billing override
-the relevant entries.
-
-## Cross-references
-
-- Router + request-parser detail: [31-proxy-middleware-builtin.md](31-proxy-middleware-builtin.md)
-- Bedrock parser + pricing + SSE / event-stream: [32-proxy-llm-parsers.md](32-proxy-llm-parsers.md)
-- Catalog → route synthesis + `keyfile::` handling: [21-management-agentnetwork.md](21-management-agentnetwork.md)
-- Overview: [../00-overview.md](../00-overview.md)
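Reviewer note: the removed document describes Bedrock path handling only in prose. The sketch below illustrates how the path split and model-id normalisation could fit together. It is not the removed implementation: `splitBedrockPath` and `normalizeModelSketch` are hypothetical names, the ARN handling assumes the model id follows the last `/`, and the regular expression is one possible reading of the `-YYYYMMDD-vN[:N]` suffix.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionSuffix matches a trailing -YYYYMMDD-vN[:N] version/throughput suffix.
var versionSuffix = regexp.MustCompile(`-\d{8}-v\d+(:\d+)?$`)

// regionPrefixes are the cross-region inference-profile prefixes named in the doc.
var regionPrefixes = []string{"us.", "eu.", "apac.", "global."}

// splitBedrockPath accepts /model/{modelId}/{action} with an optional /bedrock
// prefix and reports the raw model id, the action, and whether it streams.
func splitBedrockPath(path string) (model, action string, stream, ok bool) {
	path = strings.TrimPrefix(path, "/bedrock")
	rest, found := strings.CutPrefix(path, "/model/")
	if !found {
		return "", "", false, false
	}
	i := strings.LastIndex(rest, "/")
	if i <= 0 {
		return "", "", false, false
	}
	model, action = rest[:i], rest[i+1:]
	switch action {
	case "invoke", "converse":
		return model, action, false, true
	case "invoke-with-response-stream", "converse-stream":
		return model, action, true, true
	}
	return "", "", false, false
}

// normalizeModelSketch is a hypothetical stand-in for normalizeBedrockModel.
func normalizeModelSketch(id string) string {
	// Assumed ARN layout: the model id is whatever follows the last "/".
	if strings.HasPrefix(id, "arn:") {
		if i := strings.LastIndex(id, "/"); i >= 0 {
			id = id[i+1:]
		}
	}
	// Strip at most one region prefix.
	for _, p := range regionPrefixes {
		if strings.HasPrefix(id, p) {
			id = strings.TrimPrefix(id, p)
			break
		}
	}
	return versionSuffix.ReplaceAllString(id, "")
}

func main() {
	model, action, stream, ok := splitBedrockPath(
		"/bedrock/model/eu.anthropic.claude-sonnet-4-5-20250929-v1:0/converse-stream")
	fmt.Println(normalizeModelSketch(model), action, stream, ok)
}
```

Keeping the split and the normalisation separate mirrors the doc's point that the wire id and the pricing key are joined in exactly one place, so a pricing miss can be traced to the normaliser alone.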
--- a/docs/testing-privileged.md
+++ b/docs/testing-privileged.md
@@ -0,0 +1,78 @@
+# Privileged tests
+
+Some tests in this repo need `root` or mutate host network state: they create
+TUN/WireGuard interfaces, open netlink/raw sockets, run eBPF programs, or shell
+out to `ip`/`iptables`/`nft`/`ifconfig`/`route`. Running them on a developer
+machine would require `sudo` and could leave stray interfaces or routes behind.
+
+These tests are gated behind the **`privileged` build tag** so the default test
+run is host-safe.
+
+## Running tests
+
+```bash
+# Host-safe: excludes privileged tests. Runs as a normal user, no sudo.
+make test-unit
+# equivalently:
+go test -tags devcert ./...
+
+# Privileged suite: runs the privileged-tagged tests inside a
+# --privileged --cap-add=NET_ADMIN container (requires Docker).
+make test-privileged
+
+# Narrow the container run to a single test / package:
+PRIV_RUN=TestNftablesManager PRIV_PKGS=./client/firewall/nftables/... make test-privileged
+```
+
+`PRIV_RUN` adds a `-run` test-name filter and `PRIV_PKGS` overrides the package
+list. Both are optional; when unset, the full privileged suite runs.
+
+`make test-privileged` invokes the `ory/dockertest` harness in
+`client/testutil/privileged/`. The harness:
+
+1. Skips immediately when it detects it is already inside the container
+   (`DOCKER_CI=true`), so the privileged tests run in place instead of recursing.
+2. Otherwise spins up a `golang:1.25-alpine` container (matching CI),
+   bind-mounts the repo and the host Go build/module caches, installs the
+   required packages, and runs `go test -tags 'devcert privileged'` over the
+   client packages.
+3. Streams the container's output to the test log and fails if the suite fails.
+
+## Adding a privileged test
+
+A test is privileged if it does any of the following:
+
+- creates a real interface via `iface.NewWGIFace(...).Create()`,
+- opens a netlink or raw socket that hard-fails without `CAP_NET_ADMIN`,
+- runs an eBPF program (`ebpf.*.Listen()`),
+- shells out to `ip`, `iptables`, `nft`, `ifconfig`, or `route` to change state.
+
+Add the tag to the **top** of the file, combined with any existing platform
+constraint:
+
+```go
+//go:build privileged && linux
+
+package foo
+```
+
+If a file mixes privileged and pure-logic tests, **split it**: keep the pure
+tests (and any shared data — type/var declarations, table-driven `testCases`,
+helper interfaces) in an untagged file, and move the privileged tests into a
+`*_privileged_test.go` file with the tag. Shared declarations must stay untagged,
+otherwise the unprivileged files in the package will not compile.
+
+Always verify both build modes compile on every target platform:
+
+```bash
+go vet -tags devcert ./...
+go vet -tags 'devcert privileged' ./...
+```
+
+## CI
+
+- The `Client / Unit` job runs `go test -tags devcert` with **no** `sudo` — only
+  host-safe tests.
+- The `Client (Docker) / Unit` job runs `go test -tags 'devcert privileged'`
+  inside a `--privileged --cap-add=NET_ADMIN` container, which is where the
+  privileged tests actually execute.
--- a/e2e/agentnetwork/bootstrap_test.go
+++ b/e2e/agentnetwork/bootstrap_test.go
@@ -1,30 +0,0 @@
-//go:build e2e
-
-package agentnetwork
-
-import (
-	"context"
-	"testing"
-
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-)
-
-// TestCombinedBootstrap proves Pillar 1: the shared combined server came up and
-// the /api/setup-minted PAT authenticates a real management API call through
-// the typed REST client (the bootstrap itself ran in TestMain).
-func TestCombinedBootstrap(t *testing.T) {
-	ctx := context.Background()
-
-	require.NotEmpty(t, srv.PAT, "TestMain must have minted an admin PAT")
-
-	users, err := srv.API().Users.List(ctx)
-	require.NoError(t, err, "authenticated Users.List must round-trip")
-	require.NotEmpty(t, users, "the bootstrapped account must have at least one user")
-
-	var emails []string
-	for _, u := range users {
-		emails = append(emails, u.Email)
-	}
-	assert.Contains(t, emails, "admin@netbird.test", "the bootstrapped owner should appear in the users list")
-}
--- a/e2e/agentnetwork/chat_test.go
+++ b/e2e/agentnetwork/chat_test.go
@@ -1,165 +0,0 @@
-//go:build e2e
-
-package agentnetwork
-
-import (
-	"context"
-	"fmt"
-	"os"
-	"strings"
-	"testing"
-	"time"
-
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-
-	"github.com/netbirdio/netbird/e2e/harness"
-	"github.com/netbirdio/netbird/shared/management/http/api"
-)
-
-// TestChatCompletionThroughProxy is Pillar 3: it provisions an agent-network
-// gateway (provider + policy + setup key), runs the proxy and a client
-// container on the shared network, and drives a real chat-completion from the
-// client through the proxy to the upstream provider over the WireGuard tunnel,
-// asserting a 200 and that usage is recorded.
-//
-// Requires a real provider key in OPENAI_TOKEN (source ~/.llm-keys locally; set
-// the Actions secret in CI). Skips otherwise.
-func TestChatCompletionThroughProxy(t *testing.T) {
-	apiKey := os.Getenv("OPENAI_TOKEN")
-	if apiKey == "" {
-		t.Skip("OPENAI_TOKEN not set; source ~/.llm-keys to run the live chat test")
-	}
-
-	ctx, cancel := context.WithTimeout(context.Background(), 12*time.Minute)
-	defer cancel()
-
-	// Group + setup key: the client joins into this group; the policy authorizes
-	// it to reach the provider.
-	grp, err := srv.API().Groups.Create(ctx, api.PostApiGroupsJSONRequestBody{Name: "e2e-agents"})
-	require.NoError(t, err, "create agents group")
-	t.Cleanup(func() { _ = srv.API().Groups.Delete(context.Background(), grp.Id) })
-
-	ephemeral := false
-	sk, err := srv.API().SetupKeys.Create(ctx, api.PostApiSetupKeysJSONRequestBody{
-		Name:       "e2e-client",
-		Type:       "reusable",
-		ExpiresIn:  86400,
-		UsageLimit: 0,
-		AutoGroups: []string{grp.Id},
-		Ephemeral:  &ephemeral,
-	})
-	require.NoError(t, err, "mint setup key")
-	require.NotEmpty(t, sk.Key, "setup key must be returned in plaintext")
-
-	// Provider (real upstream key) + policy authorizing the group. Created
-	// before the proxy starts so the proxy's initial cluster snapshot already
-	// carries the account's synthesized service.
-	prov, err := srv.CreateProvider(ctx, api.AgentNetworkProviderRequest{
-		Name:             "OpenAI Live",
-		ProviderId:       "openai_api",
-		UpstreamUrl:      "https://api.openai.com",
-		ApiKey:           &apiKey,
-		Enabled:          ptr(true),
-		BootstrapCluster: ptr(harness.AgentNetworkCluster),
-		Models: &[]api.AgentNetworkProviderModel{
-			{Id: "gpt-4o-mini", InputPer1k: 0.00015, OutputPer1k: 0.0006},
-		},
-	})
-	require.NoError(t, err, "create provider")
-	t.Cleanup(func() { _ = srv.DeleteProvider(context.Background(), prov.Id) })
-
-	enabled := true
-	policyReq := api.AgentNetworkPolicyRequest{
-		Name:                   "e2e-allow",
-		Enabled:                &enabled,
-		SourceGroups:           []string{grp.Id},
-		DestinationProviderIds: []string{prov.Id},
-	}
-	pol, err := srv.CreatePolicy(ctx, policyReq)
-	require.NoError(t, err, "create policy")
-	t.Cleanup(func() { _ = srv.DeletePolicy(context.Background(), pol.Id) })
-
-	settings, err := srv.GetSettings(ctx)
-	require.NoError(t, err, "read settings for endpoint")
-	require.NotEmpty(t, settings.Endpoint, "agent-network endpoint must be assigned")
-
-	// Mint the proxy token via the server CLI (global, account-less) — the path
-	// the manual install uses, which drives the cluster-snapshot synthesis the
-	// proxy needs. An account-scoped REST token takes a different path that
-	// doesn't deliver the service.
-	proxyToken, err := srv.CreateProxyTokenCLI(ctx, "e2e-proxy")
-	require.NoError(t, err, "mint proxy token via CLI")
-
-	px, err := harness.StartProxy(ctx, srv, proxyToken)
-	require.NoError(t, err, "start proxy")
-	t.Cleanup(func() { _ = px.Terminate(context.Background()) })
-
-	// Client joins last, once the proxy + provider + policy are all in place, so
-	// its initial network map includes the synthesized agent-network service.
-	cl, err := harness.StartClient(ctx, srv, sk.Key)
-	require.NoError(t, err, "start client")
-	t.Cleanup(func() { _ = cl.Terminate(context.Background()) })
-
-	require.NoError(t, cl.WaitConnected(ctx, 90*time.Second), "client must connect to management")
-
-	if err := cl.WaitProxyPeer(ctx, 180*time.Second); err != nil {
-		dctx := context.Background()
-		peers, _ := srv.API().Peers.List(dctx)
-		var peerInfo []string
-		for _, p := range peers {
-			var groups []string
-			for _, g := range p.Groups {
-				groups = append(groups, g.Name)
-			}
-			peerInfo = append(peerInfo, fmt.Sprintf("%s connected=%t ip=%s groups=%v", p.Name, p.Connected, p.Ip, groups))
-		}
-		clusters, _ := srv.API().ReverseProxyClusters.List(dctx)
-		var clusterInfo []string
-		for _, cl := range clusters {
-			clusterInfo = append(clusterInfo, fmt.Sprintf("%+v", cl))
-		}
-		domains, _ := srv.API().ReverseProxyDomains.List(dctx)
-		var domainInfo []string
-		for _, d := range domains {
-			domainInfo = append(domainInfo, fmt.Sprintf("%+v", d))
-		}
-		_ = os.WriteFile("/tmp/nb-e2e-proxy.log", []byte(px.Logs(dctx)), 0o644)
-		_ = os.WriteFile("/tmp/nb-e2e-client.log", []byte(cl.Logs(dctx)), 0o644)
-		_ = os.WriteFile("/tmp/nb-e2e-combined.log", []byte(srv.Logs(dctx)), 0o644)
-		diag := fmt.Sprintf("settings: cluster=%q endpoint=%q subdomain=%q\nprovider: id=%s cluster=%s\npolicy: id=%s sourceGroups=%v dst=%v\ngroup: id=%s\npeers:\n%s\nclusters:\n%s\n",
-			settings.Cluster, settings.Endpoint, settings.Subdomain,
-			prov.Id, harness.AgentNetworkCluster,
-			pol.Id, policyReq.SourceGroups, policyReq.DestinationProviderIds,
-			grp.Id,
-			strings.Join(peerInfo, "\n"), strings.Join(clusterInfo, "\n"))
-		diag += "domains:\n" + strings.Join(domainInfo, "\n") + "\n"
-		_ = os.WriteFile("/tmp/nb-e2e-diag.txt", []byte(diag), 0o644)
-		t.Fatalf("client did not see the proxy peer: %v\n=== settings ===\ncluster=%q endpoint=%q subdomain=%q\n=== peers ===\n%v\n=== clusters ===\n%v\n=== proxy logs ===\n%s",
-			err, settings.Cluster, settings.Endpoint, settings.Subdomain, peerInfo, clusterInfo, px.Logs(dctx))
-	}
-
-	proxyIP, err := cl.ResolveProxyIP(ctx, settings.Endpoint)
-	require.NoError(t, err, "resolve agent-network endpoint to proxy IP")
-
-	code, body, err := cl.Chat(ctx, settings.Endpoint, proxyIP, "gpt-4o-mini", "Reply with exactly: pong")
-	require.NoError(t, err, "chat request through tunnel")
-	if code != 200 {
-		t.Fatalf("expected 200 from chat-completion, got %d\nbody: %s\n=== proxy logs ===\n%s", code, body, px.Logs(context.Background()))
-	}
-	assert.Contains(t, body, "choices", "chat response should carry choices")
-
-	// The per-request access-log row is ingested asynchronously after the
-	// response is forwarded; poll briefly. (Consumption rows are only booked
-	// when a policy has token/budget limits, which this one doesn't.)
-	require.Eventually(t, func() bool {
-		resp, lerr := srv.ListAccessLogs(ctx)
-		return lerr == nil && resp.TotalRecords > 0
-	}, 30*time.Second, 2*time.Second, "an access-log row should be recorded after the chat-completion")
-
-	logs, err := srv.ListAccessLogs(ctx)
-	require.NoError(t, err, "read access logs")
-	require.NotEmpty(t, logs.Data, "access-log page must contain the request row")
-	require.NotNil(t, logs.Data[0].Model, "access-log row should record the model")
-	assert.Equal(t, "gpt-4o-mini", *logs.Data[0].Model, "access-log row should record the requested model")
-}
--- a/e2e/agentnetwork/main_test.go
+++ b/e2e/agentnetwork/main_test.go
@@ -1,46 +0,0 @@
-//go:build e2e
-
-// Package agentnetwork holds the container-based agent-network e2e suite. A
-// single combined server is built and bootstrapped once per package run
-// (TestMain) and shared across tests via srv; each test creates and cleans up
-// its own resources so order doesn't matter.
-package agentnetwork
-
-import (
-	"context"
-	"fmt"
-	"os"
-	"testing"
-	"time"
-
-	"github.com/netbirdio/netbird/e2e/harness"
-)
-
-// srv is the shared combined server for the package, ready (PAT-authenticated)
-// by the time any Test runs.
-var srv *harness.Combined
-
-func TestMain(m *testing.M) {
-	os.Exit(run(m))
-}
-
-func run(m *testing.M) int {
-	// Generous timeout to cover a cold image build on first run.
-	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Minute)
-	defer cancel()
-
-	var err error
-	srv, err = harness.StartCombined(ctx)
-	if err != nil {
-		fmt.Fprintf(os.Stderr, "e2e: start combined server: %v\n", err)
-		return 1
-	}
-	defer func() { _ = srv.Terminate(context.Background()) }()
-
-	if _, err := srv.Bootstrap(ctx); err != nil {
-		fmt.Fprintf(os.Stderr, "e2e: bootstrap admin PAT: %v\n", err)
-		return 1
-	}
-
-	return m.Run()
-}
--- a/e2e/agentnetwork/management_test.go
+++ b/e2e/agentnetwork/management_test.go
@@ -1,169 +0,0 @@
-//go:build e2e
-
-package agentnetwork
-
-import (
-	"context"
-	"testing"
-
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-
-	"github.com/netbirdio/netbird/shared/management/client/rest"
-	"github.com/netbirdio/netbird/shared/management/http/api"
-)
-
-func ptr[T any](v T) *T { return &v }
-
-// newProvider creates an OpenAI-catalog provider with a dummy key (these tests
-// never call the upstream) and registers cleanup.
-func newProvider(t *testing.T, ctx context.Context, name string) api.AgentNetworkProvider {
-	t.Helper()
-	prov, err := srv.CreateProvider(ctx, api.AgentNetworkProviderRequest{
-		Name:             name,
-		ProviderId:       "openai_api",
-		UpstreamUrl:      "https://api.openai.com",
-		ApiKey:           ptr("sk-dummy-e2e-key"),
-		BootstrapCluster: ptr("eu.proxy.netbird.test"),
-	})
-	require.NoError(t, err, "create provider %q", name)
-	t.Cleanup(func() { _ = srv.DeleteProvider(context.Background(), prov.Id) })
-	return prov
-}
-
-// requireClientError asserts err is a REST APIError with a 4xx status.
-func requireClientError(t *testing.T, err error) {
-	t.Helper()
-	var apiErr *rest.APIError
-	require.ErrorAs(t, err, &apiErr, "expected a REST APIError")
-	assert.GreaterOrEqual(t, apiErr.StatusCode, 400, "expected a 4xx status")
-	assert.Less(t, apiErr.StatusCode, 500, "expected a 4xx status")
-}
-
-// TestProviderLifecycle covers create → get → list → delete → 404.
-func TestProviderLifecycle(t *testing.T) {
-	ctx := context.Background()
-
-	prov := newProvider(t, ctx, "Provider Lifecycle")
-	assert.NotEmpty(t, prov.Id, "created provider must have an id")
-	assert.Equal(t, "openai_api", prov.ProviderId)
-
-	got, err := srv.GetProvider(ctx, prov.Id)
-	require.NoError(t, err, "get provider")
-	assert.Equal(t, prov.Id, got.Id)
-
-	list, err := srv.ListProviders(ctx)
-	require.NoError(t, err, "list providers")
-	var ids []string
-	for _, p := range list {
-		ids = append(ids, p.Id)
-	}
-	assert.Contains(t, ids, prov.Id, "created provider must appear in the list")
-
-	require.NoError(t, srv.DeleteProvider(ctx, prov.Id), "delete provider")
-	_, err = srv.GetProvider(ctx, prov.Id)
-	requireClientError(t, err)
-}
-
-// TestProviderValidation rejects a missing API key and an unknown catalog id.
-func TestProviderValidation(t *testing.T) {
-	ctx := context.Background()
-
-	_, err := srv.CreateProvider(ctx, api.AgentNetworkProviderRequest{
-		Name:        "No Key",
-		ProviderId:  "openai_api",
-		UpstreamUrl: "https://api.openai.com",
-	})
-	requireClientError(t, err)
-
-	_, err = srv.CreateProvider(ctx, api.AgentNetworkProviderRequest{
-		Name:        "Unknown Catalog",
-		ProviderId:  "totally_unknown_provider",
-		UpstreamUrl: "https://example.com",
-		ApiKey:      ptr("sk-dummy"),
-	})
-	requireClientError(t, err)
-}
-
-// TestSettingsRoundTrip flips the collection toggles and confirms cluster /
-// subdomain stay immutable, then restores the original state.
-func TestSettingsRoundTrip(t *testing.T) {
-	ctx := context.Background()
-
-	// Settings are bootstrapped on first provider create.
-	newProvider(t, ctx, "Settings Bootstrap")
-
-	before, err := srv.GetSettings(ctx)
-	require.NoError(t, err, "get settings")
-	require.NotEmpty(t, before.Cluster, "settings must carry an assigned cluster")
-
-	flipped, err := srv.UpdateSettings(ctx, api.AgentNetworkSettingsRequest{
-		EnableLogCollection:    !before.EnableLogCollection,
-		EnablePromptCollection: !before.EnablePromptCollection,
-		RedactPii:              !before.RedactPii,
-	})
-	require.NoError(t, err, "update settings")
-	assert.Equal(t, !before.EnableLogCollection, flipped.EnableLogCollection, "log collection toggle must flip")
-	assert.Equal(t, !before.EnablePromptCollection, flipped.EnablePromptCollection, "prompt collection toggle must flip")
-	assert.Equal(t, before.Cluster, flipped.Cluster, "cluster must be immutable across updates")
-	assert.Equal(t, before.Subdomain, flipped.Subdomain, "subdomain must be immutable across updates")
-
-	// Restore the original toggles.
-	_, err = srv.UpdateSettings(ctx, api.AgentNetworkSettingsRequest{
-		EnableLogCollection:    before.EnableLogCollection,
-		EnablePromptCollection: before.EnablePromptCollection,
-		RedactPii:              before.RedactPii,
-	})
-	require.NoError(t, err, "restore settings")
-}
-
-// TestPolicyWindowFloor rejects an enabled limit below the 60s window floor and
-// accepts one at the floor.
-func TestPolicyWindowFloor(t *testing.T) {
-	ctx := context.Background()
-
-	grp, err := srv.API().Groups.Create(ctx, api.PostApiGroupsJSONRequestBody{Name: "e2e-policy-grp"})
-	require.NoError(t, err, "create source group")
-	t.Cleanup(func() { _ = srv.API().Groups.Delete(context.Background(), grp.Id) })
-
-	prov := newProvider(t, ctx, "Policy Provider")
-
-	limits := func(window int64) *api.AgentNetworkPolicyLimits {
-		return &api.AgentNetworkPolicyLimits{
-			TokenLimit: api.AgentNetworkPolicyTokenLimit{
-				Enabled:       true,
-				GroupCap:      1000,
-				UserCap:       1000,
-				WindowSeconds: window,
-			},
-		}
-	}
-
-	_, err = srv.CreatePolicy(ctx, api.AgentNetworkPolicyRequest{
-		Name:                   "e2e-below-floor",
-		SourceGroups:           []string{grp.Id},
-		DestinationProviderIds: []string{prov.Id},
-		Limits:                 limits(30),
-	})
-	requireClientError(t, err)
-
-	pol, err := srv.CreatePolicy(ctx, api.AgentNetworkPolicyRequest{
-		Name:                   "e2e-at-floor",
-		SourceGroups:           []string{grp.Id},
-		DestinationProviderIds: []string{prov.Id},
-		Limits:                 limits(60),
-	})
-	require.NoError(t, err, "policy at the 60s floor must be accepted")
-	assert.NotEmpty(t, pol.Id, "created policy must have an id")
-	t.Cleanup(func() { _ = srv.DeletePolicy(context.Background(), pol.Id) })
-}
-
-// TestConsumptionList confirms the read endpoint always returns an array, never
-// a 404/500.
-func TestConsumptionList(t *testing.T) {
-	ctx := context.Background()
-
-	rows, err := srv.ListConsumption(ctx)
-	require.NoError(t, err, "consumption list must not error")
-	assert.NotNil(t, rows, "consumption must be a JSON array (possibly empty)")
-}
--- a/e2e/harness/Dockerfile.client
+++ b/e2e/harness/Dockerfile.client
@@ -1,24 +0,0 @@
-# Multistage build for the NetBird client used in e2e tests. The repo has no
-# source-building client Dockerfile (client/Dockerfile packages a goreleaser
-# artifact), so this mirrors its alpine runtime + entrypoint while compiling the
-# CGO-free client inline. BuildKit cache mounts keep rebuilds incremental.
-
-FROM golang:1.25-bookworm AS builder
-WORKDIR /src
-COPY go.mod go.sum ./
-RUN --mount=type=cache,target=/go/pkg/mod go mod download
-COPY . .
-RUN --mount=type=cache,target=/go/pkg/mod \
-    --mount=type=cache,target=/root/.cache/go-build \
-    CGO_ENABLED=0 GOOS=linux go build -o /out/netbird ./client
-
-FROM alpine:3.24
-RUN apk add --no-cache bash ca-certificates ip6tables iproute2 iptables
-ENV NETBIRD_BIN="/usr/local/bin/netbird" \
-    NB_LOG_FILE="console,/var/log/netbird/client.log" \
-    NB_DAEMON_ADDR="unix:///var/run/netbird.sock" \
-    NB_ENABLE_CAPTURE="false" \
-    NB_ENTRYPOINT_SERVICE_TIMEOUT="30"
-ENTRYPOINT [ "/usr/local/bin/netbird-entrypoint.sh" ]
-COPY client/netbird-entrypoint.sh /usr/local/bin/netbird-entrypoint.sh
-COPY --from=builder /out/netbird /usr/local/bin/netbird
--- a/e2e/harness/agentnetwork.go
+++ b/e2e/harness/agentnetwork.go
@@ -1,112 +0,0 @@
-//go:build e2e
-
-package harness
-
-import (
-	"bytes"
-	"context"
-	"encoding/json"
-	"fmt"
-	"io"
-	"net/http"
-
-	"github.com/netbirdio/netbird/shared/management/http/api"
-)
-
-// The shared REST client doesn't (yet) expose typed agent-network methods, so
-// these helpers drive the /api/agent-network/* endpoints through the client's
-// NewRequest primitive — reusing its auth, error handling (rest.APIError on
-// non-2xx), and transport — while still speaking the generated api types.
-
-// anRequest issues an agent-network API call and decodes the JSON response into
-// T. A non-2xx response surfaces as a *rest.APIError from the client, which
-// tests inspect for negative-path status assertions.
-func anRequest[T any](ctx context.Context, c *Combined, method, path string, body any) (T, error) {
-	var out T
-	var reader io.Reader
-	if body != nil {
-		bs, err := json.Marshal(body)
-		if err != nil {
-			return out, fmt.Errorf("marshal %s %s: %w", method, path, err)
-		}
-		reader = bytes.NewReader(bs)
-	}
-
-	resp, err := c.api.NewRequest(ctx, method, path, reader, nil)
-	if err != nil {
-		return out, err
-	}
-	defer resp.Body.Close()
-
-	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
-		return out, fmt.Errorf("decode %s %s response: %w", method, path, err)
-	}
-	return out, nil
-}
-
-// anDelete issues a DELETE and discards the (empty-object) body.
-func anDelete(ctx context.Context, c *Combined, path string) error {
-	resp, err := c.api.NewRequest(ctx, http.MethodDelete, path, nil, nil)
-	if err != nil {
-		return err
-	}
-	resp.Body.Close()
-	return nil
-}
-
-// CreateProvider creates an agent-network provider.
-func (c *Combined) CreateProvider(ctx context.Context, req api.AgentNetworkProviderRequest) (api.AgentNetworkProvider, error) {
-	return anRequest[api.AgentNetworkProvider](ctx, c, http.MethodPost, "/api/agent-network/providers", req)
-}
-
-// GetProvider fetches a provider by id.
-func (c *Combined) GetProvider(ctx context.Context, id string) (api.AgentNetworkProvider, error) {
-	return anRequest[api.AgentNetworkProvider](ctx, c, http.MethodGet, "/api/agent-network/providers/"+id, nil)
-}
-
-// ListProviders returns all providers for the account.
-func (c *Combined) ListProviders(ctx context.Context) ([]api.AgentNetworkProvider, error) {
-	return anRequest[[]api.AgentNetworkProvider](ctx, c, http.MethodGet, "/api/agent-network/providers", nil)
-}
-
-// DeleteProvider removes a provider by id.
-func (c *Combined) DeleteProvider(ctx context.Context, id string) error {
-	return anDelete(ctx, c, "/api/agent-network/providers/"+id)
-}
-
-// CreatePolicy creates an agent-network policy.
-func (c *Combined) CreatePolicy(ctx context.Context, req api.AgentNetworkPolicyRequest) (api.AgentNetworkPolicy, error) {
-	return anRequest[api.AgentNetworkPolicy](ctx, c, http.MethodPost, "/api/agent-network/policies", req)
-}
-
-// UpdatePolicy replaces a policy by id.
-func (c *Combined) UpdatePolicy(ctx context.Context, id string, req api.AgentNetworkPolicyRequest) (api.AgentNetworkPolicy, error) {
-	return anRequest[api.AgentNetworkPolicy](ctx, c, http.MethodPut, "/api/agent-network/policies/"+id, req)
-}
-
-// DeletePolicy removes a policy by id.
-func (c *Combined) DeletePolicy(ctx context.Context, id string) error {
-	return anDelete(ctx, c, "/api/agent-network/policies/"+id)
-}
-
-// GetSettings returns the account's agent-network settings row. It exists only
-// after the first provider create bootstraps it.
-func (c *Combined) GetSettings(ctx context.Context) (api.AgentNetworkSettings, error) {
-	return anRequest[api.AgentNetworkSettings](ctx, c, http.MethodGet, "/api/agent-network/settings", nil)
-}
-
-// UpdateSettings applies the mutable collection toggles.
-func (c *Combined) UpdateSettings(ctx context.Context, req api.AgentNetworkSettingsRequest) (api.AgentNetworkSettings, error) {
-	return anRequest[api.AgentNetworkSettings](ctx, c, http.MethodPut, "/api/agent-network/settings", req)
-}
-
-// ListConsumption returns the account's consumption rows (possibly empty).
-func (c *Combined) ListConsumption(ctx context.Context) ([]api.AgentNetworkConsumption, error) {
-	return anRequest[[]api.AgentNetworkConsumption](ctx, c, http.MethodGet, "/api/agent-network/consumption", nil)
-}
-
-// ListAccessLogs returns the account's agent-network access-log page (the
-// flattened per-request rows the proxy ships and management ingests).
-func (c *Combined) ListAccessLogs(ctx context.Context) (api.AgentNetworkAccessLogsResponse, error) {
-	return anRequest[api.AgentNetworkAccessLogsResponse](ctx, c, http.MethodGet, "/api/agent-network/access-logs", nil)
-}
--- a/e2e/harness/bootstrap.go
+++ b/e2e/harness/bootstrap.go
@@ -1,47 +0,0 @@
-//go:build e2e
-
-package harness
-
-import (
-	"context"
-	"fmt"
-
-	"github.com/netbirdio/netbird/shared/management/client/rest"
-	"github.com/netbirdio/netbird/shared/management/http/api"
-)
-
-// Bootstrap creates the initial admin owner through the unauthenticated
-// /api/setup endpoint and returns the plaintext admin PAT. It also wires an
-// authenticated REST client on the Combined (see API). create_pat requires the
-// server to run with NB_SETUP_PAT_ENABLED=true, which the harness sets. A
-// second call returns an error (the server reports setup already completed).
-func (c *Combined) Bootstrap(ctx context.Context) (string, error) {
-	// The setup endpoint is unauthenticated; use a tokenless client.
-	setupClient := rest.NewWithOptions(rest.WithManagementURL(c.BaseURL))
-
-	createPAT := true
-	expireDays := 1
-	resp, err := setupClient.Instance.Setup(ctx, api.PostApiSetupJSONRequestBody{ //nolint:gosec // static throwaway test credentials
-		Email:       "admin@netbird.test",
-		Password:    "Netbird-e2e-Passw0rd!",
-		Name:        "E2E Admin",
-		CreatePat:   &createPAT,
-		PatExpireIn: &expireDays,
-	})
-	if err != nil {
-		return "", fmt.Errorf("instance setup: %w", err)
-	}
-	if resp.PersonalAccessToken == nil || *resp.PersonalAccessToken == "" {
-		return "", fmt.Errorf("setup succeeded but no PAT returned (is NB_SETUP_PAT_ENABLED set?)")
-	}
-
-	c.PAT = *resp.PersonalAccessToken
-	c.api = rest.New(c.BaseURL, c.PAT)
-	return c.PAT, nil
-}
-
-// API returns the PAT-authenticated management REST client. It is nil until
-// Bootstrap runs.
-func (c *Combined) API() *rest.Client {
-	return c.api
-}
--- a/e2e/harness/cert.go
+++ b/e2e/harness/cert.go
@@ -1,64 +0,0 @@
-//go:build e2e
-
-package harness
-
-import (
-	"crypto/ecdsa"
-	"crypto/elliptic"
-	"crypto/rand"
-	"crypto/x509"
-	"crypto/x509/pkix"
-	"encoding/pem"
-	"fmt"
-	"math/big"
-	"os"
-	"path/filepath"
-	"time"
-)
-
-// writeSelfSignedCert generates a self-signed TLS cert/key pair covering the
-// given DNS names (non-empty; the first becomes the CN) and writes them as
-// tls.crt / tls.key in dir. The proxy serves it for the agent-network endpoint;
-// the client curls with -k, so chain validation is skipped and any usable cert works.
-func writeSelfSignedCert(dir string, dnsNames []string) error {
-	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
-	if err != nil {
-		return fmt.Errorf("generate key: %w", err)
-	}
-
-	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
-	if err != nil {
-		return fmt.Errorf("generate serial: %w", err)
-	}
-
-	tmpl := x509.Certificate{
-		SerialNumber:          serial,
-		Subject:               pkix.Name{CommonName: dnsNames[0]},
-		NotBefore:             time.Now().Add(-time.Hour),
-		NotAfter:              time.Now().Add(365 * 24 * time.Hour),
-		KeyUsage:              x509.KeyUsageDigitalSignature,
-		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
-		DNSNames:              dnsNames,
-		BasicConstraintsValid: true,
-	}
-
-	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &priv.PublicKey, priv)
-	if err != nil {
-		return fmt.Errorf("create certificate: %w", err)
-	}
-
-	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
-	if err := os.WriteFile(filepath.Join(dir, "tls.crt"), certPEM, 0o644); err != nil { //nolint:gosec // public cert, bind-mounted and read by the proxy container
-		return fmt.Errorf("write cert: %w", err)
-	}
-
-	keyDER, err := x509.MarshalECPrivateKey(priv)
-	if err != nil {
-		return fmt.Errorf("marshal key: %w", err)
-	}
-	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
-	if err := os.WriteFile(filepath.Join(dir, "tls.key"), keyPEM, 0o600); err != nil {
-		return fmt.Errorf("write key: %w", err)
-	}
-	return nil
-}
--- a/e2e/harness/client.go
+++ b/e2e/harness/client.go
@@ -1,207 +0,0 @@
-//go:build e2e
-
-package harness
-
-import (
-	"context"
-	"fmt"
-	"io"
-	"os/exec"
-	"strings"
-	"time"
-
-	"github.com/docker/docker/api/types/container"
-	"github.com/testcontainers/testcontainers-go"
-	tcexec "github.com/testcontainers/testcontainers-go/exec"
-)
-
-const (
-	clientDockerfile = "e2e/harness/Dockerfile.client"
-	// defaultClientImage is the published NetBird client release used by
-	// default. Override with NB_E2E_CLIENT_IMAGE; a value without a "/" is built
-	// locally from clientDockerfile.
-	defaultClientImage = "netbirdio/netbird:0.74.0-rc.2"
-	clientAlias        = "client"
-	curlImage          = "curlimages/curl:latest"
-)
-
-// Client is a running NetBird client container joined to the combined server.
-type Client struct {
-	container testcontainers.Container
-}
-
-// StartClient resolves the client image (pulled by default, built locally for
-// a bare tag) and runs it on the combined server's network, joining via the
-// given setup key. The image entrypoint brings the daemon up automatically;
-// callers wait for connectivity with WaitConnected / WaitProxyPeer.
-func StartClient(ctx context.Context, c *Combined, setupKey string) (*Client, error) {
-	root, err := repoRoot()
-	if err != nil {
-		return nil, err
-	}
-	clientImage, err := resolveImage(ctx, root, "NB_E2E_CLIENT_IMAGE", defaultClientImage, clientDockerfile)
-	if err != nil {
-		return nil, err
-	}
-
-	req := testcontainers.ContainerRequest{
-		Image:          clientImage,
-		Networks:       []string{c.network.Name},
-		NetworkAliases: map[string][]string{c.network.Name: {clientAlias}},
-		Env: map[string]string{
-			"NB_MANAGEMENT_URL": combinedExposedURL,
-			"NB_SETUP_KEY":      setupKey,
-			"NB_LOG_LEVEL":      "info",
-			// Match the proxy: the combined relay is WebSocket-only, so the
-			// client must use WS transport to keep a stable relay link to it.
-			"NB_RELAY_TRANSPORT": "ws",
-		},
-		HostConfigModifier: func(hc *container.HostConfig) {
-			hc.CapAdd = append(hc.CapAdd, "NET_ADMIN", "SYS_ADMIN", "SYS_RESOURCE")
-		},
-	}
-
-	ctr, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
-		ContainerRequest: req,
-		Started:          true,
-	})
-	if err != nil {
-		return nil, fmt.Errorf("start client container: %w", err)
-	}
-	return &Client{container: ctr}, nil
-}
-
-// Restart bounces the client connection (netbird down/up) so it pulls a fresh
-// network map — the documented workaround for a freshly-joined client not yet
-// seeing a synthesized agent-network service.
-func (cl *Client) Restart(ctx context.Context) error {
-	if _, _, err := cl.container.Exec(ctx, []string{"netbird", "down"}, tcexec.Multiplexed()); err != nil {
-		return fmt.Errorf("netbird down: %w", err)
-	}
-	time.Sleep(2 * time.Second)
-	code, reader, err := cl.container.Exec(ctx, []string{"netbird", "up"}, tcexec.Multiplexed())
-	if err != nil {
-		return fmt.Errorf("netbird up: %w", err)
-	}
-	if code != 0 {
-		out, _ := io.ReadAll(reader)
-		return fmt.Errorf("netbird up exited %d: %s", code, string(out))
-	}
-	return nil
-}
-
-// Status returns `netbird status` output from inside the client.
-func (cl *Client) Status(ctx context.Context) (string, error) {
-	code, reader, err := cl.container.Exec(ctx, []string{"netbird", "status"}, tcexec.Multiplexed())
-	if err != nil {
-		return "", err
-	}
-	out, _ := io.ReadAll(reader)
-	if code != 0 {
-		return string(out), fmt.Errorf("netbird status exited %d", code)
-	}
-	return string(out), nil
-}
-
-// WaitConnected polls until the client reports Management: Connected.
-func (cl *Client) WaitConnected(ctx context.Context, timeout time.Duration) error {
-	return cl.pollStatus(ctx, timeout, "Management: Connected")
-}
-
-// WaitProxyPeer polls until the client sees the proxy peer connected (1/1).
-func (cl *Client) WaitProxyPeer(ctx context.Context, timeout time.Duration) error {
-	return cl.pollStatus(ctx, timeout, "1/1 Connected")
-}
-
-func (cl *Client) pollStatus(ctx context.Context, timeout time.Duration, want string) error {
-	deadline := time.Now().Add(timeout)
-	var last string
-	for time.Now().Before(deadline) {
-		out, _ := cl.Status(ctx)
-		last = out
-		if strings.Contains(out, want) {
-			return nil
-		}
-		time.Sleep(3 * time.Second)
-	}
-	return fmt.Errorf("timed out waiting for %q; last status:\n%s", want, last)
-}
-
-// ResolveProxyIP resolves the agent-network endpoint to the proxy peer's
-// NetBird IP from inside the client (via magic DNS).
-func (cl *Client) ResolveProxyIP(ctx context.Context, endpoint string) (string, error) {
-	code, reader, err := cl.container.Exec(ctx, []string{"getent", "hosts", endpoint}, tcexec.Multiplexed())
-	if err != nil {
-		return "", err
-	}
-	out, _ := io.ReadAll(reader)
-	if code != 0 {
-		return "", fmt.Errorf("getent hosts %s exited %d", endpoint, code)
-	}
-	fields := strings.Fields(string(out))
-	if len(fields) == 0 {
-		return "", fmt.Errorf("no address for %s", endpoint)
-	}
-	return fields[0], nil
-}
-
-// Chat issues a chat-completion POST to the agent-network endpoint over the
-// client's tunnel, returning the HTTP status and response body. It runs curl in
-// a throwaway container sharing the client's network namespace so the request
-// traverses the WireGuard tunnel, pinning the endpoint to the proxy peer IP.
-func (cl *Client) Chat(ctx context.Context, endpoint, proxyIP, model, prompt string) (int, string, error) {
-	body := fmt.Sprintf(`{"model":%q,"messages":[{"role":"user","content":%q}]}`, model, prompt)
-	url := "https://" + endpoint + "/v1/chat/completions"
-
-	args := []string{
-		"run", "--rm",
-		"--network", "container:" + cl.container.GetContainerID(),
-		curlImage,
-		"-sk", "--connect-timeout", "5", "--max-time", "90",
-		"--resolve", endpoint + ":443:" + proxyIP,
-		"-o", "/dev/stderr", "-w", "%{http_code}",
-		"-X", "POST", url,
-		"-H", "Content-Type: application/json",
-		"--data", body,
-	}
-	cmd := exec.CommandContext(ctx, "docker", args...)
-	// -w writes the status code to stdout; -o /dev/stderr writes the body to
-	// stderr so we can capture both separately.
-	var stdout, stderr strings.Builder
-	cmd.Stdout = &stdout
-	cmd.Stderr = &stderr
-	if err := cmd.Run(); err != nil {
-		return 0, stderr.String(), fmt.Errorf("curl through tunnel: %w", err)
-	}
-
-	code := 0
-	_, _ = fmt.Sscanf(strings.TrimSpace(stdout.String()), "%d", &code)
-	return code, stderr.String(), nil
-}
-
-// Logs returns the client container logs, for diagnostics on failure.
-func (cl *Client) Logs(ctx context.Context) string {
-	return containerLogs(ctx, cl.container)
-}
-
-// Terminate stops the client container.
-func (cl *Client) Terminate(ctx context.Context) error {
-	if cl.container == nil {
-		return nil
-	}
-	return cl.container.Terminate(ctx)
-}
-
-// containerLogs reads up to 256 KiB of a container's logs for diagnostics.
-func containerLogs(ctx context.Context, c testcontainers.Container) string {
-	if c == nil {
-		return ""
-	}
-	r, err := c.Logs(ctx)
-	if err != nil {
-		return fmt.Sprintf("<logs error: %v>", err)
-	}
-	defer r.Close()
-	b, _ := io.ReadAll(io.LimitReader(r, 256<<10))
-	return string(b)
-}
--- a/e2e/harness/combined.go
+++ b/e2e/harness/combined.go
@@ -1,234 +0,0 @@
-//go:build e2e
-
-package harness
-
-import (
-	"context"
-	"fmt"
-	"io"
-	"os"
-	"os/exec"
-	"path/filepath"
-	"strings"
-	"time"
-
-	"github.com/docker/docker/api/types/container"
-	"github.com/docker/go-connections/nat"
-	"github.com/testcontainers/testcontainers-go"
-	tcexec "github.com/testcontainers/testcontainers-go/exec"
-	"github.com/testcontainers/testcontainers-go/network"
-	"github.com/testcontainers/testcontainers-go/wait"
-
-	"github.com/netbirdio/netbird/shared/management/client/rest"
-)
-
-const (
-	combinedDockerfile = "combined/Dockerfile.multistage"
-	// defaultCombinedImage is the published combined-server release used by
-	// default (the artifact operators run manually). Override with
-	// NB_E2E_COMBINED_IMAGE; a value without a "/" is built locally from
-	// combinedDockerfile instead of pulled.
-	defaultCombinedImage = "netbirdio/netbird-server:0.74.0-rc.2"
-	combinedHTTPPort     = "8080/tcp"
-
-	// combinedAlias is the combined server's network alias AND the deployment
-	// domain. The working manual setup uses a single NETBIRD_DOMAIN for the
-	// management exposed address, the proxy domain, and the agent-network
-	// cluster — so we mirror that: peers reach management/signal/relay at this
-	// name, the proxy registers this as its cluster, and the agent-network
-	// endpoint is <subdomain>.<combinedAlias>.
-	combinedAlias      = "netbird.local"
-	combinedExposedURL = "http://" + combinedAlias + ":8080"
-
-	// containerIssuer is the embedded IdP issuer, used only for internal JWT
-	// validation (peers authenticate with setup keys / proxy tokens, not OIDC),
-	// so the in-container localhost address is fine.
-	containerIssuer = "http://localhost:8080/oauth2"
-)
-
-// Combined is a running combined NetBird server (management + signal + relay +
-// STUN + embedded IdP) plus the connection details tests need. It owns the
-// shared docker network that the proxy and client containers join.
-type Combined struct {
-	container testcontainers.Container
-	network   *testcontainers.DockerNetwork
-	// BaseURL is the host-reachable management API root, e.g. http://127.0.0.1:51234.
-	BaseURL string
-	// PAT is the admin Personal Access Token minted via Bootstrap.
-	PAT string
-
-	api     *rest.Client
-	workDir string
-}
-
-// StartCombined resolves the combined server image (pulled by default, built
-// locally for a bare tag) and boots it with setup-PAT enabled on a fresh shared
-// network, returning once the API serves. The caller mints the admin PAT via Bootstrap.
-func StartCombined(ctx context.Context) (*Combined, error) {
-	root, err := repoRoot()
-	if err != nil {
-		return nil, err
-	}
-
-	combinedImage, err := resolveImage(ctx, root, "NB_E2E_COMBINED_IMAGE", defaultCombinedImage, combinedDockerfile)
-	if err != nil {
-		return nil, err
-	}
-
-	net, err := network.New(ctx)
-	if err != nil {
-		return nil, fmt.Errorf("create shared network: %w", err)
-	}
-
-	// Work dir under /tmp so Docker Desktop file sharing (which excludes
-	// macOS's /var/folders TMPDIR) can bind-mount it.
-	workDir, err := os.MkdirTemp("/tmp", "nb-e2e-combined-*")
-	if err != nil {
-		_ = net.Remove(ctx)
-		return nil, fmt.Errorf("create work dir: %w", err)
-	}
-
-	cfg := fmt.Sprintf(combinedConfigYAML, combinedExposedURL, containerIssuer)
-	if err := os.WriteFile(filepath.Join(workDir, "config.yaml"), []byte(cfg), 0o644); err != nil { //nolint:gosec // non-secret config, bind-mounted and read by the container
-		_ = net.Remove(ctx)
-		return nil, fmt.Errorf("write combined config: %w", err)
-	}
-	if err := os.MkdirAll(filepath.Join(workDir, "data"), 0o755); err != nil {
-		_ = net.Remove(ctx)
-		return nil, fmt.Errorf("create datadir: %w", err)
-	}
-
-	req := testcontainers.ContainerRequest{
-		Image:          combinedImage,
-		ExposedPorts:   []string{combinedHTTPPort},
-		Networks:       []string{net.Name},
-		NetworkAliases: map[string][]string{net.Name: {combinedAlias}},
-		Env: map[string]string{
-			"NB_SETUP_PAT_ENABLED": "true",
-			// Skip the GeoLite DB download — it blocks startup and agent-network
-			// ingest doesn't use geolocation.
-			"NB_DISABLE_GEOLOCATION": "true",
-		},
-		Cmd: []string{"--config", "/nb/config.yaml"},
-		HostConfigModifier: func(hc *container.HostConfig) {
-			hc.Binds = append(hc.Binds, workDir+":/nb")
-		},
-		WaitingFor: wait.ForHTTP("/api/instance").
-			WithPort(combinedHTTPPort).
-			WithStatusCodeMatcher(func(status int) bool { return status == 200 }).
-			WithStartupTimeout(120 * time.Second),
-	}
-
-	c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
-		ContainerRequest: req,
-		Started:          true,
-	})
-	if err != nil {
-		_ = net.Remove(ctx)
-		return nil, fmt.Errorf("start combined container: %w", err)
-	}
-
-	host, err := c.Host(ctx)
-	if err != nil {
-		_ = c.Terminate(ctx)
-		_ = net.Remove(ctx)
-		return nil, fmt.Errorf("container host: %w", err)
-	}
-	mapped, err := c.MappedPort(ctx, nat.Port(combinedHTTPPort))
-	if err != nil {
-		_ = c.Terminate(ctx)
-		_ = net.Remove(ctx)
-		return nil, fmt.Errorf("mapped port: %w", err)
-	}
-
-	return &Combined{
-		container: c,
-		network:   net,
-		BaseURL:   fmt.Sprintf("http://%s:%s", host, mapped.Port()),
-		workDir:   workDir,
-	}, nil
-}
-
-// resolveImage returns the image to run for a component. The env override (or
-// the default) is used as-is when it looks like a registry reference (contains
-// "/") — testcontainers pulls it. A bare local tag is built from the repo
-// Dockerfile instead, so developers can still test source builds.
-func resolveImage(ctx context.Context, root, envKey, defaultImage, dockerfile string) (string, error) {
-	img := defaultImage
-	if v := os.Getenv(envKey); v != "" {
-		img = v
-	}
-	if strings.Contains(img, "/") {
-		return img, nil
-	}
-	if err := buildImage(ctx, root, dockerfile, img); err != nil {
-		return "", err
-	}
-	return img, nil
-}
-
-// buildImage builds an image from a repo Dockerfile via the docker CLI with
-// BuildKit enabled, so cache mounts work and unchanged sources reuse the layer
-// + go caches. Tags are stable so reruns are cheap.
-func buildImage(ctx context.Context, root, dockerfile, tag string) error {
-	cmd := exec.CommandContext(ctx, "docker", "build",
-		"-f", dockerfile,
-		"-t", tag,
-		".",
-	)
-	cmd.Dir = root
-	cmd.Env = append(os.Environ(), "DOCKER_BUILDKIT=1")
-	if out, err := cmd.CombinedOutput(); err != nil {
-		return fmt.Errorf("build image %s: %w\n%s", tag, err, string(out))
-	}
-	return nil
-}
-
-// CreateProxyTokenCLI mints a proxy access token via the server's `token
-// create` CLI inside the container — the same path the manual install uses.
-// This yields a GLOBAL (account-less) token, so the proxy serves the whole
-// cluster (SynthesizeServicesForCluster); an account-scoped REST token instead
-// drives the per-account path. Returns the plaintext token.
-func (c *Combined) CreateProxyTokenCLI(ctx context.Context, name string) (string, error) {
-	code, reader, err := c.container.Exec(ctx,
-		[]string{"/go/bin/netbird-server", "token", "create", "--name", name, "--config", "/nb/config.yaml"},
-		tcexec.Multiplexed())
-	if err != nil {
-		return "", fmt.Errorf("exec token create: %w", err)
-	}
-	out, _ := io.ReadAll(reader)
-	if code != 0 {
-		return "", fmt.Errorf("token create exited %d: %s", code, string(out))
-	}
-	for _, line := range strings.Split(string(out), "\n") {
-		line = strings.TrimSpace(line)
-		if strings.HasPrefix(line, "Token:") {
-			tok := strings.TrimSpace(strings.TrimPrefix(line, "Token:"))
-			if tok != "" {
-				return tok, nil
-			}
-		}
-	}
-	return "", fmt.Errorf("token not found in CLI output: %s", string(out))
-}
-
-// Logs returns the combined server container logs, for diagnostics.
-func (c *Combined) Logs(ctx context.Context) string {
-	return containerLogs(ctx, c.container)
-}
-
-// Terminate stops the container, removes the shared network, and cleans the
-// work dir.
-func (c *Combined) Terminate(ctx context.Context) error {
-	var err error
-	if c.container != nil {
-		err = c.container.Terminate(ctx)
-	}
-	if c.network != nil {
-		_ = c.network.Remove(ctx)
-	}
-	if c.workDir != "" {
-		_ = os.RemoveAll(c.workDir)
-	}
-	return err
-}
--- a/e2e/harness/config.go
+++ b/e2e/harness/config.go
@@ -1,26 +0,0 @@
-//go:build e2e
-
-package harness
-
-// combinedConfigYAML is a minimal combined-server config for tests: plain HTTP
-// on :8080 (no TLS cert configured → the server serves HTTP and expects to sit
-// behind a reverse proxy, which is exactly what we want for in-cluster tests),
-// embedded IdP, local signal/relay/STUN, and a sqlite store under the mounted
-// data dir. exposedAddress is the address peers use to reach this container; it
-// is overridden per-run so the value matches the container's network alias.
-const combinedConfigYAML = `server:
-  listenAddress: ":8080"
-  exposedAddress: "%s"
-  healthcheckAddress: ":9000"
-  metricsPort: 9090
-  logLevel: "info"
-  logFile: "console"
-  authSecret: "e2e-relay-secret"
-  dataDir: "/nb/data"
-  disableAnonymousMetrics: true
-  disableGeoliteUpdate: true
-  auth:
-    issuer: "%s"
-  store:
-    engine: "sqlite"
-`
--- a/e2e/harness/doc.go
+++ b/e2e/harness/doc.go
@@ -1,13 +0,0 @@
-//go:build e2e
-
-// Package harness provides a self-contained, OIDC-free way to stand up NetBird
-// components in containers for end-to-end tests. It is feature-agnostic: any
-// suite can ask for a live management server (with an admin PAT minted through
-// the unauthenticated /api/setup bootstrap) and, later, a proxy and client.
-//
-// By default the harness pulls published release images. Setting an
-// NB_E2E_*_IMAGE override to a bare local tag instead builds that image from
-// the repo Dockerfile with BuildKit, so reruns reuse cached layers. Everything
-// is gated behind the `e2e` build tag so normal builds and unit tests never
-// pull in testcontainers.
-package harness
--- a/e2e/harness/paths.go
+++ b/e2e/harness/paths.go
@@ -1,29 +0,0 @@
-//go:build e2e
-
-package harness
-
-import (
-	"fmt"
-	"os"
-	"path/filepath"
-)
-
-// repoRoot walks up from the working directory to the module root (the
-// directory holding go.mod), so the Docker build context is correct no matter
-// which package the test runs from.
-func repoRoot() (string, error) {
-	dir, err := os.Getwd()
-	if err != nil {
-		return "", err
-	}
-	for {
-		if _, statErr := os.Stat(filepath.Join(dir, "go.mod")); statErr == nil {
-			return dir, nil
-		}
-		parent := filepath.Dir(dir)
-		if parent == dir {
-			return "", fmt.Errorf("go.mod not found above %s", dir)
-		}
-		dir = parent
-	}
-}
--- a/e2e/harness/proxy.go
+++ b/e2e/harness/proxy.go
@@ -1,116 +0,0 @@
-//go:build e2e
-
-package harness
-
-import (
-	"context"
-	"fmt"
-	"os"
-	"time"
-
-	"github.com/docker/docker/api/types/container"
-	"github.com/testcontainers/testcontainers-go"
-	"github.com/testcontainers/testcontainers-go/wait"
-)
-
-const (
-	proxyDockerfile = "proxy/Dockerfile.multistage"
-	// defaultProxyImage is the published reverse-proxy release used by default.
-	// Override with NB_E2E_PROXY_IMAGE; a value without a "/" is built locally.
-	defaultProxyImage = "netbirdio/reverse-proxy:0.74.0-rc.2"
-	proxyAlias        = "proxy"
-
-	// AgentNetworkCluster is the proxy cluster the e2e provider bootstraps and
-	// the proxy serves. It must equal the management's exposed domain
-	// (combinedAlias) — the working manual setup uses one NETBIRD_DOMAIN for
-	// both. The agent-network endpoint is <subdomain>.<cluster>.
-	AgentNetworkCluster = combinedAlias
-)
-
-// Proxy is a running agent-network gateway (netbird proxy) container.
-type Proxy struct {
-	container testcontainers.Container
-	workDir   string
-}
-
-// StartProxy resolves the proxy image (pulled by default, built locally for a
-// bare tag) and runs it on the combined server's network, registered via the
-// given proxy token and serving AgentNetworkCluster over a self-signed wildcard
-// cert. It does not wait for peer connectivity; callers poll management for the proxy peer.
-func StartProxy(ctx context.Context, c *Combined, proxyToken string) (*Proxy, error) {
-	root, err := repoRoot()
-	if err != nil {
-		return nil, err
-	}
-	proxyImage, err := resolveImage(ctx, root, "NB_E2E_PROXY_IMAGE", defaultProxyImage, proxyDockerfile)
-	if err != nil {
-		return nil, err
-	}
-
-	workDir, err := os.MkdirTemp("/tmp", "nb-e2e-proxy-*")
-	if err != nil {
-		return nil, fmt.Errorf("create proxy work dir: %w", err)
-	}
-	if err := writeSelfSignedCert(workDir, []string{"*." + AgentNetworkCluster, AgentNetworkCluster}); err != nil {
-		return nil, err
-	}
-
-	req := testcontainers.ContainerRequest{
-		Image:          proxyImage,
-		Networks:       []string{c.network.Name},
-		NetworkAliases: map[string][]string{c.network.Name: {proxyAlias}},
-		Env: map[string]string{
-			"NB_PROXY_TOKEN":                 proxyToken,
-			"NB_PROXY_MANAGEMENT_ADDRESS":    combinedExposedURL,
-			"NB_PROXY_DOMAIN":                AgentNetworkCluster,
-			"NB_PROXY_ADDRESS":               ":443",
-			"NB_PROXY_CERTIFICATE_DIRECTORY": "/certs",
-			"NB_PROXY_HEALTH_ADDRESS":        ":8081",
-			"NB_PROXY_LOG_LEVEL":             "debug",
-			"NB_PROXY_PRIVATE":               "true",
-			// Management is plain HTTP in-cluster, so allow the proxy token to
-			// ride a non-TLS gRPC connection.
-			"NB_PROXY_ALLOW_INSECURE": "true",
-			// The combined server multiplexes the relay over WebSocket on :8080
-			// (no QUIC listener). The proxy's embedded relay client defaults to
-			// QUIC, which fails here and flaps the relay link, churning the
-			// proxy peer so it never stably registers. Force WS transport.
-			"NB_RELAY_TRANSPORT": "ws",
-			// Trace the embedded client (relay / signal / handshake) so
-			// peer-registration issues are visible in the proxy logs.
-			"NB_PROXY_CLIENT_LOG_LEVEL": "trace",
-		},
-		HostConfigModifier: func(hc *container.HostConfig) {
-			hc.Binds = append(hc.Binds, workDir+":/certs")
-			hc.CapAdd = append(hc.CapAdd, "NET_ADMIN", "SYS_ADMIN", "SYS_RESOURCE", "NET_BIND_SERVICE")
-		},
-		WaitingFor: wait.ForLog("Initial mapping sync complete").WithStartupTimeout(90 * time.Second),
-	}
-
-	ctr, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
-		ContainerRequest: req,
-		Started:          true,
-	})
-	if err != nil {
-		return nil, fmt.Errorf("start proxy container: %w", err)
-	}
-
-	return &Proxy{container: ctr, workDir: workDir}, nil
-}
-
-// Logs returns the proxy container logs, for diagnostics on failure.
-func (p *Proxy) Logs(ctx context.Context) string {
-	return containerLogs(ctx, p.container)
-}
-
-// Terminate stops the proxy container and cleans its work dir.
-func (p *Proxy) Terminate(ctx context.Context) error {
-	var err error
-	if p.container != nil {
-		err = p.container.Terminate(ctx)
-	}
-	if p.workDir != "" {
-		_ = os.RemoveAll(p.workDir)
-	}
-	return err
-}
--- a/go.mod
+++ b/go.mod
@@ -35,7 +35,6 @@ require (
 	github.com/DeRuina/timberjack v1.4.2
 	github.com/awnumar/memguard v0.23.0
 	github.com/aws/aws-sdk-go-v2 v1.38.3
-	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.1
 	github.com/aws/aws-sdk-go-v2/config v1.31.6
 	github.com/aws/aws-sdk-go-v2/credentials v1.18.10
 	github.com/aws/aws-sdk-go-v2/service/s3 v1.87.3
@@ -79,10 +78,12 @@ require (
 	github.com/mdp/qrterminal/v3 v3.2.1
 	github.com/miekg/dns v1.1.72
 	github.com/mitchellh/hashstructure/v2 v2.0.2
+	github.com/moby/moby/api v1.54.1
 	github.com/netbirdio/management-integrations/integrations v0.0.0-20260416123949-2355d972be42
 	github.com/netbirdio/signal-dispatcher/dispatcher v0.0.0-20250805121659-6b4ac470ca45
 	github.com/oapi-codegen/runtime v1.1.2
 	github.com/okta/okta-sdk-golang/v2 v2.18.0
+	github.com/ory/dockertest/v4 v4.0.0
 	github.com/oschwald/maxminddb-golang v1.12.0
 	github.com/patrickmn/go-cache v2.1.0+incompatible
 	github.com/petermattis/goid v0.0.0-20250303134427-723919f7f203
@@ -146,7 +147,7 @@ require (
 	dario.cat/mergo v1.0.1 // indirect
 	filippo.io/edwards25519 v1.1.1 // indirect
 	github.com/AppsFlyer/go-sundheit v0.6.0 // indirect
-	github.com/Azure/go-ansiterm v0.0.0-20230124172434-306776ec8161 // indirect
+	github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c // indirect
 	github.com/Azure/go-ntlmssp v0.1.0 // indirect
 	github.com/BurntSushi/toml v1.5.0 // indirect
 	github.com/Masterminds/goutils v1.1.1 // indirect
@@ -157,6 +158,7 @@ require (
 	github.com/apapsch/go-jsonmerge/v2 v2.0.0 // indirect
 	github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 // indirect
 	github.com/awnumar/memcall v0.4.0 // indirect
+	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.1 // indirect
 	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.6 // indirect
 	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.6 // indirect
 	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.6 // indirect
@@ -177,6 +179,8 @@ require (
 	github.com/caddyserver/zerossl v0.1.3 // indirect
 	github.com/cenkalti/backoff/v5 v5.0.3 // indirect
 	github.com/cespare/xxhash/v2 v2.3.0 // indirect
+	github.com/containerd/errdefs v1.0.0 // indirect
+	github.com/containerd/errdefs/pkg v0.3.0 // indirect
 	github.com/containerd/log v0.1.0 // indirect
 	github.com/containerd/platforms v0.2.1 // indirect
 	github.com/cpuguy83/dockercfg v0.3.2 // indirect
@@ -271,11 +275,12 @@ require (
 	github.com/mitchellh/mapstructure v1.5.0 // indirect
 	github.com/mitchellh/reflectwalk v1.0.2 // indirect
 	github.com/moby/docker-image-spec v1.3.1 // indirect
+	github.com/moby/moby/client v0.4.0 // indirect
 	github.com/moby/patternmatcher v0.6.0 // indirect
 	github.com/moby/sys/sequential v0.5.0 // indirect
 	github.com/moby/sys/user v0.3.0 // indirect
 	github.com/moby/sys/userns v0.1.0 // indirect
-	github.com/moby/term v0.5.0 // indirect
+	github.com/moby/term v0.5.2 // indirect
 	github.com/morikuni/aec v1.0.0 // indirect
 	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
 	github.com/nfnt/resize v0.0.0-20180221191011-83c6a9932646 // indirect
@@ -341,7 +346,7 @@ replace github.com/kardianos/service => github.com/netbirdio/service v0.0.0-2024

 replace github.com/getlantern/systray => github.com/netbirdio/systray v0.0.0-20231030152038-ef1ed2a27949

-replace golang.zx2c4.com/wireguard => github.com/netbirdio/wireguard-go v0.0.0-20260523085312-4b4a4e36017f
+replace golang.zx2c4.com/wireguard => github.com/netbirdio/wireguard-go v0.0.0-20260628102922-2834bebf6c1a

 replace github.com/cloudflare/circl => codeberg.org/cunicu/circl v0.0.0-20230801113412-fec58fc7b5f6

--- a/go.sum
+++ b/go.sum
@@ -23,8 +23,8 @@ github.com/AdaLogics/go-fuzz-headers v0.0.0-20230811130428-ced1acdcaa24 h1:bvDV9
 github.com/AdaLogics/go-fuzz-headers v0.0.0-20230811130428-ced1acdcaa24/go.mod h1:8o94RPi1/7XTJvwPpRSzSUedZrtlirdB3r9Z20bi2f8=
 github.com/AppsFlyer/go-sundheit v0.6.0 h1:d2hBvCjBSb2lUsEWGfPigr4MCOt04sxB+Rppl0yUMSk=
 github.com/AppsFlyer/go-sundheit v0.6.0/go.mod h1:LDdBHD6tQBtmHsdW+i1GwdTt6Wqc0qazf5ZEJVTbTME=
-github.com/Azure/go-ansiterm v0.0.0-20230124172434-306776ec8161 h1:L/gRVlceqvL25UVaW/CKtUDjefjrs0SPonmDGUVOYP0=
-github.com/Azure/go-ansiterm v0.0.0-20230124172434-306776ec8161/go.mod h1:xomTg63KZ2rFqZQzSB4Vz2SUXa1BpHTVz9L5PTmPC4E=
+github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c h1:udKWzYgxTojEKWjV8V+WSxDXJ4NFATAsZjh8iIbsQIg=
+github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c/go.mod h1:xomTg63KZ2rFqZQzSB4Vz2SUXa1BpHTVz9L5PTmPC4E=
 github.com/Azure/go-ntlmssp v0.1.0 h1:DjFo6YtWzNqNvQdrwEyr/e4nhU3vRiwenz5QX7sFz+A=
 github.com/Azure/go-ntlmssp v0.1.0/go.mod h1:NYqdhxd/8aAct/s4qSYZEerdPuH1liG2/X9DiVTbhpk=
 github.com/BurntSushi/toml v1.5.0 h1:W5quZX/G/csjUnuI8SUYlsHs9M38FC7znL0lIO+DvMg=
@@ -117,6 +117,10 @@ github.com/cilium/ebpf v0.19.0 h1:Ro/rE64RmFBeA9FGjcTc+KmCeY6jXmryu6FfnzPRIao=
 github.com/cilium/ebpf v0.19.0/go.mod h1:fLCgMo3l8tZmAdM3B2XqdFzXBpwkcSTroaVqN08OWVY=
 github.com/coder/websocket v1.8.14 h1:9L0p0iKiNOibykf283eHkKUHHrpG7f65OE3BhhO7v9g=
 github.com/coder/websocket v1.8.14/go.mod h1:NX3SzP+inril6yawo5CQXx8+fk145lPDC6pumgx0mVg=
+github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
+github.com/containerd/errdefs v1.0.0/go.mod h1:+YBYIdtsnF4Iw6nWZhJcqGSg/dwvV7tyJ/kCkyJ2k+M=
+github.com/containerd/errdefs/pkg v0.3.0 h1:9IKJ06FvyNlexW690DXuQNx2KA2cUJXx151Xdx3ZPPE=
+github.com/containerd/errdefs/pkg v0.3.0/go.mod h1:NJw6s9HwNuRhnjJhM7pylWwMyAkmCQvQ4GpJHEqRLVk=
 github.com/containerd/log v0.1.0 h1:TCJt7ioM2cr/tfR8GPbGf9/VRAX8D2B4PjzCpfX540I=
 github.com/containerd/log v0.1.0/go.mod h1:VRRf09a7mHDIRezVKTRCrOq78v577GXq3bSa3EhrzVo=
 github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpSBQv6A=
@@ -480,6 +484,10 @@ github.com/mitchellh/reflectwalk v1.0.2 h1:G2LzWKi524PWgd3mLHV8Y5k7s6XUvT0Gef6zx
 github.com/mitchellh/reflectwalk v1.0.2/go.mod h1:mSTlrgnPZtwu0c4WaC2kGObEpuNDbx0jmZXqmk4esnw=
 github.com/moby/docker-image-spec v1.3.1 h1:jMKff3w6PgbfSa69GfNg+zN/XLhfXJGnEx3Nl2EsFP0=
 github.com/moby/docker-image-spec v1.3.1/go.mod h1:eKmb5VW8vQEh/BAr2yvVNvuiJuY6UIocYsFu/DxxRpo=
+github.com/moby/moby/api v1.54.1 h1:TqVzuJkOLsgLDDwNLmYqACUuTehOHRGKiPhvH8V3Nn4=
+github.com/moby/moby/api v1.54.1/go.mod h1:+RQ6wluLwtYaTd1WnPLykIDPekkuyD/ROWQClE83pzs=
+github.com/moby/moby/client v0.4.0 h1:S+2XegzHQrrvTCvF6s5HFzcrywWQmuVnhOXe2kiWjIw=
+github.com/moby/moby/client v0.4.0/go.mod h1:QWPbvWchQbxBNdaLSpoKpCdf5E+WxFAgNHogCWDoa7g=
 github.com/moby/patternmatcher v0.6.0 h1:GmP9lR19aU5GqSSFko+5pRqHi+Ohk1O69aFiKkVGiPk=
 github.com/moby/patternmatcher v0.6.0/go.mod h1:hDPoyOpDY7OrrMDLaYoY3hf52gNCR/YOUYxkhApJIxc=
 github.com/moby/sys/sequential v0.5.0 h1:OPvI35Lzn9K04PBbCLW0g4LcFAJgHsvXsRyewg5lXtc=
@@ -488,8 +496,8 @@ github.com/moby/sys/user v0.3.0 h1:9ni5DlcW5an3SvRSx4MouotOygvzaXbaSrc/wGDFWPo=
 github.com/moby/sys/user v0.3.0/go.mod h1:bG+tYYYJgaMtRKgEmuueC0hJEAZWwtIbZTB+85uoHjs=
 github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g=
 github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28=
-github.com/moby/term v0.5.0 h1:xt8Q1nalod/v7BqbG21f8mQPqH+xAaC9C3N3wfWbVP0=
-github.com/moby/term v0.5.0/go.mod h1:8FzsFHVUBGZdbDsJw/ot+X+d5HLUbvklYLJ9uGfcI3Y=
+github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ=
+github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc=
 github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
 github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0=
 github.com/morikuni/aec v1.0.0 h1:nP9CBfwrvYnBRgY6qfDQkygYDmYwOilePFkwzv4dU8A=
@@ -510,8 +518,8 @@ github.com/netbirdio/service v0.0.0-20240911161631-f62744f42502 h1:3tHlFmhTdX9ax
 github.com/netbirdio/service v0.0.0-20240911161631-f62744f42502/go.mod h1:CIMRFEJVL+0DS1a3Nx06NaMn4Dz63Ng6O7dl0qH0zVM=
 github.com/netbirdio/signal-dispatcher/dispatcher v0.0.0-20250805121659-6b4ac470ca45 h1:ujgviVYmx243Ksy7NdSwrdGPSRNE3pb8kEDSpH0QuAQ=
 github.com/netbirdio/signal-dispatcher/dispatcher v0.0.0-20250805121659-6b4ac470ca45/go.mod h1:5/sjFmLb8O96B5737VCqhHyGRzNFIaN/Bu7ZodXc3qQ=
-github.com/netbirdio/wireguard-go v0.0.0-20260523085312-4b4a4e36017f h1:ff2D57RBjWtyQ2wVwJOxOgXAXOe/J2lJWtSX0Bz/BRk=
-github.com/netbirdio/wireguard-go v0.0.0-20260523085312-4b4a4e36017f/go.mod h1:rpwXGsirqLqN2L0JDJQlwOboGHmptD5ZD6T2VmcqhTw=
+github.com/netbirdio/wireguard-go v0.0.0-20260628102922-2834bebf6c1a h1:3CWK+yTvRKOcC0Q8VCTGy4l60TEb27CQVS7LkMxwjmw=
+github.com/netbirdio/wireguard-go v0.0.0-20260628102922-2834bebf6c1a/go.mod h1:rpwXGsirqLqN2L0JDJQlwOboGHmptD5ZD6T2VmcqhTw=
 github.com/nfnt/resize v0.0.0-20180221191011-83c6a9932646 h1:zYyBkD/k9seD2A7fsi6Oo2LfFZAehjjQMERAvZLEDnQ=
 github.com/nfnt/resize v0.0.0-20180221191011-83c6a9932646/go.mod h1:jpp1/29i3P1S/RLdc7JQKbRpFeM1dOBd8T9ki5s+AY8=
 github.com/nicksnyder/go-i18n/v2 v2.5.1 h1:IxtPxYsR9Gp60cGXjfuR/llTqV8aYMsC472zD0D1vHk=
@@ -542,6 +550,8 @@ github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8
 github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
 github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=
 github.com/opencontainers/image-spec v1.1.1/go.mod h1:qpqAh3Dmcf36wStyyWU+kCeDgrGnAve2nCC8+7h8Q0M=
+github.com/ory/dockertest/v4 v4.0.0 h1:i19aFsO/VXE0VrMk4ifnKW4G/KIJ93PCjLOslxXoPME=
+github.com/ory/dockertest/v4 v4.0.0/go.mod h1:b5Ofu8VIxWNhXFvQcLu17pRNQdoUBKtXBW74G4Ygzx8=
 github.com/oschwald/maxminddb-golang v1.12.0 h1:9FnTOD0YOhP7DGxGsq4glzpGy5+w7pq50AS6wALUMYs=
 github.com/oschwald/maxminddb-golang v1.12.0/go.mod h1:q0Nob5lTCqyQ8WT6FYgS1L7PXKVVbgiymefNwIjPzgY=
 github.com/patrickmn/go-cache v2.1.0+incompatible h1:HRMgzkcYKYpi3C8ajMPV8OFXaaRUnok+kx1WdO15EQc=
@@ -973,11 +983,13 @@ gorm.io/driver/sqlite v1.5.7/go.mod h1:U+J8craQU6Fzkcvu8oLeAQmi50TkwPEhHDEjQZXDa
 gorm.io/gorm v1.25.7/go.mod h1:hbnx/Oo0ChWMn1BIhpy1oYozzpM15i4YPuHDmfYtwg8=
 gorm.io/gorm v1.25.12 h1:I0u8i2hWQItBq1WfE0o2+WuL9+8L21K9e2HHSTE/0f8=
 gorm.io/gorm v1.25.12/go.mod h1:xh7N7RHfYlNc5EmcI/El95gXusucDrQnHXe0+CgWcLQ=
-gotest.tools/v3 v3.5.1 h1:EENdUnS3pdur5nybKYIh2Vfgc8IUNBjxDPSjtiJcOzU=
-gotest.tools/v3 v3.5.1/go.mod h1:isy3WKz7GK6uNw/sbHzfKBLvlvXwUyV06n6brMxxopU=
+gotest.tools/v3 v3.5.2 h1:7koQfIKdy+I8UTetycgUqXWSDwpgv193Ka+qRsmBY8Q=
+gotest.tools/v3 v3.5.2/go.mod h1:LtdLGcnqToBH83WByAAi/wiwSFCArdFIUV/xxN4pcjA=
 gvisor.dev/gvisor v0.0.0-20260219192049-0f2374377e89 h1:mGJaeA61P8dEHTqdvAgc70ZIV3QoUoJcXCRyyjO26OA=
 gvisor.dev/gvisor v0.0.0-20260219192049-0f2374377e89/go.mod h1:QkHjoMIBaYtpVufgwv3keYAbln78mBoCuShZrPrer1Q=
 howett.net/plist v1.0.1 h1:37GdZ8tP09Q35o9ych3ehygcsL+HqKSwzctveSlarvM=
 howett.net/plist v1.0.1/go.mod h1:lqaXoTrLY4hg8tnEzNru53gicrbv7rrk+2xJA/7hw9g=
+pgregory.net/rapid v1.2.0 h1:keKAYRcjm+e1F0oAuU5F5+YPAWcyxNNRK2wud503Gnk=
+pgregory.net/rapid v1.2.0/go.mod h1:PY5XlDGj0+V1FCq0o192FdRhpKHGTRIWBgqjDBTrq04=
 rsc.io/qr v0.2.0 h1:6vBLea5/NRMVTz8V66gipeLycZMl/+UlFmk8DvqQ6WY=
 rsc.io/qr v0.2.0/go.mod h1:IF+uZjkb9fqyeF/4tlBoynqmQxUoPfWEKh921coOuXs=
Author	SHA1	Message	Date
Viktor Liu	b434cda062	[client] Refresh signal receive liveness when worker handoff drains (#6594)	2026-06-29 12:16:47 +02:00
Zoltan Papp	0b594c639a	[client] report management unhealthy while Sync stream is failing (#6575) * fix(mgm): report management unhealthy while Sync stream is failing The health probe (IsHealthy) only checked the gRPC transport and a GetServerKey call. GetServerKey succeeds even when the peer cannot sync (e.g. the server returns "settings not found"), so the probe kept marking management Connected while the Sync stream failed in a tight retry loop, pinning the status to "Connected" forever despite no sync ever succeeding. Track the last Sync stream error and have IsHealthy consult it, so a healthy transport is no longer enough to report the connection healthy. * fix(mgm): record disconnected state when sync stream setup fails The connectToSyncStream failure path in handleSyncStream returned early without updating syncStreamErr, so the client could still report healthy even when stream setup failed. Mirror the receiveUpdatesEvents error path by calling notifyDisconnected and setSyncStreamDisconnected.	2026-06-29 11:28:58 +02:00
Zoltan Papp	deff8af59f	[client] Wait for signal receive watchdog to stop before reconnect (#6574) * [client] Wait for signal receive watchdog to stop before reconnect The per-stream watchReceiveStream goroutine was started fire-and-forget and never joined. On reconnect a lingering watchdog could still flip shared client state (receiveStalled, the disconnect notifier) on the freshly established stream, since cancelStream only cancels its own stream context. Track the watchdog with a WaitGroup and wait for it to exit (after cancelling its stream) before the operation returns, so each reconnect starts with no stale watchdog. * [client] Bind signal receive probe to the stream context The watchdog probe reused the generic Send, which derives its per-attempt timeouts from the long-lived client context, so cancelStream could not interrupt an in-flight probe. After joining the watchdog on reconnect, watchdogWg.Wait() could then block for the full send-attempt chain. Split Send into a context-aware send and pass the stream context down through sendReceiveProbe, so cancelStream aborts any in-flight probe and the watchdog exits promptly.	2026-06-29 11:24:25 +02:00
Riccardo Manfrin	5711f0e38c	[client] add per-phase timing metrics for sync processing (#6533) * Adds metrics sync phases time split to monitor costs * Address review fixes * Increment README.md with description on usage with debug bundles	2026-06-29 11:02:02 +02:00
Maycon Santos	1409a1325a	[misc] Update careers page link (#6538)	2026-06-29 09:19:01 +02:00
Viktor Liu	4400372f37	[client] Forward non-address DNS record types through route forwarders (#6455)	2026-06-28 18:50:17 +02:00
Zoltan Papp	2d7b309004	[client] Categorize privileged tests behind a build tag and run them in Docker (#6425) * [client] categorize root/system-mutating tests behind a privileged build tag Tests that need root or mutate host state (nftables/iptables/DNS, TUN/WireGuard interfaces, routes, eBPF, SSH/service install) are now gated behind a //go:build privileged tag. The default `go test ./client/...` runs as a non-root user with no sudo and leaves host networking untouched; mixed files were split so pure-logic tests stay in the default suite. A self-hosting ory/dockertest/v4 harness (client/testutil/privileged) runs the privileged suite inside a --privileged --cap-add=NET_ADMIN container via `make test-privileged`; a DOCKER_CI=true guard skips the spawn when already inside the container. Added `make test-unit` for the host-safe run. * [client] add PRIV_RUN/PRIV_PKGS filters to the privileged test harness The dockertest harness now reads two optional env vars when building the in-container `go test` command: PRIV_RUN adds a -run test-name filter and PRIV_PKGS overrides the package list. Both empty reproduce the full privileged suite, so CI and `make test-privileged` behave as before. Lets a developer run a single privileged test in the container, e.g.: PRIV_RUN=TestNftablesManager PRIV_PKGS=./client/firewall/nftables/... make test-privileged * [client] fix unused-helper lint after the privileged test split Splitting privileged tests into _privileged_test.go left their shared helpers in the untagged files, so in the default (no-tag) build they had no callers and golangci-lint flagged them as unused. Moved the privileged-only helpers into the privileged files next to their callers (generateDummyHandler; createEngine/startSignal/startManagement/getConnectedPeers/getPeers + kaep/kasp; (mockDaemon).setJWTToken). 
Annotated the shared routing-test fixtures that must stay untagged for cross-platform compilation with //nolint:unused (systemops_bsd expected* vars, ensureIPv6DefaultRoute on bsd/windows, loopbackIfaceWindows), matching the existing linux variant. * [client] fix privileged test CI failures and run the harness on macOS The host-safe unit run dropped sudo but two privileged test groups were never tagged, and the Docker privileged job silently never ran the suite: - Gate the ssh/server PrivilegeDropper command-construction tests behind the privileged tag (they require root to target a different UID); split them into executor_unix_privileged_test.go. - Tag sharedsock raw-socket tests privileged (need CAP_NET_RAW). - Fix the Docker job command: nested single quotes around the build tags closed the sh -c wrapper early, dropping the go list package set and the privileged tag, so go test ran on the empty repo root. Use double quotes. Make the self-hosting harness usable from a dev Mac: - Build it on darwin as well as linux; it only drives Docker. - Resolve the active docker context endpoint into DOCKER_HOST when the default /var/run/docker.sock is absent (Docker Desktop, Colima, OrbStack). - Rename the misspelled containerGoModache constant to containerGoModCache. 
* Update client/internal/engine_privileged_test.go Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update client/internal/routemanager/systemops/systemops_linux_test.go Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update client/internal/routemanager/systemops/systemops_windows_test.go Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Update client/server/server_privileged_test.go Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * [ci] Run privileged-tagged tests on darwin, windows and freebsd The privileged build tag split moved root/system-mutating tests behind //go:build privileged, but only the linux docker job was given the tag. The native darwin (sudo), windows (PsExec64 -s) and freebsd VM runners already have the required privileges, so add the privileged tag there too to keep CI running the same set of tests as before the split. * [ci] Exclude dockertest harness from the darwin privileged run The privileged tag now compiles client/testutil/privileged on darwin, whose TestRunPrivilegedSuiteInDocker spawns a container the macOS runner has no Docker for. Exclude the harness package from the darwin list, matching the linux job, so the privileged tests run in place without a container spawn. --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2026-06-28 16:15:54 +02:00
Viktor Liu	5968cff242	[client] Keep signal stream alive while receive loop is blocked on worker handoff (#6530)	2026-06-28 15:33:30 +02:00
dependabot[bot]	cf43841b86	Bump the actions group across 1 directory with 4 updates (#6550) Bumps the actions group with 4 updates in the / directory: [actions/setup-go](https://github.com/actions/setup-go), [actions/cache](https://github.com/actions/cache), [actions/cache/restore](https://github.com/actions/cache) and [actions/setup-java](https://github.com/actions/setup-java). Updates `actions/setup-go` from 6.4.0 to 6.5.0 - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](`4a3601121d...924ae3a1cd`) Updates `actions/cache` from 5.0.5 to 6.0.0 - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](`27d5ce7f10...2c8a9bd745`) Updates `actions/cache/restore` from 5.0.5 to 6.0.0 - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](`27d5ce7f10...2c8a9bd745`) Updates `actions/setup-java` from 5.3.0 to 5.4.0 - [Release notes](https://github.com/actions/setup-java/releases) - [Commits](`ad2b38190b...1bcf9fb12c`) --- updated-dependencies: - dependency-name: actions/setup-go dependency-version: 6.5.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions - dependency-name: actions/cache dependency-version: 6.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: actions - dependency-name: actions/cache/restore dependency-version: 6.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: actions - dependency-name: actions/setup-java dependency-version: 5.4.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-28 15:00:05 +02:00
Maycon Santos	739e36a313	[self-hosted] Add agent-network preset with dedicated configurations (#6569)	2026-06-28 14:56:42 +02:00
Riccardo Manfrin	2bb5421631	These logs are needed for troubleshooting (debug) (#6565)	2026-06-28 14:52:41 +02:00
MAAZIZ Adel Ayoub	998ade6e6d	[client] fix nil pointer panic when applying SSH server setting to an existing config (#6556)	2026-06-28 14:51:21 +02:00
Zoltan Papp	62f5467cd8	[client] Eliminate packet loss during lazy connections. (#6355) * [client] Remove peer deletion on lazy activity detection Updated WireGuard dependency with a patch and removed the RemovePeer call on lazy activity detection to force a new handshake initiation to the updated endpoint. This also flushed the staged queue, dropping the first packet. Since UpdatePeer (called after ICE/relay negotiation) triggers SendStagedPackets via IpcSet/handlePostConfig, the peer removal is no longer necessary. The staged packet survives and the handshake is initiated on the real endpoint automatically. This also eliminates the transient state where the peer's endpoint and routes were absent between the lazy idle and connected states. * Update WireGuard dependency * Update WireGuard dependencies * Update WireGuard dependency	2026-06-28 14:22:19 +02:00
Zoltan Papp	1b29995ece	[client] Fix blocked status lock via relay manager path (#6547) * peer/status: move relay-state reads off the main mux GetRelayStates held d.mux (RLock) while calling into the relay Manager (RelayStates/RelayConnectError/ServerURLs). Those calls can be slow or block on the relay manager's own locks while it is reconnecting, which kept the central Status mutex held and stalled every peer state writer (UpdatePeerState, ReplaceOfflinePeers, etc.) contending for it. Guard relayMgr/relayStates with a dedicated muxRelays mutex and release it before invoking the relay Manager, so the relay read path no longer contends with the hot peer-state writers on d.mux. * peer/status: clone relay states in nil-manager path Return a cloned snapshot of d.relayStates when relayMgr is nil so callers cannot mutate the shared cached state, matching the non-nil path.	2026-06-28 12:45:33 +02:00
Zoltan Papp	fd96b8c12f	[client] Improve network addresses filter (#6515) * [client] Filter link-local and multicast from network addresses Skip IPv6 link-local and multicast addresses when building the peer network_addresses list on non-iOS platforms, matching the existing iOS behavior. A flapping NIC's link-local address otherwise churns the peer meta on every interface up/down. * [client] Skip engine restart when default route is unchanged After the network monitor's debounce window, re-check the default next hop before triggering a client restart. A flapping NIC that returns to the same default route no longer forces a restart, avoiding redundant sync stream reconnects and peer meta churn. * [client] Exclude own overlay address from reported network addresses The peer's own WireGuard overlay address (v4 and v6) was reported in network_addresses. As the interface comes and goes during reconnects it churned the peer meta on the management server. Drop it in GetInfoWithChecks, matching the IP regardless of prefix length since the engine knows the overlay address with the network mask while the interface reports it as a host address. * [client] Treat missing default route per protocol in next-hop check A failed GetNextHop lookup is now treated as an absent route (zero Nexthop) and compared per protocol, instead of forcing a restart. In a single-stack network the missing IPv6 default route no longer counts as a change on every debounce, which previously defeated the unchanged-route check. * [client] Make next-hop check injectable for network monitor tests Move the next-hop comparison behind a NetworkMonitor field set by New(), so tests can supply a stub instead of hitting the host's real default route. Fixes the Event/MultiEvent tests hanging after the unchanged-route check was added. * Revert "[client] Make next-hop check injectable for network monitor tests" This reverts commit `88a9d96e8f`. 
* Revert "[client] Treat missing default route per protocol in next-hop check" This reverts commit `0fb531e4bc`. * Revert "[client] Skip engine restart when default route is unchanged" This reverts commit `a071b55f35`.	2026-06-28 12:44:40 +02:00