Plan: Standalone IdP Migration Tool (External IdP → Embedded DEX)
Context
Target repo: /Users/ashleymensah/Documents/netbird-repos/netbird (main repo, not the fork)
Self-hosted NetBird users migrating from an external IdP (Zitadel, Keycloak, Okta, etc.) to NetBird's embedded DEX-based IdP need a way to re-key all user IDs in the database. A colleague's fork at /Users/ashleymensah/Documents/netbird-repos/nico-netbird/netbird has a prototype that runs inside management as an AfterInit hook, but this has a chicken-and-egg problem (enabling EmbeddedIdP causes management to initialize DEX before migration runs → startup failure).
This plan creates a standalone CLI tool that runs with management stopped, re-keys all user IDs, then the user manually updates their management config and restarts. The main repo already has DEX/EmbeddedIdP infrastructure but is missing the store methods and migration logic — these need to be created (porting patterns from the fork).
Note: Does not need to work with the combined management container setup (that only supports embeddedIdP-enabled setups anyway).
What the migration does
For each user, transforms the old ID (e.g., a Zitadel UUID) into DEX's encoded format:
newID = EncodeDexUserID(oldUserID, connectorID)
→ base64(protobuf{field1: userID, field2: connectorID})
This encoded ID is what DEX puts in JWT sub claims, ensuring continuity after switching IdPs.
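The encoding above can be sketched in plain Go without a protobuf dependency. This is a minimal illustration, not the code in idp/dex/provider.go: it assumes two proto3 string fields (numbers 1 and 2) and unpadded URL-safe base64, and the names encodeSketch, decodeSketch, and appendString are hypothetical. The real EncodeDexUserID / DecodeDexUserID remain authoritative.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
)

// appendString appends one length-delimited protobuf field (wire type 2).
// Empty strings are omitted, as proto3 does for default values.
func appendString(buf []byte, field int, s string) []byte {
	if s == "" {
		return buf
	}
	buf = append(buf, byte(field<<3|2)) // tag byte: field number plus wire type
	buf = binary.AppendUvarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// encodeSketch mirrors base64(protobuf{field1: userID, field2: connectorID}).
func encodeSketch(userID, connectorID string) string {
	buf := appendString(nil, 1, userID)
	buf = appendString(buf, 2, connectorID)
	return base64.RawURLEncoding.EncodeToString(buf)
}

// decodeSketch reverses encodeSketch. It only understands single-byte tags
// with wire type 2, which is all this message shape needs.
func decodeSketch(id string) (userID, connectorID string, err error) {
	raw, err := base64.RawURLEncoding.DecodeString(id)
	if err != nil {
		return "", "", err
	}
	for len(raw) > 0 {
		tag := raw[0]
		n, w := binary.Uvarint(raw[1:])
		if tag&7 != 2 || w <= 0 || uint64(len(raw)-1-w) < n {
			return "", "", errors.New("not a DEX-style subject")
		}
		val := string(raw[1+w : 1+w+int(n)])
		switch tag >> 3 {
		case 1:
			userID = val
		case 2:
			connectorID = val
		}
		raw = raw[1+w+int(n):]
	}
	return userID, connectorID, nil
}

func main() {
	fmt.Println(encodeSketch("u1", "oidc")) // prints CgJ1MRIEb2lkYw
}
```

Note that decodeSketch("") returns two empty strings and a nil error, because an empty byte slice is a valid (empty) protobuf message. That is the same edge case the empty-ID guard in Step 4 exists for.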
Tables requiring user ID updates
Main store (store.db / PostgreSQL) — 10 columns
| # | Table | Column | Notes |
|---|---|---|---|
| 1 | users | id (PK) | Primary key update, done last in transaction |
| 2 | personal_access_tokens | user_id (FK) | |
| 3 | personal_access_tokens | created_by | |
| 4 | peers | user_id | |
| 5 | user_invites | created_by | GORM TableName() returns user_invites (not user_invite_records) |
| 6 | accounts | created_by | |
| 7 | proxy_access_tokens | created_by | |
| 8 | jobs | triggered_by | |
| 9 | policy_rules | authorized_user | SSH policy user refs; missed by fork's implementation |
| 10 | access_log_entries | user_id | Reverse proxy access logs; missed by both fork and original plan |
Activity store (events.db / PostgreSQL) — 3 columns
| # | Table | Column | Notes |
|---|---|---|---|
| 11 | events | initiator_id | |
| 12 | events | target_id | |
| 13 | deleted_users | id (PK) | Raw SQL needed (GORM can't update PK via Model) |
Total: 13 columns (10 main store + 3 activity store)
Verified NOT needing migration
- policy_rules.authorized_groups: maps group IDs → local Unix usernames (e.g., "root", "admin"), NOT NetBird user IDs
- groups / group_peers: store peer IDs, not user IDs
- routes, nameserver_groups, setup_keys, posture_checks, networks, dns_settings: no user ID fields
What exists in main repo vs what needs to be created
| Component | Main repo status | Action |
|---|---|---|
| EncodeDexUserID / DecodeDexUserID | EXISTS at idp/dex/provider.go | No changes |
| EmbeddedIdP config + manager | EXISTS at management/server/idp/embedded.go | No changes |
| DEX provider | EXISTS at idp/dex/provider.go | No changes |
| Server bootstrapping (modules.go) | EXISTS at management/internals/server/modules.go | No changes |
| Store.ListUsers() interface method | MISSING | Add to management/server/store/store.go |
| SqlStore.ListUsers() implementation | MISSING | Add to management/server/store/sql_store.go |
| Store.UpdateUserID() interface method | MISSING | Add to management/server/store/store.go |
| SqlStore.UpdateUserID() implementation | MISSING | Add to management/server/store/sql_store.go |
| activity.Store.UpdateUserID() interface | MISSING | Add to management/server/activity/store.go |
| Activity Store.UpdateUserID() implementation | MISSING | Add to management/server/activity/store/sql_store.go |
| InMemoryEventStore.UpdateUserID() no-op | MISSING | Add to management/server/activity/store.go (compile-blocking) |
| txDeferFKConstraints helper | MISSING | Port from fork to management/server/store/sql_store.go |
| Store mock regeneration | NEEDED | Run go generate ./management/server/store/... after interface changes |
| Migration package | MISSING | Create at management/server/idp/migration/ |
| Standalone CLI tool | MISSING | Create at management/cmd/migrate-idp/ |
Source of patterns: Fork at /Users/ashleymensah/Documents/netbird-repos/nico-netbird/netbird
Implementation plan
Step 1: Add ListUsers() to store interface and implementation
File: management/server/store/store.go — add to Store interface:
ListUsers(ctx context.Context) ([]*types.User, error)
File: management/server/store/sql_store.go — add implementation:
func (s *SqlStore) ListUsers(ctx context.Context) ([]*types.User, error) {
	var users []*types.User
	if err := s.db.Find(&users).Error; err != nil {
		return nil, status.Errorf(status.Internal, "failed to list users")
	}
	// Decrypt sensitive fields (Email, Name) so logging shows readable values.
	// No-op when fieldEncrypt is nil (no encryption key configured).
	for _, user := range users {
		if err := user.DecryptSensitiveData(s.fieldEncrypt); err != nil {
			return nil, status.Errorf(status.Internal, "failed to decrypt user data")
		}
	}
	return users, nil
}
Step 2: Add UpdateUserID() to store interface and implementation
File: management/server/store/store.go — add to Store interface:
UpdateUserID(ctx context.Context, accountID, oldUserID, newUserID string) error
File: management/server/store/sql_store.go — add implementation (ported from fork, with policy_rules fix):
func (s *SqlStore) UpdateUserID(ctx context.Context, accountID, oldUserID, newUserID string) error {
	updates := []fkUpdate{
		{&types.PersonalAccessToken{}, "user_id", "user_id = ?"},
		{&types.PersonalAccessToken{}, "created_by", "created_by = ?"},
		{&nbpeer.Peer{}, "user_id", "user_id = ?"},
		{&types.UserInviteRecord{}, "created_by", "created_by = ?"},
		{&types.Account{}, "created_by", "created_by = ?"},
		{&types.ProxyAccessToken{}, "created_by", "created_by = ?"},
		{&types.Job{}, "triggered_by", "triggered_by = ?"},
		{&types.PolicyRule{}, "authorized_user", "authorized_user = ?"}, // missed by fork
		{&accesslogs.AccessLogEntry{}, "user_id", "user_id = ?"},        // missed by both fork and original plan
	}
	// Outline only; the transaction body is ported from the fork.
	// Transaction with deferred FK constraints: update FKs first, then the users.id PK.
	// Note: txDeferFKConstraints helper must be ported from fork (does not exist in main repo)
	//   - SQLite: PRAGMA defer_foreign_keys = ON
	//   - PostgreSQL: SET CONSTRAINTS ALL DEFERRED (affects only constraints declared
	//     DEFERRABLE; confirm non-deferrable FKs tolerate the FK-first update order)
	//   - MySQL: handled by existing transaction() helper (SET FOREIGN_KEY_CHECKS = 0)
}
Step 2b: Port txDeferFKConstraints helper
File: management/server/store/sql_store.go — add helper (ported from fork lines 842-853):
func (s *SqlStore) txDeferFKConstraints(tx *gorm.DB) error {
	// SQLite: defer FK checks until transaction commit
	// PostgreSQL: defer constraints (affects only constraints declared DEFERRABLE)
	// MySQL: already handled by transaction() wrapper
}
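A sketch of the statement order that Step 2 and Step 2b describe, written as plain SQL strings rather than GORM calls so it runs without a database. The engine labels, the column type, and buildPlan are hypothetical names for illustration; the table and column names follow the main-store table in this plan, and the real helper operates on a *gorm.DB inside the existing transaction wrapper.

```go
package main

import "fmt"

// column names one user-ID reference from the main-store table in this plan.
type column struct{ Table, Name string }

// fkColumns lists the nine referencing columns; users.id is handled last.
var fkColumns = []column{
	{"personal_access_tokens", "user_id"},
	{"personal_access_tokens", "created_by"},
	{"peers", "user_id"},
	{"user_invites", "created_by"},
	{"accounts", "created_by"},
	{"proxy_access_tokens", "created_by"},
	{"jobs", "triggered_by"},
	{"policy_rules", "authorized_user"},
	{"access_log_entries", "user_id"},
}

// deferStatement returns the per-engine statement named in the plan, or ""
// when the engine needs none. The engine labels are assumptions.
func deferStatement(engine string) (string, error) {
	switch engine {
	case "sqlite":
		return "PRAGMA defer_foreign_keys = ON", nil
	case "postgres":
		return "SET CONSTRAINTS ALL DEFERRED", nil
	case "mysql":
		return "", nil // described as handled by the transaction() helper
	}
	return "", fmt.Errorf("unsupported engine %q", engine)
}

// buildPlan lists the statements of one transaction, in execution order.
func buildPlan(engine string) ([]string, error) {
	pre, err := deferStatement(engine)
	if err != nil {
		return nil, err
	}
	var plan []string
	if pre != "" {
		plan = append(plan, pre)
	}
	for _, c := range fkColumns {
		plan = append(plan, fmt.Sprintf("UPDATE %s SET %s = ? WHERE %s = ?", c.Table, c.Name, c.Name))
	}
	// Primary key last, matching the "done last in transaction" note.
	return append(plan, "UPDATE users SET id = ? WHERE id = ?"), nil
}

func main() {
	plan, err := buildPlan("sqlite")
	if err != nil {
		panic(err)
	}
	for _, stmt := range plan {
		fmt.Println(stmt)
	}
}
```

Printing the plan per engine is also a cheap way to review the order before porting the GORM version.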
Step 3: Add UpdateUserID() to activity store interface and implementation
File: management/server/activity/store.go — add to Store interface:
UpdateUserID(ctx context.Context, oldUserID, newUserID string) error
File: management/server/activity/store.go — add no-op to InMemoryEventStore (compile-blocking):
func (store *InMemoryEventStore) UpdateUserID(_ context.Context, _, _ string) error {
	return nil
}
File: management/server/activity/store/sql_store.go — add implementation (ported from fork):
- Update events.initiator_id and events.target_id via GORM
- Update deleted_users.id via raw SQL (GORM can't update PK via Model)
- All in one transaction
Step 3b: Regenerate store mocks
Run go generate ./management/server/store/... to regenerate store_mock.go with the new ListUsers and UpdateUserID methods. Without this, tests using the mock won't compile.
Step 4: Create migration package
New file: management/server/idp/migration/migration.go
- Define narrow interfaces:
  type MainStoreUpdater interface {
      ListUsers(ctx context.Context) ([]*types.User, error)
      UpdateUserID(ctx context.Context, accountID, oldUserID, newUserID string) error
  }
  type ActivityStoreUpdater interface {
      UpdateUserID(ctx context.Context, oldUserID, newUserID string) error
  }
- MigrationConfig struct: ConnectorID, DryRun, MainStore, ActivityStore
- MigrationResult struct: Migrated, Skipped counts
- Migrate(ctx, *MigrationConfig) (*MigrationResult, error):
  - List all users from main store
  - Reconciliation pass: for already-migrated users, ensure activity store is also updated
  - For each non-migrated user: encode new ID, update both stores
  - Return counts
- Idempotency: DecodeDexUserID(user.Id) succeeds → user already migrated, skip
- Empty-ID guard: skip users with Id == "" before the decode check (edge case: DecodeDexUserID("") succeeds with empty strings)
- Service users: IsServiceUser=true users get re-keyed like all others (they'll be looked up by the new DEX-encoded ID after migration). This is intentional; document it in CLI help text.
- Uses EncodeDexUserID / DecodeDexUserID from idp/dex/provider.go
New file: management/server/idp/migration/migration_test.go
- Mock-based tests for Migrate() covering: normal migration, skip already-migrated, dry-run, reconciliation, empty user list, error handling
Step 5: Build the standalone CLI tool
New file: management/cmd/migrate-idp/main.go (~200 lines)
CLI flags:
| Flag | Required | Default | Description |
|---|---|---|---|
| --config | Yes | /etc/netbird/management.json | Path to management config |
| --connector-id | Yes | (none) | DEX connector ID to encode into user IDs |
| --dry-run | No | false | Preview changes without writing |
| --no-backup | No | false | Skip automatic database backup |
| --log-level | No | info | Verbosity |
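The flag table could be wired up as in the sketch below. It uses the standard flag package only to stay dependency-free (the real tool may use a different CLI library), and the options struct and parseFlags are hypothetical names. Because --config carries a default, the sketch enforces only --connector-id as mandatory.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds the parsed CLI flags from the table above.
type options struct {
	ConfigPath  string
	ConnectorID string
	DryRun      bool
	NoBackup    bool
	LogLevel    string
}

// parseFlags parses args (without the program name) and validates them.
func parseFlags(args []string) (*options, error) {
	fs := flag.NewFlagSet("migrate-idp", flag.ContinueOnError)
	opts := &options{}
	fs.StringVar(&opts.ConfigPath, "config", "/etc/netbird/management.json", "path to management config")
	fs.StringVar(&opts.ConnectorID, "connector-id", "", "DEX connector ID to encode into user IDs")
	fs.BoolVar(&opts.DryRun, "dry-run", false, "preview changes without writing")
	fs.BoolVar(&opts.NoBackup, "no-backup", false, "skip automatic database backup")
	fs.StringVar(&opts.LogLevel, "log-level", "info", "verbosity")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if opts.ConnectorID == "" {
		return nil, errors.New("--connector-id is required")
	}
	return opts, nil
}

func main() {
	// Fixed arguments keep the demo deterministic; a real main would pass os.Args[1:].
	opts, err := parseFlags([]string{"--connector-id", "oidc", "--dry-run"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *opts)
}
```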
Flow:
- Load management config JSON (reuse util.ReadJsonWithEnvSub)
- Validate: connector-id is non-empty, DB is accessible
- Open main store via store.NewStore(ctx, engine, datadir, nil, false): nil metrics, run AutoMigrate
  - skipMigration=false ensures schema is up-to-date (AutoMigrate is idempotent/non-destructive)
  - Using true risks stale schema if user upgrades management + tool simultaneously
- Call store.SetFieldEncrypt(enc) to enable field decryption (needed for ListUsers to return readable Email/Name for logging)
- Open activity store via activity_store.NewSqlStore(ctx, datadir, encryptionKey)
  - Gracefully handle missing activity store (e.g., events.db doesn't exist): warn and skip activity migration
- Backup databases (SQLite: file copy; PostgreSQL: print pg_dump instructions)
- Call migration.Migrate(ctx, cfg)
- Print summary and exit
New file: management/cmd/migrate-idp/backup.go (~60 lines)
- backupSQLiteFile(srcPath): copies to {src}.backup-{timestamp}
Step 6: Tests
- Unit tests in migration_test.go with mock interfaces
- Integration test in management/cmd/migrate-idp/main_test.go with real SQLite:
  - Seed users, events, policy rules with authorized_user, access log entries with user_id
  - Run migration, verify all 13 columns updated
  - Run again, verify idempotent (0 new migrations)
  - Test partial failure reconciliation
  - Test missing activity store (graceful skip)
User-facing migration procedure
1. Stop management: systemctl stop netbird-management
2. Dry-run: netbird-migrate-idp \
--config /etc/netbird/management.json \
--connector-id "oidc" \
--dry-run
3. Run migration: netbird-migrate-idp \
--config /etc/netbird/management.json \
--connector-id "oidc"
4. Update management.json: Add EmbeddedIdP config with a StaticConnector
whose ID matches the --connector-id used above (see below)
5. Start management: systemctl start netbird-management
Why manual config is required (step 4)
The EmbeddedIdP config block isn't just about the connector — it includes deployment-specific values that depend on your infrastructure: OIDC issuer URL (must match your public domain), dashboard/CLI redirect URIs (depend on your reverse proxy setup), storage paths, the initial owner account (email + bcrypt password hash), and whether local password auth is disabled. Auto-generating these would require the tool to make assumptions about DNS, port config, and proxy setup that could easily be wrong. The connector ID is the only piece the migration tool owns (it's baked into user IDs). Everything else is infrastructure config that belongs in the operator's hands. Getting any of these wrong means management still won't start.
Pitfalls and mitigations
| Risk | Mitigation |
|---|---|
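| Management running during migration | Warn user; on SQLite a concurrent writer surfaces as SQLITE_BUSY, which the tool reports with a clear error. PostgreSQL and MySQL give no such signal, so stopping management first is mandatory |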
| Wrong connector ID | Dry-run shows exact ID transformations; backup enables rollback |
| Partial failure mid-migration | Idempotent: DecodeDexUserID detects already-migrated users; reconciliation pass fixes activity store lag |
| Large user count | Each user migrated in own transaction; progress every 100 users (not per-user to avoid log spam) |
| Missing encryption key for activity store | Read from management config's DataStoreEncryptionKey |
| Missing activity store database | Warn and skip activity migration; main store migration proceeds |
| Empty user ID in database | Explicit guard before decode check; DecodeDexUserID("") succeeds with empty strings |
| Re-running with different connector-id | Already-migrated users correctly skipped (decode succeeds). To change connector-id, restore from backup first |
| MySQL store engine | Supported — existing transaction() helper handles SET FOREIGN_KEY_CHECKS = 0 |
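| PostgreSQL non-deferrable FK constraints | SET CONSTRAINTS ALL DEFERRED only affects constraints declared DEFERRABLE, and an immediate FK rejects a child row that points at a users.id not yet present, so update order alone is not guaranteed to avoid violations; verify against a real PostgreSQL schema before relying on it |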
Verification
- Unit tests: Mock-based tests for migration logic (skip/migrate/dry-run/reconcile/empty-ID guard)
- Integration test: Real SQLite databases seeded with test data, verify all 13 columns
- Manual test: Run --dry-run on a copy of a real self-hosted deployment's databases
- Idempotency test: Run migration twice, second run should report 0 migrations
- Policy rules test: Seed policy_rules.authorized_user with old user ID, verify it's updated
- Access log test: Seed access_log_entries.user_id with old user ID, verify it's updated
- Missing activity store test: Run with missing events.db, verify main store migration succeeds with warning
Key files (all paths relative to main repo)
New files to create:
- management/server/idp/migration/migration.go: migration interfaces + Migrate() function
- management/server/idp/migration/migration_test.go: unit tests
- management/cmd/migrate-idp/main.go: CLI entry point
- management/cmd/migrate-idp/backup.go: SQLite backup logic
- management/cmd/migrate-idp/main_test.go: integration tests
Existing files to modify:
- management/server/store/store.go: add ListUsers() and UpdateUserID() to Store interface
- management/server/store/sql_store.go: add ListUsers(), UpdateUserID(), and txDeferFKConstraints() implementations
- management/server/activity/store.go: add UpdateUserID() to Store interface + InMemoryEventStore.UpdateUserID() no-op
- management/server/activity/store/sql_store.go: add UpdateUserID() implementation
Generated files to regenerate:
- management/server/store/store_mock.go: run go generate ./management/server/store/... after interface changes
Read-only references (port patterns from fork):
- Fork's management/server/store/sql_store.go:855-895 (UpdateUserID() pattern)
- Fork's management/server/activity/store/sql_store.go:230-254 (activity UpdateUserID() pattern)
- Fork's management/server/idp/migration/migration.go (orchestration logic pattern)
Existing files used as-is (no changes):
- idp/dex/provider.go (EncodeDexUserID / DecodeDexUserID)
- management/server/types/policyrule.go:88 (AuthorizedUser field)
- management/internals/modules/reverseproxy/accesslogs/accesslogentry.go:25 (AccessLogEntry.UserId field)
- management/server/idp/embedded.go (EmbeddedIdP manager)