* [management] Fence peer status updates with a session token
The connect/disconnect path used a best-effort LastSeen-after-streamStart
comparison to decide whether a status update should land. Under contention
— a re-sync arriving while the previous stream's disconnect was still in
flight, or two management replicas seeing the same peer at once — the
check was a read-then-decide-then-write window: any UPDATE in between
caused the wrong row to be written. The Go-side time.Now() that fed the
comparison also drifted under lock contention, since it was captured
seconds before the write actually committed.
Replace it with an integer-nanosecond fencing token stored alongside the
status. Every gRPC sync stream uses its open time (UnixNano) as its token.
Connects only land when the incoming token is strictly greater than the
stored one; disconnects only land when the incoming token equals the
stored one (i.e. we're the stream that owns the current session). Both
are single optimistic-locked UPDATEs — no read-then-write, no transaction
wrapper.
LastSeen is now written by the database itself (CURRENT_TIMESTAMP). The
caller never supplies it, so the value always reflects the real moment
of the UPDATE rather than the moment the caller queued the work — which
was already off by minutes under heavy lock contention.
Side effects (geo lookup, peer-login-expiration scheduling, network-map
fan-out) are explicitly documented as running after the fence UPDATE
commits, never inside it. Geo also skips the update when realIP equals
the stored ConnectionIP, dropping a redundant SavePeerLocation call on
same-IP reconnects.
Tests cover the three semantic cases (matched disconnect lands, stale
disconnect dropped, stale connect dropped) plus a 16-goroutine race test
that asserts the highest token always wins.
* [management] Add SessionStartedAt to peer status updates
Stored `SessionStartedAt` for fencing token propagation across goroutines and updated database queries/functions to handle the new field. Removed outdated geolocation handling logic and adjusted tests for concurrency safety.
* Rename `peer_status_required_approval` to `peer_status_requires_approval` in SQL store fields
peerShouldReceiveUpdate waited 500ms for the expected update message,
and every outer wrapper across the management/server test suite paired
it with a 1s goroutine-drain timeout. Both were too tight for slower
CI runners (MySQL, FreeBSD, loaded sqlite), producing intermittent
"Timed out waiting for update message" failures in tests like
TestDNSAccountPeersUpdate, TestPeerAccountPeersUpdate, and
TestNameServerAccountPeersUpdate.
Introduce peerUpdateTimeout (5s) next to the helper and use it both in
the helper and in every outer wrapper so the two timeouts stay in sync.
Only runs down on failure; passing tests return as soon as the channel
delivers, so there is no slowdown on green runs.
* Add support for legacy IDP cache environment variable
* Centralize cache store creation to reuse a single Redis connection pool
Each cache consumer (IDP cache, token store, PKCE store, secrets manager,
EDR validator) was independently calling NewStore, creating separate Redis
clients with their own connection pools — up to 1400 potential connections
from a single management server process.
Introduce a shared CacheStore() singleton on BaseServer that creates one
store at boot and injects it into all consumers. Consumer constructors now
receive a store.StoreInterface instead of creating their own.
For Redis mode, all consumers share one connection pool (1000 max conns).
For in-memory mode, all consumers share one GoCache instance.
* Update management-integrations module to latest version
* sync go.sum
* Export `GetAddrFromEnv` to allow reuse across packages
* Update management-integrations module version in go.mod and go.sum
* Update management-integrations module version in go.mod and go.sum
* **Bug Fixes**
* Fixed domain configuration handling in single account mode to properly retrieve and apply domain settings from account data.
* Improved error handling when account data is unavailable with fallback to configured default domain.
* **Tests**
* Added comprehensive test coverage for single account mode domain configuration scenarios, including edge cases for missing or unavailable account data.
CLI: new expose command to publish a local port with flags for PIN, password, user groups, custom domain, name prefix and protocol (HTTP default).
Management/API: create/renew/stop expose sessions (streamed status), automatic naming/domain, TTL renewals, background expiration, new management RPCs and client methods.
UI/API: account settings now include peer_expose_enabled and peer_expose_groups; new activity codes for peer expose events.
Bug Fixes
Network and DNS updates now defer service and reverse-proxy reloads until after account updates complete, preventing inconsistent proxy state and race conditions.
Chores
Removed automatic peer/broadcast updates immediately following bulk service reloads.
Tests
Added a test ensuring network-range changes complete without deadlock.
Embed Dex as a built-in IdP to simplify self-hosting setup.
Adds an embedded OIDC Identity Provider (Dex) with local user management and optional external IdP connectors (Google/GitHub/OIDC/SAML), plus device-auth flow for CLI login. Introduces instance onboarding/setup endpoints (including owner creation), field-level encryption for sensitive user data, a streamlined self-hosting provisioning script, and expanded APIs + test coverage for IdP management.
more at https://github.com/netbirdio/netbird/pull/5008#issuecomment-3718987393
This PR adds user approval functionality to the management system, allowing administrators to manually approve new users joining via domain matching. When enabled, users are blocked with pending approval status until explicitly approved by an admin.
Adds UserApprovalRequired setting to control manual user approval requirement
Introduces user approval and rejection endpoints with corresponding business logic
Prevents pending approval users from adding peers or logging in
This PR adds a validate flow response feature to the management server by integrating an IntegratedValidator component. The main purpose is to enable validation of PKCE authorization flows through an integrated validator interface.
- Adds a new ValidateFlowResponse method to the IntegratedValidator interface
- Integrates the validator into the management server to validate PKCE authorization flows
- Updates dependency version for management-integrations
This PR introduces a new onboarding feature to handle such flows in the dashboard by defining an AccountOnboarding model, persisting it in the store, exposing CRUD operations in the manager and HTTP handlers, and updating API schemas and tests accordingly.
Add AccountOnboarding struct and embed it in Account
Extend Store and DefaultAccountManager with onboarding methods and SQL migrations
Update HTTP handlers, API types, OpenAPI spec, and add end-to-end tests
when adding a peer we calculate the network map an account using backpressure functions and some updates might arrive around the time we are deleting a policy.
This change ensures we wait enough time for the updates from add peer to be sent and read before continuing with the test logic
This PR introduces a new configuration option `DisableDefaultPolicy` that prevents the creation of the default all-to-all policy when new accounts are created. This is useful for automation scenarios where explicit policies are preferred.
### Key Changes:
- Added DisableDefaultPolicy flag to the management server config
- Modified account creation logic to respect this flag
- Updated all test cases to explicitly pass the flag (defaulting to false to maintain backward compatibility)
- Propagated the flag through the account manager initialization chain
### Testing:
- Verified default behavior remains unchanged when flag is false
- Confirmed no default policy is created when flag is true
- All existing tests pass with the new parameter
* Avoid recalculating next peer expiration
- Check if an account schedule is already running
- Cancel executing schedules only when changes occurs
- Add more context info to logs
* fix tests