netbird

Mirror of https://github.com/netbirdio/netbird.git, synced 2026-05-31 21:19:55 +00:00

Author	SHA1	Message	Date
mlsmaycon	3aa62e31a6	fix(synth): refresh account netmap on embedded proxy connect/disconnect. SynthesizePrivateServiceZones emits A records keyed on the proxy peer's Status.Connected flag and tunnel IP, so the synth output changes every time an embedded `netbird proxy` peer flips state. The trigger was missing: MarkPeerConnected only called OnPeersUpdated when the peer was LoginExpired, and MarkPeerDisconnected never called it at all. Result: when a fresh proxy reconnects, user peers in the account hold their stale netmap (or no synth at all) until some unrelated change pokes the controller. Fire OnPeersUpdated whenever an embedded proxy peer transitions connected/disconnected. OnPeersUpdated routes through bufferSendUpdateAccountPeers so consecutive flaps coalesce and don't storm the controller. AddPeer already calls OnPeersAdded for the new peer ID, but that only recomputes the proxy peer's own netmap — user peers still need this new account-wide refresh to pick up the proxy peer's tunnel IP for their private-service DNS records.	2026-05-21 14:52:43 +02:00
mlsmaycon	717c2b493d	fix(review): coderabbit follow-ups round 2 - status_test.go TestStatus_PeerStateByIP: replace require := assert.New(t) shadowing pattern with req := require.New(t) so setup assertions are fail-fast and the require package isn't shadowed. Add TestStatus_PeerStateByIP_MatchesIPv6 for the IPv6-only path. - status.go PeerStateByIP: match against both State.IP and State.IPv6 so IPv6-only peers are found by the private-service tunnel lookup. Empty input short-circuits before the loop and empty State.IP/State.IPv6 fields are treated as non-matches. - proxy.go ValidateTunnelPeer: call enforceAccountScope(ctx, service.AccountID) after the service lookup, mirroring ValidateSession. Without it, an account-scoped (BYOP) proxy token could mint session JWTs for another account's domain. - sql_store.go getClusterCapability: thread the caller's context into the GORM query via WithContext(ctx) so the lookup is cancellable and honours request deadlines. (Pre-existing on origin/main; included here because GetClusterSupportsPrivate added by this PR is now a caller.) Skipped: - proxyAcceptsMapping SupportsCustomPorts == true: the existing != nil check is intentional. The accompanying test in this PR (TestSendServiceUpdateToCluster_FiltersOnCapability) explicitly asserts "new proxy with SupportsCustomPorts=false should still receive mapping" — the non-nil check encodes "proxy is new enough to understand the protocol", not "proxy can bind custom ports". Tightening to *bool==true would break that design and the test.	2026-05-21 12:03:27 +02:00
mlsmaycon	627ee71fa8	fix(review): coderabbit follow-ups - openapi.yml: declare default: false on ServiceTargetOptions.direct_upstream so generated clients/validators reflect the documented default. - proto/proxy_service.proto: ValidateTunnelPeer doc + denied_reason list said "distribution_groups" (bearer-auth field) but the actual gate is service.access_groups. Replaced both occurrences to match the code path in checkPeerGroupAccess. - peers/manager.go (GetPeerWithGroups) + users/manager.go (GetUserWithGroups): on store error after a successful first lookup, both now return (nil, nil, err) so callers can't get a valid entity alongside a non-nil error. Findings skipped with reasons: - embedded.go merged CLI/Dashboard redirect URIs: pre-existing on origin/main, not introduced by this PR. - account_mock.go MarkPeerDisconnected zero-time UnixNano: same — pre-existing. - openapi Service schema if/then conditionals: Go-side Validate() already enforces these invariants (Private + non-empty AccessGroups, mode=http, mutually-exclusive with bearer), and oapi-codegen on OpenAPI 3.1.x doesn't honour allOf/if/then anyway. - .patch / .diff / b-n-p.sh: untracked personal artifacts, not part of any commit.	2026-05-21 11:45:11 +02:00
mlsmaycon	b21a91a507	fix(service): require non-empty host + direct_upstream on cluster targets. Cluster targets dial the upstream via the host network stack, so an empty Host leaves the proxy with nothing to dial and DirectUpstream=false would route the request through the embedded NetBird client (wrong network for a cluster address). Validate() and validateTargetReferences now reject both shapes. Tests: - TestValidate_HTTPClusterTarget / _RequiresTargetId / TestValidate_Private_{AcceptsClusterTargetWithAccessGroups, RequiresAccessGroups, RejectsBearerAuth} updated to populate Host and DirectUpstream so they exercise the path past the new gates. - TestValidate_HTTPClusterTarget_RequiresHost and _RequiresDirectUpstream pin the two new error paths. - TestValidateTargetReferences_ClusterTargetSkipsLookup updated to set DirectUpstream on its fixture; new _ClusterTargetRequiresDirectUpstream test covers the store-side rejection. Drive-bys (no behavior change beyond what existing tests cover): - proxy/proxy.go: shortened the Capabilities.Private / Cluster.Private doc comments. - users/manager.go: moved the GetUserWithGroups doc from the interface to the impl. - proxy/cmd/proxy/cmd/root.go: removed unused NewRootCmd. - tunnel_cache.go: bumped tunnelCacheTTL from 30s to 300s (matches the "5 minutes" target documented on the constant; existing TTL-expiry test uses the constant directly so the bump is picked up automatically).	2026-05-21 11:30:07 +02:00
mlsmaycon	167ee08e14	feat(private-service): expose NetBird-only services over tunnel peers. Adds a new "private" service mode for the reverse proxy: services reachable exclusively over the embedded WireGuard tunnel, gated by per-peer group membership instead of operator auth schemes. Wire contract - ProxyMapping.private (field 13): the proxy MUST call ValidateTunnelPeer and fail closed; operator schemes are bypassed. - ProxyCapabilities.private (4) + supports_private_service (5): capability gate. Management never streams private mappings to proxies that don't claim the capability; the broadcast path applies the same filter via filterMappingsForProxy. - ValidateTunnelPeer RPC: resolves an inbound tunnel IP to a peer, checks the peer's groups against service.AccessGroups, and mints a session JWT on success. checkPeerGroupAccess fails closed when a private service has empty AccessGroups. - ValidateSession/ValidateTunnelPeer responses now carry peer_group_ids + peer_group_names so the proxy can authorise policy-aware middlewares without an extra management round-trip. - ProxyInboundListener + SendStatusUpdate.inbound_listener: per-account inbound listener state surfaced to dashboards. - PathTargetOptions.direct_upstream (11): bypass the embedded NetBird client and dial the target via the proxy host's network stack for upstreams reachable without WireGuard. Data model - Service.Private (bool) + Service.AccessGroups ([]string, JSON-serialised). Validate() rejects bearer auth on private services. Copy() deep-copies AccessGroups. pgx getServices loads the columns. - DomainConfig.Private threaded into the proxy auth middleware. Request handler routes private services through forwardWithTunnelPeer and returns 403 on validation failure. - Account-level SynthesizePrivateServiceZones (synthetic DNS) and injectPrivateServicePolicies (synthetic ACL) gate on len(svc.AccessGroups) > 0. Proxy - /netbird proxy --private (embedded mode) flag; Config.Private in proxy/lifecycle.go.
- Per-account inbound listener (proxy/inbound.go) binding HTTP/HTTPS on the embedded NetBird client's WireGuard tunnel netstack. - proxy/internal/auth/tunnel_cache: ValidateTunnelPeer response cache with single-flight de-duplication and per-account eviction. - Local peerstore short-circuit: when the inbound IP isn't in the account roster, deny fast without an RPC. - proxy/server.go reports SupportsPrivateService=true and redacts the full ProxyMapping JSON from info logs (auth_token + header-auth hashed values now only at debug level). Identity forwarding - ValidateSessionJWT returns user_id, email, method, groups, group_names. sessionkey.Claims carries Email + Groups + GroupNames so the proxy can stamp identity onto upstream requests without an extra management round-trip on every cookie-bearing request. - CapturedData carries userEmail / userGroups / userGroupNames; the proxy stamps X-NetBird-User and X-NetBird-Groups on r.Out from the authenticated identity (strips client-supplied values first to prevent spoofing). - AccessLog.UserGroups: access-log enrichment captures the user's group memberships at write time so the dashboard can render group context without reverse-resolving stale memberships. OpenAPI/dashboard surface - ReverseProxyService gains private + access_groups; ReverseProxyCluster gains private + supports_private. ReverseProxyTarget target_type enum gains "cluster". ServiceTargetOptions gains direct_upstream. ProxyAccessLog gains user_groups.	2026-05-20 22:46:18 +02:00
Pascal Fischer	454ff66518	[management] scope network router update call (#6222)	2026-05-20 18:24:00 +02:00
Maycon Santos	d250f92c43	feat(reverse-proxy): clusters API surfaces type, online status, and capability flags (#6148) The cluster listing now answers three questions in one round-trip instead of forcing the dashboard to cross-reference the domains API: which clusters can this account see, are they currently up, and what do they support. The ProxyCluster wire type drops the boolean self_hosted in favour of a `type` enum (`account` / `shared`) plus explicit `online`, `supports_custom_ports`, `require_subdomain`, and `supports_crowdsec` fields. Store query reworked so offline clusters still appear (no last_seen WHERE), with online and connected_proxies both derived from the existing 2-min active window via portable CASE expressions; the 1-hour heartbeat reaper still removes long-stale rows. Service manager enriches each cluster with the capability flags via the existing per-cluster lookups (CapabilityProvider now also exposes ClusterSupportsCrowdSec). GetActiveClusterAddresses* keep their tight 2-min filter so service routing and domain enumeration aren't pulled into the wider window. The hard cut removes self_hosted from the response — the dashboard is the only consumer and is updated in the matching PR; no transitional field is shipped. Adds a cross-engine regression test asserting offline clusters surface, connected_proxies counts only fresh proxies, and account-scoped BYOP clusters never leak across accounts.	2026-05-20 10:08:34 +02:00
Maycon Santos	80966ab1b0	[management] Ensure SessionStartedAt has a default value (#6211) * [management] Ensure SessionStartedAt has a default value Avoid null values for the new column * [management] Add PeerStatus with LastSeen in peer_test * [management] Add migration for PeerStatusSessionStartedAt default value * [management] Add PeerStatus with LastSeen in migration tests	2026-05-20 08:25:30 +02:00
Maycon Santos	af24fd7796	[management] Add metrics for peer status updates and ephemeral cleanup (#6196) * [management] Add metrics for peer status updates and ephemeral cleanup The session-fenced MarkPeerConnected / MarkPeerDisconnected path and the ephemeral peer cleanup loop both run silently today: when fencing rejects a stale stream, when a cleanup tick deletes peers, or when a batch delete fails, we have no operational signal beyond log lines. Add OpenTelemetry counters and a histogram so the same SLO-style dashboards that already exist for the network-map controller can cover peer connect/disconnect and ephemeral cleanup too. All new attributes are bounded enums: operation in {connect,disconnect} and outcome in {applied,stale,error,peer_not_found}. No account, peer, or user ID is ever written as a metric label — total cardinality is fixed at compile time (8 counter series, 2 histogram series, 4 unlabeled ephemeral series). Metric methods are nil-receiver safe so test composition that doesn't wire telemetry (the bulk of the existing tests) works unchanged. The ephemeral manager exposes a SetMetrics setter rather than taking the collector through its constructor, keeping the constructor signature stable across all test call sites. * [management] Add OpenTelemetry metrics for ephemeral peer cleanup Introduce counters for tracking ephemeral peer cleanup, including peers pending deletion, cleanup runs, successful deletions, and failed batches. Metrics are nil-receiver safe to ensure compatibility with test setups without telemetry.	2026-05-18 22:55:19 +02:00
Maycon Santos	13d32d274f	[management] Fence peer status updates with a session token (#6193) * [management] Fence peer status updates with a session token The connect/disconnect path used a best-effort LastSeen-after-streamStart comparison to decide whether a status update should land. Under contention — a re-sync arriving while the previous stream's disconnect was still in flight, or two management replicas seeing the same peer at once — the check was a read-then-decide-then-write window: any UPDATE in between caused the wrong row to be written. The Go-side time.Now() that fed the comparison also drifted under lock contention, since it was captured seconds before the write actually committed. Replace it with an integer-nanosecond fencing token stored alongside the status. Every gRPC sync stream uses its open time (UnixNano) as its token. Connects only land when the incoming token is strictly greater than the stored one; disconnects only land when the incoming token equals the stored one (i.e. we're the stream that owns the current session). Both are single optimistic-locked UPDATEs — no read-then-write, no transaction wrapper. LastSeen is now written by the database itself (CURRENT_TIMESTAMP). The caller never supplies it, so the value always reflects the real moment of the UPDATE rather than the moment the caller queued the work — which was already off by minutes under heavy lock contention. Side effects (geo lookup, peer-login-expiration scheduling, network-map fan-out) are explicitly documented as running after the fence UPDATE commits, never inside it. Geo also skips the update when realIP equals the stored ConnectionIP, dropping a redundant SavePeerLocation call on same-IP reconnects. Tests cover the three semantic cases (matched disconnect lands, stale disconnect dropped, stale connect dropped) plus a 16-goroutine race test that asserts the highest token always wins.
* [management] Add SessionStartedAt to peer status updates Stored `SessionStartedAt` for fencing token propagation across goroutines and updated database queries/functions to handle the new field. Removed outdated geolocation handling logic and adjusted tests for concurrency safety. * Rename `peer_status_required_approval` to `peer_status_requires_approval` in SQL store fields	2026-05-18 20:25:12 +02:00
Nicolas Frati	705f87fc20	[management] fix: device redirect uri wasn't registered (#6191) * fix: device redirect uri wasn't registered * fix lint	2026-05-18 12:57:59 +02:00
Viktor Liu	22e2519d71	[management] Avoid peer IP reallocation when account settings update preserves the network range (#6173)	2026-05-16 15:51:48 +02:00
Viktor Liu	ea9fab4396	[management] Allocate and preserve IPv6 overlay addresses for embedded proxy peers (#6132)	2026-05-14 16:05:33 +02:00
Vlad	77b479286e	[management] fix offline statuses for public proxy clusters (#6133)	2026-05-14 13:27:50 +02:00
Vlad	07cbfdbede	[proxy] feature: bring your own proxy (#5627)	2026-05-11 14:31:38 +02:00
Nicolas Frati	e89aad09f5	[management] Enable MFA for local users (#5804) * wip: totp for local users * fix providers not getting populated * polished UI and fix post_login_redirect_uri * fix: make sure logout is only prompted from oidc flow Signed-off-by: jnfrati <nicofrati@gmail.com> * update templates Signed-off-by: jnfrati <nicofrati@gmail.com> * deps: update dex dependency Signed-off-by: jnfrati <nicofrati@gmail.com> * fix qube issues Signed-off-by: jnfrati <nicofrati@gmail.com> * replace window with globalThis on home html Signed-off-by: jnfrati <nicofrati@gmail.com> * fixed coderabbit comments Signed-off-by: jnfrati <nicofrati@gmail.com> * debug * remove unused config and rename totp issuer * deps: update dex reference to latest * add dashboard post logout redirect uri to embedded config * implemented api for mfa configuration * update docs and config parsing * catch error on idp manager init mfa * fix tests * Add remember me for MFA * Add cookie encryption and session share between tabs * fixed logout showing non actionable error and session cookie encryption key * fixed missing mfa settings on sql query for account * fix code index for mfa activity --------- Signed-off-by: jnfrati <nicofrati@gmail.com> Co-authored-by: braginini <bangvalo@gmail.com>	2026-05-08 16:31:20 +02:00
Viktor Liu	205ebcfda2	[management, client] Add IPv6 overlay support (#5631)	2026-05-07 11:33:37 +02:00
Pascal Fischer	bfeb9b19ec	[management] remove permissions from geolocations api (#6091)	2026-05-06 13:07:01 +02:00
Pascal Fischer	b19b7464ea	[management] fix flaky invite token test (#6077)	2026-05-05 18:48:51 +02:00
Pascal Fischer	97db824929	[management] fix proxy reconnect (#6063)	2026-05-04 20:43:25 +02:00
Bethuel Mmbaga	6262b0d841	[management] Track pending approval in peer event metadata (#6040)	2026-05-04 12:47:13 +03:00
Viktor Liu	057d651d2e	[client, proxy] Add packet capture to debug bundle and CLI (#5891)	2026-05-04 11:28:56 +02:00
Misha Bragin	c4b2da4c92	[management] Add public connection ipv4 and ipv6 posture check (#6038) This change enables admins to configure posture checks for connecting public IPs of their peers. It changes the behavior of the check as well and now the evaluation is if the received network is part of the configured network.	2026-04-30 18:36:50 +02:00
Nicolas Frati	dcd1db42ef	[management] Enable PAT creation during setup (#6003) * enable pat creation on setup * remove logic from handler towards setup service * fix lint issue * fix rollback on account id returning empty * fix coderabbit comments * fix setup PAT rollback behavior	2026-04-30 17:21:35 +02:00
Pascal Fischer	f29f5a0978	[management] add monitoring for nmap update source (#6036)	2026-04-30 14:52:54 +02:00
Bethuel Mmbaga	df197d5001	[management] Prevent JWT reuse during peer login (#6002)	2026-04-29 15:04:27 +03:00
Bethuel Mmbaga	db44848e2d	[management] Drop netmap calculation on peer read (#6006)	2026-04-28 18:25:56 +03:00
Bethuel Mmbaga	f8745723fc	[management] Add Microsoft AD FS support for embedded Dex identity providers (#6008)	2026-04-28 12:42:19 +03:00
Vlad	154b81645a	[management] removed legacy network map code (#5565)	2026-04-27 16:02:54 +02:00
Zoltan Papp	f732b01a05	[management] unify peer-update test timeout via constant (#5952) peerShouldReceiveUpdate waited 500ms for the expected update message, and every outer wrapper across the management/server test suite paired it with a 1s goroutine-drain timeout. Both were too tight for slower CI runners (MySQL, FreeBSD, loaded sqlite), producing intermittent "Timed out waiting for update message" failures in tests like TestDNSAccountPeersUpdate, TestPeerAccountPeersUpdate, and TestNameServerAccountPeersUpdate. Introduce peerUpdateTimeout (5s) next to the helper and use it both in the helper and in every outer wrapper so the two timeouts stay in sync. Only runs down on failure; passing tests return as soon as the channel delivers, so there is no slowdown on green runs.	2026-04-23 21:19:21 +02:00
Pascal Fischer	fa0d58d093	[management] exclude peers for expiration job that have already been marked expired (#5970)	2026-04-23 16:01:54 +02:00
Vlad	b6038e8acd	[management] refactor: changeable pat rate limiting (#5946)	2026-04-23 15:13:22 +02:00
Bethuel Mmbaga	57b23c5b25	[management] Propagate context changes to upstream middleware (#5956)	2026-04-21 23:06:52 +03:00
Vlad	eb3aa96257	[management] check policy for changes before actual db update (#5405)	2026-04-21 18:37:04 +02:00
Nicolas Frati	8ae8f2098f	[management] chores: fix lint error on google workspace (#5907) * chores: fix lint error on google workspace * chores: updated google api dependency * update google golang api sdk to latest	2026-04-16 20:02:09 +02:00
Maycon Santos	53b04e512a	[management] Reuse a single cache store across all management server consumers (#5889) * Add support for legacy IDP cache environment variable * Centralize cache store creation to reuse a single Redis connection pool Each cache consumer (IDP cache, token store, PKCE store, secrets manager, EDR validator) was independently calling NewStore, creating separate Redis clients with their own connection pools — up to 1400 potential connections from a single management server process. Introduce a shared CacheStore() singleton on BaseServer that creates one store at boot and injects it into all consumers. Consumer constructors now receive a store.StoreInterface instead of creating their own. For Redis mode, all consumers share one connection pool (1000 max conns). For in-memory mode, all consumers share one GoCache instance. * Update management-integrations module to latest version * sync go.sum * Export `GetAddrFromEnv` to allow reuse across packages * Update management-integrations module version in go.mod and go.sum * Update management-integrations module version in go.mod and go.sum	2026-04-16 16:04:53 +02:00
Bethuel Mmbaga	08f624507d	[management] Enforce peer or peer groups requirement for network routers (#5894)	2026-04-16 13:12:19 +03:00
Pascal Fischer	c5623307cc	[management] add context cancel monitoring (#5879)	2026-04-14 12:49:18 +02:00
Vlad	7f666b8022	[management] revert ctx dependency in get account with backpressure (#5878)	2026-04-14 12:16:03 +02:00
Viktor Liu	0a30b9b275	[management, proxy] Add CrowdSec IP reputation integration for reverse proxy (#5722)	2026-04-14 12:14:58 +02:00
Pascal Fischer	5259e5df51	[management] add domain and service cleanup migration (#5850)	2026-04-11 12:00:40 +02:00
Pascal Fischer	ee588e1536	Revert "[management] allow local routing peer resource (#5814)" (#5847)	2026-04-10 14:53:47 +02:00
Pascal Fischer	2a8aacc5c9	[management] allow local routing peer resource (#5814)	2026-04-10 13:08:21 +02:00
Pascal Fischer	15709bc666	[management] update account delete with proper proxy domain and service cleanup (#5817)	2026-04-10 13:08:04 +02:00
Pascal Fischer	ee343d5d77	[management] use sql null vars (#5844)	2026-04-09 18:12:38 +02:00
Maycon Santos	099c493b18	[management] network map tests (#5795) * Add network map benchmark and correctness test files * Add tests for network map components correctness and edge cases * Skip benchmarks in CI and enhance network map test coverage with new helper functions * Remove legacy network map benchmarks and tests; refactor components-based test coverage for clarity and scalability.	2026-04-08 21:28:29 +02:00
Pascal Fischer	c1d1229ae0	[management] use NullBool for terminated flag (#5829)	2026-04-08 21:08:43 +02:00
Viktor Liu	0588d2dbe1	[management] Load missing service columns in pgx account loader (#5816)	2026-04-07 14:56:56 +02:00
Pascal Fischer	14b3b77bda	[management] validate permissions on groups read with name (#5749)	2026-04-07 14:13:09 +02:00
Bethuel Mmbaga	c2c6396a04	[management] Allow updating embedded IdP user name and email (#5721)	2026-04-02 13:02:10 +03:00

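Commit 717c2b493d changes PeerStateByIP to match against both State.IP and State.IPv6. Empty input short-circuits, and empty address fields never match. The sketch below illustrates that lookup rule as a free function over a slice. The `State` fields shown and the `peerStateByIP` helper are hypothetical, not the client's actual `Status` method.

```go
package main

// State is a hypothetical, trimmed-down peer state holding both overlay addresses.
type State struct {
	PubKey string
	IP     string // IPv4 overlay address, may be empty
	IPv6   string // IPv6 overlay address, may be empty
}

// peerStateByIP returns the first peer whose IPv4 or IPv6 overlay address
// equals ip. Empty input short-circuits before the loop; because ip is then
// non-empty, an empty State.IP or State.IPv6 can never produce a match.
func peerStateByIP(peers []State, ip string) (State, bool) {
	if ip == "" {
		return State{}, false
	}
	for _, p := range peers {
		if p.IP == ip || p.IPv6 == ip {
			return p, true
		}
	}
	return State{}, false
}
```

Plain string equality assumes both sides use the same textual form. IPv6 addresses written differently (for example with and without zero compression) would need normalising first, for instance by parsing them with Go's `net/netip` package.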

989 Commits