Commit Graph

2914 Commits

Author SHA1 Message Date
mlsmaycon
75c8fa78e2 Merge branch 'main' into follow-up-private-services 2026-05-27 16:00:40 +02:00
Pascal Fischer
1f9a829f2c [management] update log levels (#6266) 2026-05-27 11:43:49 +02:00
mlsmaycon
4f7c73369b fix(proxy): background ctx for already-started AddPeer notification
The earlier ctx fix covered the async runClientStartup path but missed
the synchronous branch: when a service is added to an already-started
client, AddPeer called NotifyStatus with the caller's request-scoped
ctx. A cancelled request/stream could drop the connected notification
to management. Use context.Background() here too, matching
notifyClientReady.

Extends TestNetBird_AddPeer_ExistingStartedClient_NotifiesStatus to
pass a pre-cancelled caller ctx and assert the notification still ran
on a non-cancelled context.
2026-05-26 22:52:11 +02:00
mlsmaycon
924be2116b chore(api,ci,docs,test): private-service schema, proto-check, fixups
Non-functional cleanups and contract/CI hardening around the
private-service work:

API schema (openapi.yml):
- Require a non-empty access_groups and mode=http when private=true,
  on both Service and ServiceRequest, mirroring
  validatePrivateRequirements. mode stays optional-but-constrained
  (empty defaults to http server-side), matching runtime.

CI (proto-version-check.yml):
- Cover renamed .pb.go files (read base via previous_filename).
- Match protoc-gen-go-grpc version headers (optional "- " prefix and
  -gen-go-grpc suffix) so grpc-generated files are in scope.

Docs / comments:
- Reword Config field docs to say defaults are applied at Server.Start
  (initDefaults), not New.
- Rename the obsolete --private-inbound flag to --private across
  comments and the proto doc.

Pre-existing test fixups surfaced by review:
- Repair the integration-tagged validate_session_test.go (SignToken
  signature growth + new Manager interface methods).
- Fix the CI-skip boolean precedence so Windows isn't skipped
  unconditionally.
- Guard the router.HTTPListener type assertion with comma-ok.
2026-05-26 22:00:08 +02:00
mlsmaycon
43c0cb1dc2 fix(rest): reject empty Delete path params in reverse-proxy clients
ReverseProxyClustersAPI.Delete and ReverseProxyTokensAPI.Delete passed
the path parameter into url.PathEscape without an empty check.
PathEscape("") returns "" which collapses the request onto the
collection endpoint ("/api/reverse-proxies/clusters/" /
"/api/reverse-proxies/proxy-tokens/"), so a caller bug delete with no
id reached a routable URL with surprising semantics (typically 405).

Short-circuit with a typed error before the request is built. Tests
mount a handler on the collection path that fails the test if hit, so
the regression is impossible to reintroduce silently.
2026-05-26 21:59:34 +02:00
mlsmaycon
86e24a622d fix(client): include offlinePeers in PeerStateByIP lookup
ReplaceOfflinePeers moves peers into d.offlinePeers but PeerStateByIP
only scanned d.peers. Callers (the local DNS filter via
localPeerConnectivity, embed.Client.IdentityForIP used by the
proxy's tunnel-peer validator) were treating known-but-offline peers
as unknown, which:

- causes the DNS filter to keep returning records pointing at peers
  that have no live tunnel, AND
- makes the proxy's local-roster check deny a request from such a
  peer rather than letting the cached management RPC carry the
  authorisation decision.

Search both slices in PeerStateByIP. Adds a unit test for the IPv4
and IPv6 offline-match paths.
2026-05-26 21:59:34 +02:00
mlsmaycon
af416c656b fix(management): private-service validation + tunnel-IP lookup semantics
- Require an explicit port for L4 cluster targets. validateL4Target
  exempted TargetTypeCluster from the port check, but buildPathMappings
  serializes every L4 target via net.JoinHostPort(host, port) — port=0
  shipped a ":0" upstream. Cluster targets use the same Host/Port
  fields, so the same requirement applies.
- GetPeerByIP returns NotFound on a tunnel-IP miss instead of mapping
  every error to Internal. The proxy's ValidateTunnelPeer probes IPs
  that legitimately aren't in the roster; the miss is expected and now
  distinguishable from a real store failure.
- Thread ctx into getClusterCapability's gorm query so a cancelled
  request doesn't keep the store busy.

Tests updated for the L4-cluster port requirement and the GetPeerByIP
NotFound path.
2026-05-26 21:59:22 +02:00
mlsmaycon
6600f0d45f feat(proxy): short-circuit peer-own-target loops with 421
When a peer that hosts the target of a private service dials its own
service URL the request was being looped through the proxy and back
over WireGuard to the same peer — twice the WG round-trip for no
benefit, with no signal to the caller that something was wrong.

Add isSelfTargetLoop to ReverseProxy.ServeHTTP: when the request
arrived on the per-account overlay listener (IsOverlayOrigin) and the
source tunnel IP matches the target host, refuse the request with 421
Misdirected Request and a body pointing the operator at the backend
directly.

The gate is scoped to overlay origin so requests on the public
listener that happen to share a source IP with the target host are
forwarded normally.
2026-05-26 21:58:58 +02:00
mlsmaycon
878344088f fix(proxy): harden inbound listener resource + startup-ctx handling
Three correctness fixes on the per-account inbound path, with tests:

- Close the logrus ErrorLog PipeWriter on tearDown. WriterLevel hands
  back an *io.PipeWriter backed by a pipe + scanner goroutine that the
  caller owns; the two writers per account (https + plain) were never
  closed, leaking the pipe and goroutine on every teardown.
- Run the post-Start hooks on context.Background(). runClientStartup
  is launched in a goroutine from AddPeer and was inheriting the
  caller's request-scoped ctx, so a cancelled request could abort the
  inbound bring-up or fail the management status notification. The
  tail is split into notifyClientReady so the contract is testable.

Tests cover the PipeWriter close behaviour and assert the readyHandler
+ NotifyStatus calls receive a non-cancelled background context.
2026-05-26 21:58:52 +02:00
mlsmaycon
e09d51c1a8 fix(proxy): gate tunnel-peer fast-path on inbound listener marker
forwardWithTunnelPeer previously accepted any RFC1918 / ULA / CGNAT
source IP, so a public client whose address happened to fall in those
ranges could bypass the configured operator auth scheme by colliding
with a known tunnel IP. The fast-path is now gated on
TunnelLookupFromContext(r.Context()) being present — that context value
is attached only by the per-account inbound (overlay) listener, so the
host-facing listener never enters this branch.

Tests updated to reflect the new requirement: requests that don't
carry the inbound marker now fall through to the regular auth flow.
2026-05-26 21:58:23 +02:00
Bethuel Mmbaga
14af179556 [management] Refactor management server bootstrap (#6256) 2026-05-26 17:44:28 +03:00
Pascal Fischer
1fbb5e6d5d [management] fix owner role update (#6264) 2026-05-26 16:37:58 +02:00
Viktor Liu
6771e35d57 [client] Release js.FuncOf callbacks in wasm ssh and rdp to prevent leaks (#5982) 2026-05-26 14:32:39 +02:00
Viktor Liu
e89b1e0596 [proxy, client] Bound embed client WireGuard per-Device memory (#5962) 2026-05-26 11:51:53 +02:00
Philip Laine
d542c60e21 Refactor Linux system info to use syscalls (#6230) 2026-05-25 21:00:24 +02:00
Viktor Liu
4983b5cf17 [client] Match DNS wildcard handlers on label boundaries (#6255) 2026-05-25 18:38:48 +02:00
Viktor Liu
b3b0feb3b8 [client] Filter scoped/cloned default routes from BSD network monitor RTM_ADD (#6208) 2026-05-25 18:38:21 +02:00
Maycon Santos
7aebdd69dd [management, client, proxy] add expose NetBird-only services over tunnel peers (#6226)
Adds a new "private" service mode for the reverse proxy: services reachable exclusively over the embedded WireGuard tunnel, gated by per-peer group membership instead of operator auth schemes.

Wire contract
- ProxyMapping.private (field 13): the proxy MUST call ValidateTunnelPeer and fail closed; operator schemes are bypassed.
- ProxyCapabilities.private (4) + supports_private_service (5): capability gate. Management never streams private mappings to proxies that don't claim the capability; the broadcast path applies the same filter via filterMappingsForProxy.
- ValidateTunnelPeer RPC: resolves an inbound tunnel IP to a peer, checks the peer's groups against service.AccessGroups, and mints a session JWT on success. checkPeerGroupAccess fails closed when a private service has empty AccessGroups.
- ValidateSession/ValidateTunnelPeer responses now carry peer_group_ids + peer_group_names so the proxy can authorise policy-aware middlewares without an extra management round-trip.
- ProxyInboundListener + SendStatusUpdate.inbound_listener: per-account inbound listener state surfaced to dashboards.
- PathTargetOptions.direct_upstream (11): bypass the embedded NetBird client and dial the target via the proxy host's network stack for upstreams reachable without WireGuard.

Data model
- Service.Private (bool) + Service.AccessGroups ([]string, JSON- serialised). Validate() rejects bearer auth on private services. Copy() deep-copies AccessGroups. pgx getServices loads the columns.
- DomainConfig.Private threaded into the proxy auth middleware. Request handler routes private services through forwardWithTunnelPeer and returns 403 on validation failure.
- Account-level SynthesizePrivateServiceZones (synthetic DNS) and injectPrivateServicePolicies (synthetic ACL) gate on len(svc.AccessGroups) > 0.

Proxy
- /netbird proxy --private (embedded mode) flag; Config.Private in proxy/lifecycle.go.
- Per-account inbound listener (proxy/inbound.go) binding HTTP/HTTPS on the embedded NetBird client's WireGuard tunnel netstack.
- proxy/internal/auth/tunnel_cache: ValidateTunnelPeer response cache with single-flight de-duplication and per-account eviction.
- Local peerstore short-circuit: when the inbound IP isn't in the account roster, deny fast without an RPC.
- proxy/server.go reports SupportsPrivateService=true and redacts the full ProxyMapping JSON from info logs (auth_token + header-auth hashed values now only at debug level).

Identity forwarding
- ValidateSessionJWT returns user_id, email, method, groups, group_names. sessionkey.Claims carries Email + Groups + GroupNames so the proxy can stamp identity onto upstream requests without an extra management round-trip on every cookie-bearing request.
- CapturedData carries userEmail / userGroups / userGroupNames; the proxy stamps X-NetBird-User and X-NetBird-Groups on r.Out from the authenticated identity (strips client-supplied values first to prevent spoofing).
- AccessLog.UserGroups: access-log enrichment captures the user's group memberships at write time so the dashboard can render group context without reverse-resolving stale memberships.

OpenAPI/dashboard surface
- ReverseProxyService gains private + access_groups; ReverseProxyCluster gains private + supports_private. ReverseProxyTarget target_type enum gains "cluster". ServiceTargetOptions gains direct_upstream. ProxyAccessLog gains user_groups.
2026-05-25 17:41:50 +02:00
Viktor Liu
0358be2313 [client] Revert "Clean up legacy 32-bit and HKCU registry entries on Windows install (#6176)" (#6232)
This reverts commit d927ef468a.
v0.71.4
2026-05-21 16:27:12 +02:00
Viktor Liu
37052fd5bc [client] Fix nil channel panic in external chain monitor stop (#6224) v0.71.3 2026-05-20 18:46:51 +02:00
Pascal Fischer
454ff66518 [management] scope network router update call (#6222) 2026-05-20 18:24:00 +02:00
Pascal Fischer
6137a1fcc5 [proxy] concurrent proxy snapshot apply (#6207) 2026-05-20 18:21:22 +02:00
Viktor Liu
4955c345d5 Clean up README header, key features table, and self-hosted quickstart (#6178) 2026-05-20 16:25:56 +02:00
Viktor Liu
9192b4f029 [client] Bump macOS sleep callback timeout to 20s (#6220) 2026-05-20 13:09:22 +02:00
Maycon Santos
c784b02550 [misc] Update contribution guidelines (#6219)
Update contribution guidelines and PR template to require discussing impactful changes with the team
2026-05-20 12:21:03 +02:00
Maycon Santos
d250f92c43 feat(reverse-proxy): clusters API surfaces type, online status, and capability flags (#6148)
The cluster listing now answers three questions in one round-trip
instead of forcing the dashboard to cross-reference the domains API:
which clusters can this account see, are they currently up, and what
do they support. The ProxyCluster wire type drops the boolean
self_hosted in favour of a `type` enum (`account` / `shared`) plus
explicit `online`, `supports_custom_ports`, `require_subdomain`, and
`supports_crowdsec` fields.

Store query reworked so offline clusters still appear (no last_seen
WHERE), with online and connected_proxies both derived from the
existing 2-min active window via portable CASE expressions; the
1-hour heartbeat reaper still removes long-stale rows. Service
manager enriches each cluster with the capability flags via the
existing per-cluster lookups (CapabilityProvider now also exposes
ClusterSupportsCrowdSec).

GetActiveClusterAddresses* keep their tight 2-min filter so service
routing and domain enumeration aren't pulled into the wider window.

The hard cut removes self_hosted from the response — the dashboard is
the only consumer and is updated in the matching PR; no transitional
field is shipped.

Adds a cross-engine regression test asserting offline clusters
surface, connected_proxies counts only fresh proxies, and
account-scoped BYOP clusters never leak across accounts.
2026-05-20 10:08:34 +02:00
Maycon Santos
80966ab1b0 [management] Ensure SessionStartedAt has a default value (#6211)
* [management] Ensure SessionStartedAt has a default value

Avoid null values for the new column

* [management] Add PeerStatus with LastSeen in peer_test

* [management] Add migration for PeerStatusSessionStartedAt default value

* [management] Add PeerStatus with LastSeen in migration tests
2026-05-20 08:25:30 +02:00
Maycon Santos
af24fd7796 [management] Add metrics for peer status updates and ephemeral cleanup (#6196)
* [management] Add metrics for peer status updates and ephemeral cleanup

The session-fenced MarkPeerConnected / MarkPeerDisconnected path and
the ephemeral peer cleanup loop both run silently today: when fencing
rejects a stale stream, when a cleanup tick deletes peers, or when a
batch delete fails, we have no operational signal beyond log lines.

Add OpenTelemetry counters and a histogram so the same SLO-style
dashboards that already exist for the network-map controller can cover
peer connect/disconnect and ephemeral cleanup too.

All new attributes are bounded enums: operation in {connect,disconnect}
and outcome in {applied,stale,error,peer_not_found}. No account, peer,
or user ID is ever written as a metric label — total cardinality is
fixed at compile time (8 counter series, 2 histogram series, 4 unlabeled
ephemeral series).

Metric methods are nil-receiver safe so test composition that doesn't
wire telemetry (the bulk of the existing tests) works unchanged. The
ephemeral manager exposes a SetMetrics setter rather than taking the
collector through its constructor, keeping the constructor signature
stable across all test call sites.

* [management] Add OpenTelemetry metrics for ephemeral peer cleanup

Introduce counters for tracking ephemeral peer cleanup, including peers pending deletion, cleanup runs, successful deletions, and failed batches. Metrics are nil-receiver safe to ensure compatibility with test setups without telemetry.
2026-05-18 22:55:19 +02:00
Maycon Santos
13d32d274f [management] Fence peer status updates with a session token (#6193)
* [management] Fence peer status updates with a session token

The connect/disconnect path used a best-effort LastSeen-after-streamStart
comparison to decide whether a status update should land. Under contention
— a re-sync arriving while the previous stream's disconnect was still in
flight, or two management replicas seeing the same peer at once — the
check was a read-then-decide-then-write window: any UPDATE in between
caused the wrong row to be written. The Go-side time.Now() that fed the
comparison also drifted under lock contention, since it was captured
seconds before the write actually committed.

Replace it with an integer-nanosecond fencing token stored alongside the
status. Every gRPC sync stream uses its open time (UnixNano) as its token.
Connects only land when the incoming token is strictly greater than the
stored one; disconnects only land when the incoming token equals the
stored one (i.e. we're the stream that owns the current session). Both
are single optimistic-locked UPDATEs — no read-then-write, no transaction
wrapper.

LastSeen is now written by the database itself (CURRENT_TIMESTAMP). The
caller never supplies it, so the value always reflects the real moment
of the UPDATE rather than the moment the caller queued the work — which
was already off by minutes under heavy lock contention.

Side effects (geo lookup, peer-login-expiration scheduling, network-map
fan-out) are explicitly documented as running after the fence UPDATE
commits, never inside it. Geo also skips the update when realIP equals
the stored ConnectionIP, dropping a redundant SavePeerLocation call on
same-IP reconnects.

Tests cover the three semantic cases (matched disconnect lands, stale
disconnect dropped, stale connect dropped) plus a 16-goroutine race test
that asserts the highest token always wins.

* [management] Add SessionStartedAt to peer status updates

Stored `SessionStartedAt` for fencing token propagation across goroutines and updated database queries/functions to handle the new field. Removed outdated geolocation handling logic and adjusted tests for concurrency safety.

* Rename `peer_status_required_approval` to `peer_status_requires_approval` in SQL store fields
2026-05-18 20:25:12 +02:00
Nicolas Frati
705f87fc20 [management] fix: device redirect uri wasn't registered (#6191)
* fix: device redirect uri wasn't registered

* fix lint
2026-05-18 12:57:59 +02:00
Viktor Liu
3f91f49277 Clean up legacy 32-bit and HKCU registry entries on Windows install (#6176) v0.71.2 2026-05-16 16:52:57 +02:00
Maycon Santos
347c5bf317 Avoid context cancellation in cancelPeerRoutines (#6175)
When closing go routines and handling peer disconnect, we should avoid canceling the flow due to parent gRPC context cancellation.

This change triggers disconnection handling with a context that is not bound to the parent gRPC cancellation.
2026-05-16 16:29:01 +02:00
Viktor Liu
22e2519d71 [management] Avoid peer IP reallocation when account settings update preserves the network range (#6173) 2026-05-16 15:51:48 +02:00
Vlad
e916f12cca [proxy] auth token generation on mapping (#6157)
* [management / proxy] auth token generation on mapping

* fix tests
v0.71.1
2026-05-15 19:13:44 +02:00
Viktor Liu
9ed2e2a5b4 [client] Drop DNS probes for passive health projection (#5971) 2026-05-15 17:07:38 +02:00
Viktor Liu
2ccae7ec47 [client] Mirror v4 exit selection onto v6 pair and honour SkipAutoApply per route (#6150) 2026-05-15 16:58:47 +02:00
Viktor Liu
07e5450117 [management] Bracket IPv6 reverse-proxy target hosts when building URL Host field (#6141) v0.71.0 2026-05-14 16:42:40 +02:00
Viktor Liu
3f914090cb [client] Bracket IPv6 in embed listeners, expand debug bundle (#6134) 2026-05-14 16:22:53 +02:00
Viktor Liu
ea9fab4396 [management] Allocate and preserve IPv6 overlay addresses for embedded proxy peers (#6132) 2026-05-14 16:05:33 +02:00
Vlad
77b479286e [management] fix offline statuses for public proxy clusters (#6133) 2026-05-14 13:27:50 +02:00
Maycon Santos
ab2a8794e7 [client] Add short flags for status command options (#6137)
* [client] Add short flags for status command options

* uppercase filters
2026-05-14 12:30:42 +02:00
Viktor Liu
9126a192ca [client] Set 0644 perms on SSH client config after os.CreateTemp (#6126) 2026-05-12 15:05:53 +02:00
Viktor Liu
1224d6e1ee [client] Persist management URL and pre-shared key overrides on login (#6065) 2026-05-12 14:52:56 +02:00
Nicolas Frati
96672dd1f8 [management] chores: update dex version (#6124)
* chores: update dex version

* chore: update dex fork
2026-05-12 13:50:35 +02:00
Viktor Liu
946ce4c3da [client] Fix --config flag default to point at profile path (#6122) 2026-05-11 17:48:21 +02:00
Vlad
07cbfdbede [proxy] feature: bring your own proxy (#5627) 2026-05-11 14:31:38 +02:00
Viktor Liu
a4114a5e45 [client] Skip DNS upstream failover on definitive EDE (#6089) 2026-05-11 10:00:23 +02:00
Viktor Liu
6b08e89c7b [relay] Preserve non-standard port in WS dialer URL prep (#6061) 2026-05-11 09:59:33 +02:00
Viktor Liu
a852b3bd34 [client, proxy] Harden uspfilter conntrack and share TCP relay (#5936) 2026-05-11 09:59:13 +02:00
Viktor Liu
afb83b3049 [client] Use unique temp file and clean up on failure when writing ssh config (#6064) 2026-05-11 09:58:49 +02:00