Cluster targets dial the upstream via the host network stack. An empty
Host leaves the proxy with nothing to dial, and DirectUpstream=false
would route the request through the embedded NetBird client, which is
the wrong network for a cluster address. Validate() and
validateTargetReferences now reject both shapes.
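A minimal sketch of the two gates described above. The Target struct, the error values, and validateClusterTarget are hypothetical names for illustration, not the actual types in the tree:

```go
package main

import "errors"

// Target is a hypothetical stand-in for the service target type.
type Target struct {
	TargetType     string
	TargetID       string
	Host           string
	DirectUpstream bool
}

// Hypothetical sentinel errors, one per rejected shape.
var (
	errClusterHostRequired   = errors.New("cluster target requires a host")
	errClusterDirectRequired = errors.New("cluster target requires direct_upstream")
)

// validateClusterTarget rejects cluster targets with an empty Host or with
// DirectUpstream unset. Non-cluster targets pass through untouched.
func validateClusterTarget(t Target) error {
	if t.TargetType != "cluster" {
		return nil
	}
	if t.Host == "" {
		return errClusterHostRequired
	}
	if !t.DirectUpstream {
		return errClusterDirectRequired
	}
	return nil
}
```

Checking Host before DirectUpstream is an arbitrary ordering chosen for the sketch; the real validation may report the errors in a different order.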
Tests:
- TestValidate_HTTPClusterTarget / _RequiresTargetId /
TestValidate_Private_{AcceptsClusterTargetWithAccessGroups,
RequiresAccessGroups, RejectsBearerAuth} updated to populate Host and
DirectUpstream so they exercise the path past the new gates.
- TestValidate_HTTPClusterTarget_RequiresHost and _RequiresDirectUpstream
pin the two new error paths.
- TestValidateTargetReferences_ClusterTargetSkipsLookup updated to set
DirectUpstream on its fixture; new _ClusterTargetRequiresDirectUpstream
test covers the store-side rejection.
Drive-bys (no behavior change beyond what existing tests cover):
- proxy/proxy.go: shortened the Capabilities.Private / Cluster.Private
doc comments.
- users/manager.go: moved the GetUserWithGroups doc from the interface
to the impl.
- proxy/cmd/proxy/cmd/root.go: removed unused NewRootCmd.
- tunnel_cache.go: bumped tunnelCacheTTL from 30s to 300s (matches the
"5 minutes" target documented on the constant; existing TTL-expiry
test uses the constant directly so the bump is picked up automatically).
The SyncMappings restore in 036e91cde kept the metric definitions
(RecordSnapshotSyncDuration, RecordSnapshotBatchDuration,
RecordAddPeerDuration) and the corresponding callbacks (OnAddPeer)
but lost their call sites — they shipped as dead code.
- proxy/server.go: introduce snapshotTracker (the type PR #6207 added
to share batch/sync timing between handleMappingStream and
handleSyncMappingsStream); both stream handlers now go through it.
- proxy/internal/roundtrip/netbird.go: add OnAddPeer struct field and
invoke it after createClientEntry with the per-call duration.
- proxy/server.go: wire s.netbird.OnAddPeer = s.meter.RecordAddPeerDuration
alongside the existing NetBird construction.
No new test coverage — PR #6207's bench tests already exercise the
batch/sync paths and continue to pass.
The MultiTransport's job is per-request dispatch between the embedded
NetBird transport and the stdlib transport based on the direct_upstream
context flag — about 25 lines of code. The header/body debug logging
that was bundled in pulls in:
- io.ReadAll on every request body, even when the log level is above
debug. This forces buffering of streaming POSTs (LLM completions, file
uploads) before they reach the upstream transport.
- A header redaction list and a body-snippet cap that duplicate concerns
already covered by netbird.go's per-roundtrip log.
netbird.go already emits method/host/url/account/duration/status/err at
debug level on every roundtrip; nothing in the private-service feature
needs the extra header+body dump.
- Drop logUpstreamRequest, formatHeaders, redactHeaderValue,
snapshotRequestBody, and the upstreamLogBodyMax constant.
- Drop the logger field and the trailing nil arg from NewMultiTransport;
proxy/server.go and the tests updated accordingly.
- Switch header literals to the headerNetBirdUser / headerNetBirdGroups
constants so a future rename can't silently desync tests.
- Add GroupsOnlyWhenEmailEmpty: unattached tunnel peer (machine agent)
case — groups must still be stamped while X-NetBird-User stays unset.
- Add EmailOnlyWhenGroupsEmpty: symmetric case for users without
resolved group memberships.
- Add CapturedDataPresentButEmpty: client-supplied headers are stripped
even when CapturedData carries no identity fields.
- Extend the group-id fallback test to also exercise an explicit
empty-string entry in userGroupNames (not just a shorter slice).
Reinstates the SyncMappings RPC that landed on origin/main and the
client-side fallback to GetMappingUpdate.
- proto: SyncMappings RPC + SyncMappingsRequest{Init|Ack} +
SyncMappingsResponse messages.
- management proxy.go: SyncMappings server handler, recvSyncInit,
sendSnapshotSync (per-batch send-then-wait-for-ack), drainRecv,
waitForAck; proxyConnection.syncStream + sendResponse routes the
same sendChan onto the bidi stream when set.
- proxy/server.go: trySyncMappings + handleSyncMappingsStream that
acks after each batch is processed; outer loop tries SyncMappings
first and falls back to GetMappingUpdate on Unimplemented.
Capabilities lifted into proxyCapabilities() so both code paths
use the same flags.
Adds a new "private" service mode for the reverse proxy: services
reachable exclusively over the embedded WireGuard tunnel, gated by
per-peer group membership instead of operator auth schemes.
Wire contract
- ProxyMapping.private (field 13): the proxy MUST call
ValidateTunnelPeer and fail closed; operator schemes are bypassed.
- ProxyCapabilities.private (4) + supports_private_service (5):
capability gate. Management never streams private mappings to
proxies that don't claim the capability; the broadcast path applies
the same filter via filterMappingsForProxy.
- ValidateTunnelPeer RPC: resolves an inbound tunnel IP to a peer,
checks the peer's groups against service.AccessGroups, and mints
a session JWT on success. checkPeerGroupAccess fails closed when
a private service has empty AccessGroups.
- ValidateSession/ValidateTunnelPeer responses now carry
peer_group_ids + peer_group_names so the proxy can authorise
policy-aware middlewares without an extra management round-trip.
- ProxyInboundListener + SendStatusUpdate.inbound_listener: per-account
inbound listener state surfaced to dashboards.
- PathTargetOptions.direct_upstream (11): bypass the embedded NetBird
client and dial the target via the proxy host's network stack for
upstreams reachable without WireGuard.
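To make the fail-closed rule in ValidateTunnelPeer concrete, here is a small sketch of a group check. hasGroupAccess is a hypothetical helper, and "any shared group grants access" is an assumption about the matching semantics:

```go
package main

// hasGroupAccess reports whether any of peerGroups appears in accessGroups.
// An empty accessGroups list denies everyone (fail closed), so a private
// service without configured groups is unreachable rather than open.
func hasGroupAccess(peerGroups, accessGroups []string) bool {
	if len(accessGroups) == 0 {
		return false
	}
	allowed := make(map[string]struct{}, len(accessGroups))
	for _, g := range accessGroups {
		allowed[g] = struct{}{}
	}
	for _, g := range peerGroups {
		if _, ok := allowed[g]; ok {
			return true
		}
	}
	return false
}
```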
Data model
- Service.Private (bool) + Service.AccessGroups ([]string,
JSON-serialised). Validate() rejects bearer auth on private services.
Copy() deep-copies AccessGroups. pgx getServices loads the columns.
- DomainConfig.Private threaded into the proxy auth middleware.
Request handler routes private services through forwardWithTunnelPeer
and returns 403 on validation failure.
- Account-level SynthesizePrivateServiceZones (synthetic DNS) and
injectPrivateServicePolicies (synthetic ACL) gate on
len(svc.AccessGroups) > 0.
Proxy
- /netbird proxy --private (embedded mode) flag; Config.Private in
proxy/lifecycle.go.
- Per-account inbound listener (proxy/inbound.go) binding HTTP/HTTPS
on the embedded NetBird client's WireGuard tunnel netstack.
- proxy/internal/auth/tunnel_cache: ValidateTunnelPeer response cache
with single-flight de-duplication and per-account eviction.
- Local peerstore short-circuit: when the inbound IP isn't in the
account roster, deny fast without an RPC.
- proxy/server.go reports SupportsPrivateService=true and redacts the
full ProxyMapping JSON from info logs (auth_token + header-auth
hashed values now only at debug level).
Identity forwarding
- ValidateSessionJWT returns user_id, email, method, groups,
group_names. sessionkey.Claims carries Email + Groups + GroupNames
so the proxy can stamp identity onto upstream requests without an
extra management round-trip on every cookie-bearing request.
- CapturedData carries userEmail / userGroups / userGroupNames; the
proxy stamps X-NetBird-User and X-NetBird-Groups on r.Out from the
authenticated identity (strips client-supplied values first to
prevent spoofing).
- AccessLog.UserGroups: access-log enrichment captures the user's
group memberships at write time so the dashboard can render group
context without reverse-resolving stale memberships.
OpenAPI/dashboard surface
- ReverseProxyService gains private + access_groups; ReverseProxyCluster
gains private + supports_private. ReverseProxyTarget target_type
enum gains "cluster". ServiceTargetOptions gains direct_upstream.
ProxyAccessLog gains user_groups.
The cluster listing now answers three questions in one round-trip
instead of forcing the dashboard to cross-reference the domains API:
which clusters can this account see, are they currently up, and what
do they support. The ProxyCluster wire type drops the boolean
self_hosted in favour of a `type` enum (`account` / `shared`) plus
explicit `online`, `supports_custom_ports`, `require_subdomain`, and
`supports_crowdsec` fields.
The store query is reworked so offline clusters still appear (there is
no WHERE filter on last_seen). Both online and connected_proxies are
derived from the existing 2-min active window via portable CASE
expressions; the 1-hour heartbeat reaper still removes long-stale rows.
Service
manager enriches each cluster with the capability flags via the
existing per-cluster lookups (CapabilityProvider now also exposes
ClusterSupportsCrowdSec).
GetActiveClusterAddresses* keep their tight 2-min filter so service
routing and domain enumeration aren't pulled into the wider window.
The hard cut removes self_hosted from the response — the dashboard is
the only consumer and is updated in the matching PR; no transitional
field is shipped.
Adds a cross-engine regression test asserting offline clusters
surface, connected_proxies counts only fresh proxies, and
account-scoped BYOP clusters never leak across accounts.
* Add support for legacy IDP cache environment variable
* Centralize cache store creation to reuse a single Redis connection pool
Each cache consumer (IDP cache, token store, PKCE store, secrets manager,
EDR validator) was independently calling NewStore, creating separate Redis
clients with their own connection pools — up to 1400 potential connections
from a single management server process.
Introduce a shared CacheStore() singleton on BaseServer that creates one
store at boot and injects it into all consumers. Consumer constructors now
receive a store.StoreInterface instead of creating their own.
For Redis mode, all consumers share one connection pool (1000 max conns).
For in-memory mode, all consumers share one GoCache instance.
* Update management-integrations module to latest version
* sync go.sum
* Export `GetAddrFromEnv` to allow reuse across packages
* Update management-integrations module version in go.mod and go.sum
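The shared CacheStore() singleton described above can be sketched with sync.Once. The Store type, the newStore hook, and newConsumer are simplified, hypothetical stand-ins for store.StoreInterface and the real constructors:

```go
package main

import "sync"

// Store is a hypothetical stand-in for the shared cache store.
type Store struct {
	mu   sync.Mutex
	data map[string]string
}

func (s *Store) Set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[k] = v
}

// BaseServer lazily builds one store and hands the same instance to everyone.
type BaseServer struct {
	once     sync.Once
	store    *Store
	newStore func() *Store // hypothetical factory (Redis or in-memory)
}

// CacheStore creates the store on first use; later calls reuse it, so all
// consumers share a single connection pool.
func (b *BaseServer) CacheStore() *Store {
	b.once.Do(func() { b.store = b.newStore() })
	return b.store
}

// consumer shows the injection pattern: it receives a store instead of
// constructing its own.
type consumer struct{ cache *Store }

func newConsumer(cache *Store) *consumer { return &consumer{cache: cache} }
```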
* **New Features**
* Asynchronous certificate prefetch that races live issuance with periodic on-disk cache checks to surface certificates faster.
* Centralized recording and notification when certificates become available.
* New on-disk certificate reading and validation to allow immediate use of cached certs.
* **Bug Fixes & Performance**
* Optimized retrieval by polling disk while fetching in background to reduce latency.
* Added cancellation and timeout handling to fail stalled certificate operations reliably.
* **New Features**
* Access logs now include bytes_upload and bytes_download (API and schemas updated, fields required).
* Certificate issuance duration is now recorded as a metric.
* **Refactor**
* Metrics switched from Prometheus client to OpenTelemetry-backed meters; health endpoint now exposes OpenMetrics via OTLP exporter.
* **Tests**
* Metric tests updated to use OpenTelemetry Prometheus exporter and MeterProvider.
Consolidate all expose business logic (validation, permission checks, TTL tracking, reaping) into the manager layer, making the gRPC layer a pure transport adapter that only handles proto conversion and authentication.
- Add ExposeServiceRequest/ExposeServiceResponse domain types with validation in the reverseproxy package
- Move expose tracker (TTL tracking, reaping, per-peer limits) from gRPC server into manager/expose_tracker.go
- Internalize tracking in CreateServiceFromPeer, RenewServiceFromPeer, and new StopServiceFromPeer so callers don't manage tracker state
- Untrack ephemeral services in DeleteService/DeleteAllServices to keep tracker in sync when services are deleted via API
- Simplify gRPC expose handlers to parse, auth, convert, delegate
- Remove tracker methods from Manager interface (internal detail)
CLI: new expose command to publish a local port with flags for PIN, password, user groups, custom domain, name prefix and protocol (HTTP default).
Management/API: create/renew/stop expose sessions (streamed status), automatic naming/domain, TTL renewals, background expiration, new management RPCs and client methods.
UI/API: account settings now include peer_expose_enabled and peer_expose_groups; new activity codes for peer expose events.
* Fix WebSocket support by implementing Hijacker interface
Add responsewriter.PassthroughWriter to preserve optional HTTP interfaces
(Hijacker, Flusher, Pusher) when wrapping http.ResponseWriter in middleware.
Without this delegation:
- WebSocket connections fail (can't hijack the connection)
- Streaming breaks (can't flush buffers)
- HTTP/2 push doesn't work
* Add HijackTracker to manage hijacked connections during graceful shutdown
* Refactor HijackTracker to use middleware for tracking hijacked connections
* Refactor server handler chain setup for improved readability and maintainability