mirror of
https://github.com/netbirdio/netbird.git
synced 2026-06-03 06:29:54 +00:00
* fix(proxy): gate tunnel-peer fast-path on inbound listener marker
forwardWithTunnelPeer previously accepted any RFC1918 / ULA / CGNAT
source IP, so a public client whose address happened to fall in those
ranges could bypass the configured operator auth scheme by colliding
with a known tunnel IP. The fast-path is now gated on
TunnelLookupFromContext(r.Context()) being present — that context value
is attached only by the per-account inbound (overlay) listener, so the
host-facing listener never enters this branch.
Tests updated to reflect the new requirement: requests that don't
carry the inbound marker now fall through to the regular auth flow.
* fix(proxy): harden inbound listener resource + startup-ctx handling
Three correctness fixes on the per-account inbound path, with tests:
- Close the logrus ErrorLog PipeWriter on tearDown. WriterLevel hands
back an *io.PipeWriter backed by a pipe + scanner goroutine that the
caller owns; the two writers per account (https + plain) were never
closed, leaking the pipe and goroutine on every teardown.
- Run the post-Start hooks on context.Background(). runClientStartup
is launched in a goroutine from AddPeer and was inheriting the
caller's request-scoped ctx, so a cancelled request could abort the
inbound bring-up or fail the management status notification. The
tail is split into notifyClientReady so the contract is testable.
Tests cover the PipeWriter close behaviour and assert the readyHandler
+ NotifyStatus calls receive a non-cancelled background context.
* feat(proxy): short-circuit peer-own-target loops with 421
When a peer that hosts the target of a private service dials its own
service URL the request was being looped through the proxy and back
over WireGuard to the same peer — twice the WG round-trip for no
benefit, with no signal to the caller that something was wrong.
Add isSelfTargetLoop to ReverseProxy.ServeHTTP: when the request
arrived on the per-account overlay listener (IsOverlayOrigin) and the
source tunnel IP matches the target host, refuse the request with 421
Misdirected Request and a body pointing the operator at the backend
directly.
The gate is scoped to overlay origin so requests on the public
listener that happen to share a source IP with the target host are
forwarded normally.
* fix(management): private-service validation + tunnel-IP lookup semantics
- Require an explicit port for L4 cluster targets. validateL4Target
exempted TargetTypeCluster from the port check, but buildPathMappings
serializes every L4 target via net.JoinHostPort(host, port) — port=0
shipped a ":0" upstream. Cluster targets use the same Host/Port
fields, so the same requirement applies.
- GetPeerByIP returns NotFound on a tunnel-IP miss instead of mapping
every error to Internal. The proxy's ValidateTunnelPeer probes IPs
that legitimately aren't in the roster; the miss is expected and now
distinguishable from a real store failure.
- Thread ctx into getClusterCapability's gorm query so a cancelled
request doesn't keep the store busy.
Tests updated for the L4-cluster port requirement and the GetPeerByIP
NotFound path.
* fix(client): include offlinePeers in PeerStateByIP lookup
ReplaceOfflinePeers moves peers into d.offlinePeers but PeerStateByIP
only scanned d.peers. Callers (the local DNS filter via
localPeerConnectivity, embed.Client.IdentityForIP used by the
proxy's tunnel-peer validator) were treating known-but-offline peers
as unknown, which:
- causes the DNS filter to keep returning records pointing at peers
that have no live tunnel, AND
- makes the proxy's local-roster check deny a request from such a
peer rather than letting the cached management RPC carry the
authorisation decision.
Search both slices in PeerStateByIP. Adds a unit test for the IPv4
and IPv6 offline-match paths.
* fix(rest): reject empty Delete path params in reverse-proxy clients
ReverseProxyClustersAPI.Delete and ReverseProxyTokensAPI.Delete passed
the path parameter into url.PathEscape without an empty check.
PathEscape("") returns "" which collapses the request onto the
collection endpoint ("/api/reverse-proxies/clusters/" /
"/api/reverse-proxies/proxy-tokens/"), so a caller bug delete with no
id reached a routable URL with surprising semantics (typically 405).
Short-circuit with a typed error before the request is built. Tests
mount a handler on the collection path that fails the test if hit, so
the regression is impossible to reintroduce silently.
* chore(api,ci,docs,test): private-service schema, proto-check, fixups
Non-functional cleanups and contract/CI hardening around the
private-service work:
API schema (openapi.yml):
- Require a non-empty access_groups and mode=http when private=true,
on both Service and ServiceRequest, mirroring
validatePrivateRequirements. mode stays optional-but-constrained
(empty defaults to http server-side), matching runtime.
CI (proto-version-check.yml):
- Cover renamed .pb.go files (read base via previous_filename).
- Match protoc-gen-go-grpc version headers (optional "- " prefix and
-gen-go-grpc suffix) so grpc-generated files are in scope.
Docs / comments:
- Reword Config field docs to say defaults are applied at Server.Start
(initDefaults), not New.
- Rename the obsolete --private-inbound flag to --private across
comments and the proto doc.
Pre-existing test fixups surfaced by review:
- Repair the integration-tagged validate_session_test.go (SignToken
signature growth + new Manager interface methods).
- Fix the CI-skip boolean precedence so Windows isn't skipped
unconditionally.
- Guard the router.HTTPListener type assertion with comma-ok.
* fix(proxy): background ctx for already-started AddPeer notification
The earlier ctx fix covered the async runClientStartup path but missed
the synchronous branch: when a service is added to an already-started
client, AddPeer called NotifyStatus with the caller's request-scoped
ctx. A cancelled request/stream could drop the connected notification
to management. Use context.Background() here too, matching
notifyClientReady.
Extends TestNetBird_AddPeer_ExistingStartedClient_NotifiesStatus to
pass a pre-cancelled caller ctx and assert the notification still ran
on a non-cancelled context.
* use the cmd context for roundtripper
301 lines
10 KiB
Go
301 lines
10 KiB
Go
package proxy
|
|
|
|
import (
|
|
"context"
|
|
"fmt"
|
|
"net"
|
|
"testing"
|
|
"time"
|
|
|
|
log "github.com/sirupsen/logrus"
|
|
"google.golang.org/grpc"
|
|
|
|
"github.com/netbirdio/netbird/proxy/internal/auth"
|
|
"github.com/netbirdio/netbird/proxy/internal/conntrack"
|
|
"github.com/netbirdio/netbird/proxy/internal/crowdsec"
|
|
proxymetrics "github.com/netbirdio/netbird/proxy/internal/metrics"
|
|
"github.com/netbirdio/netbird/proxy/internal/proxy"
|
|
"github.com/netbirdio/netbird/proxy/internal/roundtrip"
|
|
nbtcp "github.com/netbirdio/netbird/proxy/internal/tcp"
|
|
"github.com/netbirdio/netbird/proxy/internal/types"
|
|
udprelay "github.com/netbirdio/netbird/proxy/internal/udp"
|
|
"github.com/netbirdio/netbird/shared/management/proto"
|
|
|
|
"go.opentelemetry.io/otel/metric/noop"
|
|
)
|
|
|
|
// latencyMockClient simulates realistic gRPC latency for management calls.
|
|
type latencyMockClient struct {
|
|
proto.ProxyServiceClient
|
|
createPeerDelay time.Duration
|
|
statusUpdateDelay time.Duration
|
|
}
|
|
|
|
func (m *latencyMockClient) SendStatusUpdate(ctx context.Context, _ *proto.SendStatusUpdateRequest, _ ...grpc.CallOption) (*proto.SendStatusUpdateResponse, error) {
|
|
if m.statusUpdateDelay > 0 {
|
|
select {
|
|
case <-time.After(m.statusUpdateDelay):
|
|
case <-ctx.Done():
|
|
return nil, ctx.Err()
|
|
}
|
|
}
|
|
return &proto.SendStatusUpdateResponse{}, nil
|
|
}
|
|
|
|
func (m *latencyMockClient) CreateProxyPeer(ctx context.Context, _ *proto.CreateProxyPeerRequest, _ ...grpc.CallOption) (*proto.CreateProxyPeerResponse, error) {
|
|
if m.createPeerDelay > 0 {
|
|
select {
|
|
case <-time.After(m.createPeerDelay):
|
|
case <-ctx.Done():
|
|
return nil, ctx.Err()
|
|
}
|
|
}
|
|
return &proto.CreateProxyPeerResponse{Success: true}, nil
|
|
}
|
|
|
|
type discardWriter struct{}
|
|
|
|
func (discardWriter) Write(p []byte) (int, error) { return len(p), nil }
|
|
|
|
func benchServerWithLatency(b *testing.B, createPeerDelay, statusDelay time.Duration) *Server {
|
|
b.Helper()
|
|
logger := log.New()
|
|
logger.SetLevel(log.FatalLevel)
|
|
logger.SetOutput(&discardWriter{})
|
|
|
|
meter, err := proxymetrics.New(context.Background(), noop.Meter{})
|
|
if err != nil {
|
|
b.Fatal(err)
|
|
}
|
|
|
|
mgmtClient := &latencyMockClient{
|
|
createPeerDelay: createPeerDelay,
|
|
statusUpdateDelay: statusDelay,
|
|
}
|
|
|
|
nb := roundtrip.NewNetBird(b.Context(), "bench-proxy", "bench.test",
|
|
roundtrip.ClientConfig{MgmtAddr: "http://bench.test:9999"},
|
|
logger, nil, mgmtClient)
|
|
|
|
mainRouter := nbtcp.NewRouter(logger, func(accountID types.AccountID) (types.DialContextFunc, error) {
|
|
return (&net.Dialer{}).DialContext, nil
|
|
}, &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 443})
|
|
|
|
return &Server{
|
|
Logger: logger,
|
|
mgmtClient: mgmtClient,
|
|
netbird: nb,
|
|
proxy: proxy.NewReverseProxy(nil, "auto", nil, logger),
|
|
auth: auth.NewMiddleware(logger, nil, nil),
|
|
mainRouter: mainRouter,
|
|
mainPort: 443,
|
|
meter: meter,
|
|
hijackTracker: conntrack.HijackTracker{},
|
|
crowdsecRegistry: crowdsec.NewRegistry("", "", log.NewEntry(logger)),
|
|
crowdsecServices: make(map[types.ServiceID]bool),
|
|
lastMappings: make(map[types.ServiceID]*proto.ProxyMapping),
|
|
portRouters: make(map[uint16]*portRouter),
|
|
svcPorts: make(map[types.ServiceID][]uint16),
|
|
udpRelays: make(map[types.ServiceID]*udprelay.Relay),
|
|
}
|
|
}
|
|
|
|
// generateHTTPMappings creates N HTTP-mode mappings with the given update type.
|
|
// All belong to a single account to share the embedded client.
|
|
func generateHTTPMappings(n int, updateType proto.ProxyMappingUpdateType) []*proto.ProxyMapping {
|
|
mappings := make([]*proto.ProxyMapping, n)
|
|
for i := range n {
|
|
mappings[i] = &proto.ProxyMapping{
|
|
Type: updateType,
|
|
Id: fmt.Sprintf("svc-%d", i),
|
|
AccountId: "account-1",
|
|
Domain: fmt.Sprintf("svc-%d.bench.example.com", i),
|
|
Mode: "http",
|
|
Path: []*proto.PathMapping{
|
|
{
|
|
Path: "/",
|
|
Target: fmt.Sprintf("http://10.0.%d.%d:8080", (i/256)%256, i%256),
|
|
},
|
|
},
|
|
Auth: &proto.Authentication{},
|
|
}
|
|
}
|
|
return mappings
|
|
}
|
|
|
|
// generateMultiAccountHTTPMappings creates N HTTP-mode CREATED mappings spread
|
|
// across the given number of accounts. This stresses the AddPeer new-account
|
|
// path which calls CreateProxyPeer + embed.New per unique account.
|
|
func generateMultiAccountHTTPMappings(n, accounts int) []*proto.ProxyMapping {
|
|
mappings := make([]*proto.ProxyMapping, n)
|
|
for i := range n {
|
|
mappings[i] = &proto.ProxyMapping{
|
|
Type: proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED,
|
|
Id: fmt.Sprintf("svc-%d", i),
|
|
AccountId: fmt.Sprintf("account-%d", i%accounts),
|
|
Domain: fmt.Sprintf("svc-%d.bench.example.com", i),
|
|
Mode: "http",
|
|
Path: []*proto.PathMapping{
|
|
{
|
|
Path: "/",
|
|
Target: fmt.Sprintf("http://10.0.%d.%d:8080", (i/256)%256, i%256),
|
|
},
|
|
},
|
|
Auth: &proto.Authentication{},
|
|
}
|
|
}
|
|
return mappings
|
|
}
|
|
|
|
// generateMixedMappings creates mappings with a realistic distribution:
|
|
// 70% HTTP create, 15% modify existing, 10% TLS on main port, 5% remove.
|
|
// All use a single account to avoid embed.New dialing.
|
|
func generateMixedMappings(n int) []*proto.ProxyMapping {
|
|
mappings := make([]*proto.ProxyMapping, n)
|
|
for i := range n {
|
|
var m *proto.ProxyMapping
|
|
switch {
|
|
case i%20 < 14: // 70% HTTP create
|
|
m = &proto.ProxyMapping{
|
|
Type: proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED,
|
|
Id: fmt.Sprintf("svc-http-%d", i),
|
|
AccountId: "account-1",
|
|
Domain: fmt.Sprintf("svc-%d.bench.example.com", i),
|
|
Mode: "http",
|
|
Path: []*proto.PathMapping{
|
|
{Path: "/", Target: fmt.Sprintf("http://10.0.%d.%d:8080", (i/256)%256, i%256)},
|
|
{Path: "/api", Target: fmt.Sprintf("http://10.0.%d.%d:8081", (i/256)%256, i%256)},
|
|
},
|
|
Auth: &proto.Authentication{},
|
|
}
|
|
case i%20 < 17: // 15% modify
|
|
m = &proto.ProxyMapping{
|
|
Type: proto.ProxyMappingUpdateType_UPDATE_TYPE_MODIFIED,
|
|
Id: fmt.Sprintf("svc-http-%d", i%100),
|
|
AccountId: "account-1",
|
|
Domain: fmt.Sprintf("svc-%d.bench.example.com", i%100),
|
|
Mode: "http",
|
|
Path: []*proto.PathMapping{
|
|
{Path: "/", Target: fmt.Sprintf("http://10.1.%d.%d:8080", (i/256)%256, i%256)},
|
|
},
|
|
Auth: &proto.Authentication{},
|
|
}
|
|
case i%20 < 19: // 10% TLS passthrough on main port
|
|
m = &proto.ProxyMapping{
|
|
Type: proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED,
|
|
Id: fmt.Sprintf("svc-tls-%d", i),
|
|
AccountId: "account-1",
|
|
Domain: fmt.Sprintf("tls-%d.bench.example.com", i),
|
|
Mode: "tls",
|
|
ListenPort: 443,
|
|
Path: []*proto.PathMapping{
|
|
{Path: "/", Target: fmt.Sprintf("10.2.%d.%d:443", (i/256)%256, i%256)},
|
|
},
|
|
}
|
|
default: // 5% remove
|
|
m = &proto.ProxyMapping{
|
|
Type: proto.ProxyMappingUpdateType_UPDATE_TYPE_REMOVED,
|
|
Id: fmt.Sprintf("svc-http-%d", i%50),
|
|
AccountId: "account-1",
|
|
Domain: fmt.Sprintf("svc-%d.bench.example.com", i%50),
|
|
Mode: "http",
|
|
}
|
|
}
|
|
mappings[i] = m
|
|
}
|
|
return mappings
|
|
}
|
|
|
|
const (
|
|
createPeerLatency = 100 * time.Millisecond
|
|
statusUpdateLatency = 50 * time.Millisecond
|
|
)
|
|
|
|
// BenchmarkProcessMappings_HTTPCreate_SingleAccount benchmarks the initial sync
|
|
// scenario: N HTTP mappings all on a single account. Only the first mapping
|
|
// triggers CreateProxyPeer (100ms gRPC). The rest just register with the
|
|
// existing client. This is the "best case" production path.
|
|
func BenchmarkProcessMappings_HTTPCreate_SingleAccount(b *testing.B) {
|
|
for _, n := range []int{100, 1000, 5000} {
|
|
b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
|
|
mappings := generateHTTPMappings(n, proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED)
|
|
for range b.N {
|
|
s := benchServerWithLatency(b, createPeerLatency, statusUpdateLatency)
|
|
s.processMappings(b.Context(), mappings)
|
|
}
|
|
})
|
|
}
|
|
}
|
|
|
|
// BenchmarkProcessMappings_HTTPCreate_MultiAccount benchmarks the worst-case
|
|
// initial sync: every mapping belongs to a different account, so each one
|
|
// triggers a full CreateProxyPeer gRPC round-trip (100ms) + embed.New.
|
|
// With 500 accounts this serializes to ~50s of blocking I/O.
|
|
func BenchmarkProcessMappings_HTTPCreate_MultiAccount(b *testing.B) {
|
|
for _, tc := range []struct {
|
|
mappings int
|
|
accounts int
|
|
}{
|
|
{100, 10},
|
|
{100, 50},
|
|
{1000, 50},
|
|
{1000, 200},
|
|
{3000, 500},
|
|
} {
|
|
b.Run(fmt.Sprintf("mappings=%d/accounts=%d", tc.mappings, tc.accounts), func(b *testing.B) {
|
|
mappings := generateMultiAccountHTTPMappings(tc.mappings, tc.accounts)
|
|
for range b.N {
|
|
s := benchServerWithLatency(b, createPeerLatency, statusUpdateLatency)
|
|
s.processMappings(b.Context(), mappings)
|
|
}
|
|
})
|
|
}
|
|
}
|
|
|
|
// BenchmarkProcessMappings_Mixed benchmarks a realistic mixed workload
|
|
// of creates, modifies, TLS, and removes with production-like latency.
|
|
// TLS mappings call SendStatusUpdate (50ms each), serialized.
|
|
func BenchmarkProcessMappings_Mixed(b *testing.B) {
|
|
for _, n := range []int{100, 1000, 5000} {
|
|
b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
|
|
mappings := generateMixedMappings(n)
|
|
for range b.N {
|
|
s := benchServerWithLatency(b, createPeerLatency, statusUpdateLatency)
|
|
creates := generateHTTPMappings(100, proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED)
|
|
s.processMappings(b.Context(), creates)
|
|
s.processMappings(b.Context(), mappings)
|
|
}
|
|
})
|
|
}
|
|
}
|
|
|
|
// BenchmarkProcessMappings_ModifyOnly benchmarks bulk modification of
|
|
// already-registered mappings (no new peers needed, no gRPC).
|
|
func BenchmarkProcessMappings_ModifyOnly(b *testing.B) {
|
|
for _, n := range []int{100, 1000, 5000} {
|
|
b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
|
|
creates := generateHTTPMappings(n, proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED)
|
|
modifies := generateHTTPMappings(n, proto.ProxyMappingUpdateType_UPDATE_TYPE_MODIFIED)
|
|
for range b.N {
|
|
s := benchServerWithLatency(b, createPeerLatency, statusUpdateLatency)
|
|
s.processMappings(b.Context(), creates)
|
|
s.processMappings(b.Context(), modifies)
|
|
}
|
|
})
|
|
}
|
|
}
|
|
|
|
// BenchmarkProcessMappings_NoLatency measures pure CPU/allocation overhead
|
|
// with zero I/O latency for profiling purposes.
|
|
func BenchmarkProcessMappings_NoLatency(b *testing.B) {
|
|
for _, n := range []int{1000, 5000} {
|
|
b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
|
|
mappings := generateHTTPMappings(n, proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED)
|
|
for range b.N {
|
|
s := benchServerWithLatency(b, 0, 0)
|
|
s.processMappings(b.Context(), mappings)
|
|
}
|
|
})
|
|
}
|
|
}
|