# Splitting Your Self-Hosted Deployment
import {Note, Warning} from "@/components/mdx";
This guide explains how to split your NetBird self-hosted deployment from a single-server setup into a distributed architecture for better reliability and performance.
The most common approach is extracting the relay service (with its embedded STUN server) to separate servers and moving the PostgreSQL database to a dedicated machine.
In most cases, you won't need to extract the Signal server, but for completeness, this guide covers that as well.
NetBird clients can tolerate a Management server outage as long as connections are already established through relays or peer-to-peer.
This makes a stable relay infrastructure especially important.
This guide assumes you have already [deployed a single-server NetBird](/selfhosted/selfhosted-quickstart) and have a working configuration.
<Note>
High-availability setups for the Management and Signal services are available under an
[enterprise commercial license](https://netbird.io/pricing#on-prem).
</Note>
## Architecture Overview
### Before: Single Server
```
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│              ┌──── Main Server (combined) ────┐              │
│  ┌─────────┐  ┌────────────┐  ┌──────────┐  ┌─────────────┐  │
│  │Dashboard│  │ Management │  │  Signal  │  │    Relay    │  │
│  │(Web UI) │  │            │  │          │  │   + STUN    │  │
│  │         │  │            │  │          │  │             │  │
│  └─────────┘  └────────────┘  └──────────┘  └─────────────┘  │
│                                              Port 3478/udp   │
│                       ┌─────────────┐                        │
│                       │    Caddy    │                        │
│                       │             │                        │
│                       └─────────────┘                        │
│                                                              │
│                       Port 443,80/tcp                        │
└──────────────────────────────────────────────────────────────┘
```
### After: Distributed Relays
```
  ┌─────────────────────────────────────────────┐
  │                                             │
  │         ┌ Main Server (combined) ┐          │
  │  ┌─────────┐  ┌────────────┐  ┌──────────┐  │
  │  │Dashboard│  │ Management │  │  Signal  │  │
  │  │(Web UI) │  │            │  │          │  │
  │  │         │  │            │  │          │  │
  │  └─────────┘  └────────────┘  └──────────┘  │
  │                                             │
  │               ┌─────────────┐               │
  │               │    Caddy    │               │
  │               │             │               │
  │               └─────────────┘               │
  │                                             │
  │               Port 443,80/tcp               │
  └─────────────────────────────────────────────┘
                         │ Peers get relay addresses
┌──────────────────────┐   ┌──────────────────────┐
│    Relay Server 1    │   │    Relay Server 2    │
│                      │   │                      │
│  ┌────────────────┐  │   │  ┌────────────────┐  │
│  │     Relay      │  │   │  │     Relay      │  │
│  │     + STUN     │  │   │  │     + STUN     │  │
│  └────────────────┘  │   │  └────────────────┘  │
│                      │   │                      │
│  Port 443, 3478/udp  │   │  Port 443, 3478/udp  │
└──────────────────────┘   └──────────────────────┘
```
## Step 1: Set Up External Relay Servers
For each relay server you want to deploy:
### 1.1 Server Requirements
- A Linux VM with at least **1 CPU** and **1GB RAM**
- Public IP address
- A domain name pointing to the server (e.g., `relay-us.example.com`)
- Docker installed
- Firewall ports open: **80/tcp** (Let's Encrypt HTTP challenge), **443/tcp** (relay), and **3478/udp** (STUN)
### 1.2 Generate Authentication Secret
All relay servers must share the same authentication secret with your main server. You can generate one with:
```bash
# Generate a secure random secret
openssl rand -base64 32
```
Save this secret - you'll need it for both the relay servers and your main server's config.
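If you keep the secret in a file until both sides are configured, it helps to make that file readable only by your user. A minimal sketch, assuming a POSIX-like shell; the `save_secret` helper and the `auth-secret.txt` file name are illustrative and not part of NetBird:

```bash
# Hypothetical helper: store whatever arrives on stdin in a new file
# that only the current user can read (mode 0600).
save_secret() {
  local path="$1"
  # The subshell keeps the restrictive umask from leaking into your session.
  ( umask 077 && cat > "$path" )
}
```

For example, `openssl rand -base64 32 | save_secret ./auth-secret.txt` writes a fresh secret to `auth-secret.txt` in the current directory. The restrictive mode only applies when the file is newly created.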
### 1.3 Create Relay Configuration
On your relay server, create a directory and configuration:
```bash
mkdir -p ~/netbird-relay
cd ~/netbird-relay
```
Create `relay.env` with your relay settings. The relay server can automatically obtain and renew TLS certificates via Let's Encrypt:
```bash
NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:443
NB_EXPOSED_ADDRESS=rels://relay-us.example.com:443
NB_AUTH_SECRET=your-shared-secret-here
# TLS via Let's Encrypt (automatic certificate provisioning)
NB_LETSENCRYPT_DOMAINS=relay-us.example.com
NB_LETSENCRYPT_EMAIL=admin@example.com
NB_LETSENCRYPT_DATA_DIR=/data/letsencrypt
# Embedded STUN
NB_ENABLE_STUN=true
NB_STUN_PORTS=3478
```
<Note>
Replace `relay-us.example.com` with your relay server's domain and `your-shared-secret-here` with the secret you generated.
</Note>
Create `docker-compose.yml`:
```yaml
services:
  relay:
    image: netbirdio/relay:latest
    container_name: netbird-relay
    restart: unless-stopped
    ports:
      - '443:443'
      - '3478:3478/udp'
    env_file:
      - relay.env
    volumes:
      - relay_data:/data
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

volumes:
  relay_data:
```
### 1.4 Alternative: TLS with Existing Certificates
If you have existing TLS certificates (e.g., from your own CA or a wildcard cert), replace the Let's Encrypt variables in `relay.env` with:
```bash
# Replace the NB_LETSENCRYPT_* lines with:
NB_TLS_CERT_FILE=/certs/fullchain.pem
NB_TLS_KEY_FILE=/certs/privkey.pem
```
And add a certificate volume to `docker-compose.yml`:
```yaml
    volumes:
      - /path/to/certs:/certs:ro
      - relay_data:/data
```
### 1.5 Start the Relay Server
```bash
docker compose up -d
```
Verify it's running:
```bash
docker compose logs -f
```
You should see:
```
level=info msg="Starting relay server on :443"
level=info msg="Starting STUN server on port 3478"
```
If you configured Let's Encrypt, the relay generates TLS certificates lazily on the first incoming request. Trigger certificate provisioning and verify it by running:
```bash
curl -v https://relay-us.example.com/
```
A `404 page not found` response is expected — what matters is that the TLS handshake succeeds. Look for a valid Let's Encrypt certificate in the output:
```
* Server certificate:
* subject: CN=relay-us.example.com
* issuer: C=US; O=Let's Encrypt; CN=E8
* SSL certificate verify ok.
```
### 1.6 Repeat for Additional Relay Servers
If deploying multiple relays (e.g., for different regions), repeat steps 1.1-1.5 on each server. Use the **same `NB_AUTH_SECRET`** but update the domain name for each.
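Because only the domain changes between relays, the `relay.env` from step 1.3 can be generated per server instead of edited by hand. The sketch below assumes the Let's Encrypt variant of the configuration; the `render_relay_env` function name is hypothetical, and it writes only the variables already shown in step 1.3:

```bash
# Hypothetical helper: print a relay.env for one relay server.
# Arguments: <relay domain> <shared secret> <Let's Encrypt email>
render_relay_env() {
  local domain="$1" secret="$2" email="$3"
  cat <<EOF
NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:443
NB_EXPOSED_ADDRESS=rels://${domain}:443
NB_AUTH_SECRET=${secret}
NB_LETSENCRYPT_DOMAINS=${domain}
NB_LETSENCRYPT_EMAIL=${email}
NB_LETSENCRYPT_DATA_DIR=/data/letsencrypt
NB_ENABLE_STUN=true
NB_STUN_PORTS=3478
EOF
}
```

On each relay server you would then run something like `render_relay_env relay-eu.example.com "$SECRET" admin@example.com > relay.env`, where `SECRET` holds the value generated in step 1.2.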
## Step 2: Update Main Server Configuration
Now update your main NetBird server to use the external relays instead of the embedded one.
### 2.1 Edit config.yaml
On your main server, edit the `config.yaml` file:
```bash
cd ~/netbird # or wherever your deployment is
nano config.yaml
```
Remove the `authSecret` from the `server` section and add `relays` and `stuns` sections pointing to your external servers. The presence of the `relays` section disables both the embedded relay and the embedded STUN server, so the `stuns` section is required to provide external STUN addresses:
```yaml
server:
  listenAddress: ":80"
  exposedAddress: "https://netbird.example.com:443"
  # Remove authSecret to disable the embedded relay
  # authSecret: ...
  # Remove or comment out stunPorts since we're using external STUN
  # stunPorts:
  #   - 3478
  metricsPort: 9090
  healthcheckAddress: ":9000"
  logLevel: "info"
  logFile: "console"
  dataDir: "/var/lib/netbird"
  # External STUN servers (your relay servers)
  stuns:
    - uri: "stun:relay-us.example.com:3478"
      proto: "udp"
    - uri: "stun:relay-eu.example.com:3478"
      proto: "udp"
  # External relay servers
  relays:
    addresses:
      - "rels://relay-us.example.com:443"
      - "rels://relay-eu.example.com:443"
    secret: "your-shared-secret-here"
    credentialsTTL: "24h"
  auth:
    enabled: true
    issuer: "https://netbird.example.com/oauth2"
    # ... rest of auth config
```
<Warning>
The `secret` under `relays` and the `NB_AUTH_SECRET` on all relay servers **must be identical**. Mismatched secrets will cause relay connections to fail silently.
</Warning>
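Since a mismatch is easy to miss, one option is to compare a short fingerprint of the secret on each machine rather than the secret itself. A minimal sketch, assuming `sha256sum` (GNU coreutils) is available, that the secret is written unquoted in `relay.env`, and that `config.yaml` contains a single `secret:` line; the `secret_fingerprint` helper is hypothetical:

```bash
# Hypothetical helper: print the first 12 hex characters of the SHA-256 of
# the relay secret found in a file. Handles both relay.env
# (NB_AUTH_SECRET=...) and config.yaml (secret: "...").
secret_fingerprint() {
  local file="$1" value
  value=$(sed -n -E \
    -e 's/^[[:space:]]*NB_AUTH_SECRET=(.*)$/\1/p' \
    -e 's/^[[:space:]]*secret:[[:space:]]*"?([^"]*)"?[[:space:]]*$/\1/p' \
    "$file" | head -n 1)
  if [ -z "$value" ]; then
    echo "no secret found in $file" >&2
    return 1
  fi
  printf '%s' "$value" | sha256sum | cut -c1-12
}
```

Running `secret_fingerprint relay.env` on each relay and `secret_fingerprint config.yaml` on the main server should print the same 12 characters everywhere. Any difference points to a mismatched secret.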
### 2.2 Update docker-compose.yml (Optional)
If your main server was exposing STUN port 3478, you can remove it since STUN is now handled by external relays:
```yaml
  netbird-server:
    image: netbirdio/netbird-server:latest
    container_name: netbird-server
    restart: unless-stopped
    networks: [netbird]
    # Remove the STUN port - no longer needed
    # ports:
    #   - '3478:3478/udp'
    volumes:
      - netbird_data:/var/lib/netbird
      - ./config.yaml:/etc/netbird/config.yaml
    command: ["--config", "/etc/netbird/config.yaml"]
```
### 2.3 Restart the Main Server
```bash
docker compose down
docker compose up -d
```
## Step 3: Verify the Configuration
### 3.1 Check Main Server Logs
```bash
docker compose logs netbird-server
```
Verify that the embedded relay is disabled and your external relay addresses are listed:
```
INFO combined/cmd/root.go: Management: true (log level: info)
INFO combined/cmd/root.go: Signal: true (log level: info)
INFO combined/cmd/root.go: Relay: false (log level: )
```
```
Relay addresses: [rels://relay-us.example.com:443 rels://relay-eu.example.com:443]
```
### 3.2 Check Peer Status
Connect a NetBird client and verify that both STUN and relay services are available:
```bash
netbird status -d
```
The output should list your external STUN and relay servers:
```
Relays:
[stun:relay-us.example.com:3478] is Available
[rels://relay-us.example.com:443] is Available
```
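To run this check from a script, the status output can be filtered for entries that are not reported as available. This sketch assumes every entry is printed as `[uri] is <state>`, as in the sample above. The wording of a failed entry is not shown in this guide, so the filter treats anything other than `is Available` as a problem. The `unavailable_relays` name is hypothetical:

```bash
# Hypothetical helper: read `netbird status -d` output on stdin and print
# every STUN/relay entry that is not reported as "is Available".
# Exit status is 0 if at least one such entry was found, 1 otherwise.
unavailable_relays() {
  grep -E '^[[:space:]]*\[(stun|rel|rels):' | grep -v 'is Available'
}
```

For example, `netbird status -d | unavailable_relays` prints nothing when all listed servers are reachable.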
### 3.3 Test Relay Connectivity
You can force all peer connections through relay to verify it works end-to-end. On a client, run:
```bash
sudo netbird service reconfigure --service-env NB_FORCE_RELAY=true
```
Then test connectivity to another peer (e.g., with `ping`).
Once confirmed, switch back to normal mode. The client will attempt peer-to-peer connections first and fall back to relay only when direct connectivity isn't possible:
```bash
sudo netbird service reconfigure --service-env NB_FORCE_RELAY=false
```
## Configuration Reference
### Relay Server Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `NB_LISTEN_ADDRESS` | Yes | Address to listen on (e.g., `:443`) |
| `NB_EXPOSED_ADDRESS` | Yes | Public relay URL (`rels://` for TLS, `rel://` for plain) |
| `NB_AUTH_SECRET` | Yes | Shared authentication secret |
| `NB_ENABLE_STUN` | No | Enable embedded STUN server (`true`/`false`) |
| `NB_STUN_PORTS` | No | STUN UDP port(s), default `3478` |
| `NB_LETSENCRYPT_DOMAINS` | No | Domain(s) for automatic Let's Encrypt certificates |
| `NB_LETSENCRYPT_EMAIL` | No | Email for Let's Encrypt notifications |
| `NB_LETSENCRYPT_DATA_DIR` | No | Directory for Let's Encrypt data (e.g., `/data/letsencrypt`) |
| `NB_TLS_CERT_FILE` | No | Path to TLS certificate (alternative to Let's Encrypt) |
| `NB_TLS_KEY_FILE` | No | Path to TLS private key |
| `NB_LOG_LEVEL` | No | Log level: `debug`, `info`, `warn`, `error` |
### Main Server config.yaml - External Services
```yaml
server:
  # External STUN servers
  stuns:
    - uri: "stun:hostname:port"
      proto: "udp"  # or "tcp"
  # External relay servers
  relays:
    addresses:
      - "rels://hostname:port"  # TLS
      - "rel://hostname:port"   # Plain (not recommended)
    secret: "shared-secret"
    credentialsTTL: "24h"  # How long relay credentials are valid
  # External signal server (optional, usually keep embedded)
  # signalUri: "https://signal.example.com:443"
```
## Troubleshooting
### Peers Can't Connect via Relay
1. **Check secrets match**: The `secret` under `relays` in `config.yaml` and `NB_AUTH_SECRET` on every relay server must be identical
2. **Check firewall**: Ensure port 443/tcp is open on relay servers
3. **Check TLS**: If using `rels://`, ensure TLS is properly configured
4. **Check logs**: `docker compose logs relay` on the relay server
### STUN Not Working
1. **Check UDP port**: Ensure port 3478/udp is open and not blocked by firewall
2. **Check NAT**: Some carrier-grade NATs block STUN; try a different network
3. **Verify STUN is enabled**: `NB_ENABLE_STUN=true` on relay servers
### Relay Shows as Unavailable
1. **DNS resolution**: Ensure the relay domain resolves correctly
2. **Port reachability**: Test with `nc -zv relay-us.example.com 443`
3. **Certificate issues**: Check Let's Encrypt logs or certificate validity
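With several relays, the port reachability test above can be looped over the same `rel://` or `rels://` addresses you listed in `config.yaml`. A minimal sketch, assuming `nc` is installed and that every address includes an explicit port; both helper names are hypothetical:

```bash
# Hypothetical helper: split "rels://host:port" into "host port".
# Only rel:// and rels:// URLs with an explicit port are handled.
relay_hostport() {
  local url="$1" rest host port
  rest="${url#*://}"   # drop the scheme
  rest="${rest%%/*}"   # drop any trailing path
  host="${rest%:*}"
  port="${rest##*:}"
  if [ "$host" = "$rest" ] || [ -z "$port" ]; then
    echo "no port in $url" >&2
    return 1
  fi
  printf '%s %s\n' "$host" "$port"
}

# Hypothetical helper: try a TCP connection to each relay address given.
check_relay_ports() {
  local url hp
  for url in "$@"; do
    hp=$(relay_hostport "$url") || continue
    if nc -z -w 5 "${hp% *}" "${hp#* }"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

For example, `check_relay_ports rels://relay-us.example.com:443 rels://relay-eu.example.com:443` prints one line per relay. This only tests TCP reachability and does not cover STUN on 3478/udp.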
## Next Steps
- Add monitoring with Prometheus metrics (`NB_METRICS_PORT`)
- Set up health checks for container orchestration
- Consider geographic DNS for automatic relay selection
- Review [Reverse Proxy Configuration](/selfhosted/reverse-proxy) if placing relays behind a proxy
## See Also
- [Configuration Files Reference](/selfhosted/configuration-files) - Full config.yaml documentation
- [Self-hosting Quickstart](/selfhosted/selfhosted-quickstart) - Initial deployment guide
- [Troubleshooting](/selfhosted/troubleshooting) - Common issues and solutions