mirror of
https://github.com/netbirdio/docs.git
synced 2026-04-18 08:26:35 +00:00
# Splitting Your Self-Hosted Deployment

import {Note, Warning} from "@/components/mdx";

This guide explains how to split your NetBird self-hosted deployment from a single-server setup into a distributed architecture for better reliability and performance.

The most common approach is extracting the relay service (with its embedded STUN server) to separate servers and moving the PostgreSQL database to a dedicated machine.
In most cases, you won't need to extract the Signal server, but for completeness, this guide covers that as well.

NetBird clients can tolerate a Management server outage as long as connections are already established through relays or peer-to-peer.
This makes a stable relay infrastructure especially important.

This guide assumes you have already [deployed a single-server NetBird](/selfhosted/selfhosted-quickstart) and have a working configuration.

<Note>
If you need a high-availability setup for the Management and Signal services, it is available under an enterprise commercial license. See [NetBird pricing](https://netbird.io/pricing#on-prem).
</Note>

## Architecture Overview

### Before: Single Server

```
┌───────────────────────────────────────────────────────────────┐
│                                                               │
│              ┌──── Main Server (combined) ────┐               │
│  ┌─────────┐  ┌────────────┐  ┌──────────┐  ┌─────────────┐   │
│  │Dashboard│  │ Management │  │  Signal  │  │    Relay    │   │
│  │(Web UI) │  │            │  │          │  │   + STUN    │   │
│  │         │  │            │  │          │  │             │   │
│  └─────────┘  └────────────┘  └──────────┘  └─────────────┘   │
│                                              Port 3478/udp    │
│                        ┌─────────────┐                        │
│                        │    Caddy    │                        │
│                        │             │                        │
│                        └─────────────┘                        │
│                                                               │
│                        Port 443,80/tcp                        │
└───────────────────────────────────────────────────────────────┘
```

### After: Distributed Relays

```
┌──────────────────────────────────────────────────┐
│                                                  │
│            ┌ Main Server (combined) ┐            │
│  ┌─────────┐    ┌────────────┐    ┌──────────┐   │
│  │Dashboard│    │ Management │    │  Signal  │   │
│  │(Web UI) │    │            │    │          │   │
│  │         │    │            │    │          │   │
│  └─────────┘    └────────────┘    └──────────┘   │
│                                                  │
│                 ┌─────────────┐                  │
│                 │    Caddy    │                  │
│                 │             │                  │
│                 └─────────────┘                  │
│                                                  │
│                 Port 443,80/tcp                  │
└──────────────────────────────────────────────────┘
                         │
                         │  Peers get relay addresses
                         ▼
┌──────────────────────┐    ┌──────────────────────┐
│    Relay Server 1    │    │    Relay Server 2    │
│                      │    │                      │
│  ┌────────────────┐  │    │  ┌────────────────┐  │
│  │     Relay      │  │    │  │     Relay      │  │
│  │     + STUN     │  │    │  │     + STUN     │  │
│  └────────────────┘  │    │  └────────────────┘  │
│                      │    │                      │
│  Port 443, 3478/udp  │    │  Port 443, 3478/udp  │
└──────────────────────┘    └──────────────────────┘
```

## Step 1: Set Up External Relay Servers

For each relay server you want to deploy:

### 1.1 Server Requirements

- A Linux VM with at least **1 CPU** and **1 GB RAM**
- Public IP address
- A domain name pointing to the server (e.g., `relay-us.example.com`)
- Docker installed
- Firewall ports open: **80/tcp** (Let's Encrypt HTTP challenge), **443/tcp** (relay), and **3478/udp** (STUN)

### 1.2 Generate Authentication Secret

All relay servers must share the same authentication secret with your main server. You can generate one with:

```bash
# Generate a secure random secret
openssl rand -base64 32
```

Save this secret. You'll need it for both the relay servers and your main server's config.

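If you want to keep the secret in a file rather than copying it from the terminal, a small wrapper along the following lines may help. This is only a sketch: the function name `gen_relay_secret` and the default file name `relay-secret.txt` are made up for this example, and the `/dev/urandom` fallback is an assumption for hosts without `openssl`.

```bash
# Sketch only: function name and default file name are arbitrary choices.
gen_relay_secret() (
  out="${1:-relay-secret.txt}"
  umask 077   # the new file is readable by its owner only
  # Same command as in the guide; the /dev/urandom pipeline is a fallback
  # for hosts where openssl is not installed.
  { openssl rand -base64 32 2>/dev/null || head -c 32 /dev/urandom | base64; } > "$out"
  # 32 random bytes always encode to 44 base64 characters.
  len=$(( $(tr -d '\n' < "$out" | wc -c) ))
  if [ "$len" -ne 44 ]; then
    echo "unexpected secret length: $len" >&2
    exit 1
  fi
  echo "secret written to $out"
)
```

Running `gen_relay_secret ~/netbird-relay-secret.txt` once and copying the file's content into every configuration avoids accidentally generating a different secret per server.
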
### 1.3 Create Relay Configuration

On your relay server, create a directory and configuration:

```bash
mkdir -p ~/netbird-relay
cd ~/netbird-relay
```

Create `relay.env` with your relay settings. The relay server can automatically obtain and renew TLS certificates via Let's Encrypt:

```bash
NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:443
NB_EXPOSED_ADDRESS=rels://relay-us.example.com:443
NB_AUTH_SECRET=your-shared-secret-here

# TLS via Let's Encrypt (automatic certificate provisioning)
NB_LETSENCRYPT_DOMAINS=relay-us.example.com
NB_LETSENCRYPT_EMAIL=admin@example.com
NB_LETSENCRYPT_DATA_DIR=/data/letsencrypt

# Embedded STUN
NB_ENABLE_STUN=true
NB_STUN_PORTS=3478
```

<Note>
Replace `relay-us.example.com` with your relay server's domain and `your-shared-secret-here` with the secret you generated.
</Note>

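Before starting the container, it can be worth checking that `relay.env` contains the three required variables and that the placeholder secret was actually replaced. The helper below is a hypothetical pre-flight check, not part of NetBird; `check_relay_env` is a name invented for this sketch, and it only inspects the file with `grep`.

```bash
# Hypothetical pre-flight check for relay.env (not a NetBird tool).
check_relay_env() (
  file="${1:-relay.env}"
  status=0
  # The three variables this guide marks as required.
  for var in NB_LISTEN_ADDRESS NB_EXPOSED_ADDRESS NB_AUTH_SECRET; do
    if ! grep -q "^${var}=." "$file"; then
      echo "missing or empty: ${var}" >&2
      status=1
    fi
  done
  # The exposed address must use the rel:// or rels:// scheme.
  if ! grep -Eq '^NB_EXPOSED_ADDRESS=rels?://' "$file"; then
    echo "NB_EXPOSED_ADDRESS should start with rel:// or rels://" >&2
    status=1
  fi
  # Catch the placeholder from the example being left in place.
  if grep -q '^NB_AUTH_SECRET=your-shared-secret-here$' "$file"; then
    echo "NB_AUTH_SECRET still holds the placeholder value" >&2
    status=1
  fi
  exit "$status"
)
```

`check_relay_env relay.env && docker compose up -d` would then refuse to start the relay with an obviously incomplete configuration.
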

Create `docker-compose.yml`:

```yaml
services:
  relay:
    image: netbirdio/relay:latest
    container_name: netbird-relay
    restart: unless-stopped
    ports:
      - '443:443'
      - '3478:3478/udp'
    env_file:
      - relay.env
    volumes:
      - relay_data:/data
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

volumes:
  relay_data:
```

### 1.4 Alternative: TLS with Existing Certificates

If you have existing TLS certificates (e.g., from your own CA or a wildcard cert), replace the Let's Encrypt variables in `relay.env` with:

```bash
# Replace the NB_LETSENCRYPT_* lines with:
NB_TLS_CERT_FILE=/certs/fullchain.pem
NB_TLS_KEY_FILE=/certs/privkey.pem
```

And add a certificate volume to the `relay` service in `docker-compose.yml`:

```yaml
    volumes:
      - /path/to/certs:/certs:ro
      - relay_data:/data
```

### 1.5 Start the Relay Server

```bash
docker compose up -d
```

Verify it's running:

```bash
docker compose logs -f
```

You should see:

```
level=info msg="Starting relay server on :443"
level=info msg="Starting STUN server on port 3478"
```

If you configured Let's Encrypt, the relay generates TLS certificates lazily on the first incoming request. Trigger certificate provisioning and verify it by running:

```bash
curl -v https://relay-us.example.com/
```

A `404 page not found` response is expected; what matters is that the TLS handshake succeeds. Look for a valid Let's Encrypt certificate in the output:

```
* Server certificate:
*  subject: CN=relay-us.example.com
*  issuer: C=US; O=Let's Encrypt; CN=E8
*  SSL certificate verify ok.
```

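The verbose `curl` output is long, so it can help to filter it down to the certificate lines shown above. The filter below is a small sketch; `tls_summary` is a name invented here, and it assumes `curl` prints the certificate details in the `* subject:` / `* issuer:` form shown in this guide.

```bash
# Sketch: keep only the certificate-related lines of `curl -v` output.
# Reads from stdin; assumes the "* subject:" / "* issuer:" line format.
tls_summary() {
  grep -E '^\*[[:space:]]+(subject:|issuer:|SSL certificate verify)'
}
```

For example, `curl -sv -o /dev/null https://relay-us.example.com/ 2>&1 | tls_summary` prints only the subject, issuer, and verification result (curl writes its verbose output to stderr, hence the `2>&1`).
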
### 1.6 Repeat for Additional Relay Servers

If deploying multiple relays (e.g., for different regions), repeat steps 1.1-1.5 on each server. Use the **same `NB_AUTH_SECRET`** but update the domain name for each.

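Since only the domain changes between relay servers, the `relay.env` from step 1.3 can be rendered from a template instead of being edited by hand. The function below is an illustrative sketch; `render_relay_env` and its argument order are assumptions of this example, and the variable values mirror the ones used earlier in this guide.

```bash
# Sketch: print a relay.env for one relay domain.
# Usage: render_relay_env <domain> <shared-secret> [letsencrypt-email]
render_relay_env() (
  domain="$1"
  secret="$2"
  email="${3:-admin@example.com}"
  cat <<EOF
NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:443
NB_EXPOSED_ADDRESS=rels://${domain}:443
NB_AUTH_SECRET=${secret}
NB_LETSENCRYPT_DOMAINS=${domain}
NB_LETSENCRYPT_EMAIL=${email}
NB_LETSENCRYPT_DATA_DIR=/data/letsencrypt
NB_ENABLE_STUN=true
NB_STUN_PORTS=3478
EOF
)
```

On the second server, `render_relay_env relay-eu.example.com "$SECRET" > relay.env` would produce the same file as on the first one with only the domain swapped, which keeps the shared secret identical by construction.
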
## Step 2: Update Main Server Configuration

Now update your main NetBird server to use the external relays instead of the embedded one.

### 2.1 Edit config.yaml

On your main server, edit the `config.yaml` file:

```bash
cd ~/netbird  # or wherever your deployment is
nano config.yaml
```

Remove the `authSecret` from the `server` section and add `relays` and `stuns` sections pointing to your external servers. The presence of the `relays` section disables both the embedded relay and the embedded STUN server, so the `stuns` section is required to provide external STUN addresses:

```yaml
server:
  listenAddress: ":80"
  exposedAddress: "https://netbird.example.com:443"
  # Remove authSecret to disable the embedded relay
  # authSecret: ...
  # Remove or comment out stunPorts since we're using external STUN
  # stunPorts:
  #   - 3478
  metricsPort: 9090
  healthcheckAddress: ":9000"
  logLevel: "info"
  logFile: "console"
  dataDir: "/var/lib/netbird"

  # External STUN servers (your relay servers)
  stuns:
    - uri: "stun:relay-us.example.com:3478"
      proto: "udp"
    - uri: "stun:relay-eu.example.com:3478"
      proto: "udp"

  # External relay servers
  relays:
    addresses:
      - "rels://relay-us.example.com:443"
      - "rels://relay-eu.example.com:443"
    secret: "your-shared-secret-here"
    credentialsTTL: "24h"

  auth:
    enabled: true
    issuer: "https://netbird.example.com/oauth2"
    # ... rest of auth config
```

<Warning>
The `secret` under `relays` and the `NB_AUTH_SECRET` on all relay servers **must be identical**. Mismatched secrets will cause relay connections to fail silently.
</Warning>

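Because a mismatch fails silently, comparing the two values mechanically is cheap insurance. The sketch below assumes you have a copy of both files on one machine, that `secret` is written as a double-quoted value as in the example above, and that it is the first `secret:` key in `config.yaml`. It uses `sed` rather than a real YAML parser, and all three function names are invented for this example.

```bash
# Sketch: compare relays.secret in config.yaml with NB_AUTH_SECRET in relay.env.
# Not a YAML parser: takes the first double-quoted `secret:` value it finds.
relay_secret_from_config() {
  sed -n 's/^[[:space:]]*secret:[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" | head -n 1
}

relay_secret_from_env() {
  sed -n 's/^NB_AUTH_SECRET=//p' "$1" | head -n 1
}

# Succeeds only if both secrets are present and equal; never prints them.
secrets_match() (
  from_config=$(relay_secret_from_config "$1")
  from_env=$(relay_secret_from_env "$2")
  [ -n "$from_config" ] && [ "$from_config" = "$from_env" ]
)
```

A call such as `secrets_match config.yaml relay.env || echo "relay secret mismatch"` reports the problem without echoing either secret to the terminal.
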
### 2.2 Update docker-compose.yml (Optional)

If your main server was exposing STUN port 3478, you can remove it since STUN is now handled by external relays:

```yaml
  netbird-server:
    image: netbirdio/netbird-server:latest
    container_name: netbird-server
    restart: unless-stopped
    networks: [netbird]
    # Remove the STUN port - no longer needed
    # ports:
    #   - '3478:3478/udp'
    volumes:
      - netbird_data:/var/lib/netbird
      - ./config.yaml:/etc/netbird/config.yaml
    command: ["--config", "/etc/netbird/config.yaml"]
```

### 2.3 Restart the Main Server

```bash
docker compose down
docker compose up -d
```

## Step 3: Verify the Configuration

### 3.1 Check Main Server Logs

```bash
docker compose logs netbird-server
```

Verify that the embedded relay is disabled and your external relay addresses are listed:

```
INFO combined/cmd/root.go: Management: true (log level: info)
INFO combined/cmd/root.go: Signal: true (log level: info)
INFO combined/cmd/root.go: Relay: false (log level: )
```

```
Relay addresses: [rels://relay-us.example.com:443 rels://relay-eu.example.com:443]
```

### 3.2 Check Peer Status

Connect a NetBird client and verify that both STUN and relay services are available:

```bash
netbird status -d
```

The output should list your external STUN and relay servers:

```
Relays:
  [stun:relay-us.example.com:3478] is Available
  [rels://relay-us.example.com:443] is Available
```

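With several relays, scanning the status output by eye gets tedious. The filter below is a sketch that assumes the `[uri] is <state>` line format shown above; it prints the URI of every entry whose state is anything other than `Available`. The name `unavailable_relays` is invented for this example.

```bash
# Sketch: list STUN/relay URIs that are not reported as "Available".
# Reads `netbird status -d` output from stdin; assumes "[uri] is <state>" lines.
unavailable_relays() {
  sed -n 's/^[[:space:]]*\[\(.*\)\] is \(.*\)$/\1 \2/p' |
    awk '$2 != "Available" { print $1 }'
}
```

Piping the status into it, as in `netbird status -d | unavailable_relays`, prints nothing when every listed server is available.
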
### 3.3 Test Relay Connectivity

You can force all peer connections through relay to verify it works end-to-end. On a client, run:

```bash
sudo netbird service reconfigure --service-env NB_FORCE_RELAY=true
```

Then test connectivity to another peer (e.g., with `ping`).

Once confirmed, switch back to normal mode. The client will attempt peer-to-peer connections first and fall back to relay only when direct connectivity isn't possible:

```bash
sudo netbird service reconfigure --service-env NB_FORCE_RELAY=false
```

## Configuration Reference

### Relay Server Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `NB_LISTEN_ADDRESS` | Yes | Address to listen on (e.g., `:443`) |
| `NB_EXPOSED_ADDRESS` | Yes | Public relay URL (`rels://` for TLS, `rel://` for plain) |
| `NB_AUTH_SECRET` | Yes | Shared authentication secret |
| `NB_ENABLE_STUN` | No | Enable embedded STUN server (`true`/`false`) |
| `NB_STUN_PORTS` | No | STUN UDP port(s), default `3478` |
| `NB_LETSENCRYPT_DOMAINS` | No | Domain(s) for automatic Let's Encrypt certificates |
| `NB_LETSENCRYPT_EMAIL` | No | Email for Let's Encrypt notifications |
| `NB_LETSENCRYPT_DATA_DIR` | No | Directory for Let's Encrypt certificate data (e.g., `/data/letsencrypt`) |
| `NB_TLS_CERT_FILE` | No | Path to TLS certificate (alternative to Let's Encrypt) |
| `NB_TLS_KEY_FILE` | No | Path to TLS private key |
| `NB_LOG_LEVEL` | No | Log level: `debug`, `info`, `warn`, `error` |

### Main Server config.yaml - External Services

```yaml
server:
  # External STUN servers
  stuns:
    - uri: "stun:hostname:port"
      proto: "udp"  # or "tcp"

  # External relay servers
  relays:
    addresses:
      - "rels://hostname:port"  # TLS
      - "rel://hostname:port"   # Plain (not recommended)
    secret: "shared-secret"
    credentialsTTL: "24h"  # How long relay credentials are valid

  # External signal server (optional, usually keep embedded)
  # signalUri: "https://signal.example.com:443"
```

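When the list of relays grows, the `stuns` and `relays` blocks can be generated from a list of hostnames so that the two stay in sync. The generator below is a sketch under several assumptions: the blocks sit two spaces deep under `server:`, every relay listens on 443 with STUN on 3478/udp, and the name `render_external_services` is made up for this example. Review the output before pasting it into `config.yaml`.

```bash
# Sketch: print the stuns/relays blocks for a list of relay hostnames.
# Usage: render_external_services <shared-secret> <host> [<host>...]
# Assumes ports 443 (relay) and 3478/udp (STUN) and two-space nesting under `server:`.
render_external_services() (
  secret="$1"
  shift
  echo "  stuns:"
  for host in "$@"; do
    printf '    - uri: "stun:%s:3478"\n      proto: "udp"\n' "$host"
  done
  echo "  relays:"
  echo "    addresses:"
  for host in "$@"; do
    printf '      - "rels://%s:443"\n' "$host"
  done
  printf '    secret: "%s"\n    credentialsTTL: "24h"\n' "$secret"
)
```

For instance, `render_external_services "$SECRET" relay-us.example.com relay-eu.example.com` emits one STUN entry and one relay address per host, followed by the shared secret and the `24h` credentials TTL used in this guide.
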
## Troubleshooting

### Peers Can't Connect via Relay

1. **Check secrets match**: The `secret` under `relays` in the main server's `config.yaml` and `NB_AUTH_SECRET` on every relay server must be identical
2. **Check firewall**: Ensure port 443/tcp is open on relay servers
3. **Check TLS**: If using `rels://`, ensure TLS is properly configured
4. **Check logs**: `docker compose logs relay` on the relay server

### STUN Not Working

1. **Check UDP port**: Ensure port 3478/udp is open and not blocked by a firewall
2. **Check NAT**: Some carrier-grade NATs block STUN; try a different network
3. **Verify STUN is enabled**: `NB_ENABLE_STUN=true` on relay servers

### Relay Shows as Unavailable

1. **DNS resolution**: Ensure the relay domain resolves correctly
2. **Port reachability**: Test with `nc -zv relay-us.example.com 443`
3. **Certificate issues**: Check Let's Encrypt logs or certificate validity

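The reachability test can be run directly against the addresses from `config.yaml` if the `rels://host:port` form is split into host and port first. The helpers below are a sketch; `relay_host_port` and `check_relay_port` are names invented here, and an address without an explicit port is treated as an error rather than guessing a default.

```bash
# Sketch: split "rels://host:port" (or "rel://host:port") into "host port".
relay_host_port() (
  addr="${1#rels://}"
  addr="${addr#rel://}"
  case "$addr" in
    *:*) echo "${addr%:*} ${addr##*:}" ;;
    *)   echo "no port in relay address: $1" >&2; exit 1 ;;
  esac
)

# Run the same nc test as above against a relay address.
check_relay_port() (
  host_port=$(relay_host_port "$1") || exit 1
  set -- $host_port   # word-split into host ($1) and port ($2)
  nc -zv "$1" "$2"
)
```

`check_relay_port rels://relay-us.example.com:443` is then equivalent to the `nc -zv relay-us.example.com 443` command from the list.
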
## Next Steps

- Add monitoring with Prometheus metrics (`NB_METRICS_PORT`)
- Set up health checks for container orchestration
- Consider geographic DNS for automatic relay selection
- Review [Reverse Proxy Configuration](/selfhosted/reverse-proxy) if placing relays behind a proxy

## See Also

- [Configuration Files Reference](/selfhosted/configuration-files) - Full config.yaml documentation
- [Self-hosting Quickstart](/selfhosted/selfhosted-quickstart) - Initial deployment guide
- [Troubleshooting](/selfhosted/troubleshooting) - Common issues and solutions