feat: in-depth domain Resources documentation (#465)

* feat: in-depth domain Resources documentation

* feat: domain Resources troubleshooting section
Krzysztof Nazarewski (kdn)
2025-11-13 17:48:40 +01:00
committed by GitHub
parent 155671a8ac
commit 6928d3fbff
2 changed files with 291 additions and 17 deletions


@@ -51,6 +51,56 @@ IP addresses, IP ranges, domain names, or wildcard domains (e.g., *.company.inte
Support for exit nodes and site-to-site VPNs may become available in future releases. In the meantime, you can use [Network routes](/how-to/routing-traffic-to-private-networks) to add your exit-node and site-to-site routes.
</Note>
### Domain Resources
In addition to routing IP addresses, NetBird also supports routing domain names. In the Dashboard, you can enter
a domain name (e.g. `example.com`) or a wildcard domain (e.g. `*.example.com`) wherever you would normally
put an IP address range. NetBird clients will then start answering DNS queries for, and routing traffic to, the given domain.
To troubleshoot common issues with this type of resource yourself, please consult the
[Debugging access to Domain Resources](/how-to/troubleshooting-client#debugging-access-to-domain-resources)
documentation.
<Note>
Due to a combination of a bug and an initial design choice, clients running `0.59.0` or `0.59.1` might not be able to
resolve domain Resources served by Routing Peers running versions `0.59.0` to `0.59.9` when all the Peers in the
NetBird organization are running version `0.59.0` or newer.
Installing a client version `<= 0.58.2` or `>= 0.59.2`, or upgrading the Routing Peer to version `0.59.10` or newer,
resolves this issue.
</Note>
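The version conditions in the note above can be encoded in a small POSIX shell sketch to check a specific client/Routing Peer pair. The helper names `version_ge` and `affected_by_0_59_resolution_bug` are hypothetical (they are not part of the NetBird CLI), and the comparison assumes a `sort` that supports `-V` version sorting (e.g. GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper encoding the conditions from the note:
#   $1 = client version, $2 = Routing Peer version,
#   $3 = "yes" if every Peer in the organization runs 0.59.0 or newer.
# Succeeds (exit status 0) when the combination is affected.
affected_by_0_59_resolution_bug() {
  client="$1"; router="$2"; all_peers_059="$3"
  version_ge "$client" 0.59.0 && ! version_ge "$client" 0.59.2 &&
    version_ge "$router" 0.59.0 && ! version_ge "$router" 0.59.10 &&
    [ "$all_peers_059" = "yes" ]
}

# Example: a 0.59.1 client talking to a 0.59.5 Routing Peer.
if affected_by_0_59_resolution_bug 0.59.1 0.59.5 yes; then
  echo "affected: upgrade the client to >= 0.59.2 or the Routing Peer to >= 0.59.10"
else
  echo "not affected"
fi
```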
On a technical level, the feature works as follows:
1. Initially (when NetBird connects), the operating system is instructed to use NetBird to resolve the requested
   domain(s). No routing rules are configured yet.
2. An Application (e.g. a web browser) requests the domain `example.com` from the Operating System:
   1. the Operating System queries NetBird's Local DNS Forwarder, which by default listens on port `53` of:
      - for MacOS & Windows: the highest available IP address in your NetBird range, usually `100.xxx.255.254:53`,
      - for other systems: the local NetBird client's IP address, e.g. `100.xxx.123.45`,
   2. the Local DNS Forwarder forwards the query to the Remote DNS Resolver listening on the Routing Peer's address
      and the following port:
      - `22054` for version `0.59.0` and newer,
      - `5353` for versions `0.58.x` and older,
   3. the Routing Peer resolves the domain name using its local configuration (often independent of NetBird) and
      returns the result,
   4. the Local DNS Forwarder sets up routing rules for the IP addresses returned by the query before passing them
      back to the Application,
      - see [Trigger the Domain Resource](/how-to/troubleshooting-client#trigger-the-domain-resource)
        to observe this behaviour "in action",
3. the Application receives the result "as usual", apart from a slight delay the first time a domain name is
   requested, while all of the above takes place,
4. all subsequent requests to `example.com` will be served instantly from the Local DNS Forwarder's cache.
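The addressing rules from the steps above can be condensed into a small sketch that prints the `nslookup` commands for querying each hop by hand. It assumes the NetBird range is a `/16` (so the "highest available" address is `<first two octets>.255.254`, matching the usual `100.xxx.255.254`) and that the remote resolver port follows the Routing Peer's version; the function name `print_hop_checks` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the nslookup commands to query each hop manually.
# Assumptions: the NetBird range is a /16 and the remote resolver port depends
# on the Routing Peer's NetBird version.
# Usage: print_hop_checks <os> <client-netbird-ip> <routing-peer-ip> <routing-peer-version> <domain>
print_hop_checks() {
  os="$1"; client_ip="$2"; peer_ip="$3"; peer_version="$4"; domain="$5"

  # Local DNS Forwarder address: highest address of the /16 on macOS & Windows,
  # the client's own NetBird IP elsewhere.
  case "$os" in
    macos|windows) forwarder="$(echo "$client_ip" | cut -d. -f1,2).255.254" ;;
    *) forwarder="$client_ip" ;;
  esac

  # Remote DNS Resolver port: 22054 for 0.59.0 and newer, 5353 for older versions.
  oldest="$(printf '%s\n%s\n' "$peer_version" 0.59.0 | sort -V | head -n 1)"
  if [ "$oldest" = 0.59.0 ]; then port=22054; else port=5353; fi

  # The Local DNS Forwarder listens on the default DNS port 53.
  echo "nslookup $domain $forwarder"
  echo "nslookup -port=$port $domain $peer_ip"
}

print_hop_checks linux 100.83.73.97 100.83.136.209 0.59.10 srv.nb.test
```

With the values from the troubleshooting walkthrough, this prints `nslookup srv.nb.test 100.83.73.97` followed by `nslookup -port=22054 srv.nb.test 100.83.136.209`.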
<Note>
NetBird does its best to automatically open the DNS forwarder ports in the Routing Peers' firewalls, but this might
fail on some system configurations, in which case you need to open the two ports above manually.
You can verify that the firewall lets the DNS requests in by running the following command from the client's device:
`nslookup -port=22054 <routed-domain> <routing-peer-ip>`, e.g. `nslookup -port=22054 example.com 100.123.45.67`.
A firewall blocking these ports is by far the most common cause of issues with domain Resources.
</Note>
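If you are unsure which of the two ports a Routing Peer listens on, a small loop can run the `nslookup` check from the note against both. This is a minimal sketch assuming a BIND-style `nslookup` whose output looks like the examples in the troubleshooting guide; the function name `probe_resolver_ports` is hypothetical. A reachable resolver answers with either a `Name:` record or an `NXDOMAIN` error, while a blocked port usually produces no answer within the timeout.

```shell
#!/bin/sh
# Hypothetical helper: run the nslookup firewall check against both resolver ports.
# Usage: probe_resolver_ports <routed-domain> <routing-peer-ip>
#   e.g. probe_resolver_ports example.com 100.123.45.67
probe_resolver_ports() {
  domain="$1"; peer="$2"
  for port in 22054 5353; do
    # Short timeout so a firewalled port fails fast; ignore nslookup's exit
    # status and judge by its output instead.
    out="$(nslookup -timeout=1 -port="$port" "$domain" "$peer" 2>&1)" || true
    # Either an answer ("Name:") or an NXDOMAIN error proves the resolver replied.
    if printf '%s\n' "$out" | grep -q -e '^Name:' -e 'NXDOMAIN'; then
      echo "$port: resolver replied"
    else
      echo "$port: no reply, check the Routing Peer's firewall"
    fi
  done
}
```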
## Manage access to resources


@@ -99,7 +99,8 @@ a [github issue](https://github.com/netbirdio/netbird/issues/new/choose) and att
A debug archive containing the recent logs and the status at the time of execution can be generated with the following
command.
Adding the `--anonymize (-A)` flag will anonymize the logs, removing sensitive information such as public IP addresses
and domain names. If you have tunneling issues, omitting the `--anonymize` flag might help our analysis.
Adding the `--system-info (-S)` flag will add system information like network routes and interfaces
@@ -119,6 +120,7 @@ the specified time has elapsed.
```shell
netbird debug for 5m --system-info
```
<Note>
The flag `--anonymize (-A)` can be used to anonymize IP addresses and non-netbird.io domains in logs and status output when needed.
</Note>
@@ -127,17 +129,22 @@ To capture any issues arising during the `up` and `down` processes, this will se
netbird `up` and `down` up to a few times.
After 5 minutes the netbird status will be restored to the previous state and the debug bundle will be generated.
### Debug bundle uploads
Since version `0.43.1`, you can share a debug bundle with the NetBird development team without local administrative
privileges by using the `--upload-bundle (-U)` flag.
It will securely generate and upload the debug bundle to our servers for access by the NetBird development team. See
the examples below:
Run debug for a specific time and upload the bundle:
```shell
netbird debug for 1m --system-info --upload-bundle
```
To generate a bundle without restarting the client and then upload it:
```shell
netbird debug bundle --system-info --upload-bundle
```
@@ -152,13 +159,15 @@ Local file:
Upload file key:
1234567890ab27fb37c88b3b4be7011e22aa2e5ca6f38ffa9c4481884941f726/12345678-90ab-cdef-1234-567890abcdef
```
<Note>
The flag `--anonymize` can be used to anonymize IP addresses and non-netbird.io domains in logs and status output when needed.
</Note>
### Debug bundle uploads with GUI
Since version `0.43.2`, users can upload their debug bundle via the GUI client.
To generate a bundle via the GUI, open the application, go to `Settings` > `Create Debug Bundle`, and follow
the wizard to upload the bundle:
<p>
<img src="/docs-static/img/troubleshooting-client/ui-settings.png" alt="service-user-overview" className="imagewrapper-big"/>
@@ -171,7 +180,8 @@ To generate a bundle via GUI, you can access the application then go to `Setting
</p>
By default, the option to run with trace logging enabled before generating the bundle is selected. This restarts the client connections and provides `disconnect to connected` information for our engineers.
If you uncheck this option, the bundle is generated without this step, which is very useful when you have an
issue that goes away when the client is restarted.
<p>
<img src="/docs-static/img/troubleshooting-client/ui-bundle-success.png" alt="service-user-overview" className="imagewrapper-big"/>
</p>
@@ -353,9 +363,11 @@ The most notable examples of encountering the issue are:
- the user makes a mistake and selects
- the user uses a different browser/profile or selects the wrong account during SSO login at the start of the workday,
If you know exactly which Peer was previously logged in, you can simply delete it from the Dashboard and attempt the
login again.
Otherwise, to resolve the issue, you will need to remove the file manually while the NetBird client daemon is stopped,
so that the machine can be used as a different user/Setup Key:
1. `netbird service stop`
2. `sudo rm /var/lib/netbird/default.json` (*nix) or `rm C:\ProgramData\netbird\config.json` (Windows)
@@ -384,14 +396,14 @@ and following Netbird network resources:
- `peer-a`: end user's device running NetBird Client,
- `peer-b`: a Linux server inside the internal network running NetBird Client,
  - it has direct access to the whole `int-net1` IP range,
- `users:employees`: a NetBird Group containing `peer-a`,
- `routers:int-net1`: a NetBird Group containing `peer-b`,
- `access:srv-c`: a NetBird Group used as a target of ACL rules for `srv-c` only,
- `access:int-net1`: a NetBird Group used as a target of ACL rules for the whole subnet,
- `net-a`: a NetBird Network
  - `net-a:srv-c`: a Network Resource handling traffic to `10.123.45.17/32` (`srv-c`),
  - `net-a:int-net1`: a Network Resource handling traffic to `10.123.45.0/24` (`int-net1`),
- `route:int-net1`: a NetBird Network Route handling traffic to `10.123.45.0/24` (`int-net1`),
- `route:srv-c`: a NetBird Network Route handling traffic to `10.123.45.17/32` (`srv-c`),
@@ -454,8 +466,8 @@ For Netbird network routing resources configurations you can use either (new) _N
A Network `net-a` should have at minimum:
- _Network Resource_: `net-a:srv-c` with either of:
  - an _Address_ set to `10.123.45.17/32` to configure route to `srv-c` exclusively and nothing else,
  - _Assigned Groups_ set to `access:srv-c`
- _Routing Peer Group_ assigned to `routers:int-net1`
A _Network Route_ `route:srv-c` should have at least:
@@ -501,9 +513,9 @@ Just like with the previous section you can loosen the above example by:
- allowing `ALL` protocol, _Ports_ will become greyed out because all traffic will be allowed,
- creating a bidirectional rule (both arrows should be green), always true for the protocol `ALL`,
- selecting a different source group from the pool assigned to `peer-a`,
  - it could be the built-in `All` group, but this is discouraged,
- selecting a different destination group from the pool assigned to `peer-b`,
  - it could be the built-in `All` group, but this is discouraged,
#### Is `peer-a`'s operating system configured to use the route?
@@ -678,7 +690,7 @@ PS C:\Users\kdn> Get-DnsClientNrptPolicy
Namespace : .83.100.in-addr.arpa
...
NameServers : 100.83.255.254
...
Namespace : .netbird.cloud
...
@@ -752,7 +764,7 @@ You can validate whether this is the issue in your setup by performing following
3. resolve the domain, e.g. `dscacheutil -q host -a name <domain>`
4. `netbird up` / `Connect`
5. check whether `dscacheutil -q host -a name <domain>` works
   - if it doesn't, flush the cache and retry
#### Verifying the nameservers are properly registered in Linux operating system
@@ -795,3 +807,215 @@ To configure `int-dns2`, while following _Access from `peer-a` to `srv-c`_ secti
address range,
To test the configuration in practice please refer to previous section _Public nameservers_.
## Debugging access to Domain Resources
While we strive to make them "just work", there still are, and will be, cases of domain-based Resources not behaving
correctly. This can happen for a myriad of reasons, ranging from the client's local device management software or
system firewall, through Routing Peer issues (usually a firewall), to a relatively simple Access Policy
misconfiguration that prevents the connection from being established.
This section provides general directions for verifying connectivity at every step involved in handling
Domain Resources, to help you understand where the issue might lie.
For an in-depth overview of the mechanism, please read the [Domain Resources](/how-to/networks#domain-resources) section.
We will analyze these issues using a "backwards" approach (ordered by how common the issues are): we first confirm
that the Routing Peer itself is working as expected and check the client's operating system configuration as one of
the last steps.
For the remainder of the section, let's assume:
- there is a `*.nb.test` Network Resource configured,
- we are trying to access a `srv.nb.test` domain,
- a `zxc.nb.test` domain does not exist; it's used to demonstrate errors,
- the Routing Peer's NetBird address is `100.83.136.209`,
  - it's named `brys-vm-nbt-ubuntu-isolated-02` when referred to in the outputs,
- the client is named `brys-vm-nbt-ubuntu-01` when referred to in the outputs,
- the client is running Ubuntu, but many of the commands used work uniformly across all platforms,
- the client's NetBird IP address is `100.83.73.97`,
  - on MacOS & Windows you would use `100.83.255.254` to access the local DNS forwarder instead,
- the Resource is running on `brys-vm-nbt-ubuntu-isolated-01` when referred to in the outputs,
- we will only check the new port `22054`, but the steps might need repeating on port `5353` for legacy clients,
<Note>
Be aware that port `5353` is the well-known Multicast DNS port (mDNS, implemented by e.g. Avahi and Bonjour,
and used for printer sharing, Chromecast, etc.), so it might be occupied by other software
running on the machine. As a result, (old) Routing Peers might be prevented from routing Domain Resources.
While this is not an issue in regular server operations, it might come as a surprise to find port `5353`
occupied by the Chrome web browser (or one of its derivatives) on your remotely accessible Windows Server machine.
This is the primary reason we switched to the new port `22054`. We strongly advise you to update your fleet
to the latest version (no older than `0.59.10`) to address this issue.
</Note>
### Is Routing Peer correctly resolving queries?
While in practice it is almost never the issue, it is always good to double-check whether the Routing Peer itself is
able to resolve the requested domain as-is and whether it can access the target resource.
Please refer to the
[Verifying the DNS names resolve properly in practice](#verifying-the-dns-names-resolve-properly-in-practice)
section for operating-system-specific commands, substituting `srv.nb.test` as the domain.
It also would not hurt to check whether the Routing Peer actually has network access to the routed resource, e.g. with
`nc`. For TCP services you should see something like this:
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 srv.nb.test 80
Connection to srv.nb.test (192.168.100.10) 80 port [tcp/http] succeeded!
kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 srv.nb.test 12345
nc: connect to srv.nb.test (192.168.100.10) port 12345 (tcp) failed: Connection refused
```
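For UDP you can use the following. Keep in mind that UDP is connectionless, so `nc` reports success unless an ICMP
"port unreachable" reply comes back; a success is therefore only a weak signal: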
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 -u srv.nb.test 12345 ; echo $?
Connection to srv.nb.test (192.168.100.10) 12345 port [udp/*] succeeded!
0
kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 -u srv.nb.test 12347 ; echo $?
1
```
### Is the remote DNS resolver accessible to the client?
We want to confirm that a client Peer can reach and use the Routing Peer's DNS resolver; this step rules out any
firewall-related issues on the Routing Peer. If the following command fails, you will need to open port `22054`
in the Routing Peer's firewall software.
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 -port=22054 srv.nb.test 100.83.136.209
Server: 100.83.136.209
Address: 100.83.136.209#22054
Non-authoritative answer:
Name: srv.nb.test
Address: 192.168.100.10
kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 -port=22054 zxc.nb.test 100.83.136.209
Server: 100.83.136.209
Address: 100.83.136.209#22054
** server can't find zxc.nb.test: NXDOMAIN
```
### Trigger the Domain Resource
I have yet to see the local DNS forwarder fail, but querying it is a good way of forcing the NetBird client to set up
routing for the domain (see the [Domain Resources](/how-to/networks#domain-resources) section for an explanation).
<Note>
On MacOS & Windows the IP address would always be `100.83.255.254` instead of `100.83.73.97`.
</Note>
Note that the IP addresses are initially missing from the routing table (`ip route show` on Linux) and only get added
after the domain is resolved for the first time through the local DNS Forwarder.
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:
- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs: -
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 srv.nb.test 100.83.73.97
Server: 100.83.73.97
Address: 100.83.73.97#53
Non-authoritative answer:
Name: srv.nb.test
Address: 192.168.100.10
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
192.168.100.10 dev wt0 table 7120
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:
- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs:
[srv.nb.test.]: 192.168.100.10
```
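The before/after comparison above can be scripted. The sketch below assumes the `netbird networks ls` output format shown in this guide (lines such as `[srv.nb.test.]: 192.168.100.10` under `Resolved IPs:`) and a Linux client with `ip route`; the function name `check_resolved_routes` is hypothetical, and since that text output is not a documented stable interface, the parsing may need adjusting for other versions.

```shell
#!/bin/sh
# Hypothetical helper (Linux only): for every address listed under "Resolved IPs"
# by `netbird networks ls`, report whether a matching route is installed.
# Assumes the human-readable output format shown in this guide.
check_resolved_routes() {
  routes="$(ip route show table all)"
  netbird networks ls |
    # Keep only the address part of lines like "[srv.nb.test.]: 192.168.100.10".
    sed -n 's/^[[:space:]]*\[[^]]*]:[[:space:]]*//p' |
    # Assumption: several addresses would be separated by commas and/or spaces.
    tr ', ' '\n\n' |
    while read -r addr; do
      [ -n "$addr" ] || continue
      # Host routes appear as "192.168.100.10 dev wt0 table 7120".
      if printf '%s\n' "$routes" |
        awk -v a="$addr" '$1 == a || $1 == a "/32" { f = 1 } END { exit !f }'; then
        echo "$addr: routed"
      else
        echo "$addr: resolved but NOT routed"
      fi
    done
}
```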
### Verifying the Domain Resource registration with the Operating System
After we have confirmed **everything** is working within NetBird's scope of operation, let's restart NetBird and
check whether the Operating System's default DNS resolver is resolving the Domain Resource correctly.
<Note>
See [Debugging access to network resources > Public nameservers](#public-nameservers) for the equivalent
MacOS and Windows debugging steps.
</Note>
<Note>
You might be surprised that a simple `netbird down` followed by `netbird up` does not clear the `Resolved IPs`:
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird down
Disconnected
kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird up
Connected
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:
- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs:
[srv.nb.test.]: 192.168.100.10
```
Don't be alarmed: this is working as expected (the results are simply kept in the client daemon's
in-memory cache), and the routing rules are still properly cleared:
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
kdn@brys-vm-nbt-ubuntu-01:~$
```
</Note>
We will start from scratch by restarting the whole NetBird service to purge all caches, and then proceed with the tests:
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird service restart
NetBird service has been restarted
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:
- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs: -
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
kdn@brys-vm-nbt-ubuntu-01:~$ resolvectl query srv.nb.test
srv.nb.test: 192.168.100.10 -- link: wt0
-- Information acquired via protocol DNS in 8.1ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
192.168.100.10 dev wt0 table 7120
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:
- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs:
[srv.nb.test.]: 192.168.100.10
```
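The same from-scratch check can be wrapped in a sketch for Linux clients using systemd-resolved: resolve the name through the operating system resolver with `resolvectl query`, then confirm that each returned address now has a route. The function name `verify_os_resolution` is hypothetical; the parsing assumes the `resolvectl query` output format shown above (`<name>: <address> -- link: <interface>`), and that further addresses, if any, appear on continuation lines carrying only the address, which is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: resolve a domain through the OS resolver (systemd-resolved)
# and report whether each returned address got a route installed by NetBird.
# Usage: verify_os_resolution srv.nb.test
verify_os_resolution() {
  domain="$1"
  # First line looks like "srv.nb.test: 192.168.100.10 -- link: wt0";
  # continuation lines (assumed) start with the address itself.
  addrs="$(resolvectl query "$domain" 2>/dev/null |
    awk '/-- link:/ { print (($1 ~ /:$/) ? $2 : $1) }')"
  if [ -z "$addrs" ]; then
    echo "$domain: not resolved by the OS resolver"
    return 1
  fi
  # Read the routing table only after resolving, since resolution triggers routing.
  routes="$(ip route show table all)"
  status=0
  for addr in $addrs; do
    if printf '%s\n' "$routes" |
      awk -v a="$addr" '$1 == a || $1 == a "/32" { f = 1 } END { exit !f }'; then
      echo "$addr: routed"
    else
      echo "$addr: resolved but NOT routed"
      status=1
    fi
  done
  return "$status"
}
```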
<Note>
Be aware that the operating system resolver might not be the only source of DNS answers, but querying through it is
a hard requirement for getting Domain Resources to start working.
Some applications (most notably web browsers) can cache this information internally and therefore never
trigger the Domain Resource routing.
While we can (and successfully do) clear the operating system resolver's caches, there is unfortunately no way to
instruct regular applications to do the same.
</Note>