mirror of
https://github.com/netbirdio/docs.git
synced 2026-04-16 07:26:35 +00:00
feat: in-depth domain Resources documentation (#465)
* feat: in-depth domain Resources documentation * feat: domain Resources troubleshooting section
This commit is contained in:
committed by
GitHub
parent
155671a8ac
commit
6928d3fbff
@@ -51,6 +51,56 @@ IP addresses, IP ranges, domain names, or wildcard domains (e.g., *.company.inte
|
||||
Support for exit nodes and site-to-site VPNs may become available in future releases. In the meantime, you can use [Network routes](/how-to/routing-traffic-to-private-networks) to add your exit-node and site-to-site routes.
|
||||
</Note>
|
||||
|
||||
### Domain Resources
|
||||
|
||||
In addition to routing IP addresses, NetBird also supports routing domain names. In the Dashboard, you can enter
a domain name (e.g. `example.com`) or a wildcard domain (e.g. `*.example.com`) in the place where you would normally
put an IP address range. NetBird clients will then start answering DNS queries for the given domain and routing its traffic.
|
||||
|
||||
Please consult the
[Debugging access to Domain Resources](/how-to/troubleshooting-client#debugging-access-to-domain-resources)
documentation to troubleshoot common issues with this type of resource yourself.
|
||||
|
||||
<Note>
|
||||
Due to a combination of a bug and an initial design choice, clients running `0.59.0` or `0.59.1` might not be able to resolve
domain Resources served by Routing Peers running versions `0.59.0` to `0.59.9` when all the Peers in the
NetBird organization are running version `0.59.0` or newer.

Installing a client in version `<= 0.58.2` or `>= 0.59.2`, or upgrading the Routing Peer to version `0.59.10+`, will
resolve this issue.
|
||||
</Note>
|
||||
|
||||
On a technical level, the feature works as follows:

1. Initially (when NetBird connects), the operating system is instructed to use NetBird to resolve the requested
   domain(s). No routing rules are configured yet.
2. An Application (for example, a web browser) requests the domain `example.com` from the Operating System:
    1. the Operating System requests the name from NetBird's Local DNS Forwarder, by default running on port `53` of:
        - for MacOS & Windows: the highest available IP address in your NetBird range, usually `100.xxx.255.254:53`
        - for other systems: the local NetBird client's IP address, e.g. `100.xxx.123.45`
    2. the Local DNS Forwarder forwards the query to the Remote DNS Resolver running on the Routing Peer's address
       and the following port:
        - `22054` for versions `0.59.0` and newer
        - `5353` for versions `0.58.x` and older
    3. the Routing Peer resolves the domain name using its local configuration (often independent of NetBird) and returns
       the result,
    4. the Local DNS Forwarder sets up routing rules for the IP addresses returned by the query
       before returning them to the Application
        - see [Trigger the Domain Resource](/how-to/troubleshooting-client#trigger-the-domain-resource)
          to observe this behaviour "in action".
3. The Application receives the result "as usual", except for a slight delay while all of the above takes place the
   first time a domain name is requested.
4. All subsequent requests to `example.com` are served instantly from the Local DNS Forwarder's cache.
|
||||
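The port choice in step 2.2 can be sketched as a small shell helper. This is only an illustration under stated assumptions: `forwarder_port` is a hypothetical function name (it is not part of the NetBird CLI), and it assumes the Routing Peer version is a plain `major.minor.patch` string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of the NetBird CLI): print the port on which a
# Routing Peer's Remote DNS Resolver is expected to listen, following the
# version ranges listed above.
forwarder_port() {
  local version="${1#v}"   # tolerate a leading "v", e.g. "v0.59.2"
  local major minor rest
  IFS=. read -r major minor rest <<<"$version"
  # 0.59.0 and newer use 22054; anything older uses the legacy port 5353.
  if (( major > 0 || minor >= 59 )); then
    echo 22054
  else
    echo 5353
  fi
}
```

It could then be combined with the verification command, for example `nslookup -port="$(forwarder_port 0.58.2)" <routed-domain> <routing-peer-ip>`, which would query the legacy port `5353`.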
|
||||
<Note>
|
||||
NetBird tries its best to automatically open the DNS forwarder ports on the Routing Peer's firewall, but this might fail on
some system configurations, and you might need to open the 2 ports listed above manually.

You can verify that the firewall lets the DNS request in by running the following command from the client's device:
`nslookup -port=22054 <routed-domain> <routing-peer-ip>`, e.g. `nslookup -port=22054 example.com 100.123.45.67`.

This is by far the most common cause of issues with domain Resources.
|
||||
</Note>
|
||||
|
||||
## Manage access to resources
|
||||
|
||||
|
||||
@@ -99,7 +99,8 @@ a [github issue](https://github.com/netbirdio/netbird/issues/new/choose) and att
|
||||
A debug archive containing the recent logs and the status at the time of execution can be generated with the following
|
||||
command.
|
||||
|
||||
Adding the `--anonymize (-A)` flag will anonymize the logs, removing sensitive information such as public IP addresses and domain
|
||||
Adding the `--anonymize (-A)` flag will anonymize the logs, removing sensitive information such as public IP addresses
|
||||
and domain
|
||||
names. In case you have tunneling issues, omitting the `--anonymize` flag might help our analysis.
|
||||
Adding the `--system-info (-S)` flag will add system information like network routes and interfaces
|
||||
|
||||
@@ -119,6 +120,7 @@ the specified time has elapsed.
|
||||
```shell
|
||||
netbird debug for 5m --system-info
|
||||
```
|
||||
|
||||
<Note>
|
||||
The flag `--anonymize (-A)` can be used to anonymize IP addresses and non-netbird.io domains in logs and status output when needed.
|
||||
</Note>
|
||||
@@ -127,17 +129,22 @@ To capture any issues arising during the `up` and `down` processes, this will se
|
||||
netbird `up` and `down` up to a few times.
|
||||
After 5 minutes the netbird status will be restored to the previous state and the debug bundle will be generated.
|
||||
|
||||
|
||||
### Debug bundle uploads
|
||||
Since version `0.43.1`, you can share debug bundle with the NetBird development team without local administrative privileges
|
||||
|
||||
Since version `0.43.1`, you can share debug bundle with the NetBird development team without local administrative
|
||||
privileges
|
||||
by using the `--upload-bundle (-U)` flag.
|
||||
It will securely generate and upload the debug bundle to our servers for access by the NetBird development team. See examples below:
|
||||
It will securely generate and upload the debug bundle to our servers for access by the NetBird development team. See
|
||||
examples below:
|
||||
|
||||
Run debug for a specific time and upload the bundle:
|
||||
|
||||
```shell
|
||||
netbird debug for 1m --system-info --upload-bundle
|
||||
```
|
||||
|
||||
To generate a bundle without restarting the client and then uploading:
|
||||
|
||||
```shell
|
||||
netbird debug bundle --system-info --upload-bundle
|
||||
```
|
||||
@@ -152,13 +159,15 @@ Local file:
|
||||
Upload file key:
|
||||
1234567890ab27fb37c88b3b4be7011e22aa2e5ca6f38ffa9c4481884941f726/12345678-90ab-cdef-1234-567890abcdef
|
||||
```
|
||||
|
||||
<Note>
|
||||
The flag `--anonymize` can be used to anonymize IP addresses and non-netbird.io domains in logs and status output when needed.
|
||||
</Note>
|
||||
### Debug bundle uploads with GUI
|
||||
Since version `0.43.2` users can upload their debug bundle via the GUI client.
|
||||
|
||||
To generate a bundle via GUI, you can access the application then go to `Settings` > `Create Debug Bundle` and follow the wizard to upload the bundle:
|
||||
To generate a bundle via GUI, you can access the application then go to `Settings` > `Create Debug Bundle` and follow
|
||||
the wizard to upload the bundle:
|
||||
|
||||
<p>
|
||||
<img src="/docs-static/img/troubleshooting-client/ui-settings.png" alt="service-user-overview" className="imagewrapper-big"/>
|
||||
@@ -171,7 +180,8 @@ To generate a bundle via GUI, you can access the application then go to `Setting
|
||||
</p>
|
||||
By default, the option to run with trace logging enabled before generating the bundle is selected. This will restart the client connections and provide `disconnect to connected` information for our engineers.
|
||||
|
||||
If you uncheck this option, a bundle will be generated without running this step. Which is very useful when you have an issue that recovers when restarting the client.
|
||||
If you uncheck this option, a bundle will be generated without running this step. Which is very useful when you have an
|
||||
issue that recovers when restarting the client.
|
||||
<p>
|
||||
<img src="/docs-static/img/troubleshooting-client/ui-bundle-success.png" alt="service-user-overview" className="imagewrapper-big"/>
|
||||
</p>
|
||||
@@ -353,9 +363,11 @@ The most notable examples of encountering the issue are:
|
||||
- the user makes a mistake and selects
|
||||
- the user uses different browser/profile or selects the wrong account during SSO login at the start of the workday,
|
||||
|
||||
If you know the exact previous Peer which was logged in, you can just delete it from Dashboard without doing anything else and attempt login again.
|
||||
If you know the exact previous Peer which was logged in, you can just delete it from Dashboard without doing anything
|
||||
else and attempt login again.
|
||||
|
||||
Otherwise, to resolve the issue, you will need to remove the file manually to use the machine as a different user/Setup Key while the NetBird client daemon is stopped:
|
||||
Otherwise, to resolve the issue, you will need to remove the file manually to use the machine as a different user/Setup
|
||||
Key while the NetBird client daemon is stopped:
|
||||
|
||||
1. `netbird service stop`
|
||||
2. `sudo rm /var/lib/netbird/default.json` (*nix) or `rm C:\ProgramData\netbird\config.json` (Windows)
|
||||
@@ -384,14 +396,14 @@ and following Netbird network resources:
|
||||
|
||||
- `peer-a`: end user's device running Netbird Client,
|
||||
- `peer-b`: a linux server inside the internal network running Netbird Client,
|
||||
- it has direct access to the whole `int-net1` IP range,
|
||||
- it has direct access to the whole `int-net1` IP range,
|
||||
- `users:employees`: a Netbird Group containing `peer-a`,
|
||||
- `routers:int-net1`: a Netbird Group containing `peer-b`,
|
||||
- `access:srv-c`: a Netbird Group used as a target of ACL rules for `srv-c` only,
|
||||
- `access:int-net1`: a Netbird Group used as a target of ACL rules for the whole subnet,
|
||||
- `net-a`: a Netbird Network
|
||||
- `net-a:srv-c`: a Network Resource handling traffic to `10.123.45.17/32` (`srv-c`),
|
||||
- `net-a:int-net1`: a Network Resource handling traffic to `10.123.45.0/24` (`int-net1`),
|
||||
- `net-a:srv-c`: a Network Resource handling traffic to `10.123.45.17/32` (`srv-c`),
|
||||
- `net-a:int-net1`: a Network Resource handling traffic to `10.123.45.0/24` (`int-net1`),
|
||||
- `route:int-net1`: a Netbird Network Route handling traffic to `10.123.45.0/24` (`int-net1`),
|
||||
- `route:srv-c`: a Netbird Network Route handling traffic to `10.123.45.17/32` (`srv-c`),
|
||||
|
||||
@@ -454,8 +466,8 @@ For Netbird network routing resources configurations you can use either (new) _N
|
||||
A Network `net-a` should have at minimum:
|
||||
|
||||
- _Network Resource_: `net-a:srv-c` with either of:
|
||||
- an _Address_ set to `10.123.45.17/32` to configure route to `srv-c` exclusively and nothing else,
|
||||
- _Assigned Groups_ set to `access:srv-c`
|
||||
- an _Address_ set to `10.123.45.17/32` to configure route to `srv-c` exclusively and nothing else,
|
||||
- _Assigned Groups_ set to `access:srv-c`
|
||||
- _Routing Peer Group_ assigned to `routers:int-net1`
|
||||
|
||||
A _Network Route_ `route:srv-c` should have at least:
|
||||
@@ -501,9 +513,9 @@ Just like with the previous section you can loosen the above example by:
|
||||
- allowing `ALL` protocol, _Ports_ will become greyed out because all traffic will be allowed,
|
||||
- creating a bidirectional rule (both arrows should be green), always true for the protocol `ALL`,
|
||||
- selecting a different source group from the pool assigned to `peer-a`,
|
||||
- it could be built-in `All` group, but it is discouraged,
|
||||
- it could be built-in `All` group, but it is discouraged,
|
||||
- selecting a different destination group from the pool assigned to `peer-b`,
|
||||
- it could be built-in `All` group, but it is discouraged,
|
||||
- it could be built-in `All` group, but it is discouraged,
|
||||
|
||||
#### Is `peer-a`'s operating system configured to use the route?
|
||||
|
||||
@@ -678,7 +690,7 @@ PS C:\Users\kdn> Get-DnsClientNrptPolicy
|
||||
Namespace : .83.100.in-addr.arpa
|
||||
...
|
||||
NameServers : 100.83.255.254
|
||||
..
|
||||
...
|
||||
|
||||
Namespace : .netbird.cloud
|
||||
...
|
||||
@@ -752,7 +764,7 @@ You can validate whether this is the issue in your setup by performing following
|
||||
3. resolve the domain, eg: `dscacheutil -q host -a name <domain>`
|
||||
4. `netbird up` / `Connect`
|
||||
5. check whether `dscacheutil -q host -a name <domain>` works
|
||||
- if it doesn't, flush the cache and retry
|
||||
- if it doesn't, flush the cache and retry
|
||||
|
||||
#### Verifying the nameservers are properly registered in Linux operating system
|
||||
|
||||
@@ -795,3 +807,215 @@ To configure `int-dns2`, while following _Access from `peer-a` to `srv-c`_ secti
|
||||
address range,
|
||||
|
||||
To test the configuration in practice please refer to previous section _Public nameservers_.
|
||||
|
||||
## Debugging access to Domain Resources
|
||||
|
||||
While we strive to make them "just work", there still are and will be cases of domain-based Resources not behaving
correctly. This can happen for a myriad of reasons, starting with the client's local device management software or system
firewall, through Routing Peer issues (usually a firewall), and ending with a relatively simple misconfiguration of Access
Policies that prevents connectivity from being established.
This section provides general directions for verifying connectivity at every step involved in handling
Domain Resources, to better understand where the issue might lie.
|
||||
|
||||
For an in-depth overview of the mechanism, please read the [Domain Resources](/how-to/networks#domain-resources) section.
|
||||
|
||||
Analyzing those issues will take a "backwards" approach (based on the most common issues): we will first confirm
that the Routing Peer itself is working as expected and check the client's operating system configuration as one of the
last steps.
|
||||
|
||||
For the remainder of the section let's assume:
|
||||
|
||||
- there is a `*.nb.test` Network Resource configured,
- we are trying to access the `srv.nb.test` domain,
- the `zxc.nb.test` domain does not exist, it's used to demonstrate errors,
- the Routing Peer's NetBird address is `100.83.136.209`,
    - it's named `brys-vm-nbt-ubuntu-isolated-02` when referred to in the outputs,
- the client is named `brys-vm-nbt-ubuntu-01` when referred to in the outputs,
    - the client is running Ubuntu, but a lot of the commands used work uniformly across all platforms,
    - its IP address is `100.83.73.97`,
    - on MacOS & Windows you would use `100.83.255.254` to access the local DNS forwarder instead,
- the Resource is running on `brys-vm-nbt-ubuntu-isolated-01` when referred to in the outputs,
- we will only check the new port `22054`, but the steps might need repeating for port `5353` for legacy clients.
|
||||
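To illustrate how the MacOS & Windows forwarder address relates to the client address in the assumptions above, here is a minimal sketch. The function name `local_forwarder_ip` is hypothetical, and the sketch assumes the NetBird range is a `/16` (as in `100.83.0.0/16`), which may not hold for every account.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the MacOS/Windows Local DNS Forwarder address
# from a client's NetBird IP. ASSUMPTION: the NetBird range is a /16, so the
# highest available address is <first>.<second>.255.254.
local_forwarder_ip() {
  local first second rest
  IFS=. read -r first second rest <<<"$1"
  echo "${first}.${second}.255.254"
}
```

For the client used in this section, `local_forwarder_ip 100.83.73.97` prints `100.83.255.254`.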
|
||||
<Note>
|
||||
Be aware that port `5353` is the well-known Multicast DNS port (aka Avahi aka Bonjour,
used for printer sharing, Chromecast, etc.) and therefore it might be occupied by other software
running on the machine. As a result, (old) Routing Peers might be prevented from routing Domain Resources.

While not an issue in regular server operations, it might come as a surprise to find that port `5353`
is occupied by a Chrome (or one of its derivatives) web browser on your remotely accessible Windows Server machine.

This is the primary reason we have switched to the new port `22054`. We strongly advise you to update your fleet
to the latest version (no older than `0.59.10`) to address this issue.
|
||||
</Note>
|
||||
|
||||
### Is Routing Peer correctly resolving queries?
|
||||
|
||||
While in practice it is almost never the issue, it is always good to double-check whether the Routing Peer itself is able
to resolve the requested domain as-is and whether it can access the target resource.

Please refer
to the [Verifying the DNS names resolve properly in practice](#verifying-the-dns-names-resolve-properly-in-practice)
section for operating-system-specific commands, substituting `srv.nb.test` for the domain.

It also would not hurt to check whether the Routing Peer has actual network access to the routed resource, for example with `nc`.

For TCP services you should see something like this:
|
||||
|
||||
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 srv.nb.test 80
Connection to srv.nb.test (192.168.100.10) 80 port [tcp/http] succeeded!
kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 srv.nb.test 12345
nc: connect to srv.nb.test (192.168.100.10) port 12345 (tcp) failed: Connection refused
```
|
||||
|
||||
For UDP you can use:
|
||||
|
||||
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 -u srv.nb.test 12345 ; echo $?
Connection to srv.nb.test (192.168.100.10) 12345 port [udp/*] succeeded!
0
kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 -u srv.nb.test 12347 ; echo $?
1
```
|
||||
|
||||
### Is the remote DNS resolver accessible to the client?
|
||||
|
||||
We want to confirm that a client Peer can reach and use the Routing Peer's DNS resolver. This step will rule out any
firewall-related issues with the Routing Peer. If the following command fails, you will need to open port `22054`
in the Routing Peer's firewall software.
|
||||
|
||||
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 -port=22054 srv.nb.test 100.83.136.209
Server: 100.83.136.209
Address: 100.83.136.209#22054

Non-authoritative answer:
Name: srv.nb.test
Address: 192.168.100.10

kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 -port=22054 zxc.nb.test 100.83.136.209
Server: 100.83.136.209
Address: 100.83.136.209#22054

** server can't find zxc.nb.test: NXDOMAIN
```
|
||||
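When repeating this check across several Routing Peers, it may help to reduce the `nslookup` output to a single verdict. The sketch below is only an illustration: `classify_nslookup` is a hypothetical helper, and the timeout wording it matches (`timed out`, `no servers could be reached`) is an assumption based on common `nslookup` builds, so it may need adjusting on your platform. Note that an `NXDOMAIN` answer still proves that the resolver is reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify `nslookup` output read from stdin.
# ASSUMPTION: timeouts are reported with the phrases matched below.
classify_nslookup() {
  local output
  output="$(cat)"
  case "$output" in
    *"timed out"*|*"no servers could be reached"*)
      echo "unreachable" ;;          # likely a firewall issue on the Routing Peer
    *NXDOMAIN*)
      echo "reachable-nxdomain" ;;   # resolver answered, the domain does not exist
    *"Name:"*)
      echo "reachable-resolved" ;;   # resolver answered with an address
    *)
      echo "unknown" ;;
  esac
}
```

A possible invocation is `nslookup -timeout=1 -port=22054 srv.nb.test 100.83.136.209 2>&1 | classify_nslookup`.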
|
||||
### Trigger the Domain Resource
|
||||
|
||||
I have yet to see a local DNS forwarder fail, but using it is a good way of forcing the NetBird client to set up
routing for the domain (see the [Domain Resources](/how-to/networks#domain-resources) section for an explanation).
|
||||
|
||||
<Note>
|
||||
On MacOS & Windows the IP address would always be `100.83.255.254` instead of `100.83.73.97`.
|
||||
</Note>
|
||||
|
||||
Note that the IP addresses are initially missing from the routing table (`ip route show` on Linux) and
get added only after resolving the domain for the first time using the local DNS Forwarder.
|
||||
|
||||
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:

- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs: -
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 srv.nb.test 100.83.73.97
Server: 100.83.73.97
Address: 100.83.73.97#53

Non-authoritative answer:
Name: srv.nb.test
Address: 192.168.100.10
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
192.168.100.10 dev wt0 table 7120
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:

- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs:
[srv.nb.test.]: 192.168.100.10
```
|
||||
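The "route appears after the first lookup" check above can be scripted. The following is a minimal sketch, assuming the route line format shown in the output above; `has_route` is a hypothetical helper, and the default interface name `wt0` is taken from that output and may differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if stdin (e.g. the output of
# `ip route show table all`) contains a host route for the given IP
# through the given interface (defaults to wt0, as seen above).
has_route() {
  local ip="$1" dev="${2:-wt0}"
  awk -v ip="$ip" -v dev="$dev" '
    $1 == ip && $2 == "dev" && $3 == dev { found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

For example, `ip route show table all | has_route 192.168.100.10 && echo "route present"` would print `route present` only after the domain has been resolved through the local DNS forwarder.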
|
||||
### Verifying the Domain Resource registration with the Operating System
|
||||
|
||||
After we have confirmed **everything** is working within NetBird's scope of operation, let's restart NetBird and
|
||||
check whether the Operating System's default DNS resolver is resolving the Domain Resource correctly.
|
||||
|
||||
<Note>
|
||||
See [Debugging access to network resources > Public nameservers](#public-nameservers) for the equivalent
|
||||
MacOS and Windows debugging steps.
|
||||
</Note>
|
||||
|
||||
<Note>
|
||||
You might be surprised by a simple `netbird down` followed by `netbird up` not clearing the `Resolved IPs`:

```shell
kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird down
Disconnected
kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird up
Connected
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:

- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs:
[srv.nb.test.]: 192.168.100.10
```

Don't be alarmed, this is working as expected (the results are simply stored within the client daemon's
in-memory cache), but routing rules are still properly cleared:

```shell
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
kdn@brys-vm-nbt-ubuntu-01:~$
```
|
||||
</Note>
|
||||
|
||||
We will start "from scratch", by restarting the whole NetBird service to purge all caches and proceed with the tests:
|
||||
|
||||
```shell
kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird service restart
NetBird service has been restarted
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:

- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs: -
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
kdn@brys-vm-nbt-ubuntu-01:~$ resolvectl query srv.nb.test
srv.nb.test: 192.168.100.10 -- link: wt0

-- Information acquired via protocol DNS in 8.1ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100
192.168.100.10 dev wt0 table 7120
kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls
Available Networks:

- ID: *.nb.test
Domains: *.nb.test
Status: Selected
Resolved IPs:
[srv.nb.test.]: 192.168.100.10
```
|
||||
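To compare the `Resolved IPs` before and after a lookup without reading the whole listing, the entries can be extracted with a small filter. This is a sketch under the assumption that resolved entries keep the `[domain]: address` form shown above; `resolved_ips` is a hypothetical helper, not a NetBird command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netbird networks ls` output from stdin and print
# one "<domain> <addresses>" line per resolved entry.
# ASSUMPTION: entries look like "[srv.nb.test.]: 192.168.100.10".
resolved_ips() {
  sed -n 's/^[[:space:]]*\[\(.*\)]:[[:space:]]*\(.*\)$/\1 \2/p'
}
```

Running `netbird networks ls | resolved_ips` right after a service restart should print nothing, and should list the domain once it has been resolved through the operating system's resolver.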
|
||||
<Note>
|
||||
Be aware that the operating system resolver might not be the only source of domain name resolution, but querying through it is
a hard requirement for getting Domain Resources to start working.

Different applications (most notably web browsers) can cache this information internally and therefore never
activate the Domain Resource routing.

While we can (and do successfully) clear the operating system resolver's caches, there is unfortunately no way to
instruct regular applications to do the same.
|
||||
</Note>
|
||||