diff --git a/src/pages/how-to/networks.mdx b/src/pages/how-to/networks.mdx index 138d1c12..9c60a17f 100644 --- a/src/pages/how-to/networks.mdx +++ b/src/pages/how-to/networks.mdx @@ -51,6 +51,56 @@ IP addresses, IP ranges, domain names, or wildcard domains (e.g., *.company.inte Support to exit nodes and site-2-site VPNs may become available in future releases. In the meantime you can use [Network routes](/how-to/routing-traffic-to-private-networks) add your exit-node routes and site-2-site routes. +### Domain Resources + +In addition to routing IP addresses, NetBird also supports routing domain names. In the Dashboard you can just pass +a domain name (eg: `example.com`) or a wildcard domain (eg: `*.example.com`) in the place where you would normally +put an IP address range. Then NetBird clients will start responding to and routing the given domain. + +Please consult the +[Debugging access to Domain Resources](/how-to/troubleshooting-client#debugging-access-to-domain-resources) +documentation to troubleshoot common issues with this type of resource yourself. + + + Due to a combination of a bug and an initial design choice, clients running `0.59.0` & `0.59.1` might not be able to resolve + domain Resources served by Routing Peers running versions `0.59.0` to `0.59.9` when all the Peers in the + NetBird organization are running versions `0.59.0` or newer. + + Installing a client version `<= 0.58.2` or `>= 0.59.2`, or upgrading the Routing Peer to version `0.59.10+`, will + resolve this issue. + + +On a technical level the feature works as follows: + +1. Initially (when NetBird connects) the operating system is instructed to use NetBird to resolve the requested + domain(s). No routing rules are configured yet. +2. An Application (could be a web browser) requests a domain `example.com` from the Operating System + 1. 
the Operating System requests a name from NetBird's Local DNS Forwarder, by default running on port `53` of: + - for MacOS & Windows: the highest available IP address in your NetBird range, usually `100.xxx.255.254:53` + - for other systems: the local NetBird client's IP address, eg: `100.xxx.123.45` + 2. the Local DNS Forwarder forwards the query to the Remote DNS Resolver running on the Routing Peer's address + and the following port: + - `22054` for version `0.59.0` and newer + - `5353` for versions `0.58.x` and older + 3. the Routing Peer resolves the domain name using its local configuration (often independent of NetBird) and returns + the result. + 4. the Local DNS Forwarder sets up routing rules for IP addresses returned from the query, + before returning them to the Application + - see [Trigger the Domain Resource](/how-to/troubleshooting-client#trigger-the-domain-resource) + to observe this behaviour "in action". +3. the Application receives the result "as usual", except for a slight delay while all of the above takes place the + first time a domain name is requested, +4. all subsequent requests to `example.com` will be served instantly from the Local DNS Forwarder's cache + + + NetBird tries its best to automatically open up DNS forwarder ports on Routing Peer's firewalls, but might fail on + some system configurations and you might need to open up the two ports above manually. + + You can verify that the firewall lets the DNS request in using the following command issued from the client's device + `nslookup -port=22054 `, eg: `nslookup -port=22054 example.com 100.123.45.67`. + + This is by far the most common cause of issues with domain Resources. 
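The port selection and the firewall check described above can be sketched as a small shell helper. This is a minimal sketch, not part of NetBird: `forwarder_port` and `forwarder_check_cmd` are hypothetical names, the version string is assumed to be a plain `major.minor.patch` value without a `v` prefix, and the script only prints the `nslookup` command instead of running it.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and not part of the NetBird CLI.

# Print the remote DNS resolver port for a NetBird version, following the
# list above: 22054 for 0.59.0 and newer, 5353 for older versions.
# Assumes a plain "major.minor.patch" string (no "v" prefix).
forwarder_port() (
  major=$(printf '%s' "$1" | cut -d. -f1)
  minor=$(printf '%s' "$1" | cut -d. -f2)
  if [ "$major" -gt 0 ] || [ "$minor" -ge 59 ]; then
    echo 22054
  else
    echo 5353
  fi
)

# Print (but do not run) the nslookup check for a domain against a
# Routing Peer's NetBird IP address.
forwarder_check_cmd() {
  echo "nslookup -port=$(forwarder_port "$3") $1 $2"
}

# Example values taken from the text above.
forwarder_check_cmd example.com 100.123.45.67 0.59.2
forwarder_check_cmd example.com 100.123.45.67 0.58.2
```

Running the printed commands from the client's device should show whether the corresponding port is reachable through the Routing Peer's firewall.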
+ ## Manage access to resources diff --git a/src/pages/how-to/troubleshooting-client.mdx b/src/pages/how-to/troubleshooting-client.mdx index 7728fbaf..ac5f73dd 100644 --- a/src/pages/how-to/troubleshooting-client.mdx +++ b/src/pages/how-to/troubleshooting-client.mdx @@ -99,7 +99,8 @@ a [github issue](https://github.com/netbirdio/netbird/issues/new/choose) and att A debug archive containing the recent logs and the status at the time of execution can be generated with the following command. -Adding the `--anonymize (-A)` flag will anonymize the logs, removing sensitive information such as public IP addresses and domain +Adding the `--anonymize (-A)` flag will anonymize the logs, removing sensitive information such as public IP addresses +and domain names. In case you have tunneling issues, omitting the `--anonymize` flag might help our analysis. Adding the `--system-info (-S)` flag will add system information like network routes and interfaces @@ -119,6 +120,7 @@ the specified time has elapsed. ```shell netbird debug for 5m --system-info ``` + The flag `--anonymize (-A)` can be used to anonymize IP addresses and non-netbird.io domains in logs and status output when needed. @@ -127,17 +129,22 @@ To capture any issues arising during the `up` and `down` processes, this will se netbird `up` and `down` up to a few times. After 5 minutes the netbird status will be restored to the previous state and the debug bundle will be generated. - ### Debug bundle uploads -Since version `0.43.1`, you can share debug bundle with the NetBird development team without local administrative privileges + +Since version `0.43.1`, you can share debug bundle with the NetBird development team without local administrative +privileges by using the `--upload-bundle (-U)` flag. -It will securely generate and upload the debug bundle to our servers for access by the NetBird development team. 
See examples below: +It will securely generate and upload the debug bundle to our servers for access by the NetBird development team. See +examples below: Run debug for a specific time and upload the bundle: + ```shell netbird debug for 1m --system-info --upload-bundle ``` + To generate a bundle without restarting the client and then uploading: + ```shell netbird debug bundle --system-info --upload-bundle ``` @@ -152,13 +159,15 @@ Local file: Upload file key: 1234567890ab27fb37c88b3b4be7011e22aa2e5ca6f38ffa9c4481884941f726/12345678-90ab-cdef-1234-567890abcdef ``` + The flag `--anonymize` can be used to anonymize IP addresses and non-netbird.io domains in logs and status output when needed. ### Debug bundle uploads with GUI Since version `0.43.2` users can upload their debug bundle via the GUI client. -To generate a bundle via GUI, you can access the application then go to `Settings` > `Create Debug Bundle` and follow the wizard to upload the bundle: +To generate a bundle via GUI, you can access the application then go to `Settings` > `Create Debug Bundle` and follow +the wizard to upload the bundle:

service-user-overview @@ -171,7 +180,8 @@ To generate a bundle via GUI, you can access the application then go to `Setting

By default running with trace log enable before generating the bundle is selected. This will restart the client connections and provide a `disconnect to connected` information for our engineers. -If you uncheck this option, a bundle will be generated without running this step. Which is very useful when you have an issue that recovers when restarting the client. +If you uncheck this option, a bundle will be generated without running this step. Which is very useful when you have an +issue that recovers when restarting the client.

service-user-overview

@@ -353,9 +363,11 @@ The most notable examples of encountering the issue are: - the user makes a mistake and selects - the user uses different browser/profile or selects the wrong account during SSO login at the start of the workday, -If you know the exact previous Peer which was logged in, you can just delete it from Dashboard without doing anything else and attempt login again. +If you know the exact previous Peer which was logged in, you can just delete it from Dashboard without doing anything +else and attempt login again. -Otherwise, to resolve the issue, you will need to remove the file manually to use the machine as a different user/Setup Key while the NetBird client daemon is stopped: +Otherwise, to resolve the issue, you will need to remove the file manually to use the machine as a different user/Setup +Key while the NetBird client daemon is stopped: 1. `netbird service stop` 2. `sudo rm /var/lib/netbird/default.json` (*nix) or `rm C:\ProgramData\netbird\config.json` (Windows) @@ -384,14 +396,14 @@ and following Netbird network resources: - `peer-a`: end user's device running Netbird Client, - `peer-b`: a linux server inside the internal network running Netbird Client, - - it has direct access to the whole `int-net1` IP range, + - it has direct access to the whole `int-net1` IP range, - `users:employees`: a Netbird Group containing `peer-a`, - `routers:int-net1`: a Netbird Group containing `peer-b`, - `access:srv-c`: a Netbird Groups used as a target of ACL rules for `srv-c` only, - `access:int-net1`: a Netbird Groups used as a target of ACL rules for the whole subnet, - `net-a`: a Netbird Network - - `net-a:srv-c`: a Network Resource handling traffic to `10.123.45.17/32` (`srv-c`), - - `net-a:int-net1`: a Network Resource handling traffic to `10.123.45.0/24` (`int-net1`), + - `net-a:srv-c`: a Network Resource handling traffic to `10.123.45.17/32` (`srv-c`), + - `net-a:int-net1`: a Network Resource handling traffic to `10.123.45.0/24` (`int-net1`), - 
`route:int-net1`: a Netbird Network Route handling traffic to `10.123.45.0/24` (`int-net1`), - `route:srv-c`: a Netbird Network Route handling traffic to `10.123.45.17/32` (`srv-c`), @@ -454,8 +466,8 @@ For Netbird network routing resources configurations you can use either (new) _N A Network `net-a` should have at minimum: - _Network Resource_: `net-a:srv-c` with either of: - - an _Address_ set to `10.123.45.17/32` to configure route to `srv-c` exclusively and nothing else, - - _Assigned Groups_ set to `access:srv-c` + - an _Address_ set to `10.123.45.17/32` to configure route to `srv-c` exclusively and nothing else, + - _Assigned Groups_ set to `access:srv-c` - _Routing Peer Group_ assigned to `routers:int-net1` A _Network Route_ `route:srv-c` should have at least: @@ -501,9 +513,9 @@ Just like with the previous section you can loosen the above example by: - allowing `ALL` protocol, _Ports_ will become greyed out because all traffic will be allowed, - creating a bidirectional rule (both arrows should be green), always true for the protocol `ALL`, - selecting a different source group from the pool assigned to `peer-a`, - - it could be built-in `All` group, but it is discouraged, + - it could be built-in `All` group, but it is discouraged, - selecting a different destination group from the pool assigned to `peer-b`, - - it could be built-in `All` group, but it is discouraged, + - it could be built-in `All` group, but it is discouraged, #### Is `peer-a`'s operating system configured to use the route? @@ -678,7 +690,7 @@ PS C:\Users\kdn> Get-DnsClientNrptPolicy Namespace : .83.100.in-addr.arpa ... NameServers : 100.83.255.254 -.. +... Namespace : .netbird.cloud ... @@ -752,7 +764,7 @@ You can validate whether this is the issue in your setup by performing following 3. resolve the domain, eg: `dscacheutil -q host -a name ` 4. `netbird up` / `Connect` 5. 
check whether `dscacheutil -q host -a name ` works - - if it doesn't flush the cache and retry + - if it doesn't flush the cache and retry #### Verifying the nameservers are properly registered in Linux operating system @@ -795,3 +807,215 @@ To configure `int-dns2`, while following _Access from `peer-a` to `srv-c`_ secti address range, To test the configuration in practice please refer to previous section _Public nameservers_. + +## Debugging access to Domain Resources + +While we strive to make them "just work", there still are and will be cases of domain-based Resources not behaving +correctly. This can happen for a myriad of reasons, starting with the client's local device management software or system +firewall, through Routing Peer issues (usually a firewall) and ending with a relatively simple Access Policies +misconfiguration that prevents connectivity from being established. +This section will provide general directions for verifying connectivity on every step involved in handling +the Domain Resources, to better understand where the issue might lie. + +For an in-depth overview of the mechanism please read the [Domain Resources](/how-to/networks#domain-resources) section. + +Analyzing those issues will take a "backwards" approach (based on the most common issues), where we will first confirm +that the Routing Peer itself is working as expected and will check the client's operating system configuration as one of the +last steps. 
+ +For the remainder of the section let's assume: + +- there is a `*.nb.test` Network Resource configured, +- we are trying to access a `srv.nb.test` domain, +- a `zxc.nb.test` domain does not exist, it's used to demonstrate errors, +- the Routing Peer's NetBird address is `100.83.136.209` + - it's named `brys-vm-nbt-ubuntu-isolated-02`, when referred to in the outputs +- the client is named `brys-vm-nbt-ubuntu-01`, when referred to in the outputs + - the client is running Ubuntu, but many of the commands used work uniformly across all platforms, + - its IP address is `100.83.73.97`, + - on MacOS & Windows you would use `100.83.255.254` to access the local DNS forwarder instead, +- the Resource is running on `brys-vm-nbt-ubuntu-isolated-01`, when referred to in the outputs +- we will only check the new port `22054`, but steps might need repeating for port `5353` for legacy clients, + + + Be aware that the port `5353` is a well-known Multicast DNS port (aka Avahi aka Bonjour, + used for: printer sharing, Chromecast etc.) and therefore it might be occupied by other software + running on the machine. As a result (old) Routing Peers might be prevented from routing Domain Resources. + + While not an issue in regular server operations, it might come as a surprise to find that the port `5353` + is occupied by the Chrome web browser (or one of its derivatives) on your remotely accessible Windows Server machine. + + This is the primary reason we have switched to the new port `22054`. We strongly advise you to update your fleet + to the latest version (no older than `0.59.10`) to address this issue. + + +### Is Routing Peer correctly resolving queries? + +While in practice it is almost never the issue, it is always good to double-check whether the Routing Peer itself is able +to resolve the requested domain as-is and whether it can access the target resource. 
+ +Please refer +to the [Verifying the DNS names resolve properly in practice](#verifying-the-dns-names-resolve-properly-in-practice) +section for operating-system-specific commands, substituting `srv.nb.test` for the domain. + +It also would not hurt to check whether the Routing Peer has actual network access to the routed resource. + +For TCP services you should see something like this: + +```shell +kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 srv.nb.test 80 +Connection to srv.nb.test (192.168.100.10) 80 port [tcp/http] succeeded! +kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 srv.nb.test 12345 +nc: connect to srv.nb.test (192.168.100.10) port 12345 (tcp) failed: Connection refused +``` + +For UDP you can use: + +```shell +kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 -u srv.nb.test 12345 ; echo $? +Connection to srv.nb.test (192.168.100.10) 12345 port [udp/*] succeeded! +0 +kdn@brys-vm-nbt-ubuntu-01:~$ nc -vz -w 1 -u srv.nb.test 12347 ; echo $? +1 +``` + +### Is the remote DNS resolver accessible to the client? + +We want to confirm that a client Peer can reach and use the Routing Peer's DNS resolver; this step will rule out any +firewall-related issues with the Routing Peer. If the following command fails you will need to open up port `22054` +in the Routing Peer's firewall software. + +```shell +kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 -port=22054 srv.nb.test 100.83.136.209 +Server: 100.83.136.209 +Address: 100.83.136.209#22054 + +Non-authoritative answer: +Name: srv.nb.test +Address: 192.168.100.10 + +kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 -port=22054 zxc.nb.test 100.83.136.209 +Server: 100.83.136.209 +Address: 100.83.136.209#22054 + +** server can't find zxc.nb.test: NXDOMAIN + +``` + +### Trigger the Domain Resource + +I have yet to see a local DNS forwarder fail, but using it is a good way of forcing the NetBird client to set up +routing for the domain (see [Domain Resources](/how-to/networks#domain-resources) for an explanation). 
+ + + On MacOS & Windows the IP address would always be `100.83.255.254` instead of `100.83.73.97`. + + +Note that the IP addresses are initially missing from the routing table (`ip route show` on Linux), but +get added after resolving the domain for the first time using the local DNS Forwarder. + +```shell +kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls +Available Networks: + + - ID: *.nb.test + Domains: *.nb.test + Status: Selected + Resolved IPs: - +kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100 +kdn@brys-vm-nbt-ubuntu-01:~$ nslookup -timeout=1 srv.nb.test 100.83.73.97 +Server: 100.83.73.97 +Address: 100.83.73.97#53 + +Non-authoritative answer: +Name: srv.nb.test +Address: 192.168.100.10 +kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100 +192.168.100.10 dev wt0 table 7120 +kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls +Available Networks: + + - ID: *.nb.test + Domains: *.nb.test + Status: Selected + Resolved IPs: + [srv.nb.test.]: 192.168.100.10 +``` + +### Verifying the Domain Resource registration with the Operating System + +After we have confirmed **everything** is working within NetBird's scope of operation, let's restart NetBird and +check whether the Operating System's default DNS resolver is resolving the Domain Resource correctly. + + + See [Debugging access to network resources > Public nameservers](#public-nameservers) for the equivalent + MacOS and Windows debugging steps. 
+ + + + You might be surprised by a simple `netbird down` followed by `netbird up` not clearing the `Resolved IPs`: + + ```shell + kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird down + Disconnected + kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird up + Connected + kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls + Available Networks: + + - ID: *.nb.test + Domains: *.nb.test + Status: Selected + Resolved IPs: + [srv.nb.test.]: 192.168.100.10 + ``` + + Don't be alarmed, this is working as expected (the results are simply stored within the client daemon's + in-memory cache), but routing rules are still properly cleared: + ```shell + kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100 + kdn@brys-vm-nbt-ubuntu-01:~$ + ``` + + +We will start "from scratch", by restarting the whole NetBird service to purge all caches and proceed with the tests: + +```shell +kdn@brys-vm-nbt-ubuntu-01:~$ sudo netbird service restart +NetBird service has been restarted +kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls +Available Networks: + + - ID: *.nb.test + Domains: *.nb.test + Status: Selected + Resolved IPs: - +kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100 +kdn@brys-vm-nbt-ubuntu-01:~$ resolvectl query srv.nb.test +srv.nb.test: 192.168.100.10 -- link: wt0 + +-- Information acquired via protocol DNS in 8.1ms. +-- Data is authenticated: no; Data was acquired via local or encrypted transport: no +-- Data from: network +kdn@brys-vm-nbt-ubuntu-01:~$ ip route show table all | grep 192.168.100 +192.168.100.10 dev wt0 table 7120 +kdn@brys-vm-nbt-ubuntu-01:~$ netbird networks ls +Available Networks: + + - ID: *.nb.test + Domains: *.nb.test + Status: Selected + Resolved IPs: + [srv.nb.test.]: 192.168.100.10 +``` + + + Be aware that operating system resolver might not be the only source of domains, but querying through it is + a hard requirement for getting Domain Resources to start working. 
+ + Different applications (most notably web browsers) can cache this information internally and therefore never + activate the Domain Resource routing. + + While we can (and do successfully) clear the operating system resolver's caches, there is unfortunately no way to + instruct regular applications to do the same. + \ No newline at end of file
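The before/after routing-table comparison used in the troubleshooting steps above (running `ip route show table all | grep 192.168.100` around a DNS query) could be wrapped in a small helper for scripted checks. This is a minimal sketch under stated assumptions: `has_route` is a hypothetical function, not a NetBird command, and the default interface name `wt0` is simply the one visible in the example outputs above.

```shell
#!/bin/sh
# Sketch only: has_route is a hypothetical helper, not a NetBird command.
# It reads a routing-table dump (e.g. from `ip route show table all`) on stdin
# and succeeds when the given IP is routed through the given interface
# (defaults to wt0, the interface name seen in the outputs above).
has_route() {
  awk -v ip="$1" -v dev="${2:-wt0}" \
    '$1 == ip && $2 == "dev" && $3 == dev { found = 1 } END { exit !found }'
}

# Offline demonstration using the routing-table line shown in the outputs above.
sample='192.168.100.10 dev wt0 table 7120'
if printf '%s\n' "$sample" | has_route 192.168.100.10; then
  echo "route present"
else
  echo "route missing"
fi
```

On a live Linux client the same helper would be fed from the real table, for example `ip route show table all | has_route 192.168.100.10`, once before and once after resolving `srv.nb.test`.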