Compare commits

...

8 Commits

Author SHA1 Message Date
Viktor Liu
b7742b15ec Add ServeDNS test for non-retryable EDE short-circuit and OPT stripping 2026-05-06 10:50:29 +02:00
Viktor Liu
daa256b556 Bump miekg/dns to v1.1.72; show EDE on NOERROR; codespell whitelist 'additionals' 2026-05-06 10:48:10 +02:00
Viktor Liu
bea54ef6aa Strip OPT from response when client lacked EDNS0; add tests; show EDE in capture 2026-05-06 10:31:03 +02:00
Viktor Liu
e394818b9f Switch nonRetryableEDE to a map lookup 2026-05-06 10:24:36 +02:00
Viktor Liu
34c3c1a6f0 Whitelist 'ede' in codespell 2026-05-06 10:21:13 +02:00
Viktor Liu
f21965cdd8 Skip upstream failover when SERVFAIL carries a definitive EDE 2026-05-06 10:17:41 +02:00
Nicolas Frati
1795bc801d chores: updated discussions and issues templates (#6073) 2026-05-05 07:53:01 -07:00
Viktor Liu
31395f8bd2 [client] Use fwmark-aware route lookup for raw socket UDP checksum source (#6070)
* Use fwmark-aware route lookup for raw socket UDP checksum source

* Guard nil raw socket in sharedsock WriteTo
2026-05-05 16:18:22 +02:00
14 changed files with 942 additions and 141 deletions

View File

@@ -0,0 +1,130 @@
body:
- type: markdown
attributes:
value: |
## Ideas & Feature Requests
Use this category for feature requests, enhancements, integrations, and product ideas.
NetBird uses community traction in discussions — upvotes, replies, affected users, and use-case detail — as an input when deciding what should become a maintainer-curated issue or roadmap item. A clear problem statement is more useful than a solution-only request.
Please search first and add your use case to an existing discussion when one already exists.
- type: checkboxes
id: preflight
attributes:
label: Before posting
options:
- label: I searched existing discussions and issues for similar requests.
required: true
- label: I checked the documentation to confirm this is not already supported.
required: true
- label: This is a product idea or enhancement request, not a support question.
required: true
- label: I removed or anonymized sensitive details from examples and screenshots.
required: true
- type: dropdown
id: area
attributes:
label: Product area
description: Select every area this request touches.
multiple: true
options:
- Client / Agent
- CLI
- Desktop UI
- Mobile app
- Dashboard / Admin UI
- Management service / API
- Signal service
- Relay
- DNS
- Routes / Exit nodes
- NetBird SSH
- Access control policies
- Posture checks
- Identity provider / SSO
- Self-hosting / Deployment
- Kubernetes / Operator
- Terraform / Automation
- Documentation
- Other / not sure
validations:
required: true
- type: textarea
id: problem
attributes:
label: Problem or use case
description: What are you trying to accomplish, and what is difficult or impossible today?
placeholder: |
As a ...
I want to ...
Because ...
validations:
required: true
- type: textarea
id: proposal
attributes:
label: Proposed solution
description: Describe the behavior, workflow, API, UI, or integration you would like to see.
validations:
required: true
- type: textarea
id: alternatives
attributes:
label: Alternatives or workarounds considered
description: What have you tried today? Why is the current workaround not enough?
- type: textarea
id: impact
attributes:
label: Community impact and priority
description: Help us understand who benefits and how urgent this is.
placeholder: |
- Number of users/teams/peers affected:
- Deployment type: Cloud / self-hosted / both
- Frequency: daily / weekly / occasional
- Blocking production adoption? yes/no
- Related comments, discussions, or customer requests:
validations:
required: true
- type: textarea
id: examples
attributes:
label: Examples from other tools or products
description: If another tool solves this well, link or describe the behavior.
- type: textarea
id: security
attributes:
label: Security, privacy, and compatibility considerations
description: Note any access-control, audit, data retention, network, platform, or backward-compatibility concerns.
- type: textarea
id: implementation
attributes:
label: Implementation ideas
description: Optional. If you are familiar with the codebase or API, share possible implementation notes.
- type: dropdown
id: contribution
attributes:
label: Are you willing to help?
options:
- Yes, I can submit a PR if the approach is accepted.
- Yes, I can test or validate a proposed implementation.
- Yes, I can provide more use-case details.
- Not at this time.
validations:
required: true
- type: textarea
id: additional-context
attributes:
label: Additional context
description: Add screenshots, diagrams, links, or anything else that helps explain the request.

View File

@@ -0,0 +1,237 @@
body:
- type: markdown
attributes:
value: |
## Issue Triage
Use this category for reproducible bugs and regressions in NetBird.
The more context you include, the faster we can validate and act on your report. If you're not sure whether something is a bug, **Q&A / Support** is a good starting point — we can always move the conversation here once we've confirmed it's a product issue.
Intermittent issues are useful too. Include the trigger, frequency, timing, and any logs or debug evidence you have, and we'll work from there.
Please don't include secrets, tokens, private keys, internal hostnames, or public IPs. Security vulnerabilities should be reported through the repository security policy rather than a public discussion.
- type: checkboxes
id: preflight
attributes:
label: Before posting
options:
- label: I searched existing discussions and issues, including closed ones, and checked the relevant docs.
required: true
- label: I believe this is a product bug rather than a configuration or setup question.
required: true
- label: I can reproduce this issue, or for intermittent issues I've included trigger, frequency, and timing details below.
required: true
- label: I removed or anonymized sensitive data from logs, screenshots, and configuration.
required: true
- type: dropdown
id: area
attributes:
label: Affected area
description: Select every area this report touches.
multiple: true
options:
- Client / Agent
- Reverse Proxy
- CLI
- Desktop UI
- Mobile app
- Peer connectivity
- DNS
- Routes / Exit nodes
- NetBird SSH
- Relay / Signal / NAT traversal
- Login / Authentication / IdP
- Dashboard / Admin UI
- Management service / API
- Access control policies / Posture checks
- Self-hosting / Deployment
- Kubernetes / Operator
- Documentation
- Other / not sure
validations:
required: true
- type: dropdown
id: deployment
attributes:
label: Deployment type
options:
- NetBird Cloud
- Self-hosted - quickstart script
- Self-hosted - advanced/custom deployment
- Local development build
- Not sure / environment I do not fully control
validations:
required: true
- type: dropdown
id: platform
attributes:
label: Operating system or environment
description: Select every environment involved in the reproduction.
multiple: true
options:
- Linux
- macOS
- Windows
- Android
- iOS
- FreeBSD
- OpenWRT
- Docker
- Kubernetes
- Synology
- Browser
- Other / not sure
validations:
required: true
- type: textarea
id: version
attributes:
label: NetBird version and upgrade status
description: Run `netbird version` where applicable. For self-hosted deployments, include management, signal, relay, and dashboard versions if available. If you cannot test on a current/supported version, explain why.
placeholder: |
Example:
- Client: 0.30.2
- Management: 0.30.2
- Signal: 0.30.2
- Relay: 0.30.2
- Dashboard: 0.30.2
- Upgrade status: reproduced on current version / cannot upgrade because ...
validations:
required: true
- type: dropdown
id: regression
attributes:
label: Did this work before?
options:
- Yes, this worked before
- No, this never worked
- Not sure
validations:
required: true
- type: textarea
id: regression-details
attributes:
label: Regression details
description: If this worked before, include the last known working version, first known broken version, and any recent upgrade, configuration, network, or IdP changes.
placeholder: |
- Last known working version:
- First known broken version:
- Recent changes:
- type: textarea
id: summary
attributes:
label: Summary
description: Briefly describe the reproducible bug.
placeholder: What is broken?
validations:
required: true
- type: textarea
id: current-behavior
attributes:
label: Current behavior
description: What happens now? Include exact errors, timeouts, UI messages, or failed commands when possible.
validations:
required: true
- type: textarea
id: expected-behavior
attributes:
label: Expected behavior
description: What did you expect to happen instead?
validations:
required: true
- type: textarea
id: reproduction
attributes:
label: Steps to reproduce
description: Provide the smallest set of steps that reliably reproduces the bug. If the issue is intermittent, include the trigger, frequency, timing, and relevant timestamps.
placeholder: |
1. Configure ...
2. Run ...
3. Observe ...
For intermittent issues:
- Trigger:
- Frequency:
- Timing/timestamps:
validations:
required: true
- type: textarea
id: environment
attributes:
label: Environment and topology
description: Include the relevant topology and software involved in the reproduction. For UI/docs-only reports, write `N/A` if this does not apply. Use `None`, `Unknown`, or `N/A` where appropriate.
placeholder: |
- Peer A:
- Peer B:
- Same LAN or different networks:
- NAT/CGNAT/corporate firewall/mobile network:
- Other VPN software:
- Firewall, DNS, or endpoint security software:
- Routes, DNS, policies, posture checks, or SSH rules involved:
- IdP, reverse proxy, or browser involved:
validations:
required: true
- type: textarea
id: self-hosted-details
attributes:
label: Self-hosted details, if available
description: Optional. If you use self-hosting and have access to these details, include them. If you do not administer the environment, provide what you know and say what you cannot access.
placeholder: |
- Deployment method: quickstart / Docker Compose / Helm / operator / custom
- Management/signal/relay/dashboard versions:
- Reverse proxy:
- IdP/provider:
- STUN/TURN/coturn/relay details:
- Relevant component logs:
- type: textarea
id: logs
attributes:
label: Logs, status output, or debug evidence
description: |
For client, connectivity, DNS, route, relay/signal, or self-hosted reports, logs are essential — please include anonymized output from `netbird status -dA`, or a debug bundle via `netbird debug for 1m -AS -U`. Debug bundles are automatically deleted after 30 days.
For UI, dashboard, or documentation reports, leave the pre-filled `N/A`.
value: "N/A"
render: shell
validations:
required: true
- type: textarea
id: related-reports
attributes:
label: Related issues or discussions
description: Optional. Link similar reports you found while searching, if any.
placeholder: |
- Related issue/discussion:
- Why this may be the same or different:
- type: textarea
id: impact
attributes:
label: Impact
description: Optional. Help us understand priority. How many users, peers, environments, or workflows are affected? Is there a workaround?
placeholder: |
- Affected users/peers:
- Business or production impact:
- Workaround available:
- type: textarea
id: additional-context
attributes:
label: Additional context
description: Add links to related discussions, issues, docs, screenshots, recordings, or anything else that may help validation.

View File

@@ -0,0 +1,146 @@
body:
- type: markdown
attributes:
value: |
## Q&A / Support
Use this category for questions about configuration, setup, self-hosted deployments, troubleshooting, and general NetBird usage.
This is community support and does not provide an SLA. For NetBird Cloud support, use the official support channel linked from the issue creation page. Please do not post secrets, tokens, private keys, internal hostnames, or public IPs unless you intentionally want them public.
If your question turns into a reproducible product defect, DevRel or a maintainer may ask you to open or move the conversation to Issue Triage.
- type: checkboxes
id: preflight
attributes:
label: Before posting
options:
- label: I searched existing discussions and issues for similar questions.
required: true
- label: I reviewed the relevant NetBird documentation or troubleshooting guide.
required: true
- label: I removed or anonymized sensitive data from logs, screenshots, and configuration.
required: true
- type: dropdown
id: topic
attributes:
label: Topic
multiple: true
options:
- Getting started
- Self-hosting
- Client / Agent
- CLI
- Desktop UI
- Mobile app
- Dashboard / Admin UI
- DNS
- Routes / Exit nodes
- NetBird SSH
- Relay
- Access control policies
- Posture checks
- Identity provider / SSO
- API
- Kubernetes / Operator
- Terraform / Automation
- Documentation
- Other / not sure
validations:
required: true
- type: dropdown
id: deployment
attributes:
label: Deployment type
options:
- NetBird Cloud
- Self-hosted - quickstart script
- Self-hosted - advanced/custom deployment
- Local development build
- Not sure
validations:
required: true
- type: dropdown
id: platform
attributes:
label: Operating system or environment
multiple: true
options:
- Linux
- macOS
- Windows
- Android
- iOS
- FreeBSD
- OpenWRT
- Docker
- Kubernetes
- Synology
- Browser
- Other / not sure
validations:
required: true
- type: input
id: version
attributes:
label: NetBird version
description: Run `netbird version` where applicable. For self-hosted deployments, include component versions if relevant.
placeholder: "Example: client 0.30.2, management 0.30.2"
- type: textarea
id: question
attributes:
label: Question
description: What are you trying to understand or accomplish?
placeholder: Describe your question clearly.
validations:
required: true
- type: textarea
id: goal
attributes:
label: Desired outcome
description: What would a successful answer help you do?
placeholder: |
I want to configure ...
I expected ...
I need help deciding ...
- type: textarea
id: attempted
attributes:
label: What have you tried?
description: Include commands, documentation links, configuration attempts, or troubleshooting steps already tried.
placeholder: |
- Read ...
- Ran ...
- Changed ...
- Observed ...
- type: textarea
id: environment
attributes:
label: Relevant environment details
description: Include redacted topology, IdP/provider, reverse proxy, firewall, DNS, route, policy, or self-hosted setup details that may affect the answer.
placeholder: |
- Deployment:
- Components involved:
- Network/topology:
- Related config:
- type: textarea
id: logs
attributes:
label: Logs or output
description: Optional. Include anonymized logs, command output, screenshots, or `netbird status -dA` if relevant.
render: shell
- type: textarea
id: additional-context
attributes:
label: Additional context
description: Add links, diagrams, screenshots, or other details that may help the community answer.

View File

@@ -1,71 +0,0 @@
---
name: Bug/Issue report
about: Create a report to help us improve
title: ''
labels: ['triage-needed']
assignees: ''
---
**Describe the problem**
A clear and concise description of what the problem is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Are you using NetBird Cloud?**
Please specify whether you use NetBird Cloud or self-host NetBird's control plane.
**NetBird version**
`netbird version`
**Is any other VPN software installed?**
If yes, which one?
**Debug output**
To help us resolve the problem, please attach the following anonymized status output
netbird status -dA
Create and upload a debug bundle, and share the returned file key:
netbird debug for 1m -AS -U
*Uploaded files are automatically deleted after 30 days.*
Alternatively, create the file only and attach it here manually:
netbird debug for 1m -AS
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Additional context**
Add any other context about the problem here.
**Have you tried these troubleshooting steps?**
- [ ] Reviewed [client troubleshooting](https://docs.netbird.io/how-to/troubleshooting-client) (if applicable)
- [ ] Checked for newer NetBird versions
- [ ] Searched for similar issues on GitHub (including closed ones)
- [ ] Restarted the NetBird client
- [ ] Disabled other VPN software
- [ ] Checked firewall settings

View File

@@ -1,14 +1,26 @@
blank_issues_enabled: true
blank_issues_enabled: false
contact_links:
- name: Community Support
- name: Start an Issue Triage discussion
url: https://github.com/netbirdio/netbird/discussions/new?category=issue-triage
about: Report a bug, regression, or unexpected behavior so DevRel can validate it before it becomes an issue.
- name: Propose an idea or feature request
url: https://github.com/netbirdio/netbird/discussions/new?category=ideas-feature-requests
about: Share feature requests, enhancements, and integration ideas for community feedback and prioritization.
- name: Ask a Q&A / Support question
url: https://github.com/netbirdio/netbird/discussions/new?category=q-a-support
about: Get help with setup, configuration, self-hosting, troubleshooting, and general usage.
- name: Security vulnerability disclosure
url: https://github.com/netbirdio/netbird/security/policy
about: Please do not report security vulnerabilities in public issues or discussions.
- name: Community Support Forum
url: https://forum.netbird.io/
about: Community support forum
about: Community support forum.
- name: Cloud Support
url: https://docs.netbird.io/help/report-bug-issues
about: Contact us for support
- name: Client/Connection Troubleshooting
about: Contact NetBird for Cloud support.
- name: Client / Connection Troubleshooting
url: https://docs.netbird.io/help/troubleshooting-client
about: See our client troubleshooting guide for help addressing common issues
about: See the client troubleshooting guide for common connectivity issues.
- name: Self-host Troubleshooting
url: https://docs.netbird.io/selfhosted/troubleshooting
about: See our self-host troubleshooting guide for help addressing common issues
about: See the self-host troubleshooting guide for common deployment issues.

View File

@@ -1,20 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ['feature-request']
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

View File

@@ -0,0 +1,128 @@
name: Validated issue
description: Maintainer/DevRel only. Create an issue after a discussion has been validated or for internally validated work.
title: "[Validated]: "
body:
- type: markdown
attributes:
value: |
## Discussion-first issue policy
Issues are maintainer-curated work items. Community reports and feature requests should start in [Discussions](https://github.com/netbirdio/netbird/discussions) so DevRel can validate, reproduce, and route them before engineering time is committed.
Use this form when:
- A discussion has been validated and should become actionable work.
- A maintainer is opening internally validated work that can bypass the discussion-first flow.
Issues opened without a relevant validated discussion or maintainer context may be closed and redirected to Discussions.
- type: checkboxes
id: validation-checks
attributes:
label: Validation checklist
options:
- label: This issue is linked to a validated discussion, or it is being opened directly by a maintainer.
required: true
- label: The report has enough context for engineering to act on it without re-triaging from scratch.
required: true
- label: Sensitive data, secrets, private keys, internal hostnames, and public IPs have been removed or intentionally disclosed.
required: true
- type: dropdown
id: issue-type
attributes:
label: Issue type
options:
- Bug / Regression
- Feature / Enhancement
- Documentation
- Maintenance / Refactor
- Cross-repository coordination
- Other
validations:
required: true
- type: input
id: source-discussion
attributes:
label: Source discussion
description: Link the GitHub Discussion that was validated. Maintainers bypassing the flow can write "Maintainer-created" and explain why below.
placeholder: https://github.com/netbirdio/netbird/discussions/1234
validations:
required: true
- type: input
id: validation-owner
attributes:
label: Validation owner
description: GitHub handle of the DevRel team member or maintainer who validated this work.
placeholder: "@username"
validations:
required: true
- type: dropdown
id: target-repository
attributes:
label: Target repository
description: Where should the implementation work happen?
options:
- netbirdio/netbird
- netbirdio/dashboard
- netbirdio/kubernetes-operator
- netbirdio/docs
- Multiple repositories
- Unknown / needs routing
validations:
required: true
- type: textarea
id: summary
attributes:
label: Summary
description: Concise description of the validated work.
placeholder: What needs to be fixed, changed, documented, or built?
validations:
required: true
- type: textarea
id: evidence
attributes:
label: Validation evidence
description: For bugs, include reproduction status, affected versions, logs, and environment. For features, include community traction, affected users, and alignment notes.
placeholder: |
- Reproduced by:
- Affected versions / platforms:
- Community signal:
- Related logs or screenshots:
validations:
required: true
- type: textarea
id: scope
attributes:
label: Proposed scope
description: Describe what is in scope and, if helpful, what is explicitly out of scope.
placeholder: |
In scope:
- ...
Out of scope:
- ...
validations:
required: true
- type: textarea
id: acceptance-criteria
attributes:
label: Acceptance criteria
description: What must be true for this issue to be closed?
placeholder: |
- [ ] ...
- [ ] ...
validations:
required: true
- type: textarea
id: additional-context
attributes:
label: Additional context
description: Links to related PRs, docs, issues in other repositories, roadmap items, or implementation notes.

View File

@@ -19,7 +19,7 @@ jobs:
- name: codespell
uses: codespell-project/actions-codespell@v2
with:
ignore_words_list: erro,clienta,hastable,iif,groupd,testin,groupe,cros,ans,deriver,te,userA
ignore_words_list: erro,clienta,hastable,iif,groupd,testin,groupe,cros,ans,deriver,te,userA,ede,additionals
skip: go.mod,go.sum,**/proxy/web/**
golangci:
strategy:

View File

@@ -29,6 +29,27 @@ import (
var currentMTU uint16 = iface.DefaultMTU
// nonRetryableEDECodes lists EDE info codes (RFC 8914) for which a SERVFAIL
// from one upstream means another upstream would return the same answer:
// DNSSEC validation outcomes and policy-based blocks. Transient errors
// (network, cached, not ready) are not included.
var nonRetryableEDECodes = map[uint16]struct{}{
dns.ExtendedErrorCodeUnsupportedDNSKEYAlgorithm: {},
dns.ExtendedErrorCodeUnsupportedDSDigestType: {},
dns.ExtendedErrorCodeDNSSECIndeterminate: {},
dns.ExtendedErrorCodeDNSBogus: {},
dns.ExtendedErrorCodeSignatureExpired: {},
dns.ExtendedErrorCodeSignatureNotYetValid: {},
dns.ExtendedErrorCodeDNSKEYMissing: {},
dns.ExtendedErrorCodeRRSIGsMissing: {},
dns.ExtendedErrorCodeNoZoneKeyBitSet: {},
dns.ExtendedErrorCodeNSECMissing: {},
dns.ExtendedErrorCodeBlocked: {},
dns.ExtendedErrorCodeCensored: {},
dns.ExtendedErrorCodeFiltered: {},
dns.ExtendedErrorCodeProhibited: {},
}
func SetCurrentMTU(mtu uint16) {
currentMTU = mtu
}
@@ -243,6 +264,18 @@ func (u *upstreamResolverBase) queryUpstream(parentCtx context.Context, w dns.Re
var t time.Duration
var err error
// Advertise EDNS0 so the upstream may include Extended DNS Errors
// (RFC 8914) in failure responses; we use those to short-circuit
// failover for definitive answers like DNSSEC validation failures.
// Operate on a copy so the inbound request is unchanged: a client that
// did not advertise EDNS0 must not see an OPT in the response.
hadEdns := r.IsEdns0() != nil
reqUp := r
if !hadEdns {
reqUp = r.Copy()
reqUp.SetEdns0(upstreamUDPSize(), false)
}
var startTime time.Time
var upstreamProto *upstreamProtocolResult
func() {
@@ -250,7 +283,7 @@ func (u *upstreamResolverBase) queryUpstream(parentCtx context.Context, w dns.Re
defer cancel()
ctx, upstreamProto = contextWithupstreamProtocolResult(ctx)
startTime = time.Now()
rm, t, err = u.upstreamClient.exchange(ctx, upstream.String(), r)
rm, t, err = u.upstreamClient.exchange(ctx, upstream.String(), reqUp)
}()
if err != nil {
@@ -262,13 +295,49 @@ func (u *upstreamResolverBase) queryUpstream(parentCtx context.Context, w dns.Re
}
if rm.Rcode == dns.RcodeServerFailure || rm.Rcode == dns.RcodeRefused {
if code, ok := nonRetryableEDE(rm); ok {
resutil.SetMeta(w, "ede", edeName(code))
if !hadEdns {
stripOPT(rm)
}
u.writeSuccessResponse(w, rm, upstream, r.Question[0].Name, t, upstreamProto, logger)
return nil
}
return &upstreamFailure{upstream: upstream, reason: dns.RcodeToString[rm.Rcode]}
}
if !hadEdns {
stripOPT(rm)
}
u.writeSuccessResponse(w, rm, upstream, r.Question[0].Name, t, upstreamProto, logger)
return nil
}
// upstreamUDPSize returns the EDNS0 UDP buffer size we advertise to upstreams,
// derived from the tunnel MTU and bounded against underflow.
func upstreamUDPSize() uint16 {
if currentMTU > ipUDPHeaderSize {
return currentMTU - ipUDPHeaderSize
}
return dns.MinMsgSize
}
// stripOPT removes any OPT pseudo-RRs from the response's Extra section so
// the response complies with RFC 6891 when the client did not advertise EDNS0.
func stripOPT(rm *dns.Msg) {
if len(rm.Extra) == 0 {
return
}
out := rm.Extra[:0]
for _, rr := range rm.Extra {
if _, ok := rr.(*dns.OPT); ok {
continue
}
out = append(out, rr)
}
rm.Extra = out
}
func (u *upstreamResolverBase) handleUpstreamError(err error, upstream netip.AddrPort, startTime time.Time) *upstreamFailure {
if !errors.Is(err, context.DeadlineExceeded) && !isTimeout(err) {
return &upstreamFailure{upstream: upstream, reason: err.Error()}
@@ -330,6 +399,34 @@ func formatFailures(failures []upstreamFailure) string {
return strings.Join(parts, ", ")
}
// nonRetryableEDE returns the first non-retryable EDE code carried in the
// response, if any.
func nonRetryableEDE(rm *dns.Msg) (uint16, bool) {
opt := rm.IsEdns0()
if opt == nil {
return 0, false
}
for _, o := range opt.Option {
ede, ok := o.(*dns.EDNS0_EDE)
if !ok {
continue
}
if _, ok := nonRetryableEDECodes[ede.InfoCode]; ok {
return ede.InfoCode, true
}
}
return 0, false
}
// edeName returns a human-readable name for an EDE code, falling back to
// the numeric code when unknown.
func edeName(code uint16) string {
if name, ok := dns.ExtendedErrorCodeToString[code]; ok {
return name
}
return fmt.Sprintf("EDE %d", code)
}
// ProbeAvailability tests all upstream servers simultaneously and
// disables the resolver if none work
func (u *upstreamResolverBase) ProbeAvailability(ctx context.Context) {

View File

@@ -770,3 +770,132 @@ func TestExchangeWithFallback_TCPTruncatesToClientSize(t *testing.T) {
assert.Less(t, len(rm2.Answer), 20, "small EDNS0 client should get fewer records")
assert.True(t, rm2.Truncated, "response should be truncated for small buffer client")
}
func msgWithEDE(rcode int, codes ...uint16) *dns.Msg {
m := new(dns.Msg)
m.Response = true
m.Rcode = rcode
if len(codes) == 0 {
return m
}
opt := &dns.OPT{Hdr: dns.RR_Header{Name: ".", Rrtype: dns.TypeOPT}}
opt.SetUDPSize(dns.MinMsgSize)
for _, c := range codes {
opt.Option = append(opt.Option, &dns.EDNS0_EDE{InfoCode: c})
}
m.Extra = append(m.Extra, opt)
return m
}
func TestNonRetryableEDE(t *testing.T) {
tests := []struct {
name string
msg *dns.Msg
wantOK bool
wantCode uint16
}{
{name: "no edns0", msg: msgWithEDE(dns.RcodeServerFailure)},
{
name: "opt without ede",
msg: func() *dns.Msg {
m := msgWithEDE(dns.RcodeServerFailure)
opt := &dns.OPT{Hdr: dns.RR_Header{Name: ".", Rrtype: dns.TypeOPT}}
opt.Option = append(opt.Option, &dns.EDNS0_NSID{Code: dns.EDNS0NSID})
m.Extra = []dns.RR{opt}
return m
}(),
},
{name: "ede dnsbogus", msg: msgWithEDE(dns.RcodeServerFailure, dns.ExtendedErrorCodeDNSBogus), wantOK: true, wantCode: dns.ExtendedErrorCodeDNSBogus},
{name: "ede signature expired", msg: msgWithEDE(dns.RcodeServerFailure, dns.ExtendedErrorCodeSignatureExpired), wantOK: true, wantCode: dns.ExtendedErrorCodeSignatureExpired},
{name: "ede blocked", msg: msgWithEDE(dns.RcodeServerFailure, dns.ExtendedErrorCodeBlocked), wantOK: true, wantCode: dns.ExtendedErrorCodeBlocked},
{name: "ede prohibited", msg: msgWithEDE(dns.RcodeRefused, dns.ExtendedErrorCodeProhibited), wantOK: true, wantCode: dns.ExtendedErrorCodeProhibited},
{name: "ede cached error retryable", msg: msgWithEDE(dns.RcodeServerFailure, dns.ExtendedErrorCodeCachedError)},
{name: "ede network error retryable", msg: msgWithEDE(dns.RcodeServerFailure, dns.ExtendedErrorCodeNetworkError)},
{name: "ede not ready retryable", msg: msgWithEDE(dns.RcodeServerFailure, dns.ExtendedErrorCodeNotReady)},
{
name: "first non-retryable wins",
msg: msgWithEDE(dns.RcodeServerFailure, dns.ExtendedErrorCodeNetworkError, dns.ExtendedErrorCodeDNSBogus),
wantOK: true,
wantCode: dns.ExtendedErrorCodeDNSBogus,
},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
code, ok := nonRetryableEDE(tc.msg)
assert.Equal(t, tc.wantOK, ok, "ok should match")
if tc.wantOK {
assert.Equal(t, tc.wantCode, code, "code should match")
}
})
}
}
func TestEDEName(t *testing.T) {
assert.Equal(t, "DNSSEC Bogus", edeName(dns.ExtendedErrorCodeDNSBogus))
assert.Equal(t, "Signature Expired", edeName(dns.ExtendedErrorCodeSignatureExpired))
assert.Equal(t, "EDE 9999", edeName(9999), "unknown code falls back to numeric")
}
func TestStripOPT(t *testing.T) {
rm := &dns.Msg{
Extra: []dns.RR{
&dns.OPT{Hdr: dns.RR_Header{Name: ".", Rrtype: dns.TypeOPT}},
&dns.A{Hdr: dns.RR_Header{Name: "x.", Rrtype: dns.TypeA}, A: net.IPv4(1, 2, 3, 4)},
},
}
stripOPT(rm)
assert.Len(t, rm.Extra, 1, "OPT should be removed, A kept")
_, isOPT := rm.Extra[0].(*dns.OPT)
assert.False(t, isOPT, "remaining record must not be OPT")
}
func TestUpstreamResolver_NonRetryableEDEShortCircuits(t *testing.T) {
upstream1 := netip.MustParseAddrPort("192.0.2.1:53")
upstream2 := netip.MustParseAddrPort("192.0.2.2:53")
servfailWithEDE := msgWithEDE(dns.RcodeServerFailure, dns.ExtendedErrorCodeDNSBogus)
successResp := buildMockResponse(dns.RcodeSuccess, "192.0.2.100")
var queried []string
tracking := &trackingMockClient{
inner: &mockUpstreamResolverPerServer{
responses: map[string]mockUpstreamResponse{
upstream1.String(): {msg: servfailWithEDE},
upstream2.String(): {msg: successResp},
},
rtt: time.Millisecond,
},
queriedUpstreams: &queried,
}
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
resolver := &upstreamResolverBase{
ctx: ctx,
upstreamClient: tracking,
upstreamServers: []netip.AddrPort{upstream1, upstream2},
upstreamTimeout: UpstreamTimeout,
}
var written *dns.Msg
w := &test.MockResponseWriter{
WriteMsgFunc: func(m *dns.Msg) error {
written = m
return nil
},
}
// Client query without EDNS0 must not see an OPT in the response.
q := new(dns.Msg).SetQuestion("example.com.", dns.TypeA)
resolver.ServeDNS(w, q)
require.NotNil(t, written, "response must be written")
assert.Equal(t, dns.RcodeServerFailure, written.Rcode, "SERVFAIL must propagate")
assert.Len(t, queried, 1, "only first upstream should be queried")
assert.Equal(t, upstream1.String(), queried[0])
for _, rr := range written.Extra {
_, isOPT := rr.(*dns.OPT)
assert.False(t, isOPT, "synthetic OPT must not leak to a non-EDNS0 client")
}
}

2
go.mod
View File

@@ -72,7 +72,7 @@ require (
github.com/lrh3321/ipset-go v0.0.0-20250619021614-54a0a98ace81
github.com/mdlayher/socket v0.5.1
github.com/mdp/qrterminal/v3 v3.2.1
github.com/miekg/dns v1.1.59
github.com/miekg/dns v1.1.72
github.com/mitchellh/hashstructure/v2 v2.0.2
github.com/netbirdio/management-integrations/integrations v0.0.0-20260416123949-2355d972be42
github.com/netbirdio/signal-dispatcher/dispatcher v0.0.0-20250805121659-6b4ac470ca45

4
go.sum
View File

@@ -421,8 +421,8 @@ github.com/mdp/qrterminal/v3 v3.2.1 h1:6+yQjiiOsSuXT5n9/m60E54vdgFsw0zhADHhHLrFe
github.com/mdp/qrterminal/v3 v3.2.1/go.mod h1:jOTmXvnBsMy5xqLniO0R++Jmjs2sTm9dFSuQ5kpz/SU=
github.com/mholt/acmez/v2 v2.0.1 h1:3/3N0u1pLjMK4sNEAFSI+bcvzbPhRpY383sy1kLHJ6k=
github.com/mholt/acmez/v2 v2.0.1/go.mod h1:fX4c9r5jYwMyMsC+7tkYRxHibkOTgta5DIFGoe67e1U=
github.com/miekg/dns v1.1.59 h1:C9EXc/UToRwKLhK5wKU/I4QVsBUc8kE6MkHBkeypWZs=
github.com/miekg/dns v1.1.59/go.mod h1:nZpewl5p6IvctfgrckopVx2OlSEHPRO/U4SYkRklrEk=
github.com/miekg/dns v1.1.72 h1:vhmr+TF2A3tuoGNkLDFK9zi36F2LS+hKTRW0Uf8kbzI=
github.com/miekg/dns v1.1.72/go.mod h1:+EuEPhdHOsfk6Wk5TT2CzssZdqkmFhf8r+aVyDEToIs=
github.com/mikioh/ipaddr v0.0.0-20190404000644-d465c8ab6721 h1:RlZweED6sbSArvlE924+mUcZuXKLBHA35U7LN621Bws=
github.com/mikioh/ipaddr v0.0.0-20190404000644-d465c8ab6721/go.mod h1:Ickgr2WtCLZ2MDGd4Gr0geeCH5HybhRJbonOgQpvSxc=
github.com/mitchellh/copystructure v1.2.0 h1:vpKXTN4ewci03Vljg/q9QvCGUDttBOGBIa15WveJJGw=

View File

@@ -10,15 +10,13 @@ import (
"context"
"fmt"
"net"
"sync"
"time"
"github.com/google/gopacket"
"github.com/google/gopacket/layers"
"github.com/google/gopacket/routing"
"github.com/libp2p/go-netroute"
"github.com/mdlayher/socket"
log "github.com/sirupsen/logrus"
"github.com/vishvananda/netlink"
"golang.org/x/sync/errgroup"
"golang.org/x/sys/unix"
@@ -37,8 +35,6 @@ type SharedSocket struct {
conn6 *socket.Conn
port int
mtu uint16
routerMux sync.RWMutex
router routing.Router
packetDemux chan rcvdPacket
cancel context.CancelFunc
}
@@ -82,11 +78,6 @@ func Listen(port int, filter BPFFilter, mtu uint16) (_ net.PacketConn, err error
}
}()
rawSock.router, err = netroute.New()
if err != nil {
return nil, fmt.Errorf("failed to create raw socket router: %w", err)
}
rawSock.conn4, err = socket.Socket(unix.AF_INET, unix.SOCK_RAW, unix.IPPROTO_UDP, "raw_udp4", nil)
if err != nil {
return nil, fmt.Errorf("failed to create ipv4 raw socket: %w", err)
@@ -127,31 +118,26 @@ func Listen(port int, filter BPFFilter, mtu uint16) (_ net.PacketConn, err error
go rawSock.read(rawSock.conn6.Recvfrom)
}
go rawSock.updateRouter()
return rawSock, nil
}
// updateRouter updates the listener routing table client
// this is needed to avoid outdated information across different client networks
func (s *SharedSocket) updateRouter() {
ticker := time.NewTicker(15 * time.Second)
defer ticker.Stop()
for {
select {
case <-s.ctx.Done():
return
case <-ticker.C:
router, err := netroute.New()
if err != nil {
log.Errorf("Failed to create and update packet router for stunListener: %s", err)
continue
}
s.routerMux.Lock()
s.router = router
s.routerMux.Unlock()
// resolveSrc returns the source IP the kernel will pick for a packet sent to
// dst by these raw sockets, mirroring the fwmark the kernel will see on send.
func (s *SharedSocket) resolveSrc(dst net.IP) (net.IP, error) {
opts := &netlink.RouteGetOptions{}
if nbnet.AdvancedRouting() {
opts.Mark = nbnet.ControlPlaneMark
}
routes, err := netlink.RouteGetWithOptions(dst, opts)
if err != nil {
return nil, fmt.Errorf("route get %s: %w", dst, err)
}
for _, r := range routes {
if r.Src != nil {
return r.Src, nil
}
}
return nil, fmt.Errorf("no source IP for %s", dst)
}
// LocalAddr returns the local address, preferring IPv4 for backward compatibility.
@@ -310,15 +296,15 @@ func (s *SharedSocket) WriteTo(buf []byte, rAddr net.Addr) (n int, err error) {
DstPort: layers.UDPPort(rUDPAddr.Port),
}
s.routerMux.RLock()
defer s.routerMux.RUnlock()
_, _, src, err := s.router.Route(rUDPAddr.IP)
src, err := s.resolveSrc(rUDPAddr.IP)
if err != nil {
return 0, fmt.Errorf("got an error while checking route, err: %w", err)
return 0, fmt.Errorf("resolve source for %s: %w", rUDPAddr.IP, err)
}
rSockAddr, conn, nwLayer := s.getWriterObjects(src, rUDPAddr.IP)
if conn == nil {
return 0, fmt.Errorf("no raw socket for %s", rUDPAddr.IP)
}
if err := udp.SetNetworkLayerForChecksum(nwLayer); err != nil {
return -1, fmt.Errorf("failed to set network layer for checksum: %w", err)

View File

@@ -10,6 +10,7 @@ import (
"github.com/google/gopacket"
"github.com/google/gopacket/layers"
"github.com/miekg/dns"
)
// TextWriter writes human-readable one-line-per-packet summaries.
@@ -592,19 +593,45 @@ func formatDNSResponse(d *layers.DNS, rd string, plen int) string {
anCount := d.ANCount
nsCount := d.NSCount
arCount := d.ARCount
ede := formatEDE(d)
if d.ResponseCode != layers.DNSResponseCodeNoErr {
return fmt.Sprintf("%04x %d/%d/%d %s (%d)", d.ID, anCount, nsCount, arCount, d.ResponseCode, plen)
return fmt.Sprintf("%04x %d/%d/%d %s%s (%d)", d.ID, anCount, nsCount, arCount, d.ResponseCode, ede, plen)
}
if anCount > 0 && len(d.Answers) > 0 {
rr := d.Answers[0]
if rdata := shortRData(&rr); rdata != "" {
return fmt.Sprintf("%04x %d/%d/%d %s %s (%d)", d.ID, anCount, nsCount, arCount, rr.Type, rdata, plen)
return fmt.Sprintf("%04x %d/%d/%d %s %s%s (%d)", d.ID, anCount, nsCount, arCount, rr.Type, rdata, ede, plen)
}
}
return fmt.Sprintf("%04x %d/%d/%d (%d)", d.ID, anCount, nsCount, arCount, plen)
return fmt.Sprintf("%04x %d/%d/%d%s (%d)", d.ID, anCount, nsCount, arCount, ede, plen)
}
// dnsOPTCodeEDE is the EDNS0 option code for Extended DNS Errors (RFC 8914).
const dnsOPTCodeEDE layers.DNSOptionCode = layers.DNSOptionCode(dns.EDNS0EDE)
// formatEDE returns " EDE=Name" for the first Extended DNS Error option
// found in the response, or empty string if none is present.
func formatEDE(d *layers.DNS) string {
for _, rr := range d.Additionals {
if rr.Type != layers.DNSTypeOPT {
continue
}
for _, opt := range rr.OPT {
if opt.Code != dnsOPTCodeEDE || len(opt.Data) < 2 {
continue
}
info := binary.BigEndian.Uint16(opt.Data[:2])
name, ok := dns.ExtendedErrorCodeToString[info]
if !ok {
name = fmt.Sprintf("%d", info)
}
return " EDE=" + name
}
}
return ""
}
func shortRData(rr *layers.DNSResourceRecord) string {