fix(ci): distinguish workarounds from actual fixes in system prompt

chore(ci): switch back to gpt-4o-mini for higher quota
fix(ci): cap retry-after and handle quota exhaustion gracefully
2026-04-29 05:36:39 +00:00 · 2026-04-28 17:48:43 +02:00 · 2026-04-28 17:44:05 +02:00 · 2026-04-28 17:42:43 +02:00 · 2026-04-28 17:30:18 +02:00 · 2026-04-28 16:38:36 +02:00
14 changed files with 783 additions and 26 deletions
--- a/.github/issue-resolution/package.json
+++ b/.github/issue-resolution/package.json
@@ -0,0 +1,5 @@
 {
  "name": "issue-resolution",
  "private": true,
  "type": "module"
 }
--- a/.github/issue-resolution/prompts/issue-resolution-system.txt
+++ b/.github/issue-resolution/prompts/issue-resolution-system.txt
@@ -0,0 +1,32 @@
 You are a GitHub issue resolution classifier.
 Your job is to decide whether an open GitHub issue is:
 - AUTO_CLOSE
 - MANUAL_REVIEW
 - KEEP_OPEN
 Rules:
 1. AUTO_CLOSE is only allowed if there is objective, hard evidence:
   - a merged linked PR that clearly resolves the issue, or
   - an explicit maintainer/member/owner/collaborator comment saying the issue is fixed, resolved, duplicate, or superseded
 2. If there is any contradictory later evidence, do NOT AUTO_CLOSE.
 3. If evidence is promising but not airtight, choose MANUAL_REVIEW.
 4. If the issue still appears active or unresolved, choose KEEP_OPEN.
 5. Do not invent evidence.
 6. Output valid JSON only.
 Maintainer-authoritative roles:
 - MEMBER
 - OWNER
 - COLLABORATOR
 Workarounds vs. actual fixes:
 - A WORKAROUND is when a user changes their own setup to avoid the problem (editing configs, using a different setting, manual SQL fixes, switching tools). Workarounds do NOT count as resolution — the underlying issue is still present in the product.
 - An ACTUAL FIX is when a user reports the problem went away after upgrading to a specific version (e.g., "fixed after updating to v0.65.1") or after a specific PR was merged. This suggests the fix was shipped in the product itself.
 - If only workarounds exist and no maintainer has confirmed a fix, classify as KEEP_OPEN.
 - If a user reports an actual fix via a version upgrade but no maintainer confirmed it, classify as MANUAL_REVIEW (not AUTO_CLOSE).
 Important:
 - Later comments outweigh earlier ones.
 - A non-maintainer saying "fixed for me" is not enough for AUTO_CLOSE.
 - If uncertain, prefer MANUAL_REVIEW or KEEP_OPEN.
--- a/.github/issue-resolution/schemas/issue-resolution-output.json
+++ b/.github/issue-resolution/schemas/issue-resolution-output.json
@@ -0,0 +1,80 @@
 {
  "type": "object",
  "additionalProperties": false,
  "required": [
    "decision",
    "reason_code",
    "confidence",
    "hard_signals",
    "contradictions",
    "summary",
    "close_comment",
    "manual_review_note"
  ],
  "properties": {
    "decision": {
      "type": "string",
      "enum": ["AUTO_CLOSE", "MANUAL_REVIEW", "KEEP_OPEN"]
    },
    "reason_code": {
      "type": "string",
      "enum": [
        "resolved_by_merged_pr",
        "maintainer_confirmed_resolved",
        "duplicate_confirmed",
        "superseded_confirmed",
        "likely_fixed_but_unconfirmed",
        "still_open",
        "unclear"
      ]
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    },
    "hard_signals": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["type", "url"],
        "properties": {
          "type": {
            "type": "string",
            "enum": [
              "merged_pr",
              "maintainer_comment",
              "duplicate_reference",
              "superseded_reference"
            ]
          },
          "url": { "type": "string" }
        }
      }
    },
    "contradictions": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["type", "url"],
        "properties": {
          "type": {
            "type": "string",
            "enum": [
              "reporter_still_broken",
              "later_unresolved_comment",
              "ambiguous_pr_link",
              "other"
            ]
          },
          "url": { "type": "string" }
        }
      }
    },
    "summary": { "type": "string" },
    "close_comment": { "type": "string" },
    "manual_review_note": { "type": "string" }
  }
 }
--- a/.github/issue-resolution/scripts/apply-decisions.mjs
+++ b/.github/issue-resolution/scripts/apply-decisions.mjs
@@ -0,0 +1,155 @@
 import fs from "node:fs/promises";
 const decisions = JSON.parse(await fs.readFile("decisions.json", "utf8"));
 const dryRun = String(process.env.DRY_RUN).toLowerCase() === "true";
 const headers = {
  Authorization: `Bearer ${process.env.GH_TOKEN}`,
  Accept: "application/vnd.github+json",
  "X-GitHub-Api-Version": "2022-11-28",
 };
 async function rest(url, method = "GET", body) {
  const res = await fetch(url, {
    method,
    headers,
    body: body ? JSON.stringify(body) : undefined
  });
  if (!res.ok) throw new Error(`${res.status} ${url}: ${await res.text()}`);
  return res.status === 204 ? null : res.json();
 }
 async function graphql(query, variables) {
  const res = await fetch("https://api.github.com/graphql", {
    method: "POST",
    headers,
    body: JSON.stringify({ query, variables })
  });
  if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
  const json = await res.json();
  if (json.errors) throw new Error(JSON.stringify(json.errors));
  return json.data;
 }
 async function addLabel(owner, repo, issueNumber, labels) {
  return rest(
    `https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}/labels`,
    "POST",
    { labels }
  );
 }
 async function addComment(owner, repo, issueNumber, body) {
  return rest(
    `https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}/comments`,
    "POST",
    { body }
  );
 }
 async function closeIssue(owner, repo, issueNumber) {
  return rest(
    `https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}`,
    "PATCH",
    { state: "closed", state_reason: "completed" }
  );
 }
 async function getIssueNodeId(owner, repo, issueNumber) {
  const issue = await rest(`https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}`);
  return issue.node_id;
 }
 async function addToProject(issueNodeId) {
  const mutation = `
    mutation($projectId: ID!, $contentId: ID!) {
      addProjectV2ItemById(input: {projectId: $projectId, contentId: $contentId}) {
        item { id }
      }
    }
  `;
  try {
    const data = await graphql(mutation, {
      projectId: process.env.PROJECT_ID,
      contentId: issueNodeId
    });
    return data.addProjectV2ItemById.item.id;
  } catch (err) {
    console.warn(`[WARN] Could not add to project (needs PAT with project scope): ${err.message}`);
    return null;
  }
 }
 async function setTextField(itemId, fieldId, value) {
  const mutation = `
    mutation($projectId: ID!, $itemId: ID!, $fieldId: ID!, $value: String!) {
      updateProjectV2ItemFieldValue(input: {
        projectId: $projectId,
        itemId: $itemId,
        fieldId: $fieldId,
        value: { text: $value }
      }) {
        projectV2Item { id }
      }
    }
  `;
  return graphql(mutation, {
    projectId: process.env.PROJECT_ID,
    itemId,
    fieldId,
    value
  });
 }
 for (const d of decisions) {
  const [owner, repo] = d.repository.split("/");
  if (dryRun) {
    console.log(`[DRY RUN] #${d.issue_number} → ${d.final_decision} (confidence: ${d.model.confidence}, reason: ${d.model.reason_code})`);
    continue;
  }
  if (d.final_decision === "AUTO_CLOSE") {
    await addLabel(owner, repo, d.issue_number, ["auto-closed-resolved"]);
    await addComment(owner, repo, d.issue_number, d.model.close_comment);
    await closeIssue(owner, repo, d.issue_number);
  }
  if (d.final_decision === "MANUAL_REVIEW") {
    await addLabel(owner, repo, d.issue_number, ["resolution-candidate"]);
    const issueNodeId = await getIssueNodeId(owner, repo, d.issue_number);
    const itemId = await addToProject(issueNodeId);
    if (itemId) {
      if (process.env.PROJECT_CONFIDENCE_FIELD_ID) {
        await setTextField(itemId, process.env.PROJECT_CONFIDENCE_FIELD_ID, String(d.model.confidence));
      }
      if (process.env.PROJECT_REASON_FIELD_ID) {
        await setTextField(itemId, process.env.PROJECT_REASON_FIELD_ID, d.model.reason_code);
      }
      if (process.env.PROJECT_EVIDENCE_FIELD_ID) {
        await setTextField(itemId, process.env.PROJECT_EVIDENCE_FIELD_ID, d.issue_url);
      }
      if (process.env.PROJECT_LINKED_PR_FIELD_ID) {
        const linked = (d.model.hard_signals || []).map(x => x.url).join(", ");
        if (linked) {
          await setTextField(itemId, process.env.PROJECT_LINKED_PR_FIELD_ID, linked);
        }
      }
      if (process.env.PROJECT_REPO_FIELD_ID) {
        await setTextField(itemId, process.env.PROJECT_REPO_FIELD_ID, d.repository);
      }
    }
    await addComment(
      owner,
      repo,
      d.issue_number,
      d.model.manual_review_note ||
        "This issue looks like a possible resolution candidate, but not with enough certainty for automatic closure. Added to the review queue."
    );
  }
 }
--- a/.github/issue-resolution/scripts/classify-candidates.mjs
+++ b/.github/issue-resolution/scripts/classify-candidates.mjs
@@ -0,0 +1,244 @@
 import fs from "node:fs/promises";
 const candidates = JSON.parse(await fs.readFile("candidates.json", "utf8"));
 const systemPrompt = await fs.readFile("prompts/issue-resolution-system.txt", "utf8");
 const outputSchema = JSON.parse(await fs.readFile("schemas/issue-resolution-output.json", "utf8"));
 function isMaintainerRole(role) {
  return ["MEMBER", "OWNER", "COLLABORATOR"].includes(role || "");
 }
 function preScore(candidate) {
  let score = 0;
  const hardSignals = [];
  const contradictions = [];
  for (const t of candidate.timeline) {
    const sourceIssue = t.source?.issue;
    if (t.event === "cross-referenced" && sourceIssue?.pull_request?.html_url) {
      hardSignals.push({
        type: "merged_pr",
        url: sourceIssue.html_url
      });
      score += 40; // provisional until PR merged state is verified
    }
    if (["referenced", "connected"].includes(t.event)) {
      score += 10;
    }
  }
  for (const c of candidate.comments) {
    const body = c.body.toLowerCase();
    if (
      isMaintainerRole(c.author_association) &&
      /\b(fixed|resolved|duplicate|superseded|closing)\b/.test(body)
    ) {
      score += 25;
      hardSignals.push({
        type: "maintainer_comment",
        url: c.html_url
      });
    }
    if (/\b(still broken|still happening|not fixed|reproducible)\b/.test(body)) {
      score -= 50;
      contradictions.push({
        type: "later_unresolved_comment",
        url: c.html_url
      });
    }
  }
  return { score, hardSignals, contradictions };
 }
 // GitHub Models gpt-4o has an 8000 token input limit.
 // Reserve ~2000 tokens for system prompt + response overhead.
 // 1 token ~= 4 chars, so cap user message at ~24000 chars.
 const MAX_USER_MESSAGE_CHARS = 24000;
 function truncate(text, maxChars) {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + "\n\n[... truncated due to length]";
 }
 function buildUserMessage(candidate) {
  const { issue, comments, timeline } = candidate;
  const commentBlock = comments
    .map((c) => `[${c.author_association}] ${c.user} (${c.created_at}):\n${c.body}`)
    .join("\n---\n");
  const timelineBlock = timeline
    .filter((t) => ["cross-referenced", "referenced", "connected", "closed", "reopened"].includes(t.event))
    .map((t) => {
      let line = `${t.event} (${t.created_at})`;
      if (t.source?.issue?.html_url) line += ` — ${t.source.issue.html_url}`;
      if (t.source?.issue?.pull_request?.html_url) line += ` (PR: ${t.source.issue.pull_request.html_url})`;
      return line;
    })
    .join("\n");
  const msg = [
    `## Issue #${issue.number}: ${issue.title}`,
    `URL: ${issue.html_url}`,
    `Created: ${issue.created_at} | Updated: ${issue.updated_at}`,
    `Labels: ${issue.labels.join(", ") || "none"}`,
    "",
    "### Body",
    truncate(issue.body || "(empty)", 4000),
    "",
    "### Comments",
    commentBlock || "(none)",
    "",
    "### Timeline events",
    timelineBlock || "(none)",
  ].join("\n");
  return truncate(msg, MAX_USER_MESSAGE_CHARS);
 }
 const MODEL = "gpt-4o-mini";
 const MAX_RETRIES = 5;
 function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
 }
 async function callGitHubModel(candidate) {
  const body = JSON.stringify({
    model: MODEL,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: buildUserMessage(candidate) },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "issue_resolution",
        strict: true,
        schema: outputSchema,
      },
    },
    temperature: 0.1,
  });
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    const res = await fetch("https://models.inference.ai.azure.com/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.GH_TOKEN}`,
        "Content-Type": "application/json",
      },
      body,
    });
    if (res.status === 429) {
      const retryAfter = Number(res.headers.get("retry-after")) || 30;
      if (retryAfter > 120) {
        console.warn(`  [QUOTA EXHAUSTED] API wants ${retryAfter}s wait — skipping remaining issues.`);
        return null;
      }
      console.warn(`  [RATE LIMITED] Waiting ${retryAfter}s (attempt ${attempt + 1}/${MAX_RETRIES})...`);
      await sleep(retryAfter * 1000);
      continue;
    }
    if (!res.ok) {
      const text = await res.text();
      throw new Error(`GitHub Models ${res.status}: ${text}`);
    }
    const data = await res.json();
    return JSON.parse(data.choices[0].message.content);
  }
  throw new Error(`GitHub Models: exceeded ${MAX_RETRIES} retries due to rate limiting`);
 }
 function enforcePolicy(modelOut, pre) {
  const approvedReasons = new Set([
    "resolved_by_merged_pr",
    "maintainer_confirmed_resolved",
    "duplicate_confirmed",
    "superseded_confirmed"
  ]);
  const hasHardSignal =
    (modelOut.hard_signals || []).some(s =>
      ["merged_pr", "maintainer_comment", "duplicate_reference", "superseded_reference"].includes(s.type)
    ) || pre.hardSignals.length > 0;
  const hasContradiction =
    (modelOut.contradictions || []).length > 0 || pre.contradictions.length > 0;
  if (
    modelOut.decision === "AUTO_CLOSE" &&
    modelOut.confidence >= 0.97 &&
    approvedReasons.has(modelOut.reason_code) &&
    hasHardSignal &&
    !hasContradiction
  ) {
    return "AUTO_CLOSE";
  }
  if (modelOut.decision === "KEEP_OPEN" && pre.score < 25) {
    return "KEEP_OPEN";
  }
  if (
    modelOut.decision === "MANUAL_REVIEW" ||
    modelOut.decision === "AUTO_CLOSE" ||
    pre.score >= 25
  ) {
    return "MANUAL_REVIEW";
  }
  return "KEEP_OPEN";
 }
 console.log(`Classifying ${candidates.length} candidates with ${MODEL}...\n`);
 // 15 req/min limit → 1 request every 4s. Use 4.5s for safety margin.
 const PACE_MS = 4500;
 let lastRequestTime = 0;
 async function paced(fn) {
  const elapsed = Date.now() - lastRequestTime;
  if (elapsed < PACE_MS) await sleep(PACE_MS - elapsed);
  lastRequestTime = Date.now();
  return fn();
 }
 const decisions = [];
 for (const candidate of candidates) {
  const pre = preScore(candidate);
  const modelOut = await paced(() => callGitHubModel(candidate));
  if (modelOut === null) {
    console.warn(`\nQuota exhausted after ${decisions.length} issues. Writing partial results.`);
    break;
  }
  const finalDecision = enforcePolicy(modelOut, pre);
  decisions.push({
    repository: candidate.repository,
    issue_number: candidate.issue.number,
    issue_url: candidate.issue.html_url,
    title: candidate.issue.title,
    pre_score: pre.score,
    final_decision: finalDecision,
    model: modelOut
  });
  console.log(
    `#${candidate.issue.number} | pre_score: ${pre.score} | model: ${modelOut.decision} @ ${modelOut.confidence} | final: ${finalDecision} | ${modelOut.reason_code}`
  );
 }
 await fs.writeFile("decisions.json", JSON.stringify(decisions, null, 2));
 console.log(`\nWrote ${decisions.length} decisions to decisions.json`);
--- a/.github/issue-resolution/scripts/fetch-candidates.mjs
+++ b/.github/issue-resolution/scripts/fetch-candidates.mjs
@@ -0,0 +1,89 @@
 import fs from "node:fs/promises";
 const token = process.env.GH_TOKEN;
 const repo = process.env.REPO; // "owner/repo"
 const maxIssues = Number(process.env.MAX_ISSUES) || 100;
 const headers = {
  Authorization: `Bearer ${token}`,
  Accept: "application/vnd.github+json",
  "X-GitHub-Api-Version": "2022-11-28",
 };
 async function rest(url) {
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`${res.status} ${url}: ${await res.text()}`);
  return res.json();
 }
 async function paginate(url, max) {
  const items = [];
  let page = 1;
  while (items.length < max) {
    const perPage = Math.min(100, max - items.length);
    const sep = url.includes("?") ? "&" : "?";
    const batch = await rest(`${url}${sep}per_page=${perPage}&page=${page}`);
    if (!batch.length) break;
    items.push(...batch);
    page++;
  }
  return items.slice(0, max);
 }
 console.log(`Fetching up to ${maxIssues} open issues from ${repo}...`);
 const issues = await paginate(
  `https://api.github.com/repos/${repo}/issues?state=open&sort=updated&direction=desc`,
  maxIssues
 );
 // Filter out pull requests (GitHub API returns PRs as issues too)
 const realIssues = issues.filter((i) => !i.pull_request);
 console.log(`Found ${realIssues.length} open issues (excluded PRs).`);
 const candidates = [];
 for (const issue of realIssues) {
  const [comments, timeline] = await Promise.all([
    rest(`https://api.github.com/repos/${repo}/issues/${issue.number}/comments?per_page=100`),
    rest(`https://api.github.com/repos/${repo}/issues/${issue.number}/timeline?per_page=100`),
  ]);
  candidates.push({
    repository: repo,
    issue: {
      number: issue.number,
      html_url: issue.html_url,
      title: issue.title,
      body: issue.body,
      created_at: issue.created_at,
      updated_at: issue.updated_at,
      labels: issue.labels.map((l) => l.name),
    },
    comments: comments.map((c) => ({
      body: c.body,
      author_association: c.author_association,
      html_url: c.html_url,
      created_at: c.created_at,
      user: c.user?.login,
    })),
    timeline: timeline.map((t) => ({
      event: t.event,
      created_at: t.created_at,
      source: t.source
        ? {
            issue: {
              html_url: t.source.issue?.html_url,
              pull_request: t.source.issue?.pull_request
                ? { html_url: t.source.issue.pull_request.html_url }
                : undefined,
            },
          }
        : undefined,
    })),
  });
  console.log(`  #${issue.number} — ${comments.length} comments, ${timeline.length} timeline events`);
 }
 await fs.writeFile("candidates.json", JSON.stringify(candidates, null, 2));
 console.log(`Wrote ${candidates.length} candidates to candidates.json`);
--- a/.github/workflows/issue-resolution-triage.yml
+++ b/.github/workflows/issue-resolution-triage.yml
@@ -0,0 +1,64 @@
 name: issue-resolution-triage
 on:
  push:
    branches: [github-issue-resolver]
  workflow_dispatch:
    inputs:
      dry_run:
        description: "If true, do not close issues"
        required: false
        default: "true"
      max_issues:
        description: "How many issues to process"
        required: false
        default: "100"
  schedule:
    - cron: "17 2 * * *"
 permissions:
  contents: read
  issues: write
  pull-requests: read
  models: read
 #  todo: remove hardcoded values
 jobs:
  triage:
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      DRY_RUN: "true"
      MAX_ISSUES: "100"
      REPO: ${{ github.repository }}
      PROJECT_ID: "PVT_kwDOBfz4Jc4BVeWR"
      PROJECT_STATUS_FIELD_ID: "PVTSSF_lADOBfz4Jc4BVeWRzhQ56sU"
      PROJECT_CONFIDENCE_FIELD_ID: "PVTF_lADOBfz4Jc4BVeWRzhQ57x4"
      PROJECT_REASON_FIELD_ID: "PVTF_lADOBfz4Jc4BVeWRzhQ5-Lg"
      PROJECT_EVIDENCE_FIELD_ID: "PVTF_lADOBfz4Jc4BVeWRzhQ5-Pw"
      PROJECT_LINKED_PR_FIELD_ID: "PVTF_lADOBfz4Jc4BVeWRzhQ56sc"
      PROJECT_REPO_FIELD_ID: "PVTF_lADOBfz4Jc4BVeWRzhQ56sk"
      PROJECT_STATUS_OPTION_NEEDS_REVIEW_ID: "a55a2be9"
    defaults:
      run:
        working-directory: .github/issue-resolution
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: node scripts/fetch-candidates.mjs
      - run: node scripts/classify-candidates.mjs
      - run: node scripts/apply-decisions.mjs
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: triage-results
          path: |
            .github/issue-resolution/candidates.json
            .github/issue-resolution/decisions.json
--- a/client/internal/connect.go
+++ b/client/internal/connect.go
@@ -333,6 +333,10 @@ func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan
 		c.statusRecorder.MarkSignalConnected()
 		relayURLs, token := parseRelayInfo(loginResp)
 		if override, ok := peer.OverrideRelayURLs(); ok {
 			log.Infof("overriding relay URLs from %s: %v", peer.EnvKeyNBHomeRelayServers, override)
 			relayURLs = override
 		}
 		peerConfig := loginResp.GetPeerConfig()
 		engineConfig, err := createEngineConfig(myPrivateKey, c.config, peerConfig, logPath)
--- a/client/internal/engine.go
+++ b/client/internal/engine.go
@@ -944,7 +944,12 @@ func (e *Engine) handleRelayUpdate(update *mgmProto.RelayConfig) error {
 			return fmt.Errorf("update relay token: %w", err)
 		}
-		e.relayManager.UpdateServerURLs(update.Urls)
+		urls := update.Urls
 		if override, ok := peer.OverrideRelayURLs(); ok {
 			log.Infof("overriding relay URLs from %s: %v", peer.EnvKeyNBHomeRelayServers, override)
 			urls = override
 		}
 		e.relayManager.UpdateServerURLs(urls)
 		// Just in case the agent started with an MGM server where the relay was disabled but was later enabled.
 		// We can ignore all errors because the guard will manage the reconnection retries.
--- a/client/internal/peer/env.go
+++ b/client/internal/peer/env.go
@@ -8,6 +8,7 @@ import (
 const (
 	EnvKeyNBForceRelay       = "NB_FORCE_RELAY"
 	EnvKeyNBHomeRelayServers = "NB_HOME_RELAY_SERVERS"
 )
 func IsForceRelayed() bool {
@@ -16,3 +17,28 @@ func IsForceRelayed() bool {
 	}
 	return strings.EqualFold(os.Getenv(EnvKeyNBForceRelay), "true")
 }
 // OverrideRelayURLs returns the relay server URL list set in
 // NB_HOME_RELAY_SERVERS (comma-separated) and a boolean indicating whether
 // the override is active. When the env var is unset, the boolean is false
 // and the caller should keep the list received from the management server.
 // Intended for lab/debug scenarios where a peer must pin to a specific home
 // relay regardless of what management offers.
 func OverrideRelayURLs() ([]string, bool) {
 	raw := os.Getenv(EnvKeyNBHomeRelayServers)
 	if raw == "" {
 		return nil, false
 	}
 	parts := strings.Split(raw, ",")
 	urls := make([]string, 0, len(parts))
 	for _, p := range parts {
 		p = strings.TrimSpace(p)
 		if p != "" {
 			urls = append(urls, p)
 		}
 	}
 	if len(urls) == 0 {
 		return nil, false
 	}
 	return urls, true
 }
--- a/shared/management/client/grpc.go
+++ b/shared/management/client/grpc.go
@@ -252,21 +252,19 @@ func (c *GrpcClient) handleJobStream(
 					c.notifyDisconnected(err)
 					return backoff.Permanent(err) // unrecoverable error, propagate to the upper layer
 				case codes.Canceled:
-					log.Debugf("management connection context has been canceled, this usually indicates shutdown")
+					log.Debugf("job stream context has been canceled, this usually indicates shutdown")
 					return err
 				case codes.Unimplemented:
 					log.Warn("Job feature is not supported by the current management server version. " +
 						"Please update the management service to use this feature.")
 					return nil
 				default:
-					c.notifyDisconnected(err)
+					log.Warnf("job stream disconnected, will retry silently. Reason: %v", err)
 					log.Warnf("disconnected from the Management service but will retry silently. Reason: %v", err)
 					return err
 				}
 			} else {
 				// non-gRPC error
-				c.notifyDisconnected(err)
+				log.Warnf("job stream disconnected, will retry silently. Reason: %v", err)
 				log.Warnf("disconnected from the Management service but will retry silently. Reason: %v", err)
 				return err
 			}
 		}
--- a/shared/relay/client/guard.go
+++ b/shared/relay/client/guard.go
@@ -8,10 +8,7 @@ import (
 	log "github.com/sirupsen/logrus"
 )
-const (
+const defaultMaxBackoffInterval = 60 * time.Second
 	// TODO: make it configurable, the manager should validate all configurable parameters
 	reconnectingTimeout = 60 * time.Second
 )
 // Guard manage the reconnection tries to the Relay server in case of disconnection event.
 type Guard struct {
@@ -19,14 +16,23 @@ type Guard struct {
 	OnNewRelayClient chan *Client
 	OnReconnected    chan struct{}
 	serverPicker     *ServerPicker
 	// maxBackoffInterval caps the exponential backoff between reconnect
 	// attempts.
 	maxBackoffInterval time.Duration
 }
-// NewGuard creates a new guard for the relay client.
+// NewGuard creates a new guard for the relay client. A non-positive
-func NewGuard(sp *ServerPicker) *Guard {
+// maxBackoffInterval falls back to defaultMaxBackoffInterval.
 func NewGuard(sp *ServerPicker, maxBackoffInterval time.Duration) *Guard {
 	if maxBackoffInterval <= 0 {
 		maxBackoffInterval = defaultMaxBackoffInterval
 	}
 	g := &Guard{
 		OnNewRelayClient:   make(chan *Client, 1),
 		OnReconnected:      make(chan struct{}, 1),
 		serverPicker:       sp,
 		maxBackoffInterval: maxBackoffInterval,
 	}
 	return g
 }
@@ -49,7 +55,7 @@ func (g *Guard) StartReconnectTrys(ctx context.Context, relayClient *Client) {
 	}
 	// start a ticker to pick a new server
-	ticker := exponentTicker(ctx)
+	ticker := g.exponentTicker(ctx)
 	defer ticker.Stop()
 	for {
@@ -125,11 +131,11 @@ func (g *Guard) notifyReconnected() {
 	}
 }
-func exponentTicker(ctx context.Context) *backoff.Ticker {
+func (g *Guard) exponentTicker(ctx context.Context) *backoff.Ticker {
 	bo := backoff.WithContext(&backoff.ExponentialBackOff{
 		InitialInterval: 2 * time.Second,
 		Multiplier:      2,
-		MaxInterval:     reconnectingTimeout,
+		MaxInterval:     g.maxBackoffInterval,
 		Clock:           backoff.SystemClock,
 	}, ctx)
--- a/shared/relay/client/manager.go
+++ b/shared/relay/client/manager.go
@@ -39,6 +39,15 @@ func NewRelayTrack() *RelayTrack {
 type OnServerCloseListener func()
 // ManagerOption configures a Manager at construction time.
 type ManagerOption func(*Manager)
 // WithMaxBackoffInterval caps the exponential backoff between reconnect
 // attempts to the home relay. A non-positive value keeps the default.
 func WithMaxBackoffInterval(d time.Duration) ManagerOption {
 	return func(m *Manager) { m.maxBackoffInterval = d }
 }
 // Manager is a manager for the relay client instances. It establishes one persistent connection to the given relay URL
 // and automatically reconnect to them in case disconnection.
 // The manager also manage temporary relay connection. If a client wants to communicate with a client on a
@@ -65,11 +74,12 @@ type Manager struct {
 	listenerLock            sync.Mutex
 	mtu                uint16
 	maxBackoffInterval time.Duration
 }
 // NewManager creates a new manager instance.
 // The serverURL address can be empty. In this case, the manager will not serve.
-func NewManager(ctx context.Context, serverURLs []string, peerID string, mtu uint16) *Manager {
+func NewManager(ctx context.Context, serverURLs []string, peerID string, mtu uint16, opts ...ManagerOption) *Manager {
 	tokenStore := &relayAuth.TokenStore{}
 	m := &Manager{
@@ -86,8 +96,11 @@ func NewManager(ctx context.Context, serverURLs []string, peerID string, mtu uin
 		relayClients:            make(map[string]*RelayTrack),
 		onDisconnectedListeners: make(map[string]*list.List),
 	}
 	for _, opt := range opts {
 		opt(m)
 	}
 	m.serverPicker.ServerURLs.Store(serverURLs)
-	m.reconnectGuard = NewGuard(m.serverPicker)
+	m.reconnectGuard = NewGuard(m.serverPicker, m.maxBackoffInterval)
 	return m
 }
@@ -290,19 +303,36 @@ func (m *Manager) onServerConnected() {
 	go m.onReconnectedListenerFn()
 }
-// onServerDisconnected start to reconnection for home server only
+// onServerDisconnected handles relay disconnect events. For the home server it
 // starts the reconnect guard. For foreign servers it evicts the now-dead client
 // from the cache so the next OpenConn builds a fresh one instead of reusing a
 // closed client.
 func (m *Manager) onServerDisconnected(serverAddress string) {
 	m.relayClientMu.Lock()
-	if serverAddress == m.relayClient.connectionURL {
+	isHome := m.relayClient != nil && serverAddress == m.relayClient.connectionURL
 	if isHome {
 		go func(client *Client) {
 			m.reconnectGuard.StartReconnectTrys(m.ctx, client)
 		}(m.relayClient)
 	}
 	m.relayClientMu.Unlock()
 	if !isHome {
 		m.evictForeignRelay(serverAddress)
 	}
 	m.notifyOnDisconnectListeners(serverAddress)
 }
 func (m *Manager) evictForeignRelay(serverAddress string) {
 	m.relayClientsMutex.Lock()
 	defer m.relayClientsMutex.Unlock()
 	if _, ok := m.relayClients[serverAddress]; ok {
 		delete(m.relayClients, serverAddress)
 		log.Debugf("evicted disconnected foreign relay client: %s", serverAddress)
 	}
 }
 func (m *Manager) listenGuardEvent(ctx context.Context) {
 	for {
 		select {
--- a/shared/relay/client/manager_test.go
+++ b/shared/relay/client/manager_test.go
@@ -2,6 +2,7 @@ package client
 import (
 	"context"
 	"fmt"
 	"testing"
 	"time"
@@ -360,7 +361,8 @@ func TestAutoReconnect(t *testing.T) {
 		t.Fatalf("failed to serve manager: %s", err)
 	}
-	clientAlice := NewManager(mCtx, toURL(srvCfg), "alice", iface.DefaultMTU)
+	clientAlice := NewManager(mCtx, toURL(srvCfg), "alice", iface.DefaultMTU,
 		WithMaxBackoffInterval(2*time.Second))
 	err = clientAlice.Serve()
 	if err != nil {
 		t.Fatalf("failed to serve manager: %s", err)
@@ -384,7 +386,9 @@ func TestAutoReconnect(t *testing.T) {
 	}
 	log.Infof("waiting for reconnection")
-	time.Sleep(reconnectingTimeout + 1*time.Second)
+	if err := waitForReady(ctx, clientAlice, 15*time.Second); err != nil {
 		t.Fatalf("manager did not reconnect: %s", err)
 	}
 	log.Infof("reopent the connection")
 	_, err = clientAlice.OpenConn(ctx, ra, "bob")
@@ -393,6 +397,21 @@ func TestAutoReconnect(t *testing.T) {
 	}
 }
 func waitForReady(ctx context.Context, m *Manager, timeout time.Duration) error {
 	deadline := time.Now().Add(timeout)
 	for time.Now().Before(deadline) {
 		if m.Ready() {
 			return nil
 		}
 		select {
 		case <-time.After(100 * time.Millisecond):
 		case <-ctx.Done():
 			return ctx.Err()
 		}
 	}
 	return fmt.Errorf("manager not ready within %s", timeout)
 }
 func TestNotifierDoubleAdd(t *testing.T) {
 	ctx := context.Background()
Author	SHA1	Message	Date
Ashley Mensah	222d498bb6	fix(ci): distinguish workarounds from actual fixes in system prompt	2026-04-28 17:48:43 +02:00
Ashley Mensah	52cd104f1e	chore(ci): switch back to gpt-4o-mini for higher quota	2026-04-28 17:44:05 +02:00
Ashley Mensah	92f666f652	fix(ci): cap retry-after and handle quota exhaustion gracefully	2026-04-28 17:42:43 +02:00
Ashley Mensah	4fc0cb7ec4	fix(ci): pace API requests to avoid rate limit thrashing	2026-04-28 17:30:18 +02:00
Ashley Mensah	695614834e	fix(ci): fix policy logic and add message truncation - enforcePolicy: respect KEEP_OPEN when model is confident and pre_score is low. Only promote to MANUAL_REVIEW when model suggests resolution or pre_score has hard signals - Truncate user messages to 24k chars (issue body capped at 4k) to stay within GitHub Models 8000 token input limit	2026-04-28 16:38:36 +02:00
Ashley Mensah	d75fa6ad45	feat(ci): switch to gpt-4o and add rate limit retry - Upgrade model from gpt-4o-mini to gpt-4o for better classification - Add retry loop with backoff on 429 responses (up to 5 retries) - Respect Retry-After header from GitHub Models API	2026-04-28 16:23:31 +02:00
Ashley Mensah	7ce7f322eb	increase max number of dry run issues to 100	2026-04-28 16:13:38 +02:00
Ashley Mensah	22bcf70b6e	fix(ci): add additionalProperties to nested schema objects	2026-04-28 16:11:57 +02:00
Ashley Mensah	fe8aa21245	feat(ci): wire up gpt-4o-mini for issue classification - Replace stub callGitHubModel() with real GitHub Models API call using gpt-4o-mini with structured JSON output - Build detailed user messages from issue body, comments, and timeline - Add per-issue decision logging to classify step - Upload candidates.json and decisions.json as workflow artifacts	2026-04-28 16:11:01 +02:00
Ashley Mensah	29f211e51c	fix(ci): guard all decisions behind dry-run check Move the dry-run check to the top of the loop so it applies to all decision types, not just AUTO_CLOSE. In dry-run mode the workflow now only logs what it would do without touching any issues.	2026-04-28 16:08:16 +02:00
Ashley Mensah	6df3580bd3	fix(ci): handle project API permission errors gracefully GITHUB_TOKEN cannot access org-level Projects V2. Make addToProject return null on failure instead of crashing, and skip setTextField calls when project access is unavailable. A PAT with project scope is needed for full project board integration.	2026-04-28 15:55:12 +02:00
Ashley Mensah	9ff735dd52	fix(ci): fix workflow by adding working-directory, fetch script, and removing npm ci - Set defaults.run.working-directory to .github/issue-resolution so scripts resolve from the correct path - Remove npm ci step (no npm dependencies needed) - Add fetch-candidates.mjs to gather open issues with comments and timeline events via GitHub REST API - Add minimal package.json with type: module	2026-04-28 15:53:17 +02:00
Ashley Mensah	7c43973bc9	fix typo in workflow	2026-04-28 15:48:35 +02:00
Ashley Mensah	01e53d07b9	fix typo in workflow	2026-04-28 15:46:57 +02:00
Ashley Mensah	797dce1631	dummy commit to test workflow	2026-04-28 15:43:37 +02:00
Ashley Mensah	2877fcbbf6	add push trigger to workflow	2026-04-28 15:42:50 +02:00
Ashley Mensah	5d8201fcd0	Merge branch 'main' into github-issue-resolver	2026-04-28 15:36:54 +02:00
Ashley Mensah	09595bd0c2	enable dry run, add project field id values	2026-04-28 15:36:21 +02:00
Zoltan Papp	8fc4265995	[relay] evict foreign client cache on disconnect (#6015 ) * [relay] evict foreign client cache on disconnect When a foreign relay's TCP connection drops, the manager's onServerDisconnected handler only triggered reconnect logic for the home server; the disconnected foreign entry stayed in the relayClients cache. Subsequent OpenConn calls reused the closed client until the 60-second cleanup tick evicted it, breaking peer connectivity through that relay for up to a minute. Evict the foreign entry from the cache on disconnect so the next OpenConn dials a fresh client. Also: - Make the reconnect backoff cap configurable via WithMaxBackoffInterval ManagerOption; the previous hard-coded 60s constant forced TestAutoReconnect to sleep ~61s. Test now polls Ready() and finishes in ~2s. - Add NB_HOME_RELAY_SERVERS env var that overrides the relay URL list received from management, so a peer can be pinned to a specific home relay (used by the netbird-conn-lab Edge 4 reproducer). * [client] treat empty NB_HOME_RELAY_SERVERS as unset Returning (urls=[], ok=true) when the env var contained only separators or whitespace caused callers to wipe the mgmt-provided relay list, leaving the peer with no relays. Treat a parsed-empty result the same as an unset env.	2026-04-28 15:04:48 +02:00
Zoltan Papp	9c50819f20	Don't mark management disconnected on transient job stream errors (#6005 ) The JOB stream is a separate channel from the SYNC stream. Server-side EOF or other transient errors on the JOB stream do not indicate that the management connection is unhealthy — the SYNC stream remains the authoritative state signal. Previously, a JOB stream EOF would call notifyDisconnected and the client would emit OnConnecting to the UI. The backoff retry would reconnect the JOB stream, but handleJobStream never calls notifyConnected on success, so the UI was stuck on "Connecting" until the next SYNC event or health check. Keep notifyDisconnected for codes.PermissionDenied since IsLoginRequired relies on managementError to detect expired auth.	2026-04-28 15:04:41 +02:00
Ashley Mensah	6b8e40f78d	initial commit - workflow yaml, prompts and schemas	2026-04-23 18:48:38 +02:00