Compare commits

..

6 Commits

Author          SHA1        Message                                                           Date
Milo Schwartz   02dfeed3ce  Update README.md                                                  2026-04-13 13:03:53 -04:00
Owen Schwartz   3436105bec  Merge pull request #2784 from fosrl/dev (Try to prevent deadlocks)  2026-04-03 23:01:09 -04:00
Owen            d948d2ec33  Try to prevent deadlocks                                          2026-04-03 22:55:04 -04:00
Owen Schwartz   4b3375ab8e  Merge pull request #2783 from fosrl/dev (Fix 1.17.0)              2026-04-03 22:42:03 -04:00
Owen            6b8a3c8d77  Revert #2570; Fix #2782                                           2026-04-03 22:37:42 -04:00
Owen            ba9794c067  Put middleware back; Fix #2781                                    2026-04-03 22:16:26 -04:00
5 changed files with 112 additions and 122 deletions


@@ -35,19 +35,13 @@
 </div>
-<p align="center">
-  <a href="https://docs.pangolin.net/careers/join-us">
-    <img src="https://img.shields.io/badge/🚀_We're_Hiring!-Join_Our_Team-brightgreen?style=for-the-badge" alt="We're Hiring!" />
-  </a>
-</p>
 <p align="center">
 <strong>
 Get started with Pangolin at <a href="https://app.pangolin.net/auth/signup">app.pangolin.net</a>
 </strong>
 </p>
-Pangolin is an open-source, identity-based remote access platform built on WireGuard that enables secure, seamless connectivity to private and public resources. Pangolin combines reverse proxy and VPN capabilities into one platform, providing browser-based access to web applications and client-based access to any private resources, all with zero-trust security and granular access control.
+Pangolin is an open-source, identity-based remote access platform built on WireGuard that enables secure, seamless connectivity to private and public resources. Pangolin combines reverse proxy and VPN capabilities into one platform, providing browser-based access to web applications and client-based access to any private resources with NAT traversal, all with granular access controls.

 ## Installation
@@ -60,16 +54,16 @@ Pangolin is an open-source, identity-based remote access platform built on WireG
 | <img width=500 /> | Description |
 |-----------------|--------------|
-| **Pangolin Cloud** | Fully managed service with instant setup and pay-as-you-go pricing - no infrastructure required. Or, self-host your own [remote node](https://docs.pangolin.net/manage/remote-node/understanding-nodes) and connect to our control plane. |
+| **Pangolin Cloud** | Fully managed service - no infrastructure required. |
 | **Self-Host: Community Edition** | Free, open source, and licensed under AGPL-3. |
-| **Self-Host: Enterprise Edition** | Licensed under Fossorial Commercial License. Free for personal and hobbyist use, and for businesses earning under \$100K USD annually. |
+| **Self-Host: Enterprise Edition** | Licensed under Fossorial Commercial License. Free for personal and hobbyist use, and for businesses making less than \$100K USD gross annual revenue. |

 ## Key Features

 | <img width=500 /> | <img width=500 /> |
 |-------------------|-------------------|
-| **Connect remote networks with sites**<br /><br />Pangolin's lightweight site connectors create secure tunnels from remote networks without requiring public IP addresses or open ports. Sites make any network anywhere available for authorized access. | <img src="public/screenshots/sites.png" width=500 /><tr></tr> |
+| **Connect remote networks with sites**<br /><br />Pangolin's site connectors create secure tunnels from remote networks without requiring public IP addresses or open ports. Sites make any network anywhere available for authorized access. | <img src="public/screenshots/sites.png" width=500 /><tr></tr> |
-| **Browser-based reverse proxy access**<br /><br />Expose web applications through identity and context-aware tunneled reverse proxies. Pangolin handles routing, load balancing, health checking, and automatic SSL certificates without exposing your network directly to the internet. Users access applications through any web browser with authentication and granular access control. | <img src="public/clip.gif" width=500 /><tr></tr> |
+| **Browser-based reverse proxy access**<br /><br />Expose web applications through identity and context-aware tunneled reverse proxies. Users access applications through any web browser with authentication and granular access control. Pangolin handles routing, load balancing, health checking, and automatic SSL certificates without exposing your network directly to the internet. | <img src="public/clip.gif" width=500 /><tr></tr> |
 | **Client-based private resource access**<br /><br />Access private resources like SSH servers, databases, RDP, and entire network ranges through Pangolin clients. Intelligent NAT traversal enables connections even through restrictive firewalls, while DNS aliases provide friendly names and fast connections to resources across all your sites. | <img src="public/screenshots/private-resources.png" width=500 /><tr></tr> |
 | **Zero-trust granular access**<br /><br />Grant users access to specific resources, not entire networks. Unlike traditional VPNs that expose full network access, Pangolin's zero-trust model ensures users can only reach the applications and services you explicitly define, reducing security risk and attack surface. | <img src="public/screenshots/user-devices.png" width=500 /><tr></tr> |
@@ -87,7 +81,7 @@ Download the Pangolin client for your platform:
 ### Sign up now
-Create an account at [app.pangolin.net](https://app.pangolin.net) to get started with Pangolin Cloud. A generous free tier is available.
+Create a free account at [app.pangolin.net](https://app.pangolin.net) to get started with Pangolin Cloud.
 ### Check out the docs
@@ -102,7 +96,3 @@ Pangolin is dual licensed under the AGPL-3 and the [Fossorial Commercial License
 ## Contributions
 Please see [CONTRIBUTING](./CONTRIBUTING.md) in the repository for guidelines and best practices.
-
----
-
-WireGuard® is a registered trademark of Jason A. Donenfeld.


@@ -86,6 +86,8 @@ entryPoints:
     http:
       tls:
        certResolver: "letsencrypt"
+     middlewares:
+       - crowdsec@file
    encodedCharacters:
      allowEncodedSlash: true
      allowEncodedQuestionMark: true


@@ -479,10 +479,7 @@ export async function getTraefikConfig(
     // TODO: HOW TO HANDLE ^^^^^^ BETTER
     const anySitesOnline = targets.some(
-        (target) =>
-            target.site.online ||
-            target.site.type === "local" ||
-            target.site.type === "wireguard"
+        (target) => target.site.online
     );

     return (
@@ -610,10 +607,7 @@ export async function getTraefikConfig(
             servers: (() => {
                 // Check if any sites are online
                 const anySitesOnline = targets.some(
-                    (target) =>
-                        target.site.online ||
-                        target.site.type === "local" ||
-                        target.site.type === "wireguard"
+                    (target) => target.site.online
                 );

                 return targets


@@ -671,10 +671,7 @@ export async function getTraefikConfig(
     // TODO: HOW TO HANDLE ^^^^^^ BETTER
     const anySitesOnline = targets.some(
-        (target) =>
-            target.site.online ||
-            target.site.type === "local" ||
-            target.site.type === "wireguard"
+        (target) => target.site.online
    );

    return (
@@ -802,10 +799,7 @@ export async function getTraefikConfig(
             servers: (() => {
                 // Check if any sites are online
                 const anySitesOnline = targets.some(
-                    (target) =>
-                        target.site.online ||
-                        target.site.type === "local" ||
-                        target.site.type === "wireguard"
+                    (target) => target.site.online
                 );

                 return targets
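The hunks above (and the matching pair in the previous file) tighten the online check: targets on `local` or `wireguard` sites previously counted as online regardless of their reported state, so an unreachable site still kept its servers in the generated Traefik config. A minimal sketch of the before and after predicates, using a hypothetical `Target` shape for illustration (the real type in the codebase has more fields):

```typescript
// Hypothetical minimal model of the target/site shape used by the predicate;
// only the two fields the diff touches are included.
interface Target {
    site: { online: boolean; type: "newt" | "local" | "wireguard" };
}

// Old predicate: local and wireguard site types bypassed the online flag.
const anySitesOnlineOld = (targets: Target[]): boolean =>
    targets.some(
        (t) =>
            t.site.online ||
            t.site.type === "local" ||
            t.site.type === "wireguard"
    );

// New predicate: only the reported online flag matters.
const anySitesOnlineNew = (targets: Target[]): boolean =>
    targets.some((t) => t.site.online);

const down: Target[] = [
    { site: { online: false, type: "local" } },
    { site: { online: false, type: "wireguard" } }
];

console.log(anySitesOnlineOld(down)); // true: the type check bypasses the flag
console.log(anySitesOnlineNew(down)); // false: offline now means offline
```

With the old predicate, a fleet consisting only of offline `local` sites was still considered "online" and kept serving stale server entries; the new one defers entirely to the reported state.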


@@ -1,6 +1,6 @@
 import { db } from "@server/db";
 import { sites, clients, olms } from "@server/db";
-import { eq, inArray } from "drizzle-orm";
+import { inArray } from "drizzle-orm";
 import logger from "@server/logger";

 /**
@@ -21,7 +21,7 @@ import logger from "@server/logger";
 */
 const FLUSH_INTERVAL_MS = 10_000; // Flush every 10 seconds
-const MAX_RETRIES = 2;
+const MAX_RETRIES = 5;
 const BASE_DELAY_MS = 50;

 // ── Site (newt) pings ──────────────────────────────────────────────────
@@ -36,6 +36,14 @@ const pendingOlmArchiveResets: Set<string> = new Set();
 let flushTimer: NodeJS.Timeout | null = null;

+/**
+ * Guard that prevents two flush cycles from running concurrently.
+ * setInterval does not await async callbacks, so without this a slow flush
+ * (e.g. due to DB latency) would overlap with the next scheduled cycle and
+ * the two concurrent bulk UPDATEs would deadlock each other.
+ */
+let isFlushing = false;
+
 // ── Public API ─────────────────────────────────────────────────────────

 /**
@@ -72,6 +80,12 @@ export function recordClientPing(
 /**
  * Flush all accumulated site pings to the database.
+ *
+ * Each batch of up to BATCH_SIZE rows is written with a **single** UPDATE
+ * statement. We use the maximum timestamp across the batch so that `lastPing`
+ * reflects the most recent ping seen for any site in the group. This avoids
+ * the multi-statement transaction that previously created additional
+ * row-lock ordering hazards.
  */
 async function flushSitePingsToDb(): Promise<void> {
     if (pendingSitePings.size === 0) {
@@ -83,55 +97,35 @@ async function flushSitePingsToDb(): Promise<void> {
     const pingsToFlush = new Map(pendingSitePings);
     pendingSitePings.clear();

-    // Sort by siteId for consistent lock ordering (prevents deadlocks)
-    const sortedEntries = Array.from(pingsToFlush.entries()).sort(
-        ([a], [b]) => a - b
-    );
+    const entries = Array.from(pingsToFlush.entries());

     const BATCH_SIZE = 50;
-    for (let i = 0; i < sortedEntries.length; i += BATCH_SIZE) {
-        const batch = sortedEntries.slice(i, i + BATCH_SIZE);
+    for (let i = 0; i < entries.length; i += BATCH_SIZE) {
+        const batch = entries.slice(i, i + BATCH_SIZE);
+
+        // Use the latest timestamp in the batch so that `lastPing` always
+        // moves forward. Using a single timestamp for the whole batch means
+        // we only ever need one UPDATE statement (no transaction).
+        const maxTimestamp = Math.max(...batch.map(([, ts]) => ts));
+        const siteIds = batch.map(([id]) => id);

         try {
             await withRetry(async () => {
-                // Group by timestamp for efficient bulk updates
-                const byTimestamp = new Map<number, number[]>();
-                for (const [siteId, timestamp] of batch) {
-                    const group = byTimestamp.get(timestamp) || [];
-                    group.push(siteId);
-                    byTimestamp.set(timestamp, group);
-                }
-
-                if (byTimestamp.size === 1) {
-                    const [timestamp, siteIds] = Array.from(
-                        byTimestamp.entries()
-                    )[0];
-                    await db
-                        .update(sites)
-                        .set({
-                            online: true,
-                            lastPing: timestamp
-                        })
-                        .where(inArray(sites.siteId, siteIds));
-                } else {
-                    await db.transaction(async (tx) => {
-                        for (const [timestamp, siteIds] of byTimestamp) {
-                            await tx
-                                .update(sites)
-                                .set({
-                                    online: true,
-                                    lastPing: timestamp
-                                })
-                                .where(inArray(sites.siteId, siteIds));
-                        }
-                    });
-                }
+                await db
+                    .update(sites)
+                    .set({
+                        online: true,
+                        lastPing: maxTimestamp
+                    })
+                    .where(inArray(sites.siteId, siteIds));
             }, "flushSitePingsToDb");
         } catch (error) {
             logger.error(
                 `Failed to flush site ping batch (${batch.length} sites), re-queuing for next cycle`,
                 { error }
             );
+            // Re-queue only if the preserved timestamp is newer than any
+            // update that may have landed since we snapshotted.
             for (const [siteId, timestamp] of batch) {
                 const existing = pendingSitePings.get(siteId);
                 if (!existing || existing < timestamp) {
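The flush rewrite above collapses each chunk of pending pings to a list of ids plus the newest timestamp, so one UPDATE covers the whole chunk. That batching step can be sketched in isolation; `toBatches` and its return shape are illustrative helpers, not names from the codebase:

```typescript
// Illustrative sketch of the batching arithmetic used by the flush path:
// split pending (id, timestamp) pairs into chunks of BATCH_SIZE, and reduce
// each chunk to its ids plus the maximum timestamp so that a single UPDATE
// with an IN (...) clause can cover the whole chunk.
const BATCH_SIZE = 50;

function toBatches(
    pending: Map<number, number>
): { ids: number[]; maxTimestamp: number }[] {
    const entries = Array.from(pending.entries());
    const batches: { ids: number[]; maxTimestamp: number }[] = [];
    for (let i = 0; i < entries.length; i += BATCH_SIZE) {
        const batch = entries.slice(i, i + BATCH_SIZE);
        batches.push({
            ids: batch.map(([id]) => id),
            // Collapsing to the max keeps lastPing monotonic for every row.
            maxTimestamp: Math.max(...batch.map(([, ts]) => ts))
        });
    }
    return batches;
}
```

The trade-off, as the diff's own comment notes, is that some rows get a `lastPing` slightly newer than their actual ping, in exchange for never needing a multi-statement transaction.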
@@ -144,6 +138,8 @@ async function flushSitePingsToDb(): Promise<void> {
 /**
  * Flush all accumulated client (OLM) pings to the database.
+ *
+ * Same single-UPDATE-per-batch approach as `flushSitePingsToDb`.
  */
 async function flushClientPingsToDb(): Promise<void> {
     if (pendingClientPings.size === 0 && pendingOlmArchiveResets.size === 0) {
@@ -159,51 +155,25 @@
     // ── Flush client pings ─────────────────────────────────────────────
     if (pingsToFlush.size > 0) {
-        const sortedEntries = Array.from(pingsToFlush.entries()).sort(
-            ([a], [b]) => a - b
-        );
+        const entries = Array.from(pingsToFlush.entries());

         const BATCH_SIZE = 50;
-        for (let i = 0; i < sortedEntries.length; i += BATCH_SIZE) {
-            const batch = sortedEntries.slice(i, i + BATCH_SIZE);
+        for (let i = 0; i < entries.length; i += BATCH_SIZE) {
+            const batch = entries.slice(i, i + BATCH_SIZE);
+            const maxTimestamp = Math.max(...batch.map(([, ts]) => ts));
+            const clientIds = batch.map(([id]) => id);

             try {
                 await withRetry(async () => {
-                    const byTimestamp = new Map<number, number[]>();
-                    for (const [clientId, timestamp] of batch) {
-                        const group = byTimestamp.get(timestamp) || [];
-                        group.push(clientId);
-                        byTimestamp.set(timestamp, group);
-                    }
-
-                    if (byTimestamp.size === 1) {
-                        const [timestamp, clientIds] = Array.from(
-                            byTimestamp.entries()
-                        )[0];
-                        await db
-                            .update(clients)
-                            .set({
-                                lastPing: timestamp,
-                                online: true,
-                                archived: false
-                            })
-                            .where(inArray(clients.clientId, clientIds));
-                    } else {
-                        await db.transaction(async (tx) => {
-                            for (const [timestamp, clientIds] of byTimestamp) {
-                                await tx
-                                    .update(clients)
-                                    .set({
-                                        lastPing: timestamp,
-                                        online: true,
-                                        archived: false
-                                    })
-                                    .where(
-                                        inArray(clients.clientId, clientIds)
-                                    );
-                            }
-                        });
-                    }
+                    await db
+                        .update(clients)
+                        .set({
+                            lastPing: maxTimestamp,
+                            online: true,
+                            archived: false
+                        })
+                        .where(inArray(clients.clientId, clientIds));
                 }, "flushClientPingsToDb");
             } catch (error) {
                 logger.error(
@@ -260,7 +230,12 @@ export async function flushPingsToDb(): Promise<void> {
 /**
  * Simple retry wrapper with exponential backoff for transient errors
- * (connection timeouts, unexpected disconnects).
+ * (deadlocks, connection timeouts, unexpected disconnects).
+ *
+ * PostgreSQL deadlocks (40P01) are always safe to retry: the database
+ * guarantees exactly one winner per deadlock pair, so the loser just needs
+ * to try again. MAX_RETRIES is intentionally higher than typical connection
+ * retry budgets to give deadlock victims enough chances to succeed.
  */
 async function withRetry<T>(
@@ -277,7 +252,8 @@ async function withRetry<T>(
             const jitter = Math.random() * baseDelay;
             const delay = baseDelay + jitter;
             logger.warn(
-                `Transient DB error in ${context}, retrying attempt ${attempt}/${MAX_RETRIES} after ${delay.toFixed(0)}ms`
+                `Transient DB error in ${context}, retrying attempt ${attempt}/${MAX_RETRIES} after ${delay.toFixed(0)}ms`,
+                { code: error?.code ?? error?.cause?.code }
             );
             await new Promise((resolve) => setTimeout(resolve, delay));
             continue;
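The `withRetry` hunk shows only the jitter lines; the doc comment calls the wrapper "exponential backoff", so the delay schedule sketched below assumes `baseDelay` doubles per attempt starting from `BASE_DELAY_MS = 50` (that doubling is an assumption, since the diff does not show how `baseDelay` is computed). Full jitter matters here: two deadlock victims retrying on an identical schedule would simply collide again.

```typescript
// Illustrative sketch of the assumed backoff-plus-jitter schedule:
// attempt n waits BASE_DELAY_MS * 2^(n-1), plus up to the same amount of
// random jitter to de-synchronize competing retriers.
const BASE_DELAY_MS = 50;

function retryDelayMs(attempt: number): number {
    const baseDelay = BASE_DELAY_MS * 2 ** (attempt - 1); // assumed doubling
    const jitter = Math.random() * baseDelay; // full jitter in [0, baseDelay)
    return baseDelay + jitter;
}
```

Under this schedule, attempt 1 waits 50-100 ms, attempt 2 waits 100-200 ms, and so on, so five retries stay well under the 10-second flush interval.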
@@ -288,14 +264,14 @@
 }

 /**
- * Detect transient connection errors that are safe to retry.
+ * Detect transient errors that are safe to retry.
  */
 function isTransientError(error: any): boolean {
     if (!error) return false;

     const message = (error.message || "").toLowerCase();
     const causeMessage = (error.cause?.message || "").toLowerCase();
-    const code = error.code || "";
+    const code = error.code || error.cause?.code || "";

     // Connection timeout / terminated
     if (
@@ -308,12 +284,17 @@ function isTransientError(error: any): boolean {
         return true;
     }

-    // PostgreSQL deadlock
+    // PostgreSQL deadlock detected — always safe to retry (one winner guaranteed)
     if (code === "40P01" || message.includes("deadlock")) {
         return true;
     }

-    // ECONNRESET, ECONNREFUSED, EPIPE
+    // PostgreSQL serialization failure
+    if (code === "40001") {
+        return true;
+    }
+
+    // ECONNRESET, ECONNREFUSED, EPIPE, ETIMEDOUT
     if (
         code === "ECONNRESET" ||
         code === "ECONNREFUSED" ||
@@ -337,12 +318,26 @@ export function startPingAccumulator(): void {
     }

     flushTimer = setInterval(async () => {
+        // Skip this tick if the previous flush is still in progress.
+        // setInterval does not await async callbacks, so without this guard
+        // two flush cycles can run concurrently and deadlock each other on
+        // overlapping bulk UPDATE statements.
+        if (isFlushing) {
+            logger.debug(
+                "Ping accumulator: previous flush still in progress, skipping cycle"
+            );
+            return;
+        }
+
+        isFlushing = true;
         try {
             await flushPingsToDb();
         } catch (error) {
             logger.error("Unhandled error in ping accumulator flush", {
                 error
             });
+        } finally {
+            isFlushing = false;
         }
     }, FLUSH_INTERVAL_MS);
@@ -364,7 +359,22 @@ export async function stopPingAccumulator(): Promise<void> {
         flushTimer = null;
     }

-    // Final flush to persist any remaining pings
+    // Final flush to persist any remaining pings.
+    // Wait for any in-progress flush to finish first so we don't race.
+    if (isFlushing) {
+        logger.debug(
+            "Ping accumulator: waiting for in-progress flush before stopping…"
+        );
+        await new Promise<void>((resolve) => {
+            const poll = setInterval(() => {
+                if (!isFlushing) {
+                    clearInterval(poll);
+                    resolve();
+                }
+            }, 50);
+        });
+    }
+
     try {
         await flushPingsToDb();
     } catch (error) {
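The `isFlushing` guard above addresses a general pitfall: `setInterval` fires on schedule whether or not the previous async callback has settled, so a slow flush can run concurrently with the next tick. The pattern can be sketched on its own; `wrapNonOverlapping` is an illustrative name, not a function from the codebase:

```typescript
// Sketch of the overlap guard added in the diff: wrap an async flush so that
// a tick is simply skipped while a previous invocation is still in flight.
function wrapNonOverlapping(
    flush: () => Promise<void>
): () => Promise<void> {
    let running = false; // plays the role of `isFlushing`
    return async () => {
        if (running) return; // previous flush still in progress: skip this tick
        running = true;
        try {
            await flush();
        } finally {
            running = false; // always release, even if flush throws
        }
    };
}

// Typical use: setInterval(wrapNonOverlapping(flushPingsToDb), 10_000);
```

Skipping a tick is safe here because pending pings stay queued in the maps and are simply picked up by the next flush; the guard trades a little latency for the guarantee that two bulk UPDATEs over overlapping rows never run at once.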