docs-v2/manage/healthchecks-failover.mdx
2025-10-14 17:07:58 -07:00


---
title: "Health Checks"
description: "Configure automated health monitoring and failover for high availability"
---
<Note>
Health checks are only available for targets created with managed self-hosted nodes or in the cloud.
</Note>
<iframe
className="w-full aspect-video rounded-xl"
src="https://www.youtube.com/embed/Xdme_2-AMas"
title="YouTube video player"
frameBorder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowFullScreen
></iframe>
## Overview
Pangolin provides automated health checking for [targets](./resources/targets.mdx) to ensure traffic is only routed to healthy services. When you create targets with managed nodes or in the cloud, you can optionally define health check parameters to monitor the availability and responsiveness of your services.
Health checks are essential for building highly available services, as they automatically remove unhealthy targets from traffic routing and load balancing.
## How Health Checks Work
### Monitoring Process
Health checks operate continuously in the background:
1. **Periodic Checks**: Pangolin sends requests to your target endpoints at configured intervals
2. **Status Evaluation**: Responses are evaluated against your configured criteria
3. **Traffic Management**: Healthy targets receive traffic; unhealthy targets are excluded
4. **Automatic Recovery**: Targets are automatically re-enabled when they become healthy again
### Health Check vs Target Endpoint
<Card title="Flexible Monitoring">
The health check endpoint can be the same as your target, but you can also monitor a different endpoint. This allows you to create dedicated health check endpoints that provide more detailed service status information.
</Card>
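As an illustration of a dedicated health check endpoint, here is a minimal sketch using only the Python standard library. The `/health` path, the `dependencies_ok` helper, the JSON body, and port 8080 are all assumptions made for this example; Pangolin does not require any particular shape beyond an HTTP status code you configure as healthy.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def dependencies_ok() -> bool:
    """Hypothetical readiness probe; replace with real checks (database, cache, ...)."""
    return True


def health_response(healthy: bool) -> tuple[int, bytes]:
    """Build the status code and JSON body for the health endpoint."""
    status = 200 if healthy else 503
    body = json.dumps({"status": "ok" if healthy else "unavailable"}).encode()
    return status, body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(dependencies_ok())
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def serve(port: int = 8080) -> None:
    """Serve the health endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Returning 503 when a dependency is down lets the health check fail even while the main application port still accepts connections.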
## Target Health States
Targets can exist in three distinct states that determine how traffic is routed:
<CardGroup cols={3}>
<Card title="Unknown" icon="question" color="#gray">
**Initial State**: Targets start in this state before the first health check
**Traffic Behavior**: Traffic is still routed to unknown targets normally
**Duration**: Until the first health check completes
</Card>
<Card title="Unhealthy" icon="x" color="#red">
**Failed Checks**: Target has failed health check criteria
**Traffic Behavior**: No traffic is routed to unhealthy targets
**Load Balancing**: Excluded from load balancing rotation
</Card>
<Card title="Healthy" icon="check" color="#green">
**Passing Checks**: Target is responding correctly to health checks
**Traffic Behavior**: Receives traffic according to load balancing rules
**Load Balancing**: Included in load balancing rotation
</Card>
</CardGroup>
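The routing rule implied by the three states can be written as a single predicate. This is a sketch of the behavior described above, not Pangolin's internal code; the `HealthState` and `receives_traffic` names are made up for illustration.

```python
from enum import Enum


class HealthState(Enum):
    UNKNOWN = "unknown"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


def receives_traffic(state: HealthState) -> bool:
    # Only unhealthy targets are excluded; unknown targets are routed
    # like healthy ones until the first check completes.
    return state is not HealthState.UNHEALTHY
```

The practical consequence is that a newly created target can receive traffic before its first check has run.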
## Configuring Health Checks
<Steps>
<Step title="Access Target Settings">
In the Pangolin dashboard, navigate to your resource and locate the target in the table.
</Step>
<Step title="Open Health Check Configuration">
Click the settings wheel (⚙️) next to the health check endpoint column.
</Step>
<Step title="Configure Health Check Parameters">
Fill out the health check configuration with your desired parameters.
</Step>
<Step title="Save Configuration">
Save your settings to enable health checking for the target.
</Step>
</Steps>
## Health Check Parameters
### Endpoint Configuration
<Card title="Health Check Target">
**Target Endpoint**: The URL or address to monitor for health status
**Default Behavior**: Usually the same as your target endpoint
**Custom Endpoints**: Can monitor different endpoints (e.g., `/health`, `/status`)
</Card>
### Timing Configuration
<CardGroup cols={2}>
<Card title="Healthy Interval">
**Purpose**: How often to check targets that are currently healthy
**Typical Range**: 30-60 seconds
**Consideration**: Less frequent checks reduce overhead
</Card>
<Card title="Unhealthy Interval">
**Purpose**: How often to check targets that are currently unhealthy
**Typical Range**: 10-30 seconds
**Consideration**: More frequent checks enable faster recovery
</Card>
</CardGroup>
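To make the trade-off between the two intervals concrete, the sketch below picks the delay before the next check from the target's current status and computes how many checks per hour each interval produces. The function names and the 30 and 10 second defaults are illustrative values taken from the typical ranges above, not Pangolin defaults.

```python
def next_check_delay(
    is_healthy: bool,
    healthy_interval: float = 30.0,
    unhealthy_interval: float = 10.0,
) -> float:
    """Seconds to wait before the next check, based on the current status."""
    return healthy_interval if is_healthy else unhealthy_interval


def checks_per_hour(interval_seconds: float) -> float:
    """Number of health check requests a target receives per hour."""
    return 3600 / interval_seconds
```

With these values a healthy target receives 120 checks per hour, while an unhealthy one is probed 360 times per hour so that recovery is noticed sooner.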
### Response Configuration
<Card title="Timeout Settings">
**Request Timeout**: Maximum time to wait for a health check response
**Default Behavior**: Requests exceeding timeout are considered failed
**Recommended**: Set based on your service's typical response time
</Card>
<Card title="HTTP Response Codes">
**Healthy Codes**: Which HTTP status codes indicate a healthy target
**Common Settings**: 200, 201, 202, 204
**Custom Codes**: Configure based on your service's health endpoint behavior
</Card>
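The way the timeout and the healthy status codes combine into a pass or fail decision can be sketched as follows. This approximates the evaluation described above using Python's standard library; the `probe` function, the 5 second timeout, and the treatment of connection errors as failures are assumptions of this example rather than documented Pangolin internals.

```python
import urllib.error
import urllib.request

HEALTHY_CODES = frozenset({200, 201, 202, 204})


def is_healthy_status(status: int, healthy_codes=HEALTHY_CODES) -> bool:
    """A response passes only if its status code is in the configured set."""
    return status in healthy_codes


def probe(url: str, timeout: float = 5.0, healthy_codes=HEALTHY_CODES) -> bool:
    """Send one health check request and report whether it passed."""
    try:
        # Note: urlopen follows redirects, so 3xx codes are not seen here.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy_status(response.status, healthy_codes)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError; the code is still evaluated.
        return is_healthy_status(exc.code, healthy_codes)
    except OSError:
        # Timeouts and connection failures count as failed checks.
        return False
```

For example, `probe("http://web-01.local:8080/health")` returns `True` only when the service answers within the timeout with one of the listed codes.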
## Failover Behavior
### Automatic Traffic Exclusion
When a target becomes unhealthy:
<Steps>
<Step title="Health Check Failure">
Target fails to meet health check criteria (response code, timeout, etc.)
</Step>
<Step title="Status Update">
Target status changes from "Healthy" to "Unhealthy"
</Step>
<Step title="Traffic Removal">
Target is immediately removed from the traffic routing configuration
</Step>
<Step title="Load Balancer Update">
Load balancing configuration is updated to exclude the unhealthy target
</Step>
<Step title="Continued Monitoring">
Health checks continue at the unhealthy interval for recovery detection
</Step>
</Steps>
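A rough estimate of how long a failure can go unnoticed follows from the timing parameters. The sketch below assumes that a single failed check is enough to mark a target unhealthy (this page does not specify a failure threshold) and that a hung service consumes the full request timeout before the check fails.

```python
def worst_case_detection_seconds(healthy_interval: float, timeout: float) -> float:
    """Upper bound on the time between a failure and its detection.

    Worst case: the target fails right after a passing check, so the next
    check starts one full healthy interval later and then waits for the
    whole timeout before it is counted as failed.
    """
    return healthy_interval + timeout


print(worst_case_detection_seconds(healthy_interval=30, timeout=5))
```

With a 30 second healthy interval and a 5 second timeout, traffic may keep reaching a failed target for up to 35 seconds under these assumptions. Shortening the healthy interval reduces this window at the cost of more check requests.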
### Automatic Recovery
When an unhealthy target recovers:
<Steps>
<Step title="Successful Health Check">
Target begins responding correctly to health checks
</Step>
<Step title="Status Update">
Target status changes from "Unhealthy" to "Healthy"
</Step>
<Step title="Traffic Restoration">
Target is automatically added back to traffic routing
</Step>
<Step title="Load Balancer Update">
Load balancing resumes including the recovered target
</Step>
</Steps>
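The failover and recovery sequences can be traced with a small simulation. It assumes each check result flips the status immediately, which is a simplification for illustration; `simulate` is a hypothetical helper, not part of Pangolin.

```python
def simulate(check_results):
    """Return (status, routed) after each check, starting from 'unknown'."""
    status = "unknown"
    timeline = []
    for passed in check_results:
        status = "healthy" if passed else "unhealthy"
        routed = status != "unhealthy"  # unhealthy targets get no traffic
        timeline.append((status, routed))
    return timeline


for status, routed in simulate([True, False, False, True]):
    print(f"{status:<9} routed={routed}")
```

The target is routed after the first passing check, excluded for the two failed checks, and routed again as soon as a check passes.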
## High Availability Strategies
### Multi-Target Redundancy
<Card title="Service Redundancy">
Deploy multiple instances of your service across different targets to ensure availability even when some targets fail.
</Card>
```
Resource: web-application
├── Target 1: web-01.local:8080 (Site A) - Healthy ✅
├── Target 2: web-02.local:8080 (Site A) - Unhealthy ❌
└── Target 3: web-03.local:8080 (Site B) - Healthy ✅
Traffic routes to: Target 1 & Target 3 only
```
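The same scenario can be expressed in a few lines of Python. Round-robin is used here purely as an example balancing strategy; this page does not state which algorithm Pangolin applies, and the function names are illustrative.

```python
from itertools import cycle, islice

# Target states from the diagram above.
targets = {
    "web-01.local:8080": "healthy",
    "web-02.local:8080": "unhealthy",
    "web-03.local:8080": "healthy",
}


def routable(targets):
    """Targets eligible for traffic: everything except unhealthy ones."""
    return [addr for addr, state in targets.items() if state != "unhealthy"]


def round_robin(targets, request_count):
    """Assign the next `request_count` requests across routable targets."""
    return list(islice(cycle(routable(targets)), request_count))
```

Four consecutive requests alternate between `web-01.local:8080` and `web-03.local:8080`; `web-02.local:8080` is skipped until it becomes healthy again.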
### Cross-Site Failover
<Card title="Geographic Distribution">
Distribute targets across multiple sites to protect against site-level failures.
</Card>
```
Resource: api-service
├── Primary Site Targets
│   ├── api-01.primary:8443 - Healthy ✅
│   └── api-02.primary:8443 - Healthy ✅
└── Backup Site Targets
    ├── api-01.backup:8443 - Healthy ✅
    └── api-02.backup:8443 - Healthy ✅
All targets receive traffic via load balancing
```