docs-v2/manage/healthchecks-failover.mdx
2025-10-15 16:54:16 -07:00

---
title: "Health Checks"
description: "Configure automated health monitoring and failover for high availability"
---
<iframe
className="w-full aspect-video rounded-xl"
src="https://www.youtube.com/embed/Xdme_2-AMas"
title="YouTube video player"
frameBorder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowFullScreen
></iframe>
## Overview
Pangolin provides automated health checking for [targets](./resources/targets.mdx) so that traffic is routed only to healthy services. Health checks are essential for building highly available services because they automatically remove unhealthy targets from traffic routing and load balancing.
## How Health Checks Work
### Monitoring Process
Health checks operate continuously in the background:
1. **Periodic Checks**: Pangolin sends requests to your target endpoints at configured intervals
2. **Status Evaluation**: Responses are evaluated against your configured criteria
3. **Traffic Management**: Healthy targets receive traffic; unhealthy targets are excluded
4. **Automatic Recovery**: Targets are automatically re-enabled when they become healthy again
### Health Check vs Target Endpoint
<Card title="Flexible Monitoring">
The health check endpoint can be the same as your target, but you can also monitor a different endpoint. This allows you to create dedicated health check endpoints that provide more detailed service status information.
</Card>
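As a rough illustration, a dedicated health endpoint might combine a few dependency checks into a single response. The sketch below uses only Python's standard library. The `/health` path, the `build_health_response` helper, and the dependency names are hypothetical and are not part of Pangolin.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_health_response(checks):
    """Return (status_code, json_body) for a dict of dependency name -> bool."""
    healthy = all(checks.values())
    body = json.dumps({
        "status": "ok" if healthy else "degraded",
        "checks": checks,
    })
    # 200 when every dependency passes, 503 otherwise, so a status-code
    # based health check can tell the two cases apart.
    return (200 if healthy else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # Placeholder results; a real service would probe its dependencies here.
        status, body = build_health_response({"database": True, "cache": True})
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the example server; port 8080 is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the example server. Pointing the health check at this path instead of the application's main route keeps the check cheap while still reflecting dependency failures.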
## Target Health States
Targets can exist in three distinct states that determine how traffic is routed:
<CardGroup cols={3}>
<Card title="Unknown" icon="question" color="#gray">
**Initial State**: Targets start in this state before the first health check
**Traffic Behavior**: Traffic is still routed to unknown targets normally
**Duration**: Until the first health check completes
</Card>
<Card title="Unhealthy" icon="x" color="#red">
**Failed Checks**: Target has failed to meet the health check criteria
**Traffic Behavior**: No traffic is routed to unhealthy targets
**Load Balancing**: Excluded from load balancing rotation
</Card>
<Card title="Healthy" icon="check" color="#green">
**Passing Checks**: Target is responding correctly to health checks
**Traffic Behavior**: Receives traffic according to load balancing rules
**Load Balancing**: Included in load balancing rotation
</Card>
</CardGroup>
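The three states and their routing behavior can be modeled in a few lines. This is a minimal sketch for reasoning about the rules above, not Pangolin's internal representation. `HealthState`, `is_routable`, and `routable_targets` are illustrative names.

```python
from enum import Enum


class HealthState(Enum):
    UNKNOWN = "unknown"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


def is_routable(state):
    """Only unhealthy targets are excluded; unknown targets still get traffic."""
    return state is not HealthState.UNHEALTHY


def routable_targets(states):
    """Filter a dict of target name -> HealthState down to routable names."""
    return [name for name, state in states.items() if is_routable(state)]
```

The key detail is that `UNKNOWN` counts as routable, so a newly added target can receive traffic before its first check completes.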
## Configuring Health Checks
<Steps>
<Step title="Access Target Settings">
In the Pangolin dashboard, navigate to your resource and locate the target in the table.
</Step>
<Step title="Open Health Check Configuration">
Click the settings wheel (⚙️) next to the health check endpoint column.
</Step>
<Step title="Configure Health Check Parameters">
Fill out the health check configuration with your desired parameters.
</Step>
<Step title="Save Configuration">
Save your settings to enable health checking for the target.
</Step>
</Steps>
## Health Check Parameters
### Endpoint Configuration
<Card title="Health Check Target">
**Target Endpoint**: The URL or address to monitor for health status
**Default Behavior**: Usually the same as your target endpoint
**Custom Endpoints**: Can monitor different endpoints (e.g., `/health`, `/status`)
</Card>
### Timing Configuration
<CardGroup cols={2}>
<Card title="Healthy Interval">
**Purpose**: How often to check targets that are currently healthy
**Typical Range**: 30-60 seconds
**Consideration**: Less frequent checks reduce overhead
</Card>
<Card title="Unhealthy Interval">
**Purpose**: How often to check targets that are currently unhealthy
**Typical Range**: 10-30 seconds
**Consideration**: More frequent checks enable faster recovery
</Card>
</CardGroup>
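The two intervals bound how long a state change can go unnoticed. The following back-of-the-envelope sketch assumes that a single failed or passed check is enough to flip the state, and that a check may take up to the full request timeout to resolve. Neither assumption is confirmed by this page, and the function names are illustrative.

```python
def worst_case_detection_seconds(healthy_interval, timeout):
    """Rough upper bound on how long a failure can go unnoticed.

    The target can fail right after a passing check, so the next check
    starts up to `healthy_interval` seconds later and may take up to
    `timeout` seconds to be declared failed.
    """
    return healthy_interval + timeout


def worst_case_recovery_seconds(unhealthy_interval, timeout):
    """Rough upper bound on how long a recovered target stays excluded."""
    return unhealthy_interval + timeout


# Example values inside the typical ranges above, with a 5 second timeout.
print(worst_case_detection_seconds(30, 5))  # 35
print(worst_case_recovery_seconds(10, 5))   # 15
```

Shortening the healthy interval tightens failure detection at the cost of more check traffic, which is the trade-off the two cards describe.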
### Response Configuration
<Card title="Timeout Settings">
**Request Timeout**: Maximum time to wait for a health check response
**Default Behavior**: Requests that exceed the timeout are considered failed
**Recommended**: Set based on your service's typical response time
</Card>
<Card title="HTTP Response Codes">
**Healthy Codes**: Which HTTP status codes indicate a healthy target
**Common Settings**: 200, 201, 202, 204
**Custom Codes**: Configure based on your service's health endpoint behavior
</Card>
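To make the pass/fail rule concrete, here is a small sketch of how a single check could be evaluated with Python's standard `urllib`. It is an approximation of the behavior described above, not Pangolin's implementation. `probe_status` and `is_healthy_status` are hypothetical helpers, and the code set is the example from "Common Settings".

```python
import urllib.error
import urllib.request

# Example set taken from the "Common Settings" above.
HEALTHY_CODES = {200, 201, 202, 204}


def probe_status(url, timeout):
    """Return the HTTP status code for `url`, or None on timeout or connection error.

    Note: urlopen follows redirects, so this reports the final response's status.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still a valid result.
        return err.code
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return None


def is_healthy_status(status, healthy_codes=HEALTHY_CODES):
    """A check passes only if a response arrived and its code is in the set."""
    return status is not None and status in healthy_codes
```

A call such as `is_healthy_status(probe_status("http://web-01.local:8080/health", timeout=5))` then treats a timeout the same way as an unexpected status code: the check fails.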
## Failover Behavior
### Automatic Traffic Exclusion
When a target becomes unhealthy:
<Steps>
<Step title="Health Check Failure">
Target fails to meet health check criteria (response code, timeout, etc.)
</Step>
<Step title="Status Update">
Target status changes from "Healthy" to "Unhealthy"
</Step>
<Step title="Traffic Removal">
Target is immediately removed from the traffic routing configuration
</Step>
<Step title="Load Balancer Update">
Load balancing configuration is updated to exclude the unhealthy target
</Step>
<Step title="Continued Monitoring">
Health checks continue at the unhealthy interval for recovery detection
</Step>
</Steps>
### Automatic Recovery
When an unhealthy target recovers:
<Steps>
<Step title="Successful Health Check">
Target begins responding correctly to health checks
</Step>
<Step title="Status Update">
Target status changes from "Unhealthy" to "Healthy"
</Step>
<Step title="Traffic Restoration">
Target is automatically added back to traffic routing
</Step>
<Step title="Load Balancer Update">
Load balancing resumes including the recovered target
</Step>
</Steps>
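The failover and recovery sequences can be traced with a short simulation. The sketch assumes that one check result is enough to change a target's state, which this page implies but does not state explicitly. `routing_events` is an illustrative name.

```python
def routing_events(check_results):
    """Return (check_index, event) pairs for each change in routing status.

    `check_results` is an ordered list of booleans, True for a passing check.
    """
    routable = True  # unknown targets start out receiving traffic
    events = []
    for index, passed in enumerate(check_results):
        if routable and not passed:
            events.append((index, "removed from routing"))
        elif not routable and passed:
            events.append((index, "restored to routing"))
        routable = passed
    return events


print(routing_events([True, True, False, False, True]))
# [(2, 'removed from routing'), (4, 'restored to routing')]
```

Repeated failures produce no further events: the target stays excluded until the first passing check, at which point it returns to the rotation.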
## High Availability Strategies
### Multi-Target Redundancy
<Card title="Service Redundancy">
Deploy multiple instances of your service across different targets to ensure availability even when some targets fail.
</Card>
```
Resource: web-application
├── Target 1: web-01.local:8080 (Site A) - Healthy ✅
├── Target 2: web-02.local:8080 (Site A) - Unhealthy ❌
└── Target 3: web-03.local:8080 (Site B) - Healthy ✅
Traffic routes to: Target 1 & Target 3 only
```
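The same scenario can be expressed as a rotation over the targets that are not unhealthy. Round-robin is used here purely as an assumption for illustration, since this page does not specify the load balancing algorithm. `healthy_rotation` is a hypothetical helper.

```python
from itertools import cycle

targets = {
    "web-01.local:8080": "healthy",
    "web-02.local:8080": "unhealthy",
    "web-03.local:8080": "healthy",
}


def healthy_rotation(targets):
    """Cycle endlessly over every target that is not marked unhealthy."""
    eligible = [addr for addr, state in targets.items() if state != "unhealthy"]
    return cycle(eligible)


rotation = healthy_rotation(targets)
picks = [next(rotation) for _ in range(4)]
print(picks)
# ['web-01.local:8080', 'web-03.local:8080', 'web-01.local:8080', 'web-03.local:8080']
```

`web-02.local:8080` never appears in the picks, which matches the "Target 1 & Target 3 only" outcome in the diagram.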
### Cross-Site Failover
<Card title="Geographic Distribution">
Distribute targets across multiple sites to protect against site-level failures.
</Card>
```
Resource: api-service
├── Primary Site Targets
│   ├── api-01.primary:8443 - Healthy ✅
│   └── api-02.primary:8443 - Healthy ✅
└── Backup Site Targets
    ├── api-01.backup:8443 - Healthy ✅
    └── api-02.backup:8443 - Healthy ✅
All targets receive traffic via load balancing
```
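When planning capacity for a site-level failure, it helps to estimate how much traffic each surviving target would absorb. The sketch below assumes an even split across routable targets and that a site outage marks all of that site's targets unhealthy. Both are simplifying assumptions, and `share_per_target` is an illustrative name.

```python
targets = [
    {"address": "api-01.primary:8443", "site": "primary"},
    {"address": "api-02.primary:8443", "site": "primary"},
    {"address": "api-01.backup:8443", "site": "backup"},
    {"address": "api-02.backup:8443", "site": "backup"},
]


def share_per_target(targets, failed_sites=()):
    """Fraction of traffic each surviving target handles under an even split."""
    alive = [t for t in targets if t["site"] not in failed_sites]
    if not alive:
        return {}
    return {t["address"]: 1 / len(alive) for t in alive}


print(share_per_target(targets)["api-01.backup:8443"])               # 0.25
print(share_per_target(targets, ["primary"])["api-01.backup:8443"])  # 0.5
```

In this example each backup target must be able to handle twice its normal load if the primary site goes down.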