7.0 KiB
gpu collector
The gpu collector exposes metrics about GPU usage and memory consumption, both at the adapter (physical GPU) and per-process level.
| Metric name prefix | gpu |
| Data source | Perflib |
| Counters | GPU Engine, GPU Adapter, GPU Process |
| Enabled by default? | No |
Flags
None
Metrics
These metrics are available on supported versions of Windows with compatible GPUs and drivers:
Adapter-level Metrics
| Name | Description | Type | Labels |
|---|---|---|---|
windows_gpu_info |
A metric with a constant '1' value labeled with gpu device information. | gauge | bus_number,device_id,function_number,luid,name,phys |
windows_gpu_dedicated_system_memory_size_bytes |
The size, in bytes, of memory that is dedicated from system memory. | gauge | device_id,luid |
windows_gpu_dedicated_video_memory_size_bytes |
The size, in bytes, of memory that is dedicated from video memory. | gauge | device_id,luid |
windows_gpu_shared_system_memory_size_bytes |
The size, in bytes, of memory from system memory that can be shared by many users. | gauge | device_id,luid |
windows_gpu_adapter_memory_committed_bytes |
Total committed GPU memory in bytes per physical GPU | gauge | device_id,luid,phys |
windows_gpu_adapter_memory_dedicated_bytes |
Dedicated GPU memory usage in bytes per physical GPU | gauge | device_id,luid,phys |
windows_gpu_adapter_memory_shared_bytes |
Shared GPU memory usage in bytes per physical GPU | gauge | device_id,luid,phys |
windows_gpu_local_adapter_memory_bytes |
Local adapter memory usage in bytes per physical GPU | gauge | device_id,luid,phys,part |
windows_gpu_non_local_adapter_memory_bytes |
Non-local adapter memory usage in bytes per physical GPU | gauge | device_id,luid,phys,part |
Per-process Metrics
| Name | Description | Type | Labels |
|---|---|---|---|
windows_gpu_engine_time_seconds |
Total running time of the GPU engine in seconds | counter | device_id,luid,phys, eng, engtype, process_id |
windows_gpu_process_memory_committed_bytes |
Total committed GPU memory in bytes per process | gauge | device_id,luid,phys,process_id |
windows_gpu_process_memory_dedicated_bytes |
Dedicated GPU memory usage in bytes per process | gauge | device_id,luid,phys,process_id |
windows_gpu_process_memory_local_bytes |
Local GPU memory usage in bytes per process | gauge | device_id,luid,phys,process_id |
windows_gpu_process_memory_non_local_bytes |
Non-local GPU memory usage in bytes per process | gauge | device_id,luid,phys,process_id |
windows_gpu_process_memory_shared_bytes |
Shared GPU memory usage in bytes per process | gauge | device_id,luid,phys,process_id |
Metric Labels
luid,phys: Physical GPU index (e.g., "0")eng: GPU engine index (e.g., "0", "1", ...)engtype: GPU engine type (e.g., "3D", "Copy", "VideoDecode", etc.)process_id: Process ID
Example Metric
These are basic queries to help you get started with GPU monitoring on Windows using Prometheus.
Show GPU information for a specific physical GPU (0):
windows_gpu_info{bus_number="8",device_id="PCI\\VEN_10DE&DEV_1B81&SUBSYS_61733842&REV_A1",function_number="0",luid="0x00000000_0x00010F8A",name="NVIDIA GeForce GTX 1070",phys="0"} 1
Show total dedicated GPU memory (in bytes) usage on GPU 0:
windows_gpu_adapter_memory_dedicated_bytes{phys="0"}
Aggregate GPU utilization across all processes for a physical GPU (3D engine):
sum by (phys) (
rate(windows_gpu_engine_time_seconds{phys="0", engtype="3D"}[1m])
) * 100
Show GPU utilization for a specific process (3D engine):
sum by (phys, process_id) (
rate(windows_gpu_engine_time_seconds{process_id="1234", engtype="3D"}[1m])
) * 100
Show dedicated GPU memory per process:
windows_gpu_adapter_memory_dedicated_bytes
Useful Queries
Show top 5 processes by GPU utilization (all engines):
topk(5, sum by (process_id) (
rate(windows_gpu_engine_time_seconds[1m])
) * 100)
Show GPU memory usage per physical GPU:
sum by (phys) (
windows_gpu_adapter_memory_dedicated_bytes
)
Show GPU engine time with process owner and command line:
windows_gpu_engine_time_seconds * on(process_id) group_left(owner, cmdline) windows_process_info
Alerting Examples
prometheus.rules
# Alert on processes using more than 80% of a GPU's capacity over 10 minutes
- alert: HighGpuUtilization
expr: |
sum by (process_id) (
rate(windows_gpu_engine_time_seconds[1m])
) * 100 > 80
for: 10m
labels:
severity: warning
annotations:
summary: "High GPU Utilization (process {{ $labels.process_id }})"
description: "Process is using more than 80% of GPU resources\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
Notes
- Per-process metrics allow you to identify which processes are consuming GPU resources.
- Adapter-level metrics provide an overview of total GPU memory usage.
- For overall GPU utilization, aggregate per-process metrics in Prometheus using queries such as
sum(). - The collector relies on Windows performance counters; ensure your system and drivers support these counters.
Enabling the Collector
To enable the GPU collector, add gpu to the list of enabled collectors in your windows_exporter configuration.
Example (command line):
windows_exporter.exe --collectors.enabled=gpu