Monitor device idleness approximately

Edmund Smith requested to merge eds/lava-monitor:eds/idle-series into staging

Every time lava-monitor runs, we have the opportunity to find out from the lava API whether there is a job currently scheduled on each device. Over time, these point samples, though widely spaced, should give a reasonably accurate view of device utilization.

This PR adds a separate metric, lava_device_active, which is 1 when the device is active and 0 when idle. The PromQL query to obtain the utilization for the machines in a given rack is simply:

avg_over_time(lava_device_active[$__interval]) and on (device) lava_worker{worker="lava-rack-cbg-1"}
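
To illustrate why averaging a 0/1 gauge gives a utilization figure, here is a minimal Python sketch. It is not code from lava-monitor; the `utilization` function and the sample window are hypothetical, and it assumes each sample is one reading of lava_device_active taken on one lava-monitor run.

```python
def utilization(samples):
    """Return the fraction of samples in which the device was active.

    `samples` is a list of 0/1 readings of lava_device_active for one
    device over some window (hypothetical data, one reading per run).
    """
    if not samples:
        raise ValueError("no samples in window")
    # The mean of a 0/1 series is the share of samples equal to 1.
    return sum(samples) / len(samples)


# Hypothetical window of eight samples: active in five, idle in three.
window = [1, 1, 0, 1, 0, 0, 1, 1]
print(utilization(window))  # 0.625
```

Because the samples are widely spaced, a short window holds few readings, so the estimate is only approximate and becomes more stable over longer windows.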
