Monitor device idleness approximately
Every time lava-monitor runs, we can ask the LAVA API whether a job is currently scheduled on each device. Although these point samples are widely spaced, over time they should give a reasonably accurate estimate of device utilization.
This PR adds a separate metric, lava_device_active, which is 1 when the device is active and 0 when idle. The PromQL query to obtain the utilization for the machines in a given rack is simply:
avg_over_time(lava_device_active[$__interval]) and on (device) lava_worker{worker="lava-rack-cbg-1"}
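To make the query's semantics concrete, the sketch below emulates it on in-memory samples. avg_over_time averages each series' samples inside the window, and `and on (device)` keeps a left-hand series only if a right-hand series with the same device label exists. The device-to-worker mapping stands in for the lava_worker series and is an assumption, as are the function names; window boundary handling is simplified relative to Prometheus.

```python
from statistics import fmean
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp, value of lava_device_active)


def avg_over_time(samples: List[Sample], start: float, end: float) -> Optional[float]:
    # Mean of the samples inside the window (boundary handling simplified).
    values = [v for t, v in samples if start <= t <= end]
    return fmean(values) if values else None


def rack_utilization(
    active: Dict[str, List[Sample]],
    device_worker: Dict[str, str],  # hypothetical stand-in for lava_worker
    rack: str,
    start: float,
    end: float,
) -> Dict[str, float]:
    result: Dict[str, float] = {}
    for device, samples in active.items():
        # 'and on (device)': keep only devices that match lava_worker{worker=rack}.
        if device_worker.get(device) != rack:
            continue
        value = avg_over_time(samples, start, end)
        if value is not None:
            result[device] = value
    return result
```

The result is a per-device fraction of sampled time spent running jobs. If a single figure for the whole rack is wanted, wrapping the PromQL expression in avg(...) would average it across devices.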