Skip to content
Snippets Groups Projects
  • Michele Di Giorgio's avatar
    d0b7306d
    thermal: fix race condition when updating cooling device · d0b7306d
    Michele Di Giorgio authored
    
    When multiple thermal zones are bound to the same cooling device, multiple
    kernel threads may want to update the cooling device state by calling
    thermal_cdev_update(). Having cdev not protected by a mutex can lead to a race
    condition. Consider the following situation with two kernel threads k1 and k2:
    
    	    Thread k1				Thread k2
                                        ||
                                        ||  call thermal_cdev_update()
                                        ||      ...
                                        ||      set_cur_state(cdev, target);
        call power_actor_set_power()    ||
            ...                         ||
            instance->target = state;   ||
            cdev->updated = false;      ||
                                        ||      cdev->updated = true;
                                        ||      // completes execution
        call thermal_cdev_update()      ||
            // cdev->updated == true    ||
            return;                     ||
                                        \/
                                        time
    
    k2 has already looped through the thermal instances looking for the deepest
    cooling device state and is preempted right before setting cdev->updated to
    true. Now, k1 runs, modifies the thermal instance state and sets cdev->updated
    to false. Then, k1 is preempted and k2 continues the execution by setting
    cdev->updated to true, therefore preventing k1 from performing the update.
    Notice that this is not an issue if k2 looks at the instance->target modified by
    k1 "after" it is assigned by k1. In fact, in this case the update will happen
    anyway and k1 can safely return immediately from thermal_cdev_update().
    
    This may lead to a situation where a thermal governor never updates the cooling
    device. For example, this is the case for the step_wise governor: when calling
    the function thermal_zone_trip_update(), the governor may always get a new state
    equal to the old one (which, however, wasn't notified to the cooling device) and
    will therefore skip the update.
    
    CC: Zhang Rui <rui.zhang@intel.com>
    CC: Eduardo Valentin <edubezval@gmail.com>
    CC: Peter Feuerer <peter@piie.net>
    Reported-by: default avatarToby Huang <toby.huang@arm.com>
    Signed-off-by: default avatarMichele Di Giorgio <michele.digiorgio@arm.com>
    Reviewed-by: default avatarJavi Merino <javi.merino@arm.com>
    Signed-off-by: default avatarZhang Rui <rui.zhang@intel.com>
    d0b7306d
    History
    thermal: fix race condition when updating cooling device
    Michele Di Giorgio authored
    
    When multiple thermal zones are bound to the same cooling device, multiple
    kernel threads may want to update the cooling device state by calling
    thermal_cdev_update(). Having cdev not protected by a mutex can lead to a race
    condition. Consider the following situation with two kernel threads k1 and k2:
    
    	    Thread k1				Thread k2
                                        ||
                                        ||  call thermal_cdev_update()
                                        ||      ...
                                        ||      set_cur_state(cdev, target);
        call power_actor_set_power()    ||
            ...                         ||
            instance->target = state;   ||
            cdev->updated = false;      ||
                                        ||      cdev->updated = true;
                                        ||      // completes execution
        call thermal_cdev_update()      ||
            // cdev->updated == true    ||
            return;                     ||
                                        \/
                                        time
    
    k2 has already looped through the thermal instances looking for the deepest
    cooling device state and is preempted right before setting cdev->updated to
    true. Now, k1 runs, modifies the thermal instance state and sets cdev->updated
    to false. Then, k1 is preempted and k2 continues the execution by setting
    cdev->updated to true, therefore preventing k1 from performing the update.
    Notice that this is not an issue if k2 looks at the instance->target modified by
    k1 "after" it is assigned by k1. In fact, in this case the update will happen
    anyway and k1 can safely return immediately from thermal_cdev_update().
    
    This may lead to a situation where a thermal governor never updates the cooling
    device. For example, this is the case for the step_wise governor: when calling
    the function thermal_zone_trip_update(), the governor may always get a new state
    equal to the old one (which, however, wasn't notified to the cooling device) and
    will therefore skip the update.
    
    CC: Zhang Rui <rui.zhang@intel.com>
    CC: Eduardo Valentin <edubezval@gmail.com>
    CC: Peter Feuerer <peter@piie.net>
    Reported-by: default avatarToby Huang <toby.huang@arm.com>
    Signed-off-by: default avatarMichele Di Giorgio <michele.digiorgio@arm.com>
    Reviewed-by: default avatarJavi Merino <javi.merino@arm.com>
    Signed-off-by: default avatarZhang Rui <rui.zhang@intel.com>