    jump_label: Batch updates if arch supports it · c2ba8a15
    Daniel Bristot de Oliveira authored
    
    
    If the architecture supports the batching of jump label updates, use it!
    
    An easy way to see the benefits of this patch is switching the
    schedstats on and off. For instance:
    
    -------------------------- %< ----------------------------
      #!/bin/sh
      while [ true ]; do
          sysctl -w kernel.sched_schedstats=1
          sleep 2
          sysctl -w kernel.sched_schedstats=0
          sleep 2
      done
    -------------------------- >% ----------------------------
    
    while watching the IPI count:
    
    -------------------------- %< ----------------------------
      # watch -n1 "cat /proc/interrupts | grep Function"
    -------------------------- >% ----------------------------
    
    Without this patch, it is possible to see around 168 IPIs every 2 seconds,
    while with this patch the number of IPIs drops to 3 every 2 seconds.
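
    To get a number instead of eyeballing the watch output, the per-CPU
    counters can be summed before and after an interval. The helper below is
    only a sketch: the function names are made up for illustration, and it
    assumes the x86 layout of /proc/interrupts, where the row ends in
    "Function call interrupts".

```shell
#!/bin/sh
# Sum the per-CPU counters of the "Function call interrupts" row.
# Reads /proc/interrupts-formatted text from stdin.
# (Hypothetical helper, not part of the patch.)
sum_function_call_ipis() {
    awk '/Function call interrupts/ {
        total = 0
        # Field 1 is the row label; the numeric fields are per-CPU counts.
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9]+$/)
                total += $i
        printf "%.0f\n", total
    }'
}

# Print how many function call IPIs were raised during an interval,
# given in seconds. (Hypothetical helper, not part of the patch.)
ipi_delta() {
    interval=$1
    before=$(sum_function_call_ipis < /proc/interrupts)
    sleep "$interval"
    after=$(sum_function_call_ipis < /proc/interrupts)
    echo $((after - before))
}

# Usage, while the schedstats loop above is running:
#   ipi_delta 2
```

    The reported delta includes function call IPIs from any other source
    on the system, so it is best read on an otherwise idle machine.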
    
    Regarding the performance impact of this patch set, I made two measurements:
    
        - The time to update a key (the task that is causing the change)
        - The time to run the int3 handler (the side effect on a thread that
          hits the code being changed)
    
    The schedstats static key was chosen as the key to be switched on and off,
    because it is used in more than 56 places, in a hot path. The schedstats
    static key will be changed with the following command:
    
    while [ true ]; do
        sysctl -w kernel.sched_schedstats=1
        usleep 500000
        sysctl -w kernel.sched_schedstats=0
        usleep 500000
    done
    
    In this way, the key will be updated twice per second. To force hits in the
    int3 handler, the system will also run a kernel compilation with two jobs per
    CPU. The test machine is a two-node, 24-CPU box with an Intel Xeon processor
    @2.27GHz.
    
    Regarding the update part, on average, the regular kernel takes 57 ms to update
    the schedstats key, while the kernel with the batch updates takes just 1.4 ms
    on average. Although it seems too good to be true, it makes sense: the
    schedstats key is used in 56 places, so the current implementation was
    expected to take around 56 times longer to update the key, as the
    IPIs are the most expensive part of the update.
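
    As a quick check of the figures above, the measured ratio can be
    computed directly. The helper name is hypothetical; only the 57 ms and
    1.4 ms values come from the measurements.

```shell
#!/bin/sh
# ratio A B: print A / B rounded to one decimal place.
# (Hypothetical helper, for illustration only.)
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Usage, with the measured update times in ms:
#   ratio 57 1.4
```

    For 57 and 1.4 this gives about 41, which is in the same range as the
    rough estimate of 56 derived from the number of places where the key
    is used.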
    
    Regarding the int3 handler, the non-batch handler takes 45 ns on average, while
    the batch version takes around 180 ns. At first glance, that seems to be a high
    value. But it is not, considering that it is handling 56 updates, rather than one!
    It takes only four times longer. This is possible because the patch
    uses a binary search in the vector: log2(56)=5.8. So, an overhead
    within four times was expected.
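
    To make the log2(56) argument concrete, the sketch below counts the
    worst-case number of probes a binary search needs over n sorted
    entries, which is floor(log2(n)) + 1. It is not the kernel code, and
    the function name is made up for illustration.

```shell
#!/bin/sh
# max_probes N: worst-case number of probes for a binary search over
# N sorted entries, i.e. floor(log2(N)) + 1 for N >= 1.
# Each probe halves the remaining range, so count the halvings.
# (Hypothetical helper, for illustration only.)
max_probes() {
    n=$1
    probes=0
    while [ "$n" -gt 0 ]; do
        n=$((n / 2))
        probes=$((probes + 1))
    done
    echo "$probes"
}

# Usage:
#   max_probes 56
```

    For 56 entries this yields 6 probes at most, whereas a linear scan of
    the same vector could need up to 56 comparisons.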
    
    (voice of a TV commercial) But that is not all! As the int3 handler stays
    active for a shorter period (because the update takes less time), the number
    of hits in the int3 handler decreased by 10%.
    
    The question then is: Is it worth paying the price of "135 ns" more in the int3
    handler?
    
    Considering that, in this test case, we are saving the handling of 53 IPIs,
    which take more than these 135 ns, it seems to be a small price to pay.
    Moreover, the test case was forcing hits in the int3 handler; in practice, it
    is not hit that often. The IPIs, on the other hand, take place on all CPUs,
    whether they hit the int3 handler or not!
    
    For instance, on an isolated CPU with a process running in user-space
    (nohz_full use-case), the chances of hitting the int3 handler are close to zero,
    while there is no way to avoid the IPIs. By bounding the IPIs, we improve
    this scenario considerably.
    
    Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Chris von Recklinghausen <crecklin@redhat.com>
    Cc: Clark Williams <williams@redhat.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Jason Baron <jbaron@akamai.com>
    Cc: Jiri Kosina <jkosina@suse.cz>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Scott Wood <swood@redhat.com>
    Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lkml.kernel.org/r/acc891dbc2dbc9fd616dd680529a2337b1d1274c.1560325897.git.bristot@redhat.com
    
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>