Skip to content
Snippets Groups Projects
Select Git revision
  • 9cca35d42eb61b69e108a17215756c46173a5e6f
  • vme-testing default
  • ci-test
  • master
  • remoteproc
  • am625-sk-ov5640
  • pcal6534-upstreaming
  • lps22df-upstreaming
  • msc-upstreaming
  • imx8mp
  • iio/noa1305
  • vme-next
  • vme-next-4.14-rc4
  • v4.14-rc4
  • v4.14-rc3
  • v4.14-rc2
  • v4.14-rc1
  • v4.13
  • vme-next-4.13-rc7
  • v4.13-rc7
  • v4.13-rc6
  • v4.13-rc5
  • v4.13-rc4
  • v4.13-rc3
  • v4.13-rc2
  • v4.13-rc1
  • v4.12
  • v4.12-rc7
  • v4.12-rc6
  • v4.12-rc5
  • v4.12-rc4
  • v4.12-rc3
32 results

page_alloc.c

Blame
    • Mel Gorman's avatar
      9cca35d4
      mm, page_alloc: enable/disable IRQs once when freeing a list of pages · 9cca35d4
      Mel Gorman authored
      Patch series "Follow-up for speed up page cache truncation", v2.
      
      This series is a follow-on for Jan Kara's series "Speed up page cache
      truncation" series.  We both ended up looking at the same problem but
      saw different problems based on the same data.  This series builds upon
      his work.
      
      A variety of workloads were compared on four separate machines but each
      machine showed gains albeit at different levels.  Minimally, some of the
      differences are due to NUMA where truncating data from a remote node is
      slower than a local node.  The workloads checked were
      
      o sparse truncate microbenchmark, tiny
      o sparse truncate microbenchmark, large
      o reaim-io disk workfile
      o dbench4 (modified by mmtests to produce more stable results)
      o filebench varmail configuration for small memory size
      o bonnie, directory operations, working set size 2*RAM
      
      reaim-io, dbench and filebench all showed minor gains.  Truncation does
      not dominate those workloads but were tested to ensure no other
      regressions.  They will not be reported further.
      
      The sparse truncate microbench was written by Jan.  It creates a number
      of files and then times how long it takes to truncate each one.  The
      "tiny" configuraiton creates a number of files that easily fits in
      memory and times how long it takes to truncate files with page cache.
      The large configuration uses enough files to have data that is twice the
      size of memory and so timings there include truncating page cache and
      working set shadow entries in the radix tree.
      
      Patches 1-4 are the most relevant parts of this series.  Patches 5-8 are
      optional as they are deleting code that is essentially useless but has a
      negligible performance impact.
      
      The changelogs have more information on performance but just for bonnie
      delete options, the main comparison is
      
      bonnie
                                            4.14.0-rc5             4.14.0-rc5             4.14.0-rc5
                                                jan-v2                vanilla                 mel-v2
      Hmean     SeqCreate ops         76.20 (   0.00%)       75.80 (  -0.53%)       76.80 (   0.79%)
      Hmean     SeqCreate read        85.00 (   0.00%)       85.00 (   0.00%)       85.00 (   0.00%)
      Hmean     SeqCreate del      13752.31 (   0.00%)    12090.23 ( -12.09%)    15304.84 (  11.29%)
      Hmean     RandCreate ops        76.00 (   0.00%)       75.60 (  -0.53%)       77.00 (   1.32%)
      Hmean     RandCreate read       96.80 (   0.00%)       96.80 (   0.00%)       97.00 (   0.21%)
      Hmean     RandCreate del     13233.75 (   0.00%)    11525.35 ( -12.91%)    14446.61 (   9.16%)
      
      Jan's series is the baseline and the vanilla kernel is 12% slower where
      as this series on top gains another 11%.  This is from a different
      machine than the data in the changelogs but the detailed data was not
      collected as there was no substantial change in v2.
      
      This patch (of 8):
      
      Freeing a list of pages current enables/disables IRQs for each page
      freed.  This patch splits freeing a list of pages into two operations --
      preparing the pages for freeing and the actual freeing.  This is a
      tradeoff - we're taking two passes of the list to free in exchange for
      avoiding multiple enable/disable of IRQs.
      
      sparsetruncate (tiny)
                                    4.14.0-rc4             4.14.0-rc4
                                 janbatch-v1r1            oneirq-v1r1
      Min          Time      149.00 (   0.00%)      141.00 (   5.37%)
      1st-qrtle    Time      150.00 (   0.00%)      142.00 (   5.33%)
      2nd-qrtle    Time      151.00 (   0.00%)      142.00 (   5.96%)
      3rd-qrtle    Time      151.00 (   0.00%)      143.00 (   5.30%)
      Max-90%      Time      153.00 (   0.00%)      144.00 (   5.88%)
      Max-95%      Time      155.00 (   0.00%)      147.00 (   5.16%)
      Max-99%      Time      201.00 (   0.00%)      195.00 (   2.99%)
      Max          Time      236.00 (   0.00%)      230.00 (   2.54%)
      Amean        Time      152.65 (   0.00%)      144.37 (   5.43%)
      Stddev       Time        9.78 (   0.00%)       10.44 (  -6.72%)
      Coeff        Time        6.41 (   0.00%)        7.23 ( -12.84%)
      Best99%Amean Time      152.07 (   0.00%)      143.72 (   5.50%)
      Best95%Amean Time      150.75 (   0.00%)      142.37 (   5.56%)
      Best90%Amean Time      150.59 (   0.00%)      142.19 (   5.58%)
      Best75%Amean Time      150.36 (   0.00%)      141.92 (   5.61%)
      Best50%Amean Time      150.04 (   0.00%)      141.69 (   5.56%)
      Best25%Amean Time      149.85 (   0.00%)      141.38 (   5.65%)
      
      With a tiny number of files, each file truncated has resident page cache
      and it shows that time to truncate is roughtly 5-6% with some minor
      jitter.
      
                                            4.14.0-rc4             4.14.0-rc4
                                         janbatch-v1r1            oneirq-v1r1
      Hmean     SeqCreate ops         65.27 (   0.00%)       81.86 (  25.43%)
      Hmean     SeqCreate read        39.48 (   0.00%)       47.44 (  20.16%)
      Hmean     SeqCreate del      24963.95 (   0.00%)    26319.99 (   5.43%)
      Hmean     RandCreate ops        65.47 (   0.00%)       82.01 (  25.26%)
      Hmean     RandCreate read       42.04 (   0.00%)       51.75 (  23.09%)
      Hmean     RandCreate del     23377.66 (   0.00%)    23764.79 (   1.66%)
      
      As expected, there is a small gain for the delete operation.
      
      [mgorman@techsingularity.net: use page_private and set_page_private helpers]
        Link: http://lkml.kernel.org/r/20171018101547.mjycw7zreb66jzpa@techsingularity.net
      Link: http://lkml.kernel.org/r/20171018075952.10627-2-mgorman@techsingularity.net
      
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9cca35d4
      History
      mm, page_alloc: enable/disable IRQs once when freeing a list of pages
      Mel Gorman authored
      Patch series "Follow-up for speed up page cache truncation", v2.
      
      This series is a follow-on for Jan Kara's series "Speed up page cache
      truncation" series.  We both ended up looking at the same problem but
      saw different problems based on the same data.  This series builds upon
      his work.
      
      A variety of workloads were compared on four separate machines but each
      machine showed gains albeit at different levels.  Minimally, some of the
      differences are due to NUMA where truncating data from a remote node is
      slower than a local node.  The workloads checked were
      
      o sparse truncate microbenchmark, tiny
      o sparse truncate microbenchmark, large
      o reaim-io disk workfile
      o dbench4 (modified by mmtests to produce more stable results)
      o filebench varmail configuration for small memory size
      o bonnie, directory operations, working set size 2*RAM
      
      reaim-io, dbench and filebench all showed minor gains.  Truncation does
      not dominate those workloads but were tested to ensure no other
      regressions.  They will not be reported further.
      
      The sparse truncate microbench was written by Jan.  It creates a number
      of files and then times how long it takes to truncate each one.  The
      "tiny" configuraiton creates a number of files that easily fits in
      memory and times how long it takes to truncate files with page cache.
      The large configuration uses enough files to have data that is twice the
      size of memory and so timings there include truncating page cache and
      working set shadow entries in the radix tree.
      
      Patches 1-4 are the most relevant parts of this series.  Patches 5-8 are
      optional as they are deleting code that is essentially useless but has a
      negligible performance impact.
      
      The changelogs have more information on performance but just for bonnie
      delete options, the main comparison is
      
      bonnie
                                            4.14.0-rc5             4.14.0-rc5             4.14.0-rc5
                                                jan-v2                vanilla                 mel-v2
      Hmean     SeqCreate ops         76.20 (   0.00%)       75.80 (  -0.53%)       76.80 (   0.79%)
      Hmean     SeqCreate read        85.00 (   0.00%)       85.00 (   0.00%)       85.00 (   0.00%)
      Hmean     SeqCreate del      13752.31 (   0.00%)    12090.23 ( -12.09%)    15304.84 (  11.29%)
      Hmean     RandCreate ops        76.00 (   0.00%)       75.60 (  -0.53%)       77.00 (   1.32%)
      Hmean     RandCreate read       96.80 (   0.00%)       96.80 (   0.00%)       97.00 (   0.21%)
      Hmean     RandCreate del     13233.75 (   0.00%)    11525.35 ( -12.91%)    14446.61 (   9.16%)
      
      Jan's series is the baseline and the vanilla kernel is 12% slower where
      as this series on top gains another 11%.  This is from a different
      machine than the data in the changelogs but the detailed data was not
      collected as there was no substantial change in v2.
      
      This patch (of 8):
      
      Freeing a list of pages current enables/disables IRQs for each page
      freed.  This patch splits freeing a list of pages into two operations --
      preparing the pages for freeing and the actual freeing.  This is a
      tradeoff - we're taking two passes of the list to free in exchange for
      avoiding multiple enable/disable of IRQs.
      
      sparsetruncate (tiny)
                                    4.14.0-rc4             4.14.0-rc4
                                 janbatch-v1r1            oneirq-v1r1
      Min          Time      149.00 (   0.00%)      141.00 (   5.37%)
      1st-qrtle    Time      150.00 (   0.00%)      142.00 (   5.33%)
      2nd-qrtle    Time      151.00 (   0.00%)      142.00 (   5.96%)
      3rd-qrtle    Time      151.00 (   0.00%)      143.00 (   5.30%)
      Max-90%      Time      153.00 (   0.00%)      144.00 (   5.88%)
      Max-95%      Time      155.00 (   0.00%)      147.00 (   5.16%)
      Max-99%      Time      201.00 (   0.00%)      195.00 (   2.99%)
      Max          Time      236.00 (   0.00%)      230.00 (   2.54%)
      Amean        Time      152.65 (   0.00%)      144.37 (   5.43%)
      Stddev       Time        9.78 (   0.00%)       10.44 (  -6.72%)
      Coeff        Time        6.41 (   0.00%)        7.23 ( -12.84%)
      Best99%Amean Time      152.07 (   0.00%)      143.72 (   5.50%)
      Best95%Amean Time      150.75 (   0.00%)      142.37 (   5.56%)
      Best90%Amean Time      150.59 (   0.00%)      142.19 (   5.58%)
      Best75%Amean Time      150.36 (   0.00%)      141.92 (   5.61%)
      Best50%Amean Time      150.04 (   0.00%)      141.69 (   5.56%)
      Best25%Amean Time      149.85 (   0.00%)      141.38 (   5.65%)
      
      With a tiny number of files, each file truncated has resident page cache
      and it shows that time to truncate is roughtly 5-6% with some minor
      jitter.
      
                                            4.14.0-rc4             4.14.0-rc4
                                         janbatch-v1r1            oneirq-v1r1
      Hmean     SeqCreate ops         65.27 (   0.00%)       81.86 (  25.43%)
      Hmean     SeqCreate read        39.48 (   0.00%)       47.44 (  20.16%)
      Hmean     SeqCreate del      24963.95 (   0.00%)    26319.99 (   5.43%)
      Hmean     RandCreate ops        65.47 (   0.00%)       82.01 (  25.26%)
      Hmean     RandCreate read       42.04 (   0.00%)       51.75 (  23.09%)
      Hmean     RandCreate del     23377.66 (   0.00%)    23764.79 (   1.66%)
      
      As expected, there is a small gain for the delete operation.
      
      [mgorman@techsingularity.net: use page_private and set_page_private helpers]
        Link: http://lkml.kernel.org/r/20171018101547.mjycw7zreb66jzpa@techsingularity.net
      Link: http://lkml.kernel.org/r/20171018075952.10627-2-mgorman@techsingularity.net
      
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    async-thread.c 17.84 KiB