Apr 19, 2016

• iio: core: Add devm_ APIs for iio_channel_{get,release}_all · efc2c013
  Laxman Dewangan authored
      
  Some kernel drivers use the IIO framework to get sensor values via an
  ADC or IIO hardware driver. The client driver gets its channels with
  iio_channel_get_all() and releases them by calling
  iio_channel_release_all().

  Add resource-managed versions (devm_*) of these APIs so that a client
  which calls devm_iio_channel_get_all() need not release the channels
  explicitly; the managed-device framework releases them when the driver
  is unbound.

  This reduces code in the error path and, in some cases, the need for a
  .remove callback.

  Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
  Signed-off-by: Jonathan Cameron <jic23@kernel.org>
• iio: core: Add devm_ APIs for iio_channel_{get,release} · 8bf872d8
  Laxman Dewangan authored
      
  Some kernel drivers use the IIO framework to get sensor values via an
  ADC or IIO hardware driver. The client driver gets an IIO channel with
  iio_channel_get() and releases it by calling iio_channel_release().

  Add resource-managed versions (devm_*) of these APIs so that a client
  which calls devm_iio_channel_get() need not release the channel
  explicitly; the managed-device framework releases it when the driver is
  unbound.

  This reduces code in the error path and, in some cases, the need for a
  .remove callback.

  Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
  Signed-off-by: Jonathan Cameron <jic23@kernel.org>
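
  To illustrate what the devm_ variants buy a consumer, here is a minimal
  hypothetical probe (the driver, foo_probe(), and the "sensor" channel
  name are made up; devm_iio_channel_get() and iio_read_channel_raw() are
  the real API):

      #include <linux/err.h>
      #include <linux/iio/consumer.h>
      #include <linux/platform_device.h>

      /* With the devm_ variant the channel is released automatically when
       * the device is unbound, so neither the error path nor a .remove
       * callback has to call iio_channel_release(). */
      static int foo_probe(struct platform_device *pdev)
      {
              struct iio_channel *chan;
              int val, ret;

              chan = devm_iio_channel_get(&pdev->dev, "sensor");
              if (IS_ERR(chan))
                      return PTR_ERR(chan);

              ret = iio_read_channel_raw(chan, &val);
              if (ret < 0)
                      return ret;     /* no explicit release needed */

              dev_info(&pdev->dev, "raw value: %d\n", val);
              return 0;
      }
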
• iio: st_sensors: support open drain mode · 0e6f6871
  Linus Walleij authored
      
  Some types of ST Sensors can be connected to the same IRQ line as other
  peripherals using open drain. Add a device tree binding and a sensor
  data property to flip the right bit in the interrupt control register
  to enable open drain mode on the INT line.

  If the line is set to be open drain, also add IRQF_SHARED to the IRQ
  flags when requesting the interrupt, as the whole point of using open
  drain interrupt lines is to share them with more than one peripheral
  (wired-OR).

  Cc: devicetree@vger.kernel.org
  Cc: Giuseppe Barba <giuseppe.barba@st.com>
  Cc: Denis Ciocca <denis.ciocca@st.com>
  Acked-by: Rob Herring <rob@kernel.org>
  Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
  Signed-off-by: Jonathan Cameron <jic23@kernel.org>
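
  A sketch of how a driver can wire this up, assuming the new DT property
  is named "drive-open-drain" (the surrounding function and names are
  illustrative; of_property_read_bool() and devm_request_threaded_irq()
  are real kernel API):

      #include <linux/device.h>
      #include <linux/interrupt.h>
      #include <linux/of.h>

      static int sensor_setup_irq(struct device *dev, int irq,
                                  irq_handler_t thread_fn, void *data)
      {
              unsigned long flags = IRQF_TRIGGER_HIGH | IRQF_ONESHOT;

              /* Open drain lines exist to be shared (wired-OR), so tag
               * on IRQF_SHARED whenever the property is present. */
              if (of_property_read_bool(dev->of_node, "drive-open-drain"))
                      flags |= IRQF_SHARED;

              return devm_request_threaded_irq(dev, irq, NULL, thread_fn,
                                               flags, dev_name(dev), data);
      }
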
• iio: st_sensors: verify interrupt event to status · 97865fe4
  Linus Walleij authored
      
  This makes all ST sensor drivers check that they actually have new data
  available for the requested channel(s) before claiming an IRQ, by
  reading the status register (which is conveniently the same for all ST
  sensors) and checking that the channel has new data before proceeding
  to read it and fill the buffer.

  This way sensors can share an interrupt line: the line can be flagged
  as shared, and then the sensor that did not fire will return IRQ_NONE
  while the sensor that fired will handle the IRQ and return IRQ_HANDLED.

  Cc: Giuseppe Barba <giuseppe.barba@st.com>
  Cc: Denis Ciocca <denis.ciocca@st.com>
  Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
  Signed-off-by: Jonathan Cameron <jic23@kernel.org>
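
  The resulting handler pattern, sketched with an illustrative register
  layout and hypothetical helpers (sensor_read_reg(), sensor_fill_buffer());
  only the IRQ_NONE/IRQ_HANDLED contract is real kernel API:

      #include <linux/interrupt.h>
      #include <linux/types.h>

      #define SENSOR_STATUS_REG  0x27   /* illustrative address */
      #define SENSOR_DRDY_MASK   0x07   /* illustrative new-data bits */

      struct my_sensor;                 /* hypothetical driver state */
      static u8 sensor_read_reg(struct my_sensor *s, unsigned int reg);
      static void sensor_fill_buffer(struct my_sensor *s);

      /* On a shared line, claim the interrupt only if our status
       * register shows new data for the requested channel(s). */
      static irqreturn_t sensor_irq_thread(int irq, void *p)
      {
              struct my_sensor *sensor = p;
              u8 status = sensor_read_reg(sensor, SENSOR_STATUS_REG);

              if (!(status & SENSOR_DRDY_MASK))
                      return IRQ_NONE;    /* not ours, let other sharers check */

              sensor_fill_buffer(sensor); /* read data, push to buffer */
              return IRQ_HANDLED;
      }
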

Apr 04, 2016

• android,lowmemorykiller: Don't abuse TIF_MEMDIE. · 77ed2c57
  Tetsuo Handa authored
      
  Currently, lowmemorykiller (LMK) is using TIF_MEMDIE for two purposes.
  One is to remember processes killed by LMK, and the other is to
  accelerate termination of processes killed by LMK.

  But since LMK is invoked as a memory shrinker function, there should
  still be some memory available. It is very likely that memory
  allocations by processes killed by LMK will succeed without using
  ALLOC_NO_WATERMARKS via TIF_MEMDIE. Even if their allocations cannot
  escape from the memory allocation loop without ALLOC_NO_WATERMARKS,
  lowmem_deathpending_timeout can guarantee forward progress by choosing
  the next victim process.

  On the other hand, mark_oom_victim() assumes that it must be called
  with oom_lock held and must not be called after oom_killer_disable()
  was called. But LMK is calling it without holding oom_lock and without
  checking oom_killer_disabled. It is possible that LMK calls
  mark_oom_victim() due to allocation requests by kernel threads after
  the current thread returned from oom_killer_disable(). This will break
  synchronization for PM/suspend.

  This patch introduces a per-task_struct flag for remembering processes
  killed by LMK and replaces TIF_MEMDIE with that flag. With this patch
  applied, the assumption made by mark_oom_victim() becomes true.

  Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
  Acked-by: Michal Hocko <mhocko@suse.com>
  Cc: Arve Hjonnevag <arve@android.com>
  Cc: Riley Andrews <riandrews@android.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Mar 25, 2016

• mm, kasan: stackdepot implementation. Enable stackdepot for SLAB · cd11016e
  Alexander Potapenko authored
      
  Implement the stack depot and provide CONFIG_STACKDEPOT.  The stack
  depot allows KASAN to store allocation/deallocation stack traces for
  memory chunks.  The stack traces are stored in a hash table and
  referenced by handles which reside in the kasan_alloc_meta and
  kasan_free_meta structures in the allocated memory chunks.

  IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
  duplication.

  Right now stackdepot support is only enabled in the SLAB allocator.
  Once KASAN features in SLAB are on par with those in SLUB, we can
  switch SLUB to stackdepot as well, thus removing the dependency on SLUB
  stack bookkeeping, which wastes a lot of memory.

  This patch is based on the "mm: kasan: stack depots" patch originally
  prepared by Dmitry Chernenkov.

  Joonsoo has said that he plans to reuse the stackdepot code for the
  mm/page_owner.c debugging facility.

  [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
  [aryabinin@virtuozzo.com: comment style fixes]
  Signed-off-by: Alexander Potapenko <glider@google.com>
  Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
  Cc: Christoph Lameter <cl@linux.com>
  Cc: Pekka Enberg <penberg@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
  Cc: Andrey Konovalov <adech.fo@gmail.com>
  Cc: Dmitry Vyukov <dvyukov@google.com>
  Cc: Steven Rostedt <rostedt@goodmis.org>
  Cc: Konstantin Serebryany <kcc@google.com>
  Cc: Dmitry Chernenkov <dmitryc@google.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
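
  The consumer-side pattern, roughly (a sketch against the API this
  commit introduces - depot_save_stack()/depot_fetch_stack() - with the
  capture buffer size chosen arbitrarily):

      #include <linux/gfp.h>
      #include <linux/kernel.h>
      #include <linux/stackdepot.h>
      #include <linux/stacktrace.h>

      /* Store the current stack once in the depot and keep only a u32
       * handle - this is what kasan_alloc_meta/kasan_free_meta hold. */
      static depot_stack_handle_t save_current_stack(gfp_t flags)
      {
              unsigned long entries[16];
              struct stack_trace trace = {
                      .entries     = entries,
                      .max_entries = ARRAY_SIZE(entries),
              };

              save_stack_trace(&trace);
              return depot_save_stack(&trace, flags); /* deduped via hash table */
      }
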
• arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections · be7635e7
  Alexander Potapenko authored
      
  KASAN needs to know whether the allocation happens in an IRQ handler.
  This lets us strip everything below the IRQ entry point to reduce the
  number of unique stack traces that need to be stored.

  Move the definition of __irq_entry to <linux/interrupt.h> so that the
  users don't need to pull in <linux/ftrace.h>.  Also introduce the
  __softirq_entry macro, which is similar to __irq_entry but puts the
  corresponding functions into the .softirqentry.text section.

  Signed-off-by: Alexander Potapenko <glider@google.com>
  Acked-by: Steven Rostedt <rostedt@goodmis.org>
  Cc: Christoph Lameter <cl@linux.com>
  Cc: Pekka Enberg <penberg@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
  Cc: Andrey Konovalov <adech.fo@gmail.com>
  Cc: Dmitry Vyukov <dvyukov@google.com>
  Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
  Cc: Konstantin Serebryany <kcc@google.com>
  Cc: Dmitry Chernenkov <dmitryc@google.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
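
  Usage is just an annotation on the handler, for example (function
  bodies are illustrative):

      #include <linux/interrupt.h>    /* __irq_entry now lives here */

      /* Lands in .softirqentry.text, so KASAN can cut traces here. */
      static void __softirq_entry my_softirq_action(struct softirq_action *a)
      {
              /* softirq work */
      }

      /* Lands in .irqentry.text, likewise recognizable by KASAN. */
      static void __irq_entry my_irq_handler(void)
      {
              /* hard IRQ work */
      }
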
• mm, kasan: add GFP flags to KASAN API · 505f5dcb
  Alexander Potapenko authored
      
  Add GFP flags to KASAN hooks for future patches to use.

  This patch is based on the "mm: kasan: unified support for SLUB and
  SLAB allocators" patch originally prepared by Dmitry Chernenkov.

  Signed-off-by: Alexander Potapenko <glider@google.com>
  Cc: Christoph Lameter <cl@linux.com>
  Cc: Pekka Enberg <penberg@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
  Cc: Andrey Konovalov <adech.fo@gmail.com>
  Cc: Dmitry Vyukov <dvyukov@google.com>
  Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
  Cc: Steven Rostedt <rostedt@goodmis.org>
  Cc: Konstantin Serebryany <kcc@google.com>
  Cc: Dmitry Chernenkov <dmitryc@google.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• mm, kasan: SLAB support · 7ed2f9e6
  Alexander Potapenko authored
      
  Add KASAN hooks to the SLAB allocator.

  This patch is based on the "mm: kasan: unified support for SLUB and
  SLAB allocators" patch originally prepared by Dmitry Chernenkov.

  Signed-off-by: Alexander Potapenko <glider@google.com>
  Cc: Christoph Lameter <cl@linux.com>
  Cc: Pekka Enberg <penberg@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
  Cc: Andrey Konovalov <adech.fo@gmail.com>
  Cc: Dmitry Vyukov <dvyukov@google.com>
  Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
  Cc: Steven Rostedt <rostedt@goodmis.org>
  Cc: Konstantin Serebryany <kcc@google.com>
  Cc: Dmitry Chernenkov <dmitryc@google.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• include/linux/oom.h: remove undefined oom_kills_count()/note_oom_kill() · aaf4fb71
  Tetsuo Handa authored
      
  These declarations are a leftover from commit c32b3cbe ("oom, PM: make
  OOM detection in the freezer path raceless"); remove them.

  Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
  Acked-by: Michal Hocko <mhocko@suse.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• oom, oom_reaper: protect oom_reaper_list using simpler way · bb29902a
  Tetsuo Handa authored
      
      "oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task" tried
      to protect oom_reaper_list using MMF_OOM_KILLED flag.  But we can do it
      by simply checking tsk->oom_reaper_list != NULL.
      
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bb29902a
• oom: make oom_reaper_list single linked · 29c696e1
  Vladimir Davydov authored
      
  Entries are only added to and removed from oom_reaper_list at the head,
  so we can use a singly linked list and hence save a word in
  task_struct.

  Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
  Signed-off-by: Michal Hocko <mhocko@suse.com>
  Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
  Cc: David Rientjes <rientjes@google.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
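
  The discipline that makes this work, as a self-contained sketch with
  illustrative types: both push and pop touch only the head, so a single
  forward pointer per task suffices.

      struct task {
              struct task *oom_reaper_list;  /* single forward pointer */
              /* ... */
      };

      static struct task *oom_reaper_head;   /* protected by oom_reaper_lock */

      static void reaper_push(struct task *tsk)
      {
              tsk->oom_reaper_list = oom_reaper_head;
              oom_reaper_head = tsk;
      }

      static struct task *reaper_pop(void)
      {
              struct task *tsk = oom_reaper_head;

              if (tsk)
                      oom_reaper_head = tsk->oom_reaper_list;
              return tsk;
      }
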
• oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task · 855b0183
  Michal Hocko authored
      
  Tetsuo has reported that oom_kill_allocating_task=1 will cause
  oom_reaper_list corruption because oom_kill_process doesn't follow the
  standard OOM exclusion (aka ignores TIF_MEMDIE) and allows enqueueing
  the same task multiple times - e.g.  by sacrificing the same child
  multiple times.

  This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
  which is set in oom_kill_process atomically; the oom reaper is not
  invoked if the flag was already set.

  Signed-off-by: Michal Hocko <mhocko@suse.com>
  Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Mel Gorman <mgorman@techsingularity.net>
  Cc: Oleg Nesterov <oleg@redhat.com>
  Cc: Hugh Dickins <hughd@google.com>
  Cc: Rik van Riel <riel@redhat.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
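
  The exclusion boils down to an atomic test-and-set on the mm, sketched
  below (the wrapper function is illustrative; MMF_OOM_KILLED is the flag
  this patch introduces and wake_oom_reaper() is the existing queueing
  hook in mm/oom_kill.c):

      #include <linux/sched.h>

      static void wake_oom_reaper(struct task_struct *tsk); /* as in mm/oom_kill.c */

      static void maybe_wake_oom_reaper(struct task_struct *tsk)
      {
              struct mm_struct *mm = tsk->mm;

              /* Only the 0 -> 1 transition queues the victim, so the
               * same mm can never be enqueued twice. */
              if (mm && !test_and_set_bit(MMF_OOM_KILLED, &mm->flags))
                      wake_oom_reaper(tsk);
      }
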
• mm, oom_reaper: implement OOM victims queuing · 03049269
  Michal Hocko authored
      
  wake_oom_reaper has allowed only one oom victim to be queued at a time.
  The main reason for that was simplicity; other solutions would require
  some way of queuing.  The current approach is racy, and that was deemed
  sufficient as the oom_reaper is considered a best-effort approach to
  help with oom handling when the OOM victim cannot terminate in a
  reasonable time.  The race could lead to missing an oom victim, which
  can then get stuck:
      
  out_of_memory                        oom_reaper
    wake_oom_reaper
      cmpxchg // OK
                                         oom_reap_task
                                           __oom_reap_task
  oom_victim terminates
                                             atomic_inc_not_zero // fail
  out_of_memory
    wake_oom_reaper
      cmpxchg // fails
                                         task_to_reap = NULL
      
  This race requires two OOM invocations in a short time period, which is
  not very likely but certainly not impossible.  E.g.  the original
  victim might not have released a lot of memory for some reason.

  The situation would improve considerably if wake_oom_reaper used more
  robust queuing.  This is what this patch implements: add an
  oom_reaper_list list_head to task_struct (reusing a hole before the
  embedded thread_struct for that purpose) and an oom_reaper_lock
  spinlock for queuing synchronization.  wake_oom_reaper then adds the
  task to the queue and oom_reaper dequeues it.

  Signed-off-by: Michal Hocko <mhocko@suse.com>
  Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
  Cc: Andrea Arcangeli <andrea@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Hugh Dickins <hughd@google.com>
  Cc: Johannes Weiner <hannes@cmpxchg.org>
  Cc: Mel Gorman <mgorman@suse.de>
  Cc: Oleg Nesterov <oleg@redhat.com>
  Cc: Rik van Riel <riel@redhat.com>
  Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
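
  The enqueue side then looks roughly like this (a sketch; list handling
  details in mm/oom_kill.c may differ):

      #include <linux/list.h>
      #include <linux/sched.h>
      #include <linux/spinlock.h>
      #include <linux/wait.h>

      static LIST_HEAD(oom_reaper_list);             /* global queue */
      static DEFINE_SPINLOCK(oom_reaper_lock);
      static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);

      static void wake_oom_reaper(struct task_struct *tsk)
      {
              get_task_struct(tsk);          /* pin the victim until reaped */

              spin_lock(&oom_reaper_lock);
              list_add(&tsk->oom_reaper_list, &oom_reaper_list);
              spin_unlock(&oom_reaper_lock);

              wake_up(&oom_reaper_wait);     /* kick the oom_reaper kthread */
      }
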
• oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space · 36324a99
  Michal Hocko authored
      
  When the oom_reaper manages to unmap all the eligible vmas, there
  shouldn't be much freeable memory held by the oom victim left anymore,
  so it makes sense to clear the TIF_MEMDIE flag for the victim and allow
  the OOM killer to select another task.

  The lack of TIF_MEMDIE also means that the victim cannot access memory
  reserves anymore, but that shouldn't be a problem because it would
  regain access if it needed to allocate and hit the OOM killer again,
  due to the fatal_signal_pending or PF_EXITING check.  We can safely
  hide the task from the OOM killer because it is clearly not a good
  candidate anymore, as everything reclaimable has been torn down
  already.

  This patch makes it possible to cap the time an OOM victim can keep
  TIF_MEMDIE, and thus hold off further global OOM killer actions,
  provided the oom reaper is able to take mmap_sem for the associated mm
  struct.  This is not guaranteed now, but further steps should make sure
  that waits for mmap_sem for write are killable, which will help to
  reduce such lock contention.  That is not done by this patch.

  Note that exit_oom_victim might be called on a remote task from
  __oom_reap_task now, so we have to check and clear the flag atomically;
  otherwise we might race and underflow oom_victims or wake up waiters
  too early.

  Signed-off-by: Michal Hocko <mhocko@suse.com>
  Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
  Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
  Cc: Andrea Arcangeli <andrea@kernel.org>
  Cc: David Rientjes <rientjes@google.com>
  Cc: Hugh Dickins <hughd@google.com>
  Cc: Mel Gorman <mgorman@suse.de>
  Cc: Oleg Nesterov <oleg@redhat.com>
  Cc: Rik van Riel <riel@redhat.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
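
  The atomic clear mentioned above, roughly (names mirror mm/oom_kill.c;
  treat the exact bodies as a sketch):

      #include <linux/atomic.h>
      #include <linux/sched.h>
      #include <linux/wait.h>

      static atomic_t oom_victims = ATOMIC_INIT(0);
      static DECLARE_WAIT_QUEUE_HEAD(oom_victims_wait);

      void exit_oom_victim(struct task_struct *tsk)
      {
              /* May now run against a remote task from __oom_reap_task,
               * so test-and-clear must be one atomic step - two callers
               * must not both decrement oom_victims. */
              if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
                      return;

              if (!atomic_dec_return(&oom_victims))
                      wake_up_all(&oom_victims_wait); /* PM/suspend waiter */
      }
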
• mm, oom: introduce oom reaper · aac45363
  Michal Hocko authored
      
  This patch (of 5):

  This is based on the idea from Mel Gorman discussed during LSFMM 2015
  and independently brought up by Oleg Nesterov.

  The OOM killer currently kills only a single task, in the hope that the
  task will terminate in a reasonable time and free up its memory.  Such
  a task (oom victim) gets access to memory reserves via mark_oom_victim
  to allow forward progress should additional memory be needed during the
  exit path.

  It has been shown (e.g.  by Tetsuo Handa) that it is not that hard to
  construct workloads which break the core assumption mentioned above:
  the OOM victim might take an unbounded amount of time to exit because
  it might be blocked in the uninterruptible state, waiting for an event
  (e.g.  a lock) which is blocked by another task looping in the page
  allocator.

  This patch reduces the probability of such a lockup by introducing a
  specialized kernel thread (oom_reaper) which tries to reclaim
  additional memory by preemptively reaping the anonymous or swapped-out
  memory owned by the oom victim, under the assumption that such memory
  won't be needed when its owner is killed and kicked out of userspace
  anyway.  There is one notable exception, though: if the OOM victim was
  in the process of coredumping, the result would be incomplete.  This is
  considered a reasonable constraint because overall system health is
  more important than the debuggability of a particular application.

  A kernel thread has been chosen because we need a reliable way of
  invocation.  A workqueue context is not appropriate because all the
  workers might be busy (e.g.  allocating memory).  Kswapd, which sounds
  like another good fit, is not appropriate either because it might get
  blocked on locks during reclaim as well.

  oom_reaper has to take mmap_sem on the target task for reading, so the
  solution is not 100% reliable, because the semaphore might be held or
  blocked for write, but the probability is reduced considerably compared
  to basically any lock blocking forward progress, as described above.
  In order to prevent blocking on the lock without any forward progress,
  we use only a trylock and retry 10 times with a short sleep in between.
  Users of mmap_sem which need it for write should be carefully reviewed
  to use _killable waiting as much as possible, and to reduce allocation
  requests done with the lock held to the absolute minimum, to lower the
  risk even further.

  The API between the oom killer and the oom reaper is quite trivial.
  wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only a
  NULL->mm transition, and oom_reaper clears it atomically once it is
  done with the work.  This means that only a single mm_struct can be
  reaped at a time.  As the operation is potentially disruptive, we try
  to limit it to the necessary minimum, and the reaper blocks any updates
  while it operates on an mm.  The mm_struct is pinned by mm_count to
  allow a parallel exit_mmap, and a race is detected by
  atomic_inc_not_zero(mm_users).

  Signed-off-by: Michal Hocko <mhocko@suse.com>
  Suggested-by: Oleg Nesterov <oleg@redhat.com>
  Suggested-by: Mel Gorman <mgorman@suse.de>
  Acked-by: Mel Gorman <mgorman@suse.de>
  Acked-by: David Rientjes <rientjes@google.com>
  Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
  Cc: Oleg Nesterov <oleg@redhat.com>
  Cc: Hugh Dickins <hughd@google.com>
  Cc: Andrea Arcangeli <andrea@kernel.org>
  Cc: Rik van Riel <riel@redhat.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
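
  The trylock-and-retry policy, sketched (__oom_reap_task() is the worker
  that takes mmap_sem with down_read_trylock(); the retry count and sleep
  length follow the description above):

      #include <linux/jiffies.h>
      #include <linux/sched.h>

      #define MAX_OOM_REAP_RETRIES 10

      static bool __oom_reap_task(struct task_struct *tsk); /* uses down_read_trylock() */

      static void oom_reap_task(struct task_struct *tsk)
      {
              int attempts = 0;

              /* Never sleep on mmap_sem itself; just try again later so
               * a writer cannot block the reaper indefinitely. */
              while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task(tsk))
                      schedule_timeout_idle(HZ / 10);
      }
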
• sched: add schedule_timeout_idle() · 69b27baf
  Andrew Morton authored
      
  This will be needed in the patch "mm, oom: introduce oom reaper".

  Acked-by: Michal Hocko <mhocko@suse.com>
  Cc: Ingo Molnar <mingo@elte.hu>
  Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
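
  The helper is tiny - roughly:

      #include <linux/sched.h>

      /* Sleep in TASK_IDLE: unlike TASK_UNINTERRUPTIBLE it does not add
       * to the load average, and unlike TASK_INTERRUPTIBLE it does not
       * have to care about pending signals. */
      signed long __sched schedule_timeout_idle(signed long timeout)
      {
              __set_current_state(TASK_IDLE);
              return schedule_timeout(timeout);
      }
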
• ceph: fix security xattr deadlock · 315f2408
  Yan, Zheng authored
      
  When security is enabled, the security module can call the filesystem's
  getxattr/setxattr callbacks during d_instantiate().  For cephfs,
  d_instantiate() is usually called by the MDS' dispatch thread while
  handling an MDS reply.  If the MDS reply does not include xattrs and
  the corresponding caps, getxattr/setxattr need to send a new request to
  the MDS and wait for the reply.  This puts the MDS' dispatch thread to
  sleep, so nobody handles later MDS replies.

  The fix is to make sure lookup/atomic_open replies include xattrs and
  the corresponding caps, so getxattr can be served from cached xattrs.
  This requires some modification to both the MDS and the request
  message.  (The client tells the MDS what caps it wants; the MDS encodes
  the proper caps in the reply.)

  The Smack security module may call setxattr during d_instantiate().
  Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL to
  us.  So just make setxattr return an error when called by the MDS'
  dispatch thread.

  Signed-off-by: Yan, Zheng <zyan@redhat.com>
• libceph: add helper that duplicates last extent operation · 2c63f49a
  Yan, Zheng authored
      
  This helper duplicates the last extent operation in an OSD request,
  then adjusts the new extent operation's offset and length.  The helper
  is for scattered page writeback, which adds nonconsecutive dirty pages
  to a single OSD request.

  Signed-off-by: Yan, Zheng <zyan@redhat.com>
  Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
• libceph: enable large, variable-sized OSD requests · 3f1af42a
  Ilya Dryomov authored
      
  Turn r_ops into a flexible array member to enable large OSD requests,
  consisting of up to 16 ops.  The use case is scattered writeback in
  cephfs; as far as the kernel client is concerned, 16 is just a made-up
  number.

  r_ops had size 3 for copyup+hint+write, but copyup is really a special
  case - it can only happen once.  ceph_osd_request_cache is therefore
  stuffed with num_ops=2 requests; anything bigger than that is allocated
  with kmalloc().  req_mempool is backed by ceph_osd_request_cache, which
  means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
  users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
  that.

  Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
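
  The allocation scheme in miniature (struct layouts abbreviated and
  illustrative):

      #include <linux/slab.h>

      struct osd_req_op_sketch {
              u16 opcode;
              /* ... */
      };

      struct osd_request_sketch {
              unsigned int num_ops;
              /* ... other fields ... */
              struct osd_req_op_sketch ops[]; /* flexible array member */
      };

      /* Requests with num_ops <= 2 come from the slab cache; anything
       * bigger (up to 16) is kmalloc'ed with exactly the space needed. */
      static struct osd_request_sketch *alloc_req(unsigned int num_ops)
      {
              return kmalloc(sizeof(struct osd_request_sketch) +
                             num_ops * sizeof(struct osd_req_op_sketch),
                             GFP_NOFS);
      }
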
• libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op · 7665d85b
  Yan, Zheng authored
      
  This avoids defining large arrays of r_reply_op_{len,result} in struct
  ceph_osd_request.

  Signed-off-by: Yan, Zheng <zyan@redhat.com>
  Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
• libceph: rename ceph_osd_req_op::payload_len to indata_len · de2aa102
  Ilya Dryomov authored
      
  Follow userspace nomenclature on this - the next commit adds
  outdata_len.

  Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
• libceph: monc hunt rate is 3s with backoff up to 30s · 168b9090
  Ilya Dryomov authored
      
  Unless we are in the process of setting up a client (i.e. connecting to
  the monitor cluster for the first time), apply a backoff: every time we
  want to reopen a session, increase our timeout by a multiple (currently
  2); when we complete the connection, reduce that multiplier by 50%.

  Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.

  Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
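
  The arithmetic, sketched with illustrative names (the real state lives
  in struct ceph_mon_client):

      #include <linux/jiffies.h>

      #define HUNT_INTERVAL   (3 * HZ)  /* base hunt rate: 3s */
      #define HUNT_MAX_MULT   10        /* caps the timeout near 30s */

      static unsigned int hunt_mult = 1;

      static unsigned long next_hunt_timeout(void)
      {
              return HUNT_INTERVAL * hunt_mult;
      }

      static void on_session_reopen(void)      /* hunting: back off */
      {
              hunt_mult *= 2;
              if (hunt_mult > HUNT_MAX_MULT)
                      hunt_mult = HUNT_MAX_MULT;
      }

      static void on_session_established(void) /* success: decay by 50% */
      {
              hunt_mult /= 2;
              if (hunt_mult < 1)
                      hunt_mult = 1;
      }
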
• libceph: monc ping rate is 10s · 58d81b12
  Ilya Dryomov authored
      
  Split the ping interval and the ping timeout: the ping interval is 10s;
  the keepalive timeout is 30s.

  Make monc_ping_timeout a constant while at it - it's not actually
  exported as a mount option (and the rest of the tick-related settings
  won't be either), so it has no place in ceph_options.

  Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
• libceph: revamp subs code, switch to SUBSCRIBE2 protocol · 82dcabad
  Ilya Dryomov authored
      
  It is currently hard-coded in the mon_client that mdsmap and monmap
  subs are continuous, while the osdmap sub is always "onetime".  To
  better handle full clusters/pools in the osd_client, we need to be able
  to issue continuous osdmap subs.  Revamp the subs code to allow us to
  specify for each sub whether it should be continuous or not.

  Although not strictly required for the above, switch to the SUBSCRIBE2
  protocol while at it, eliminating the ambiguity between a request for
  "every map since X" and a request for "just the latest" when we don't
  have a map yet (i.e. have epoch 0).  The SUBSCRIBE2 feature bit is now
  required - it's been supported since pre-argonaut (2010).

  Move the "got mdsmap" call to the end of ceph_mdsc_handle_map() -
  calling it before we validate the epoch and successfully install the
  new map can mess up mon_client sub state.

  Signed-off-by: Ilya Dryomov <idryomov@gmail.com>