1. 02 May, 2012 1 commit
    • dlm: fixes for nodir mode · 4875647a
      David Teigland authored
      The "nodir" mode (statically assign master nodes instead
      of using the resource directory) has always been highly
      experimental, and never seriously used.  This commit
      fixes a number of problems, making nodir much more usable.
      
      - Major change to recovery: recover all locks and restart
        all in-progress operations after recovery.  In some
        cases it's not possible to know which in-progress locks
        to recover, so recover all.  (Most locks require recovery
        in nodir mode anyway, since rehashing changes most
        master nodes.)
      
      - Change the way nodir mode is enabled, from a command
        line mount arg passed through gfs2 to a sysfs
        file managed by dlm_controld, consistent with the
        other config settings.
      
      - Allow recovering MSTCPY locks on an rsb that has not
        yet been turned into a master copy.
      
      - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
        from a previous, aborted recovery cycle.  Base this
        on the local recovery status not being in the state
        where any nodes should be sending LOCK messages for the
        current recovery cycle.
      
      - Hold the rsb lock around dlm_purge_mstcpy_locks() because it
        may run concurrently with dlm_recover_master_copy()
        (see the sketch after this list).
      
      - Maintain highbast on process-copy lkb's (in addition to
        the master as is usual), because the lkb can switch
        back and forth between being a master and being a
        process copy as the master node changes in recovery.
      
      - When recovering MSTCPY locks, flag rsb's that have
        non-empty convert or waiting queues for granting
        at the end of recovery.  (Rename flag from LOCKS_PURGED
        to RECOVER_GRANT and similar for the recovery function,
        because it's not only resources with purged locks
        that need a grant attempt.)
      
      - Replace a couple of unnecessary assertion panics with
        error messages.
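      
      A minimal sketch of the rsb locking pattern from the purge item
      above, assuming the lock_rsb()/unlock_rsb() helpers from
      fs/dlm/lock.h; the wrapper function itself is hypothetical:
      
          /* Hypothetical helper: hold the rsb mutex so the purge
           * cannot race with dlm_recover_master_copy() operating
           * on the same rsb from another thread. */
          static void purge_mstcpy_locked(struct dlm_rsb *r)
          {
                  lock_rsb(r);               /* takes r->res_mutex */
                  dlm_purge_mstcpy_locks(r);
                  unlock_rsb(r);
          }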
      Signed-off-by: David Teigland <teigland@redhat.com>
  2. 30 Nov, 2009 1 commit
    • dlm: always use GFP_NOFS · 573c24c4
      David Teigland authored
      Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
      ls_allocation was GFP_KERNEL for userland lockspaces
      and GFP_NOFS for file system lockspaces.
      
      It was discovered that any lockspace on the system can
      affect all others by triggering memory reclaim in the
      file system, which could in turn call back into the dlm
      to acquire locks, deadlocking dlm threads shared by
      all lockspaces, such as dlm_recv.
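      
      A hedged illustration of the change (the allocation site is
      invented for the example; lkb_cache is the dlm's lkb slab
      cache in fs/dlm/memory.c):
      
          struct dlm_lkb *lkb;
          
          /* GFP_KERNEL may trigger fs reclaim, which can re-enter
           * the dlm and deadlock shared threads such as dlm_recv;
           * GFP_NOFS forbids recursion into filesystem reclaim. */
          lkb = kmem_cache_zalloc(lkb_cache, GFP_NOFS);
          if (!lkb)
                  return -ENOMEM;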
      Signed-off-by: David Teigland <teigland@redhat.com>
  3. 15 May, 2009 1 commit
    • dlm: use more NOFS allocation · 748285cc
      David Teigland authored
      Change some GFP_KERNEL allocations to use either GFP_NOFS or
      ls_allocation (when available), which the fs sets to GFP_NOFS.
      The point is to prevent allocations from going back into the
      cluster fs in places where that might lead to deadlock.
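      
      The pattern, sketched (the call site is illustrative;
      ls_allocation is the real per-lockspace field at the time of
      this commit):
      
          /* Use the lockspace's allocation flags when one is at
           * hand (the fs sets ls->ls_allocation to GFP_NOFS),
           * else fall back to GFP_NOFS directly. */
          ms = kmalloc(sizeof(*ms), ls ? ls->ls_allocation : GFP_NOFS);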
      Signed-off-by: David Teigland <teigland@redhat.com>
  4. 10 Oct, 2007 1 commit
    • [DLM] block dlm_recv in recovery transition · c36258b5
      David Teigland authored
      Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
      threads while working in the dlm.  This allows dlm_recv activity to be
      suspended when the lockspace transitions to, from and between recovery
      cycles.
      
      The specific bug prompting this change is one where an in-progress
      recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
      processing a recovery message, the recovery cycle was aborted and
      dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
      on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
      suspending dlm_recv (taking write lock on the rwsem) before aborting the
      current recovery.
      
      The transitions to/from normal and recovery modes are simplified by using
      this new ability to block dlm_recv.  The switch from normal to recovery
      mode means dlm_recv goes from processing locking messages, to saving them
      for later, and vice versa.  Races are avoided by blocking dlm_recv when
      setting the flag that switches between modes.
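      
      A condensed sketch of the two sides of the rwsem
      (ls_recv_active follows this commit; the message-processing
      call and its signature are assumed for the example):
      
          /* dlm_recv side: normal work is done with the rwsem
           * held for read */
          down_read(&ls->ls_recv_active);
          dlm_receive_buffer(hd, nodeid);       /* assumed call */
          up_read(&ls->ls_recv_active);
          
          /* dlm_recoverd side: taking it for write suspends all
           * dlm_recv activity across the recovery transition */
          down_write(&ls->ls_recv_active);
          /* ... abort or restart recovery, flip the mode flag ... */
          up_write(&ls->ls_recv_active);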
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  5. 30 Nov, 2006 2 commits
    • [DLM] fix add_requestqueue checking nodes list · 2896ee37
      David Teigland authored
      Requests that arrive after recovery has started are saved in the
      requestqueue and processed after recovery is done.  Some of these requests
      are purged during recovery if they are from nodes that have been removed.
      We move the purging of the requests (dlm_purge_requestqueue) to later in
      the recovery sequence, which allows the routine saving requests
      (dlm_add_requestqueue) to avoid filtering out requests by nodeid since the
      same will be done by the purge.  The current code has add_requestqueue
      filtering by nodeid but doesn't hold any locks when accessing the list of
      current nodes.  This also means that we need to call the purge routine
      when the lockspace is being shut down since the add routine will not be
      rejecting requests itself any more.
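      
      A condensed sketch of the purge under the requestqueue mutex
      (struct rq_entry, dlm_is_removed() and the list fields are the
      dlm's own names; the loop is simplified):
      
          struct rq_entry *e, *safe;
          
          mutex_lock(&ls->ls_requestqueue_mutex);
          list_for_each_entry_safe(e, safe, &ls->ls_requestqueue, list) {
                  /* drop saved requests from nodes that recovery
                   * has removed from the lockspace */
                  if (dlm_is_removed(ls, e->nodeid)) {
                          list_del(&e->list);
                          kfree(e);
                  }
          }
          mutex_unlock(&ls->ls_requestqueue_mutex);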
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • [DLM] fix requestqueue race · d4400156
      David Teigland authored
      Red Hat BZ 211914
      
      There's a race between dlm_recoverd (1) enabling locking and (2) clearing
      out the requestqueue, and dlm_recvd (1) checking if locking is enabled and
      (2) adding a message to the requestqueue.  An order of recoverd(1),
      recvd(1), recvd(2), recoverd(2) will result in a message being left on the
      requestqueue.  The fix is to have dlm_recvd check, after taking the
      requestqueue mutex, whether dlm_recoverd has enabled locking, and
      if it has, process the message instead of queueing it.
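      
      A sketch of the fixed ordering (the helper name and its return
      convention are illustrative, not the commit's exact code;
      dlm_locking_stopped() is the dlm's own predicate):
      
          /* Hypothetical: queue the message only if locking is
           * still stopped; re-checking under the mutex closes
           * the race with dlm_recoverd clearing the queue. */
          static int save_request(struct dlm_ls *ls, struct rq_entry *e)
          {
                  int error = 0;
          
                  mutex_lock(&ls->ls_requestqueue_mutex);
                  if (dlm_locking_stopped(ls))
                          list_add_tail(&e->list, &ls->ls_requestqueue);
                  else
                          error = -EAGAIN;  /* caller processes it */
                  mutex_unlock(&ls->ls_requestqueue_mutex);
                  return error;
          }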
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>