Skip to content
Snippets Groups Projects
  1. Sep 11, 2017
    • Jens Axboe's avatar
      block: directly insert blk-mq request from blk_insert_cloned_request() · 157f377b
      Jens Axboe authored
      
      A NULL pointer crash was reported for the case of having the BFQ IO
      scheduler attached to the underlying blk-mq paths of a DM multipath
      device.  The crash occured in blk_mq_sched_insert_request()'s call to
      e->type->ops.mq.insert_requests().
      
      Paolo Valente correctly summarized why the crash occured with:
      "the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
      blk_rq_prep_clone) creates a cloned request without invoking
      e->type->ops.mq.prepare_request for the target elevator e.  The cloned
      request is therefore not initialized for the scheduler, but it is
      however inserted into the scheduler by blk_mq_sched_insert_request."
      
      All said, a request-based DM multipath device's IO scheduler should be
      the only one used -- when the original requests are issued to the
      underlying paths as cloned requests they are inserted directly in the
      underlying dispatch queue(s) rather than through an additional elevator.
      
      But commit bd166ef1 ("blk-mq-sched: add framework for MQ capable IO
      schedulers") switched blk_insert_cloned_request() from using
      blk_mq_insert_request() to blk_mq_sched_insert_request().  Which
      incorrectly added elevator machinery into a call chain that isn't
      supposed to have any.
      
      To fix this introduce a blk-mq private blk_mq_request_bypass_insert()
      that blk_insert_cloned_request() calls to insert the request without
      involving any elevator that may be attached to the cloned request's
      request_queue.
      
      Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarBart Van Assche <Bart.VanAssche@wdc.com>
      Tested-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      157f377b
    • Mikulas Patocka's avatar
      block: fix integer overflow in __blkdev_sectors_to_bio_pages() · 09c2c359
      Mikulas Patocka authored
      
      Fix possible integer overflow in __blkdev_sectors_to_bio_pages if
      sector_t is 32-bit.
      
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Fixes: 615d22a5 ("block: Fix __blkdev_issue_zeroout loop")
      Reviewed-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      09c2c359
    • Scott Bauer's avatar
      block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabled · dbec491b
      Scott Bauer authored
      
      Users who are booting off their Opal enabled drives are having
      issues when they have a shadow MBR set up after s3/resume cycle.
      When the Drive has a shadow MBR setup the MBRDone flag is set to
      false upon power loss (S3/S4/S5). When the MBRDone flag is false
      I/O to LBA 0 -> LBA_END_MBR are remapped to the shadow mbr
      of the drive. If the drive contains useful data in the 0 -> end_mbr
      range upon s3 resume the user can never get to that data as the
      drive will keep remapping it to the MBR. To fix this when we unlock
      on S3 resume, we need to tell the drive that we're done with the
      shadow mbr (even though we didnt use it) by setting true to MBRDone.
      This way the drive will stop the remapping and the user can access
      their data.
      
      Acked-by Jon Derrick: <jonathan.derrick@intel.com>
      Signed-off-by: default avatarScott Bauer <scott.bauer@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      dbec491b
  2. Sep 09, 2017
  3. Sep 01, 2017
  4. Aug 31, 2017
    • Bart Van Assche's avatar
      compat_hdio_ioctl: Fix a declaration · 8363dae2
      Bart Van Assche authored
      
      This patch avoids that sparse reports the following warning messages:
      
      block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces)
      block/compat_ioctl.c:85:11:    expected unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:85:11:    got void [noderef] <asn:1>*
      block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces)
      block/compat_ioctl.c:91:21:    expected void const volatile [noderef] <asn:1>*<noident>
      block/compat_ioctl.c:91:21:    got unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:87:53: warning: dereference of noderef expression
      block/compat_ioctl.c:91:21: warning: dereference of noderef expression
      
      Fixes: commit d597580d ("generic ...copy_..._user primitives")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      8363dae2
    • Paolo Valente's avatar
      block, bfq: guarantee update_next_in_service always returns an eligible entity · 24d90bb2
      Paolo Valente authored
      
      If the function bfq_update_next_in_service is invoked as a consequence
      of the activation or requeueing of an entity, say E, then it doesn't
      invoke bfq_lookup_next_entity to get the next-in-service entity. In
      contrast, it follows a shorter path: if E happens to be eligible (see
      commit "bfq-sq-mq: make lookup_next_entity push up vtime on
      expirations" for details on eligibility) and to have a lower virtual
      finish time than the current candidate as next-in-service entity, then
      E directly becomes the next-in-service entity. Unfortunately, there is
      a corner case for which this shorter path makes
      bfq_update_next_in_service choose a non eligible entity: it occurs if
      both E and the current next-in-service entity happen to be non
      eligible when bfq_update_next_in_service is invoked. In this case, E
      is not set as next-in-service, and, since bfq_lookup_next_entity is
      not invoked, the state of the parent entity is not updated so as to
      end up with an eligible entity as the proper next-in-service entity.
      
      In this respect, next-in-service is actually allowed to be non
      eligible while some queue is in service: since no system-virtual-time
      push-up can be performed in that case (see again commit "bfq-sq-mq:
      make lookup_next_entity push up vtime on expirations" for details),
      next-in-service is chosen, speculatively, as a function of the
      possible value that the system virtual time may get after a push
      up. But the correctness of the schedule breaks if next-in-service is
      still a non eligible entity when it is time to set in service the next
      entity. Unfortunately, this may happen in the above corner case.
      
      This commit fixes this problem by making bfq_update_next_in_service
      invoke bfq_lookup_next_entity not only if the above shorter path
      cannot be taken, but also if the shorter path is taken but fails to
      yield an eligible next-in-service entity.
      
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Tested-by: default avatarLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      24d90bb2
    • Paolo Valente's avatar
      block, bfq: remove direct switch to an entity in higher class · a02195ce
      Paolo Valente authored
      
      If the function bfq_update_next_in_service is invoked as a consequence
      of the activation or requeueing of an entity, say E, and finds out
      that E belongs to a higher-priority class than that of the current
      next-in-service entity, then it sets next_in_service directly to
      E. But this may lead to anomalous schedules, because E may happen not
      be eligible for service, because its virtual start time is higher than
      the system virtual time for its service tree.
      
      This commit addresses this issue by simply removing this direct
      switch.
      
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Tested-by: default avatarLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a02195ce
    • Paolo Valente's avatar
      block, bfq: make lookup_next_entity push up vtime on expirations · 80294c3b
      Paolo Valente authored
      
      To provide a very smooth service, bfq starts to serve a bfq_queue
      only if the queue is 'eligible', i.e., if the same queue would
      have started to be served in the ideal, perfectly fair system that
      bfq simulates internally. This is obtained by associating each
      queue with a virtual start time, and by computing a special system
      virtual time quantity: a queue is eligible only if the system
      virtual time has reached the virtual start time of the
      queue. Finally, bfq guarantees that, when a new queue must be set
      in service, there is always at least one eligible entity for each
      active parent entity in the scheduler. To provide this guarantee,
      the function __bfq_lookup_next_entity pushes up, for each parent
      entity on which it is invoked, the system virtual time to the
      minimum among the virtual start times of the entities in the
      active tree for the parent entity (more precisely, the push up
      occurs if the system virtual time happens to be lower than all
      such virtual start times).
      
      There is however a circumstance in which __bfq_lookup_next_entity
      cannot push up the system virtual time for a parent entity, even
      if the system virtual time is lower than the virtual start times
      of all the child entities in the active tree. It happens if one of
      the child entities is in service. In fact, in such a case, there
      is already an eligible entity, the in-service one, even if it may
      not be not present in the active tree (because in-service entities
      may be removed from the active tree).
      
      Unfortunately, in the last re-design of the
      hierarchical-scheduling engine, the reset of the pointer to the
      in-service entity for a given parent entity--reset to be done as a
      consequence of the expiration of the in-service entity--always
      happens after the function __bfq_lookup_next_entity has been
      invoked. This causes the function to think that there is still an
      entity in service for the parent entity, and then that the system
      virtual time cannot be pushed up, even if actually such a
      no-more-in-service entity has already been properly reinserted
      into the active tree (or in some other tree if no more
      active). Yet, the system virtual time *had* to be pushed up, to be
      ready to correctly choose the next queue to serve. Because of the
      lack of this push up, bfq may wrongly set in service a queue that
      had been speculatively pre-computed as the possible
      next-in-service queue, but that would no more be the one to serve
      after the expiration and the reinsertion into the active trees of
      the previously in-service entities.
      
      This commit addresses this issue by making
      __bfq_lookup_next_entity properly push up the system virtual time
      if an expiration is occurring.
      
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Tested-by: default avatarLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: default avatarOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      80294c3b
  5. Aug 30, 2017
  6. Aug 29, 2017
  7. Aug 28, 2017
  8. Aug 25, 2017
  9. Aug 24, 2017
    • Bart Van Assche's avatar
      compat_hdio_ioctl: Fix a declaration · 6a934bb8
      Bart Van Assche authored
      
      This patch avoids that sparse reports the following warning messages:
      
      block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces)
      block/compat_ioctl.c:85:11:    expected unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:85:11:    got void [noderef] <asn:1>*
      block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces)
      block/compat_ioctl.c:91:21:    expected void const volatile [noderef] <asn:1>*<noident>
      block/compat_ioctl.c:91:21:    got unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:87:53: warning: dereference of noderef expression
      block/compat_ioctl.c:91:21: warning: dereference of noderef expression
      
      Fixes: commit d597580d ("generic ...copy_..._user primitives")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6a934bb8
    • weiping zhang's avatar
      block: remove blk_free_devt in add_partition · 47570848
      weiping zhang authored
      
      put_device(pdev) will call pdev->type->release finally, and blk_free_devt
      has been called in part_release(), so remove it.
      
      Signed-off-by: default avatarweiping zhang <zhangweiping@didichuxing.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      47570848
    • Benjamin Block's avatar
      bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer · 50b4d485
      Benjamin Block authored
      
      Since we split the scsi_request out of struct request bsg fails to
      provide a reply-buffer for the drivers. This was done via the pointer
      for sense-data, that is not preallocated anymore.
      
      Failing to allocate/assign it results in illegal dereferences because
      LLDs use this pointer unquestioned.
      
      An example panic on s390x, using the zFCP driver, looks like this (I had
      debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
      s390x):
      
      Unable to handle kernel pointer dereference in virtual kernel address space
      Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
      Fault in home space mode while using kernel ASCE.
      AS:0000000001590007 R3:0000000000000024
      Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in: <Long List>
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
      Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
      task: 0000000065cb0100 task.stack: 0000000065cb4000
      Krnl PSW : 0704e00180000000 000003ff801e4156 (zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
      Krnl GPRS: 0000000000000001 000000005fa9d0d0 000000005fa9d078 0000000000e16866
                 000003ff00000290 6b6b6b6b6b6b6b6b 0000000059f78f00 000000000000000f
                 00000000593a0958 00000000593a0958 0000000060d88800 000000005ddd4c38
                 0000000058b50100 07000000659cba08 000003ff801e8556 00000000659cb9a8
      Krnl Code: 000003ff801e4146: e31020500004        lg      %r1,80(%r2)
                 000003ff801e414c: 58402040           l       %r4,64(%r2)
                #000003ff801e4150: e35020200004       lg      %r5,32(%r2)
                >000003ff801e4156: 50405004           st      %r4,4(%r5)
                 000003ff801e415a: e54c50080000       mvhi    8(%r5),0
                 000003ff801e4160: e33010280012       lt      %r3,40(%r1)
                 000003ff801e4166: a718fffb           lhi     %r1,-5
                 000003ff801e416a: 1803               lr      %r0,%r3
      Call Trace:
      ([<000003ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
       [<000003ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
       [<000003ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
       [<00000000009b91b6>] qdio_kick_handler+0x2ae/0x2c8
       [<00000000009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
       [<00000000001684c2>] tasklet_action+0x15a/0x1d8
       [<0000000000bd28ec>] __do_softirq+0x3ec/0x848
       [<00000000001675a4>] irq_exit+0x74/0xf8
       [<000000000010dd6a>] do_IRQ+0xba/0xf0
       [<0000000000bd19e8>] io_int_handler+0x104/0x2d4
       [<00000000001033b6>] enabled_wait+0xb6/0x188
      ([<000000000010339e>] enabled_wait+0x9e/0x188)
       [<000000000010396a>] arch_cpu_idle+0x32/0x50
       [<0000000000bd0112>] default_idle_call+0x52/0x68
       [<00000000001cd0fa>] do_idle+0x102/0x188
       [<00000000001cd41e>] cpu_startup_entry+0x3e/0x48
       [<0000000000118c64>] smp_start_secondary+0x11c/0x130
       [<0000000000bd2016>] restart_int_handler+0x62/0x78
       [<0000000000000000>]           (null)
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
       [<000003ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]
      
      Kernel panic - not syncing: Fatal exception in interrupt
      
      This patch moves bsg-lib to allocate and setup struct bsg_job ahead of
      time, including the allocation of a buffer for the reply-data.
      
      This means, struct bsg_job is not allocated separately anymore, but as part
      of struct request allocation - similar to struct scsi_cmd. Reflect this in
      the function names that used to handle creation/destruction of struct
      bsg_job.
      
      Reported-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Suggested-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Fixes: 82ed4db4 ("block: split scsi_request out of struct request")
      Cc: <stable@vger.kernel.org> #4.11+
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      50b4d485
    • Milan Broz's avatar
      bio-integrity: Fix regression if profile verify_fn is NULL · 97e05463
      Milan Broz authored
      
      In dm-integrity target we register integrity profile that have
      both generate_fn and verify_fn callbacks set to NULL.
      
      This is used if dm-integrity is stacked under a dm-crypt device
      for authenticated encryption (integrity payload contains authentication
      tag and IV seed).
      
      In this case the verification is done through own crypto API
      processing inside dm-crypt; integrity profile is only holder
      of these data. (And memory is owned by dm-crypt as well.)
      
      After the commit (and previous changes)
        Commit 7c20f116
        Author: Christoph Hellwig <hch@lst.de>
        Date:   Mon Jul 3 16:58:43 2017 -0600
      
          bio-integrity: stop abusing bi_end_io
      
      we get this crash:
      
      : BUG: unable to handle kernel NULL pointer dereference at   (null)
      : IP:   (null)
      : *pde = 00000000
      ...
      :
      : Workqueue: kintegrityd bio_integrity_verify_fn
      : task: f48ae180 task.stack: f4b5c000
      : EIP:   (null)
      : EFLAGS: 00210286 CPU: 0
      : EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000
      : ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4
      :  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      : CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0
      : Call Trace:
      :  ? bio_integrity_process+0xe3/0x1e0
      :  bio_integrity_verify_fn+0xea/0x150
      :  process_one_work+0x1c7/0x5c0
      :  worker_thread+0x39/0x380
      :  kthread+0xd6/0x110
      :  ? process_one_work+0x5c0/0x5c0
      :  ? kthread_worker_fn+0x100/0x100
      :  ? kthread_worker_fn+0x100/0x100
      :  ret_from_fork+0x19/0x24
      : Code:  Bad EIP value.
      : EIP:   (null) SS:ESP: 0068:f4b5dea4
      : CR2: 0000000000000000
      
      Patch just skip the whole verify workqueue if verify_fn is set to NULL.
      
      Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
      Signed-off-by: default avatarMilan Broz <gmazyland@gmail.com>
      [hch: trivial whitespace fix]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      97e05463
  10. Aug 23, 2017
  11. Aug 18, 2017
Loading