  1. May 17, 2021
    • quota: Disable quotactl_path syscall · 5b9fedb3
      Jan Kara authored
      In commit fa8b9007 ("quota: wire up quotactl_path") we wired up the
      new quotactl_path syscall. However, some people in the LWN discussion
      objected that the path-based syscall is missing the dirfd and flags
      arguments that are by now standard for path-based syscalls. They have
      a point, so after a discussion with Christian Brauner and Sascha Hauer
      I've decided to disable the syscall for now and update its API. Since
      there is no userspace currently using the syscall and it hasn't shipped
      in any released kernel, we should be fine.
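      
      For context, a rough sketch in C of the API concern: the first
      prototype approximates the syscall as wired up by fa8b9007; the
      dirfd/flags variant is purely hypothetical and only illustrates the
      calling convention of contemporary path-based syscalls.
      
        /* Approximate shape of the syscall wired up by fa8b9007. */
        long quotactl_path(const char __user *mountpoint, int cmd,
                           qid_t id, void __user *addr);
        
        /*
         * Hypothetical dirfd/flags form, following the convention of other
         * modern path-based syscalls (e.g. openat2, move_mount).
         */
        long quotactl_path2(int dfd, const char __user *mountpoint,
                            int cmd, qid_t id, void __user *addr,
                            unsigned int flags);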
      
      CC: Christian Brauner <christian.brauner@ubuntu.com>
      CC: Sascha Hauer <s.hauer@pengutronix.de>
      Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein
      
      
      Signed-off-by: Jan Kara <jack@suse.cz>
  2. May 15, 2021
    • openrisc: Define memory barrier mb · 8b549c18
      Peter Zijlstra authored
      
      This came up in a discussion of the requirements qspinlock places on an
      architecture. OpenRISC uses qspinlock, but it was noticed that the
      memory barrier was not defined.
      
      Peter defined it in the mail thread writing:
      
          As near as I can tell this should do. The arch spec only lists
          this one instruction and the text makes it sound like a completion
          barrier.
      
      This is correct, so apply this patch.
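      
      A minimal sketch of the resulting definition, assuming the single
      l.msync instruction discussed in the thread (presumably in
      arch/openrisc/include/asm/barrier.h):
      
        #ifndef __ASM_OPENRISC_BARRIER_H
        #define __ASM_OPENRISC_BARRIER_H
        
        /* l.msync: the architecture's memory synchronisation instruction. */
        #define mb() asm volatile ("l.msync" ::: "memory")
        
        #include <asm-generic/barrier.h>
        
        #endif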
      
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      [shorne@gmail.com: Turned the mail into a patch]
      Signed-off-by: Stafford Horne <shorne@gmail.com>
  3. May 14, 2021
    • arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache() · 588a513d
      Catalin Marinas authored
      
      To ensure that instructions are observable in a new mapping, the arm64
      set_pte_at() implementation cleans the D-cache and invalidates the
      I-cache to the PoU. As an optimisation, this is only done on executable
      mappings and the PG_dcache_clean page flag is set to avoid future cache
      maintenance on the same page.
      
      When two different processes map the same page (e.g. private executable
      file or shared mapping) there's a potential race on checking and setting
      PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
      fault paths the page is locked (PG_locked), mprotect() does not take the
      page lock. The result is that one process may see the PG_dcache_clean
      flag set but the I/D cache maintenance not yet performed.
      
      Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
      and set_bit(). In the rare event of a race, the cache maintenance is
      done twice.
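      
      In outline, the fix moves from the atomic test-and-set to a test, the
      maintenance, then the set, so the flag only becomes visible once the
      caches are actually clean (a sketch of __sync_icache_dcache(); helper
      names abbreviated):
      
        void __sync_icache_dcache(pte_t pte)
        {
            struct page *page = pte_page(pte);
        
            /*
             * Before: if (!test_and_set_bit(PG_dcache_clean, &page->flags))
             * could let a racing mapper see the flag before the maintenance
             * below had completed. Now the flag is only set afterwards.
             */
            if (!test_bit(PG_dcache_clean, &page->flags)) {
                sync_icache_aliases(page_address(page), page_size(page));
                set_bit(PG_dcache_clean, &page->flags);
            }
        }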
      
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <stable@vger.kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Acked-by: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20210514095001.13236-1-catalin.marinas@arm.com
      
      
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • xen/swiotlb: check if the swiotlb has already been initialized · 97729b65
      Stefano Stabellini authored
      
      xen_swiotlb_init calls swiotlb_late_init_with_tbl, which fails with
      -ENOMEM if the swiotlb has already been initialized.
      
      Add an explicit check that io_tlb_default_mem != NULL at the beginning
      of xen_swiotlb_init. If the swiotlb is already initialized, print a
      warning and return -EEXIST.
      
      On x86, the error propagates.
      
      On ARM, we don't actually need a special swiotlb buffer (yet); any
      buffer would do. So ignore the error and continue.
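      
      Roughly, the added check looks like this (a sketch, not the verbatim
      hunk):
      
        int __ref xen_swiotlb_init(void)
        {
            /* The default swiotlb already exists; refuse to set up another. */
            if (io_tlb_default_mem != NULL) {
                pr_warn("swiotlb buffer already initialized\n");
                return -EEXIST;
            }
        
            /* ... existing swiotlb_late_init_with_tbl() path ... */
        }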
      
      CC: boris.ostrovsky@oracle.com
      CC: jgross@suse.com
      Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20210512201823.1963-3-sstabellini@kernel.org
      
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • arm64: do not set SWIOTLB_NO_FORCE when swiotlb is required · 687842ec
      Christoph Hellwig authored
      
      Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init,
      today dma_direct_map_page returns an error if SWIOTLB_NO_FORCE is set.
      
      For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can
      do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it
      is going to be required later (e.g. Xen requires it).
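      
      The resulting arm64 mem_init() logic is approximately as follows (a
      sketch; the exact conditions may differ from the committed hunk):
      
        if (swiotlb_force == SWIOTLB_FORCE ||
            max_pfn > PFN_DOWN(arm64_dma_phys_limit))
            swiotlb_init(1);
        else if (!xen_swiotlb_detect())
            swiotlb_force = SWIOTLB_NO_FORCE;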
      
      CC: boris.ostrovsky@oracle.com
      CC: jgross@suse.com
      CC: catalin.marinas@arm.com
      CC: will@kernel.org
      CC: linux-arm-kernel@lists.infradead.org
      Fixes: 2726bf3f ("swiotlb: Make SWIOTLB_NO_FORCE perform no allocation")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Link: https://lore.kernel.org/r/20210512201823.1963-2-sstabellini@kernel.org
      
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.h · cb6f6b33
      Stefano Stabellini authored
      
      Move xen_swiotlb_detect to a static inline function to make it available
      to !CONFIG_XEN builds.
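      
      Roughly, the helper becomes a header-only static inline along these
      lines (a sketch based on the existing detection logic; not the literal
      hunk):
      
        static inline int xen_swiotlb_detect(void)
        {
            if (!xen_domain())
                return 0;
            if (xen_feature(XENFEAT_direct_mapped))
                return 1;
            /* Older Xen without the feature flag: assume dom0 is 1:1 mapped. */
            if (!xen_feature(XENFEAT_not_direct_mapped) && xen_initial_domain())
                return 1;
            return 0;
        }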
      
      CC: boris.ostrovsky@oracle.com
      CC: jgross@suse.com
      Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Link: https://lore.kernel.org/r/20210512201823.1963-1-sstabellini@kernel.org
      
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
    • clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86 · 3486d2c9
      Vitaly Kuznetsov authored
      Mohammed reports (https://bugzilla.kernel.org/show_bug.cgi?id=213029)
      that commit e4ab4658 ("clocksource/drivers/hyper-v: Handle vDSO
      differences inline") broke the vDSO on x86. The problem appears to be
      that VDSO_CLOCKMODE_HVCLOCK is an enumerator in 'enum vdso_clock_mode',
      so the '#ifdef VDSO_CLOCKMODE_HVCLOCK' branch evaluates to false (it is
      not a define).
      
      Use a dedicated HAVE_VDSO_CLOCKMODE_HVCLOCK define instead.
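      
      The underlying C gotcha: an enumerator is invisible to the preprocessor,
      so '#ifdef' on it always takes the false branch and silently compiles
      the code out. A minimal illustration (not the driver source itself):
      
        enum vdso_clock_mode { VDSO_CLOCKMODE_NONE, VDSO_CLOCKMODE_HVCLOCK };
        
        #ifdef VDSO_CLOCKMODE_HVCLOCK        /* never defined: it's an enumerator */
        /* ... Hyper-V vDSO handling, silently dropped ... */
        #endif
        
        #define HAVE_VDSO_CLOCKMODE_HVCLOCK  /* explicit macro alongside the enum */
        #ifdef HAVE_VDSO_CLOCKMODE_HVCLOCK
        /* ... Hyper-V vDSO handling, now compiled in ... */
        #endif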
      
      Fixes: e4ab4658 ("clocksource/drivers/hyper-v: Handle vDSO differences inline")
      Reported-by: Mohammed Gamal <mgamal@redhat.com>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Link: https://lore.kernel.org/r/20210513073246.1715070-1-vkuznets@redhat.com
    • powerpc/64e/interrupt: Fix nvgprs being clobbered · c6ac667b
      Nicholas Piggin authored
      
      Some interrupt handlers have an "extra" that saves one or two
      registers (r14, r15) in the paca save area and makes them available for
      the handler to use.
      
      The change to always save nvgprs in exception handlers led to some
      interrupt handlers saving those scratch r14 / r15 registers into the
      interrupt frame's GPR save area, which gets restored on interrupt exit.
      
      Fix this by always reloading those scratch registers from the paca
      before the EXCEPTION_COMMON that saves the nvgprs.
      
      Fixes: 4228b2c3 ("powerpc/64e/interrupt: always save nvgprs on interrupt")
      Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
    • powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled · 4ec5feec
      Nicholas Piggin authored
      
      scv support introduced the notion of code that implicitly soft-masks
      irqs due to the instruction addresses. This is required because scv
      enters the kernel with MSR[EE]=1.
      
      If an NMI (including a soft-NMI) interrupt hits while we are implicitly
      soft-masked, its regs->softe does not reflect this because it is
      derived from the explicit soft mask state (paca->irq_soft_mask). This
      makes arch_irq_disabled_regs(regs) return false.
      
      This can trigger a warning in the soft-NMI watchdog code (shown below).
      Fix it by having NMI interrupts set regs->softe to disabled when they
      interrupt an implicitly soft-masked region.
      
        ------------[ cut here ]------------
        WARNING: CPU: 41 PID: 1103 at arch/powerpc/kernel/watchdog.c:259 soft_nmi_interrupt+0x3e4/0x5f0
        CPU: 41 PID: 1103 Comm: (spawn) Not tainted
        NIP:  c000000000039534 LR: c000000000039234 CTR: c000000000009a00
        REGS: c000007fffbcf940 TRAP: 0700   Not tainted
        MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 22042482  XER: 200400ad
        CFAR: c000000000039260 IRQMASK: 3
        GPR00: c000000000039204 c000007fffbcfbe0 c000000001d6c300 0000000000000003
        GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020
        GPR08: 0000007ffd4e0000 0000000000000000 c000007ffffceb00 7265677368657265
        GPR12: 9000000000009033 c000007ffffceb00 00000f7075bf4480 000000000000002a
        GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000
        GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040
        GPR24: 0000000000000000 0000000000000029 c000000001dae058 0000000000000029
        GPR28: 0000000000000000 0000000000000800 0000000000000009 c000007fffbcfd60
        NIP [c000000000039534] soft_nmi_interrupt+0x3e4/0x5f0
        LR [c000000000039234] soft_nmi_interrupt+0xe4/0x5f0
        Call Trace:
        [c000007fffbcfbe0] [c000000000039204] soft_nmi_interrupt+0xb4/0x5f0 (unreliable)
        [c000007fffbcfcf0] [c00000000000c0e8] soft_nmi_common+0x138/0x1c4
        --- interrupt: 900 at end_real_trampolines+0x0/0x1000
        NIP:  c000000000003000 LR: 00007ca426adb03c CTR: 900000000280f033
        REGS: c000007fffbcfd60 TRAP: 0900
        MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44042482  XER: 200400ad
        CFAR: 00007ca426946020 IRQMASK: 0
        GPR00: 00000000000000ad 00007ffffa45d050 00007ca426b07f00 0000000000000035
        GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020
        GPR08: 0000000000000000 0000000000100000 0000000010000000 00007ffffa45d110
        GPR12: 0000000000000001 00007ca426d4e680 00000f7075bf4480 000000000000002a
        GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000
        GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040
        GPR24: 0000000000000000 00000f7057473f68 0000000000000003 000000000000041b
        GPR28: 00007ffffa45d4c4 0000000000000035 0000000000000000 00000f7057473f68
        NIP [c000000000003000] end_real_trampolines+0x0/0x1000
        LR [00007ca426adb03c] 0x7ca426adb03c
        --- interrupt: 900
        Instruction dump:
        60000000 60000000 60420000 38600001 482b3ae5 60000000 e93f0138 a36d0008
        7daa6b78 71290001 7f7907b4 4082fd34 <0fe00000> 4bfffd2c 60420000 ea6100a8
        ---[ end trace dc75f67d819779da ]---
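      
      Conceptually, the NMI entry wrapper change is along these lines (a loose
      sketch; the field and helper names only approximate the real code):
      
        /* interrupt_nmi_enter_prepare(), sketch */
        if (IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
            state->softe = regs->softe;
            /*
             * If the NMI interrupted an implicitly soft-masked region, record
             * it as disabled so arch_irq_disabled_regs(regs) tells the truth;
             * the saved value is put back on exit.
             */
            if (is_implicit_soft_masked(regs))
                regs->softe = IRQS_ALL_DISABLED;
        }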
      
      Fixes: 118178e6 ("powerpc: move NMI entry/exit code into wrapper")
      Reported-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210503111708.758261-1-npiggin@gmail.com
    • powerpc/64s: Fix stf mitigation patching w/strict RWX & hash · 5b48ba2f
      Michael Ellerman authored
      
      The stf entry barrier fallback is unsafe to execute in a semi-patched
      state, which can happen when enabling/disabling the mitigation with
      strict kernel RWX enabled and using the hash MMU.
      
      See the previous commit for more details.
      
      Fix it by changing the order in which we patch the instructions.
      
      Note the stf barrier fallback is only used on Power6 or earlier.
      
      Fixes: bd573a81 ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210513140800.1391706-2-mpe@ellerman.id.au
    • powerpc/64s: Fix entry flush patching w/strict RWX & hash · 49b39ec2
      Michael Ellerman authored
      
      The entry flush mitigation can be enabled/disabled at runtime. When this
      happens it results in the kernel patching its own instructions to
      enable/disable the mitigation sequence.
      
      With strict kernel RWX enabled, instruction patching happens via a
      secondary mapping of the kernel text, so that we don't have to make the
      primary mapping writable. With the hash MMU this leads to a hash fault,
      which causes us to execute the exception entry which contains the entry
      flush mitigation.
      
      This means we end up executing the entry flush in a semi-patched state,
      ie. after we have patched the first instruction but before we patch the
      second or third instruction of the sequence.
      
      On machines with updated firmware the entry flush is a series of special
      nops, and it's safe to execute in a semi-patched state.
      
      However, when using the fallback flush the sequence is mflr/branch/mtlr,
      so it's not safe to execute if we have patched out the mflr but not the
      other two instructions. Doing so corrupts LR, leading to an oops, for
      example:
      
        # echo 0 > /sys/kernel/debug/powerpc/entry_flush
        kernel tried to execute exec-protected page (c000000002971000) - exploit attempt? (uid: 0)
        BUG: Unable to handle kernel instruction fetch
        Faulting instruction address: 0xc000000002971000
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1
        NIP:  c000000002971000 LR: c000000002971000 CTR: c000000000120c40
        REGS: c000000013243840 TRAP: 0400   Not tainted  (5.13.0-rc1-00010-gda3bb206c9ce)
        MSR:  8000000010009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48428482  XER: 00000000
        ...
        NIP  0xc000000002971000
        LR   0xc000000002971000
        Call Trace:
          do_patch_instruction+0xc4/0x340 (unreliable)
          do_entry_flush_fixups+0x100/0x3b0
          entry_flush_set+0x50/0xe0
          simple_attr_write+0x160/0x1a0
          full_proxy_write+0x8c/0x110
          vfs_write+0xf0/0x340
          ksys_write+0x84/0x140
          system_call_exception+0x164/0x2d0
          system_call_common+0xec/0x278
      
      The simplest fix is to change the order in which we patch the
      instructions, so that the sequence is always safe to execute. For the
      non-fallback flushes it doesn't matter what order we patch in.
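      
      For illustration, one ordering that keeps every intermediate state safe
      for the fallback sequence (derived from the constraint above, not
      necessarily the literal order in the patch): when enabling, the branch
      goes in last; when disabling, the branch comes out first and the mflr
      last, so an mtlr is never live without its matching mflr.
      
        /* Sketch only: dest[0] = mflr r10, dest[1] = branch, dest[2] = mtlr r10. */
        if (enable) {
            patch_instruction(dest + 0, ppc_inst(instrs[0]));  /* mflr        */
            patch_instruction(dest + 2, ppc_inst(instrs[2]));  /* mtlr        */
            patch_instruction(dest + 1, ppc_inst(instrs[1]));  /* branch last */
        } else {
            patch_instruction(dest + 1, ppc_inst(0x60000000)); /* nop the branch first */
            patch_instruction(dest + 2, ppc_inst(0x60000000)); /* then the mtlr        */
            patch_instruction(dest + 0, ppc_inst(0x60000000)); /* mflr last            */
        }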
      
      Fixes: bd573a81 ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210513140800.1391706-1-mpe@ellerman.id.au
    • powerpc/64s: Fix crashes when toggling entry flush barrier · aec86b05
      Michael Ellerman authored
      
      The entry flush mitigation can be enabled/disabled at runtime via a
      debugfs file (entry_flush), which causes the kernel to patch itself to
      enable/disable the relevant mitigations.
      
      However depending on which mitigation we're using, it may not be safe to
      do that patching while other CPUs are active. For example the following
      crash:
      
        sleeper[15639]: segfault (11) at c000000000004c20 nip c000000000004c20 lr c000000000004c20
      
      Shows that we returned to userspace with a corrupted LR that points into
      the kernel, due to executing the partially patched call to the fallback
      entry flush (ie. we missed the LR restore).
      
      Fix it by doing the patching under stop machine. The CPUs that aren't
      doing the patching will be spinning in the core of the stop machine
      logic. That is currently sufficient for our purposes, because none of
      the patching we do is to that code or anywhere in the vicinity.
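      
      A rough sketch of the stop_machine() approach (helper names
      approximate):
      
        #include <linux/stop_machine.h>
        
        static int __do_entry_flush_fixups(void *data)
        {
            enum l1d_flush_type types = *(enum l1d_flush_type *)data;
        
            /* ... patch the entry flush instructions as before ... */
            return 0;
        }
        
        void do_entry_flush_fixups(enum l1d_flush_type types)
        {
            /*
             * All other CPUs spin inside stop_machine()'s own loop while the
             * callback runs, so none of them can be executing the
             * half-patched entry flush sequence.
             */
            stop_machine(__do_entry_flush_fixups, &types, NULL);
        }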
      
      Fixes: f7964378 ("powerpc/64s: flush L1D on kernel entry")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210506044959.1298123-2-mpe@ellerman.id.au
    • powerpc/64s: Fix crashes when toggling stf barrier · 8ec7791b
      Michael Ellerman authored
      
      The STF (store-to-load forwarding) barrier mitigation can be
      enabled/disabled at runtime via a debugfs file (stf_barrier), which
      causes the kernel to patch itself to enable/disable the relevant
      mitigations.
      
      However depending on which mitigation we're using, it may not be safe to
      do that patching while other CPUs are active. For example the following
      crash:
      
        User access of kernel address (c00000003fff5af0) - exploit attempt? (uid: 0)
        segfault (11) at c00000003fff5af0 nip 7fff8ad12198 lr 7fff8ad121f8 code 1
        code: 40820128 e93c00d0 e9290058 7c292840 40810058 38600000 4bfd9a81 e8410018
        code: 2c030006 41810154 3860ffb6 e9210098 <e94d8ff0> 7d295279 39400000 40820a3c
      
      Shows that we returned to userspace without restoring the user r13
      value, due to executing the partially patched STF exit code.
      
      Fix it by doing the patching under stop machine. The CPUs that aren't
      doing the patching will be spinning in the core of the stop machine
      logic. That is currently sufficient for our purposes, because none of
      the patching we do is to that code or anywhere in the vicinity.
      
      Fixes: a048a07d ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit")
      Cc: stable@vger.kernel.org # v4.17+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210506044959.1298123-1-mpe@ellerman.id.au