1. 08 Nov, 2017 1 commit
  2. 15 Aug, 2017 1 commit
  3. 03 Aug, 2017 1 commit
    • drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface · f89823c2
      Lionel Landwerlin authored
      
      
      The motivation behind this new interface is to expose at runtime the
      creation of new OA configs which can be used as part of the i915 perf
      open interface. This will enable the kernel to learn new configs which
      may be experimental, or otherwise not part of the core set currently
      available through the i915 perf interface.
      
      v2: Drop DRM_ERROR for userspace errors (Matthew)
          Add padding to userspace structure (Matthew)
          s/guid/uuid/ (Matthew)
      
      v3: Use u32 instead of int to iterate through registers (Matthew)
      
      v4: Lock access to dynamic config list (Lionel)
      
      v5: by Matthew:
          Fix uninitialized error values
          Fix incorrect unwinding when opening perf stream
          Use kmalloc_array() to store registers
          Use uuid_is_valid() to validate config uuids
          Declare ioctls as write only
          Check padding members are set to 0
          by Lionel:
          Return ENOENT rather than EINVAL when trying to remove non
          existing config
      
      v6: by Chris:
          Use ref counts for OA configs
          Store UUID in drm_i915_perf_oa_config rather than using pointer
          Shuffle fields of drm_i915_perf_oa_config to avoid padding
      
      v7: by Chris
          Rename uapi pointers fields to end with '_ptr'
      
      v8: by Andrzej, Marek, Sebastian
          Update register whitelisting
          by Lionel
          Add more register names for documentation
          Allow configuration programming in non-paranoid mode
          Add support for value filter for a couple of registers already
          programmed in other part of the kernel
      
      v9: Documentation fix (Lionel)
          Allow writing WAIT_FOR_RC6_EXIT only on Gen8+ (Andrzej)
      
      v10: Perform read access_ok() on register pointers (Lionel)
      Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Signed-off-by: Andrzej Datczuk <andrzej.datczuk@intel.com>
      Reviewed-by: Andrzej Datczuk <andrzej.datczuk@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-2-lionel.g.landwerlin@intel.com
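The changelog above describes a config that carries its UUID inline, u32 register counts and `_ptr`-suffixed user pointers. A minimal sketch of how userspace might prepare such a config before passing it to the add-config ioctl follows. The struct is a local stand-in, not the uapi definition: its name, field names and layout are assumptions modeled on the changelog, as is the convention that registers are passed as (address, value) pairs. `uuid_format_ok()` and `oa_config_init()` are hypothetical helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OA_UUID_LEN 36 /* "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" */

/* Hypothetical stand-in for the uapi config struct (layout assumed). */
struct oa_config_sketch {
	char uuid[OA_UUID_LEN];      /* stored inline, not NUL-terminated */
	uint32_t n_mux_regs;         /* counts of (address, value) pairs */
	uint32_t n_b_counter_regs;
	uint32_t n_flex_regs;
	uint64_t mux_regs_ptr;       /* user pointers carried as 64-bit values */
	uint64_t b_counter_regs_ptr;
	uint64_t flex_regs_ptr;
};

/* Return 1 if s is a 36-character 8-4-4-4-12 hexadecimal UUID string. */
static int uuid_format_ok(const char *s)
{
	if (strlen(s) != OA_UUID_LEN)
		return 0;
	for (size_t i = 0; i < OA_UUID_LEN; i++) {
		int is_dash_pos = (i == 8 || i == 13 || i == 18 || i == 23);

		if (is_dash_pos ? s[i] != '-' : !isxdigit((unsigned char)s[i]))
			return 0;
	}
	return 1;
}

/*
 * Fill cfg from a UUID string and an array of mux (address, value) pairs.
 * Unused counts and pointers are left at zero.
 * Returns 0 on success, -1 if the UUID is malformed.
 */
static int oa_config_init(struct oa_config_sketch *cfg, const char *uuid,
			  const uint32_t *mux_pairs, uint32_t n_mux_regs)
{
	assert(cfg != NULL);
	if (!uuid_format_ok(uuid))
		return -1;
	memset(cfg, 0, sizeof(*cfg));
	memcpy(cfg->uuid, uuid, OA_UUID_LEN);
	cfg->n_mux_regs = n_mux_regs;
	cfg->mux_regs_ptr = (uint64_t)(uintptr_t)mux_pairs;
	return 0;
}
```

Ordering the three u32 counts directly after the 36-byte UUID brings the 64-bit pointer fields to an 8-byte boundary without implicit padding, which is one way to read the v6 note about shuffling fields.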
  4. 16 Jun, 2017 1 commit
  5. 14 Jun, 2017 3 commits
    • drm/i915/perf: Add OA unit support for Gen 8+ · 19f81df2
      Robert Bragg authored
      
      
      Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
      share (more-or-less) the same OA unit design.
      
      Of particular note in comparison to Haswell: some OA unit HW config
      state has become per-context state and as a consequence it is somewhat
      more complicated to manage synchronous state changes from the cpu while
      there's no guarantee of what context (if any) is currently actively
      running on the gpu.
      
      The periodic sampling frequency which can be particularly useful for
      system-wide analysis (as opposed to command stream synchronised
      MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
      have become per-context save and restored (while the OABUFFER
      destination is still a shared, system-wide resource).
      
      This support for gen8+ takes care to consider a number of timing
      challenges involved in synchronously updating per-context state
      primarily by programming all config state from the cpu and updating all
      current and saved contexts synchronously while the OA unit is still
      disabled.
      
      The driver intentionally avoids depending on command streamer
      programming to update OA state considering the lack of synchronization
      between the automatic loading of OACTXCONTROL state (that includes the
      periodic sampling state and enable state) on context restore and the
      parsing of any general purpose BB the driver can control. I.e. this
      implementation is careful to avoid the possibility of a context restore
      temporarily enabling any out-of-date periodic sampling state. In
      addition to the risk of transiently-out-of-date state being loaded
      automatically; there are also internal HW latencies involved in the
      loading of MUX configurations which would be difficult to account for
      from the command streamer (and we only want to enable the unit once
      the MUX configuration is complete).
      
      Since the Gen8+ OA unit design no longer supports clock gating the unit
      off for a single given context (which effectively stopped any progress
      of counters while any other context was running) and instead supports
      tagging OA reports with a context ID for filtering on the CPU, it means
      we can no longer hide the system-wide progress of counters from a
      non-privileged application only interested in metrics for its own
      context. Although we could theoretically try and subtract the progress
      of other contexts before forwarding reports via read() we aren't in a
      position to filter reports captured via MI_REPORT_PERF_COUNT commands.
      As a result, for Gen8+, we always require the
      dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
      if not root.
      
      v5: Drain submitted requests when enabling metric set to ensure no
          lite-restore erases the context image we just updated (Lionel)
      
      v6: In addition to drain, switch to kernel context & update all
          contexts in place (Chris)
      
      v7: Add missing mutex_unlock() if switching to kernel context fails
          (Matthew)
      
      v8: Simplify OA period/flex-eu-counters programming by using the
          batchbuffer instead of modifying ctx-image (Lionel)
      
      v9: Back to updating the context image (due to erroneous testing,
          batchbuffer programming the OA unit doesn't actually work)
          (Lionel)
          Pin context before updating context image (Chris)
          Drop MMIO programming now that we switch to a kernel context with
          right values in initial context image (Chris)
      
      v10: Just pin_map the contexts we want to modify or let the
           configuration happen on first use (Chris)
      
      v11: Update kernel context OA config through the batchbuffer rather
           than on the fly ctx-image update (Lionel)
      
      v12: Rework OA context registers update again by switching away from
           user contexts and reconfiguring the kernel context through the
           batchbuffer and updating all the other contexts' context image.
           Also take care to lock slice/subslice configuration when OA is
           on. (Lionel)
      
      v13: Request rpcs updates on all engines when updating the OA config
           (Lionel)
      
      v14: Drop any kind of rpcs management now that we monitor sseu
           configuration changes in a later patch (Lionel)
           Remove usleep after programming the NOA configs on Gen8+, this
           doesn't seem to be needed (Lionel)
      
      v15: Respect coding style for block comments (Chris)
      
      v16: Add missing i915_add_request() in case we fail to emit OA
           configuration (Matthew)
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
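Since Gen8+ reports are tagged with a context ID instead of the unit being gated per context, selecting one context's reports becomes a CPU-side pass over the captured data. A minimal sketch of such a pass is shown here. The report size (64 dwords) and the position of the context ID (dword 2) are assumptions for illustration, and `filter_reports_by_ctx()` is a hypothetical helper rather than driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REPORT_DWORDS 64 /* assumed report size: 256 bytes */
#define CTX_ID_DWORD  2  /* assumed position of the context ID tag */

/*
 * Copy every report whose context ID equals ctx_id from in to out.
 * Both buffers hold n_reports * REPORT_DWORDS dwords.
 * Returns the number of reports kept.
 */
static size_t filter_reports_by_ctx(const uint32_t *in, size_t n_reports,
				    uint32_t ctx_id, uint32_t *out)
{
	size_t kept = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < n_reports; i++) {
		const uint32_t *report = in + i * REPORT_DWORDS;

		if (report[CTX_ID_DWORD] != ctx_id)
			continue;
		memcpy(out + kept * REPORT_DWORDS, report,
		       REPORT_DWORDS * sizeof(uint32_t));
		kept++;
	}
	return kept;
}
```

Filtering like this only selects reports. The counters inside each kept report still include progress made while other contexts ran, which is the reason the commit gives for requiring dev.i915.perf_stream_paranoid to be unset for non-root access.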
    • drm/i915: expose _SUBSLICE_MASK GETPARM · f5320233
      Robert Bragg authored
      
      
      Assuming a uniform mask across all slices, this enables userspace to
      determine which specific sub slices can be enabled. This information is
      required, for example, to be able to analyse some OA counter reports
      where the counter configuration depends on the HW sub slice
      configuration.
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    • drm/i915: expose _SLICE_MASK GETPARM · 7fed555c
      Robert Bragg authored
      
      
      Enables userspace to determine the maximum number of slices that can
      be enabled on the device and also know what specific slices can be
      enabled. This information is required, for example, to be able to
      analyse some OA counter reports where the counter configuration
      depends on the HW slice configuration.
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
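Once userspace has read the slice and subslice masks through getparam, interpreting them is plain bit arithmetic. The following sketch assumes each mask has been copied into a 32-bit value with bit N standing for slice (or subslice) N. The helper names are hypothetical, and the total relies on the uniform-subslice assumption stated in the _SUBSLICE_MASK commit.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in a mask (number of slices or subslices). */
static unsigned int mask_count(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1) /* clear the lowest set bit */
		n++;
	return n;
}

/* Return 1 if the given slice index is present in slice_mask. */
static int slice_available(uint32_t slice_mask, unsigned int slice)
{
	assert(slice < 32);
	return (int)((slice_mask >> slice) & 1u);
}

/* Total subslices, assuming the same subslice mask applies to every slice. */
static unsigned int total_subslices(uint32_t slice_mask, uint32_t subslice_mask)
{
	return mask_count(slice_mask) * mask_count(subslice_mask);
}
```

For example, a slice mask of 0x3 with a subslice mask of 0x7 describes two slices of three subslices each, six in total.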
  6. 15 Apr, 2017 1 commit
  7. 12 Apr, 2017 1 commit
  8. 27 Jan, 2017 2 commits
  9. 19 Jan, 2017 1 commit
  10. 01 Dec, 2016 1 commit
  11. 22 Nov, 2016 2 commits
    • drm/i915: Enable i915 perf stream for Haswell OA unit · d7965152
      Robert Bragg authored
      
      
      Gen graphics hardware can be set up to periodically write snapshots of
      performance counters into a circular buffer via its Observation
      Architecture and this patch exposes that capability to userspace via the
      i915 perf interface.
      
      v2:
         Make sure to initialize ->specific_ctx_id when opening, without
         relying on _pin_notify hook, in case ctx already pinned.
      v3:
         Revert back to pinning ctx upfront when opening stream, removing
         need to hook in to pinning and to update OACONTROL on the fly.
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-7-robert@sixbynine.org
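The periodic snapshots mentioned above are driven by an exponent rather than a direct period. As a rough sketch, and assuming the Haswell relationship of period = 80 ns * 2^(exponent + 1) (the 80 ns tick and the 0..31 exponent range are assumptions here and may not hold for other hardware), the conversion in both directions could look like this. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HSW_TIMESTAMP_PERIOD_NS 80ull /* assumed Haswell timestamp tick */
#define OA_EXPONENT_LIMIT       32u   /* assumed: exponents 0..31 are valid */

/* Sampling period in nanoseconds for a given OA exponent. */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	assert(exponent < OA_EXPONENT_LIMIT);
	return HSW_TIMESTAMP_PERIOD_NS << (exponent + 1);
}

/*
 * Smallest exponent whose period is at least target_ns,
 * or -1 if no valid exponent gives a period that long.
 */
static int oa_period_ns_to_exponent(uint64_t target_ns)
{
	for (unsigned int e = 0; e < OA_EXPONENT_LIMIT; e++) {
		if (oa_exponent_to_period_ns(e) >= target_ns)
			return (int)e;
	}
	return -1;
}
```

Under these assumptions a target of 1 ms maps to exponent 13, which gives a period of roughly 1.31 ms.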
    • drm/i915: Add i915 perf infrastructure · eec688e1
      Robert Bragg authored
      
      
      Adds base i915 perf infrastructure for Gen performance metrics.
      
      This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
      properties to configure a stream of metrics and returns a new fd usable
      with standard VFS system calls including read() to read typed and sized
      records; ioctl() to enable or disable capture and poll() to wait for
      data.
      
      A stream is opened something like:
      
        uint64_t properties[] = {
            /* Single context sampling */
            DRM_I915_PERF_PROP_CTX_HANDLE,        ctx_handle,

            /* Include OA reports in samples */
            DRM_I915_PERF_PROP_SAMPLE_OA,         true,

            /* OA unit configuration */
            DRM_I915_PERF_PROP_OA_METRICS_SET,    metrics_set_id,
            DRM_I915_PERF_PROP_OA_FORMAT,         report_format,
            DRM_I915_PERF_PROP_OA_EXPONENT,       period_exponent,
        };
        struct drm_i915_perf_open_param param = {
            .flags = I915_PERF_FLAG_FD_CLOEXEC |
                     I915_PERF_FLAG_FD_NONBLOCK |
                     I915_PERF_FLAG_DISABLED,
            .properties_ptr = (uint64_t)properties,
            /* Each property is a (key, value) pair of uint64_t: 16 bytes */
            .num_properties = sizeof(properties) / 16,
        };
        int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);
      
      Records read all start with a common { type, size } header with
      DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
      contain an extensible number of fields and it's the
      DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
      determine what's included in every sample.
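To illustrate the { type, size } framing described above, here is a minimal sketch of walking a buffer returned by read() and counting sample records. The header struct is a local stand-in whose layout is an assumption, as are the numeric value used for the sample type and the convention that `size` covers the header plus its payload. `count_sample_records()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the common record header (layout assumed). */
struct record_header_sketch {
	uint32_t type;
	uint16_t size; /* assumed to cover the header plus the payload */
	uint16_t pad;
};

#define RECORD_TYPE_SAMPLE 1u /* stand-in for DRM_I915_PERF_RECORD_SAMPLE */

/*
 * Count the sample records in buf[0..len).
 * Walking stops at the first truncated or malformed record.
 */
static size_t count_sample_records(const uint8_t *buf, size_t len)
{
	size_t offset = 0, samples = 0;

	assert(buf != NULL || len == 0);
	while (len - offset >= sizeof(struct record_header_sketch)) {
		struct record_header_sketch hdr;

		/* Copy out the header to avoid unaligned access. */
		memcpy(&hdr, buf + offset, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			break;
		if (hdr.type == RECORD_TYPE_SAMPLE)
			samples++;
		offset += hdr.size; /* skip to the next record */
	}
	return samples;
}
```

Because every record carries its own size, a reader can skip record types it does not understand, which is what makes the set of sample fields extensible.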
      
      No specific streams are supported yet so any attempt to open a stream
      will return an error.
      
      v2:
          use i915_gem_context_get() - Chris Wilson
      v3:
          update read() interface to avoid passing state struct - Chris Wilson
          fix some rebase fallout, with i915-perf init/deinit
      v4:
          s/DRM_IORW/DRM_IOW/ - Emil Velikov
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org
  12. 21 Nov, 2016 1 commit
  13. 14 Nov, 2016 1 commit
  14. 26 Aug, 2016 1 commit
  15. 16 Aug, 2016 1 commit
  16. 05 Aug, 2016 1 commit
  17. 04 Aug, 2016 1 commit
  18. 19 Jul, 2016 2 commits
  19. 04 Jul, 2016 1 commit
  20. 01 Jul, 2016 1 commit
  21. 13 May, 2016 1 commit
  22. 28 Jan, 2016 1 commit
  23. 21 Jan, 2016 1 commit
  24. 10 Dec, 2015 1 commit
  25. 09 Dec, 2015 1 commit
    • drm/i915: Add soft-pinning API for execbuffer · 506a8e87
      Chris Wilson authored
      
      
      Userspace can pass in an offset that it presumes the object is located
      at. The kernel will then do its utmost to fit the object into that
      location. The assumption is that userspace is handling its own object
      locations (for example along with full-ppgtt) and that the kernel will
      rarely have to make space for the user's requests.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      
      v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris
      Wilson.  Fixed incorrect error paths causing crash found by Michal Winiarski.
      (Not published externally)
      
      v3: Rebased because of trivial conflict in object_bind_to_vm.  Fixed eviction
      to allow eviction of soft-pinned objects when another soft-pinned object used
      by a subsequent execbuffer overlaps reported by Michal Winiarski.
      (Not published externally)
      
      v4: Moved soft-pinned objects to the front of ordered_vmas so that they are
      pinned first after an address conflict happens to avoid repeated conflicts in
      rare cases (Suggested by Chris Wilson).  Expanded comment on
      drm_i915_gem_exec_object2.offset to cover this new API.
      
      v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability
      (Kristian). Added check for multiple pinnings on eviction (Akash). Made sure
      buffers are not considered misplaced without the user specifying
      EXEC_OBJECT_SUPPORTS_48B_ADDRESS.  User must assume responsibility for any
      addressing workarounds.  Updated object2.offset field comment again to clarify
      NO_RELOC case (Chris).  checkpatch cleanup.
      
      v6: Trivial rebase on latest drm-intel-nightly
      
      v7: Catch attempts to pin above the max virtual address size and return
      EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and
      EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin
      something at an offset above 4GB (Chris, Daniel Vetter).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Akash Goel <akash.goel@intel.com>
      Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
      Cc: Michal Winiarski <michal.winiarski@intel.com>
      Cc: Zou Nanhai <nanhai.zou@intel.com>
      Cc: Kristian Høgsberg <hoegsberg@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Acked-by: PDT
      Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com
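The v7 rules for soft-pinned offsets can be sketched as a small validation function. This is not the kernel's implementation: the flag macros are local stand-ins for EXEC_OBJECT_SUPPORTS_48B_ADDRESS and EXEC_OBJECT_PINNED with assumed bit values, `check_softpin()` is a hypothetical helper, and returning -EINVAL for a pin above 4GB without the 48-bit flag is an assumption (the message only specifies EINVAL for offsets beyond the maximum virtual address).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_SUPPORTS_48B (1u << 3)   /* stand-in, bit value assumed */
#define SKETCH_PINNED       (1u << 4)   /* stand-in, bit value assumed */
#define SKETCH_4GIB         (1ull << 32)

/*
 * Validate a requested placement of size bytes at offset in an
 * address space of vm_total bytes. Returns 0 if acceptable.
 */
static int check_softpin(uint64_t offset, uint64_t size, uint32_t flags,
			 uint64_t vm_total)
{
	assert(size > 0);

	/* Without the pinned flag the kernel is free to choose a location. */
	if (!(flags & SKETCH_PINNED))
		return 0;

	/* Reject anything reaching past the end of the address space. */
	if (size > vm_total || offset > vm_total - size)
		return -EINVAL;

	/* Pinning above 4GB needs both flags (error code assumed). */
	if (offset + size > SKETCH_4GIB && !(flags & SKETCH_SUPPORTS_48B))
		return -EINVAL;

	return 0;
}
```

Checking the range against `vm_total` first means `offset + size` cannot overflow by the time it is compared with the 4GB boundary.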
  26. 18 Nov, 2015 1 commit
  27. 19 Oct, 2015 1 commit
    • drm/i915: Report context GTT size · fa8848f2
      Chris Wilson authored
      Since the beginning we have conflated the size of the global GTT with
      that of the per-process context sizes. In recent times (gen8+), those
      are no longer the same where the global GTT is limited to 2/4GiB but the
      per-process GTT may be anything up to 256TiB. Userspace knows nothing of
      this discrepancy and outside of one or two hacks, uses the getaperture
      ioctl to determine the maximum size it can use. Let's leave that as
      reporting the global GTT and use the context reporting method to
      describe the per-process value (which naturally falls back to reporting
      the aliasing or global on older platforms, so userspace can always use
      this method where available).
      
      Testcase: igt/gem_userptr_blits/minor-normal-sync
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90065
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  28. 01 Oct, 2015 1 commit
    • drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset · 101b506a
      Michel Thierry authored
      There are some allocations that must be only referenced by 32-bit
      offsets. To limit the chances of having the first 4GB already full,
      objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
      DRM_MM_CREATE_TOP flags.
      
      Specifically, any resource used with flat/heapless (0x00000000-0xfffff000)
      General State Heap (GSH) or Instruction State Heap (ISH) must be in a
      32-bit range, because the General State Offset and Instruction State
      Offset are limited to 32-bits.
      
      Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
      they can be allocated above the 32-bit address range. To limit the
      chances of having the first 4GB already full, objects will use
      DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
      
      The libdrm user of the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is here:
      http://lists.freedesktop.org/archives/intel-gfx/2015-September/075836.html
      
      
      
      v2: Changed flag logic from needs_32b, to supports_48b.
      v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
      v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
      to use last PIN_ defined instead of hard-coded value; use correct limit
      check in eb_vma_misplaced. (Chris)
      v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
      v6: Apply pin-high for ggtt too (Chris)
      v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
          Fix check for entries currently using +4GB addresses, use min_t and
          other polish in object_bind_to_vm (Chris)
      v8: Commit message updated to point to libdrm patch.
      v9: vmas are allocated in the correct zone, so only check flag when the
          vma has not been allocated. (Chris)
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
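The placement policy described in this message can be sketched as follows: objects without the 48-bit flag are confined to the low 4GB zone, while objects that carry it may use the whole range and are placed from the top down so that the low zone stays free. The names are hypothetical, the flag's bit value is assumed, and treating 4GiB minus one page as the exclusive upper bound (matching the 0xfffff000 figure quoted above) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SUPPORTS_48B (1u << 3)   /* stand-in, bit value assumed */
#define SKETCH_4GIB         (1ull << 32)
#define SKETCH_PAGE_SIZE    4096ull

struct placement_sketch {
	uint64_t end; /* exclusive upper bound for the object */
	int top_down; /* nonzero: search from the top of the range */
};

/* Choose the allowed range and search direction for one object. */
static struct placement_sketch choose_placement(uint32_t flags,
						uint64_t vm_total)
{
	struct placement_sketch p = { vm_total, 0 };

	assert(vm_total >= SKETCH_PAGE_SIZE);
	if (flags & SKETCH_SUPPORTS_48B) {
		/* Keep 48-bit capable objects away from the low 4GB. */
		p.top_down = 1;
	} else if (p.end > SKETCH_4GIB - SKETCH_PAGE_SIZE) {
		/* 32-bit only objects must stay inside the low zone. */
		p.end = SKETCH_4GIB - SKETCH_PAGE_SIZE;
	}
	return p;
}
```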
  29. 02 Sep, 2015 1 commit
  30. 21 Jul, 2015 1 commit
  31. 15 Jul, 2015 1 commit
  32. 06 Jul, 2015 1 commit
    • drm/i915: Expose I915_EXEC_RESOURCE_STREAMER flag and getparam · a9ed33ca
      Abdiel Janulgue authored
      
      
      Adds an execbuffer flag that ensures the batch buffer is executed by
      the resource streamer, and a getparam that lets userspace know whether
      the Resource Streamer is supported by the kernel.
      
      v2: Don't skip 1<<15 for the exec flags (Jani Nikula)
      v3: Use HAS_RESOURCE_STREAMER macro for execbuf validation (Chris Wilson)
      
      (from getparam patch)
      
      v2: Update I915_PARAM_HAS_RESOURCE_STREAMER so it's after
          I915_PARAM_HAS_GPU_RESET.
      v3: Only advertise RS support for hardware that supports it.
      v4: Add HAS_RESOURCE_STREAMER() macro (Chris)
      
      Testcase: igt/gem_exec_params
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
      [danvet: squash in getparam patch since it'd break bisect, suggested
      by Chris.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
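A caller would typically query the getparam once and only then set the execbuffer flag. The sketch below shows that gating. The macro is a local stand-in for I915_EXEC_RESOURCE_STREAMER with bit 15 taken from the v2 note above, `exec_flags_for_batch()` is a hypothetical helper, and `kernel_has_rs` is assumed to hold the result of the getparam query.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EXEC_RESOURCE_STREAMER (1ull << 15) /* stand-in for the uapi flag */

/*
 * Return the execbuffer flags to submit: the resource streamer bit is
 * added only when the caller wants it and the kernel advertises support.
 */
static uint64_t exec_flags_for_batch(uint64_t base_flags, int want_rs,
				     int kernel_has_rs)
{
	/* The caller is expected to leave the RS bit to this helper. */
	assert(!(base_flags & SKETCH_EXEC_RESOURCE_STREAMER));
	if (want_rs && kernel_has_rs)
		return base_flags | SKETCH_EXEC_RESOURCE_STREAMER;
	return base_flags;
}
```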
  33. 15 Jun, 2015 1 commit
  34. 29 May, 2015 1 commit
  35. 26 May, 2015 1 commit