1. 24 Aug, 2018 21 commits
    • Revert "configure: allow building with python3" · cff80b6c
      Emil Velikov authored
      This reverts commit ae7898df.
      
      Turns out the python scripts are _not_ fully python 3 compatible.
      As Ilia reported, using get_xmlpool.py with LANG=C produces some weird
      output - see the link for details.
      
      Even though the issue was spotted with the autoconf build, it exposes a
      genuine problem with the script (and the lack of LANG handling in the
      meson build).
      
      https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
    • Revert "travis: use python3 for the autoconf builds" · 7a4d2d1f
      Emil Velikov authored
      This reverts commit 855af9a5.
      
      Turns out the python scripts are _not_ fully python 3 compatible.
      As Ilia reported, using get_xmlpool.py with LANG=C produces some weird
      output - see the link for details.
      
      Even though the issue was spotted with the autoconf build, it exposes a
      genuine problem with the script (and the lack of LANG handling in the
      meson build).
      
      https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
    • Revert "mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES" · 93e8e17f
      Kenneth Graunke authored
      This reverts commit 095515e1.
      
      This breaks KHR-GL46.map_buffer_alignment.functional on i965.
      
      This code was apparently not reviewed, and I don't know why we would
      move from a driver configurable constant to a hardcoded value for all
      drivers.  This really looks like an accidental hack push.
    • Revert recent changes about not including compute in combined limits. · 9d670fd8
      Kenneth Graunke authored
      As far as I can tell, no one reviewed these changes; they made i965
      assert-fail on driver load, and I am not certain they are correct.
      (Hopefully reverting these does not break radeonsi too badly...)
      
      The uniform-related changes seem fine and reasonable, but the texture
      image units change is possibly incorrect.  According to the
      OES_tessellation_shader spec issue 5:
      
         (5) How are aggregate shader limits computed?
      
          RESOLVED: Following the GL 4.4 model, but we restrict uniform
          buffer bindings to 12/stage instead of 14, this results in
      
              MAX_UNIFORM_BUFFER_BINDINGS = 72
                  This is 12 bindings/stage * 6 shader stages, allowing a static
                  partitioning of the bindings even though at most 5 stages can
                  appear in a program object).
              MAX_COMBINED_UNIFORM_BLOCKS = 60
                  This is 12 blocks/stage * 5 stages, since compute shaders can't
                  be mixed with other stages.
              MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
                  This is 16 textures/stage * 6 stages.
      
      which definitely includes compute shaders in that last limit.
      Not including compute shaders breaks the following test:
      dEQP-GLES31.functional.state_query.integer.max_combined_texture_image_units_getinteger
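
      The arithmetic behind the quoted limits can be written out as a small C
      sketch. The enum names and the combined_texture_units() helper are
      illustrative only, not Mesa code; the point is that leaving compute out
      of the combined texture limit reports 16 * 5 = 80 instead of the 96
      that the spec derives.

      ```c
      #include <assert.h>

      /* Per-stage figures taken from the OES_tessellation_shader issue 5 text. */
      enum {
         UBO_BINDINGS_PER_STAGE  = 12,
         UBO_BLOCKS_PER_STAGE    = 12,
         TEXTURE_UNITS_PER_STAGE = 16,
         ALL_STAGES      = 6, /* VS, TCS, TES, GS, FS and CS */
         GRAPHICS_STAGES = 5, /* compute cannot be mixed with these */
      };

      /* The spec's three aggregate limits, checked at compile time (C11). */
      static_assert(UBO_BINDINGS_PER_STAGE * ALL_STAGES == 72,
                    "MAX_UNIFORM_BUFFER_BINDINGS");
      static_assert(UBO_BLOCKS_PER_STAGE * GRAPHICS_STAGES == 60,
                    "MAX_COMBINED_UNIFORM_BLOCKS");
      static_assert(TEXTURE_UNITS_PER_STAGE * ALL_STAGES == 96,
                    "MAX_COMBINED_TEXTURE_IMAGE_UNITS");

      /* Combined texture image units, with or without counting compute. */
      static int combined_texture_units(int per_stage, int count_compute)
      {
         return per_stage * (count_compute ? ALL_STAGES : GRAPHICS_STAGES);
      }
      ```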
      
      There was enough breakage that I figured we should just send this back
      to the drawing board.
      
      Revert "i965: don't include compute resources in "Combined" limits"
      Revert "st/mesa: don't include compute resources in "Combined" limits"
      Revert "mesa: don't include compute resources in MAX_COMBINED_* limits"
      
      This reverts commit b03dcb1e.
      This reverts commit cff290df.
      This reverts commit 45f87a48.
    • gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0 · 8e1be9a3
      Roland Scheidegger authored
      These have been removed. Unfortunately auto-upgrade doesn't work for
      jit. (Worse, it seems we no longer get a compilation error when
      compiling the shader; instead llvm just emits a call to a null
      function in the jitted shaders, making it difficult to detect when
      intrinsics vanish.)
      
      Luckily the signed ones are still there; I helped convince the llvm
      developers that removing them is a bad idea for now. While the unsigned
      ones have sort of agreed-upon simplest patterns to replace them with,
      this is not the case for the signed ones, which require _significantly_
      more complex patterns, to the point that recognizing them reliably in
      practice is IMHO unlikely to ever work (due to other optimizations
      interfering). (Even for the relatively trivial unsigned patterns, llvm
      already added test cases where recognition doesn't work: an unsaturated
      add followed by a saturated add may produce atrocious code.)
      Nevertheless, there seems to be a serious quest going on to squash all
      cpu-specific intrinsics, so I'd expect patches to nuke the signed ones
      as well to resurface.
      
      Adapt the existing fallback code to match the simple patterns llvm uses
      and hope for the best. I've verified with lp_test_blend that it does
      produce the expected saturated assembly instructions. Our cmp/select
      build helpers don't use boolean masks, but that doesn't seem to
      interfere with llvm's ability to recognize the pattern.
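
      A scalar C sketch of one well-known form of the simple unsigned
      patterns, assuming min(a, ~b) + b for the saturated add and
      max(a, b) - b for the saturated subtract. This is not the gallivm code,
      which would build the equivalent operations on LLVM vectors; the helper
      names here are hypothetical.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Saturating add without an intrinsic: if a + b would exceed 255,
       * then a > 255 - b (== ~b), and clamping a to ~b makes the sum 255. */
      static uint8_t sat_add_u8(uint8_t a, uint8_t b)
      {
         uint8_t not_b = (uint8_t)~b;                 /* 255 - b */
         uint8_t clamped = a < not_b ? a : not_b;     /* min(a, ~b) */
         return (uint8_t)(clamped + b);
      }

      /* Saturating subtract: max(a, b) - b is 0 whenever a <= b. */
      static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
      {
         uint8_t hi = a > b ? a : b;                  /* max(a, b) */
         return (uint8_t)(hi - b);
      }

      /* Exhaustively compares both helpers against a widened reference. */
      static void check_sat_u8(void)
      {
         for (int a = 0; a < 256; a++) {
            for (int b = 0; b < 256; b++) {
               int sum = a + b, diff = a - b;
               assert(sat_add_u8((uint8_t)a, (uint8_t)b) == (sum > 255 ? 255 : sum));
               assert(sat_sub_u8((uint8_t)a, (uint8_t)b) == (diff < 0 ? 0 : diff));
            }
         }
      }
      ```

      Because both forms use only min/max and plain add/sub, they can be
      emitted on any target, and it is then up to llvm to match them back to
      saturating instructions where the CPU has them.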
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231
      Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
    • st/mesa: expose KHR_texture_compression_astc_sliced_3d · 45b5f5fa
      Marek Olšák authored
      This is ASTC 2D LDR allowing texture arrays and 3D, compressing each
      slice as a separate 2D image. Tested by piglit. Trivial.
    • st/mesa: expose EXT_disjoint_timer_query · dae4cf39
      Marek Olšák authored
      Same cap as ARB_timer_query; no changes needed. Tested by piglit.
    • mesa: expose EXT_vertex_attrib_64bit · 263c962c
      Marek Olšák authored
      because the closed driver exposes it.
      It's the same as the ARB extension.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • mesa: expose AMD_query_buffer_object · 5c900910
      Marek Olšák authored
      it's a subset of the ARB extension.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • mesa: expose AMD_multi_draw_indirect · 056b9a5a
      Marek Olšák authored
      because the closed driver exposes it.
      This is equivalent to the ARB extension.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • mesa: expose AMD_gpu_shader_int64 · b3c17330
      Marek Olšák authored
      because the closed driver exposes it.
      
      It's equivalent to ARB_gpu_shader_int64.
      In this patch, I did everything the same as we do for ARB_gpu_shader_int64.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • mesa: expose ARB_post_depth_coverage in the Compatibility profile · 1cf3631b
      Marek Olšák authored
      It only contains GLSL changes.
      
      v2: allow the layout qualifier on GLSL <= 1.30
    • intel/nir: Enable nir_opt_find_array_copies · 8d822246
      Jason Ekstrand authored
      We have to be a bit careful with this one because we want it to run in
      the optimization loop but only in the first brw_nir_optimize call.
      Later calls assume that we've lowered away copy_deref instructions and
      we don't want to introduce any more.
      
      Shader-db results on Kaby Lake:
      
          total instructions in shared programs: 15176942 -> 15176942 (0.00%)
          instructions in affected programs: 0 -> 0
          helped: 0
          HURT: 0
      
      In spite of the lack of any shader-db improvement, this patch completely
      eliminates spilling in the Batman: Arkham City tessellation shaders.
      This is because we are now able to detect that the temporary array
      created by DXVK for storing TCS inputs is a copy of the input arrays and
      use indirect URB reads instead of making a copy of 4.5 KiB of input data
      and then indirecting on it with if-ladders.
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
    • nir: Add an array copy optimization · 53072582
      Jason Ekstrand authored
      This peephole optimization looks for a series of load/store_deref or
      copy_deref instructions that copy an array from one variable to another
      and turns it into a copy_deref that copies the entire array.  The
      pattern it looks for is extremely specific but it's good enough to pick
      up on the input array copies in DXVK and should also be able to pick up
      the sequence generated by spirv_to_nir for an OpLoad of a large composite
      followed by OpStore.  It can always be improved later if needed.
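
      A much simplified C sketch of the kind of pattern check described
      above. The elem_copy struct and is_whole_array_copy() are hypothetical,
      not NIR types; the sketch only accepts a complete, in-order run of
      element copies between two distinct variables, and it ignores aliasing
      and intervening writes, which a real pass has to rule out.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical, flattened form of one element copy:
       *    var[dst_var][dst_idx] = var[src_var][src_idx]
       * Real NIR works on deref chains and load/store_deref or copy_deref. */
      struct elem_copy {
         int dst_var, dst_idx;
         int src_var, src_idx;
      };

      /* True if copies[0..n) copy elements 0..len-1, in order, from one array
       * variable into the same slots of a different one, so the whole run
       * could be replaced by a single copy of the entire array. */
      static bool is_whole_array_copy(const struct elem_copy *copies, size_t n,
                                      size_t len)
      {
         assert(copies != NULL || n == 0);
         if (len == 0 || n != len)
            return false;

         int dst = copies[0].dst_var, src = copies[0].src_var;
         if (dst == src)
            return false;

         for (size_t i = 0; i < n; i++) {
            const struct elem_copy *c = &copies[i];
            if (c->dst_var != dst || c->src_var != src ||
                c->dst_idx != (int)i || c->src_idx != (int)i)
               return false;
         }
         return true;
      }
      ```
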
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
    • intel/nir: Use nir_shrink_vec_array_vars · a4a9c075
      Jason Ekstrand authored
      Shader-db results on Kaby Lake:
      
          total instructions in shared programs: 15177605 -> 15176765 (<.01%)
          instructions in affected programs: 4259 -> 3419 (-19.72%)
          helped: 1
          HURT: 0
      
          total spills in shared programs: 10954 -> 10855 (-0.90%)
          spills in affected programs: 295 -> 196 (-33.56%)
          helped: 1
          HURT: 0
      
          total fills in shared programs: 22222 -> 22117 (-0.47%)
          fills in affected programs: 417 -> 312 (-25.18%)
          helped: 1
          HURT: 0
      
      The helped shader is from the OglCSDof synmark test.  On my Kaby Lake
      laptop, the actual framerate of the benchmark didn't appear to improve
      beyond the noise.
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
    • nir: Add a array-of-vector variable shrinking pass · be8d0099
      Jason Ekstrand authored
      This pass looks for variables with vector or array-of-vector types and
      narrows the type to only the components used.
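
      A minimal C sketch of the narrowing idea, assuming each access is
      summarized as a 4-bit component mask and that unused components anywhere
      in the vector (not only trailing ones) may be packed out. The helper
      names are hypothetical; for an array of vectors the same mask would apply
      to every element.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical summary of a vec4 (or array-of-vec4) variable: each load
       * or store contributes a 4-bit mask of the components it touches. */
      static unsigned used_component_mask(const uint8_t *access_masks, int n)
      {
         unsigned used = 0;
         for (int i = 0; i < n; i++)
            used |= access_masks[i] & 0xfu;
         return used;
      }

      /* Packs the used components to the front: remap[old] is the new
       * component index, or -1 if the component is never used.  Returns the
       * narrowed vector width. */
      static int build_component_remap(unsigned used, int remap[4])
      {
         int next = 0;
         assert(used <= 0xfu);
         for (int c = 0; c < 4; c++)
            remap[c] = (used & (1u << c)) ? next++ : -1;
         return next;
      }
      ```

      For example, a vec4 whose accesses only ever touch .x and .z narrows to
      a vec2, with .z remapped to the second component.
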
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
    • intel/nir: Use the new structure and array splitting passes · 02a5442d
      Jason Ekstrand authored
      We call structure splitting once because it is guaranteed to split all
      the structures in the entire shader in one go.  We call array splitting
      in the loop in case future optimizations turn indirects into direct
      dereferences and we can split more arrays.
      
      Shader-db results on Kaby Lake:
      
          total instructions in shared programs: 15177605 -> 15177605 (0.00%)
          instructions in affected programs: 0 -> 0
          helped: 0
          HURT: 0
      
      This is unsurprising because nir_lower_vars_to_ssa already effectively
      does structure and array splitting internally.  It doesn't actually
      split the variables, but its ability to reason about aliasing in the
      presence of arrays and structures and pick out scalars or vectors to be
      lowered to SSA values is fairly advanced.
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
    • nir: Add an array splitting pass · fa641749
      Jason Ekstrand authored
      This pass looks for array variables where at least one level of the
      array is never indirected and splits it into multiple smaller variables.
      
      This pass doesn't really do much now because nir_lower_vars_to_ssa can
      already see through arrays of arrays and can detect indirects on just
      one level or even see that arr[i][0][5] does not alias arr[i][1][j].
      This pass exists to help other passes more easily see through arrays of
      arrays.  If a back-end does implement arrays using scratch or indirects
      on registers, having more, smaller arrays is likely to give better
      memory efficiency.
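
      A simplified C sketch of the bookkeeping such a pass needs, using the
      arr[i][0][5] / arr[i][1][j] example above. The struct and helpers are
      hypothetical, not NIR's. Under this model a level is split only if no
      access ever indexes it dynamically, and each resulting variable keeps
      the remaining (indirected) levels.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      #define MAX_LEVELS 4

      /* Hypothetical per-variable bookkeeping, one entry per array level,
       * outermost first. */
      struct array_levels {
         int num_levels;
         int len[MAX_LEVELS];
         bool indirect[MAX_LEVELS]; /* set once any access indexes it dynamically */
      };

      /* Records one access; constant[i] says whether level i's index is constant. */
      static void record_access(struct array_levels *a, const bool constant[])
      {
         assert(a->num_levels <= MAX_LEVELS);
         for (int i = 0; i < a->num_levels; i++)
            if (!constant[i])
               a->indirect[i] = true;
      }

      /* Number of smaller variables produced by splitting every level that is
       * never indirected. */
      static int split_variable_count(const struct array_levels *a)
      {
         int count = 1;
         for (int i = 0; i < a->num_levels; i++)
            if (!a->indirect[i])
               count *= a->len[i];
         return count;
      }
      ```

      With arr declared as [4][2][8], only the middle level is always
      indexed by a constant, so under this model it splits into 2 variables
      of type [4][8].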
      
      v2 (Jason Ekstrand):
       - Better comments and naming (some from Caio)
       - Rework to use one hash map instead of two
      
      v2.1 (Jason Ekstrand):
       - Fix a couple of bugs that were added in the rework including one
         which basically prevented it from running
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
    • nir: Add a structure splitting pass · 26eb077e
      Jason Ekstrand authored
      This pass doesn't really do much now because nir_lower_vars_to_ssa can
      already see through structures and considers them to be "split".  This
      pass exists to help other passes more easily see through structure
      variables.  If a back-end does implement arrays using scratch or
      indirects on registers, having more, smaller arrays is likely to give
      better memory efficiency.
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
    • nir/types: Add array_or_matrix helpers · b489998e
      Jason Ekstrand authored
      Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
    • i965: don't include compute resources in "Combined" limits · b03dcb1e
      Kenneth Graunke authored
      The combined limits should only include shader stages that can be active
      at the same time.  We don't need to include compute.
      
      See also cff290df for st/mesa.
      
      Unbreaks i965, which has been assert-failing on driver load since
      Marek's 45f87a48 dropped the core Mesa capabilities before the driver
      limits were adjusted down to match.
  2. 23 Aug, 2018 19 commits