 16 Sep, 2019 1 commit


Sergii Romantsov authored
v2: moved lowering of large constants to after the lowering of copy_deref is done, as suggested by J. Ekstrand.
CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>

 06 Sep, 2019 1 commit


Vasily Khoruzhick authored
A set of opcodes doesn't provide enough flexibility in certain cases. For example, Utgard PP has a vector conditional select operation, but its condition is always scalar. Lowering all vector selects to scalar increases the instruction count, so we need a way to filter for only those ops that can't be handled in hardware.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
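To illustrate the kind of per-instruction filter this message describes, here is a minimal C sketch. It does not use the real NIR API: `struct alu_instr`, `OP_BCSEL`, `cond_swizzle`, and `needs_scalarize()` are hypothetical stand-ins. It assumes a vector select can stay vectorized when every output channel reads the same condition channel (so the condition is effectively scalar), and must be scalarized otherwise.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of an ALU instruction (not NIR's types). */
enum alu_op { OP_FADD, OP_BCSEL };

struct alu_instr {
   enum alu_op op;
   unsigned num_components;       /* channels written, 1..4 */
   unsigned char cond_swizzle[4]; /* condition channel read by each output channel */
};

/* Filter callback: return true only for instructions the hardware cannot
 * execute as vectors, so a scalarizing pass leaves everything else alone. */
static bool
needs_scalarize(const struct alu_instr *instr)
{
   if (instr->op != OP_BCSEL)
      return false;

   /* A vector select is fine as long as every channel uses the same
    * (scalar) condition; a per-channel condition must be split up. */
   for (unsigned i = 1; i < instr->num_components; i++) {
      if (instr->cond_swizzle[i] != instr->cond_swizzle[0])
         return true;
   }
   return false;
}
```

With this shape, a `bcsel` whose condition swizzle is `xxxx` stays a single vector op, while one reading `xy..` is split, and unrelated ops are never touched.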

 21 Aug, 2019 1 commit


Jason Ekstrand authored
So many duplicated switch statements...
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

 12 Aug, 2019 1 commit


Rhys Perry authored
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

 03 Aug, 2019 2 commits


Jason Ekstrand authored
Reviewed-by: Matt Turner <mattst88@gmail.com>

Jason Ekstrand authored
We already had one in the vec4 code; we just had to move it.
Reviewed-by: Matt Turner <mattst88@gmail.com>

 31 Jul, 2019 1 commit


Jason Ekstrand authored
Reviewed-by: Matt Turner <mattst88@gmail.com>

 24 Jul, 2019 5 commits


Jason Ekstrand authored
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Jason Ekstrand authored
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Jason Ekstrand authored
The rules for gl_SubgroupSize in Vulkan require that it be a constant that can be queried through the API. However, all GL requires is that it's a uniform. Instead of always claiming that the subgroup size in the shader is 32 in GL like we have to do for Vulkan, claim 8 for geometry stages, the maximum for fragment shaders, and the actual size for compute.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
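The per-stage policy in this message can be summarized as a small C function. This is a sketch under stated assumptions, not the driver's code: the enum names, `choose_subgroup_size()`, and the two dispatch-width parameters are hypothetical, and "geometry stages" is read here as vertex, tessellation, and geometry shaders.

```c
/* Hypothetical enums; not Mesa's actual types. */
enum api { API_GL, API_VULKAN };
enum stage {
   STAGE_VERTEX, STAGE_TESS_CTRL, STAGE_TESS_EVAL,
   STAGE_GEOMETRY, STAGE_FRAGMENT, STAGE_COMPUTE
};

/* Return the value a shader should see for gl_SubgroupSize. */
static unsigned
choose_subgroup_size(enum api api, enum stage stage,
                     unsigned max_fs_dispatch_width,
                     unsigned cs_dispatch_width)
{
   /* Vulkan: the size must be a constant that is queryable through the API. */
   if (api == API_VULKAN)
      return 32;

   /* GL only needs a uniform, so report something closer to reality. */
   switch (stage) {
   case STAGE_FRAGMENT:
      return max_fs_dispatch_width; /* the maximum fragment dispatch width */
   case STAGE_COMPUTE:
      return cs_dispatch_width;     /* the width actually compiled for */
   default:
      return 8;                     /* vertex, tessellation, geometry */
   }
}
```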

Jason Ekstrand authored
Instead of lowering the subgroup size so early, wait until we have more information. In particular, we're going to want different subgroup sizes from different stages depending on the API. We also defer lowering of subgroup masks because the ge/gt masks require the subgroup size to generate a subgroup mask.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
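Why the ge/gt masks depend on the subgroup size can be seen in a short C sketch, assuming 64-bit masks where bit *i* stands for invocation *i*: the bits at or above the current invocation have to be clipped to the subgroup size, whereas the eq/lt masks need only the invocation index. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bits 0..size-1 set; size is assumed to be in 1..64. */
static uint64_t
subgroup_full_mask(unsigned size)
{
   return size >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << size) - 1;
}

/* Only the current invocation's bit: needs no subgroup size. */
static uint64_t
subgroup_eq_mask(unsigned id)
{
   return UINT64_C(1) << id;
}

/* Invocations below the current one: also independent of the size. */
static uint64_t
subgroup_lt_mask(unsigned id)
{
   return subgroup_eq_mask(id) - 1;
}

/* Invocations at or above the current one, clipped to the subgroup size. */
static uint64_t
subgroup_ge_mask(unsigned id, unsigned size)
{
   return ~subgroup_lt_mask(id) & subgroup_full_mask(size);
}

/* Invocations strictly above the current one. */
static uint64_t
subgroup_gt_mask(unsigned id, unsigned size)
{
   return subgroup_ge_mask(id, size) & ~subgroup_eq_mask(id);
}
```

For invocation 3, the ge mask is 0xF8 with a subgroup size of 8 but 0xFFF8 with a size of 16, which is why this lowering cannot happen before the size is known.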

Jason Ekstrand authored
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

 22 Jul, 2019 1 commit


Caio Marcelo de Oliveira Filho authored
anv vkpipeline-db results for SKL:
total instructions in shared programs: 3622461 -> 3611281 (-0.31%)
instructions in affected programs: 396452 -> 385272 (-2.82%)
helped: 2062
HURT: 1
total cycles in shared programs: 1458144669 -> 1458105320 (<.01%)
cycles in affected programs: 4171830 -> 4132481 (-0.94%)
helped: 1874
HURT: 180
total loops in shared programs: 2437 -> 2437 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 8745 -> 8748 (0.03%)
spills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1
total fills in shared programs: 23392 -> 23395 (0.01%)
fills in affected programs: 8 -> 11 (37.50%)
helped: 1
HURT: 1
LOST: 0
GAINED: 1
No changes to shader-db on i965 or iris. The GLSL compiler already does a similar optimization.
Improvement suggested by Daniel Schürmann.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

 16 Jul, 2019 1 commit


Jason Ekstrand authored
Now that the 64-bit lowering passes do a complete lowering in one go, we don't need to loop anymore. We do, however, have to ensure that int64 lowering happens after double lowering because double lowering can produce int64 ops.
Reviewed-by: Eric Anholt <eric@anholt.net>

 13 Jul, 2019 1 commit


Jason Ekstrand authored
For bindless SSBO access, we have to do 64-bit address calculations. On ICL and above, we don't have 64-bit integer support, so we have to lower the address calculations to 32-bit arithmetic. If we don't run the optimization loop before lowering, we won't fold any of the address chain calculations before lowering 64-bit arithmetic, and they aren't really foldable afterwards. This cuts the size of the generated code in the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by around 30%.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
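For readers unfamiliar with this kind of lowering, the following C sketch shows what replacing a single 64-bit integer add with 32-bit arithmetic looks like: add the low halves, recover the carry from unsigned wrap-around, and fold it into the high half. The struct and function names are hypothetical, and the sketch ignores how the backend actually emits these ops. It also hints at why folding first matters: each 64-bit add in an address chain becomes several dependent 32-bit ops, which, per the message, are not really foldable afterwards.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves (hypothetical helper type). */
struct u64_pair {
   uint32_t lo;
   uint32_t hi;
};

static struct u64_pair
u64_split(uint64_t v)
{
   struct u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
   return p;
}

static uint64_t
u64_join(struct u64_pair p)
{
   return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add using only 32-bit operations. */
static struct u64_pair
add64_lowered(struct u64_pair a, struct u64_pair b)
{
   struct u64_pair r;
   r.lo = a.lo + b.lo;           /* wraps modulo 2^32 */
   uint32_t carry = r.lo < a.lo; /* a wrap-around means a carry out */
   r.hi = a.hi + b.hi + carry;
   return r;
}
```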

 12 Jul, 2019 1 commit


Andres Gomez authored
c8665005 ("intel/compiler: Don't always require precise lowering of flrp") forgot to remove some comments that no longer applied after the change.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>

 08 Jul, 2019 1 commit


Connor Abbott authored
Pretty much every driver using nir_lower_io_to_temporaries followed by nir_lower_io is going to want this, in particular radv and radeonsi in the next commits.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

 02 Jul, 2019 2 commits


Jason Ekstrand authored
On gen11, they removed the PLN instruction, so we have to emit a pile of MADs to emulate it. We may as well do that in NIR so we can optimize and later schedule it.
Shader-db results on Ice Lake:
total instructions in shared programs: 17145644 -> 16556440 (-3.44%)
instructions in affected programs: 11507454 -> 10918250 (-5.12%)
helped: 35763
HURT: 42085
helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18
helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49%
HURT stats (abs) min: 1 max: 248 x̄: 2.22 x̃: 2
HURT stats (rel) min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47%
95% mean confidence interval for instructions value: -7.67 -7.47
95% mean confidence interval for instructions %change: -4.46% -4.29%
Instructions are helped.
total loops in shared programs: 4370 -> 4370 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 360624645 -> 368220857 (2.11%)
cycles in affected programs: 269631244 -> 277227456 (2.82%)
helped: 15583
HURT: 65874
helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32
helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44%
HURT stats (abs) min: 1 max: 238638 x̄: 133.87 x̃: 20
HURT stats (rel) min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97%
95% mean confidence interval for cycles value: 67.42 119.09
95% mean confidence interval for cycles %change: 3.61% 3.73%
Cycles are HURT.
total spills in shared programs: 8943 -> 8981 (0.42%)
spills in affected programs: 1925 -> 1963 (1.97%)
helped: 44
HURT: 14
total fills in shared programs: 21815 -> 21925 (0.50%)
fills in affected programs: 3511 -> 3621 (3.13%)
helped: 41
HURT: 18
LOST: 70
GAINED: 14
Reviewed-by: Matt Turner <mattst88@gmail.com>
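As a rough illustration of emulating PLN with MADs, the C sketch below evaluates a plane equation, attr = a*x + b*y + c, as two chained multiply-adds instead of one PLN. The names `a`, `b`, `c`, `x`, `y` and the `mad()` helper are assumptions for illustration only: on the hardware, the inputs come from the interpolation setup data and the barycentric or pixel coordinates, and the real MAD instruction's operand order may differ. `mad()` here is an ordinary multiply followed by an add, not a guaranteed fused operation.

```c
/* Models one MAD step: returns mul0 * mul1 + addend. */
static float
mad(float mul0, float mul1, float addend)
{
   return mul0 * mul1 + addend;
}

/* What a single PLN would compute: a*x + b*y + c. */
static float
plane_eval_pln(float a, float b, float c, float x, float y)
{
   return a * x + b * y + c;
}

/* The same value as a chain of two MADs, a form that can be emitted in NIR
 * and then optimized and scheduled like any other ALU ops. */
static float
plane_eval_mads(float a, float b, float c, float x, float y)
{
   float t = mad(b, y, c); /* b*y + c */
   return mad(a, x, t);    /* a*x + (b*y + c) */
}
```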

Jason Ekstrand authored
Reviewed-by: Matt Turner <mattst88@gmail.com>

 05 Jun, 2019 2 commits


Jason Ekstrand authored
Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is enabled, we can just take a single pointer rather than a pointer to a pointer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Jason Ekstrand authored
Now that NIR_TEST_* doesn't swap the shader out from under us, it's sufficient to just modify the shader rather than having to return in case we're testing serialization or cloning.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

 31 May, 2019 1 commit


Ian Romanick authored
Almost all of the spill / fill benefit is in Deus Ex.
Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17196395 (-0.16%)
instructions in affected programs: 1518658 -> 1490615 (-1.85%)
helped: 1550
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2
helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45%
HURT stats (abs) min: 5 max: 10 x̄: 6.67 x̃: 5
HURT stats (rel) min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32%
95% mean confidence interval for instructions value: -19.86 -16.26
95% mean confidence interval for instructions %change: -1.19% -1.04%
Instructions are helped.
total cycles in shared programs: 361468455 -> 361288721 (-0.05%)
cycles in affected programs: 197367688 -> 197187954 (-0.09%)
helped: 990
HURT: 683
helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16
helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26%
HURT stats (abs) min: 1 max: 12190 x̄: 905.14 x̃: 22
HURT stats (rel) min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47%
95% mean confidence interval for cycles value: -315.45 100.58
95% mean confidence interval for cycles %change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 12147 -> 8948 (-26.34%)
spills in affected programs: 5433 -> 2234 (-58.88%)
helped: 343
HURT: 0
total fills in shared programs: 25262 -> 21814 (-13.65%)
fills in affected programs: 7771 -> 4323 (-44.37%)
helped: 343
HURT: 3
LOST: 0
GAINED: 17
Ivy Bridge
total instructions in shared programs: 12083517 -> 12081427 (-0.02%)
instructions in affected programs: 540744 -> 538654 (-0.39%)
helped: 786
HURT: 29
helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31%
95% mean confidence interval for instructions value: -2.83 -2.30
95% mean confidence interval for instructions %change: -0.57% -0.47%
Instructions are helped.
total cycles in shared programs: 180153463 -> 180124798 (-0.02%)
cycles in affected programs: 72597920 -> 72569255 (-0.04%)
helped: 572
HURT: 249
helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13
helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26%
HURT stats (abs) min: 1 max: 11060 x̄: 136.37 x̃: 10
HURT stats (rel) min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32%
95% mean confidence interval for cycles value: -96.22 26.39
95% mean confidence interval for cycles %change: -0.43% -0.23%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 3625 -> 3623 (-0.06%)
spills in affected programs: 46 -> 44 (-4.35%)
helped: 1
HURT: 0
total fills in shared programs: 4065 -> 4061 (-0.10%)
fills in affected programs: 104 -> 100 (-3.85%)
helped: 1
HURT: 0
LOST: 0
GAINED: 8
Sandy Bridge
total instructions in shared programs: 10879656 -> 10878699 (<.01%)
instructions in affected programs: 275167 -> 274210 (-0.35%)
helped: 544
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25%
95% mean confidence interval for instructions value: -1.97 -1.55
95% mean confidence interval for instructions %change: -0.43% -0.36%
Instructions are helped.
total cycles in shared programs: 154089096 -> 154081132 (<.01%)
cycles in affected programs: 4422722 -> 4414758 (-0.18%)
helped: 459
HURT: 214
helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8
helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14%
HURT stats (abs) min: 1 max: 226 x̄: 19.99 x̃: 4
HURT stats (rel) min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09%
95% mean confidence interval for cycles value: -15.51 -8.15
95% mean confidence interval for cycles %change: -0.31% -0.17%
Cycles are helped.
total spills in shared programs: 2880 -> 2876 (-0.14%)
spills in affected programs: 636 -> 632 (-0.63%)
helped: 2
HURT: 0
total fills in shared programs: 3161 -> 3157 (-0.13%)
fills in affected programs: 1519 -> 1515 (-0.26%)
helped: 2
HURT: 0
LOST: 0
GAINED: 2
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8157361 -> 8155067 (-0.03%)
instructions in affected programs: 382491 -> 380197 (-0.60%)
helped: 677
HURT: 0
helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2
helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42%
95% mean confidence interval for instructions value: -3.76 -3.01
95% mean confidence interval for instructions %change: -0.72% -0.59%
Instructions are helped.
total cycles in shared programs: 188588292 -> 188583040 (<.01%)
cycles in affected programs: 3155064 -> 3149812 (-0.17%)
helped: 377
HURT: 13
helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6
helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12%
HURT stats (abs) min: 2 max: 8 x̄: 5.85 x̃: 6
HURT stats (rel) min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04%
95% mean confidence interval for cycles value: -15.67 -11.27
95% mean confidence interval for cycles %change: -0.45% -0.30%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>

 24 May, 2019 1 commit


Jason Ekstrand authored
A few of our very late passes can end up generating vectors accidentally, so we need to get rid of them. The only known case of this is the ffma peephole, which generates fneg and fabs as vectors. Currently, they're not a problem because they get turned into fmov, which the backend compiler knows how to handle as a vector. That's about to change.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

 14 May, 2019 1 commit


Ian Romanick authored
A tiny bit of help seems to come from nir_copy_prop. Future patches will benefit from this change. Doing more copy propagation on the vec4 backend led to a disaster in hurt cycles.
v2: Fix typo in comment. Noticed by Matt.
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224634 -> 17224623 (<.01%)
instructions in affected programs: 4586 -> 4575 (-0.24%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %change: -0.36% -0.19%
Instructions are helped.
total cycles in shared programs: 360828542 -> 360828714 (<.01%)
cycles in affected programs: 151159 -> 151331 (0.11%)
helped: 49
HURT: 28
helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6
helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42%
HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15
HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88%
95% mean confidence interval for cycles value: -13.48 17.95
95% mean confidence interval for cycles %change: -0.69% 0.84%
Inconclusive result (value mean confidence interval includes 0).
Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13529544 -> 13529542 (<.01%)
instructions in affected programs: 358 -> 356 (-0.56%)
helped: 2
HURT: 0
total cycles in shared programs: 357290311 -> 357289678 (<.01%)
cycles in affected programs: 178324 -> 177691 (-0.35%)
helped: 48
HURT: 40
helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13
helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66%
HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6
HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31%
95% mean confidence interval for cycles value: -18.28 3.89
95% mean confidence interval for cycles %change: -1.01% 0.32%
Inconclusive result (value mean confidence interval includes 0).
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8159110 -> 8158980 (<.01%)
instructions in affected programs: 22719 -> 22589 (-0.57%)
helped: 65
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74%
95% mean confidence interval for instructions value: -2.06 -1.94
95% mean confidence interval for instructions %change: -0.78% -0.68%
Instructions are helped.
total cycles in shared programs: 188609448 -> 188609214 (<.01%)
cycles in affected programs: 1875852 -> 1875618 (-0.01%)
helped: 109
HURT: 104
helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4
helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07%
HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2
HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for cycles value: -1.95 -0.25
95% mean confidence interval for cycles %change: -0.04% -0.01%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>

 10 May, 2019 1 commit


Jonathan Marek authored
This can be used by both etnaviv and freedreno/a2xx, as they are both vec4 architectures with some instructions being scalar-only.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

 07 May, 2019 2 commits


Ian Romanick authored
No changes on any other Intel platforms.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
instructions in affected programs: 3271235 -> 3242419 (-0.88%)
helped: 13636
HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel) min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: -2.13 -2.07
95% mean confidence interval for instructions %change: -1.16% -1.13%
Instructions are helped.
total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
cycles in affected programs: 70415766 -> 70282014 (-0.19%)
helped: 12563
HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs) min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel) min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: -10.56 -9.90
95% mean confidence interval for cycles %change: -0.47% -0.45%
Cycles are helped.
LOST: 0
GAINED: 13
Reviewed-by: Matt Turner <mattst88@gmail.com>

Ian Romanick authored
I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :(
i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms, 64-bit nir_op_flrp and, on Gen11, 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method.
No changes on any other Intel platforms.
v2: Add panfrost changes.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
Reviewed-by: Matt Turner <mattst88@gmail.com>
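The a(1-c)+bc lowering can be compared with a cheaper alternative in a small C sketch. `flrp_precise()` uses the form named in this message; `flrp_fast()` uses a+c(b-a), a common alternative that the message does not name, shown only for contrast. Both function names are hypothetical. In exact arithmetic the two agree, but in floating point the first returns exactly b when c is 1.0 (for finite a), while the second can lose b entirely when a and b differ greatly in magnitude.

```c
/* flrp(a, b, c) = a*(1 - c) + b*c: exact at the endpoints (c == 1 gives b). */
static float
flrp_precise(float a, float b, float c)
{
   return a * (1.0f - c) + b * c;
}

/* flrp(a, b, c) = a + c*(b - a): one fewer multiply, but b - a may round. */
static float
flrp_fast(float a, float b, float c)
{
   return a + c * (b - a);
}
```

For example, with IEEE single precision and no excess intermediate precision, `flrp_fast(1e8f, 1.0f, 1.0f)` computes `1.0f - 1e8f`, which rounds to `-1e8f`, and so returns `0.0f` rather than `1.0f`; `flrp_precise()` returns `1.0f` for the same inputs.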

 25 Apr, 2019 1 commit


Caio Marcelo de Oliveira Filho authored
These will be lowered by nir_lower_tex() with the lower_tex_when_implicit_lod_not_supported option, so they don't need the extra handling here.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

 19 Apr, 2019 2 commits


Jason Ekstrand authored
When we have a bindless sampler, we need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Jason Ekstrand authored
We're about to start doing 64-bit pointer calculations in ANV. They will get applied after brw_preprocess_nir, which is where we currently do 64-bit integer arithmetic lowering. Because we're adding 64-bit integer arithmetic after the initial lowering has happened, we need to lower again.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

 18 Apr, 2019 4 commits


Iago Toral Quiroga authored
In particular, we need the same lowerings we use for 16-bit integers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Iago Toral Quiroga authored
Extended math with half-float operands is only supported since gen9, and even then it is limited to SIMD8. On gen8 we lower it to 32-bit.
v2: squashed together the following patches (Jason):
- intel/compiler: allow extended math functions with HF operands
- intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
- intel/compiler: extended Math is limited to SIMD8 on half-float
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (allow extended math functions with HF operands, extended Math is limited to SIMD8 on half-float)
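The rules in this message can be restated as two small C decision helpers. This is a hypothetical sketch, not the compiler's code: the function names are invented, `gen` is the hardware generation number, and the "return 0 to leave the instruction alone, otherwise the bit size to lower to" convention is an assumption chosen to resemble a bit-size lowering callback.

```c
/* Bit size an extended-math op should be lowered to, or 0 for "keep as is". */
static unsigned
ext_math_lower_bit_size(unsigned bit_size, unsigned gen)
{
   /* Half-float extended math is only supported since gen9, so earlier
    * generations compute it in 32-bit instead. */
   if (bit_size == 16 && gen < 9)
      return 32;
   return 0;
}

/* Maximum SIMD width for an extended-math instruction. */
static unsigned
ext_math_max_simd_width(unsigned bit_size, unsigned gen, unsigned dispatch_width)
{
   /* Where half-float extended math exists, it is limited to SIMD8. */
   if (bit_size == 16 && gen >= 9)
      return dispatch_width < 8 ? dispatch_width : 8;
   return dispatch_width;
}
```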

Iago Toral Quiroga authored
The hardware doesn't support half-float for these.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Iago Toral Quiroga authored
Some conversions are not directly supported in hardware and need to be split into two conversion instructions going through an intermediate type. Doing this at the NIR level simplifies the backend a bit.
v2:
- Consider fp16 rounding conversion opcodes
- Properly handle swizzles on conversion sources.
v3:
- Run the pass earlier, right after nir_opt_algebraic_late (Jason)
- NIR alu output types already have the bitsize (Jason)
- Use 'is_conversion' to identify conversion operations (Jason)
v4:
- Be careful about the intermediate types we use so we don't lose range and avoid incorrect rounding semantics (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

 14 Apr, 2019 1 commit


Karol Herbst authored
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

 10 Apr, 2019 1 commit


Mark Janes authored
libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

 09 Apr, 2019 1 commit


Timothy Arceri authored
When I implemented opt_if_loop_last_continue(), I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However, Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered.
28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)
Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)
The code size increase is related to Wolfenstein II; it seems largely due to an increase in phis rather than the existing jumps.
This gives +10% FPS with Doom on my Vega56.
Rhys Perry also benchmarked Doom on his VEGA64:
Before: 72.53 FPS
After: 80.77 FPS
v2: disable pass on non-AMD drivers
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

 28 Mar, 2019 1 commit


Ian Romanick authored
Almost all of the hurt shaders are repeated instances of the same shader in synmark's compilation speed tests.
shader-db results:
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256389 (<.01%)
instructions in affected programs: 54137 -> 53686 (-0.83%)
helped: 288
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74%
95% mean confidence interval for instructions value: -1.76 -1.38
95% mean confidence interval for instructions %change: -2.47% -1.50%
Instructions are helped.
total cycles in shared programs: 372286583 -> 372283851 (<.01%)
cycles in affected programs: 833829 -> 831097 (-0.33%)
helped: 265
HURT: 16
helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4
helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35%
HURT stats (abs) min: 2 max: 130 x̄: 24.88 x̃: 8
HURT stats (rel) min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27%
95% mean confidence interval for cycles value: -12.30 -7.15
95% mean confidence interval for cycles %change: -1.06% -0.64%
Cycles are helped.
Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5038653 -> 5038495 (<.01%)
instructions in affected programs: 13939 -> 13781 (-1.13%)
helped: 50
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4
helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83%
95% mean confidence interval for instructions value: -3.73 -2.47
95% mean confidence interval for instructions %change: -3.16% -1.21%
Instructions are helped.
total cycles in shared programs: 128118922 -> 128118228 (<.01%)
cycles in affected programs: 134906 -> 134212 (-0.51%)
helped: 50
HURT: 0
helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18
helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70%
95% mean confidence interval for cycles value: -16.54 -11.22
95% mean confidence interval for cycles %change: -0.95% -0.53%
Cycles are helped.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

 21 Mar, 2019 1 commit


Jason Ekstrand authored
v2: turn on for turnip as well (Karol Herbst)
Reviewed-by: Karol Herbst <kherbst@redhat.com>

 16 Mar, 2019 1 commit


Jason Ekstrand authored
This fixes a serious performance issue with DXVK: https://github.com/doitsujin/dxvk/issues/937
This was caused by a recent change intended to improve performance on RADV, which backfired on ANV and killed performance for some apps: https://github.com/doitsujin/dxvk/commit/e5a06d3f4a103a54cd4eb51970fedee405d1d698
Throwing in this bit of lowering lets us come along and CSE those UBO loads (or copy-prop for SSBO loads) and get one load where we previously would have gotten several.
VkPipeline-db results on Kaby Lake:
total instructions in shared programs: 5115361 -> 5073185 (-0.82%)
instructions in affected programs: 1754333 -> 1712157 (-2.40%)
helped: 5331
HURT: 63
total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%)
cycles in affected programs: 2531058653 -> 2467702029 (-2.50%)
helped: 9202
HURT: 4323
total loops in shared programs: 3340 -> 3331 (-0.27%)
loops in affected programs: 9 -> 0
helped: 9
HURT: 0
total spills in shared programs: 3246 -> 3053 (-5.95%)
spills in affected programs: 384 -> 191 (-50.26%)
helped: 10
HURT: 5
total fills in shared programs: 4626 -> 4452 (-3.76%)
fills in affected programs: 439 -> 265 (-39.64%)
helped: 10
HURT: 5
All of the shaders with hurt spilling were in Rise of the Tomb Raider, which also had shaders solidly helped in the spilling department. Not shown in those results (because I've not had success dumping the shaders) is Witcher 3, where this reduces spilling and improves overall perf by around 20-25%.
There were no shader-db changes. Apparently, this just isn't a pattern that happens in OpenGL.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: "19.0" mesa-stable@lists.freedesktop.org
