Jason Ekstrand authored
For a block with a contiguous chunk of 32 vars that don't need updating, this lets us skip 32 vars at a time. Also, by using bitscan, we only iterate once per set bit rather than testing every bit one at a time.

Looking at perf (with -O0, which is unfortunately necessary to get reasonable back-traces), this seems to cut about 50-60% of the time spent in compute_start_end(), which is itself about 4-6% of the run-time. In the real world, with a release driver build, this cuts 1.34% off a full shader-db run. (I ran shader-db 5 times in each configuration.)

Reviewed-by: Matt Turner <mattst88@gmail.com>
fce0214e
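
For illustration only, here is a minimal sketch of the two ideas in the message: skipping an all-zero 32-bit word in one step, and using a bit scan to visit only the set bits. This is not the code from the commit. The names visit_set_bits, bit_visitor, and sum_indices are hypothetical, and the sketch assumes a GCC or Clang compiler for __builtin_ctz.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type for this sketch; not a Mesa API. */
typedef void (*bit_visitor)(unsigned bit_index, void *data);

/* Example visitor: accumulates the sum of the visited bit indices. */
static void
sum_indices(unsigned bit_index, void *data)
{
   *(unsigned *)data += bit_index;
}

/* Visit every set bit in a bitset stored as an array of 32-bit words.
 * Returns the number of set bits visited.
 */
static unsigned
visit_set_bits(const uint32_t *words, size_t num_words,
               bit_visitor visit, void *data)
{
   unsigned visited = 0;

   for (size_t w = 0; w < num_words; w++) {
      uint32_t pending = words[w];

      /* A word with no set bits fails this test immediately, so 32
       * entries are skipped with a single comparison.
       */
      while (pending != 0) {
         /* Bit scan: index of the lowest set bit (GCC/Clang builtin,
          * only valid for a non-zero argument).
          */
         unsigned bit = (unsigned)__builtin_ctz(pending);

         /* Clear the lowest set bit so the loop runs once per set bit. */
         pending &= pending - 1;

         visit((unsigned)(w * 32) + bit, data);
         visited++;
      }
   }

   return visited;
}
```

With a sparse bitset, the inner loop body runs once per set bit instead of once per variable, which is the kind of saving the message describes.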