  • renderer: use a thread to block for fences. · 89aea798
    Marc-André Lureau authored
    
    
    Instead of polling the fences regularly, have a thread
    that blocks for a single fence using a separate shared
    context, then uses eventfd to wake up the main thread
    when something happens.
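
The wake-up mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the renderer's actual code: `wait_for_fence`, `struct sync_thread`, `sync_thread_main` and `run_demo` are hypothetical names, and the blocking fence wait in the shared context is replaced by a short sleep so the example runs without a GPU. Only `eventfd`, `poll`, `read`, `write` and the pthread calls are real APIs (Linux-specific in the case of `eventfd`).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the blocking fence wait done in the separate
 * shared context; here it only sleeps for 10 ms. */
static void wait_for_fence(unsigned fence_id)
{
    (void)fence_id;
    usleep(10 * 1000);
}

/* Hypothetical state shared between the main thread and the sync thread. */
struct sync_thread {
    int efd;           /* eventfd used to wake up the main thread */
    unsigned fence_id; /* the single fence this thread blocks on */
};

/* Sync thread: block on one fence, then signal the main thread. */
static void *sync_thread_main(void *arg)
{
    struct sync_thread *st = arg;
    uint64_t one = 1;

    wait_for_fence(st->fence_id);

    /* Writing to an eventfd adds the value to its 64-bit counter. */
    ssize_t n = write(st->efd, &one, sizeof(one));
    assert(n == (ssize_t)sizeof(one));
    (void)n;
    return NULL;
}

/* Main thread: sleep in poll() instead of polling the fence regularly.
 * Returns the number of wake-ups read from the eventfd, or -1 on error. */
static int run_demo(void)
{
    struct sync_thread st = { .efd = eventfd(0, EFD_CLOEXEC), .fence_id = 1 };
    pthread_t tid;
    uint64_t count = 0;

    if (st.efd < 0)
        return -1;
    if (pthread_create(&tid, NULL, sync_thread_main, &st) != 0) {
        close(st.efd);
        return -1;
    }

    struct pollfd pfd = { .fd = st.efd, .events = POLLIN };
    int ok = poll(&pfd, 1, 5000) == 1 &&
             read(st.efd, &count, sizeof(count)) == (ssize_t)sizeof(count);

    pthread_join(tid, NULL);
    close(st.efd);
    return ok ? (int)count : -1;
}
```

In an event loop the eventfd would simply be one more descriptor in the poll set, so the main thread does no fence-related work until the counter becomes readable.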
    
    Inside the guest, glmark2 typically runs twice as fast with the thread
    sync, although in general the performance gain seems to be about 30%. The
    benefit is mostly for CPU-bound tasks (when the main thread hits 100%).
    
    A naive perf stat of the vtest renderer running the glmark2 "build" test
    with a fixed number of frames (500) gives the following statistics
    (disregard the timing-related information, since the renderer is run and
    stopped manually):
    
    without thread:
    
           3032.282265      task-clock (msec)         #    0.420 CPUs utilized
                 4,277      context-switches          #    0.001 M/sec
                   102      cpu-migrations            #    0.034 K/sec
                 9,020      page-faults               #    0.003 M/sec
         7,884,098,254      cycles                    #    2.600 GHz
         4,440,126,451      stalled-cycles-frontend   #   56.32% frontend cycles idle
       <not supported>      stalled-cycles-backend
        11,024,091,578      instructions              #    1.40  insns per cycle
                                                      #    0.40  stalled
                                                      #    cycles per insn
         1,091,831,588      branches                  #  360.069 M/sec
             5,426,846      branch-misses             #    0.50% of all branches
    
    with thread:
    
           3403.592921      task-clock (msec)         #    0.452 CPUs utilized
                 7,145      context-switches          #    0.002 M/sec
                   410      cpu-migrations            #    0.120 K/sec
                 6,191      page-faults               #    0.002 M/sec
         7,475,038,064      cycles                    #    2.196 GHz
         4,487,043,071      stalled-cycles-frontend   #   60.03% frontend cycles idle
       <not supported>      stalled-cycles-backend
         9,925,205,494      instructions              #    1.33  insns per cycle
                                                      #    0.45  stalled
                                                      #    cycles per insn
           834,375,503      branches                  #  245.146 M/sec
             4,919,995      branch-misses             #    0.59% of all branches
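
To compare the two runs on counters that do not depend on wall-clock time, the relative change can be computed directly from the figures above. The sketch below is only a reading aid: `pct_change` and `print_report` are hypothetical helpers, and the constants are copied from the two perf stat listings.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent (negative = fewer). */
static double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Print the change for the counters taken from the listings above. */
static void print_report(void)
{
    printf("instructions: %+.2f%%\n",
           pct_change(11024091578.0, 9925205494.0));
    printf("branches:     %+.2f%%\n",
           pct_change(1091831588.0, 834375503.0));
    printf("cycles:       %+.2f%%\n",
           pct_change(7884098254.0, 7475038064.0));
}
```

With these inputs the threaded run executes roughly 10% fewer instructions and about 24% fewer branches for the same 500 frames.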
    
    Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>