You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/04/20 03:48:14 UTC

[GitHub] [arrow] cyb70289 opened a new pull request #6986: [C++] Optimize BitmapReader

cyb70289 opened a new pull request #6986:
URL: https://github.com/apache/arrow/pull/6986


   Replacing bit offset with bit mask improves about 15% performance
   with gcc-7.5. Arm64 servers have similar performance uplift.
   clang-9 doesn't benefit from this change.
   
   Below are arrow-bit-util-benchmark(BitmapReader/8192) results.
   Comparing performance of "current code" -> "after this patch".
   
   | cpu           | gcc-7.5    | clang-9    |
   | ---           | -------    | -------    |
   | Intel E5-2650 | 118 -> 146 | 117 -> 118 |
   | Intel i7-4790 | 154 -> 191 | 155 -> 154 |
   | AMD EPYC-7251 | 119 -> 133 | 122 -> 123 |


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616978252


   This change introduces severe branch misses in certain conditions. See perf logs below. I changed benchmark code to run only the problematic test case.
   
   Without this patch
   ```bash
           807.415826      task-clock (msec)         #    0.979 CPUs utilized          
                   83      context-switches          #    0.103 K/sec                  
                    0      cpu-migrations            #    0.000 K/sec                  
                  427      page-faults               #    0.529 K/sec                  
        2,285,801,407      cycles                    #    2.831 GHz                      (83.17%)
            2,313,785      stalled-cycles-frontend   #    0.10% frontend cycles idle     (83.16%)
          915,631,177      stalled-cycles-backend    #   40.06% backend cycles idle      (82.93%)
        9,997,208,858      instructions              #    4.37  insn per cycle         
                                                     #    0.09  stalled cycles per insn  (83.66%)
        1,679,799,451      branches                  # 2080.464 M/sec                    (83.66%)
              106,599      branch-misses             #    0.01% of all branches          (83.41%)
   ```
   
   With this patch
   ```bash
           902.557236      task-clock (msec)         #    0.980 CPUs utilized          
                   94      context-switches          #    0.104 K/sec                  
                    0      cpu-migrations            #    0.000 K/sec                  
                  427      page-faults               #    0.473 K/sec                  
        2,567,879,767      cycles                    #    2.845 GHz                      (83.17%)
           88,266,680      stalled-cycles-frontend   #    3.44% frontend cycles idle     (83.17%)
           20,826,862      stalled-cycles-backend    #    0.81% backend cycles idle      (83.03%)
        2,518,949,193      instructions              #    0.98  insn per cycle         
                                                     #    0.04  stalled cycles per insn  (83.62%)
          847,459,928      branches                  #  938.954 M/sec                    (83.61%)
           75,187,208      branch-misses             #    8.87% of all branches          (83.39%)
   ```
   Absolute counts are not comparable as gtest runs different loops for each test.
   The point is branch-misses jumps from 0.01% to 8.87%, which causes high frontend stall(cpu wait for fetching code to execute), and ipc(instructions per cycle) drops from 4.37 to 0.98.
   
   I didn't figure out which branch is miss predicted and why. My haswell cpu is too old to support branch tracing. My guess is [this line](https://github.com/apache/arrow/blob/5093b809d63ac8db99aec9caa7ad7e723f277c46/cpp/src/arrow/util/bit_util.cc#L285), no concrete justification.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379


   Cool, nice improvement (is this captured in our benchmark executables?)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617227451


   @wesm, actually I did use codebase's benchmark executable. The problem is I only focused on one case that's directly related to this change. But ignored other cases that look not relevant, and finally found impacted.
   Guess I can write a simple script to automate benchmark result checking as a temporary solution.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617368021


   See either `archery benchmark diff --help` or the [benchmark](https://arrow.apache.org/docs/developers/benchmarks.html) section of the documentation. Archery can compare the same binary with different scenarios:
   - Commit
   - Toolchain
   - Compile flags
   
   All it needs is two cmake build directory and it'll compare the benchmark binaries from there.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617033346


   To be honest, `BitmapAnd` should probably be rewritten using `Bitmap::VisitWords`.
   But we can revert anyway if we fear regressions may appear in other workloads.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379


   Cool, nice improvement


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616295380


   https://issues.apache.org/jira/browse/ARROW-8523


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617149608


   > BTW: we definitely need continuous benchmark tools to detect these things early.
   
   Agreed. Hopefully some progress can be made on this in 2020 since the prior discussion in 2019 didn't go anywhere. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616468493


   @cyb70289 It's ok, we can keep this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617150324


   In the meantime, when we have microperformance patches like these it would be a good practice in the future to make sure that performance results are reproduced in the codebase's benchmark executables rather than on an ad hoc basis


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379


   Cool, nice improvement (is this captures in our benchmark executables?)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616294784


   I forgot to add jira no in the first commit, modified later. Looks jira status is not synced with this PR.
   Shall I abandon and push a new PR?
   https://issues.apache.org/jira/browse/ARROW-8523


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616921669


   Opened a jira card https://issues.apache.org/jira/browse/ARROW-8537


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617228774


   FWIW, we have some benchmark diffing code already written in 
   
   https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark
   
   I'm not sure where this is documented / how to use it to check the output of a single executable 
   
   cc @fsaintjacques 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616915079


   @pitrou @wesm 
   Oops, I only checked case "BitmapReader" from benchmark [arrow-bit-util-benchmark](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit_util_benchmark.cc). Obviously it's not enough.
   
   I compared all cases just now and see huge performance drop from below 4 tests:
   
   Before this patch:
   ```bash
   BenchmarkBitmapAnd/32768/1                 563496 ns       563260 ns         1243 bytes_per_second=55.4806M/s
   BenchmarkBitmapAnd/131072/1               2219810 ns      2218984 ns          318 bytes_per_second=56.3321M/s
   BenchmarkBitmapAnd/32768/2                 561738 ns       561467 ns         1265 bytes_per_second=55.6577M/s
   BenchmarkBitmapAnd/131072/2               2246229 ns      2245119 ns          305 bytes_per_second=55.6763M/s
   ```
   
   After this patch:
   ```bash
   BenchmarkBitmapAnd/32768/1                1653467 ns      1652680 ns          422 bytes_per_second=18.9087M/s
   BenchmarkBitmapAnd/131072/1               6665501 ns      6661561 ns          105 bytes_per_second=18.7644M/s
   BenchmarkBitmapAnd/32768/2                1670793 ns      1670246 ns          423 bytes_per_second=18.7098M/s
   BenchmarkBitmapAnd/131072/2               6702369 ns      6698957 ns          103 bytes_per_second=18.6596M/s
   ```
   
   Before reverting this patch, I would like to understand why it happens.
   
   BTW: we definitely need continuous benchmark tools to detect these things early.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org