You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/05 14:53:08 UTC
[GitHub] [arrow] wesm opened a new pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
wesm opened a new pull request #7356:
URL: https://github.com/apache/arrow/pull/7356
BinaryBitBlockCounter computes the popcount of the bitwise-and of each corresponding bit-run (with target of 64 bits at a time) of two bitmaps. This permits iterating through two validity bitmaps that don't have a lot of nulls much more quickly than using two BitmapReaders.
I also added an inline-able BitBlockCounter::NextWordInline for 64-bits at a time for a single bitmap. It seems like this may be preferable to the four word version. Now we have `NextWord` and `NextFourWords` so the developer can choose either variant.
Benchmarks and tests covering all this are included. I'll post the benchmarks on my machine as a comment.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639697297
OK, here are the binary benchmarks:
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
BinaryBitBlockCounterSum/8 3189138 ns 3189079 ns 216 313.57M items/s
BinaryBitBlockCounterSum/64 1839419 ns 1839359 ns 390 543.668M items/s
BinaryBitBlockCounterSum/512 630842 ns 630808 ns 1121 1.54811G items/s
BinaryBitBlockCounterSum/4096 256330 ns 256332 ns 2746 3.80976G items/s
BinaryBitBlockCounterSum/32768 204388 ns 204383 ns 3454 4.77809G items/s
BinaryBitBlockCounterSum/65536 201268 ns 201260 ns 3428 4.85225G items/s
BinaryBitBlockCounterSumWithOffset/8 3313859 ns 3313805 ns 206 301.768M items/s
BinaryBitBlockCounterSumWithOffset/64 1966957 ns 1966805 ns 360 508.439M items/s
BinaryBitBlockCounterSumWithOffset/512 672431 ns 672434 ns 1088 1.45228G items/s
BinaryBitBlockCounterSumWithOffset/4096 286651 ns 286643 ns 2469 3.40689G items/s
BinaryBitBlockCounterSumWithOffset/32768 228652 ns 228648 ns 3048 4.27103G items/s
BinaryBitBlockCounterSumWithOffset/65536 228191 ns 228188 ns 3171 4.27964G items/s
BinaryBitmapReaderSum/8 3803716 ns 3803704 ns 183 262.902M items/s
BinaryBitmapReaderSum/64 2184717 ns 2184728 ns 316 457.723M items/s
BinaryBitmapReaderSum/512 2018442 ns 2018421 ns 344 495.437M items/s
BinaryBitmapReaderSum/4096 1997782 ns 1997729 ns 349 500.568M items/s
BinaryBitmapReaderSum/32768 2024333 ns 2024318 ns 367 493.994M items/s
BinaryBitmapReaderSum/65536 2018332 ns 2018340 ns 346 495.457M items/s
BinaryBitmapReaderSumWithOffset/8 3926170 ns 3926185 ns 181 254.7M items/s
BinaryBitmapReaderSumWithOffset/64 2198425 ns 2198417 ns 323 454.873M items/s
BinaryBitmapReaderSumWithOffset/512 2001917 ns 2001864 ns 352 499.535M items/s
BinaryBitmapReaderSumWithOffset/4096 1980845 ns 1980853 ns 351 504.833M items/s
BinaryBitmapReaderSumWithOffset/32768 1979394 ns 1979403 ns 365 505.203M items/s
BinaryBitmapReaderSumWithOffset/65536 2029335 ns 2029347 ns 345 492.769M items/s
```
It seems that it is never a good idea to use BitmapReader for the binary case, even when the incidence of nulls is high, that even in that case naively using `BitUtil::GetBit` is better.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#discussion_r436764326
##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
#include <cstdint>
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
#include "arrow/util/visibility.h"
namespace arrow {
namespace internal {
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+ int16_t length;
+ int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
class ARROW_EXPORT BitBlockCounter {
public:
- struct Block {
- int16_t length;
- int16_t popcount;
- };
-
- static constexpr int16_t kTargetBlockLength = 256;
-
BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
: bitmap_(bitmap + start_offset / 8),
bits_remaining_(length),
offset_(start_offset % 8) {}
- /// \brief Return the next run of available bits, up to 256. The returned
- /// pair contains the size of run and the number of true values
- Block NextBlock();
+ /// \brief Return the next run of available bits, usually 256. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 256.
+ BitBlockCount NextFourWords();
+
+ /// \brief Return the next run of available bits, usually 64. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 64.
+ BitBlockCount NextWord();
+
+ /// \brief Inlineable implementation of NextWord
Review comment:
It was largely for the sake of benchmarking. I don't think it's worth keeping the inline version, agreed?
##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
#include <cstdint>
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
#include "arrow/util/visibility.h"
namespace arrow {
namespace internal {
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+ int16_t length;
+ int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
class ARROW_EXPORT BitBlockCounter {
public:
- struct Block {
- int16_t length;
- int16_t popcount;
- };
-
- static constexpr int16_t kTargetBlockLength = 256;
-
BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
: bitmap_(bitmap + start_offset / 8),
bits_remaining_(length),
offset_(start_offset % 8) {}
- /// \brief Return the next run of available bits, up to 256. The returned
- /// pair contains the size of run and the number of true values
- Block NextBlock();
+ /// \brief Return the next run of available bits, usually 256. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 256.
Review comment:
Well, I can alter the code to yield two blocks at the end rather than a single overfull block like now. You think that is preferable?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639555962
OK here are the fixed benchmarks. These match my intuition now. This shows that there is very little downside and nearly always upside to using `BitBlockCounter` over `BitmapReader`. There isn't a benchmark showing the naive double-BitmapReader for the binary case but that could be added too
```
-----------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------
BitBlockCounterSum/8 1732025 ns 1732026 ns 400 577.358M items/s
BitBlockCounterSum/64 832029 ns 832025 ns 836 1.17372G items/s
BitBlockCounterSum/512 265104 ns 265105 ns 2666 3.68368G items/s
BitBlockCounterSum/4096 151056 ns 151056 ns 4647 6.46488G items/s
BitBlockCounterSum/32768 138635 ns 138632 ns 5081 7.0443G items/s
BitBlockCounterSum/65536 137021 ns 137019 ns 4977 7.1272G items/s
BitBlockCounterSumWithOffset/8 1779549 ns 1779525 ns 395 561.948M items/s
BitBlockCounterSumWithOffset/64 855158 ns 855149 ns 823 1.14198G items/s
BitBlockCounterSumWithOffset/512 273883 ns 273874 ns 2578 3.56574G items/s
BitBlockCounterSumWithOffset/4096 154422 ns 154422 ns 4499 6.32397G items/s
BitBlockCounterSumWithOffset/32768 141266 ns 141265 ns 4887 6.91299G items/s
BitBlockCounterSumWithOffset/65536 140049 ns 140048 ns 5024 6.97305G items/s
BitBlockCounterInlineSum/8 1714554 ns 1714564 ns 410 583.239M items/s
BitBlockCounterInlineSum/64 802222 ns 802227 ns 878 1.21731G items/s
BitBlockCounterInlineSum/512 239832 ns 239831 ns 2897 4.07188G items/s
BitBlockCounterInlineSum/4096 129035 ns 129031 ns 5384 7.56842G items/s
BitBlockCounterInlineSum/32768 116983 ns 116981 ns 6003 8.34801G items/s
BitBlockCounterInlineSum/65536 116078 ns 116079 ns 6085 8.4129G items/s
BitBlockCounterInlineSumWithOffset/8 1682920 ns 1682911 ns 414 594.209M items/s
BitBlockCounterInlineSumWithOffset/64 809214 ns 809211 ns 874 1.20681G items/s
BitBlockCounterInlineSumWithOffset/512 251953 ns 251951 ns 2756 3.876G items/s
BitBlockCounterInlineSumWithOffset/4096 139827 ns 139826 ns 4988 6.98414G items/s
BitBlockCounterInlineSumWithOffset/32768 127648 ns 127648 ns 5498 7.65041G items/s
BitBlockCounterInlineSumWithOffset/65536 126759 ns 126756 ns 5551 7.70425G items/s
BitBlockCounterFourWordsSum/8 1655902 ns 1655861 ns 423 603.916M items/s
BitBlockCounterFourWordsSum/64 1000517 ns 1000507 ns 692 999.493M items/s
BitBlockCounterFourWordsSum/512 441463 ns 441466 ns 1595 2.21209G items/s
BitBlockCounterFourWordsSum/4096 128194 ns 128193 ns 5484 7.61788G items/s
BitBlockCounterFourWordsSum/32768 85335 ns 85334 ns 8050 11.444G items/s
BitBlockCounterFourWordsSum/65536 82101 ns 82101 ns 8498 11.8947G items/s
BitBlockCounterFourWordsSumWithOffset/8 1647208 ns 1647201 ns 422 607.091M items/s
BitBlockCounterFourWordsSumWithOffset/64 1025215 ns 1025183 ns 700 975.436M items/s
BitBlockCounterFourWordsSumWithOffset/512 462082 ns 462074 ns 1572 2.11343G items/s
BitBlockCounterFourWordsSumWithOffset/4096 132541 ns 132540 ns 5257 7.36808G items/s
BitBlockCounterFourWordsSumWithOffset/32768 92098 ns 92098 ns 7651 10.6035G items/s
BitBlockCounterFourWordsSumWithOffset/65536 87406 ns 87406 ns 7908 11.1727G items/s
BitmapReaderSum/8 1600625 ns 1600619 ns 442 624.758M items/s
BitmapReaderSum/64 885446 ns 885445 ns 789 1.10291G items/s
BitmapReaderSum/512 805230 ns 805219 ns 862 1.21279G items/s
BitmapReaderSum/4096 794678 ns 794676 ns 870 1.22888G items/s
BitmapReaderSum/32768 793758 ns 793749 ns 869 1.23032G items/s
BitmapReaderSum/65536 794828 ns 794812 ns 879 1.22867G items/s
BitmapReaderSumWithOffset/8 1667559 ns 1667514 ns 419 599.695M items/s
BitmapReaderSumWithOffset/64 930337 ns 930335 ns 755 1074.88M items/s
BitmapReaderSumWithOffset/512 841240 ns 841236 ns 841 1.16087G items/s
BitmapReaderSumWithOffset/4096 840091 ns 840087 ns 853 1.16245G items/s
BitmapReaderSumWithOffset/32768 828098 ns 828103 ns 846 1.17928G items/s
BitmapReaderSumWithOffset/65536 831186 ns 831191 ns 854 1.1749G items/s
BinaryBitBlockCounterSum/8 2974962 ns 2974893 ns 235 336.146M items/s
BinaryBitBlockCounterSum/64 1697417 ns 1697403 ns 414 589.135M items/s
BinaryBitBlockCounterSum/512 622981 ns 622973 ns 1165 1.56758G items/s
BinaryBitBlockCounterSum/4096 251586 ns 251582 ns 2831 3.88168G items/s
BinaryBitBlockCounterSum/32768 202682 ns 202683 ns 3345 4.81817G items/s
BinaryBitBlockCounterSum/65536 192151 ns 192150 ns 3653 5.0823G items/s
BinaryBitBlockCounterSumWithOffset/8 3178632 ns 3178625 ns 224 314.601M items/s
BinaryBitBlockCounterSumWithOffset/64 1713947 ns 1713944 ns 404 583.449M items/s
BinaryBitBlockCounterSumWithOffset/512 605481 ns 605476 ns 1158 1.61288G items/s
BinaryBitBlockCounterSumWithOffset/4096 258490 ns 258489 ns 2716 3.77796G items/s
BinaryBitBlockCounterSumWithOffset/32768 212582 ns 212577 ns 3273 4.59393G items/s
BinaryBitBlockCounterSumWithOffset/65536 208857 ns 208857 ns 3357 4.67575G items/s
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm removed a comment on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm removed a comment on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639547819
Benchmarks
```
BitBlockCounterSum/8 1687631 ns 1687634 ns 407 592.545M items/s
BitBlockCounterSum/64 730805 ns 730798 ns 970 1.3363G items/s
BitBlockCounterSum/512 179964 ns 179949 ns 3923 5.42689G items/s
BitBlockCounterSum/4096 75167 ns 75165 ns 9461 12.9922G items/s
BitBlockCounterSum/32768 58679 ns 58679 ns 12065 16.6425G items/s
BitBlockCounterSum/65536 57921 ns 57922 ns 12311 16.86G items/s
BitBlockCounterSumWithOffset/8 1672146 ns 1672131 ns 419 598.039M items/s
BitBlockCounterSumWithOffset/64 735521 ns 735515 ns 944 1.32773G items/s
BitBlockCounterSumWithOffset/512 186470 ns 186468 ns 3782 5.23716G items/s
BitBlockCounterSumWithOffset/4096 82928 ns 82928 ns 8544 11.776G items/s
BitBlockCounterSumWithOffset/32768 66855 ns 66855 ns 10402 14.6072G items/s
BitBlockCounterSumWithOffset/65536 66821 ns 66822 ns 10131 14.6145G items/s
BitBlockCounterInlineSum/8 1563707 ns 1563716 ns 448 639.502M items/s
BitBlockCounterInlineSum/64 596145 ns 596143 ns 1166 1.63813G items/s
BitBlockCounterInlineSum/512 122461 ns 122461 ns 5554 7.9745G items/s
BitBlockCounterInlineSum/4096 30327 ns 30327 ns 22765 32.2007G items/s
BitBlockCounterInlineSum/32768 16484 ns 16484 ns 40881 59.2429G items/s
BitBlockCounterInlineSum/65536 16145 ns 16145 ns 43742 60.4885G items/s
BitBlockCounterInlineSumWithOffset/8 1577564 ns 1577548 ns 441 633.895M items/s
BitBlockCounterInlineSumWithOffset/64 609330 ns 609319 ns 1091 1.60271G items/s
BitBlockCounterInlineSumWithOffset/512 126930 ns 126931 ns 5580 7.69365G items/s
BitBlockCounterInlineSumWithOffset/4096 33720 ns 33720 ns 20848 28.9608G items/s
BitBlockCounterInlineSumWithOffset/32768 20190 ns 20190 ns 34575 48.3693G items/s
BitBlockCounterInlineSumWithOffset/65536 19914 ns 19914 ns 35329 49.0382G items/s
BitBlockCounterFourWordsSum/8 1557230 ns 1557208 ns 456 642.175M items/s
BitBlockCounterFourWordsSum/64 755518 ns 755523 ns 936 1.29257G items/s
BitBlockCounterFourWordsSum/512 314926 ns 314928 ns 2237 3.10091G items/s
BitBlockCounterFourWordsSum/4096 62012 ns 62011 ns 10790 15.7483G items/s
BitBlockCounterFourWordsSum/32768 18307 ns 18307 ns 38225 53.3436G items/s
BitBlockCounterFourWordsSum/65536 16755 ns 16755 ns 42106 58.2835G items/s
BitBlockCounterFourWordsSumWithOffset/8 1554037 ns 1554017 ns 455 643.494M items/s
BitBlockCounterFourWordsSumWithOffset/64 771938 ns 771925 ns 878 1.2651G items/s
BitBlockCounterFourWordsSumWithOffset/512 311751 ns 311753 ns 2200 3.13249G items/s
BitBlockCounterFourWordsSumWithOffset/4096 65821 ns 65821 ns 10635 14.8366G items/s
BitBlockCounterFourWordsSumWithOffset/32768 21528 ns 21528 ns 32434 45.362G items/s
BitBlockCounterFourWordsSumWithOffset/65536 20499 ns 20499 ns 35251 47.6401G items/s
BinaryBitBlockCounterSum/8 1369784 ns 1369771 ns 529 730.048M items/s
BinaryBitBlockCounterSum/64 75780 ns 75780 ns 9360 12.8867G items/s
BinaryBitBlockCounterSum/512 56490 ns 56491 ns 12401 17.2871G items/s
BinaryBitBlockCounterSum/4096 55213 ns 55213 ns 12094 17.6871G items/s
BinaryBitBlockCounterSum/32768 56260 ns 56260 ns 12740 17.358G items/s
BinaryBitBlockCounterSum/65536 57650 ns 57650 ns 12731 16.9394G items/s
BinaryBitBlockCounterSumWithOffset/8 1433256 ns 1433222 ns 479 697.728M items/s
BinaryBitBlockCounterSumWithOffset/64 96996 ns 96996 ns 7346 10.068G items/s
BinaryBitBlockCounterSumWithOffset/512 75189 ns 75188 ns 9420 12.9882G items/s
BinaryBitBlockCounterSumWithOffset/4096 74190 ns 74191 ns 9414 13.1628G items/s
BinaryBitBlockCounterSumWithOffset/32768 74885 ns 74885 ns 9362 13.0409G items/s
BinaryBitBlockCounterSumWithOffset/65536 74462 ns 74462 ns 9425 13.1148G items/s
```
I'm surprised that the binary sum (BinaryBitBlockCounterSum) seems faster than the unary sum (BitBlockCounterSum). I'm going to scrutinize the benchmark to check whether I've implemented something incorrectly.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639685965
I'm adding a BitmapReader-based comparison for the binary case. Stay tuned
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-640673030
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639899415
I'm going to be basing some patches on top of this, I can rebase whenever this gets reviewed/merged
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639553741
https://issues.apache.org/jira/browse/ARROW-9034
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7356:
URL: https://github.com/apache/arrow/pull/7356
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-640673030
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639555962
OK here are the fixed benchmarks. These match my intuition now. This shows that there is very little downside and nearly always upside to using `BitBlockCounter` over `BitmapReader` (at least when there is 1% nulls or less, when there is a higher percentage you don't drop that much performance for using it). There isn't a benchmark showing the naive double-BitmapReader for the binary case but that could be added too
```
-----------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------
BitBlockCounterSum/8 1732025 ns 1732026 ns 400 577.358M items/s
BitBlockCounterSum/64 832029 ns 832025 ns 836 1.17372G items/s
BitBlockCounterSum/512 265104 ns 265105 ns 2666 3.68368G items/s
BitBlockCounterSum/4096 151056 ns 151056 ns 4647 6.46488G items/s
BitBlockCounterSum/32768 138635 ns 138632 ns 5081 7.0443G items/s
BitBlockCounterSum/65536 137021 ns 137019 ns 4977 7.1272G items/s
BitBlockCounterSumWithOffset/8 1779549 ns 1779525 ns 395 561.948M items/s
BitBlockCounterSumWithOffset/64 855158 ns 855149 ns 823 1.14198G items/s
BitBlockCounterSumWithOffset/512 273883 ns 273874 ns 2578 3.56574G items/s
BitBlockCounterSumWithOffset/4096 154422 ns 154422 ns 4499 6.32397G items/s
BitBlockCounterSumWithOffset/32768 141266 ns 141265 ns 4887 6.91299G items/s
BitBlockCounterSumWithOffset/65536 140049 ns 140048 ns 5024 6.97305G items/s
BitBlockCounterInlineSum/8 1714554 ns 1714564 ns 410 583.239M items/s
BitBlockCounterInlineSum/64 802222 ns 802227 ns 878 1.21731G items/s
BitBlockCounterInlineSum/512 239832 ns 239831 ns 2897 4.07188G items/s
BitBlockCounterInlineSum/4096 129035 ns 129031 ns 5384 7.56842G items/s
BitBlockCounterInlineSum/32768 116983 ns 116981 ns 6003 8.34801G items/s
BitBlockCounterInlineSum/65536 116078 ns 116079 ns 6085 8.4129G items/s
BitBlockCounterInlineSumWithOffset/8 1682920 ns 1682911 ns 414 594.209M items/s
BitBlockCounterInlineSumWithOffset/64 809214 ns 809211 ns 874 1.20681G items/s
BitBlockCounterInlineSumWithOffset/512 251953 ns 251951 ns 2756 3.876G items/s
BitBlockCounterInlineSumWithOffset/4096 139827 ns 139826 ns 4988 6.98414G items/s
BitBlockCounterInlineSumWithOffset/32768 127648 ns 127648 ns 5498 7.65041G items/s
BitBlockCounterInlineSumWithOffset/65536 126759 ns 126756 ns 5551 7.70425G items/s
BitBlockCounterFourWordsSum/8 1655902 ns 1655861 ns 423 603.916M items/s
BitBlockCounterFourWordsSum/64 1000517 ns 1000507 ns 692 999.493M items/s
BitBlockCounterFourWordsSum/512 441463 ns 441466 ns 1595 2.21209G items/s
BitBlockCounterFourWordsSum/4096 128194 ns 128193 ns 5484 7.61788G items/s
BitBlockCounterFourWordsSum/32768 85335 ns 85334 ns 8050 11.444G items/s
BitBlockCounterFourWordsSum/65536 82101 ns 82101 ns 8498 11.8947G items/s
BitBlockCounterFourWordsSumWithOffset/8 1647208 ns 1647201 ns 422 607.091M items/s
BitBlockCounterFourWordsSumWithOffset/64 1025215 ns 1025183 ns 700 975.436M items/s
BitBlockCounterFourWordsSumWithOffset/512 462082 ns 462074 ns 1572 2.11343G items/s
BitBlockCounterFourWordsSumWithOffset/4096 132541 ns 132540 ns 5257 7.36808G items/s
BitBlockCounterFourWordsSumWithOffset/32768 92098 ns 92098 ns 7651 10.6035G items/s
BitBlockCounterFourWordsSumWithOffset/65536 87406 ns 87406 ns 7908 11.1727G items/s
BitmapReaderSum/8 1600625 ns 1600619 ns 442 624.758M items/s
BitmapReaderSum/64 885446 ns 885445 ns 789 1.10291G items/s
BitmapReaderSum/512 805230 ns 805219 ns 862 1.21279G items/s
BitmapReaderSum/4096 794678 ns 794676 ns 870 1.22888G items/s
BitmapReaderSum/32768 793758 ns 793749 ns 869 1.23032G items/s
BitmapReaderSum/65536 794828 ns 794812 ns 879 1.22867G items/s
BitmapReaderSumWithOffset/8 1667559 ns 1667514 ns 419 599.695M items/s
BitmapReaderSumWithOffset/64 930337 ns 930335 ns 755 1074.88M items/s
BitmapReaderSumWithOffset/512 841240 ns 841236 ns 841 1.16087G items/s
BitmapReaderSumWithOffset/4096 840091 ns 840087 ns 853 1.16245G items/s
BitmapReaderSumWithOffset/32768 828098 ns 828103 ns 846 1.17928G items/s
BitmapReaderSumWithOffset/65536 831186 ns 831191 ns 854 1.1749G items/s
BinaryBitBlockCounterSum/8 2974962 ns 2974893 ns 235 336.146M items/s
BinaryBitBlockCounterSum/64 1697417 ns 1697403 ns 414 589.135M items/s
BinaryBitBlockCounterSum/512 622981 ns 622973 ns 1165 1.56758G items/s
BinaryBitBlockCounterSum/4096 251586 ns 251582 ns 2831 3.88168G items/s
BinaryBitBlockCounterSum/32768 202682 ns 202683 ns 3345 4.81817G items/s
BinaryBitBlockCounterSum/65536 192151 ns 192150 ns 3653 5.0823G items/s
BinaryBitBlockCounterSumWithOffset/8 3178632 ns 3178625 ns 224 314.601M items/s
BinaryBitBlockCounterSumWithOffset/64 1713947 ns 1713944 ns 404 583.449M items/s
BinaryBitBlockCounterSumWithOffset/512 605481 ns 605476 ns 1158 1.61288G items/s
BinaryBitBlockCounterSumWithOffset/4096 258490 ns 258489 ns 2716 3.77796G items/s
BinaryBitBlockCounterSumWithOffset/32768 212582 ns 212577 ns 3273 4.59393G items/s
BinaryBitBlockCounterSumWithOffset/65536 208857 ns 208857 ns 3357 4.67575G items/s
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-640664623
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639550537
There are some issues with the benchmarks, I'll fix them and repost numbers
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639547819
Benchmarks
```
BitBlockCounterSum/8 1687631 ns 1687634 ns 407 592.545M items/s
BitBlockCounterSum/64 730805 ns 730798 ns 970 1.3363G items/s
BitBlockCounterSum/512 179964 ns 179949 ns 3923 5.42689G items/s
BitBlockCounterSum/4096 75167 ns 75165 ns 9461 12.9922G items/s
BitBlockCounterSum/32768 58679 ns 58679 ns 12065 16.6425G items/s
BitBlockCounterSum/65536 57921 ns 57922 ns 12311 16.86G items/s
BitBlockCounterSumWithOffset/8 1672146 ns 1672131 ns 419 598.039M items/s
BitBlockCounterSumWithOffset/64 735521 ns 735515 ns 944 1.32773G items/s
BitBlockCounterSumWithOffset/512 186470 ns 186468 ns 3782 5.23716G items/s
BitBlockCounterSumWithOffset/4096 82928 ns 82928 ns 8544 11.776G items/s
BitBlockCounterSumWithOffset/32768 66855 ns 66855 ns 10402 14.6072G items/s
BitBlockCounterSumWithOffset/65536 66821 ns 66822 ns 10131 14.6145G items/s
BitBlockCounterInlineSum/8 1563707 ns 1563716 ns 448 639.502M items/s
BitBlockCounterInlineSum/64 596145 ns 596143 ns 1166 1.63813G items/s
BitBlockCounterInlineSum/512 122461 ns 122461 ns 5554 7.9745G items/s
BitBlockCounterInlineSum/4096 30327 ns 30327 ns 22765 32.2007G items/s
BitBlockCounterInlineSum/32768 16484 ns 16484 ns 40881 59.2429G items/s
BitBlockCounterInlineSum/65536 16145 ns 16145 ns 43742 60.4885G items/s
BitBlockCounterInlineSumWithOffset/8 1577564 ns 1577548 ns 441 633.895M items/s
BitBlockCounterInlineSumWithOffset/64 609330 ns 609319 ns 1091 1.60271G items/s
BitBlockCounterInlineSumWithOffset/512 126930 ns 126931 ns 5580 7.69365G items/s
BitBlockCounterInlineSumWithOffset/4096 33720 ns 33720 ns 20848 28.9608G items/s
BitBlockCounterInlineSumWithOffset/32768 20190 ns 20190 ns 34575 48.3693G items/s
BitBlockCounterInlineSumWithOffset/65536 19914 ns 19914 ns 35329 49.0382G items/s
BitBlockCounterFourWordsSum/8 1557230 ns 1557208 ns 456 642.175M items/s
BitBlockCounterFourWordsSum/64 755518 ns 755523 ns 936 1.29257G items/s
BitBlockCounterFourWordsSum/512 314926 ns 314928 ns 2237 3.10091G items/s
BitBlockCounterFourWordsSum/4096 62012 ns 62011 ns 10790 15.7483G items/s
BitBlockCounterFourWordsSum/32768 18307 ns 18307 ns 38225 53.3436G items/s
BitBlockCounterFourWordsSum/65536 16755 ns 16755 ns 42106 58.2835G items/s
BitBlockCounterFourWordsSumWithOffset/8 1554037 ns 1554017 ns 455 643.494M items/s
BitBlockCounterFourWordsSumWithOffset/64 771938 ns 771925 ns 878 1.2651G items/s
BitBlockCounterFourWordsSumWithOffset/512 311751 ns 311753 ns 2200 3.13249G items/s
BitBlockCounterFourWordsSumWithOffset/4096 65821 ns 65821 ns 10635 14.8366G items/s
BitBlockCounterFourWordsSumWithOffset/32768 21528 ns 21528 ns 32434 45.362G items/s
BitBlockCounterFourWordsSumWithOffset/65536 20499 ns 20499 ns 35251 47.6401G items/s
BinaryBitBlockCounterSum/8 1369784 ns 1369771 ns 529 730.048M items/s
BinaryBitBlockCounterSum/64 75780 ns 75780 ns 9360 12.8867G items/s
BinaryBitBlockCounterSum/512 56490 ns 56491 ns 12401 17.2871G items/s
BinaryBitBlockCounterSum/4096 55213 ns 55213 ns 12094 17.6871G items/s
BinaryBitBlockCounterSum/32768 56260 ns 56260 ns 12740 17.358G items/s
BinaryBitBlockCounterSum/65536 57650 ns 57650 ns 12731 16.9394G items/s
BinaryBitBlockCounterSumWithOffset/8 1433256 ns 1433222 ns 479 697.728M items/s
BinaryBitBlockCounterSumWithOffset/64 96996 ns 96996 ns 7346 10.068G items/s
BinaryBitBlockCounterSumWithOffset/512 75189 ns 75188 ns 9420 12.9882G items/s
BinaryBitBlockCounterSumWithOffset/4096 74190 ns 74191 ns 9414 13.1628G items/s
BinaryBitBlockCounterSumWithOffset/32768 74885 ns 74885 ns 9362 13.0409G items/s
BinaryBitBlockCounterSumWithOffset/65536 74462 ns 74462 ns 9425 13.1148G items/s
```
I'm surprised that the binary sum (BinaryBitBlockCounterSum) seems faster than the unary sum (BitBlockCounterSum). I'm going to scrutinize the benchmark to check whether I've implemented something incorrectly.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#discussion_r436753353
##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
#include <cstdint>
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
#include "arrow/util/visibility.h"
namespace arrow {
namespace internal {
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+ int16_t length;
+ int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
class ARROW_EXPORT BitBlockCounter {
public:
- struct Block {
- int16_t length;
- int16_t popcount;
- };
-
- static constexpr int16_t kTargetBlockLength = 256;
-
BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
: bitmap_(bitmap + start_offset / 8),
bits_remaining_(length),
offset_(start_offset % 8) {}
- /// \brief Return the next run of available bits, up to 256. The returned
- /// pair contains the size of run and the number of true values
- Block NextBlock();
+ /// \brief Return the next run of available bits, usually 256. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 256.
Review comment:
"Longer than 256" (or "longer than 64") doesn't sound like a great idea, if the calling code can rely for the length limit to optimize a bit better.
##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
#include <cstdint>
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
#include "arrow/util/visibility.h"
namespace arrow {
namespace internal {
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+ int16_t length;
+ int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
class ARROW_EXPORT BitBlockCounter {
public:
- struct Block {
- int16_t length;
- int16_t popcount;
- };
-
- static constexpr int16_t kTargetBlockLength = 256;
-
BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
: bitmap_(bitmap + start_offset / 8),
bits_remaining_(length),
offset_(start_offset % 8) {}
- /// \brief Return the next run of available bits, up to 256. The returned
- /// pair contains the size of run and the number of true values
- Block NextBlock();
+ /// \brief Return the next run of available bits, usually 256. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 256.
+ BitBlockCount NextFourWords();
+
+ /// \brief Return the next run of available bits, usually 64. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 64.
+ BitBlockCount NextWord();
+
+ /// \brief Inlineable implementation of NextWord
Review comment:
Is it worth keeping both inlined and non-inlined versions or was is just for the sake of benchmarking this PR?
##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
#include <cstdint>
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
#include "arrow/util/visibility.h"
namespace arrow {
namespace internal {
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+ int16_t length;
+ int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
class ARROW_EXPORT BitBlockCounter {
public:
- struct Block {
- int16_t length;
- int16_t popcount;
- };
-
- static constexpr int16_t kTargetBlockLength = 256;
-
BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
: bitmap_(bitmap + start_offset / 8),
bits_remaining_(length),
offset_(start_offset % 8) {}
- /// \brief Return the next run of available bits, up to 256. The returned
- /// pair contains the size of run and the number of true values
- Block NextBlock();
+ /// \brief Return the next run of available bits, usually 256. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 256.
+ BitBlockCount NextFourWords();
+
+ /// \brief Return the next run of available bits, usually 64. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 64.
+ BitBlockCount NextWord();
+
+ /// \brief Inlineable implementation of NextWord
Review comment:
Agreed. I suppose most uses will be less trivial than a simple sum.
##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
#include <cstdint>
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
#include "arrow/util/visibility.h"
namespace arrow {
namespace internal {
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+ int16_t length;
+ int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
class ARROW_EXPORT BitBlockCounter {
public:
- struct Block {
- int16_t length;
- int16_t popcount;
- };
-
- static constexpr int16_t kTargetBlockLength = 256;
-
BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
: bitmap_(bitmap + start_offset / 8),
bits_remaining_(length),
offset_(start_offset % 8) {}
- /// \brief Return the next run of available bits, up to 256. The returned
- /// pair contains the size of run and the number of true values
- Block NextBlock();
+ /// \brief Return the next run of available bits, usually 256. The returned
+ /// pair contains the size of run and the number of true values. When the
+ /// offset is greater than zero, the last block may be longer than 256.
Review comment:
That would sound better to me, yes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org