You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/05 14:53:08 UTC

[GitHub] [arrow] wesm opened a new pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

wesm opened a new pull request #7356:
URL: https://github.com/apache/arrow/pull/7356


   BinaryBitBlockCounter computes the popcount of the bitwise-and of each corresponding bit-run (with target of 64 bits at a time) of two bitmaps. This permits iterating through two validity bitmaps that don't have a lot of nulls much more quickly than using two BitmapReaders.
   
   I also added an inline-able BitBlockCounter::NextWordInline for 64-bits at a time for a single bitmap. It seems like this may be preferable to the four word version. Now we have `NextWord` and `NextFourWords` so the developer can choose either variant. 
   
   Benchmarks and tests covering all this are included. I'll post the benchmarks on my machine as a comment.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639697297


   OK, here are the binary benchmarks:
   
   ```
   --------------------------------------------------------------------------------
   Benchmark                                         Time           CPU Iterations
   --------------------------------------------------------------------------------
   BinaryBitBlockCounterSum/8                  3189138 ns    3189079 ns        216    313.57M items/s
   BinaryBitBlockCounterSum/64                 1839419 ns    1839359 ns        390   543.668M items/s
   BinaryBitBlockCounterSum/512                 630842 ns     630808 ns       1121   1.54811G items/s
   BinaryBitBlockCounterSum/4096                256330 ns     256332 ns       2746   3.80976G items/s
   BinaryBitBlockCounterSum/32768               204388 ns     204383 ns       3454   4.77809G items/s
   BinaryBitBlockCounterSum/65536               201268 ns     201260 ns       3428   4.85225G items/s
   BinaryBitBlockCounterSumWithOffset/8        3313859 ns    3313805 ns        206   301.768M items/s
   BinaryBitBlockCounterSumWithOffset/64       1966957 ns    1966805 ns        360   508.439M items/s
   BinaryBitBlockCounterSumWithOffset/512       672431 ns     672434 ns       1088   1.45228G items/s
   BinaryBitBlockCounterSumWithOffset/4096      286651 ns     286643 ns       2469   3.40689G items/s
   BinaryBitBlockCounterSumWithOffset/32768     228652 ns     228648 ns       3048   4.27103G items/s
   BinaryBitBlockCounterSumWithOffset/65536     228191 ns     228188 ns       3171   4.27964G items/s
   BinaryBitmapReaderSum/8                     3803716 ns    3803704 ns        183   262.902M items/s
   BinaryBitmapReaderSum/64                    2184717 ns    2184728 ns        316   457.723M items/s
   BinaryBitmapReaderSum/512                   2018442 ns    2018421 ns        344   495.437M items/s
   BinaryBitmapReaderSum/4096                  1997782 ns    1997729 ns        349   500.568M items/s
   BinaryBitmapReaderSum/32768                 2024333 ns    2024318 ns        367   493.994M items/s
   BinaryBitmapReaderSum/65536                 2018332 ns    2018340 ns        346   495.457M items/s
   BinaryBitmapReaderSumWithOffset/8           3926170 ns    3926185 ns        181     254.7M items/s
   BinaryBitmapReaderSumWithOffset/64          2198425 ns    2198417 ns        323   454.873M items/s
   BinaryBitmapReaderSumWithOffset/512         2001917 ns    2001864 ns        352   499.535M items/s
   BinaryBitmapReaderSumWithOffset/4096        1980845 ns    1980853 ns        351   504.833M items/s
   BinaryBitmapReaderSumWithOffset/32768       1979394 ns    1979403 ns        365   505.203M items/s
   BinaryBitmapReaderSumWithOffset/65536       2029335 ns    2029347 ns        345   492.769M items/s
   ```
   
   It seems that it is never a good idea to use BitmapReader for the binary case, even when the incidence of nulls is high, that even in that case naively using `BitUtil::GetBit` is better. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#discussion_r436764326



##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
 
 #include <cstdint>
 
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
 namespace internal {
 
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+  int16_t length;
+  int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
 class ARROW_EXPORT BitBlockCounter {
  public:
-  struct Block {
-    int16_t length;
-    int16_t popcount;
-  };
-
-  static constexpr int16_t kTargetBlockLength = 256;
-
   BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
       : bitmap_(bitmap + start_offset / 8),
         bits_remaining_(length),
         offset_(start_offset % 8) {}
 
-  /// \brief Return the next run of available bits, up to 256. The returned
-  /// pair contains the size of run and the number of true values
-  Block NextBlock();
+  /// \brief Return the next run of available bits, usually 256. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 256.
+  BitBlockCount NextFourWords();
+
+  /// \brief Return the next run of available bits, usually 64. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 64.
+  BitBlockCount NextWord();
+
+  /// \brief Inlineable implementation of NextWord

Review comment:
       It was largely for the sake of benchmarking. I don't think it's worth keeping the inline version, agreed? 

##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
 
 #include <cstdint>
 
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
 namespace internal {
 
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+  int16_t length;
+  int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
 class ARROW_EXPORT BitBlockCounter {
  public:
-  struct Block {
-    int16_t length;
-    int16_t popcount;
-  };
-
-  static constexpr int16_t kTargetBlockLength = 256;
-
   BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
       : bitmap_(bitmap + start_offset / 8),
         bits_remaining_(length),
         offset_(start_offset % 8) {}
 
-  /// \brief Return the next run of available bits, up to 256. The returned
-  /// pair contains the size of run and the number of true values
-  Block NextBlock();
+  /// \brief Return the next run of available bits, usually 256. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 256.

Review comment:
       Well, I can alter the code to yield two blocks at the end rather than a single overfull block like now. You think that is preferable? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639555962


   OK here are the fixed benchmarks. These match my intuition now. This shows that there is very little downside and nearly always upside to using `BitBlockCounter` over `BitmapReader`. There isn't a benchmark showing the naive double-BitmapReader for the binary case but that could be added too
   
   ```
   -----------------------------------------------------------------------------------
   Benchmark                                            Time           CPU Iterations
   -----------------------------------------------------------------------------------
   BitBlockCounterSum/8                           1732025 ns    1732026 ns        400   577.358M items/s
   BitBlockCounterSum/64                           832029 ns     832025 ns        836   1.17372G items/s
   BitBlockCounterSum/512                          265104 ns     265105 ns       2666   3.68368G items/s
   BitBlockCounterSum/4096                         151056 ns     151056 ns       4647   6.46488G items/s
   BitBlockCounterSum/32768                        138635 ns     138632 ns       5081    7.0443G items/s
   BitBlockCounterSum/65536                        137021 ns     137019 ns       4977    7.1272G items/s
   BitBlockCounterSumWithOffset/8                 1779549 ns    1779525 ns        395   561.948M items/s
   BitBlockCounterSumWithOffset/64                 855158 ns     855149 ns        823   1.14198G items/s
   BitBlockCounterSumWithOffset/512                273883 ns     273874 ns       2578   3.56574G items/s
   BitBlockCounterSumWithOffset/4096               154422 ns     154422 ns       4499   6.32397G items/s
   BitBlockCounterSumWithOffset/32768              141266 ns     141265 ns       4887   6.91299G items/s
   BitBlockCounterSumWithOffset/65536              140049 ns     140048 ns       5024   6.97305G items/s
   BitBlockCounterInlineSum/8                     1714554 ns    1714564 ns        410   583.239M items/s
   BitBlockCounterInlineSum/64                     802222 ns     802227 ns        878   1.21731G items/s
   BitBlockCounterInlineSum/512                    239832 ns     239831 ns       2897   4.07188G items/s
   BitBlockCounterInlineSum/4096                   129035 ns     129031 ns       5384   7.56842G items/s
   BitBlockCounterInlineSum/32768                  116983 ns     116981 ns       6003   8.34801G items/s
   BitBlockCounterInlineSum/65536                  116078 ns     116079 ns       6085    8.4129G items/s
   BitBlockCounterInlineSumWithOffset/8           1682920 ns    1682911 ns        414   594.209M items/s
   BitBlockCounterInlineSumWithOffset/64           809214 ns     809211 ns        874   1.20681G items/s
   BitBlockCounterInlineSumWithOffset/512          251953 ns     251951 ns       2756     3.876G items/s
   BitBlockCounterInlineSumWithOffset/4096         139827 ns     139826 ns       4988   6.98414G items/s
   BitBlockCounterInlineSumWithOffset/32768        127648 ns     127648 ns       5498   7.65041G items/s
   BitBlockCounterInlineSumWithOffset/65536        126759 ns     126756 ns       5551   7.70425G items/s
   BitBlockCounterFourWordsSum/8                  1655902 ns    1655861 ns        423   603.916M items/s
   BitBlockCounterFourWordsSum/64                 1000517 ns    1000507 ns        692   999.493M items/s
   BitBlockCounterFourWordsSum/512                 441463 ns     441466 ns       1595   2.21209G items/s
   BitBlockCounterFourWordsSum/4096                128194 ns     128193 ns       5484   7.61788G items/s
   BitBlockCounterFourWordsSum/32768                85335 ns      85334 ns       8050    11.444G items/s
   BitBlockCounterFourWordsSum/65536                82101 ns      82101 ns       8498   11.8947G items/s
   BitBlockCounterFourWordsSumWithOffset/8        1647208 ns    1647201 ns        422   607.091M items/s
   BitBlockCounterFourWordsSumWithOffset/64       1025215 ns    1025183 ns        700   975.436M items/s
   BitBlockCounterFourWordsSumWithOffset/512       462082 ns     462074 ns       1572   2.11343G items/s
   BitBlockCounterFourWordsSumWithOffset/4096      132541 ns     132540 ns       5257   7.36808G items/s
   BitBlockCounterFourWordsSumWithOffset/32768      92098 ns      92098 ns       7651   10.6035G items/s
   BitBlockCounterFourWordsSumWithOffset/65536      87406 ns      87406 ns       7908   11.1727G items/s
   BitmapReaderSum/8                              1600625 ns    1600619 ns        442   624.758M items/s
   BitmapReaderSum/64                              885446 ns     885445 ns        789   1.10291G items/s
   BitmapReaderSum/512                             805230 ns     805219 ns        862   1.21279G items/s
   BitmapReaderSum/4096                            794678 ns     794676 ns        870   1.22888G items/s
   BitmapReaderSum/32768                           793758 ns     793749 ns        869   1.23032G items/s
   BitmapReaderSum/65536                           794828 ns     794812 ns        879   1.22867G items/s
   BitmapReaderSumWithOffset/8                    1667559 ns    1667514 ns        419   599.695M items/s
   BitmapReaderSumWithOffset/64                    930337 ns     930335 ns        755   1074.88M items/s
   BitmapReaderSumWithOffset/512                   841240 ns     841236 ns        841   1.16087G items/s
   BitmapReaderSumWithOffset/4096                  840091 ns     840087 ns        853   1.16245G items/s
   BitmapReaderSumWithOffset/32768                 828098 ns     828103 ns        846   1.17928G items/s
   BitmapReaderSumWithOffset/65536                 831186 ns     831191 ns        854    1.1749G items/s
   BinaryBitBlockCounterSum/8                     2974962 ns    2974893 ns        235   336.146M items/s
   BinaryBitBlockCounterSum/64                    1697417 ns    1697403 ns        414   589.135M items/s
   BinaryBitBlockCounterSum/512                    622981 ns     622973 ns       1165   1.56758G items/s
   BinaryBitBlockCounterSum/4096                   251586 ns     251582 ns       2831   3.88168G items/s
   BinaryBitBlockCounterSum/32768                  202682 ns     202683 ns       3345   4.81817G items/s
   BinaryBitBlockCounterSum/65536                  192151 ns     192150 ns       3653    5.0823G items/s
   BinaryBitBlockCounterSumWithOffset/8           3178632 ns    3178625 ns        224   314.601M items/s
   BinaryBitBlockCounterSumWithOffset/64          1713947 ns    1713944 ns        404   583.449M items/s
   BinaryBitBlockCounterSumWithOffset/512          605481 ns     605476 ns       1158   1.61288G items/s
   BinaryBitBlockCounterSumWithOffset/4096         258490 ns     258489 ns       2716   3.77796G items/s
   BinaryBitBlockCounterSumWithOffset/32768        212582 ns     212577 ns       3273   4.59393G items/s
   BinaryBitBlockCounterSumWithOffset/65536        208857 ns     208857 ns       3357   4.67575G items/s
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm removed a comment on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm removed a comment on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639547819


   Benchmarks
   
   ```
   BitBlockCounterSum/8                           1687631 ns    1687634 ns        407   592.545M items/s
   BitBlockCounterSum/64                           730805 ns     730798 ns        970    1.3363G items/s
   BitBlockCounterSum/512                          179964 ns     179949 ns       3923   5.42689G items/s
   BitBlockCounterSum/4096                          75167 ns      75165 ns       9461   12.9922G items/s
   BitBlockCounterSum/32768                         58679 ns      58679 ns      12065   16.6425G items/s
   BitBlockCounterSum/65536                         57921 ns      57922 ns      12311     16.86G items/s
   BitBlockCounterSumWithOffset/8                 1672146 ns    1672131 ns        419   598.039M items/s
   BitBlockCounterSumWithOffset/64                 735521 ns     735515 ns        944   1.32773G items/s
   BitBlockCounterSumWithOffset/512                186470 ns     186468 ns       3782   5.23716G items/s
   BitBlockCounterSumWithOffset/4096                82928 ns      82928 ns       8544    11.776G items/s
   BitBlockCounterSumWithOffset/32768               66855 ns      66855 ns      10402   14.6072G items/s
   BitBlockCounterSumWithOffset/65536               66821 ns      66822 ns      10131   14.6145G items/s
   BitBlockCounterInlineSum/8                     1563707 ns    1563716 ns        448   639.502M items/s
   BitBlockCounterInlineSum/64                     596145 ns     596143 ns       1166   1.63813G items/s
   BitBlockCounterInlineSum/512                    122461 ns     122461 ns       5554    7.9745G items/s
   BitBlockCounterInlineSum/4096                    30327 ns      30327 ns      22765   32.2007G items/s
   BitBlockCounterInlineSum/32768                   16484 ns      16484 ns      40881   59.2429G items/s
   BitBlockCounterInlineSum/65536                   16145 ns      16145 ns      43742   60.4885G items/s
   BitBlockCounterInlineSumWithOffset/8           1577564 ns    1577548 ns        441   633.895M items/s
   BitBlockCounterInlineSumWithOffset/64           609330 ns     609319 ns       1091   1.60271G items/s
   BitBlockCounterInlineSumWithOffset/512          126930 ns     126931 ns       5580   7.69365G items/s
   BitBlockCounterInlineSumWithOffset/4096          33720 ns      33720 ns      20848   28.9608G items/s
   BitBlockCounterInlineSumWithOffset/32768         20190 ns      20190 ns      34575   48.3693G items/s
   BitBlockCounterInlineSumWithOffset/65536         19914 ns      19914 ns      35329   49.0382G items/s
   BitBlockCounterFourWordsSum/8                  1557230 ns    1557208 ns        456   642.175M items/s
   BitBlockCounterFourWordsSum/64                  755518 ns     755523 ns        936   1.29257G items/s
   BitBlockCounterFourWordsSum/512                 314926 ns     314928 ns       2237   3.10091G items/s
   BitBlockCounterFourWordsSum/4096                 62012 ns      62011 ns      10790   15.7483G items/s
   BitBlockCounterFourWordsSum/32768                18307 ns      18307 ns      38225   53.3436G items/s
   BitBlockCounterFourWordsSum/65536                16755 ns      16755 ns      42106   58.2835G items/s
   BitBlockCounterFourWordsSumWithOffset/8        1554037 ns    1554017 ns        455   643.494M items/s
   BitBlockCounterFourWordsSumWithOffset/64        771938 ns     771925 ns        878    1.2651G items/s
   BitBlockCounterFourWordsSumWithOffset/512       311751 ns     311753 ns       2200   3.13249G items/s
   BitBlockCounterFourWordsSumWithOffset/4096       65821 ns      65821 ns      10635   14.8366G items/s
   BitBlockCounterFourWordsSumWithOffset/32768      21528 ns      21528 ns      32434    45.362G items/s
   BitBlockCounterFourWordsSumWithOffset/65536      20499 ns      20499 ns      35251   47.6401G items/s
   BinaryBitBlockCounterSum/8                     1369784 ns    1369771 ns        529   730.048M items/s
   BinaryBitBlockCounterSum/64                      75780 ns      75780 ns       9360   12.8867G items/s
   BinaryBitBlockCounterSum/512                     56490 ns      56491 ns      12401   17.2871G items/s
   BinaryBitBlockCounterSum/4096                    55213 ns      55213 ns      12094   17.6871G items/s
   BinaryBitBlockCounterSum/32768                   56260 ns      56260 ns      12740    17.358G items/s
   BinaryBitBlockCounterSum/65536                   57650 ns      57650 ns      12731   16.9394G items/s
   BinaryBitBlockCounterSumWithOffset/8           1433256 ns    1433222 ns        479   697.728M items/s
   BinaryBitBlockCounterSumWithOffset/64            96996 ns      96996 ns       7346    10.068G items/s
   BinaryBitBlockCounterSumWithOffset/512           75189 ns      75188 ns       9420   12.9882G items/s
   BinaryBitBlockCounterSumWithOffset/4096          74190 ns      74191 ns       9414   13.1628G items/s
   BinaryBitBlockCounterSumWithOffset/32768         74885 ns      74885 ns       9362   13.0409G items/s
   BinaryBitBlockCounterSumWithOffset/65536         74462 ns      74462 ns       9425   13.1148G items/s
   ```
   
   I'm surprised that the binary sum (BinaryBitBlockCounterSum) seems faster than the unary sum (BitBlockCounterSum). I'm going to scrutinize the benchmark to check whether I've implemented something incorrectly. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639685965


   I'm adding a BitmapReader-based comparison for the binary case. Stay tuned


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-640673030






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639899415


   I'm going to be basing some patches on top of this, I can rebase whenever this gets reviewed/merged


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639553741


   https://issues.apache.org/jira/browse/ARROW-9034


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm closed pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7356:
URL: https://github.com/apache/arrow/pull/7356


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-640673030






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639555962


   OK here are the fixed benchmarks. These match my intuition now. This shows that there is very little downside and nearly always upside to using `BitBlockCounter` over `BitmapReader` (at least when there is 1% nulls or less, when there is a higher percentage you don't drop that much performance for using it). There isn't a benchmark showing the naive double-BitmapReader for the binary case but that could be added too
   
   ```
   -----------------------------------------------------------------------------------
   Benchmark                                            Time           CPU Iterations
   -----------------------------------------------------------------------------------
   BitBlockCounterSum/8                           1732025 ns    1732026 ns        400   577.358M items/s
   BitBlockCounterSum/64                           832029 ns     832025 ns        836   1.17372G items/s
   BitBlockCounterSum/512                          265104 ns     265105 ns       2666   3.68368G items/s
   BitBlockCounterSum/4096                         151056 ns     151056 ns       4647   6.46488G items/s
   BitBlockCounterSum/32768                        138635 ns     138632 ns       5081    7.0443G items/s
   BitBlockCounterSum/65536                        137021 ns     137019 ns       4977    7.1272G items/s
   BitBlockCounterSumWithOffset/8                 1779549 ns    1779525 ns        395   561.948M items/s
   BitBlockCounterSumWithOffset/64                 855158 ns     855149 ns        823   1.14198G items/s
   BitBlockCounterSumWithOffset/512                273883 ns     273874 ns       2578   3.56574G items/s
   BitBlockCounterSumWithOffset/4096               154422 ns     154422 ns       4499   6.32397G items/s
   BitBlockCounterSumWithOffset/32768              141266 ns     141265 ns       4887   6.91299G items/s
   BitBlockCounterSumWithOffset/65536              140049 ns     140048 ns       5024   6.97305G items/s
   BitBlockCounterInlineSum/8                     1714554 ns    1714564 ns        410   583.239M items/s
   BitBlockCounterInlineSum/64                     802222 ns     802227 ns        878   1.21731G items/s
   BitBlockCounterInlineSum/512                    239832 ns     239831 ns       2897   4.07188G items/s
   BitBlockCounterInlineSum/4096                   129035 ns     129031 ns       5384   7.56842G items/s
   BitBlockCounterInlineSum/32768                  116983 ns     116981 ns       6003   8.34801G items/s
   BitBlockCounterInlineSum/65536                  116078 ns     116079 ns       6085    8.4129G items/s
   BitBlockCounterInlineSumWithOffset/8           1682920 ns    1682911 ns        414   594.209M items/s
   BitBlockCounterInlineSumWithOffset/64           809214 ns     809211 ns        874   1.20681G items/s
   BitBlockCounterInlineSumWithOffset/512          251953 ns     251951 ns       2756     3.876G items/s
   BitBlockCounterInlineSumWithOffset/4096         139827 ns     139826 ns       4988   6.98414G items/s
   BitBlockCounterInlineSumWithOffset/32768        127648 ns     127648 ns       5498   7.65041G items/s
   BitBlockCounterInlineSumWithOffset/65536        126759 ns     126756 ns       5551   7.70425G items/s
   BitBlockCounterFourWordsSum/8                  1655902 ns    1655861 ns        423   603.916M items/s
   BitBlockCounterFourWordsSum/64                 1000517 ns    1000507 ns        692   999.493M items/s
   BitBlockCounterFourWordsSum/512                 441463 ns     441466 ns       1595   2.21209G items/s
   BitBlockCounterFourWordsSum/4096                128194 ns     128193 ns       5484   7.61788G items/s
   BitBlockCounterFourWordsSum/32768                85335 ns      85334 ns       8050    11.444G items/s
   BitBlockCounterFourWordsSum/65536                82101 ns      82101 ns       8498   11.8947G items/s
   BitBlockCounterFourWordsSumWithOffset/8        1647208 ns    1647201 ns        422   607.091M items/s
   BitBlockCounterFourWordsSumWithOffset/64       1025215 ns    1025183 ns        700   975.436M items/s
   BitBlockCounterFourWordsSumWithOffset/512       462082 ns     462074 ns       1572   2.11343G items/s
   BitBlockCounterFourWordsSumWithOffset/4096      132541 ns     132540 ns       5257   7.36808G items/s
   BitBlockCounterFourWordsSumWithOffset/32768      92098 ns      92098 ns       7651   10.6035G items/s
   BitBlockCounterFourWordsSumWithOffset/65536      87406 ns      87406 ns       7908   11.1727G items/s
   BitmapReaderSum/8                              1600625 ns    1600619 ns        442   624.758M items/s
   BitmapReaderSum/64                              885446 ns     885445 ns        789   1.10291G items/s
   BitmapReaderSum/512                             805230 ns     805219 ns        862   1.21279G items/s
   BitmapReaderSum/4096                            794678 ns     794676 ns        870   1.22888G items/s
   BitmapReaderSum/32768                           793758 ns     793749 ns        869   1.23032G items/s
   BitmapReaderSum/65536                           794828 ns     794812 ns        879   1.22867G items/s
   BitmapReaderSumWithOffset/8                    1667559 ns    1667514 ns        419   599.695M items/s
   BitmapReaderSumWithOffset/64                    930337 ns     930335 ns        755   1074.88M items/s
   BitmapReaderSumWithOffset/512                   841240 ns     841236 ns        841   1.16087G items/s
   BitmapReaderSumWithOffset/4096                  840091 ns     840087 ns        853   1.16245G items/s
   BitmapReaderSumWithOffset/32768                 828098 ns     828103 ns        846   1.17928G items/s
   BitmapReaderSumWithOffset/65536                 831186 ns     831191 ns        854    1.1749G items/s
   BinaryBitBlockCounterSum/8                     2974962 ns    2974893 ns        235   336.146M items/s
   BinaryBitBlockCounterSum/64                    1697417 ns    1697403 ns        414   589.135M items/s
   BinaryBitBlockCounterSum/512                    622981 ns     622973 ns       1165   1.56758G items/s
   BinaryBitBlockCounterSum/4096                   251586 ns     251582 ns       2831   3.88168G items/s
   BinaryBitBlockCounterSum/32768                  202682 ns     202683 ns       3345   4.81817G items/s
   BinaryBitBlockCounterSum/65536                  192151 ns     192150 ns       3653    5.0823G items/s
   BinaryBitBlockCounterSumWithOffset/8           3178632 ns    3178625 ns        224   314.601M items/s
   BinaryBitBlockCounterSumWithOffset/64          1713947 ns    1713944 ns        404   583.449M items/s
   BinaryBitBlockCounterSumWithOffset/512          605481 ns     605476 ns       1158   1.61288G items/s
   BinaryBitBlockCounterSumWithOffset/4096         258490 ns     258489 ns       2716   3.77796G items/s
   BinaryBitBlockCounterSumWithOffset/32768        212582 ns     212577 ns       3273   4.59393G items/s
   BinaryBitBlockCounterSumWithOffset/65536        208857 ns     208857 ns       3357   4.67575G items/s
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-640664623






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639550537


   There are some issues with the benchmarks, I'll fix them and repost numbers


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#issuecomment-639547819


   Benchmarks
   
   ```
   BitBlockCounterSum/8                           1687631 ns    1687634 ns        407   592.545M items/s
   BitBlockCounterSum/64                           730805 ns     730798 ns        970    1.3363G items/s
   BitBlockCounterSum/512                          179964 ns     179949 ns       3923   5.42689G items/s
   BitBlockCounterSum/4096                          75167 ns      75165 ns       9461   12.9922G items/s
   BitBlockCounterSum/32768                         58679 ns      58679 ns      12065   16.6425G items/s
   BitBlockCounterSum/65536                         57921 ns      57922 ns      12311     16.86G items/s
   BitBlockCounterSumWithOffset/8                 1672146 ns    1672131 ns        419   598.039M items/s
   BitBlockCounterSumWithOffset/64                 735521 ns     735515 ns        944   1.32773G items/s
   BitBlockCounterSumWithOffset/512                186470 ns     186468 ns       3782   5.23716G items/s
   BitBlockCounterSumWithOffset/4096                82928 ns      82928 ns       8544    11.776G items/s
   BitBlockCounterSumWithOffset/32768               66855 ns      66855 ns      10402   14.6072G items/s
   BitBlockCounterSumWithOffset/65536               66821 ns      66822 ns      10131   14.6145G items/s
   BitBlockCounterInlineSum/8                     1563707 ns    1563716 ns        448   639.502M items/s
   BitBlockCounterInlineSum/64                     596145 ns     596143 ns       1166   1.63813G items/s
   BitBlockCounterInlineSum/512                    122461 ns     122461 ns       5554    7.9745G items/s
   BitBlockCounterInlineSum/4096                    30327 ns      30327 ns      22765   32.2007G items/s
   BitBlockCounterInlineSum/32768                   16484 ns      16484 ns      40881   59.2429G items/s
   BitBlockCounterInlineSum/65536                   16145 ns      16145 ns      43742   60.4885G items/s
   BitBlockCounterInlineSumWithOffset/8           1577564 ns    1577548 ns        441   633.895M items/s
   BitBlockCounterInlineSumWithOffset/64           609330 ns     609319 ns       1091   1.60271G items/s
   BitBlockCounterInlineSumWithOffset/512          126930 ns     126931 ns       5580   7.69365G items/s
   BitBlockCounterInlineSumWithOffset/4096          33720 ns      33720 ns      20848   28.9608G items/s
   BitBlockCounterInlineSumWithOffset/32768         20190 ns      20190 ns      34575   48.3693G items/s
   BitBlockCounterInlineSumWithOffset/65536         19914 ns      19914 ns      35329   49.0382G items/s
   BitBlockCounterFourWordsSum/8                  1557230 ns    1557208 ns        456   642.175M items/s
   BitBlockCounterFourWordsSum/64                  755518 ns     755523 ns        936   1.29257G items/s
   BitBlockCounterFourWordsSum/512                 314926 ns     314928 ns       2237   3.10091G items/s
   BitBlockCounterFourWordsSum/4096                 62012 ns      62011 ns      10790   15.7483G items/s
   BitBlockCounterFourWordsSum/32768                18307 ns      18307 ns      38225   53.3436G items/s
   BitBlockCounterFourWordsSum/65536                16755 ns      16755 ns      42106   58.2835G items/s
   BitBlockCounterFourWordsSumWithOffset/8        1554037 ns    1554017 ns        455   643.494M items/s
   BitBlockCounterFourWordsSumWithOffset/64        771938 ns     771925 ns        878    1.2651G items/s
   BitBlockCounterFourWordsSumWithOffset/512       311751 ns     311753 ns       2200   3.13249G items/s
   BitBlockCounterFourWordsSumWithOffset/4096       65821 ns      65821 ns      10635   14.8366G items/s
   BitBlockCounterFourWordsSumWithOffset/32768      21528 ns      21528 ns      32434    45.362G items/s
   BitBlockCounterFourWordsSumWithOffset/65536      20499 ns      20499 ns      35251   47.6401G items/s
   BinaryBitBlockCounterSum/8                     1369784 ns    1369771 ns        529   730.048M items/s
   BinaryBitBlockCounterSum/64                      75780 ns      75780 ns       9360   12.8867G items/s
   BinaryBitBlockCounterSum/512                     56490 ns      56491 ns      12401   17.2871G items/s
   BinaryBitBlockCounterSum/4096                    55213 ns      55213 ns      12094   17.6871G items/s
   BinaryBitBlockCounterSum/32768                   56260 ns      56260 ns      12740    17.358G items/s
   BinaryBitBlockCounterSum/65536                   57650 ns      57650 ns      12731   16.9394G items/s
   BinaryBitBlockCounterSumWithOffset/8           1433256 ns    1433222 ns        479   697.728M items/s
   BinaryBitBlockCounterSumWithOffset/64            96996 ns      96996 ns       7346    10.068G items/s
   BinaryBitBlockCounterSumWithOffset/512           75189 ns      75188 ns       9420   12.9882G items/s
   BinaryBitBlockCounterSumWithOffset/4096          74190 ns      74191 ns       9414   13.1628G items/s
   BinaryBitBlockCounterSumWithOffset/32768         74885 ns      74885 ns       9362   13.0409G items/s
   BinaryBitBlockCounterSumWithOffset/65536         74462 ns      74462 ns       9425   13.1148G items/s
   ```
   
   I'm surprised that the binary sum (BinaryBitBlockCounterSum) seems faster than the unary sum (BitBlockCounterSum). I'm going to scrutinize the benchmark to check whether I've implemented something incorrectly. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7356: ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7356:
URL: https://github.com/apache/arrow/pull/7356#discussion_r436753353



##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
 
 #include <cstdint>
 
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
 namespace internal {
 
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+  int16_t length;
+  int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
 class ARROW_EXPORT BitBlockCounter {
  public:
-  struct Block {
-    int16_t length;
-    int16_t popcount;
-  };
-
-  static constexpr int16_t kTargetBlockLength = 256;
-
   BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
       : bitmap_(bitmap + start_offset / 8),
         bits_remaining_(length),
         offset_(start_offset % 8) {}
 
-  /// \brief Return the next run of available bits, up to 256. The returned
-  /// pair contains the size of run and the number of true values
-  Block NextBlock();
+  /// \brief Return the next run of available bits, usually 256. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 256.

Review comment:
       "Longer than 256" (or "longer than 64") doesn't sound like a great idea, if the calling code can rely for the length limit to optimize a bit better.

##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
 
 #include <cstdint>
 
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
 namespace internal {
 
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+  int16_t length;
+  int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
 class ARROW_EXPORT BitBlockCounter {
  public:
-  struct Block {
-    int16_t length;
-    int16_t popcount;
-  };
-
-  static constexpr int16_t kTargetBlockLength = 256;
-
   BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
       : bitmap_(bitmap + start_offset / 8),
         bits_remaining_(length),
         offset_(start_offset % 8) {}
 
-  /// \brief Return the next run of available bits, up to 256. The returned
-  /// pair contains the size of run and the number of true values
-  Block NextBlock();
+  /// \brief Return the next run of available bits, usually 256. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 256.
+  BitBlockCount NextFourWords();
+
+  /// \brief Return the next run of available bits, usually 64. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 64.
+  BitBlockCount NextWord();
+
+  /// \brief Inlineable implementation of NextWord

Review comment:
       Is it worth keeping both inlined and non-inlined versions or was is just for the sake of benchmarking this PR?

##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
 
 #include <cstdint>
 
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
 namespace internal {
 
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+  int16_t length;
+  int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
 class ARROW_EXPORT BitBlockCounter {
  public:
-  struct Block {
-    int16_t length;
-    int16_t popcount;
-  };
-
-  static constexpr int16_t kTargetBlockLength = 256;
-
   BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
       : bitmap_(bitmap + start_offset / 8),
         bits_remaining_(length),
         offset_(start_offset % 8) {}
 
-  /// \brief Return the next run of available bits, up to 256. The returned
-  /// pair contains the size of run and the number of true values
-  Block NextBlock();
+  /// \brief Return the next run of available bits, usually 256. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 256.
+  BitBlockCount NextFourWords();
+
+  /// \brief Return the next run of available bits, usually 64. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 64.
+  BitBlockCount NextWord();
+
+  /// \brief Inlineable implementation of NextWord

Review comment:
       Agreed. I suppose most uses will be less trivial than a simple sum.

##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -19,37 +19,108 @@
 
 #include <cstdint>
 
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/ubsan.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
 namespace internal {
 
-/// \brief A class that scans through a true/false bitmap to yield blocks of up
-/// to 256 bits at a time along with their popcount. This is used to accelerate
-/// processing of mostly-not-null array data.
+/// \brief Return value from bit block counters: the total number of bits and
+/// the number of set bits.
+struct BitBlockCount {
+  int16_t length;
+  int16_t popcount;
+};
+
+/// \brief A class that scans through a true/false bitmap to compute popcounts
+/// 64 or 256 bits at a time. This is used to accelerate processing of
+/// mostly-not-null array data.
 class ARROW_EXPORT BitBlockCounter {
  public:
-  struct Block {
-    int16_t length;
-    int16_t popcount;
-  };
-
-  static constexpr int16_t kTargetBlockLength = 256;
-
   BitBlockCounter(const uint8_t* bitmap, int64_t start_offset, int64_t length)
       : bitmap_(bitmap + start_offset / 8),
         bits_remaining_(length),
         offset_(start_offset % 8) {}
 
-  /// \brief Return the next run of available bits, up to 256. The returned
-  /// pair contains the size of run and the number of true values
-  Block NextBlock();
+  /// \brief Return the next run of available bits, usually 256. The returned
+  /// pair contains the size of run and the number of true values. When the
+  /// offset is greater than zero, the last block may be longer than 256.

Review comment:
       That would sound better to me, yes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org