Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/03 23:55:53 UTC

[GitHub] [arrow] emkornfield commented on a change in pull request #7346: ARROW-9029: [C++] Implement BitmapScanner for much faster processing of data without many nulls

emkornfield commented on a change in pull request #7346:
URL: https://github.com/apache/arrow/pull/7346#discussion_r434918911



##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -598,5 +598,49 @@ Result<std::shared_ptr<Buffer>> BitmapAllButOne(MemoryPool* pool, int64_t length
   return std::move(buffer);
 }
 
+std::pair<int64_t, int64_t> BitmapScanner::NextRun() {
+  auto load_word = [](const uint8_t* bytes) -> uint64_t {
+    return BitUtil::ToLittleEndian(util::SafeLoadAs<uint64_t>(bytes));
+  };
+  auto shift_word = [](uint64_t current, uint64_t next, int64_t shift) -> uint64_t {
+    return (current >> shift) | (next << (64 - shift));
+  };
+
+  // When the offset is > 0, we need there to be a word beyond the last aligned
+  // word in the bitmap for the bit shifting logic.
+  const int64_t bits_required_to_scan_words = offset_ == 0 ? 256 : 256 + (64 - offset_);
+  if (bits_remaining_ < bits_required_to_scan_words) {
+    // End of the bitmap, leave it to the caller to decide how to best check
+    // these bits, no need to do redundant computation here.
+    const int64_t run_length = bits_remaining_;
+    bits_remaining_ -= run_length;
+    return {run_length, CountSetBits(bitmap_, offset_, run_length)};
+  }
+
+  int64_t total_popcount = 0;
+  if (offset_ == 0) {
+    total_popcount += __builtin_popcountll(load_word(bitmap_));
+    total_popcount += __builtin_popcountll(load_word(bitmap_ + 8));
+    total_popcount += __builtin_popcountll(load_word(bitmap_ + 16));
+    total_popcount += __builtin_popcountll(load_word(bitmap_ + 24));
+  } else {
+    auto current = load_word(bitmap_);
+    auto next = load_word(bitmap_ + 8);
+    total_popcount += __builtin_popcountll(shift_word(current, next, offset_));

Review comment:
       I think we already have wrappers for these popcount builtins (`__builtin_popcountll`); is there a reason you aren't using them?
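
       For illustration, a wrapper of the kind this comment refers to might look like the sketch below. The name `PopCount64` is hypothetical and is not claimed to be the helper that exists in the Arrow codebase; the point is only that call sites stop depending on a GCC/Clang-specific builtin and get a fallback on other compilers.

```cpp
#include <cstdint>

// Hypothetical wrapper (an assumption for illustration, not Arrow's API):
// hides the compiler-specific popcount builtin behind one portable function.
inline int PopCount64(uint64_t word) {
#if defined(__GNUC__) || defined(__clang__)
  // GCC and Clang provide this builtin.
  return __builtin_popcountll(word);
#else
  // Portable fallback: clear the lowest set bit until none remain.
  int count = 0;
  while (word != 0) {
    word &= word - 1;
    ++count;
  }
  return count;
#endif
}
```

       With something like this, a line such as `total_popcount += __builtin_popcountll(load_word(bitmap_));` would become `total_popcount += PopCount64(load_word(bitmap_));`, and the same code would build on compilers that lack the builtin.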



