You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/09 04:24:38 UTC

[GitHub] [arrow] nirandaperera opened a new pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

nirandaperera opened a new pull request #10487:
URL: https://github.com/apache/arrow/pull/10487


   This change adds a `Bitmap::VisitWordsAndWrite` method, that outputs the values of the visitor lambda function to a provided bitmap. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861737568


   @ursabot please benchmark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870112738


   @ursabot please benchmark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r654451587



##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -2156,5 +2187,169 @@ TEST(Bitmap, VisitWordsAnd) {
   }
 }
 
+void random_bool_vector(std::vector<bool>& vec, int64_t size, double p = 0.5) {
+  vec.reserve(size);
+  std::random_device rd;
+  std::mt19937 gen(rd());
+  std::bernoulli_distribution d(p);
+
+  for (int n = 0; n < size; ++n) {
+    vec.push_back(d(gen));
+  }
+}
+
+std::string VectorToString(const std::vector<bool>& v) {
+  std::string out(v.size() + +((v.size() - 1) / 8), ' ');
+  for (size_t i = 0; i < v.size(); ++i) {
+    out[i + (i / 8)] = v[i] ? '1' : '0';
+  }
+  return out;
+}
+
+void VerifyBoolVectorAndBitmap(const Bitmap& bitmap, const std::vector<bool>& expected) {
+  arrow::BooleanBuilder boolean_builder;
+  ASSERT_OK(boolean_builder.AppendValues(expected));
+  ASSERT_OK_AND_ASSIGN(auto arr, boolean_builder.Finish());
+
+  ASSERT_TRUE(BitmapEquals(bitmap.buffer()->data(), bitmap.offset(),
+                           arr->data()->buffers[1]->data(), 0, expected.size()))
+      << "exp: " << VectorToString(expected) << "\ngot: " << bitmap.ToString();
+}
+
+class TestBitmapVisitAndWriteOutputNoOffset : public ::testing::TestWithParam<int32_t> {};
+
+TEST_P(TestBitmapVisitAndWriteOutputNoOffset, Test1) {
+  auto part = GetParam();
+  int64_t bits = 4 * part;
+  std::vector<bool> data;
+  random_bool_vector(data, bits);

Review comment:
       I dont know why I didn't do that previously! :joy: 

##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
   Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
   Bitmap left_valid = GetBitmap(left_d, 0);
   Bitmap right_valid = GetBitmap(right_d, 0);
-  // sometimes Bitmaps will be ignored, in which case we replace access to them with
-  // duplicated (probably elided) access to cond_data
-  const Bitmap& _ = cond_data;
-
-  // lambda function that will be used inside the visitor
-  uint64_t* out_validity = nullptr;
-  int64_t i = 0;
-  auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
-                   uint64_t r_valid) {
-    out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
-    i++;
-  };
 
   // cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
   // In the following cases, we dont need to allocate out_valid bitmap

Review comment:
       One thing to note here is, `if-else` kernels have the following, 
   ```c++
       kernel.null_handling = NullHandling::COMPUTED_NO_PREALLOCATE;
       kernel.mem_allocation = MemAllocation::NO_PREALLOCATE;
   ```
   So, I allocating buffers are entirely handled by the kernel. So, I believe writing into slices should work without a problem IMO. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982118


   @ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-scalar-boolean-benchmark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870774320


   > ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec    -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}
   
   Regression for `ArrayArrayKernel<KleeneAnd>/32768/0` and `ArrayArrayKernel<KleeneAnd>/524288/0` is expected because we are populating a validity buffer always now, because the `exec` infrastructure always allocates memory for the validity buffer. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r654838986



##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
   Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
   Bitmap left_valid = GetBitmap(left_d, 0);
   Bitmap right_valid = GetBitmap(right_d, 0);
-  // sometimes Bitmaps will be ignored, in which case we replace access to them with
-  // duplicated (probably elided) access to cond_data
-  const Bitmap& _ = cond_data;
-
-  // lambda function that will be used inside the visitor
-  uint64_t* out_validity = nullptr;
-  int64_t i = 0;
-  auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
-                   uint64_t r_valid) {
-    out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
-    i++;
-  };
 
   // cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
   // In the following cases, we dont need to allocate out_valid bitmap

Review comment:
       Got it! AFAIU only the fixed sized typed `if-else` kernels qualify for `can_write_into_slices` category then, isn't it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861738074


   Benchmark runs are scheduled for baseline = 39dcb43dd26df51391f0da14b6e6285d612f3829 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/7c809fd9d9af45b6ac360f31063f84d2...063924f33a8e4656b1ce37a3e9229ee8/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/ea65cb2d35d741f0abaa1cc1e5481373...ba93afcabf9643169ffcd668f08c85ba/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/086d84702ca54d9a8d3978d18b4ec2f6...b76a1d0626d44a77a7dac62cc0a234c6/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861738074


   Benchmark runs are scheduled for baseline = 39dcb43dd26df51391f0da14b6e6285d612f3829 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/7c809fd9d9af45b6ac360f31063f84d2...063924f33a8e4656b1ce37a3e9229ee8/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/ea65cb2d35d741f0abaa1cc1e5481373...ba93afcabf9643169ffcd668f08c85ba/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/086d84702ca54d9a8d3978d18b4ec2f6...b76a1d0626d44a77a7dac62cc0a234c6/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982633


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 788bd495a8ca8c180355e9066387824fc972d734. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Only ['lang', 'name'] filters are supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...f1f573636c5940ad9704c7af4a54c280/)
   [Skipped :warning: Only ['lang', 'name'] filters are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...379b66a7f5e0453c8a2c99fd6fe70503/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...38bfe9c78594477a955b82ca1a40a3fd/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022798


   @ursabot please benchmark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299350


   @ursabot please benchmark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299978


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89927876215449bcb32cfb06a3abdc28/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...5d8b14de410341159529bec47a1e8e90/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...62ad9efc3cba4d1b90cb7a5386ba261a/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-859187060


   @github-actions autotune 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870113163


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89f7711b6599405da0c61bdebf2bffe0/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...755e6720036342f3ad213715e1df19b2/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...ba6f391352984a39a9a0e82aed658680/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022871


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...a877957e35804b7197350884401e0e8c/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...b65a39a10c8a4c1ea3ae7751947bf36b/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...0e500f8140fb46e9bc9371d490d0fc61/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869980146






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869980146


   @bkietz I added the changes we discussed. Following are the latest bench results in my machine. 
   ```
   Results for master commit: c7c959a26a6512b0ad078a06df474617f1b306aa
   ----------------------------------------------------------------------------------------------------
   Benchmark                                          Time             CPU   Iterations UserCounters...
   ----------------------------------------------------------------------------------------------------
   ArrayArrayKernel<And>/32768/10000               9.21 us         9.21 us        69999 bytes_per_second=3.31485G/s items_per_second=28.4743G/s null_percent=0.01 size=32.768k
   ArrayArrayKernel<And>/32768/100                 9.11 us         9.11 us        74446 bytes_per_second=3.35138G/s items_per_second=28.7882G/s null_percent=1 size=32.768k
   ArrayArrayKernel<And>/32768/10                  8.06 us         8.06 us       103589 bytes_per_second=3.78621G/s items_per_second=32.5233G/s null_percent=10 size=32.768k
   ArrayArrayKernel<And>/32768/2                   7.27 us         7.27 us        96371 bytes_per_second=4.19765G/s items_per_second=36.0575G/s null_percent=50 size=32.768k
   ArrayArrayKernel<And>/32768/1                   8.96 us         8.95 us        91746 bytes_per_second=3.40815G/s items_per_second=29.2758G/s null_percent=100 size=32.768k
   ArrayArrayKernel<And>/32768/0                   7.87 us         7.87 us        78959 bytes_per_second=3.87712G/s items_per_second=33.3042G/s null_percent=0 size=32.768k
   ArrayArrayKernel<And>/1048576/10000              335 us          335 us         2080 bytes_per_second=2.91231G/s items_per_second=25.0165G/s null_percent=0.01 size=1048.58k
   ArrayArrayKernel<And>/1048576/100                334 us          334 us         2098 bytes_per_second=2.92463G/s items_per_second=25.1224G/s null_percent=1 size=1048.58k
   ArrayArrayKernel<And>/1048576/10                 336 us          336 us         2089 bytes_per_second=2.90452G/s items_per_second=24.9496G/s null_percent=10 size=1048.58k
   ArrayArrayKernel<And>/1048576/2                  336 us          336 us         2077 bytes_per_second=2.90715G/s items_per_second=24.9722G/s null_percent=50 size=1048.58k
   ArrayArrayKernel<And>/1048576/1                  238 us          238 us         2944 bytes_per_second=4.10794G/s items_per_second=35.2869G/s null_percent=100 size=1048.58k
   ArrayArrayKernel<And>/1048576/0                  239 us          239 us         2932 bytes_per_second=4.0919G/s items_per_second=35.1491G/s null_percent=0 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/32768/10000         15.7 us         15.7 us        44167 bytes_per_second=1.9399G/s items_per_second=16.6637G/s null_percent=0.01 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/100           15.7 us         15.7 us        44447 bytes_per_second=1.94457G/s items_per_second=16.7038G/s null_percent=1 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/10            15.7 us         15.7 us        44661 bytes_per_second=1.94512G/s items_per_second=16.7085G/s null_percent=10 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/2             15.7 us         15.7 us        44260 bytes_per_second=1.94383G/s items_per_second=16.6974G/s null_percent=50 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/1             15.6 us         15.6 us        43756 bytes_per_second=1.95105G/s items_per_second=16.7594G/s null_percent=100 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/0             5.28 us         5.28 us       136108 bytes_per_second=5.78137G/s items_per_second=49.6616G/s null_percent=0 size=32.768k
   ArrayArrayKernel<KleeneAnd>/1048576/10000        483 us          483 us         1447 bytes_per_second=2.02148G/s items_per_second=17.3643G/s null_percent=0.01 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/100          484 us          484 us         1447 bytes_per_second=2.0196G/s items_per_second=17.3482G/s null_percent=1 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/10           483 us          483 us         1445 bytes_per_second=2.02036G/s items_per_second=17.3548G/s null_percent=10 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/2            484 us          484 us         1448 bytes_per_second=2.0172G/s items_per_second=17.3276G/s null_percent=50 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/1            484 us          484 us         1448 bytes_per_second=2.01958G/s items_per_second=17.3481G/s null_percent=100 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/0            198 us          198 us         3541 bytes_per_second=4.93652G/s items_per_second=42.4044G/s null_percent=0 size=1048.58k
   
   
   
   Results for the PR:
   
   ----------------------------------------------------------------------------------------------------
   Benchmark                                          Time             CPU   Iterations UserCounters...
   ----------------------------------------------------------------------------------------------------
   ArrayArrayKernel<And>/32768/10000               8.64 us         8.64 us        82671 bytes_per_second=3.53361G/s items_per_second=30.3535G/s null_percent=0.01 size=32.768k
   ArrayArrayKernel<And>/32768/100                 10.4 us         10.2 us        68076 bytes_per_second=2.99687G/s items_per_second=25.7429G/s null_percent=1 size=32.768k
   ArrayArrayKernel<And>/32768/10                  10.4 us         10.4 us        64431 bytes_per_second=2.94554G/s items_per_second=25.302G/s null_percent=10 size=32.768k
   ArrayArrayKernel<And>/32768/2                   7.92 us         7.92 us       102414 bytes_per_second=3.85474G/s items_per_second=33.1119G/s null_percent=50 size=32.768k
   ArrayArrayKernel<And>/32768/1                   7.37 us         7.37 us        97390 bytes_per_second=4.14169G/s items_per_second=35.5769G/s null_percent=100 size=32.768k
   ArrayArrayKernel<And>/32768/0                   6.48 us         6.48 us       106870 bytes_per_second=4.71213G/s items_per_second=40.4769G/s null_percent=0 size=32.768k
   ArrayArrayKernel<And>/1048576/10000              336 us          336 us         2086 bytes_per_second=2.90622G/s items_per_second=24.9642G/s null_percent=0.01 size=1048.58k
   ArrayArrayKernel<And>/1048576/100                333 us          333 us         2091 bytes_per_second=2.93627G/s items_per_second=25.2224G/s null_percent=1 size=1048.58k
   ArrayArrayKernel<And>/1048576/10                 333 us          333 us         2101 bytes_per_second=2.93204G/s items_per_second=25.186G/s null_percent=10 size=1048.58k
   ArrayArrayKernel<And>/1048576/2                  333 us          333 us         2104 bytes_per_second=2.93041G/s items_per_second=25.172G/s null_percent=50 size=1048.58k
   ArrayArrayKernel<And>/1048576/1                  238 us          238 us         2947 bytes_per_second=4.10724G/s items_per_second=35.2809G/s null_percent=100 size=1048.58k
   ArrayArrayKernel<And>/1048576/0                  239 us          239 us         2939 bytes_per_second=4.09211G/s items_per_second=35.1509G/s null_percent=0 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/32768/10000         13.4 us         13.4 us        52481 bytes_per_second=2.28086G/s items_per_second=19.5925G/s null_percent=0.01 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/100           13.3 us         13.3 us        52642 bytes_per_second=2.28768G/s items_per_second=19.651G/s null_percent=1 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/10            13.3 us         13.3 us        52251 bytes_per_second=2.29669G/s items_per_second=19.7284G/s null_percent=10 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/2             13.3 us         13.3 us        52684 bytes_per_second=2.29603G/s items_per_second=19.7227G/s null_percent=50 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/1             13.4 us         13.4 us        52555 bytes_per_second=2.28596G/s items_per_second=19.6362G/s null_percent=100 size=32.768k
   ArrayArrayKernel<KleeneAnd>/32768/0             6.10 us         6.10 us       118983 bytes_per_second=5.00416G/s items_per_second=42.9854G/s null_percent=0 size=32.768k
   ArrayArrayKernel<KleeneAnd>/1048576/10000        388 us          388 us         1807 bytes_per_second=2.51753G/s items_per_second=21.6254G/s null_percent=0.01 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/100          389 us          389 us         1805 bytes_per_second=2.50864G/s items_per_second=21.5491G/s null_percent=1 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/10           390 us          390 us         1804 bytes_per_second=2.50681G/s items_per_second=21.5333G/s null_percent=10 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/2            391 us          391 us         1803 bytes_per_second=2.49759G/s items_per_second=21.4541G/s null_percent=50 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/1            388 us          388 us         1807 bytes_per_second=2.51458G/s items_per_second=21.6001G/s null_percent=100 size=1048.58k
   ArrayArrayKernel<KleeneAnd>/1048576/0            238 us          238 us         2939 bytes_per_second=4.11143G/s items_per_second=35.3169G/s null_percent=0 size=1048.58k
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870774752


   @nirandaperera I see, thanks for the insight.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870773753


   There are more regressions with gcc 9, though:
   ```
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (2)
   ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              benchmark      baseline     contender  change %                                                                                                                                               counters
   ArrayArrayKernel<KleeneAnd>/524288/1 2.149 GiB/sec 2.318 GiB/sec     7.887 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3713, 'null_percent': 100.0}
   ArrayArrayKernel<KleeneAnd>/524288/2 2.249 GiB/sec 2.262 GiB/sec     0.569  {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4816, 'null_percent': 50.0}
   
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Regressions: (10)
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                  benchmark       baseline      contender  change %                                                                                                                                                  counters
        ArrayArrayKernel<KleeneAnd>/32768/0  7.645 GiB/sec  6.972 GiB/sec    -8.804     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 178864, 'null_percent': 0.0}
       ArrayArrayKernel<KleeneAnd>/524288/0 16.468 GiB/sec 13.337 GiB/sec   -19.014     {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 22855, 'null_percent': 0.0}
      ArrayArrayKernel<KleeneAnd>/524288/10  2.801 GiB/sec  2.040 GiB/sec   -27.172    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4221, 'null_percent': 10.0}
     ArrayArrayKernel<KleeneAnd>/524288/100  2.984 GiB/sec  2.147 GiB/sec   -28.031    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5191, 'null_percent': 1.0}
   ArrayArrayKernel<KleeneAnd>/524288/10000  3.224 GiB/sec  2.250 GiB/sec   -30.221 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5014, 'null_percent': 0.01}
       ArrayArrayKernel<KleeneAnd>/32768/10  2.991 GiB/sec  1.970 GiB/sec   -34.143    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68647, 'null_percent': 10.0}
    ArrayArrayKernel<KleeneAnd>/32768/10000  2.991 GiB/sec  1.964 GiB/sec   -34.335 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68084, 'null_percent': 0.01}
        ArrayArrayKernel<KleeneAnd>/32768/2  2.999 GiB/sec  1.969 GiB/sec   -34.335     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68549, 'null_percent': 50.0}
        ArrayArrayKernel<KleeneAnd>/32768/1  2.997 GiB/sec  1.966 GiB/sec   -34.402    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68530, 'null_percent': 100.0}
      ArrayArrayKernel<KleeneAnd>/32768/100  3.000 GiB/sec  1.967 GiB/sec   -34.414    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68686, 'null_percent': 1.0}
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861738074


   Benchmark runs are scheduled for baseline = 39dcb43dd26df51391f0da14b6e6285d612f3829 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/7c809fd9d9af45b6ac360f31063f84d2...063924f33a8e4656b1ce37a3e9229ee8/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/ea65cb2d35d741f0abaa1cc1e5481373...ba93afcabf9643169ffcd668f08c85ba/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/086d84702ca54d9a8d3978d18b4ec2f6...b76a1d0626d44a77a7dac62cc0a234c6/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861738074


   Benchmark runs are scheduled for baseline = 39dcb43dd26df51391f0da14b6e6285d612f3829 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/7c809fd9d9af45b6ac360f31063f84d2...063924f33a8e4656b1ce37a3e9229ee8/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/ea65cb2d35d741f0abaa1cc1e5481373...ba93afcabf9643169ffcd668f08c85ba/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/086d84702ca54d9a8d3978d18b4ec2f6...b76a1d0626d44a77a7dac62cc0a234c6/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r655697111



##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
   Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
   Bitmap left_valid = GetBitmap(left_d, 0);
   Bitmap right_valid = GetBitmap(right_d, 0);
-  // sometimes Bitmaps will be ignored, in which case we replace access to them with
-  // duplicated (probably elided) access to cond_data
-  const Bitmap& _ = cond_data;
-
-  // lambda function that will be used inside the visitor
-  uint64_t* out_validity = nullptr;
-  int64_t i = 0;
-  auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
-                   uint64_t r_valid) {
-    out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
-    i++;
-  };
 
   // cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
   // In the following cases, we dont need to allocate out_valid bitmap

Review comment:
       @bkietz I have added the changes required. Could you please review the changes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r654669300



##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
   Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
   Bitmap left_valid = GetBitmap(left_d, 0);
   Bitmap right_valid = GetBitmap(right_d, 0);
-  // sometimes Bitmaps will be ignored, in which case we replace access to them with
-  // duplicated (probably elided) access to cond_data
-  const Bitmap& _ = cond_data;
-
-  // lambda function that will be used inside the visitor
-  uint64_t* out_validity = nullptr;
-  int64_t i = 0;
-  auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
-                   uint64_t r_valid) {
-    out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
-    i++;
-  };
 
   // cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
   // In the following cases, we dont need to allocate out_valid bitmap

Review comment:
       Writing into slices of a preallocated output cannot proceed if the kernel does its own allocation of output: https://github.com/bkietz/arrow/blob/282c5aa713011209d6aa76e2ef5996a17161575d/cpp/src/arrow/compute/exec.cc#L703-L711
   
   Effectively, `can_write_into_slices` will be ignored until you have
   ```c++
   kernel.null_handling = NullHandling::COMPUTED_PREALLOCATE;
   kernel.mem_allocation = MemAllocation::PREALLOCATE;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

pitrou edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870768509


   Locally, I seem to get varying results from run to run (and also depending on the compiler), but `archery benchmark diff` doesn't show very worrying regressions with clang 10:
   ```
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (11)
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                  benchmark      baseline     contender  change %                                                                                                                                                  counters
       ArrayArrayKernel<KleeneAnd>/524288/1 4.309 GiB/sec 5.868 GiB/sec    36.176    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 100.0}
      ArrayArrayKernel<KleeneAnd>/524288/10 4.337 GiB/sec 5.888 GiB/sec    35.749    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 10.0}
     ArrayArrayKernel<KleeneAnd>/524288/100 4.335 GiB/sec 5.850 GiB/sec    34.939    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6232, 'null_percent': 1.0}
   ArrayArrayKernel<KleeneAnd>/524288/10000 4.318 GiB/sec 5.755 GiB/sec    33.290 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6221, 'null_percent': 0.01}
       ArrayArrayKernel<KleeneAnd>/524288/2 4.365 GiB/sec 5.762 GiB/sec    31.994     {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6200, 'null_percent': 50.0}
        ArrayArrayKernel<KleeneAnd>/32768/1 3.204 GiB/sec 4.220 GiB/sec    31.726    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 75304, 'null_percent': 100.0}
        ArrayArrayKernel<KleeneAnd>/32768/2 3.370 GiB/sec 4.212 GiB/sec    24.961     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77611, 'null_percent': 50.0}
    ArrayArrayKernel<KleeneAnd>/32768/10000 3.360 GiB/sec 4.197 GiB/sec    24.906 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76897, 'null_percent': 0.01}
       ArrayArrayKernel<KleeneAnd>/32768/10 3.373 GiB/sec 4.208 GiB/sec    24.745    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76903, 'null_percent': 10.0}
      ArrayArrayKernel<KleeneAnd>/32768/100 3.383 GiB/sec 4.202 GiB/sec    24.212    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77559, 'null_percent': 1.0}
        ArrayArrayKernel<KleeneAnd>/32768/0 7.838 GiB/sec 7.956 GiB/sec     1.499     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 187182, 'null_percent': 0.0}
   
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Regressions: (1)
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              benchmark       baseline      contender  change %                                                                                                                                              counters
   ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec    -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}
   ```
   
   (on Ubuntu 20.04, AMD Ryzen 3900)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870800492


   I got archery running on my machine and I can confirm that gcc-9 is the problem there. If I use clang-10, it shows a better performance. But gcc-9 shows a lot of regressions. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r660081336



##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else_test.cc
##########
@@ -56,8 +57,12 @@ TYPED_TEST(TestIfElsePrimitive, IfElseFixedSizeRand) {
 
   random::RandomArrayGenerator rand(/*seed=*/0);
   int64_t len = 1000;
-  auto cond = std::static_pointer_cast<BooleanArray>(
-      rand.ArrayOf(boolean(), len, /*null_probability=*/0.01));
+  ASSERT_OK_AND_ASSIGN(auto temp1, MakeArrayFromScalar(BooleanScalar(true), 64));

Review comment:
       Please add a comment describing what's useful about this construction

##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +243,132 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
     return min_offset;
   }
 
+  template <size_t N, size_t M, typename ReaderT, typename WriterT, typename Visitor,
+            typename Word = typename std::decay<
+                internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+  static void RunVisitWordsAndWriteLoop(int64_t bit_length,
+                                        std::array<ReaderT, N>& readers,
+                                        std::array<WriterT, M>& writers,
+                                        Visitor&& visitor) {
+    constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+    std::array<Word, N> visited_words;
+    std::array<Word, M> output_words;
+
+    // every reader will have same number of words, since they are same length'ed
+    // TODO($JIRA) this will be inefficient in some cases. When there are offsets beyond
+    //  Word boundary, every Word would have to be created from 2 adjoining Words
+    auto n_words = readers[0].words();
+    bit_length -= n_words * kBitWidth;
+    while (n_words--) {
+      // first collect all words to visited_words array
+      for (size_t i = 0; i < N; i++) {
+        visited_words[i] = readers[i].NextWord();
+      }
+      visitor(visited_words, &output_words);
+      for (size_t i = 0; i < M; i++) {
+        writers[i].PutNextWord(output_words[i]);
+      }
+    }
+
+    // every reader will have same number of trailing bytes, because of the above reason
+    // tailing portion could be more than one word! (ref: BitmapWordReader constructor)
+    // remaining full/ partial words to write
+
+    if (bit_length) {
+      // convert the word visitor lambda to a byte_visitor
+      auto byte_visitor = [&](const std::array<uint8_t, N>& in,
+                              std::array<uint8_t, M>* out) {
+        std::array<Word, N> in_words;
+        std::array<Word, M> out_words;
+        std::copy(in.begin(), in.end(), in_words.begin());
+        visitor(in_words, &out_words);
+        for (size_t i = 0; i < M; i++) {
+          out->at(i) = static_cast<uint8_t>(out_words[i]);
+        }
+      };
+
+      std::array<uint8_t, N> visited_bytes;
+      std::array<uint8_t, M> output_bytes;
+      int n_bytes = readers[0].trailing_bytes();
+      while (n_bytes--) {
+        visited_bytes.fill(0);
+        output_bytes.fill(0);
+        int valid_bits;
+        for (size_t i = 0; i < N; i++) {
+          visited_bytes[i] = readers[i].NextTrailingByte(valid_bits);
+        }
+        byte_visitor(visited_bytes, &output_bytes);
+        for (size_t i = 0; i < M; i++) {
+          writers[i].PutNextTrailingByte(output_bytes[i], valid_bits);
+        }
+      }
+    }
+  }
+
+  /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+  /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+  ///
+  /// All bitmaps must have identical length. The first bit in a visited bitmap
+  /// may be offset within the first visited word, but words will otherwise contain
+  /// densely packed bits loaded from the bitmap. That offset within the first word is
+  /// returned.
+  /// Visitor is expected to have the following signature
+  ///     [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+  ///
+  // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+  // It also has a large prolog / epilog overhead and should be used
+  // carefully in other cases.
+  // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+  // and BitmapUInt64Reader.
+  template <size_t N, size_t M, typename Visitor,
+            typename Word = typename std::decay<
+                internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+  static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+                                 std::array<Bitmap, M>* out_bitmaps_arg,
+                                 Visitor&& visitor) {
+    int64_t bit_length = BitLength(bitmaps_arg);
+    assert(bit_length == BitLength(*out_bitmaps_arg));
+
+    // if both input and output bitmaps have no byte offset, then use special template
+    if (std::all_of(bitmaps_arg.begin(), bitmaps_arg.end(),
+                    [](const Bitmap& b) { return b.offset_ % 8 == 0; }) &&
+        std::all_of(out_bitmaps_arg->begin(), out_bitmaps_arg->end(),
+                    [](const Bitmap& b) { return b.offset_ % 8 == 0; })) {
+      std::array<BitmapWordReader<Word, /*may_have_byte_offset=*/false>, N> readers;
+      for (size_t i = 0; i < N; ++i) {
+        const Bitmap& in_bitmap = bitmaps_arg[i];
+        readers[i] = BitmapWordReader<Word, /*may_have_byte_offset=*/false>(
+            in_bitmap.buffer_->data(), in_bitmap.offset_, in_bitmap.length_);
+      }
+
+      std::array<BitmapWordWriter<Word, /*may_have_byte_offset=*/false>, M> writers;
+      for (size_t i = 0; i < M; ++i) {
+        const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+        writers[i] = BitmapWordWriter<Word, /*may_have_byte_offset=*/false>(
+            out_bitmap.buffer_->mutable_data(), out_bitmap.offset_, out_bitmap.length_);
+      }
+
+      RunVisitWordsAndWriteLoop(bit_length, readers, writers, std::move(visitor));
+    } else {
+      std::array<BitmapWordReader<Word>, N> readers;
+      for (size_t i = 0; i < N; ++i) {
+        const Bitmap& in_bitmap = bitmaps_arg[i];
+        readers[i] = BitmapWordReader<Word>(in_bitmap.buffer_->data(), in_bitmap.offset_,
+                                            in_bitmap.length_);
+      }
+
+      std::array<BitmapWordWriter<Word>, M> writers;
+      for (size_t i = 0; i < M; ++i) {
+        const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+        writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+                                            out_bitmap.offset_, out_bitmap.length_);
+      }
+
+      RunVisitWordsAndWriteLoop(bit_length, readers, writers, std::move(visitor));

Review comment:
       ```suggestion
         RunVisitWordsAndWriteLoop(bit_length, readers, writers, visitor);
   ```

##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,37 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
 ARROW_EXPORT
 void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
 
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// \brief Clears all bits in the bitmap (set to false)
+ARROW_EXPORT
+void ClearBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// Returns a mask with lower i bits set to 1. If i >= sizeof(Word)*8, all-ones will be
+/// returned
+/// ex:
+/// PrecedingWordBitmask<uint_8>(0)= 0x00
+/// PrecedingWordBitmask<uint_8>(4)= 0x0f
+/// PrecedingWordBitmask<uint_8>(8)= 0xff
+/// PrecedingWordBitmask<uint_32>(8)= 0x00ff
+/// ref: https://stackoverflow.com/a/59523400
+template <typename Word>
+constexpr Word PrecedingWordBitmask(unsigned int const i) {
+  return (static_cast<Word>(i < sizeof(Word) * 8) << (i & (sizeof(Word) * 8 - 1))) - 1;
+}

Review comment:
       Since this is constexpr, the examples can be written as static assertions
   ```suggestion
   /// ref: https://stackoverflow.com/a/59523400
   template <typename Word>
   constexpr Word PrecedingWordBitmask(unsigned int const i) {
     return (static_cast<Word>(i < sizeof(Word) * 8) << (i & (sizeof(Word) * 8 - 1))) - 1;
   }
   static_assert(PrecedingWordBitmask<uint8_t>(0) == 0x00, "");
   static_assert(PrecedingWordBitmask<uint8_t>(4) == 0x0f, "");
   static_assert(PrecedingWordBitmask<uint8_t>(8) == 0xff, "");
   static_assert(PrecedingWordBitmask<uint16_t>(8) == 0x00ff, "");
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870113163


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89f7711b6599405da0c61bdebf2bffe0/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...755e6720036342f3ad213715e1df19b2/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...ba6f391352984a39a9a0e82aed658680/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870768509


   Locally, I seem to get varying results from run to run (and also depending on the compiler), but `archery benchmark diff` doesn't show very worrying regressions:
   ```
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Non-regressions: (11)
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                  benchmark      baseline     contender  change %                                                                                                                                                  counters
       ArrayArrayKernel<KleeneAnd>/524288/1 4.309 GiB/sec 5.868 GiB/sec    36.176    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 100.0}
      ArrayArrayKernel<KleeneAnd>/524288/10 4.337 GiB/sec 5.888 GiB/sec    35.749    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 10.0}
     ArrayArrayKernel<KleeneAnd>/524288/100 4.335 GiB/sec 5.850 GiB/sec    34.939    {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6232, 'null_percent': 1.0}
   ArrayArrayKernel<KleeneAnd>/524288/10000 4.318 GiB/sec 5.755 GiB/sec    33.290 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6221, 'null_percent': 0.01}
       ArrayArrayKernel<KleeneAnd>/524288/2 4.365 GiB/sec 5.762 GiB/sec    31.994     {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6200, 'null_percent': 50.0}
        ArrayArrayKernel<KleeneAnd>/32768/1 3.204 GiB/sec 4.220 GiB/sec    31.726    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 75304, 'null_percent': 100.0}
        ArrayArrayKernel<KleeneAnd>/32768/2 3.370 GiB/sec 4.212 GiB/sec    24.961     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77611, 'null_percent': 50.0}
    ArrayArrayKernel<KleeneAnd>/32768/10000 3.360 GiB/sec 4.197 GiB/sec    24.906 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76897, 'null_percent': 0.01}
       ArrayArrayKernel<KleeneAnd>/32768/10 3.373 GiB/sec 4.208 GiB/sec    24.745    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76903, 'null_percent': 10.0}
      ArrayArrayKernel<KleeneAnd>/32768/100 3.383 GiB/sec 4.202 GiB/sec    24.212    {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77559, 'null_percent': 1.0}
        ArrayArrayKernel<KleeneAnd>/32768/0 7.838 GiB/sec 7.956 GiB/sec     1.499     {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 187182, 'null_percent': 0.0}
   
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Regressions: (1)
   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              benchmark       baseline      contender  change %                                                                                                                                              counters
   ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec    -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}
   ```
   
   (on Ubuntu 20.04, AMD Ryzen 3900)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870774464


   What's weird as well is that, sometimes, L2-sized benchmarks are faster than L1-sized, but sometimes they are slower.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299978


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89927876215449bcb32cfb06a3abdc28/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...5d8b14de410341159529bec47a1e8e90/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...62ad9efc3cba4d1b90cb7a5386ba261a/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299978


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89927876215449bcb32cfb06a3abdc28/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...5d8b14de410341159529bec47a1e8e90/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...62ad9efc3cba4d1b90cb7a5386ba261a/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r660081336



##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else_test.cc
##########
@@ -56,8 +57,12 @@ TYPED_TEST(TestIfElsePrimitive, IfElseFixedSizeRand) {
 
   random::RandomArrayGenerator rand(/*seed=*/0);
   int64_t len = 1000;
-  auto cond = std::static_pointer_cast<BooleanArray>(
-      rand.ArrayOf(boolean(), len, /*null_probability=*/0.01));
+  ASSERT_OK_AND_ASSIGN(auto temp1, MakeArrayFromScalar(BooleanScalar(true), 64));

Review comment:
       Please add a comment describing what's useful about this construction

##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +243,132 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
     return min_offset;
   }
 
+  template <size_t N, size_t M, typename ReaderT, typename WriterT, typename Visitor,
+            typename Word = typename std::decay<
+                internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+  static void RunVisitWordsAndWriteLoop(int64_t bit_length,
+                                        std::array<ReaderT, N>& readers,
+                                        std::array<WriterT, M>& writers,
+                                        Visitor&& visitor) {
+    constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+    std::array<Word, N> visited_words;
+    std::array<Word, M> output_words;
+
+    // every reader will have same number of words, since they are same length'ed
+    // TODO($JIRA) this will be inefficient in some cases. When there are offsets beyond
+    //  Word boundary, every Word would have to be created from 2 adjoining Words
+    auto n_words = readers[0].words();
+    bit_length -= n_words * kBitWidth;
+    while (n_words--) {
+      // first collect all words to visited_words array
+      for (size_t i = 0; i < N; i++) {
+        visited_words[i] = readers[i].NextWord();
+      }
+      visitor(visited_words, &output_words);
+      for (size_t i = 0; i < M; i++) {
+        writers[i].PutNextWord(output_words[i]);
+      }
+    }
+
+    // every reader will have same number of trailing bytes, because of the above reason
+    // tailing portion could be more than one word! (ref: BitmapWordReader constructor)
+    // remaining full/ partial words to write
+
+    if (bit_length) {
+      // convert the word visitor lambda to a byte_visitor
+      auto byte_visitor = [&](const std::array<uint8_t, N>& in,
+                              std::array<uint8_t, M>* out) {
+        std::array<Word, N> in_words;
+        std::array<Word, M> out_words;
+        std::copy(in.begin(), in.end(), in_words.begin());
+        visitor(in_words, &out_words);
+        for (size_t i = 0; i < M; i++) {
+          out->at(i) = static_cast<uint8_t>(out_words[i]);
+        }
+      };
+
+      std::array<uint8_t, N> visited_bytes;
+      std::array<uint8_t, M> output_bytes;
+      int n_bytes = readers[0].trailing_bytes();
+      while (n_bytes--) {
+        visited_bytes.fill(0);
+        output_bytes.fill(0);
+        int valid_bits;
+        for (size_t i = 0; i < N; i++) {
+          visited_bytes[i] = readers[i].NextTrailingByte(valid_bits);
+        }
+        byte_visitor(visited_bytes, &output_bytes);
+        for (size_t i = 0; i < M; i++) {
+          writers[i].PutNextTrailingByte(output_bytes[i], valid_bits);
+        }
+      }
+    }
+  }
+
+  /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+  /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+  ///
+  /// All bitmaps must have identical length. The first bit in a visited bitmap
+  /// may be offset within the first visited word, but words will otherwise contain
+  /// densely packed bits loaded from the bitmap. That offset within the first word is
+  /// returned.
+  /// Visitor is expected to have the following signature
+  ///     [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+  ///
+  // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+  // It also has a large prolog / epilog overhead and should be used
+  // carefully in other cases.
+  // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+  // and BitmapUInt64Reader.
+  template <size_t N, size_t M, typename Visitor,
+            typename Word = typename std::decay<
+                internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+  static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+                                 std::array<Bitmap, M>* out_bitmaps_arg,
+                                 Visitor&& visitor) {
+    int64_t bit_length = BitLength(bitmaps_arg);
+    assert(bit_length == BitLength(*out_bitmaps_arg));
+
+    // if both input and output bitmaps have no byte offset, then use special template
+    if (std::all_of(bitmaps_arg.begin(), bitmaps_arg.end(),
+                    [](const Bitmap& b) { return b.offset_ % 8 == 0; }) &&
+        std::all_of(out_bitmaps_arg->begin(), out_bitmaps_arg->end(),
+                    [](const Bitmap& b) { return b.offset_ % 8 == 0; })) {
+      std::array<BitmapWordReader<Word, /*may_have_byte_offset=*/false>, N> readers;
+      for (size_t i = 0; i < N; ++i) {
+        const Bitmap& in_bitmap = bitmaps_arg[i];
+        readers[i] = BitmapWordReader<Word, /*may_have_byte_offset=*/false>(
+            in_bitmap.buffer_->data(), in_bitmap.offset_, in_bitmap.length_);
+      }
+
+      std::array<BitmapWordWriter<Word, /*may_have_byte_offset=*/false>, M> writers;
+      for (size_t i = 0; i < M; ++i) {
+        const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+        writers[i] = BitmapWordWriter<Word, /*may_have_byte_offset=*/false>(
+            out_bitmap.buffer_->mutable_data(), out_bitmap.offset_, out_bitmap.length_);
+      }
+
+      RunVisitWordsAndWriteLoop(bit_length, readers, writers, std::move(visitor));
+    } else {
+      std::array<BitmapWordReader<Word>, N> readers;
+      for (size_t i = 0; i < N; ++i) {
+        const Bitmap& in_bitmap = bitmaps_arg[i];
+        readers[i] = BitmapWordReader<Word>(in_bitmap.buffer_->data(), in_bitmap.offset_,
+                                            in_bitmap.length_);
+      }
+
+      std::array<BitmapWordWriter<Word>, M> writers;
+      for (size_t i = 0; i < M; ++i) {
+        const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+        writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+                                            out_bitmap.offset_, out_bitmap.length_);
+      }
+
+      RunVisitWordsAndWriteLoop(bit_length, readers, writers, std::move(visitor));

Review comment:
       ```suggestion
         RunVisitWordsAndWriteLoop(bit_length, readers, writers, visitor);
   ```

##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,37 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
 ARROW_EXPORT
 void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
 
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// \brief Clears all bits in the bitmap (set to false)
+ARROW_EXPORT
+void ClearBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// Returns a mask with lower i bits set to 1. If i >= sizeof(Word)*8, all-ones will be
+/// returned
+/// ex:
+/// PrecedingWordBitmask<uint_8>(0)= 0x00
+/// PrecedingWordBitmask<uint_8>(4)= 0x0f
+/// PrecedingWordBitmask<uint_8>(8)= 0xff
+/// PrecedingWordBitmask<uint_32>(8)= 0x00ff
+/// ref: https://stackoverflow.com/a/59523400
+template <typename Word>
+constexpr Word PrecedingWordBitmask(unsigned int const i) {
+  return (static_cast<Word>(i < sizeof(Word) * 8) << (i & (sizeof(Word) * 8 - 1))) - 1;
+}

Review comment:
       Since this is constexpr, the examples can be written as static assertions
   ```suggestion
   /// ref: https://stackoverflow.com/a/59523400
   template <typename Word>
   constexpr Word PrecedingWordBitmask(unsigned int const i) {
     return (static_cast<Word>(i < sizeof(Word) * 8) << (i & (sizeof(Word) * 8 - 1))) - 1;
   }
   static_assert(PrecedingWordBitmask<uint8_t>(0) == 0x00, "");
   static_assert(PrecedingWordBitmask<uint8_t>(4) == 0x0f, "");
   static_assert(PrecedingWordBitmask<uint8_t>(8) == 0xff, "");
   static_assert(PrecedingWordBitmask<uint16_t>(8) == 0x00ff, "");
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r652054735



##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,29 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
 ARROW_EXPORT
 void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
 
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// \brief Clears all bits in the bitmap (set to false)
+ARROW_EXPORT
+void ClearBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+template <typename Word>
+constexpr Word WordBitMask(int i) {
+  return (static_cast<Word>(1) << i) - 1;
+}

Review comment:
       That will fix up this UBSAN failure, too: https://github.com/apache/arrow/pull/10487/checks?check_run_id=2822702761#step:8:3714




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022871


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...a877957e35804b7197350884401e0e8c/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...b65a39a10c8a4c1ea3ae7751947bf36b/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...0e500f8140fb46e9bc9371d490d0fc61/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] github-actions[bot] commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-857361898


   https://issues.apache.org/jira/browse/ARROW-13010


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz closed pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

bkietz closed pull request #10487:
URL: https://github.com/apache/arrow/pull/10487


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982633


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 788bd495a8ca8c180355e9066387824fc972d734. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Only ['lang', 'name'] filters are supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...f1f573636c5940ad9704c7af4a54c280/)
   [Skipped :warning: Only ['lang', 'name'] filters are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...379b66a7f5e0453c8a2c99fd6fe70503/)
   [Finished :arrow_down:0.25% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...38bfe9c78594477a955b82ca1a40a3fd/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982633






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870113163


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89f7711b6599405da0c61bdebf2bffe0/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...755e6720036342f3ad213715e1df19b2/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...ba6f391352984a39a9a0e82aed658680/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022871


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...a877957e35804b7197350884401e0e8c/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...b65a39a10c8a4c1ea3ae7751947bf36b/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...0e500f8140fb46e9bc9371d490d0fc61/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870927973


   Did some further analysis on this. It turns out that gcc-10 works much better than gcc-9. 
   https://gist.github.com/nirandaperera/0bcd40c223fd32105d027a86a571334f
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982633






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r655697111



##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
   Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
   Bitmap left_valid = GetBitmap(left_d, 0);
   Bitmap right_valid = GetBitmap(right_d, 0);
-  // sometimes Bitmaps will be ignored, in which case we replace access to them with
-  // duplicated (probably elided) access to cond_data
-  const Bitmap& _ = cond_data;
-
-  // lambda function that will be used inside the visitor
-  uint64_t* out_validity = nullptr;
-  int64_t i = 0;
-  auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
-                   uint64_t r_valid) {
-    out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
-    i++;
-  };
 
   // cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
   // In the following cases, we dont need to allocate out_valid bitmap

Review comment:
       @bkietz I have added the changes required. Could you please review the changes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299978


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89927876215449bcb32cfb06a3abdc28/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...5d8b14de410341159529bec47a1e8e90/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...62ad9efc3cba4d1b90cb7a5386ba261a/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r652003314



##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -67,5 +69,58 @@ void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_ar
   bits[bytes_end - 1] |= static_cast<uint8_t>(fill_byte & ~last_byte_mask);
 }
 
+template <bool value>
+void SetBitmapImpl(uint8_t* data, int64_t offset, int64_t length) {
+  //                 offset  length
+  // data              |<------------->|
+  //   |--------|...|--------|...|--------|
+  //                   |<--->|   |<--->|
+  //                     pro       epi
+  if (ARROW_PREDICT_FALSE(length == 0)) {
+    return;
+  }
+
+  constexpr uint8_t set_byte = value ? UINT8_MAX : 0;
+
+  int prologue = static_cast<int>(((offset + 7) / 8) * 8 - offset);
+  DCHECK_LT(prologue, 8);
+
+  if (length < prologue) {  // special case where a mask is required
+    //             offset length
+    // data             |<->|
+    //   |--------|...|--------|...
+    //         mask --> |111|
+    //                  |<---->|
+    //                     pro
+    uint8_t mask = BitUtil::kPrecedingBitmask[8 - prologue] ^
+                   BitUtil::kPrecedingBitmask[8 - prologue + length];
+    data[offset / 8] |= mask;
+    return;
+  }
+
+  // align to a byte boundary
+  data[offset / 8] = BitUtil::SpliceWord(offset, data[offset / 8], set_byte);

Review comment:
       ```suggestion
     data[offset / 8] = BitUtil::SpliceWord(prologue, data[offset / 8], set_byte);
   ```

##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -67,5 +69,58 @@ void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_ar
   bits[bytes_end - 1] |= static_cast<uint8_t>(fill_byte & ~last_byte_mask);
 }
 
+template <bool value>
+void SetBitmapImpl(uint8_t* data, int64_t offset, int64_t length) {
+  //                 offset  length
+  // data              |<------------->|
+  //   |--------|...|--------|...|--------|
+  //                   |<--->|   |<--->|
+  //                     pro       epi
+  if (ARROW_PREDICT_FALSE(length == 0)) {
+    return;
+  }
+
+  constexpr uint8_t set_byte = value ? UINT8_MAX : 0;
+
+  int prologue = static_cast<int>(((offset + 7) / 8) * 8 - offset);

Review comment:
       ```suggestion
     auto prologue = BitUtil::RoundUp(offset, 8) - offset;
   ```

##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -73,6 +76,11 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
     return Bitmap(buffer_, offset_ + offset, length);
   }
 
+  void Stride(int64_t stride) {
+    this->offset_ += stride;
+    this->length_ -= stride;
+  }
+

Review comment:
       ```suggestion
   ```
   
   This seems to be unused

##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -2156,5 +2187,169 @@ TEST(Bitmap, VisitWordsAnd) {
   }
 }
 
+void random_bool_vector(std::vector<bool>& vec, int64_t size, double p = 0.5) {
+  vec.reserve(size);
+  std::random_device rd;
+  std::mt19937 gen(rd());
+  std::bernoulli_distribution d(p);
+
+  for (int n = 0; n < size; ++n) {
+    vec.push_back(d(gen));
+  }
+}
+
+std::string VectorToString(const std::vector<bool>& v) {
+  std::string out(v.size() + +((v.size() - 1) / 8), ' ');
+  for (size_t i = 0; i < v.size(); ++i) {
+    out[i + (i / 8)] = v[i] ? '1' : '0';
+  }
+  return out;
+}
+
+void VerifyBoolVectorAndBitmap(const Bitmap& bitmap, const std::vector<bool>& expected) {
+  arrow::BooleanBuilder boolean_builder;
+  ASSERT_OK(boolean_builder.AppendValues(expected));
+  ASSERT_OK_AND_ASSIGN(auto arr, boolean_builder.Finish());
+
+  ASSERT_TRUE(BitmapEquals(bitmap.buffer()->data(), bitmap.offset(),
+                           arr->data()->buffers[1]->data(), 0, expected.size()))
+      << "exp: " << VectorToString(expected) << "\ngot: " << bitmap.ToString();
+}
+
+class TestBitmapVisitAndWriteOutputNoOffset : public ::testing::TestWithParam<int32_t> {};
+
+TEST_P(TestBitmapVisitAndWriteOutputNoOffset, Test1) {
+  auto part = GetParam();
+  int64_t bits = 4 * part;
+  std::vector<bool> data;
+  random_bool_vector(data, bits);
+
+  arrow::BooleanBuilder boolean_builder;
+  ASSERT_OK(boolean_builder.AppendValues(data));
+  ASSERT_OK_AND_ASSIGN(auto arrow_data, boolean_builder.Finish());
+
+  std::shared_ptr<Buffer>& arrow_buffer = arrow_data->data()->buffers[1];
+
+  Bitmap bm0(arrow_buffer, 0, part);
+  Bitmap bm1 = bm0.Slice(part * 1, part);  // this goes beyond bm0's len
+  Bitmap bm2 = bm0.Slice(part * 2, part);  // this goes beyond bm0's len
+
+  std::array<Bitmap, 2> out_bms;
+  ASSERT_OK_AND_ASSIGN(auto out0, AllocateBitmap(part));
+  ASSERT_OK_AND_ASSIGN(auto out1, AllocateBitmap(part));
+  out_bms[0] = Bitmap(out0, 0, part);
+  out_bms[1] = Bitmap(out1, 0, part);
+
+  std::vector<bool> v0(data.begin(), data.begin() + part);
+  std::vector<bool> v1(data.begin() + part * 1, data.begin() + part * 2);
+  std::vector<bool> v2(data.begin() + part * 2, data.begin() + part * 3);
+
+  // out0 = bm0 & bm1, out1= bm0 | bm2
+  std::array<Bitmap, 3> in_bms{bm0, bm1, bm2};
+  Bitmap::VisitWordsAndWrite(
+      in_bms, &out_bms,
+      [](const std::array<uint64_t, 3>& in, std::array<uint64_t, 2>* out) {
+        out->at(0) = in[0] & in[1];
+        out->at(1) = in[0] | in[2];
+      });
+
+  std::vector<bool> out_v0(part);
+  std::vector<bool> out_v1(part);
+  // v3 = v0 & v1
+  std::transform(v0.begin(), v0.end(), v1.begin(), out_v0.begin(),
+                 std::logical_and<bool>());
+  // v3 |= v2
+  std::transform(v0.begin(), v0.end(), v2.begin(), out_v1.begin(),
+                 std::logical_or<bool>());
+
+  //  std::cout << "v0: " << VectorToString(v0) << "\n"
+  //            << "b0: " << bm0.ToString() << "\n"
+  //            << "v1: " << VectorToString(v1) << "\n"
+  //            << "b1: " << bm1.ToString() << "\n"
+  //            << "v2: " << VectorToString(v2) << "\n"
+  //            << "b2: " << bm2.ToString() << "\n";
+

Review comment:
       ```suggestion
   ```
   Alternatively, you could attach this output to an assertion so it only shows if a test fails

##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -2156,5 +2187,169 @@ TEST(Bitmap, VisitWordsAnd) {
   }
 }
 
+void random_bool_vector(std::vector<bool>& vec, int64_t size, double p = 0.5) {
+  vec.reserve(size);
+  std::random_device rd;
+  std::mt19937 gen(rd());
+  std::bernoulli_distribution d(p);
+
+  for (int n = 0; n < size; ++n) {
+    vec.push_back(d(gen));
+  }
+}
+
+std::string VectorToString(const std::vector<bool>& v) {
+  std::string out(v.size() + +((v.size() - 1) / 8), ' ');
+  for (size_t i = 0; i < v.size(); ++i) {
+    out[i + (i / 8)] = v[i] ? '1' : '0';
+  }
+  return out;
+}
+
+void VerifyBoolVectorAndBitmap(const Bitmap& bitmap, const std::vector<bool>& expected) {
+  arrow::BooleanBuilder boolean_builder;
+  ASSERT_OK(boolean_builder.AppendValues(expected));
+  ASSERT_OK_AND_ASSIGN(auto arr, boolean_builder.Finish());
+
+  ASSERT_TRUE(BitmapEquals(bitmap.buffer()->data(), bitmap.offset(),
+                           arr->data()->buffers[1]->data(), 0, expected.size()))
+      << "exp: " << VectorToString(expected) << "\ngot: " << bitmap.ToString();
+}
+
+class TestBitmapVisitAndWriteOutputNoOffset : public ::testing::TestWithParam<int32_t> {};
+
+TEST_P(TestBitmapVisitAndWriteOutputNoOffset, Test1) {
+  auto part = GetParam();
+  int64_t bits = 4 * part;
+  std::vector<bool> data;
+  random_bool_vector(data, bits);

Review comment:
       It's worth noting that we already have some utilities for generating random data. For example, we could reuse `RandomArrayGenerator`
   
   ```c++
     random::RandomArrayGenerator rng(42);
     std::shared_ptr<Buffer> buffer = rng.NullBitmap(bits, 0.5);
   ```

##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -67,5 +69,58 @@ void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_ar
   bits[bytes_end - 1] |= static_cast<uint8_t>(fill_byte & ~last_byte_mask);
 }
 
+template <bool value>
+void SetBitmapImpl(uint8_t* data, int64_t offset, int64_t length) {
+  //                 offset  length
+  // data              |<------------->|
+  //   |--------|...|--------|...|--------|
+  //                   |<--->|   |<--->|
+  //                     pro       epi
+  if (ARROW_PREDICT_FALSE(length == 0)) {
+    return;
+  }
+
+  constexpr uint8_t set_byte = value ? UINT8_MAX : 0;
+
+  int prologue = static_cast<int>(((offset + 7) / 8) * 8 - offset);
+  DCHECK_LT(prologue, 8);
+
+  if (length < prologue) {  // special case where a mask is required
+    //             offset length
+    // data             |<->|
+    //   |--------|...|--------|...
+    //         mask --> |111|
+    //                  |<---->|
+    //                     pro
+    uint8_t mask = BitUtil::kPrecedingBitmask[8 - prologue] ^
+                   BitUtil::kPrecedingBitmask[8 - prologue + length];
+    data[offset / 8] |= mask;

Review comment:
       This seems to only be correct for `set_byte == 0xFF`, since we always set the masked bits. Additionally I think it could be written more clearly:
   ```suggestion
       uint8_t old_bits = BitUtil::LeadingBits(data[offset / 8], offset) |
                          BitUtil::TrailingBits(data[offset / 8], length + offset);
       uint8_t new_bits = BitUtil::TrailingBits(set_byte, offset) &
                          BitUtil::LeadingBits(set_byte, length + offset);
       data[offset / 8] = old_bits | new_bits;
   ```

##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,29 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
 ARROW_EXPORT
 void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
 
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// \brief Clears all bits in the bitmap (set to false)
+ARROW_EXPORT
+void ClearBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+template <typename Word>
+constexpr Word WordBitMask(int i) {
+  return (static_cast<Word>(1) << i) - 1;
+}

Review comment:
       This incurs undefined behavior for `i >= sizeof(Word) * 8` because we'll shift that 1 right out of the integer. Also please add a comment explaining the output, maybe rename too
   ```suggestion
   template <typename Word, Word all = static_cast<Word>(~static_cast<Word>(0))>
   constexpr Word TrailingWordBitmask(int i) {
     return ARROW_PREDICT_FALSE(i >= sizeof(Word) * 8) ? 0 : all << i;
   }
   ```

##########
File path: cpp/src/arrow/compute/kernels/scalar_boolean.cc
##########
@@ -551,14 +552,11 @@ void RegisterScalarBoolean(FunctionRegistry* registry) {
 
   // The Kleene logic kernels cannot write into sliced output bitmaps

Review comment:
       ```suggestion
   ```

##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -67,5 +69,58 @@ void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_ar
   bits[bytes_end - 1] |= static_cast<uint8_t>(fill_byte & ~last_byte_mask);
 }
 
+template <bool value>
+void SetBitmapImpl(uint8_t* data, int64_t offset, int64_t length) {
+  //                 offset  length
+  // data              |<------------->|
+  //   |--------|...|--------|...|--------|
+  //                   |<--->|   |<--->|
+  //                     pro       epi
+  if (ARROW_PREDICT_FALSE(length == 0)) {
+    return;
+  }
+
+  constexpr uint8_t set_byte = value ? UINT8_MAX : 0;
+
+  int prologue = static_cast<int>(((offset + 7) / 8) * 8 - offset);
+  DCHECK_LT(prologue, 8);
+
+  if (length < prologue) {  // special case where a mask is required
+    //             offset length
+    // data             |<->|
+    //   |--------|...|--------|...
+    //         mask --> |111|
+    //                  |<---->|
+    //                     pro
+    uint8_t mask = BitUtil::kPrecedingBitmask[8 - prologue] ^
+                   BitUtil::kPrecedingBitmask[8 - prologue + length];
+    data[offset / 8] |= mask;
+    return;
+  }
+
+  // align to a byte boundary
+  data[offset / 8] = BitUtil::SpliceWord(offset, data[offset / 8], set_byte);
+  offset += prologue;
+  length -= prologue;
+
+  // set values per byte
+  DCHECK_EQ(offset % 8, 0);
+  std::memset(data + offset / 8, set_byte, length / 8);
+  offset += ((length / 8) * 8);
+  length -= ((length / 8) * 8);

Review comment:
       ```suggestion
     offset += BitUtil::RoundDown(length, 8);
     length -= BitUtil::RoundDown(length, 8);
   ```

##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,29 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
 ARROW_EXPORT
 void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
 
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);

Review comment:
       We also have SetBitsTo, which has similar functionality. In a follow up let's benchmark and deduplicate them by rewriting the less efficient one as a wrapper of the other (probably they will have identical performance, in which case we keep whichever is easier to read)

##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
   Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
   Bitmap left_valid = GetBitmap(left_d, 0);
   Bitmap right_valid = GetBitmap(right_d, 0);
-  // sometimes Bitmaps will be ignored, in which case we replace access to them with
-  // duplicated (probably elided) access to cond_data
-  const Bitmap& _ = cond_data;
-
-  // lambda function that will be used inside the visitor
-  uint64_t* out_validity = nullptr;
-  int64_t i = 0;
-  auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
-                   uint64_t r_valid) {
-    out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
-    i++;
-  };
 
   // cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
   // In the following cases, we dont need to allocate out_valid bitmap

Review comment:
       These cases need to be reexamined since we're writing into slices- it's an error to allocate or steal a buffer since we might be discarding values already written to other slices of `output`

##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +248,99 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
     return min_offset;
   }
 
+  /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+  /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+  ///
+  /// All bitmaps must have identical length. The first bit in a visited bitmap
+  /// may be offset within the first visited word, but words will otherwise contain
+  /// densely packed bits loaded from the bitmap. That offset within the first word is
+  /// returned.
+  /// Visitor is expected to have the following signature
+  ///     [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+  ///
+  // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+  // It also has a large prolog / epilog overhead and should be used
+  // carefully in other cases.
+  // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+  // and BitmapUInt64Reader.
+  template <size_t N, size_t M, typename Visitor,
+            typename Word = typename std::decay<
+                internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+  static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+                                 std::array<Bitmap, M>* out_bitmaps_arg,
+                                 Visitor&& visitor) {
+    constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+    int64_t bit_length = BitLength(bitmaps_arg);
+    assert(bit_length == BitLength(*out_bitmaps_arg));
+
+    std::array<BitmapWordReader<Word>, N> readers;
+    for (size_t i = 0; i < N; ++i) {
+      readers[i] = BitmapWordReader<Word>(bitmaps_arg[i].buffer_->data(),
+                                          bitmaps_arg[i].offset_, bitmaps_arg[i].length_);
+    }
+
+    std::array<BitmapWordWriter<Word>, M> writers;
+    for (size_t i = 0; i < M; ++i) {
+      const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+      writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+                                          out_bitmap.offset_, out_bitmap.length_);
+    }
+
+    std::array<Word, N> visited_words;
+    visited_words.fill(0);
+    std::array<Word, M> output_words;
+    output_words.fill(0);
+
+    // every reader will have same number of words, since they are same length'ed
+    // todo this will be inefficient in some cases. When there are offsets beyond Word
+    //  boundary, every Word would have to be created from 2 adjoining Words

Review comment:
       Please open a follow up JIRA for this. One way to handle this would be adding `BitmapWordReader::NextLeadingByte()` so that we can consume bits until at least one bitmap has offset == 0
   ```suggestion
       // TODO($JIRA) this will be inefficient in some cases. When there are offsets beyond Word
       // boundary, every Word would have to be created from 2 adjoining Words
   ```

##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -2156,5 +2187,169 @@ TEST(Bitmap, VisitWordsAnd) {
   }
 }
 
+void random_bool_vector(std::vector<bool>& vec, int64_t size, double p = 0.5) {
+  vec.reserve(size);
+  std::random_device rd;
+  std::mt19937 gen(rd());
+  std::bernoulli_distribution d(p);
+
+  for (int n = 0; n < size; ++n) {
+    vec.push_back(d(gen));
+  }
+}
+
+std::string VectorToString(const std::vector<bool>& v) {
+  std::string out(v.size() + +((v.size() - 1) / 8), ' ');
+  for (size_t i = 0; i < v.size(); ++i) {
+    out[i + (i / 8)] = v[i] ? '1' : '0';
+  }
+  return out;
+}
+
+void VerifyBoolVectorAndBitmap(const Bitmap& bitmap, const std::vector<bool>& expected) {
+  arrow::BooleanBuilder boolean_builder;
+  ASSERT_OK(boolean_builder.AppendValues(expected));
+  ASSERT_OK_AND_ASSIGN(auto arr, boolean_builder.Finish());
+
+  ASSERT_TRUE(BitmapEquals(bitmap.buffer()->data(), bitmap.offset(),
+                           arr->data()->buffers[1]->data(), 0, expected.size()))
+      << "exp: " << VectorToString(expected) << "\ngot: " << bitmap.ToString();
+}
+
+class TestBitmapVisitAndWriteOutputNoOffset : public ::testing::TestWithParam<int32_t> {};
+
+TEST_P(TestBitmapVisitAndWriteOutputNoOffset, Test1) {
+  auto part = GetParam();
+  int64_t bits = 4 * part;
+  std::vector<bool> data;
+  random_bool_vector(data, bits);

Review comment:
       Also, if you use `Bitmap` consistently instead of `std::vector<bool>` you can reuse `Bitmap::{Equals, ToString, Diff}`

##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +248,99 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
     return min_offset;
   }
 
+  /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+  /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+  ///
+  /// All bitmaps must have identical length. The first bit in a visited bitmap
+  /// may be offset within the first visited word, but words will otherwise contain
+  /// densely packed bits loaded from the bitmap. That offset within the first word is
+  /// returned.
+  /// Visitor is expected to have the following signature
+  ///     [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+  ///
+  // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+  // It also has a large prolog / epilog overhead and should be used
+  // carefully in other cases.
+  // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+  // and BitmapUInt64Reader.
+  template <size_t N, size_t M, typename Visitor,
+            typename Word = typename std::decay<
+                internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+  static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+                                 std::array<Bitmap, M>* out_bitmaps_arg,
+                                 Visitor&& visitor) {
+    constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+    int64_t bit_length = BitLength(bitmaps_arg);
+    assert(bit_length == BitLength(*out_bitmaps_arg));
+
+    std::array<BitmapWordReader<Word>, N> readers;
+    for (size_t i = 0; i < N; ++i) {
+      readers[i] = BitmapWordReader<Word>(bitmaps_arg[i].buffer_->data(),
+                                          bitmaps_arg[i].offset_, bitmaps_arg[i].length_);
+    }
+
+    std::array<BitmapWordWriter<Word>, M> writers;
+    for (size_t i = 0; i < M; ++i) {
+      const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+      writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+                                          out_bitmap.offset_, out_bitmap.length_);
+    }
+
+    std::array<Word, N> visited_words;
+    visited_words.fill(0);
+    std::array<Word, M> output_words;
+    output_words.fill(0);
+
+    // every reader will have same number of words, since they are same length'ed
+    // todo this will be inefficient in some cases. When there are offsets beyond Word
+    //  boundary, every Word would have to be created from 2 adjoining Words

Review comment:
       Additionally, IMO this usage of `/BitmapWord(Reader|Writer)/` is more readable than the implementation of VisitWords. Maybe not in this PR, but: let's rewrite/remove the complicated one

##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -1975,6 +1975,37 @@ TEST(BitUtil, BitsetStack) {
   ASSERT_EQ(stack.TopSize(), 0);
 }
 
+template <typename Word>
+void CheckSplice(int n, Word low, Word high) {
+  std::bitset<sizeof(Word) * 8> ret;
+  for (size_t i = 0; i < ret.size(); i++) {
+    ret[i] = i < static_cast<size_t>(n)
+                 ? BitUtil::GetBit(reinterpret_cast<uint8_t*>(&low), i)
+                 : BitUtil::GetBit(reinterpret_cast<uint8_t*>(&high), i);
+  }
+
+  ASSERT_EQ(static_cast<Word>(ret.to_ulong()), BitUtil::SpliceWord(n, low, high));

Review comment:
       For GTest assertions, the expected value is on the right
   ```suggestion
     ASSERT_EQ(BitUtil::SpliceWord(n, low, high), static_cast<Word>(ret.to_ulong());
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r652054092



##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +248,99 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
     return min_offset;
   }
 
+  /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+  /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+  ///
+  /// All bitmaps must have identical length. The first bit in a visited bitmap
+  /// may be offset within the first visited word, but words will otherwise contain
+  /// densely packed bits loaded from the bitmap. That offset within the first word is
+  /// returned.
+  /// Visitor is expected to have the following signature
+  ///     [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+  ///
+  // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+  // It also has a large prolog / epilog overhead and should be used
+  // carefully in other cases.
+  // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+  // and BitmapUInt64Reader.
+  template <size_t N, size_t M, typename Visitor,
+            typename Word = typename std::decay<
+                internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+  static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+                                 std::array<Bitmap, M>* out_bitmaps_arg,
+                                 Visitor&& visitor) {
+    constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+    int64_t bit_length = BitLength(bitmaps_arg);
+    assert(bit_length == BitLength(*out_bitmaps_arg));
+
+    std::array<BitmapWordReader<Word>, N> readers;
+    for (size_t i = 0; i < N; ++i) {
+      readers[i] = BitmapWordReader<Word>(bitmaps_arg[i].buffer_->data(),
+                                          bitmaps_arg[i].offset_, bitmaps_arg[i].length_);
+    }
+
+    std::array<BitmapWordWriter<Word>, M> writers;
+    for (size_t i = 0; i < M; ++i) {
+      const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+      writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+                                          out_bitmap.offset_, out_bitmap.length_);
+    }
+
+    std::array<Word, N> visited_words;
+    visited_words.fill(0);
+    std::array<Word, M> output_words;
+    output_words.fill(0);
+
+    // every reader will have same number of words, since they are same length'ed
+    // todo this will be inefficient in some cases. When there are offsets beyond Word
+    //  boundary, every Word would have to be created from 2 adjoining Words
+    auto n_words = readers[0].words();
+    bit_length -= n_words * kBitWidth;
+    while (n_words--) {
+      // first collect all words to visited_words array
+      for (size_t i = 0; i < N; i++) {
+        visited_words[i] = readers[i].NextWord();
+      }
+      visitor(visited_words, &output_words);
+      for (size_t i = 0; i < M; i++) {
+        writers[i].PutNextWord(output_words[i]);
+      }
+    }
+
+    // every reader will have same number of trailing bytes, because of the above reason
+    // tailing portion could be more than one word! (ref: BitmapWordReader constructor)
+    // remaining full/ partial words to write
+
+    if (bit_length) {
+      // convert the word visitor lambda to a byte_visitor
+      auto byte_visitor = [&](const std::array<uint8_t, N>& in,
+                              std::array<uint8_t, M>* out) {
+        std::array<Word, N> in_words;
+        std::array<Word, M> out_words;
+        std::copy(in.begin(), in.end(), in_words.begin());
+        visitor(in_words, &out_words);
+        std::move(out_words.begin(), out_words.end(), out->begin());

Review comment:
       Looks like this is prompts a conversion warning on MSVC
   https://github.com/apache/arrow/pull/10487/checks?check_run_id=2822703161#step:7:774
   You'll have to write the loop out with explicit casts




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866302771


   @ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-scalar-boolean-benchmark 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870113163


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89f7711b6599405da0c61bdebf2bffe0/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...755e6720036342f3ad213715e1df19b2/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...ba6f391352984a39a9a0e82aed658680/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870775047


   In any case, I don't think the regressions are really terrible in themselves.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022871


   Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...a877957e35804b7197350884401e0e8c/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...b65a39a10c8a4c1ea3ae7751947bf36b/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...0e500f8140fb46e9bc9371d490d0fc61/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels

Posted by GitBox <gi...@apache.org>.

nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870927973


   Did some further analysis on this. It turns out that gcc-10 works much better. 
   https://gist.github.com/nirandaperera/0bcd40c223fd32105d027a86a571334f
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org