You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/09 04:24:38 UTC
[GitHub] [arrow] nirandaperera opened a new pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
nirandaperera opened a new pull request #10487:
URL: https://github.com/apache/arrow/pull/10487
This change adds a `Bitmap::VisitWordsAndWrite` method, that outputs the values of the visitor lambda function to a provided bitmap.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
bkietz commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861737568
@ursabot please benchmark
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870112738
@ursabot please benchmark
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r654451587
##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -2156,5 +2187,169 @@ TEST(Bitmap, VisitWordsAnd) {
}
}
+void random_bool_vector(std::vector<bool>& vec, int64_t size, double p = 0.5) {
+ vec.reserve(size);
+ std::random_device rd;
+ std::mt19937 gen(rd());
+ std::bernoulli_distribution d(p);
+
+ for (int n = 0; n < size; ++n) {
+ vec.push_back(d(gen));
+ }
+}
+
+std::string VectorToString(const std::vector<bool>& v) {
+ std::string out(v.size() + +((v.size() - 1) / 8), ' ');
+ for (size_t i = 0; i < v.size(); ++i) {
+ out[i + (i / 8)] = v[i] ? '1' : '0';
+ }
+ return out;
+}
+
+void VerifyBoolVectorAndBitmap(const Bitmap& bitmap, const std::vector<bool>& expected) {
+ arrow::BooleanBuilder boolean_builder;
+ ASSERT_OK(boolean_builder.AppendValues(expected));
+ ASSERT_OK_AND_ASSIGN(auto arr, boolean_builder.Finish());
+
+ ASSERT_TRUE(BitmapEquals(bitmap.buffer()->data(), bitmap.offset(),
+ arr->data()->buffers[1]->data(), 0, expected.size()))
+ << "exp: " << VectorToString(expected) << "\ngot: " << bitmap.ToString();
+}
+
+class TestBitmapVisitAndWriteOutputNoOffset : public ::testing::TestWithParam<int32_t> {};
+
+TEST_P(TestBitmapVisitAndWriteOutputNoOffset, Test1) {
+ auto part = GetParam();
+ int64_t bits = 4 * part;
+ std::vector<bool> data;
+ random_bool_vector(data, bits);
Review comment:
I dont know why I didn't do that previously! :joy:
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
Bitmap left_valid = GetBitmap(left_d, 0);
Bitmap right_valid = GetBitmap(right_d, 0);
- // sometimes Bitmaps will be ignored, in which case we replace access to them with
- // duplicated (probably elided) access to cond_data
- const Bitmap& _ = cond_data;
-
- // lambda function that will be used inside the visitor
- uint64_t* out_validity = nullptr;
- int64_t i = 0;
- auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
- uint64_t r_valid) {
- out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
- i++;
- };
// cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
// In the following cases, we dont need to allocate out_valid bitmap
Review comment:
One thing to note here is, `if-else` kernels have the following,
```c++
kernel.null_handling = NullHandling::COMPUTED_NO_PREALLOCATE;
kernel.mem_allocation = MemAllocation::NO_PREALLOCATE;
```
So, I allocating buffers are entirely handled by the kernel. So, I believe writing into slices should work without a problem IMO.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982118
@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-scalar-boolean-benchmark
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870774320
> ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}
Regression for `ArrayArrayKernel<KleeneAnd>/32768/0` and `ArrayArrayKernel<KleeneAnd>/524288/0` is expected because we are populating a validity buffer always now, because the `exec` infrastructure always allocates memory for the validity buffer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r654838986
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
Bitmap left_valid = GetBitmap(left_d, 0);
Bitmap right_valid = GetBitmap(right_d, 0);
- // sometimes Bitmaps will be ignored, in which case we replace access to them with
- // duplicated (probably elided) access to cond_data
- const Bitmap& _ = cond_data;
-
- // lambda function that will be used inside the visitor
- uint64_t* out_validity = nullptr;
- int64_t i = 0;
- auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
- uint64_t r_valid) {
- out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
- i++;
- };
// cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
// In the following cases, we dont need to allocate out_valid bitmap
Review comment:
Got it! AFAIU only the fixed sized typed `if-else` kernels qualify for `can_write_into_slices` category then, isn't it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861738074
Benchmark runs are scheduled for baseline = 39dcb43dd26df51391f0da14b6e6285d612f3829 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/7c809fd9d9af45b6ac360f31063f84d2...063924f33a8e4656b1ce37a3e9229ee8/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/ea65cb2d35d741f0abaa1cc1e5481373...ba93afcabf9643169ffcd668f08c85ba/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/086d84702ca54d9a8d3978d18b4ec2f6...b76a1d0626d44a77a7dac62cc0a234c6/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861738074
Benchmark runs are scheduled for baseline = 39dcb43dd26df51391f0da14b6e6285d612f3829 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/7c809fd9d9af45b6ac360f31063f84d2...063924f33a8e4656b1ce37a3e9229ee8/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/ea65cb2d35d741f0abaa1cc1e5481373...ba93afcabf9643169ffcd668f08c85ba/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/086d84702ca54d9a8d3978d18b4ec2f6...b76a1d0626d44a77a7dac62cc0a234c6/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982633
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 788bd495a8ca8c180355e9066387824fc972d734. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Only ['lang', 'name'] filters are supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...f1f573636c5940ad9704c7af4a54c280/)
[Skipped :warning: Only ['lang', 'name'] filters are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...379b66a7f5e0453c8a2c99fd6fe70503/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...38bfe9c78594477a955b82ca1a40a3fd/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022798
@ursabot please benchmark
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299350
@ursabot please benchmark
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299978
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89927876215449bcb32cfb06a3abdc28/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...5d8b14de410341159529bec47a1e8e90/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...62ad9efc3cba4d1b90cb7a5386ba261a/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-859187060
@github-actions autotune
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870113163
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89f7711b6599405da0c61bdebf2bffe0/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...755e6720036342f3ad213715e1df19b2/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...ba6f391352984a39a9a0e82aed658680/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022871
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...a877957e35804b7197350884401e0e8c/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...b65a39a10c8a4c1ea3ae7751947bf36b/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...0e500f8140fb46e9bc9371d490d0fc61/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869980146
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869980146
@bkietz I added the changes we discussed. Following are the latest bench results in my machine.
```
Results for master commit: c7c959a26a6512b0ad078a06df474617f1b306aa
----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------
ArrayArrayKernel<And>/32768/10000 9.21 us 9.21 us 69999 bytes_per_second=3.31485G/s items_per_second=28.4743G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<And>/32768/100 9.11 us 9.11 us 74446 bytes_per_second=3.35138G/s items_per_second=28.7882G/s null_percent=1 size=32.768k
ArrayArrayKernel<And>/32768/10 8.06 us 8.06 us 103589 bytes_per_second=3.78621G/s items_per_second=32.5233G/s null_percent=10 size=32.768k
ArrayArrayKernel<And>/32768/2 7.27 us 7.27 us 96371 bytes_per_second=4.19765G/s items_per_second=36.0575G/s null_percent=50 size=32.768k
ArrayArrayKernel<And>/32768/1 8.96 us 8.95 us 91746 bytes_per_second=3.40815G/s items_per_second=29.2758G/s null_percent=100 size=32.768k
ArrayArrayKernel<And>/32768/0 7.87 us 7.87 us 78959 bytes_per_second=3.87712G/s items_per_second=33.3042G/s null_percent=0 size=32.768k
ArrayArrayKernel<And>/1048576/10000 335 us 335 us 2080 bytes_per_second=2.91231G/s items_per_second=25.0165G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<And>/1048576/100 334 us 334 us 2098 bytes_per_second=2.92463G/s items_per_second=25.1224G/s null_percent=1 size=1048.58k
ArrayArrayKernel<And>/1048576/10 336 us 336 us 2089 bytes_per_second=2.90452G/s items_per_second=24.9496G/s null_percent=10 size=1048.58k
ArrayArrayKernel<And>/1048576/2 336 us 336 us 2077 bytes_per_second=2.90715G/s items_per_second=24.9722G/s null_percent=50 size=1048.58k
ArrayArrayKernel<And>/1048576/1 238 us 238 us 2944 bytes_per_second=4.10794G/s items_per_second=35.2869G/s null_percent=100 size=1048.58k
ArrayArrayKernel<And>/1048576/0 239 us 239 us 2932 bytes_per_second=4.0919G/s items_per_second=35.1491G/s null_percent=0 size=1048.58k
ArrayArrayKernel<KleeneAnd>/32768/10000 15.7 us 15.7 us 44167 bytes_per_second=1.9399G/s items_per_second=16.6637G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/100 15.7 us 15.7 us 44447 bytes_per_second=1.94457G/s items_per_second=16.7038G/s null_percent=1 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/10 15.7 us 15.7 us 44661 bytes_per_second=1.94512G/s items_per_second=16.7085G/s null_percent=10 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/2 15.7 us 15.7 us 44260 bytes_per_second=1.94383G/s items_per_second=16.6974G/s null_percent=50 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/1 15.6 us 15.6 us 43756 bytes_per_second=1.95105G/s items_per_second=16.7594G/s null_percent=100 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/0 5.28 us 5.28 us 136108 bytes_per_second=5.78137G/s items_per_second=49.6616G/s null_percent=0 size=32.768k
ArrayArrayKernel<KleeneAnd>/1048576/10000 483 us 483 us 1447 bytes_per_second=2.02148G/s items_per_second=17.3643G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/100 484 us 484 us 1447 bytes_per_second=2.0196G/s items_per_second=17.3482G/s null_percent=1 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/10 483 us 483 us 1445 bytes_per_second=2.02036G/s items_per_second=17.3548G/s null_percent=10 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/2 484 us 484 us 1448 bytes_per_second=2.0172G/s items_per_second=17.3276G/s null_percent=50 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/1 484 us 484 us 1448 bytes_per_second=2.01958G/s items_per_second=17.3481G/s null_percent=100 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/0 198 us 198 us 3541 bytes_per_second=4.93652G/s items_per_second=42.4044G/s null_percent=0 size=1048.58k
Results for the PR:
----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------
ArrayArrayKernel<And>/32768/10000 8.64 us 8.64 us 82671 bytes_per_second=3.53361G/s items_per_second=30.3535G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<And>/32768/100 10.4 us 10.2 us 68076 bytes_per_second=2.99687G/s items_per_second=25.7429G/s null_percent=1 size=32.768k
ArrayArrayKernel<And>/32768/10 10.4 us 10.4 us 64431 bytes_per_second=2.94554G/s items_per_second=25.302G/s null_percent=10 size=32.768k
ArrayArrayKernel<And>/32768/2 7.92 us 7.92 us 102414 bytes_per_second=3.85474G/s items_per_second=33.1119G/s null_percent=50 size=32.768k
ArrayArrayKernel<And>/32768/1 7.37 us 7.37 us 97390 bytes_per_second=4.14169G/s items_per_second=35.5769G/s null_percent=100 size=32.768k
ArrayArrayKernel<And>/32768/0 6.48 us 6.48 us 106870 bytes_per_second=4.71213G/s items_per_second=40.4769G/s null_percent=0 size=32.768k
ArrayArrayKernel<And>/1048576/10000 336 us 336 us 2086 bytes_per_second=2.90622G/s items_per_second=24.9642G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<And>/1048576/100 333 us 333 us 2091 bytes_per_second=2.93627G/s items_per_second=25.2224G/s null_percent=1 size=1048.58k
ArrayArrayKernel<And>/1048576/10 333 us 333 us 2101 bytes_per_second=2.93204G/s items_per_second=25.186G/s null_percent=10 size=1048.58k
ArrayArrayKernel<And>/1048576/2 333 us 333 us 2104 bytes_per_second=2.93041G/s items_per_second=25.172G/s null_percent=50 size=1048.58k
ArrayArrayKernel<And>/1048576/1 238 us 238 us 2947 bytes_per_second=4.10724G/s items_per_second=35.2809G/s null_percent=100 size=1048.58k
ArrayArrayKernel<And>/1048576/0 239 us 239 us 2939 bytes_per_second=4.09211G/s items_per_second=35.1509G/s null_percent=0 size=1048.58k
ArrayArrayKernel<KleeneAnd>/32768/10000 13.4 us 13.4 us 52481 bytes_per_second=2.28086G/s items_per_second=19.5925G/s null_percent=0.01 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/100 13.3 us 13.3 us 52642 bytes_per_second=2.28768G/s items_per_second=19.651G/s null_percent=1 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/10 13.3 us 13.3 us 52251 bytes_per_second=2.29669G/s items_per_second=19.7284G/s null_percent=10 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/2 13.3 us 13.3 us 52684 bytes_per_second=2.29603G/s items_per_second=19.7227G/s null_percent=50 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/1 13.4 us 13.4 us 52555 bytes_per_second=2.28596G/s items_per_second=19.6362G/s null_percent=100 size=32.768k
ArrayArrayKernel<KleeneAnd>/32768/0 6.10 us 6.10 us 118983 bytes_per_second=5.00416G/s items_per_second=42.9854G/s null_percent=0 size=32.768k
ArrayArrayKernel<KleeneAnd>/1048576/10000 388 us 388 us 1807 bytes_per_second=2.51753G/s items_per_second=21.6254G/s null_percent=0.01 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/100 389 us 389 us 1805 bytes_per_second=2.50864G/s items_per_second=21.5491G/s null_percent=1 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/10 390 us 390 us 1804 bytes_per_second=2.50681G/s items_per_second=21.5333G/s null_percent=10 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/2 391 us 391 us 1803 bytes_per_second=2.49759G/s items_per_second=21.4541G/s null_percent=50 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/1 388 us 388 us 1807 bytes_per_second=2.51458G/s items_per_second=21.6001G/s null_percent=100 size=1048.58k
ArrayArrayKernel<KleeneAnd>/1048576/0 238 us 238 us 2939 bytes_per_second=4.11143G/s items_per_second=35.3169G/s null_percent=0 size=1048.58k
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870774752
@nirandaperera I see, thanks for the insight.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870773753
There are more regressions with gcc 9, though:
```
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (2)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ArrayArrayKernel<KleeneAnd>/524288/1 2.149 GiB/sec 2.318 GiB/sec 7.887 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3713, 'null_percent': 100.0}
ArrayArrayKernel<KleeneAnd>/524288/2 2.249 GiB/sec 2.262 GiB/sec 0.569 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4816, 'null_percent': 50.0}
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (10)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ArrayArrayKernel<KleeneAnd>/32768/0 7.645 GiB/sec 6.972 GiB/sec -8.804 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 178864, 'null_percent': 0.0}
ArrayArrayKernel<KleeneAnd>/524288/0 16.468 GiB/sec 13.337 GiB/sec -19.014 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 22855, 'null_percent': 0.0}
ArrayArrayKernel<KleeneAnd>/524288/10 2.801 GiB/sec 2.040 GiB/sec -27.172 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4221, 'null_percent': 10.0}
ArrayArrayKernel<KleeneAnd>/524288/100 2.984 GiB/sec 2.147 GiB/sec -28.031 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5191, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/524288/10000 3.224 GiB/sec 2.250 GiB/sec -30.221 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5014, 'null_percent': 0.01}
ArrayArrayKernel<KleeneAnd>/32768/10 2.991 GiB/sec 1.970 GiB/sec -34.143 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68647, 'null_percent': 10.0}
ArrayArrayKernel<KleeneAnd>/32768/10000 2.991 GiB/sec 1.964 GiB/sec -34.335 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68084, 'null_percent': 0.01}
ArrayArrayKernel<KleeneAnd>/32768/2 2.999 GiB/sec 1.969 GiB/sec -34.335 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68549, 'null_percent': 50.0}
ArrayArrayKernel<KleeneAnd>/32768/1 2.997 GiB/sec 1.966 GiB/sec -34.402 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68530, 'null_percent': 100.0}
ArrayArrayKernel<KleeneAnd>/32768/100 3.000 GiB/sec 1.967 GiB/sec -34.414 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 68686, 'null_percent': 1.0}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861738074
Benchmark runs are scheduled for baseline = 39dcb43dd26df51391f0da14b6e6285d612f3829 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/7c809fd9d9af45b6ac360f31063f84d2...063924f33a8e4656b1ce37a3e9229ee8/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/ea65cb2d35d741f0abaa1cc1e5481373...ba93afcabf9643169ffcd668f08c85ba/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/086d84702ca54d9a8d3978d18b4ec2f6...b76a1d0626d44a77a7dac62cc0a234c6/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-861738074
Benchmark runs are scheduled for baseline = 39dcb43dd26df51391f0da14b6e6285d612f3829 and contender = d0cead51aa9eb0f5e4a9b094236632ffdb8c4436. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/7c809fd9d9af45b6ac360f31063f84d2...063924f33a8e4656b1ce37a3e9229ee8/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/ea65cb2d35d741f0abaa1cc1e5481373...ba93afcabf9643169ffcd668f08c85ba/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/086d84702ca54d9a8d3978d18b4ec2f6...b76a1d0626d44a77a7dac62cc0a234c6/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r655697111
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
Bitmap left_valid = GetBitmap(left_d, 0);
Bitmap right_valid = GetBitmap(right_d, 0);
- // sometimes Bitmaps will be ignored, in which case we replace access to them with
- // duplicated (probably elided) access to cond_data
- const Bitmap& _ = cond_data;
-
- // lambda function that will be used inside the visitor
- uint64_t* out_validity = nullptr;
- int64_t i = 0;
- auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
- uint64_t r_valid) {
- out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
- i++;
- };
// cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
// In the following cases, we dont need to allocate out_valid bitmap
Review comment:
@bkietz I have added the changes required. Could you please review the changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r654669300
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
Bitmap left_valid = GetBitmap(left_d, 0);
Bitmap right_valid = GetBitmap(right_d, 0);
- // sometimes Bitmaps will be ignored, in which case we replace access to them with
- // duplicated (probably elided) access to cond_data
- const Bitmap& _ = cond_data;
-
- // lambda function that will be used inside the visitor
- uint64_t* out_validity = nullptr;
- int64_t i = 0;
- auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
- uint64_t r_valid) {
- out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
- i++;
- };
// cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
// In the following cases, we dont need to allocate out_valid bitmap
Review comment:
Writing into slices of a preallocated output cannot proceed if the kernel does its own allocation of output: https://github.com/bkietz/arrow/blob/282c5aa713011209d6aa76e2ef5996a17161575d/cpp/src/arrow/compute/exec.cc#L703-L711
Effectively, `can_write_into_slices` will be ignored until you have
```c++
kernel.null_handling = NullHandling::COMPUTED_PREALLOCATE;
kernel.mem_allocation = MemAllocation::PREALLOCATE;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870768509
Locally, I seem to get varying results from run to run (and also depending on the compiler), but `archery benchmark diff` doesn't show very worrying regressions with clang 10:
```
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (11)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ArrayArrayKernel<KleeneAnd>/524288/1 4.309 GiB/sec 5.868 GiB/sec 36.176 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 100.0}
ArrayArrayKernel<KleeneAnd>/524288/10 4.337 GiB/sec 5.888 GiB/sec 35.749 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 10.0}
ArrayArrayKernel<KleeneAnd>/524288/100 4.335 GiB/sec 5.850 GiB/sec 34.939 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6232, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/524288/10000 4.318 GiB/sec 5.755 GiB/sec 33.290 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6221, 'null_percent': 0.01}
ArrayArrayKernel<KleeneAnd>/524288/2 4.365 GiB/sec 5.762 GiB/sec 31.994 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6200, 'null_percent': 50.0}
ArrayArrayKernel<KleeneAnd>/32768/1 3.204 GiB/sec 4.220 GiB/sec 31.726 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 75304, 'null_percent': 100.0}
ArrayArrayKernel<KleeneAnd>/32768/2 3.370 GiB/sec 4.212 GiB/sec 24.961 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77611, 'null_percent': 50.0}
ArrayArrayKernel<KleeneAnd>/32768/10000 3.360 GiB/sec 4.197 GiB/sec 24.906 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76897, 'null_percent': 0.01}
ArrayArrayKernel<KleeneAnd>/32768/10 3.373 GiB/sec 4.208 GiB/sec 24.745 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76903, 'null_percent': 10.0}
ArrayArrayKernel<KleeneAnd>/32768/100 3.383 GiB/sec 4.202 GiB/sec 24.212 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77559, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/32768/0 7.838 GiB/sec 7.956 GiB/sec 1.499 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 187182, 'null_percent': 0.0}
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (1)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}
```
(on Ubuntu 20.04, AMD Ryzen 3900)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870800492
I got archery running on my machine and I can confirm that gcc-9 is the problem there. If I use clang-10, it shows a better performance. But gcc-9 shows a lot of regressions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r660081336
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else_test.cc
##########
@@ -56,8 +57,12 @@ TYPED_TEST(TestIfElsePrimitive, IfElseFixedSizeRand) {
random::RandomArrayGenerator rand(/*seed=*/0);
int64_t len = 1000;
- auto cond = std::static_pointer_cast<BooleanArray>(
- rand.ArrayOf(boolean(), len, /*null_probability=*/0.01));
+ ASSERT_OK_AND_ASSIGN(auto temp1, MakeArrayFromScalar(BooleanScalar(true), 64));
Review comment:
Please add a comment describing what's useful about this construction
##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +243,132 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
return min_offset;
}
+ template <size_t N, size_t M, typename ReaderT, typename WriterT, typename Visitor,
+ typename Word = typename std::decay<
+ internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+ static void RunVisitWordsAndWriteLoop(int64_t bit_length,
+ std::array<ReaderT, N>& readers,
+ std::array<WriterT, M>& writers,
+ Visitor&& visitor) {
+ constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+ std::array<Word, N> visited_words;
+ std::array<Word, M> output_words;
+
+ // every reader will have same number of words, since they are same length'ed
+ // TODO($JIRA) this will be inefficient in some cases. When there are offsets beyond
+ // Word boundary, every Word would have to be created from 2 adjoining Words
+ auto n_words = readers[0].words();
+ bit_length -= n_words * kBitWidth;
+ while (n_words--) {
+ // first collect all words to visited_words array
+ for (size_t i = 0; i < N; i++) {
+ visited_words[i] = readers[i].NextWord();
+ }
+ visitor(visited_words, &output_words);
+ for (size_t i = 0; i < M; i++) {
+ writers[i].PutNextWord(output_words[i]);
+ }
+ }
+
+ // every reader will have same number of trailing bytes, because of the above reason
+ // tailing portion could be more than one word! (ref: BitmapWordReader constructor)
+ // remaining full/ partial words to write
+
+ if (bit_length) {
+ // convert the word visitor lambda to a byte_visitor
+ auto byte_visitor = [&](const std::array<uint8_t, N>& in,
+ std::array<uint8_t, M>* out) {
+ std::array<Word, N> in_words;
+ std::array<Word, M> out_words;
+ std::copy(in.begin(), in.end(), in_words.begin());
+ visitor(in_words, &out_words);
+ for (size_t i = 0; i < M; i++) {
+ out->at(i) = static_cast<uint8_t>(out_words[i]);
+ }
+ };
+
+ std::array<uint8_t, N> visited_bytes;
+ std::array<uint8_t, M> output_bytes;
+ int n_bytes = readers[0].trailing_bytes();
+ while (n_bytes--) {
+ visited_bytes.fill(0);
+ output_bytes.fill(0);
+ int valid_bits;
+ for (size_t i = 0; i < N; i++) {
+ visited_bytes[i] = readers[i].NextTrailingByte(valid_bits);
+ }
+ byte_visitor(visited_bytes, &output_bytes);
+ for (size_t i = 0; i < M; i++) {
+ writers[i].PutNextTrailingByte(output_bytes[i], valid_bits);
+ }
+ }
+ }
+ }
+
+ /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+ /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+ ///
+ /// All bitmaps must have identical length. The first bit in a visited bitmap
+ /// may be offset within the first visited word, but words will otherwise contain
+ /// densely packed bits loaded from the bitmap. That offset within the first word is
+ /// returned.
+ /// Visitor is expected to have the following signature
+ /// [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+ ///
+ // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+ // It also has a large prolog / epilog overhead and should be used
+ // carefully in other cases.
+ // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+ // and BitmapUInt64Reader.
+ template <size_t N, size_t M, typename Visitor,
+ typename Word = typename std::decay<
+ internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+ static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+ std::array<Bitmap, M>* out_bitmaps_arg,
+ Visitor&& visitor) {
+ int64_t bit_length = BitLength(bitmaps_arg);
+ assert(bit_length == BitLength(*out_bitmaps_arg));
+
+ // if both input and output bitmaps have no byte offset, then use special template
+ if (std::all_of(bitmaps_arg.begin(), bitmaps_arg.end(),
+ [](const Bitmap& b) { return b.offset_ % 8 == 0; }) &&
+ std::all_of(out_bitmaps_arg->begin(), out_bitmaps_arg->end(),
+ [](const Bitmap& b) { return b.offset_ % 8 == 0; })) {
+ std::array<BitmapWordReader<Word, /*may_have_byte_offset=*/false>, N> readers;
+ for (size_t i = 0; i < N; ++i) {
+ const Bitmap& in_bitmap = bitmaps_arg[i];
+ readers[i] = BitmapWordReader<Word, /*may_have_byte_offset=*/false>(
+ in_bitmap.buffer_->data(), in_bitmap.offset_, in_bitmap.length_);
+ }
+
+ std::array<BitmapWordWriter<Word, /*may_have_byte_offset=*/false>, M> writers;
+ for (size_t i = 0; i < M; ++i) {
+ const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+ writers[i] = BitmapWordWriter<Word, /*may_have_byte_offset=*/false>(
+ out_bitmap.buffer_->mutable_data(), out_bitmap.offset_, out_bitmap.length_);
+ }
+
+ RunVisitWordsAndWriteLoop(bit_length, readers, writers, std::move(visitor));
+ } else {
+ std::array<BitmapWordReader<Word>, N> readers;
+ for (size_t i = 0; i < N; ++i) {
+ const Bitmap& in_bitmap = bitmaps_arg[i];
+ readers[i] = BitmapWordReader<Word>(in_bitmap.buffer_->data(), in_bitmap.offset_,
+ in_bitmap.length_);
+ }
+
+ std::array<BitmapWordWriter<Word>, M> writers;
+ for (size_t i = 0; i < M; ++i) {
+ const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+ writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+ out_bitmap.offset_, out_bitmap.length_);
+ }
+
+ RunVisitWordsAndWriteLoop(bit_length, readers, writers, std::move(visitor));
Review comment:
```suggestion
RunVisitWordsAndWriteLoop(bit_length, readers, writers, visitor);
```
##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,37 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
ARROW_EXPORT
void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// \brief Clears all bits in the bitmap (set to false)
+ARROW_EXPORT
+void ClearBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// Returns a mask with lower i bits set to 1. If i >= sizeof(Word)*8, all-ones will be
+/// returned
+/// ex:
+/// PrecedingWordBitmask<uint_8>(0)= 0x00
+/// PrecedingWordBitmask<uint_8>(4)= 0x0f
+/// PrecedingWordBitmask<uint_8>(8)= 0xff
+/// PrecedingWordBitmask<uint_32>(8)= 0x00ff
+/// ref: https://stackoverflow.com/a/59523400
+template <typename Word>
+constexpr Word PrecedingWordBitmask(unsigned int const i) {
+ return (static_cast<Word>(i < sizeof(Word) * 8) << (i & (sizeof(Word) * 8 - 1))) - 1;
+}
Review comment:
Since this is constexpr, the examples can be written as static assertions
```suggestion
/// ref: https://stackoverflow.com/a/59523400
template <typename Word>
constexpr Word PrecedingWordBitmask(unsigned int const i) {
return (static_cast<Word>(i < sizeof(Word) * 8) << (i & (sizeof(Word) * 8 - 1))) - 1;
}
static_assert(PrecedingWordBitmask<uint8_t>(0) == 0x00, "");
static_assert(PrecedingWordBitmask<uint8_t>(4) == 0x0f, "");
static_assert(PrecedingWordBitmask<uint8_t>(8) == 0xff, "");
static_assert(PrecedingWordBitmask<uint16_t>(8) == 0x00ff, "");
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870113163
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89f7711b6599405da0c61bdebf2bffe0/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...755e6720036342f3ad213715e1df19b2/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...ba6f391352984a39a9a0e82aed658680/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870768509
Locally, I seem to get varying results from run to run (and also depending on the compiler), but `archery benchmark diff` doesn't show very worrying regressions:
```
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (11)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ArrayArrayKernel<KleeneAnd>/524288/1 4.309 GiB/sec 5.868 GiB/sec 36.176 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 100.0}
ArrayArrayKernel<KleeneAnd>/524288/10 4.337 GiB/sec 5.888 GiB/sec 35.749 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6226, 'null_percent': 10.0}
ArrayArrayKernel<KleeneAnd>/524288/100 4.335 GiB/sec 5.850 GiB/sec 34.939 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6232, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/524288/10000 4.318 GiB/sec 5.755 GiB/sec 33.290 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6221, 'null_percent': 0.01}
ArrayArrayKernel<KleeneAnd>/524288/2 4.365 GiB/sec 5.762 GiB/sec 31.994 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 6200, 'null_percent': 50.0}
ArrayArrayKernel<KleeneAnd>/32768/1 3.204 GiB/sec 4.220 GiB/sec 31.726 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 75304, 'null_percent': 100.0}
ArrayArrayKernel<KleeneAnd>/32768/2 3.370 GiB/sec 4.212 GiB/sec 24.961 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77611, 'null_percent': 50.0}
ArrayArrayKernel<KleeneAnd>/32768/10000 3.360 GiB/sec 4.197 GiB/sec 24.906 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76897, 'null_percent': 0.01}
ArrayArrayKernel<KleeneAnd>/32768/10 3.373 GiB/sec 4.208 GiB/sec 24.745 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 76903, 'null_percent': 10.0}
ArrayArrayKernel<KleeneAnd>/32768/100 3.383 GiB/sec 4.202 GiB/sec 24.212 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 77559, 'null_percent': 1.0}
ArrayArrayKernel<KleeneAnd>/32768/0 7.838 GiB/sec 7.956 GiB/sec 1.499 {'run_name': 'ArrayArrayKernel<KleeneAnd>/32768/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 187182, 'null_percent': 0.0}
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (1)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
benchmark baseline contender change % counters
ArrayArrayKernel<KleeneAnd>/524288/0 15.853 GiB/sec 14.343 GiB/sec -9.527 {'run_name': 'ArrayArrayKernel<KleeneAnd>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23504, 'null_percent': 0.0}
```
(on Ubuntu 20.04, AMD Ryzen 3900)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870774464
What's weird as well is that, sometimes, L2-sized benchmarks are faster than L1-sized, but sometimes they are slower.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299978
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89927876215449bcb32cfb06a3abdc28/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...5d8b14de410341159529bec47a1e8e90/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...62ad9efc3cba4d1b90cb7a5386ba261a/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299978
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89927876215449bcb32cfb06a3abdc28/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...5d8b14de410341159529bec47a1e8e90/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...62ad9efc3cba4d1b90cb7a5386ba261a/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r660081336
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else_test.cc
##########
@@ -56,8 +57,12 @@ TYPED_TEST(TestIfElsePrimitive, IfElseFixedSizeRand) {
random::RandomArrayGenerator rand(/*seed=*/0);
int64_t len = 1000;
- auto cond = std::static_pointer_cast<BooleanArray>(
- rand.ArrayOf(boolean(), len, /*null_probability=*/0.01));
+ ASSERT_OK_AND_ASSIGN(auto temp1, MakeArrayFromScalar(BooleanScalar(true), 64));
Review comment:
Please add a comment describing what's useful about this construction
##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +243,132 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
return min_offset;
}
+ template <size_t N, size_t M, typename ReaderT, typename WriterT, typename Visitor,
+ typename Word = typename std::decay<
+ internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+ static void RunVisitWordsAndWriteLoop(int64_t bit_length,
+ std::array<ReaderT, N>& readers,
+ std::array<WriterT, M>& writers,
+ Visitor&& visitor) {
+ constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+ std::array<Word, N> visited_words;
+ std::array<Word, M> output_words;
+
+ // every reader will have same number of words, since they are same length'ed
+ // TODO($JIRA) this will be inefficient in some cases. When there are offsets beyond
+ // Word boundary, every Word would have to be created from 2 adjoining Words
+ auto n_words = readers[0].words();
+ bit_length -= n_words * kBitWidth;
+ while (n_words--) {
+ // first collect all words to visited_words array
+ for (size_t i = 0; i < N; i++) {
+ visited_words[i] = readers[i].NextWord();
+ }
+ visitor(visited_words, &output_words);
+ for (size_t i = 0; i < M; i++) {
+ writers[i].PutNextWord(output_words[i]);
+ }
+ }
+
+ // every reader will have same number of trailing bytes, because of the above reason
+ // tailing portion could be more than one word! (ref: BitmapWordReader constructor)
+ // remaining full/ partial words to write
+
+ if (bit_length) {
+ // convert the word visitor lambda to a byte_visitor
+ auto byte_visitor = [&](const std::array<uint8_t, N>& in,
+ std::array<uint8_t, M>* out) {
+ std::array<Word, N> in_words;
+ std::array<Word, M> out_words;
+ std::copy(in.begin(), in.end(), in_words.begin());
+ visitor(in_words, &out_words);
+ for (size_t i = 0; i < M; i++) {
+ out->at(i) = static_cast<uint8_t>(out_words[i]);
+ }
+ };
+
+ std::array<uint8_t, N> visited_bytes;
+ std::array<uint8_t, M> output_bytes;
+ int n_bytes = readers[0].trailing_bytes();
+ while (n_bytes--) {
+ visited_bytes.fill(0);
+ output_bytes.fill(0);
+ int valid_bits;
+ for (size_t i = 0; i < N; i++) {
+ visited_bytes[i] = readers[i].NextTrailingByte(valid_bits);
+ }
+ byte_visitor(visited_bytes, &output_bytes);
+ for (size_t i = 0; i < M; i++) {
+ writers[i].PutNextTrailingByte(output_bytes[i], valid_bits);
+ }
+ }
+ }
+ }
+
+ /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+ /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+ ///
+ /// All bitmaps must have identical length. The first bit in a visited bitmap
+ /// may be offset within the first visited word, but words will otherwise contain
+ /// densely packed bits loaded from the bitmap. That offset within the first word is
+ /// returned.
+ /// Visitor is expected to have the following signature
+ /// [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+ ///
+ // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+ // It also has a large prolog / epilog overhead and should be used
+ // carefully in other cases.
+ // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+ // and BitmapUInt64Reader.
+ template <size_t N, size_t M, typename Visitor,
+ typename Word = typename std::decay<
+ internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+ static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+ std::array<Bitmap, M>* out_bitmaps_arg,
+ Visitor&& visitor) {
+ int64_t bit_length = BitLength(bitmaps_arg);
+ assert(bit_length == BitLength(*out_bitmaps_arg));
+
+ // if both input and output bitmaps have no byte offset, then use special template
+ if (std::all_of(bitmaps_arg.begin(), bitmaps_arg.end(),
+ [](const Bitmap& b) { return b.offset_ % 8 == 0; }) &&
+ std::all_of(out_bitmaps_arg->begin(), out_bitmaps_arg->end(),
+ [](const Bitmap& b) { return b.offset_ % 8 == 0; })) {
+ std::array<BitmapWordReader<Word, /*may_have_byte_offset=*/false>, N> readers;
+ for (size_t i = 0; i < N; ++i) {
+ const Bitmap& in_bitmap = bitmaps_arg[i];
+ readers[i] = BitmapWordReader<Word, /*may_have_byte_offset=*/false>(
+ in_bitmap.buffer_->data(), in_bitmap.offset_, in_bitmap.length_);
+ }
+
+ std::array<BitmapWordWriter<Word, /*may_have_byte_offset=*/false>, M> writers;
+ for (size_t i = 0; i < M; ++i) {
+ const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+ writers[i] = BitmapWordWriter<Word, /*may_have_byte_offset=*/false>(
+ out_bitmap.buffer_->mutable_data(), out_bitmap.offset_, out_bitmap.length_);
+ }
+
+ RunVisitWordsAndWriteLoop(bit_length, readers, writers, std::move(visitor));
+ } else {
+ std::array<BitmapWordReader<Word>, N> readers;
+ for (size_t i = 0; i < N; ++i) {
+ const Bitmap& in_bitmap = bitmaps_arg[i];
+ readers[i] = BitmapWordReader<Word>(in_bitmap.buffer_->data(), in_bitmap.offset_,
+ in_bitmap.length_);
+ }
+
+ std::array<BitmapWordWriter<Word>, M> writers;
+ for (size_t i = 0; i < M; ++i) {
+ const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+ writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+ out_bitmap.offset_, out_bitmap.length_);
+ }
+
+ RunVisitWordsAndWriteLoop(bit_length, readers, writers, std::move(visitor));
Review comment:
```suggestion
RunVisitWordsAndWriteLoop(bit_length, readers, writers, visitor);
```
##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,37 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
ARROW_EXPORT
void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// \brief Clears all bits in the bitmap (set to false)
+ARROW_EXPORT
+void ClearBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// Returns a mask with lower i bits set to 1. If i >= sizeof(Word)*8, all-ones will be
+/// returned
+/// ex:
+/// PrecedingWordBitmask<uint_8>(0)= 0x00
+/// PrecedingWordBitmask<uint_8>(4)= 0x0f
+/// PrecedingWordBitmask<uint_8>(8)= 0xff
+/// PrecedingWordBitmask<uint_32>(8)= 0x00ff
+/// ref: https://stackoverflow.com/a/59523400
+template <typename Word>
+constexpr Word PrecedingWordBitmask(unsigned int const i) {
+ return (static_cast<Word>(i < sizeof(Word) * 8) << (i & (sizeof(Word) * 8 - 1))) - 1;
+}
Review comment:
Since this is constexpr, the examples can be written as static assertions
```suggestion
/// ref: https://stackoverflow.com/a/59523400
template <typename Word>
constexpr Word PrecedingWordBitmask(unsigned int const i) {
return (static_cast<Word>(i < sizeof(Word) * 8) << (i & (sizeof(Word) * 8 - 1))) - 1;
}
static_assert(PrecedingWordBitmask<uint8_t>(0) == 0x00, "");
static_assert(PrecedingWordBitmask<uint8_t>(4) == 0x0f, "");
static_assert(PrecedingWordBitmask<uint8_t>(8) == 0xff, "");
static_assert(PrecedingWordBitmask<uint16_t>(8) == 0x00ff, "");
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r652054735
##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,29 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
ARROW_EXPORT
void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// \brief Clears all bits in the bitmap (set to false)
+ARROW_EXPORT
+void ClearBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+template <typename Word>
+constexpr Word WordBitMask(int i) {
+ return (static_cast<Word>(1) << i) - 1;
+}
Review comment:
That will fix up this UBSAN failure, too: https://github.com/apache/arrow/pull/10487/checks?check_run_id=2822702761#step:8:3714
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022871
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...a877957e35804b7197350884401e0e8c/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...b65a39a10c8a4c1ea3ae7751947bf36b/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...0e500f8140fb46e9bc9371d490d0fc61/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-857361898
https://issues.apache.org/jira/browse/ARROW-13010
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz closed pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
bkietz closed pull request #10487:
URL: https://github.com/apache/arrow/pull/10487
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982633
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 788bd495a8ca8c180355e9066387824fc972d734. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Only ['lang', 'name'] filters are supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...f1f573636c5940ad9704c7af4a54c280/)
[Skipped :warning: Only ['lang', 'name'] filters are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...379b66a7f5e0453c8a2c99fd6fe70503/)
[Finished :arrow_down:0.25% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...38bfe9c78594477a955b82ca1a40a3fd/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982633
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870113163
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89f7711b6599405da0c61bdebf2bffe0/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...755e6720036342f3ad213715e1df19b2/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...ba6f391352984a39a9a0e82aed658680/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022871
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...a877957e35804b7197350884401e0e8c/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...b65a39a10c8a4c1ea3ae7751947bf36b/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...0e500f8140fb46e9bc9371d490d0fc61/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870927973
Did some further analysis on this. It turns out that gcc-10 works much better than gcc-9.
https://gist.github.com/nirandaperera/0bcd40c223fd32105d027a86a571334f
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-869982633
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r655697111
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
Bitmap left_valid = GetBitmap(left_d, 0);
Bitmap right_valid = GetBitmap(right_d, 0);
- // sometimes Bitmaps will be ignored, in which case we replace access to them with
- // duplicated (probably elided) access to cond_data
- const Bitmap& _ = cond_data;
-
- // lambda function that will be used inside the visitor
- uint64_t* out_validity = nullptr;
- int64_t i = 0;
- auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
- uint64_t r_valid) {
- out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
- i++;
- };
// cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
// In the following cases, we dont need to allocate out_valid bitmap
Review comment:
@bkietz I have added the changes required. Could you please review the changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866299978
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = bcce18e5d4d83f0831de71b363ad91470376084c. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89927876215449bcb32cfb06a3abdc28/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...5d8b14de410341159529bec47a1e8e90/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...62ad9efc3cba4d1b90cb7a5386ba261a/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r652003314
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -67,5 +69,58 @@ void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_ar
bits[bytes_end - 1] |= static_cast<uint8_t>(fill_byte & ~last_byte_mask);
}
+template <bool value>
+void SetBitmapImpl(uint8_t* data, int64_t offset, int64_t length) {
+ // offset length
+ // data |<------------->|
+ // |--------|...|--------|...|--------|
+ // |<--->| |<--->|
+ // pro epi
+ if (ARROW_PREDICT_FALSE(length == 0)) {
+ return;
+ }
+
+ constexpr uint8_t set_byte = value ? UINT8_MAX : 0;
+
+ int prologue = static_cast<int>(((offset + 7) / 8) * 8 - offset);
+ DCHECK_LT(prologue, 8);
+
+ if (length < prologue) { // special case where a mask is required
+ // offset length
+ // data |<->|
+ // |--------|...|--------|...
+ // mask --> |111|
+ // |<---->|
+ // pro
+ uint8_t mask = BitUtil::kPrecedingBitmask[8 - prologue] ^
+ BitUtil::kPrecedingBitmask[8 - prologue + length];
+ data[offset / 8] |= mask;
+ return;
+ }
+
+ // align to a byte boundary
+ data[offset / 8] = BitUtil::SpliceWord(offset, data[offset / 8], set_byte);
Review comment:
```suggestion
data[offset / 8] = BitUtil::SpliceWord(prologue, data[offset / 8], set_byte);
```
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -67,5 +69,58 @@ void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_ar
bits[bytes_end - 1] |= static_cast<uint8_t>(fill_byte & ~last_byte_mask);
}
+template <bool value>
+void SetBitmapImpl(uint8_t* data, int64_t offset, int64_t length) {
+ // offset length
+ // data |<------------->|
+ // |--------|...|--------|...|--------|
+ // |<--->| |<--->|
+ // pro epi
+ if (ARROW_PREDICT_FALSE(length == 0)) {
+ return;
+ }
+
+ constexpr uint8_t set_byte = value ? UINT8_MAX : 0;
+
+ int prologue = static_cast<int>(((offset + 7) / 8) * 8 - offset);
Review comment:
```suggestion
auto prologue = BitUtil::RoundUp(offset, 8) - offset;
```
##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -73,6 +76,11 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
return Bitmap(buffer_, offset_ + offset, length);
}
+ void Stride(int64_t stride) {
+ this->offset_ += stride;
+ this->length_ -= stride;
+ }
+
Review comment:
```suggestion
```
This seems to be unused
##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -2156,5 +2187,169 @@ TEST(Bitmap, VisitWordsAnd) {
}
}
+void random_bool_vector(std::vector<bool>& vec, int64_t size, double p = 0.5) {
+ vec.reserve(size);
+ std::random_device rd;
+ std::mt19937 gen(rd());
+ std::bernoulli_distribution d(p);
+
+ for (int n = 0; n < size; ++n) {
+ vec.push_back(d(gen));
+ }
+}
+
+std::string VectorToString(const std::vector<bool>& v) {
+ std::string out(v.size() + +((v.size() - 1) / 8), ' ');
+ for (size_t i = 0; i < v.size(); ++i) {
+ out[i + (i / 8)] = v[i] ? '1' : '0';
+ }
+ return out;
+}
+
+void VerifyBoolVectorAndBitmap(const Bitmap& bitmap, const std::vector<bool>& expected) {
+ arrow::BooleanBuilder boolean_builder;
+ ASSERT_OK(boolean_builder.AppendValues(expected));
+ ASSERT_OK_AND_ASSIGN(auto arr, boolean_builder.Finish());
+
+ ASSERT_TRUE(BitmapEquals(bitmap.buffer()->data(), bitmap.offset(),
+ arr->data()->buffers[1]->data(), 0, expected.size()))
+ << "exp: " << VectorToString(expected) << "\ngot: " << bitmap.ToString();
+}
+
+class TestBitmapVisitAndWriteOutputNoOffset : public ::testing::TestWithParam<int32_t> {};
+
+TEST_P(TestBitmapVisitAndWriteOutputNoOffset, Test1) {
+ auto part = GetParam();
+ int64_t bits = 4 * part;
+ std::vector<bool> data;
+ random_bool_vector(data, bits);
+
+ arrow::BooleanBuilder boolean_builder;
+ ASSERT_OK(boolean_builder.AppendValues(data));
+ ASSERT_OK_AND_ASSIGN(auto arrow_data, boolean_builder.Finish());
+
+ std::shared_ptr<Buffer>& arrow_buffer = arrow_data->data()->buffers[1];
+
+ Bitmap bm0(arrow_buffer, 0, part);
+ Bitmap bm1 = bm0.Slice(part * 1, part); // this goes beyond bm0's len
+ Bitmap bm2 = bm0.Slice(part * 2, part); // this goes beyond bm0's len
+
+ std::array<Bitmap, 2> out_bms;
+ ASSERT_OK_AND_ASSIGN(auto out0, AllocateBitmap(part));
+ ASSERT_OK_AND_ASSIGN(auto out1, AllocateBitmap(part));
+ out_bms[0] = Bitmap(out0, 0, part);
+ out_bms[1] = Bitmap(out1, 0, part);
+
+ std::vector<bool> v0(data.begin(), data.begin() + part);
+ std::vector<bool> v1(data.begin() + part * 1, data.begin() + part * 2);
+ std::vector<bool> v2(data.begin() + part * 2, data.begin() + part * 3);
+
+ // out0 = bm0 & bm1, out1= bm0 | bm2
+ std::array<Bitmap, 3> in_bms{bm0, bm1, bm2};
+ Bitmap::VisitWordsAndWrite(
+ in_bms, &out_bms,
+ [](const std::array<uint64_t, 3>& in, std::array<uint64_t, 2>* out) {
+ out->at(0) = in[0] & in[1];
+ out->at(1) = in[0] | in[2];
+ });
+
+ std::vector<bool> out_v0(part);
+ std::vector<bool> out_v1(part);
+ // v3 = v0 & v1
+ std::transform(v0.begin(), v0.end(), v1.begin(), out_v0.begin(),
+ std::logical_and<bool>());
+ // v3 |= v2
+ std::transform(v0.begin(), v0.end(), v2.begin(), out_v1.begin(),
+ std::logical_or<bool>());
+
+ // std::cout << "v0: " << VectorToString(v0) << "\n"
+ // << "b0: " << bm0.ToString() << "\n"
+ // << "v1: " << VectorToString(v1) << "\n"
+ // << "b1: " << bm1.ToString() << "\n"
+ // << "v2: " << VectorToString(v2) << "\n"
+ // << "b2: " << bm2.ToString() << "\n";
+
Review comment:
```suggestion
```
Alternatively, you could attach this output to an assertion so it only shows if a test fails
##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -2156,5 +2187,169 @@ TEST(Bitmap, VisitWordsAnd) {
}
}
+void random_bool_vector(std::vector<bool>& vec, int64_t size, double p = 0.5) {
+ vec.reserve(size);
+ std::random_device rd;
+ std::mt19937 gen(rd());
+ std::bernoulli_distribution d(p);
+
+ for (int n = 0; n < size; ++n) {
+ vec.push_back(d(gen));
+ }
+}
+
+std::string VectorToString(const std::vector<bool>& v) {
+ std::string out(v.size() + +((v.size() - 1) / 8), ' ');
+ for (size_t i = 0; i < v.size(); ++i) {
+ out[i + (i / 8)] = v[i] ? '1' : '0';
+ }
+ return out;
+}
+
+void VerifyBoolVectorAndBitmap(const Bitmap& bitmap, const std::vector<bool>& expected) {
+ arrow::BooleanBuilder boolean_builder;
+ ASSERT_OK(boolean_builder.AppendValues(expected));
+ ASSERT_OK_AND_ASSIGN(auto arr, boolean_builder.Finish());
+
+ ASSERT_TRUE(BitmapEquals(bitmap.buffer()->data(), bitmap.offset(),
+ arr->data()->buffers[1]->data(), 0, expected.size()))
+ << "exp: " << VectorToString(expected) << "\ngot: " << bitmap.ToString();
+}
+
+class TestBitmapVisitAndWriteOutputNoOffset : public ::testing::TestWithParam<int32_t> {};
+
+TEST_P(TestBitmapVisitAndWriteOutputNoOffset, Test1) {
+ auto part = GetParam();
+ int64_t bits = 4 * part;
+ std::vector<bool> data;
+ random_bool_vector(data, bits);
Review comment:
It's worth noting that we already have some utilities for generating random data. For example, we could reuse `RandomArrayGenerator`
```c++
random::RandomArrayGenerator rng(42);
std::shared_ptr<Buffer> buffer = rng.NullBitmap(bits, 0.5);
```
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -67,5 +69,58 @@ void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_ar
bits[bytes_end - 1] |= static_cast<uint8_t>(fill_byte & ~last_byte_mask);
}
+template <bool value>
+void SetBitmapImpl(uint8_t* data, int64_t offset, int64_t length) {
+ // offset length
+ // data |<------------->|
+ // |--------|...|--------|...|--------|
+ // |<--->| |<--->|
+ // pro epi
+ if (ARROW_PREDICT_FALSE(length == 0)) {
+ return;
+ }
+
+ constexpr uint8_t set_byte = value ? UINT8_MAX : 0;
+
+ int prologue = static_cast<int>(((offset + 7) / 8) * 8 - offset);
+ DCHECK_LT(prologue, 8);
+
+ if (length < prologue) { // special case where a mask is required
+ // offset length
+ // data |<->|
+ // |--------|...|--------|...
+ // mask --> |111|
+ // |<---->|
+ // pro
+ uint8_t mask = BitUtil::kPrecedingBitmask[8 - prologue] ^
+ BitUtil::kPrecedingBitmask[8 - prologue + length];
+ data[offset / 8] |= mask;
Review comment:
This seems to only be correct for `set_byte == 0xFF`, since we always set the masked bits. Additionally I think it could be written more clearly:
```suggestion
uint8_t old_bits = BitUtil::LeadingBits(data[offset / 8], offset) |
BitUtil::TrailingBits(data[offset / 8], length + offset);
uint8_t new_bits = BitUtil::TrailingBits(set_byte, offset) &
BitUtil::LeadingBits(set_byte, length + offset);
data[offset / 8] = old_bits | new_bits;
```
##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,29 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
ARROW_EXPORT
void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+/// \brief Clears all bits in the bitmap (set to false)
+ARROW_EXPORT
+void ClearBitmap(uint8_t* data, int64_t offset, int64_t length);
+
+template <typename Word>
+constexpr Word WordBitMask(int i) {
+ return (static_cast<Word>(1) << i) - 1;
+}
Review comment:
This incurs undefined behavior for `i >= sizeof(Word) * 8` because we'll shift that 1 right out of the integer. Also please add a comment explaining the output, maybe rename too
```suggestion
template <typename Word, Word all = static_cast<Word>(~static_cast<Word>(0))>
constexpr Word TrailingWordBitmask(int i) {
return ARROW_PREDICT_FALSE(i >= sizeof(Word) * 8) ? 0 : all << i;
}
```
##########
File path: cpp/src/arrow/compute/kernels/scalar_boolean.cc
##########
@@ -551,14 +552,11 @@ void RegisterScalarBoolean(FunctionRegistry* registry) {
// The Kleene logic kernels cannot write into sliced output bitmaps
Review comment:
```suggestion
```
##########
File path: cpp/src/arrow/util/bit_util.cc
##########
@@ -67,5 +69,58 @@ void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_ar
bits[bytes_end - 1] |= static_cast<uint8_t>(fill_byte & ~last_byte_mask);
}
+template <bool value>
+void SetBitmapImpl(uint8_t* data, int64_t offset, int64_t length) {
+ // offset length
+ // data |<------------->|
+ // |--------|...|--------|...|--------|
+ // |<--->| |<--->|
+ // pro epi
+ if (ARROW_PREDICT_FALSE(length == 0)) {
+ return;
+ }
+
+ constexpr uint8_t set_byte = value ? UINT8_MAX : 0;
+
+ int prologue = static_cast<int>(((offset + 7) / 8) * 8 - offset);
+ DCHECK_LT(prologue, 8);
+
+ if (length < prologue) { // special case where a mask is required
+ // offset length
+ // data |<->|
+ // |--------|...|--------|...
+ // mask --> |111|
+ // |<---->|
+ // pro
+ uint8_t mask = BitUtil::kPrecedingBitmask[8 - prologue] ^
+ BitUtil::kPrecedingBitmask[8 - prologue + length];
+ data[offset / 8] |= mask;
+ return;
+ }
+
+ // align to a byte boundary
+ data[offset / 8] = BitUtil::SpliceWord(offset, data[offset / 8], set_byte);
+ offset += prologue;
+ length -= prologue;
+
+ // set values per byte
+ DCHECK_EQ(offset % 8, 0);
+ std::memset(data + offset / 8, set_byte, length / 8);
+ offset += ((length / 8) * 8);
+ length -= ((length / 8) * 8);
Review comment:
```suggestion
offset += BitUtil::RoundDown(length, 8);
length -= BitUtil::RoundDown(length, 8);
```
##########
File path: cpp/src/arrow/util/bit_util.h
##########
@@ -316,5 +316,29 @@ static inline void SetBitTo(uint8_t* bits, int64_t i, bool bit_is_set) {
ARROW_EXPORT
void SetBitsTo(uint8_t* bits, int64_t start_offset, int64_t length, bool bits_are_set);
+/// \brief Sets all bits in the bitmap to true
+ARROW_EXPORT
+void SetBitmap(uint8_t* data, int64_t offset, int64_t length);
Review comment:
We also have SetBitsTo, which has similar functionality. In a follow up let's benchmark and deduplicate them by rewriting the less efficient one as a wrapper of the other (probably they will have identical performance, in which case we keep whichever is easier to read)
##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else.cc
##########
@@ -72,18 +72,6 @@ Status PromoteNullsVisitor(KernelContext* ctx, const Datum& cond_d, const Datum&
Bitmap cond_valid{cond.buffers[0], cond.offset, cond.length};
Bitmap left_valid = GetBitmap(left_d, 0);
Bitmap right_valid = GetBitmap(right_d, 0);
- // sometimes Bitmaps will be ignored, in which case we replace access to them with
- // duplicated (probably elided) access to cond_data
- const Bitmap& _ = cond_data;
-
- // lambda function that will be used inside the visitor
- uint64_t* out_validity = nullptr;
- int64_t i = 0;
- auto apply = [&](uint64_t c_valid, uint64_t c_data, uint64_t l_valid,
- uint64_t r_valid) {
- out_validity[i] = c_valid & ((c_data & l_valid) | (~c_data & r_valid));
- i++;
- };
// cond.valid & (cond.data & left.valid | ~cond.data & right.valid)
// In the following cases, we dont need to allocate out_valid bitmap
Review comment:
These cases need to be reexamined since we're writing into slices- it's an error to allocate or steal a buffer since we might be discarding values already written to other slices of `output`
##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +248,99 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
return min_offset;
}
+ /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+ /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+ ///
+ /// All bitmaps must have identical length. The first bit in a visited bitmap
+ /// may be offset within the first visited word, but words will otherwise contain
+ /// densely packed bits loaded from the bitmap. That offset within the first word is
+ /// returned.
+ /// Visitor is expected to have the following signature
+ /// [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+ ///
+ // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+ // It also has a large prolog / epilog overhead and should be used
+ // carefully in other cases.
+ // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+ // and BitmapUInt64Reader.
+ template <size_t N, size_t M, typename Visitor,
+ typename Word = typename std::decay<
+ internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+ static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+ std::array<Bitmap, M>* out_bitmaps_arg,
+ Visitor&& visitor) {
+ constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+ int64_t bit_length = BitLength(bitmaps_arg);
+ assert(bit_length == BitLength(*out_bitmaps_arg));
+
+ std::array<BitmapWordReader<Word>, N> readers;
+ for (size_t i = 0; i < N; ++i) {
+ readers[i] = BitmapWordReader<Word>(bitmaps_arg[i].buffer_->data(),
+ bitmaps_arg[i].offset_, bitmaps_arg[i].length_);
+ }
+
+ std::array<BitmapWordWriter<Word>, M> writers;
+ for (size_t i = 0; i < M; ++i) {
+ const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+ writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+ out_bitmap.offset_, out_bitmap.length_);
+ }
+
+ std::array<Word, N> visited_words;
+ visited_words.fill(0);
+ std::array<Word, M> output_words;
+ output_words.fill(0);
+
+ // every reader will have same number of words, since they are same length'ed
+ // todo this will be inefficient in some cases. When there are offsets beyond Word
+ // boundary, every Word would have to be created from 2 adjoining Words
Review comment:
Please open a follow up JIRA for this. One way to handle this would be adding `BitmapWordReader::NextLeadingByte()` so that we can consume bits until at least one bitmap has offset == 0
```suggestion
// TODO($JIRA) this will be inefficient in some cases. When there are offsets beyond Word
// boundary, every Word would have to be created from 2 adjoining Words
```
##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -2156,5 +2187,169 @@ TEST(Bitmap, VisitWordsAnd) {
}
}
+void random_bool_vector(std::vector<bool>& vec, int64_t size, double p = 0.5) {
+ vec.reserve(size);
+ std::random_device rd;
+ std::mt19937 gen(rd());
+ std::bernoulli_distribution d(p);
+
+ for (int n = 0; n < size; ++n) {
+ vec.push_back(d(gen));
+ }
+}
+
+std::string VectorToString(const std::vector<bool>& v) {
+ std::string out(v.size() + +((v.size() - 1) / 8), ' ');
+ for (size_t i = 0; i < v.size(); ++i) {
+ out[i + (i / 8)] = v[i] ? '1' : '0';
+ }
+ return out;
+}
+
+void VerifyBoolVectorAndBitmap(const Bitmap& bitmap, const std::vector<bool>& expected) {
+ arrow::BooleanBuilder boolean_builder;
+ ASSERT_OK(boolean_builder.AppendValues(expected));
+ ASSERT_OK_AND_ASSIGN(auto arr, boolean_builder.Finish());
+
+ ASSERT_TRUE(BitmapEquals(bitmap.buffer()->data(), bitmap.offset(),
+ arr->data()->buffers[1]->data(), 0, expected.size()))
+ << "exp: " << VectorToString(expected) << "\ngot: " << bitmap.ToString();
+}
+
+class TestBitmapVisitAndWriteOutputNoOffset : public ::testing::TestWithParam<int32_t> {};
+
+TEST_P(TestBitmapVisitAndWriteOutputNoOffset, Test1) {
+ auto part = GetParam();
+ int64_t bits = 4 * part;
+ std::vector<bool> data;
+ random_bool_vector(data, bits);
Review comment:
Also, if you use `Bitmap` consistently instead of `std::vector<bool>` you can reuse `Bitmap::{Equals, ToString, Diff}`
##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +248,99 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
return min_offset;
}
+ /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+ /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+ ///
+ /// All bitmaps must have identical length. The first bit in a visited bitmap
+ /// may be offset within the first visited word, but words will otherwise contain
+ /// densely packed bits loaded from the bitmap. That offset within the first word is
+ /// returned.
+ /// Visitor is expected to have the following signature
+ /// [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+ ///
+ // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+ // It also has a large prolog / epilog overhead and should be used
+ // carefully in other cases.
+ // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+ // and BitmapUInt64Reader.
+ template <size_t N, size_t M, typename Visitor,
+ typename Word = typename std::decay<
+ internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+ static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+ std::array<Bitmap, M>* out_bitmaps_arg,
+ Visitor&& visitor) {
+ constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+ int64_t bit_length = BitLength(bitmaps_arg);
+ assert(bit_length == BitLength(*out_bitmaps_arg));
+
+ std::array<BitmapWordReader<Word>, N> readers;
+ for (size_t i = 0; i < N; ++i) {
+ readers[i] = BitmapWordReader<Word>(bitmaps_arg[i].buffer_->data(),
+ bitmaps_arg[i].offset_, bitmaps_arg[i].length_);
+ }
+
+ std::array<BitmapWordWriter<Word>, M> writers;
+ for (size_t i = 0; i < M; ++i) {
+ const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+ writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+ out_bitmap.offset_, out_bitmap.length_);
+ }
+
+ std::array<Word, N> visited_words;
+ visited_words.fill(0);
+ std::array<Word, M> output_words;
+ output_words.fill(0);
+
+ // every reader will have same number of words, since they are same length'ed
+ // todo this will be inefficient in some cases. When there are offsets beyond Word
+ // boundary, every Word would have to be created from 2 adjoining Words
Review comment:
Additionally, IMO this usage of `/BitmapWord(Reader|Writer)/` is more readable than the implementation of VisitWords. Maybe not in this PR, but: let's rewrite/remove the complicated one
##########
File path: cpp/src/arrow/util/bit_util_test.cc
##########
@@ -1975,6 +1975,37 @@ TEST(BitUtil, BitsetStack) {
ASSERT_EQ(stack.TopSize(), 0);
}
+template <typename Word>
+void CheckSplice(int n, Word low, Word high) {
+ std::bitset<sizeof(Word) * 8> ret;
+ for (size_t i = 0; i < ret.size(); i++) {
+ ret[i] = i < static_cast<size_t>(n)
+ ? BitUtil::GetBit(reinterpret_cast<uint8_t*>(&low), i)
+ : BitUtil::GetBit(reinterpret_cast<uint8_t*>(&high), i);
+ }
+
+ ASSERT_EQ(static_cast<Word>(ret.to_ulong()), BitUtil::SpliceWord(n, low, high));
Review comment:
For GTest assertions, the expected value is on the right
```suggestion
ASSERT_EQ(BitUtil::SpliceWord(n, low, high), static_cast<Word>(ret.to_ulong());
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#discussion_r652054092
##########
File path: cpp/src/arrow/util/bitmap.h
##########
@@ -225,6 +248,99 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>,
return min_offset;
}
+ /// \brief Visit words of bits from each input bitmap as array<Word, N> and collects
+ /// outputs to an array<Word, M>, to be written into the output bitmaps accordingly.
+ ///
+ /// All bitmaps must have identical length. The first bit in a visited bitmap
+ /// may be offset within the first visited word, but words will otherwise contain
+ /// densely packed bits loaded from the bitmap. That offset within the first word is
+ /// returned.
+ /// Visitor is expected to have the following signature
+ /// [](const std::array<Word, N>& in_words, std::array<Word, M>* out_words){...}
+ ///
+ // NOTE: this function is efficient on 3+ sufficiently large bitmaps.
+ // It also has a large prolog / epilog overhead and should be used
+ // carefully in other cases.
+ // For 2 bitmaps or less, and/or smaller bitmaps, see also VisitTwoBitBlocksVoid
+ // and BitmapUInt64Reader.
+ template <size_t N, size_t M, typename Visitor,
+ typename Word = typename std::decay<
+ internal::call_traits::argument_type<0, Visitor&&>>::type::value_type>
+ static void VisitWordsAndWrite(const std::array<Bitmap, N>& bitmaps_arg,
+ std::array<Bitmap, M>* out_bitmaps_arg,
+ Visitor&& visitor) {
+ constexpr int64_t kBitWidth = sizeof(Word) * 8;
+
+ int64_t bit_length = BitLength(bitmaps_arg);
+ assert(bit_length == BitLength(*out_bitmaps_arg));
+
+ std::array<BitmapWordReader<Word>, N> readers;
+ for (size_t i = 0; i < N; ++i) {
+ readers[i] = BitmapWordReader<Word>(bitmaps_arg[i].buffer_->data(),
+ bitmaps_arg[i].offset_, bitmaps_arg[i].length_);
+ }
+
+ std::array<BitmapWordWriter<Word>, M> writers;
+ for (size_t i = 0; i < M; ++i) {
+ const Bitmap& out_bitmap = out_bitmaps_arg->at(i);
+ writers[i] = BitmapWordWriter<Word>(out_bitmap.buffer_->mutable_data(),
+ out_bitmap.offset_, out_bitmap.length_);
+ }
+
+ std::array<Word, N> visited_words;
+ visited_words.fill(0);
+ std::array<Word, M> output_words;
+ output_words.fill(0);
+
+ // every reader will have same number of words, since they are same length'ed
+ // todo this will be inefficient in some cases. When there are offsets beyond Word
+ // boundary, every Word would have to be created from 2 adjoining Words
+ auto n_words = readers[0].words();
+ bit_length -= n_words * kBitWidth;
+ while (n_words--) {
+ // first collect all words to visited_words array
+ for (size_t i = 0; i < N; i++) {
+ visited_words[i] = readers[i].NextWord();
+ }
+ visitor(visited_words, &output_words);
+ for (size_t i = 0; i < M; i++) {
+ writers[i].PutNextWord(output_words[i]);
+ }
+ }
+
+ // every reader will have same number of trailing bytes, because of the above reason
+ // tailing portion could be more than one word! (ref: BitmapWordReader constructor)
+ // remaining full/ partial words to write
+
+ if (bit_length) {
+ // convert the word visitor lambda to a byte_visitor
+ auto byte_visitor = [&](const std::array<uint8_t, N>& in,
+ std::array<uint8_t, M>* out) {
+ std::array<Word, N> in_words;
+ std::array<Word, M> out_words;
+ std::copy(in.begin(), in.end(), in_words.begin());
+ visitor(in_words, &out_words);
+ std::move(out_words.begin(), out_words.end(), out->begin());
Review comment:
Looks like this is prompts a conversion warning on MSVC
https://github.com/apache/arrow/pull/10487/checks?check_run_id=2822703161#step:7:774
You'll have to write the loop out with explicit casts
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-866302771
@ursabot please benchmark command=cpp-micro --suite-filter=arrow-compute-scalar-boolean-benchmark
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870113163
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 2663d92be3f95598b00391e254eefa11cfb11279. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...89f7711b6599405da0c61bdebf2bffe0/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...755e6720036342f3ad213715e1df19b2/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...ba6f391352984a39a9a0e82aed658680/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870775047
In any case, I don't think the regressions are really terrible in themselves.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870022871
Benchmark runs are scheduled for baseline = c913aa3ad7b36b2eaccd4dc2cf0bc35ab893bb4a and contender = 0631e7bbb5042adb5299440572c5b49633dc58fb. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/06ca8b60ba3d41bd9179bc6e878891d5...a877957e35804b7197350884401e0e8c/)
[Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/11aa31eaecc6491cb62c18f18d5fabcb...b65a39a10c8a4c1ea3ae7751947bf36b/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/1650cbbcf1ab41b99efb4383f415bdaa...0e500f8140fb46e9bc9371d490d0fc61/)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10487: ARROW-13010: [C++][Compute] Support outputting to slices from kleene kernels
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10487:
URL: https://github.com/apache/arrow/pull/10487#issuecomment-870927973
Did some further analysis on this. It turns out that gcc-10 works much better.
https://gist.github.com/nirandaperera/0bcd40c223fd32105d027a86a571334f
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org