You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/23 19:35:52 UTC
[GitHub] [arrow] nirandaperera opened a new pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
nirandaperera opened a new pull request #10585:
URL: https://github.com/apache/arrow/pull/10585
Adding `array_sort_indices` and `partition_nth_indices` for `BooleanType` using existing sort and Nth-partition utils.
This may be rather inefficient, since the values are traversed bit-by-bit rather than working on a byte/word.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882913560
I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056) (appveyor not mingw).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884086059
Benchmark runs are scheduled for baseline = 737492e074b816952bc011d18a996fec83b0b55f and contender = 992f8dcf38caf30405f100636760a77e5e98d056. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/38a02c3e53154a28ba259c7263231f54...5a2f863b93ee4ea5afd13886a5739380/)
[Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/786641447462489697548b5f8f9c7ba1...b2791f7ee14a4a2e9e23fb9d6caf2e80/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/aa7fc9b2490d4039ab3c6b716c6cfe4f...1aa05c1d9b384f96a12c7306acc81186/)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] edponce commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882912621
@nirandaperera @kszucs The MSVC error seems unrelated to this PR and is cause by a timeout in a Flight test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-869990551
@pitrou I think I'll open a new JIRA for the optimized bool sort implementation. I feel like I can reuse some of the stuff from the #10487 PR here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883529233
@ursabot please benchmark lang=C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r663999228
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
Review comment:
This sounds rather weird. Are you sure about this?
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
+ if (values.length() < (1LL << 32)) {
+ return SortInternal<uint32_t>(indices_begin, indices_end, values, offset, options);
+ } else {
+ return SortInternal<uint64_t>(indices_begin, indices_end, values, offset, options);
+ }
+ }
+
+ private:
+ template <typename CounterType>
+ void CountBits(const BooleanArray& values, int64_t offset, CounterType* set_count,
+ CounterType* null_count) {
Review comment:
Really, you needn't write this yourself. Just call `null_count()` and `true_count()`.
##########
File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc
##########
@@ -980,6 +1056,68 @@ TEST_F(TestTableSortIndices, NaNAndNull) {
AssertSortIndices(table, options, "[7, 1, 2, 6, 5, 4, 0, 3]");
}
+TEST_F(TestTableSortIndices, Boolean) {
Review comment:
Same here.
##########
File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc
##########
@@ -842,6 +877,47 @@ TEST_F(TestRecordBatchSortIndices, NaNAndNull) {
AssertSortIndices(batch, options, "[7, 1, 2, 6, 5, 4, 0, 3]");
}
+TEST_F(TestRecordBatchSortIndices, Boolean) {
Review comment:
I don't think it's worth testing the non-null case separately. This will make less test code to maintain.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882663062
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r660073781
##########
File path: cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc
##########
@@ -81,6 +81,16 @@ static void ArraySortIndicesInt64Wide(benchmark::State& state) {
ArraySortIndicesInt64Benchmark(state, min, max);
}
+static void ArraySortIndicesBool(benchmark::State& state) {
+ RegressionArgs args(state);
+
+ const int64_t array_size = args.size / 8;
Review comment:
Yes. I was mistaken
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r659815010
##########
File path: cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc
##########
@@ -81,6 +81,16 @@ static void ArraySortIndicesInt64Wide(benchmark::State& state) {
ArraySortIndicesInt64Benchmark(state, min, max);
}
+static void ArraySortIndicesBool(benchmark::State& state) {
+ RegressionArgs args(state);
+
+ const int64_t array_size = args.size / 8;
Review comment:
This should probably be `* 8`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867409291
Also, if you want to work on performance, note that a dedicated counting sort for boolean should be really simple.
You can first call `null_count`, `true_count` and `false_count`, then you just have to walk individual bits and emit indices.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r671329597
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
Review comment:
Unless it's benchmarked here as well, maybe let's remove the comment to avoid being confusing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882663062
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884104018
@ursabot please benchmark lang=C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883529599
Benchmark runs are scheduled for baseline = d7a8b468ab64d4318ae62ab90251830acbb9b88d and contender = 85445cf37da25a953bb15478938853613b64cc18. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c26ba510b58049109ea86988b134a418...394257747dd049bb8d82a1e51c13548d/)
[Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/29a357c807b343edb0af286bcd89cd4b...7781e2b9649a4f6181e5da44d8fc8248/)
[Failed] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab2c14bdb99b4644a656c34cb9964855...83138687943a480d9850eacecd3fa097/)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867133305
@ianmcook it looks like something else is missing.
https://github.com/apache/arrow/pull/10585/checks?check_run_id=2898806771
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867123996
@ianmcook Done! :-)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r671355074
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
Review comment:
This comment is there in the primitive type impl as well
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r663999228
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
Review comment:
This sounds rather weird. Are you sure about this?
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
+ if (values.length() < (1LL << 32)) {
+ return SortInternal<uint32_t>(indices_begin, indices_end, values, offset, options);
+ } else {
+ return SortInternal<uint64_t>(indices_begin, indices_end, values, offset, options);
+ }
+ }
+
+ private:
+ template <typename CounterType>
+ void CountBits(const BooleanArray& values, int64_t offset, CounterType* set_count,
+ CounterType* null_count) {
Review comment:
Really, you needn't write this yourself. Just call `null_count()` and `true_count()`.
##########
File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc
##########
@@ -980,6 +1056,68 @@ TEST_F(TestTableSortIndices, NaNAndNull) {
AssertSortIndices(table, options, "[7, 1, 2, 6, 5, 4, 0, 3]");
}
+TEST_F(TestTableSortIndices, Boolean) {
Review comment:
Same here.
##########
File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc
##########
@@ -842,6 +877,47 @@ TEST_F(TestRecordBatchSortIndices, NaNAndNull) {
AssertSortIndices(batch, options, "[7, 1, 2, 6, 5, 4, 0, 3]");
}
+TEST_F(TestRecordBatchSortIndices, Boolean) {
Review comment:
I don't think it's worth testing the non-null case separately. This will make less test code to maintain.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r671356593
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
Review comment:
That one was benchmarked!
More seriously…maybe at least edit them to reflect that it was done due to a benchmark. Though I worry about the comment effectively bitrotting.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883102409
> I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056) (appveyor not mingw).
I believe this is due to a method being declared `static`. Let's see! thanks @kszucs
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882709345
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883412386
> It would be a before vs after benchmark not side by side
I tested this with the `ArraySortIndicesBool` benchmark and didn't see any significant difference in performance. So I think its safe to leave it with int64_t.
```
int64_t
-------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------
ArraySortIndicesBool/32768/10000 856525 ns 790696 ns 1067 bytes_per_second=39.5221M/s items_per_second=331.536M/s null_percent=0.01 size=32.768k
ArraySortIndicesBool/32768/100 1068125 ns 1067932 ns 644 bytes_per_second=29.2622M/s items_per_second=245.469M/s null_percent=1 size=32.768k
ArraySortIndicesBool/32768/10 1322777 ns 1320088 ns 523 bytes_per_second=23.6727M/s items_per_second=198.581M/s null_percent=10 size=32.768k
ArraySortIndicesBool/32768/2 1898999 ns 1871787 ns 372 bytes_per_second=16.6953M/s items_per_second=140.05M/s null_percent=50 size=32.768k
ArraySortIndicesBool/32768/1 228503 ns 218945 ns 3266 bytes_per_second=142.73M/s items_per_second=1.19731G/s null_percent=100 size=32.768k
ArraySortIndicesBool/32768/0 755168 ns 735851 ns 960 bytes_per_second=42.4678M/s items_per_second=356.246M/s null_percent=0 size=32.768k
ArraySortIndicesBool/1048576/100 35812425 ns 35626885 ns 19 bytes_per_second=28.0687M/s items_per_second=235.457M/s null_percent=1 size=1048.58k
ArraySortIndicesBool/8388608/100 320588616 ns 320491041 ns 2 bytes_per_second=24.9617M/s items_per_second=209.394M/s null_percent=1 size=8.38861M
int32_t
-------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------
ArraySortIndicesBool/32768/10000 768462 ns 763410 ns 1005 bytes_per_second=40.9347M/s items_per_second=343.385M/s null_percent=0.01 size=32.768k
ArraySortIndicesBool/32768/100 1060476 ns 1026142 ns 765 bytes_per_second=30.4539M/s items_per_second=255.466M/s null_percent=1 size=32.768k
ArraySortIndicesBool/32768/10 1447835 ns 1375450 ns 499 bytes_per_second=22.7198M/s items_per_second=190.588M/s null_percent=10 size=32.768k
ArraySortIndicesBool/32768/2 2002536 ns 2001982 ns 343 bytes_per_second=15.6095M/s items_per_second=130.942M/s null_percent=50 size=32.768k
ArraySortIndicesBool/32768/1 274177 ns 268186 ns 2750 bytes_per_second=116.523M/s items_per_second=977.469M/s null_percent=100 size=32.768k
ArraySortIndicesBool/32768/0 735371 ns 734928 ns 1025 bytes_per_second=42.5211M/s items_per_second=356.693M/s null_percent=0 size=32.768k
ArraySortIndicesBool/1048576/100 42704206 ns 42695362 ns 12 bytes_per_second=23.4217M/s items_per_second=196.476M/s null_percent=1 size=1048.58k
ArraySortIndicesBool/8388608/100 310066885 ns 310040314 ns 2 bytes_per_second=25.8031M/s items_per_second=216.452M/s null_percent=1 size=8.38861M
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867105461
@pitrou Can you take a look at this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884085956
@ursabot please benchmark lang=C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r659815010
##########
File path: cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc
##########
@@ -81,6 +81,16 @@ static void ArraySortIndicesInt64Wide(benchmark::State& state) {
ArraySortIndicesInt64Benchmark(state, min, max);
}
+static void ArraySortIndicesBool(benchmark::State& state) {
+ RegressionArgs args(state);
+
+ const int64_t array_size = args.size / 8;
Review comment:
This should probably be `* 8`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882663062
I made the suggested changes and I think this is ready now
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867858356
> @nirandaperera You may want to add a benchmark in `vector_sort_benchmark.cc`.
@pitrou I added a simple benchmark now. I'll add the improved version and run it against that bench. Didn't have to do much for RecordBatches and Tables because they are already using `::GetView` methods to access values. So, it was working OOB for bools.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883551498
@nirandaperera this apparently needs rebasing against master before we can run Conbench on it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882905528
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r664587075
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
+ if (values.length() < (1LL << 32)) {
+ return SortInternal<uint32_t>(indices_begin, indices_end, values, offset, options);
+ } else {
+ return SortInternal<uint64_t>(indices_begin, indices_end, values, offset, options);
+ }
+ }
+
+ private:
+ template <typename CounterType>
+ void CountBits(const BooleanArray& values, int64_t offset, CounterType* set_count,
+ CounterType* null_count) {
Review comment:
Oh okay. I did this way because we can count both nulls and trues in a single pass. But sure, I'll use that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867408117
@nirandaperera You may want to add a benchmark in `vector_sort_benchmark.cc`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867105268
https://issues.apache.org/jira/browse/ARROW-12016
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r671355074
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
Review comment:
This comment is there in the primitive type impl as well. Should that be removed as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882913560
I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882913560
I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056) (appveyor not mingw).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] edponce commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882912621
@nirandaperera @kszucs The MSVC error seems unrelated to this PR and is cause by a timeout in a Flight test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r664588538
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
Review comment:
well, I didn't test this. Just following the `ArrayCountSorter` impl.
https://github.com/apache/arrow/blob/36d7a562f08d77dbf5ed699d486badcc9403f619/cpp/src/arrow/compute/kernels/vector_sort.cc#L438
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ianmcook commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867115679
@nirandaperera as part of this PR, could you please make two small changes to the R package tests to exercise this new capability?
1. Remove the comment at the end of this line:
https://github.com/apache/arrow/blob/515b05c3bbad66a60b7c2577c50f7a258219add8/r/tests/testthat/helper-data.R#L155
2. Remove this line:
https://github.com/apache/arrow/blob/450e0eb7a640881788f839d0a475d796fa23c81c/r/tests/testthat/test-dplyr-arrange.R#L162
Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] edponce commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882912621
@nirandaperera @kszucs The MSVC error seems unrelated to this PR and is cause by a timeout in a Flight test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883529599
Benchmark runs are scheduled for baseline = d7a8b468ab64d4318ae62ab90251830acbb9b88d and contender = 85445cf37da25a953bb15478938853613b64cc18. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c26ba510b58049109ea86988b134a418...394257747dd049bb8d82a1e51c13548d/)
[Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/29a357c807b343edb0af286bcd89cd4b...7781e2b9649a4f6181e5da44d8fc8248/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab2c14bdb99b4644a656c34cb9964855...83138687943a480d9850eacecd3fa097/)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884104141
Benchmark runs are scheduled for baseline = 737492e074b816952bc011d18a996fec83b0b55f and contender = 257527c9de339ab41b9f1af125e18a042b9a3ae5. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/38a02c3e53154a28ba259c7263231f54...fd20eac590f240e8afe0f6f5e671419b/)
[Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/786641447462489697548b5f8f9c7ba1...39bcecbb49774797801e9dff23f72c13/)
[Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/aa7fc9b2490d4039ab3c6b716c6cfe4f...39483dac631845f0a50353b7c93d8dbe/)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882709345
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r660073781
##########
File path: cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc
##########
@@ -81,6 +81,16 @@ static void ArraySortIndicesInt64Wide(benchmark::State& state) {
ArraySortIndicesInt64Benchmark(state, min, max);
}
+static void ArraySortIndicesBool(benchmark::State& state) {
+ RegressionArgs args(state);
+
+ const int64_t array_size = args.size / 8;
Review comment:
Yes. I was mistaken
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884104141
Benchmark runs are scheduled for baseline = 737492e074b816952bc011d18a996fec83b0b55f and contender = 257527c9de339ab41b9f1af125e18a042b9a3ae5. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/38a02c3e53154a28ba259c7263231f54...fd20eac590f240e8afe0f6f5e671419b/)
[Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/786641447462489697548b5f8f9c7ba1...39bcecbb49774797801e9dff23f72c13/)
[Finished :arrow_down:0.19% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/aa7fc9b2490d4039ab3c6b716c6cfe4f...39483dac631845f0a50353b7c93d8dbe/)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883529505
Thanks for checking. Let's also check with Conbench and if that's alright, then let's merge.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884086059
Benchmark runs are scheduled for baseline = 737492e074b816952bc011d18a996fec83b0b55f and contender = 992f8dcf38caf30405f100636760a77e5e98d056. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/38a02c3e53154a28ba259c7263231f54...5a2f863b93ee4ea5afd13886a5739380/)
[Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/786641447462489697548b5f8f9c7ba1...b2791f7ee14a4a2e9e23fb9d6caf2e80/)
[Finished :arrow_down:0.43% :arrow_up:0.05%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/aa7fc9b2490d4039ab3c6b716c6cfe4f...1aa05c1d9b384f96a12c7306acc81186/)
Supported benchmarks:
ursa-i9-9960x: langs = Python, R
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883412386
> It would be a before vs after benchmark not side by side
I tested this with the `ArraySortIndicesBool` benchmark and didn't see any significant difference in performance. So I think its okay to leave it with int64_t.
```
int64_t
-------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------
ArraySortIndicesBool/32768/10000 856525 ns 790696 ns 1067 bytes_per_second=39.5221M/s items_per_second=331.536M/s null_percent=0.01 size=32.768k
ArraySortIndicesBool/32768/100 1068125 ns 1067932 ns 644 bytes_per_second=29.2622M/s items_per_second=245.469M/s null_percent=1 size=32.768k
ArraySortIndicesBool/32768/10 1322777 ns 1320088 ns 523 bytes_per_second=23.6727M/s items_per_second=198.581M/s null_percent=10 size=32.768k
ArraySortIndicesBool/32768/2 1898999 ns 1871787 ns 372 bytes_per_second=16.6953M/s items_per_second=140.05M/s null_percent=50 size=32.768k
ArraySortIndicesBool/32768/1 228503 ns 218945 ns 3266 bytes_per_second=142.73M/s items_per_second=1.19731G/s null_percent=100 size=32.768k
ArraySortIndicesBool/32768/0 755168 ns 735851 ns 960 bytes_per_second=42.4678M/s items_per_second=356.246M/s null_percent=0 size=32.768k
ArraySortIndicesBool/1048576/100 35812425 ns 35626885 ns 19 bytes_per_second=28.0687M/s items_per_second=235.457M/s null_percent=1 size=1048.58k
ArraySortIndicesBool/8388608/100 320588616 ns 320491041 ns 2 bytes_per_second=24.9617M/s items_per_second=209.394M/s null_percent=1 size=8.38861M
int32_t
-------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------
ArraySortIndicesBool/32768/10000 768462 ns 763410 ns 1005 bytes_per_second=40.9347M/s items_per_second=343.385M/s null_percent=0.01 size=32.768k
ArraySortIndicesBool/32768/100 1060476 ns 1026142 ns 765 bytes_per_second=30.4539M/s items_per_second=255.466M/s null_percent=1 size=32.768k
ArraySortIndicesBool/32768/10 1447835 ns 1375450 ns 499 bytes_per_second=22.7198M/s items_per_second=190.588M/s null_percent=10 size=32.768k
ArraySortIndicesBool/32768/2 2002536 ns 2001982 ns 343 bytes_per_second=15.6095M/s items_per_second=130.942M/s null_percent=50 size=32.768k
ArraySortIndicesBool/32768/1 274177 ns 268186 ns 2750 bytes_per_second=116.523M/s items_per_second=977.469M/s null_percent=100 size=32.768k
ArraySortIndicesBool/32768/0 735371 ns 734928 ns 1025 bytes_per_second=42.5211M/s items_per_second=356.693M/s null_percent=0 size=32.768k
ArraySortIndicesBool/1048576/100 42704206 ns 42695362 ns 12 bytes_per_second=23.4217M/s items_per_second=196.476M/s null_percent=1 size=1048.58k
ArraySortIndicesBool/8388608/100 310066885 ns 310040314 ns 2 bytes_per_second=25.8031M/s items_per_second=216.452M/s null_percent=1 size=8.38861M
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882905528
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r670910858
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
}
};
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+ ArrayCountSorter() = default;
+
+ // Returns where null starts.
+ uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+ const BooleanArray& values, int64_t offset,
+ const ArraySortOptions& options) {
+ // 32bit counter performs much better than 64bit one
Review comment:
This comment was added by me very early when implementing counting sort approach.
I think it's possibly due to 32bit counter array is smaller and have more chance to stay in L1 cache.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882905528
@nirandaperera MSVC doesn't look happy.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867123996
> 2\. test-dplyr-arrange.R
Done! :-)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-873299590
@pitrou I added a separate `ArrayCountSorter` impl for `bool` types.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882774778
> If we're removing the 32 vs 64 bit counter branch, can we benchmark it to make sure there's no impact?
I'm still thinking how to do this benchmark? :smile: Because we can't call separate `ArrayCountSorter<BooleanType>` impls from the bench suite, isn't it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kszucs edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882913560
I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056) (appveyor not mingw).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882783904
It would be a before vs after benchmark not side by side
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10585:
URL: https://github.com/apache/arrow/pull/10585
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882709345
If we're removing the 32 vs 64 bit counter branch, can we benchmark it to make sure there's no impact?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type
Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-869990551
@pitrou I think I'll open a new JIRA for the optimized bool sort implementation. I feel like I can reuse some of the stuff from the #10487 PR here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org