You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/23 19:35:52 UTC

[GitHub] [arrow] nirandaperera opened a new pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

nirandaperera opened a new pull request #10585:
URL: https://github.com/apache/arrow/pull/10585


   Adding `array_sort_indices` and `partition_nth_indices` for `BooleanType` using existing sort and Nth-partition utils. 
   This may be rather inefficient, since the values are traversed bit-by-bit rather than working on a byte/word. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882913560


   I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056) (appveyor not mingw).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884086059


   Benchmark runs are scheduled for baseline = 737492e074b816952bc011d18a996fec83b0b55f and contender = 992f8dcf38caf30405f100636760a77e5e98d056. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/38a02c3e53154a28ba259c7263231f54...5a2f863b93ee4ea5afd13886a5739380/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/786641447462489697548b5f8f9c7ba1...b2791f7ee14a4a2e9e23fb9d6caf2e80/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/aa7fc9b2490d4039ab3c6b716c6cfe4f...1aa05c1d9b384f96a12c7306acc81186/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882912621


   @nirandaperera @kszucs The MSVC error seems unrelated to this PR and is cause by a timeout in a Flight test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-869990551


   @pitrou I think I'll open a new JIRA for the optimized bool sort implementation. I feel like I can reuse some of the stuff from the #10487 PR here. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883529233


   @ursabot please benchmark lang=C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r663999228



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one

Review comment:
       This sounds rather weird. Are you sure about this?

##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one
+    if (values.length() < (1LL << 32)) {
+      return SortInternal<uint32_t>(indices_begin, indices_end, values, offset, options);
+    } else {
+      return SortInternal<uint64_t>(indices_begin, indices_end, values, offset, options);
+    }
+  }
+
+ private:
+  template <typename CounterType>
+  void CountBits(const BooleanArray& values, int64_t offset, CounterType* set_count,
+                 CounterType* null_count) {

Review comment:
       Really, you needn't write this yourself. Just call `null_count()` and `true_count()`.

##########
File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc
##########
@@ -980,6 +1056,68 @@ TEST_F(TestTableSortIndices, NaNAndNull) {
   AssertSortIndices(table, options, "[7, 1, 2, 6, 5, 4, 0, 3]");
 }
 
+TEST_F(TestTableSortIndices, Boolean) {

Review comment:
       Same here.

##########
File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc
##########
@@ -842,6 +877,47 @@ TEST_F(TestRecordBatchSortIndices, NaNAndNull) {
   AssertSortIndices(batch, options, "[7, 1, 2, 6, 5, 4, 0, 3]");
 }
 
+TEST_F(TestRecordBatchSortIndices, Boolean) {

Review comment:
       I don't think it's worth testing the non-null case separately. This will make less test code to maintain.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882663062






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r660073781



##########
File path: cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc
##########
@@ -81,6 +81,16 @@ static void ArraySortIndicesInt64Wide(benchmark::State& state) {
   ArraySortIndicesInt64Benchmark(state, min, max);
 }
 
+static void ArraySortIndicesBool(benchmark::State& state) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / 8;

Review comment:
       Yes. I was mistaken




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r659815010



##########
File path: cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc
##########
@@ -81,6 +81,16 @@ static void ArraySortIndicesInt64Wide(benchmark::State& state) {
   ArraySortIndicesInt64Benchmark(state, min, max);
 }
 
+static void ArraySortIndicesBool(benchmark::State& state) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / 8;

Review comment:
       This should probably be `* 8`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867409291


   Also, if you want to work on performance, note that a dedicated counting sort for boolean should be really simple.
   You can first call `null_count`, `true_count` and `false_count`, then you just have to walk individual bits and emit indices.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r671329597



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one

Review comment:
       Unless it's benchmarked here as well, maybe let's remove the comment to avoid being confusing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882663062






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884104018


   @ursabot please benchmark lang=C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883529599


   Benchmark runs are scheduled for baseline = d7a8b468ab64d4318ae62ab90251830acbb9b88d and contender = 85445cf37da25a953bb15478938853613b64cc18. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c26ba510b58049109ea86988b134a418...394257747dd049bb8d82a1e51c13548d/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/29a357c807b343edb0af286bcd89cd4b...7781e2b9649a4f6181e5da44d8fc8248/)
   [Failed] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab2c14bdb99b4644a656c34cb9964855...83138687943a480d9850eacecd3fa097/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867133305


   @ianmcook it looks like something else is missing. 
   https://github.com/apache/arrow/pull/10585/checks?check_run_id=2898806771 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867123996


   @ianmcook  Done! :-) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r671355074



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one

Review comment:
       This comment is there in the primitive type impl as well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r663999228



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one

Review comment:
       This sounds rather weird. Are you sure about this?

##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one
+    if (values.length() < (1LL << 32)) {
+      return SortInternal<uint32_t>(indices_begin, indices_end, values, offset, options);
+    } else {
+      return SortInternal<uint64_t>(indices_begin, indices_end, values, offset, options);
+    }
+  }
+
+ private:
+  template <typename CounterType>
+  void CountBits(const BooleanArray& values, int64_t offset, CounterType* set_count,
+                 CounterType* null_count) {

Review comment:
       Really, you needn't write this yourself. Just call `null_count()` and `true_count()`.

##########
File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc
##########
@@ -980,6 +1056,68 @@ TEST_F(TestTableSortIndices, NaNAndNull) {
   AssertSortIndices(table, options, "[7, 1, 2, 6, 5, 4, 0, 3]");
 }
 
+TEST_F(TestTableSortIndices, Boolean) {

Review comment:
       Same here.

##########
File path: cpp/src/arrow/compute/kernels/vector_sort_test.cc
##########
@@ -842,6 +877,47 @@ TEST_F(TestRecordBatchSortIndices, NaNAndNull) {
   AssertSortIndices(batch, options, "[7, 1, 2, 6, 5, 4, 0, 3]");
 }
 
+TEST_F(TestRecordBatchSortIndices, Boolean) {

Review comment:
       I don't think it's worth testing the non-null case separately. This will make less test code to maintain.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r671356593



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one

Review comment:
       That one was benchmarked!
   
   More seriously…maybe at least edit them to reflect that it was done due to a benchmark. Though I worry about the comment effectively bitrotting.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883102409


   > I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056) (appveyor not mingw).
   
   I believe this is due to a method being declared `static`. Let's see! thanks @kszucs 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882709345






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883412386


   > It would be a before vs after benchmark not side by side
   
   I tested this with the `ArraySortIndicesBool` benchmark and didn't see any significant difference in performance. So I think its safe to leave it with int64_t.
   
   ```
   int64_t
   
   -------------------------------------------------------------------------------------------
   Benchmark                                 Time             CPU   Iterations UserCounters...
   -------------------------------------------------------------------------------------------
   ArraySortIndicesBool/32768/10000     856525 ns       790696 ns         1067 bytes_per_second=39.5221M/s items_per_second=331.536M/s null_percent=0.01 size=32.768k
   ArraySortIndicesBool/32768/100      1068125 ns      1067932 ns          644 bytes_per_second=29.2622M/s items_per_second=245.469M/s null_percent=1 size=32.768k
   ArraySortIndicesBool/32768/10       1322777 ns      1320088 ns          523 bytes_per_second=23.6727M/s items_per_second=198.581M/s null_percent=10 size=32.768k
   ArraySortIndicesBool/32768/2        1898999 ns      1871787 ns          372 bytes_per_second=16.6953M/s items_per_second=140.05M/s null_percent=50 size=32.768k
   ArraySortIndicesBool/32768/1         228503 ns       218945 ns         3266 bytes_per_second=142.73M/s items_per_second=1.19731G/s null_percent=100 size=32.768k
   ArraySortIndicesBool/32768/0         755168 ns       735851 ns          960 bytes_per_second=42.4678M/s items_per_second=356.246M/s null_percent=0 size=32.768k
   ArraySortIndicesBool/1048576/100   35812425 ns     35626885 ns           19 bytes_per_second=28.0687M/s items_per_second=235.457M/s null_percent=1 size=1048.58k
   ArraySortIndicesBool/8388608/100  320588616 ns    320491041 ns            2 bytes_per_second=24.9617M/s items_per_second=209.394M/s null_percent=1 size=8.38861M
   
   
   int32_t
   
   -------------------------------------------------------------------------------------------
   Benchmark                                 Time             CPU   Iterations UserCounters...
   -------------------------------------------------------------------------------------------
   ArraySortIndicesBool/32768/10000     768462 ns       763410 ns         1005 bytes_per_second=40.9347M/s items_per_second=343.385M/s null_percent=0.01 size=32.768k
   ArraySortIndicesBool/32768/100      1060476 ns      1026142 ns          765 bytes_per_second=30.4539M/s items_per_second=255.466M/s null_percent=1 size=32.768k
   ArraySortIndicesBool/32768/10       1447835 ns      1375450 ns          499 bytes_per_second=22.7198M/s items_per_second=190.588M/s null_percent=10 size=32.768k
   ArraySortIndicesBool/32768/2        2002536 ns      2001982 ns          343 bytes_per_second=15.6095M/s items_per_second=130.942M/s null_percent=50 size=32.768k
   ArraySortIndicesBool/32768/1         274177 ns       268186 ns         2750 bytes_per_second=116.523M/s items_per_second=977.469M/s null_percent=100 size=32.768k
   ArraySortIndicesBool/32768/0         735371 ns       734928 ns         1025 bytes_per_second=42.5211M/s items_per_second=356.693M/s null_percent=0 size=32.768k
   ArraySortIndicesBool/1048576/100   42704206 ns     42695362 ns           12 bytes_per_second=23.4217M/s items_per_second=196.476M/s null_percent=1 size=1048.58k
   ArraySortIndicesBool/8388608/100  310066885 ns    310040314 ns            2 bytes_per_second=25.8031M/s items_per_second=216.452M/s null_percent=1 size=8.38861M
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867105461


   @pitrou Can you take a look at this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884085956


   @ursabot please benchmark lang=C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r659815010



##########
File path: cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc
##########
@@ -81,6 +81,16 @@ static void ArraySortIndicesInt64Wide(benchmark::State& state) {
   ArraySortIndicesInt64Benchmark(state, min, max);
 }
 
+static void ArraySortIndicesBool(benchmark::State& state) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / 8;

Review comment:
       This should probably be `* 8`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882663062


   I made the suggested changes and I think this is ready now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867858356


   > @nirandaperera You may want to add a benchmark in `vector_sort_benchmark.cc`.
   
   @pitrou I added a simple benchmark now. I'll add the improved version and run it against that bench. Didn't have to do much for RecordBatches and Tables because they are already using `::GetView` methods to access values. So, it was working OOB for bools. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883551498


   @nirandaperera this apparently needs rebasing against master before we can run Conbench on it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882905528






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r664587075



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one
+    if (values.length() < (1LL << 32)) {
+      return SortInternal<uint32_t>(indices_begin, indices_end, values, offset, options);
+    } else {
+      return SortInternal<uint64_t>(indices_begin, indices_end, values, offset, options);
+    }
+  }
+
+ private:
+  template <typename CounterType>
+  void CountBits(const BooleanArray& values, int64_t offset, CounterType* set_count,
+                 CounterType* null_count) {

Review comment:
       Oh okay. I did this way because we can count both nulls and trues in a single pass. But sure, I'll use that. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867408117


   @nirandaperera You may want to add a benchmark in `vector_sort_benchmark.cc`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867105268


   https://issues.apache.org/jira/browse/ARROW-12016


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r671355074



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one

Review comment:
       This comment is there in the primitive type impl as well. Should that be removed as well. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882913560


   I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882913560


   I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056) (appveyor not mingw).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882912621


   @nirandaperera @kszucs The MSVC error seems unrelated to this PR and is cause by a timeout in a Flight test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r664588538



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one

Review comment:
       well, I didn't test this. Just following the `ArrayCountSorter` impl. 
   https://github.com/apache/arrow/blob/36d7a562f08d77dbf5ed699d486badcc9403f619/cpp/src/arrow/compute/kernels/vector_sort.cc#L438 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867115679


   @nirandaperera as part of this PR, could you please make two small changes to the R package tests to exercise this new capability?
   
   1. Remove the comment at the end of this line:
   https://github.com/apache/arrow/blob/515b05c3bbad66a60b7c2577c50f7a258219add8/r/tests/testthat/helper-data.R#L155
   
   2. Remove this line:
   https://github.com/apache/arrow/blob/450e0eb7a640881788f839d0a475d796fa23c81c/r/tests/testthat/test-dplyr-arrange.R#L162
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882912621


   @nirandaperera @kszucs The MSVC error seems unrelated to this PR and is cause by a timeout in a Flight test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883529599


   Benchmark runs are scheduled for baseline = d7a8b468ab64d4318ae62ab90251830acbb9b88d and contender = 85445cf37da25a953bb15478938853613b64cc18. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c26ba510b58049109ea86988b134a418...394257747dd049bb8d82a1e51c13548d/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/29a357c807b343edb0af286bcd89cd4b...7781e2b9649a4f6181e5da44d8fc8248/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab2c14bdb99b4644a656c34cb9964855...83138687943a480d9850eacecd3fa097/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884104141


   Benchmark runs are scheduled for baseline = 737492e074b816952bc011d18a996fec83b0b55f and contender = 257527c9de339ab41b9f1af125e18a042b9a3ae5. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/38a02c3e53154a28ba259c7263231f54...fd20eac590f240e8afe0f6f5e671419b/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/786641447462489697548b5f8f9c7ba1...39bcecbb49774797801e9dff23f72c13/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/aa7fc9b2490d4039ab3c6b716c6cfe4f...39483dac631845f0a50353b7c93d8dbe/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882709345






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r660073781



##########
File path: cpp/src/arrow/compute/kernels/vector_sort_benchmark.cc
##########
@@ -81,6 +81,16 @@ static void ArraySortIndicesInt64Wide(benchmark::State& state) {
   ArraySortIndicesInt64Benchmark(state, min, max);
 }
 
+static void ArraySortIndicesBool(benchmark::State& state) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / 8;

Review comment:
       Yes. I was mistaken




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884104141


   Benchmark runs are scheduled for baseline = 737492e074b816952bc011d18a996fec83b0b55f and contender = 257527c9de339ab41b9f1af125e18a042b9a3ae5. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/38a02c3e53154a28ba259c7263231f54...fd20eac590f240e8afe0f6f5e671419b/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/786641447462489697548b5f8f9c7ba1...39bcecbb49774797801e9dff23f72c13/)
   [Finished :arrow_down:0.19% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/aa7fc9b2490d4039ab3c6b716c6cfe4f...39483dac631845f0a50353b7c93d8dbe/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883529505


   Thanks for checking. Let's also check with Conbench and if that's alright, then let's merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-884086059


   Benchmark runs are scheduled for baseline = 737492e074b816952bc011d18a996fec83b0b55f and contender = 992f8dcf38caf30405f100636760a77e5e98d056. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/38a02c3e53154a28ba259c7263231f54...5a2f863b93ee4ea5afd13886a5739380/)
   [Skipped :warning: Only ['Python', 'R'] langs are supported on ursa-i9-9960x] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/786641447462489697548b5f8f9c7ba1...b2791f7ee14a4a2e9e23fb9d6caf2e80/)
   [Finished :arrow_down:0.43% :arrow_up:0.05%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/aa7fc9b2490d4039ab3c6b716c6cfe4f...1aa05c1d9b384f96a12c7306acc81186/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-883412386


   > It would be a before vs after benchmark not side by side
   
   I tested this with the `ArraySortIndicesBool` benchmark and didn't see any significant difference in performance. So I think its okay to leave it with int64_t.
   
   ```
   int64_t
   
   -------------------------------------------------------------------------------------------
   Benchmark                                 Time             CPU   Iterations UserCounters...
   -------------------------------------------------------------------------------------------
   ArraySortIndicesBool/32768/10000     856525 ns       790696 ns         1067 bytes_per_second=39.5221M/s items_per_second=331.536M/s null_percent=0.01 size=32.768k
   ArraySortIndicesBool/32768/100      1068125 ns      1067932 ns          644 bytes_per_second=29.2622M/s items_per_second=245.469M/s null_percent=1 size=32.768k
   ArraySortIndicesBool/32768/10       1322777 ns      1320088 ns          523 bytes_per_second=23.6727M/s items_per_second=198.581M/s null_percent=10 size=32.768k
   ArraySortIndicesBool/32768/2        1898999 ns      1871787 ns          372 bytes_per_second=16.6953M/s items_per_second=140.05M/s null_percent=50 size=32.768k
   ArraySortIndicesBool/32768/1         228503 ns       218945 ns         3266 bytes_per_second=142.73M/s items_per_second=1.19731G/s null_percent=100 size=32.768k
   ArraySortIndicesBool/32768/0         755168 ns       735851 ns          960 bytes_per_second=42.4678M/s items_per_second=356.246M/s null_percent=0 size=32.768k
   ArraySortIndicesBool/1048576/100   35812425 ns     35626885 ns           19 bytes_per_second=28.0687M/s items_per_second=235.457M/s null_percent=1 size=1048.58k
   ArraySortIndicesBool/8388608/100  320588616 ns    320491041 ns            2 bytes_per_second=24.9617M/s items_per_second=209.394M/s null_percent=1 size=8.38861M
   
   
   int32_t
   
   -------------------------------------------------------------------------------------------
   Benchmark                                 Time             CPU   Iterations UserCounters...
   -------------------------------------------------------------------------------------------
   ArraySortIndicesBool/32768/10000     768462 ns       763410 ns         1005 bytes_per_second=40.9347M/s items_per_second=343.385M/s null_percent=0.01 size=32.768k
   ArraySortIndicesBool/32768/100      1060476 ns      1026142 ns          765 bytes_per_second=30.4539M/s items_per_second=255.466M/s null_percent=1 size=32.768k
   ArraySortIndicesBool/32768/10       1447835 ns      1375450 ns          499 bytes_per_second=22.7198M/s items_per_second=190.588M/s null_percent=10 size=32.768k
   ArraySortIndicesBool/32768/2        2002536 ns      2001982 ns          343 bytes_per_second=15.6095M/s items_per_second=130.942M/s null_percent=50 size=32.768k
   ArraySortIndicesBool/32768/1         274177 ns       268186 ns         2750 bytes_per_second=116.523M/s items_per_second=977.469M/s null_percent=100 size=32.768k
   ArraySortIndicesBool/32768/0         735371 ns       734928 ns         1025 bytes_per_second=42.5211M/s items_per_second=356.693M/s null_percent=0 size=32.768k
   ArraySortIndicesBool/1048576/100   42704206 ns     42695362 ns           12 bytes_per_second=23.4217M/s items_per_second=196.476M/s null_percent=1 size=1048.58k
   ArraySortIndicesBool/8388608/100  310066885 ns    310040314 ns            2 bytes_per_second=25.8031M/s items_per_second=216.452M/s null_percent=1 size=8.38861M
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882905528






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on a change in pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#discussion_r670910858



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -477,6 +491,87 @@ class ArrayCountSorter {
   }
 };
 
+using ::arrow::internal::Bitmap;
+
+template <>
+class ArrayCountSorter<BooleanType> {
+ public:
+  ArrayCountSorter() = default;
+
+  // Returns where null starts.
+  uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end,
+                 const BooleanArray& values, int64_t offset,
+                 const ArraySortOptions& options) {
+    // 32bit counter performs much better than 64bit one

Review comment:
       This comment was added by me very early when implementing counting sort approach.
   I think it's possibly due to 32bit counter array is smaller and have more chance to stay in L1 cache.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882905528


   @nirandaperera MSVC doesn't look happy. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-867123996


   >     2\. test-dplyr-arrange.R
   
   Done! :-) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-873299590


   @pitrou I added a separate `ArrayCountSorter` impl for `bool` types. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882774778


   > If we're removing the 32 vs 64 bit counter branch, can we benchmark it to make sure there's no impact?
   
   I'm still thinking how to do this benchmark? :smile: Because we can't call separate `ArrayCountSorter<BooleanType>` impls from the bench suite, isn't it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs edited a comment on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882913560


   I'm referring to [these errors](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/40048244/job/p40i6473avjbh91g#L1056) (appveyor not mingw).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882783904


   It would be a before vs after benchmark not side by side


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10585:
URL: https://github.com/apache/arrow/pull/10585


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-882709345


   If we're removing the 32 vs 64 bit counter branch, can we benchmark it to make sure there's no impact?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nirandaperera commented on pull request #10585: ARROW-12016 [C++] Implement array_sort_indices and sort_indices for BOOL type

Posted by GitBox <gi...@apache.org>.
nirandaperera commented on pull request #10585:
URL: https://github.com/apache/arrow/pull/10585#issuecomment-869990551


   @pitrou I think I'll open a new JIRA for the optimized bool sort implementation. I feel like I can reuse some of the stuff from the #10487 PR here. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org