You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/31 07:12:08 UTC

[GitHub] [arrow] jianxind opened a new pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

jianxind opened a new pull request #7871:
URL: https://github.com/apache/arrow/pull/7871


   1. Use BitBlockCounter to speedup the performance for typical 0.01% null-able data.
   2. Enable AVX compiler auto vectorize version for no-nulls on int types. Float/Double use fmin/fmax to handle NaN which can't be vectorize by compiler.
   3. Also add test case to cover different null probability.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r480175509



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);

Review comment:
       In this case, I would expect the test to look like this:
   ```c++
   ASSERT_FALSE(out_min.is_valid);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685615876


   > ARROW_USER_SIMD_LEVEL=none
   
   Below is the cmd I used, and compiler vectorise happens only on Int types.
   ```
   ARROW_USER_SIMD_LEVEL=avx2 ./release/arrow-compute-aggregate-benchmark --benchmark_filter=MinMaxKernelInt64
   ARROW_USER_SIMD_LEVEL=none ./release/arrow-compute-aggregate-benchmark --benchmark_filter=MinMaxKernelInt64
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685656632


   Rebased.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-672689445


   Ping. @wesm @pitrou 
   
   Could you help to review this? Similar approach to sum kernel, use compiler to vectorise the NoNulls part, use BitBlockCounter on the 0.01% data. https://github.com/apache/arrow/pull/7870 add the benchmark item for MinMax kernel.
   
   Thanks. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou edited a comment on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685637355


   Ah, I also had `-DARROW_SIMD_LEVEL=AVX2` in CMake. Without it I do see a difference. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7871:
URL: https://github.com/apache/arrow/pull/7871


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r481101916



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);

Review comment:
       I don't think that's sufficient. You should probably add a `bool is_valid` member to `MinMaxResult`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673192812


   > @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark-filter=MinMax
   
   Below is the results for null_percent 0.01% and 0% on https://ci.ursalabs.org/#/builders/73/builds/101
   ```
                              benchmark         baseline         contender  change %                                           counters
   3     MinMaxKernelInt8/1048576/10000  812.254 MiB/sec     7.952 GiB/sec   902.442  {'run_name': 'MinMaxKernelInt8/1048576/10000',...
   31   MinMaxKernelInt16/1048576/10000    1.583 GiB/sec    12.895 GiB/sec   714.512  {'run_name': 'MinMaxKernelInt16/1048576/10000'...
   16   MinMaxKernelInt32/1048576/10000    3.152 GiB/sec    16.605 GiB/sec   426.876  {'run_name': 'MinMaxKernelInt32/1048576/10000'...
   2        MinMaxKernelInt64/1048576/0    5.289 GiB/sec    11.092 GiB/sec   109.708  {'run_name': 'MinMaxKernelInt64/1048576/0', 'r...
   14   MinMaxKernelInt64/1048576/10000    6.222 GiB/sec    10.055 GiB/sec    61.610  {'run_name': 'MinMaxKernelInt64/1048576/10000'...
   1        MinMaxKernelInt32/1048576/0   18.103 GiB/sec    26.301 GiB/sec    45.282  {'run_name': 'MinMaxKernelInt32/1048576/0', 'r...
   15       MinMaxKernelInt16/1048576/0   18.086 GiB/sec    26.274 GiB/sec    45.269  {'run_name': 'MinMaxKernelInt16/1048576/0', 'r...
   7         MinMaxKernelInt8/1048576/0   18.112 GiB/sec    26.210 GiB/sec    44.708  {'run_name': 'MinMaxKernelInt8/1048576/0', 'ru...
   26  MinMaxKernelDouble/1048576/10000    1.063 GiB/sec     1.315 GiB/sec    23.759  {'run_name': 'MinMaxKernelDouble/1048576/10000...
   23   MinMaxKernelFloat/1048576/10000  551.756 MiB/sec   674.455 MiB/sec    22.238  {'run_name': 'MinMaxKernelFloat/1048576/10000'...
   0       MinMaxKernelDouble/1048576/0    1.205 GiB/sec     1.332 GiB/sec    10.600  {'run_name': 'MinMaxKernelDouble/1048576/0', '...
   12       MinMaxKernelFloat/1048576/0  621.824 MiB/sec   607.146 MiB/sec    -2.361  {'run_name': 'MinMaxKernelFloat/1048576/0', 'r...
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685587641


   Passing `ARROW_USER_SIMD_LEVEL=none` doesn't seem to impact the results. Is something amiss?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind removed a comment on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind removed a comment on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-669047527


   Need a rebase after https://github.com/apache/arrow/pull/7903


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-666974009


   https://issues.apache.org/jira/browse/ARROW-9605


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-666978438


   I can trigger a benchmark action once https://github.com/apache/arrow/pull/7870 get merged.
   
   Below is the BM data for int types on my setup:
   ```
   Before:
   MinMaxKernelInt8/1048576/10000          847 us          845 us          828 bytes_per_second=1.15586G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt8/1048576/0             43.9 us         43.8 us        15738 bytes_per_second=22.294G/s null_percent=0 size=1048.58k
   MinMaxKernelInt16/1048576/10000         429 us          428 us         1637 bytes_per_second=2.28348G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt16/1048576/0            42.4 us         42.4 us        15878 bytes_per_second=23.0572G/s null_percent=0 size=1048.58k
   MinMaxKernelInt32/1048576/10000         295 us          294 us         2383 bytes_per_second=3.31751G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt32/1048576/0            42.1 us         42.0 us        16620 bytes_per_second=23.2245G/s null_percent=0 size=1048.58k
   MinMaxKernelInt64/1048576/10000         112 us          112 us         6309 bytes_per_second=8.70966G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt64/1048576/0            82.2 us         82.1 us         8537 bytes_per_second=11.8992G/s null_percent=0 size=1048.58k
   
   After(AVX2):
   MinMaxKernelInt8/1048576/10000         92.9 us         92.6 us         7568 bytes_per_second=10.5421G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt8/1048576/0             31.3 us         31.2 us        21832 bytes_per_second=31.2619G/s null_percent=0 size=1048.58k
   MinMaxKernelInt16/1048576/10000        60.7 us         60.5 us        11501 bytes_per_second=16.1388G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt16/1048576/0            31.5 us         31.4 us        22316 bytes_per_second=31.1085G/s null_percent=0 size=1048.58k
   MinMaxKernelInt32/1048576/10000        51.0 us         50.9 us        13841 bytes_per_second=19.1853G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt32/1048576/0            31.8 us         31.7 us        22111 bytes_per_second=30.8189G/s null_percent=0 size=1048.58k
   MinMaxKernelInt64/1048576/10000        61.1 us         61.0 us        11610 bytes_per_second=16.016G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt64/1048576/0            54.2 us         54.1 us        12935 bytes_per_second=18.0651G/s null_percent=0 size=1048.58k
   
   AVX512:
   MinMaxKernelInt32/1048576/10000       40.9 us         40.8 us        17151 bytes_per_second=23.9207G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt32/1048576/0           25.6 us         25.6 us        26669 bytes_per_second=38.2196G/s null_percent=0 size=1048.58k
   MinMaxKernelInt64/1048576/10000       34.5 us         34.4 us        20137 bytes_per_second=28.396G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt64/1048576/0           23.7 us         23.7 us        25949 bytes_per_second=41.2537G/s null_percent=0 size=1048.58k
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477425988



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);
+
+  const auto& out_max = checked_cast<const ScalarType&>(*value.value[1]);
+  ASSERT_EQ(expected.second, out_max.value);
+}
+
+template <typename ArrowType>
+class TestRandomNumericMinMaxKernel : public ::testing::Test {};
+
+TYPED_TEST_SUITE(TestRandomNumericMinMaxKernel, NumericArrowTypes);
+TYPED_TEST(TestRandomNumericMinMaxKernel, RandomArrayMinMax) {
+  auto rand = random::RandomArrayGenerator(0x8afc055);
+  // Test size up to 1<<13 (8192).

Review comment:
       Sounds a bit large. Why not stop at e.g. 1024?

##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);

Review comment:
       How does this work when the array is all nulls (`null_probability` = 1.0 below)?

##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -130,23 +130,6 @@ Result<Datum> MinMax(const Datum& value,
                      const MinMaxOptions& options = MinMaxOptions::Defaults(),
                      ExecContext* ctx = NULLPTR);
 
-/// \brief Calculate the min / max of a numeric array.
-///
-/// This function returns both the min and max as a collection. The resulting
-/// datum thus consists of two scalar datums: {Datum(min), Datum(max)}
-///
-/// \param[in] array input array
-/// \param[in] options see MinMaxOptions for more information
-/// \param[in] ctx the function execution context, optional
-/// \return resulting datum containing a {min, max} collection
-///
-/// \since 1.0.0
-/// \note API not yet finalized
-ARROW_EXPORT
-Result<Datum> MinMax(const Array& array,

Review comment:
       Hmm... is this removal deliberate?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind removed a comment on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind removed a comment on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673185847


   @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark_filter="MinMax"


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-669047527


   Need a rebase after https://github.com/apache/arrow/pull/7903


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477781881



##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -130,23 +130,6 @@ Result<Datum> MinMax(const Datum& value,
                      const MinMaxOptions& options = MinMaxOptions::Defaults(),
                      ExecContext* ctx = NULLPTR);
 
-/// \brief Calculate the min / max of a numeric array.
-///
-/// This function returns both the min and max as a collection. The resulting
-/// datum thus consists of two scalar datums: {Datum(min), Datum(max)}
-///
-/// \param[in] array input array
-/// \param[in] options see MinMaxOptions for more information
-/// \param[in] ctx the function execution context, optional
-/// \return resulting datum containing a {min, max} collection
-///
-/// \since 1.0.0
-/// \note API not yet finalized
-ARROW_EXPORT
-Result<Datum> MinMax(const Array& array,

Review comment:
       Yes. There's no implementation to Array input of MinMax, it will hit a link error if call MinMax with Array data.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r481880440



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);

Review comment:
       Thanks, added. Pls help to check again.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477794871



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);

Review comment:
       Return null for both the expected test and MinMax.
   
    if (array.length() <= array.null_count()) {  // All null values
       return {static_cast<T>(0), static_cast<T>(0)};
     }




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r477809857



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);
+
+  const auto& out_max = checked_cast<const ScalarType&>(*value.value[1]);
+  ASSERT_EQ(expected.second, out_max.value);
+}
+
+template <typename ArrowType>
+class TestRandomNumericMinMaxKernel : public ::testing::Test {};
+
+TYPED_TEST_SUITE(TestRandomNumericMinMaxKernel, NumericArrowTypes);
+TYPED_TEST(TestRandomNumericMinMaxKernel, RandomArrayMinMax) {
+  auto rand = random::RandomArrayGenerator(0x8afc055);
+  // Test size up to 1<<13 (8192).

Review comment:
       The new implementation is based on BitBlockCounter, I am afraid 1024 size with 0.001(only one valid value) null_probability doesn't cover enough.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-680420724


   > @jianxind Sorry for the delay. Could you please rebase this PR? It looks like there are some conflicts now.
   
   No problem at all. Rebased now. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-679993093


   @jianxind Sorry for the delay. Could you please rebase this PR? It looks like there are some conflicts now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on a change in pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#discussion_r480661786



##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);

Review comment:
       Added a RandomNullArrayMinMax to cover 1.0 null_probability, thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685741244


   Test failures are unrelated, will merge.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673186171


   @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark-filter=MinMax


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-685637355


   Ah, I also had {{-DARROW_SIMD_LEVEL=AVX2}} in CMake. Without it I do see a difference. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
jianxind commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673185847


   @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark --benchmark_filter="MinMax"


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7871:
URL: https://github.com/apache/arrow/pull/7871#issuecomment-673185853


   ```
   no such option: --benchmark_filter
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org