You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/24 22:00:39 UTC

[GitHub] [arrow] lidavidm opened a new pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

lidavidm opened a new pull request #10390:
URL: https://github.com/apache/arrow/pull/10390


   This is a bit messy, but implements a variadic scalar maximum/minimum kernel.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook commented on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook commented on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-854117875


   @lidavidm could you briefly describe what happens here if the arrays passed to these kernels have different but comparable, data types such as one array of doubles and one array of integers?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-848282481


   (N.B. I still need to fix CommonTimestamp, got caught up in fixing the other feedback)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r638953961



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +601,203 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Arg0Type, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Datum* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->scalar()->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->scalar()->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out->scalar().get());
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const bool all_scalar =
+        std::all_of(batch.values.begin(), batch.values.end(),
+                    [](const Datum& d) { return d.descr().shape == ValueDescr::SCALAR; });
+    if (all_scalar) {
+      ExecScalar(batch, options, out);
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // Exactly one array (output = input)
+    if (batch.values.size() == 1) {
+      *output = *batch[0].array();
+      return Status::OK();
+    }
+
+    // At least one array, two or more arguments
+    int64_t length = 0;

Review comment:
       This should already be present in ExecBatch::length

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -453,6 +518,26 @@ struct ArithmeticFunction : ScalarFunction {
   }
 };
 
+struct ArithmeticVarArgsFunction : ScalarFunction {
+  using ScalarFunction::ScalarFunction;
+
+  Result<const Kernel*> DispatchBest(std::vector<ValueDescr>* values) const override {
+    RETURN_NOT_OK(CheckArity(*values));
+
+    using arrow::compute::detail::DispatchExactImpl;
+    if (auto kernel = DispatchExactImpl(this, *values)) return kernel;
+
+    EnsureDictionaryDecoded(values);
+
+    if (auto type = CommonNumeric(*values)) {
+      ReplaceTypes(type, values);
+    }

Review comment:
       (Perhaps as a follow-up:)
   ```suggestion
       } else if (auto type = CommonTimestamp(*values)) {
         ReplaceTypes(type, values);
       } else if (auto type = CommonBinary(*values)) {
         ReplaceTypes(type, values);
       }
   ```

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +601,203 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Arg0Type, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Datum* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->scalar()->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->scalar()->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out->scalar().get());
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const bool all_scalar =
+        std::all_of(batch.values.begin(), batch.values.end(),
+                    [](const Datum& d) { return d.descr().shape == ValueDescr::SCALAR; });
+    if (all_scalar) {
+      ExecScalar(batch, options, out);
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // Exactly one array (output = input)
+    if (batch.values.size() == 1) {
+      *output = *batch[0].array();
+      return Status::OK();
+    }
+
+    // At least one array, two or more arguments
+    int64_t length = 0;
+    for (const auto& arg : batch.values) {
+      if (arg.is_array()) {
+        length = arg.array()->length;
+        break;
+      }
+    }
+
+    if (!options.skip_nulls) {
+      // We can precompute the validity buffer in this case
+      // If output will be all null, just return
+      for (const auto& arg : batch.values) {
+        if (arg.is_scalar() && !arg.scalar()->is_valid) {
+          ARROW_ASSIGN_OR_RAISE(
+              auto array, MakeArrayFromScalar(*arg.scalar(), length, ctx->memory_pool()));
+          *output = *array->data();
+          return Status::OK();
+        } else if (arg.is_array() && arg.array()->null_count == length) {
+          *output = *arg.array();
+          return Status::OK();
+        }
+      }
+      // AND together the validity buffers of all arrays
+      ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(length));
+      bool first = true;
+      for (const auto& arg : batch.values) {
+        if (!arg.is_array()) continue;
+        auto arr = arg.array();
+        if (!arr->buffers[0]) continue;
+        if (first) {
+          ::arrow::internal::CopyBitmap(arr->buffers[0]->data(), arr->offset, length,
+                                        output->buffers[0]->mutable_data(),
+                                        /*dest_offset=*/0);
+          first = false;
+        } else {
+          ::arrow::internal::BitmapAnd(
+              output->buffers[0]->data(), /*left_offset=*/0, arr->buffers[0]->data(),
+              arr->offset, length, /*out_offset=*/0, output->buffers[0]->mutable_data());
+        }
+      }
+    }
+
+    if (batch.values[0].is_scalar()) {
+      // Promote to output array
+      ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*batch.values[0].scalar(),
+                                                            length, ctx->memory_pool()));
+      *output = *array->data();
+      if (!batch.values[0].scalar()->is_valid) {
+        // MakeArrayFromScalar reuses the same buffer for null/data in
+        // this case, allocate a real one since we'll be writing to it
+        ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(length));
+        ::arrow::internal::BitmapXor(output->buffers[0]->data(), /*left_offset=*/0,
+                                     output->buffers[0]->data(), /*right_offset=*/0,
+                                     length, /*out_offset=*/0,
+                                     output->buffers[0]->mutable_data());
+      }
+    } else {
+      // Copy to output array
+      const ArrayData& input = *batch.values[0].array();
+      ARROW_ASSIGN_OR_RAISE(output->buffers[1], ctx->Allocate(length * sizeof(OutValue)));
+      if (options.skip_nulls && input.buffers[0]) {
+        ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(length));
+        ::arrow::internal::CopyBitmap(input.buffers[0]->data(), input.offset, length,
+                                      output->buffers[0]->mutable_data(),
+                                      /*dest_offset=*/0);
+      }
+      // This won't work for nested or variable-sized types
+      std::memcpy(output->buffers[1]->mutable_data(),
+                  input.buffers[1]->data() + (input.offset * sizeof(OutValue)),
+                  length * sizeof(OutValue));
+    }
+
+    for (size_t i = 1; i < batch.values.size(); i++) {
+      OutputArrayWriter<OutType> writer(out->mutable_array());
+      if (batch.values[i].is_scalar()) {

Review comment:
       I think instead it'd be useful to sort the inputs and handle all the scalars first. This would allow you to consider only arrays in this loop




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r638312730



##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -42,6 +42,11 @@ struct ArithmeticOptions : public FunctionOptions {
   bool check_overflow;
 };
 
+struct ARROW_EXPORT MinMaxOptions : public FunctionOptions {

Review comment:
       Do we want to name this something else?

##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -250,6 +255,24 @@ Result<Datum> Power(const Datum& left, const Datum& right,
                     ArithmeticOptions options = ArithmeticOptions(),
                     ExecContext* ctx = NULLPTR);
 
+/// \brief

Review comment:
       I forgot to fill this in, will fix.

##########
File path: cpp/src/arrow/ipc/json_simple.cc
##########
@@ -911,6 +912,26 @@ Status DictArrayFromJSON(const std::shared_ptr<DataType>& type,
       .Value(out);
 }
 
+Status ScalarFromJSON(const std::shared_ptr<DataType>& type,

Review comment:
       I'm going to port this into ARROW-12859.

##########
File path: cpp/src/arrow/testing/gtest_util.cc
##########
@@ -360,6 +360,30 @@ void AssertDatumsEqual(const Datum& expected, const Datum& actual, bool verbose)
       break;
   }
 }
+ARROW_TESTING_EXPORT void AssertDatumsApproxEqual(const Datum& expected,

Review comment:
       I'm going to port this to ARROW-12859 (and fix the unimplemented case).

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -428,6 +454,45 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId get_id) {
   }
 }
 
+template <template <typename... Args> class KernelGenerator, typename Op>

Review comment:
       This is rather a lot of code being generated; it might be nice if I could figure out the right template setup to consolidate the temporal types with their respective equivalent integral type.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641613706



##########
File path: docs/source/cpp/compute.rst
##########
@@ -310,6 +310,20 @@ output element is null.
 | less, less_equal         |            |                                             |                     |
 +--------------------------+------------+---------------------------------------------+---------------------+
 
+These functions take any number of inputs of numeric type (in which case they
+will be cast to the :ref:`common numeric type <common-numeric-type>` before
+comparison) or of temporal types. If any input is dictionary encoded it will be
+expanded for the purposes of comparison.
+
++--------------------------+------------+---------------------------------------------+---------------------+---------------------------------------+-------+
+| Function names           | Arity      | Input types                                 | Output type         | Options class                         | Notes |
++==========================+============+=============================================+=====================+=======================================+=======+
+| maximum, minimum         | Varargs    | Numeric and Temporal                        | Numeric or Temporal | :struct:`ElementWiseAggregateOptions` | \(1)  |

Review comment:
       I agree that using `least` and `greatest` would not fully resolve the ambiguity (except to users familiar with SQL) but the use of these synonyms would at least signal that these are something different from `min` and `max`. I think `element_wise_min` and ` element_wise_max` are fine and do a better job of disambiguating as long as we're OK with their length.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641602611



##########
File path: docs/source/cpp/compute.rst
##########
@@ -310,6 +310,20 @@ output element is null.
 | less, less_equal         |            |                                             |                     |
 +--------------------------+------------+---------------------------------------------+---------------------+
 
+These functions take any number of inputs of numeric type (in which case they
+will be cast to the :ref:`common numeric type <common-numeric-type>` before
+comparison) or of temporal types. If any input is dictionary encoded it will be
+expanded for the purposes of comparison.
+
++--------------------------+------------+---------------------------------------------+---------------------+---------------------------------------+-------+
+| Function names           | Arity      | Input types                                 | Output type         | Options class                         | Notes |
++==========================+============+=============================================+=====================+=======================================+=======+
+| maximum, minimum         | Varargs    | Numeric and Temporal                        | Numeric or Temporal | :struct:`ElementWiseAggregateOptions` | \(1)  |

Review comment:
       How about naming them `least` and `greatest` like they do in SQL?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641584889



##########
File path: docs/source/cpp/compute.rst
##########
@@ -310,6 +310,20 @@ output element is null.
 | less, less_equal         |            |                                             |                     |
 +--------------------------+------------+---------------------------------------------+---------------------+
 
+These functions take any number of inputs of numeric type (in which case they
+will be cast to the :ref:`common numeric type <common-numeric-type>` before
+comparison) or of temporal types. If any input is dictionary encoded it will be
+expanded for the purposes of comparison.
+
++--------------------------+------------+---------------------------------------------+---------------------+---------------------------------------+-------+
+| Function names           | Arity      | Input types                                 | Output type         | Options class                         | Notes |
++==========================+============+=============================================+=====================+=======================================+=======+
+| maximum, minimum         | Varargs    | Numeric and Temporal                        | Numeric or Temporal | :struct:`ElementWiseAggregateOptions` | \(1)  |

Review comment:
       That's wordy indeed. Or at least `element_wise_min`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641609490



##########
File path: docs/source/cpp/compute.rst
##########
@@ -310,6 +310,20 @@ output element is null.
 | less, less_equal         |            |                                             |                     |
 +--------------------------+------------+---------------------------------------------+---------------------+
 
+These functions take any number of inputs of numeric type (in which case they
+will be cast to the :ref:`common numeric type <common-numeric-type>` before
+comparison) or of temporal types. If any input is dictionary encoded it will be
+expanded for the purposes of comparison.
+
++--------------------------+------------+---------------------------------------------+---------------------+---------------------------------------+-------+
+| Function names           | Arity      | Input types                                 | Output type         | Options class                         | Notes |
++==========================+============+=============================================+=====================+=======================================+=======+
+| maximum, minimum         | Varargs    | Numeric and Temporal                        | Numeric or Temporal | :struct:`ElementWiseAggregateOptions` | \(1)  |

Review comment:
       I just changed it to element_wise_min/max. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook edited a comment on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook edited a comment on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-854117875


   @lidavidm could you briefly describe what happens here if the arrays passed to these kernels have different but comparable data types such as one array of doubles and one array of integers?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r645061099



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       The antiextremes of float are Inf, -Inf

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       Huh, I was expecting `fmin(NaN, x) = NaN` but in fact it's `fmin(NaN, x) = x`. Nevermind




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] cyb70289 commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

cyb70289 commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r638487418



##########
File path: cpp/src/arrow/compute/kernel.h
##########
@@ -52,7 +52,7 @@ struct ARROW_EXPORT KernelState {
 /// \brief Context/state for the execution of a particular kernel.
 class ARROW_EXPORT KernelContext {
  public:
-  explicit KernelContext(ExecContext* exec_ctx) : exec_ctx_(exec_ctx) {}
+  explicit KernelContext(ExecContext* exec_ctx) : exec_ctx_(exec_ctx), state_() {}

Review comment:
       This change is already in current code. Needs rebase?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r639070715



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -428,6 +454,45 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId get_id) {
   }
 }
 
+template <template <typename... Args> class KernelGenerator, typename Op>

Review comment:
       I added this. Note that if we want to have `CommonTimestamp()` I don't think this'll fly since we'd need to scale the timestamp values to the common unit before comparison. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r639128759



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -428,6 +454,45 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId get_id) {
   }
 }
 
+template <template <typename... Args> class KernelGenerator, typename Op>

Review comment:
       No, CommonTimestamp indicates that an implicit cast is necessary (which in the case of timestamps includes the scaling). It's not the responsibility of the kernel to execute that implicit cast




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r645062286



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       Inf would trip up if we had something like `element_wise_min([NaN, 0], [NaN, 1])` in which case we'd want to emit `NaN` and not `Inf` as the first element. Using `Inf` as the antiextreme means we'd emit `Inf` since the kernel takes any non-NaN value over any NaN.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-854119096


   > @lidavidm could you briefly describe what happens here if the arrays passed to these kernels have different but comparable data types such as one array of doubles and one array of integers?
   
   For numeric/temporal types, all arrays will be cast to a common compatible type, much the same as with arithmetic kernels (e.g. for doubles + integers, all values will be cast to doubles).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook edited a comment on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook edited a comment on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-848823194


   > Can you explain what the point is of a variadic function when we don't have e.g. variadic addition?
   
   My understanding is that we are aiming for Arrow's compute API to reach parity with the collections of built-in functions available in popular SQL engines. These SQL engines include several variadic functions that combine/merge/aggregate values row-wise. Two such SQL functions are `greatest()` and `least()`. The two kernels implemented in this PR are equivalent to those two SQL functions. Another such SQL function is `concat()` and its variant `concat_ws()`, which we intend to implement in ARROW-12709.
   
   There are two ways we could implement this functionality without implementing variadic functions:
   1. Chain multiple calls to a binary function (as is required for addition of more than two values, as you mention)
   2. Combine the arrays row-wise into a ListArray, then operate on the ListArray with a unary function (ARROW-12739)
   
   If one of those alternative approaches is superior, then perhaps we do not need a variadic function. But my current understanding is that both of these alternative approaches are inferior for reasons of usability and efficiency respectively.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r639198243



##########
File path: cpp/src/arrow/compute/kernels/codegen_internal.h
##########
@@ -1093,6 +1097,41 @@ ArrayKernelExec GeneratePhysicalInteger(detail::GetTypeId get_id) {
   }
 }
 
+template <template <typename... Args> class Generator, typename... Args>
+ArrayKernelExec GeneratePhysicalNumeric(detail::GetTypeId get_id) {
+  switch (get_id.id) {
+    case Type::INT8:
+      return Generator<Int8Type, Args...>::Exec;
+    case Type::INT16:
+      return Generator<Int16Type, Args...>::Exec;
+    case Type::INT32:
+    case Type::DATE32:
+    case Type::TIME32:
+      return Generator<Int32Type, Args...>::Exec;
+    case Type::INT64:
+    case Type::DATE64:
+    case Type::TIMESTAMP:
+    case Type::TIME64:
+    case Type::DURATION:
+      return Generator<Int64Type, Args...>::Exec;
+    case Type::UINT8:
+      return Generator<UInt8Type, Args...>::Exec;
+    case Type::UINT16:
+      return Generator<UInt16Type, Args...>::Exec;
+    case Type::UINT32:
+      return Generator<UInt32Type, Args...>::Exec;
+    case Type::UINT64:
+      return Generator<UInt64Type, Args...>::Exec;
+    case Type::FLOAT:
+      return Generator<FloatType, Args...>::Exec;
+    case Type::DOUBLE:
+      return Generator<DoubleType, Args...>::Exec;

Review comment:
       Unfortunately the two generators are slightly different since it's `GeneratePhysicalInteger<Generator, Type0, Args>` which will stamp out `Generator<Type0, Int8Type, Args>...` vs here where it's `GeneratePhysicalNumeric<Generator, Args>`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641568041



##########
File path: docs/source/cpp/compute.rst
##########
@@ -310,6 +310,20 @@ output element is null.
 | less, less_equal         |            |                                             |                     |
 +--------------------------+------------+---------------------------------------------+---------------------+
 
+These functions take any number of inputs of numeric type (in which case they
+will be cast to the :ref:`common numeric type <common-numeric-type>` before
+comparison) or of temporal types. If any input is dictionary encoded it will be
+expanded for the purposes of comparison.
+
++--------------------------+------------+---------------------------------------------+---------------------+---------------------------------------+-------+
+| Function names           | Arity      | Input types                                 | Output type         | Options class                         | Notes |
++==========================+============+=============================================+=====================+=======================================+=======+
+| maximum, minimum         | Varargs    | Numeric and Temporal                        | Numeric or Temporal | :struct:`ElementWiseAggregateOptions` | \(1)  |

Review comment:
       That's a good point. I'll rename the compute function. Perhaps `elementwise_min` in accordance with the options struct (though that is a bit wordy)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz closed pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz closed pull request #10390:
URL: https://github.com/apache/arrow/pull/10390


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r639129786



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -428,6 +454,45 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId get_id) {
   }
 }
 
+template <template <typename... Args> class KernelGenerator, typename Op>

Review comment:
       Ah! Ok, my mistake. I'll add some test cases for that.
   
   Binary types aren't implemented here, though (and would need some work/restructuring to handle).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r639136875



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +562,202 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    bool all_scalar = true;
+    bool any_scalar = false;
+    size_t first_array_index = batch.values.size();
+    for (size_t i = 0; i < batch.values.size(); i++) {
+      const auto& datum = batch.values[i];
+      all_scalar &= datum.descr().shape == ValueDescr::SCALAR;
+      any_scalar |= datum.descr().shape == ValueDescr::SCALAR;
+      if (first_array_index >= batch.values.size() &&
+          datum.descr().shape == ValueDescr::ARRAY) {
+        first_array_index = i;
+      }
+    }
+    if (all_scalar) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // Exactly one array (output = input)
+    if (batch.values.size() == 1) {
+      *output = *batch[0].array();
+      return Status::OK();
+    }
+
+    // At least one array, two or more arguments
+    DCHECK_GE(first_array_index, 0);
+    DCHECK_LT(first_array_index, batch.values.size());
+    DCHECK(batch.values[first_array_index].is_array());
+    if (any_scalar) {

Review comment:
       Handling scalars should happen before the output == input bail. If the folded scalar is null then we can either ignore it (skip_nulls=true) or emit a null array without examining any array inputs (skip_nulls=false). After that we're guaranteed to have a valid scalar (for which MakeArrayFromScalar will not materialize a validity bitmap, so you can drop the BitmapXor etc)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r643232604



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       This works quite well except in the case that we have a singular NaN array argument (`element_wise_min([NaN])`) - in which case we want to emit `NaN` not the antiextreme.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook edited a comment on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook edited a comment on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-854117875


   @lidavidm could you briefly describe what happens here if the arrays passed to these kernels have different but comparable data types such as one array of doubles and one array of integers?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r645062286



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       Inf would trip up if we had something like `element_wise_min([NaN, 0], [NaN, 1])` in which case we'd want to emit `NaN` and not `Inf` as the first element. Using `Inf` as the antiextreme means we'd emit `Inf` since the kernel takes any non-NaN value over any NaN.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-848829298


   I see. Well, if efficiency is really important and it's common to have more than 2 inputs, then why not (though the implementation in this PR is probably far from optimal).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641560736



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +565,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // At least one array, two or more arguments
+    ArrayDataVector arrays;
+    for (const auto& arg : batch.values) {
+      if (!arg.is_array()) continue;
+      arrays.push_back(arg.array());
+    }
+
+    if (scalar_count > 0) {
+      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<Scalar> temp_scalar,
+                            MakeScalar(out->type(), 0));
+      ExecScalar(batch, options, temp_scalar.get());
+      if (temp_scalar->is_valid) {
+        // Promote to output array
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        arrays.push_back(array->data());
+      } else if (!options.skip_nulls) {
+        // Abort early
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        *output = *array->data();
+        return Status::OK();
+      }
+    }
+
+    // Exactly one array to consider (output = input)
+    if (arrays.size() == 1) {
+      *output = *arrays[0];
+      return Status::OK();
+    }
+
+    // Two or more arrays to consider
+    if (scalar_count > 0) {
+      // We allocated the last array from a scalar: recycle it as the output

Review comment:
       Good catch, thanks.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc
##########
@@ -1161,5 +1255,242 @@ TYPED_TEST(TestUnaryArithmeticFloating, AbsoluteValue) {
   }
 }
 
+TYPED_TEST(TestVarArgsArithmeticNumeric, Minimum) {
+  this->AssertNullScalar(Minimum, {});
+  this->AssertNullScalar(Minimum, {this->scalar("null"), this->scalar("null")});
+
+  this->Assert(Minimum, this->scalar("0"), {this->scalar("0")});
+  this->Assert(Minimum, this->scalar("0"),
+               {this->scalar("2"), this->scalar("0"), this->scalar("1")});
+  this->Assert(
+      Minimum, this->scalar("0"),
+      {this->scalar("2"), this->scalar("0"), this->scalar("1"), this->scalar("null")});
+  this->Assert(Minimum, this->scalar("1"),
+               {this->scalar("null"), this->scalar("null"), this->scalar("1"),
+                this->scalar("null")});
+
+  this->Assert(Minimum, (this->array("[]")), {this->array("[]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, null]"), {this->array("[1, 2, 3, null]")});
+
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2"), this->scalar("4")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("null"), this->scalar("2")});
+
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, 2, 2, 2]")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, null, 2, 2]")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->array("[2, 2, 2, 2]")});
+
+  this->Assert(Minimum, this->array("[1, 2, null, 6]"),
+               {this->array("[1, 2, null, null]"), this->array("[4, null, null, 6]")});
+  this->Assert(Minimum, this->array("[1, 2, null, 6]"),
+               {this->array("[4, null, null, 6]"), this->array("[1, 2, null, null]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, 4]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, 4]"),
+               {this->array("[null, null, null, null]"), this->array("[1, 2, 3, 4]")});
+
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("1"), this->array("[1, 2, 3, 4]")});
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("1"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("null"), this->array("[1, 1, 1, 1]")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("null"), this->array("[null, null, null, null]")});
+
+  // Test null handling
+  this->element_wise_aggregate_options_.skip_nulls = false;
+  this->AssertNullScalar(Minimum, {this->scalar("null"), this->scalar("null")});
+  this->AssertNullScalar(Minimum, {this->scalar("0"), this->scalar("null")});
+
+  this->Assert(Minimum, this->array("[1, null, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2"), this->scalar("4")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("null"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, null, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, null, 2, 2]")});
+
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("1"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("null"), this->array("[1, 1, 1, 1]")});
+}
+
+TYPED_TEST(TestVarArgsArithmeticFloating, Minimum) {
+  this->SetNansEqual();
+  this->Assert(Maximum, this->scalar("0"), {this->scalar("0"), this->scalar("NaN")});
+  this->Assert(Maximum, this->scalar("0"), {this->scalar("NaN"), this->scalar("0")});
+  this->Assert(Maximum, this->scalar("Inf"), {this->scalar("Inf"), this->scalar("NaN")});
+  this->Assert(Maximum, this->scalar("Inf"), {this->scalar("NaN"), this->scalar("Inf")});
+  this->Assert(Maximum, this->scalar("-Inf"),
+               {this->scalar("-Inf"), this->scalar("NaN")});
+  this->Assert(Maximum, this->scalar("-Inf"),
+               {this->scalar("NaN"), this->scalar("-Inf")});
+  this->Assert(Maximum, this->scalar("NaN"), {this->scalar("NaN"), this->scalar("null")});
+  this->Assert(Minimum, this->scalar("0"), {this->scalar("0"), this->scalar("Inf")});
+  this->Assert(Minimum, this->scalar("-Inf"), {this->scalar("0"), this->scalar("-Inf")});

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r638933803



##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -42,6 +42,11 @@ struct ArithmeticOptions : public FunctionOptions {
   bool check_overflow;
 };
 
+struct ARROW_EXPORT MinMaxOptions : public FunctionOptions {

Review comment:
       If using "Aggregate" is too confusing, maybe "Combine" or "Merge" would work too. e.g. `ElementWiseCombineOptions`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r645068404



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       Huh, I was expecting `fmin(NaN, x) = NaN` but in fact it's `fmin(NaN, x) = x`. Nevermind




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook commented on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook commented on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-854117875


   @lidavidm could you briefly describe what happens here if the arrays passed to these kernels have different but comparable, data types such as one array of doubles and one array of integers?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r643233180



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       Ah, maybe let's just have the 'antiextreme' of a float be NaN. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] github-actions[bot] commented on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-847369971


   https://issues.apache.org/jira/browse/ARROW-12751


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r638925147



##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -42,6 +42,11 @@ struct ArithmeticOptions : public FunctionOptions {
   bool check_overflow;
 };
 
+struct ARROW_EXPORT MinMaxOptions : public FunctionOptions {

Review comment:
       I think it is worth giving this a more general name, because there are other potential uses. Maybe `ElementWiseAggregateOptions`? (since this is aggregation, just across instead of down)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641666075



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -721,7 +720,7 @@ struct ScalarMinMax {
         }
       }
     }
-    output->null_count = -1;
+    output->null_count = output->buffers[0] ? -1 : 0;

Review comment:
       ```suggestion
       output->null_count = output->buffers[0] ? kUnknownNullCount : 0;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r643154430



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       Instead of trying to recycle buffers, we *could* change the kernel's flag and allow the executor to preallocate the output data buffer. Then we populate this with the anti extreme (for maximum/uint64_t we zero the buffer). Then we wouldn't need to use a bitmap to express "no values folded yet"; `get_extreme({anti_extreme, v...}) == get_extreme({v...})`. We'd also be able to compute the null bitmap separately in either case: `!skip_nulls` AND the bitmaps, `skip_nulls` OR the bitmaps.
   
   In the case where `scalar_count != 0`, we can ExecScalar and use the result instead of an anti_extreme




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook edited a comment on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook edited a comment on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-848823194


   > Can you explain what the point is of a variadic function when we don't have e.g. variadic addition?
   
   My understanding is that we are aiming for Arrow's compute API to reach parity with the collections of built-in functions available in popular SQL engines. These SQL engines include several variadic functions that combine/merge/aggregate values row-wise. Two such SQL functions are `greatest()` and `least()`. The two kernels implemented in this PR are equivalent to those two SQL functions. Another such SQL function is `concat()` and its variant `concat_ws()`, which we intend to implement in ARROW-12709.
   
   There are two ways we could implement this functionality without implementing variadic functions:
   1. Chain multiple calls to a binary function (as is required for addition of more than two values, as you mention)
   2. Combine the arrays row-wise into a ListArray (ARROW-12739) then operate on the ListArray with a unary function
   
   If one of those alternative approaches is superior, then perhaps we do not need a variadic function. But my current understanding is that both of these alternative approaches are inferior for reasons of usability and efficiency respectively.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz closed pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz closed pull request #10390:
URL: https://github.com/apache/arrow/pull/10390


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r639703591



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +565,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // At least one array, two or more arguments
+    ArrayDataVector arrays;
+    for (const auto& arg : batch.values) {
+      if (!arg.is_array()) continue;
+      arrays.push_back(arg.array());
+    }
+
+    if (scalar_count > 0) {
+      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<Scalar> temp_scalar,
+                            MakeScalar(out->type(), 0));
+      ExecScalar(batch, options, temp_scalar.get());
+      if (temp_scalar->is_valid) {
+        // Promote to output array
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        arrays.push_back(array->data());
+      } else if (!options.skip_nulls) {
+        // Abort early
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        *output = *array->data();
+        return Status::OK();
+      }
+    }
+
+    // Exactly one array to consider (output = input)
+    if (arrays.size() == 1) {
+      *output = *arrays[0];
+      return Status::OK();
+    }
+
+    // Two or more arrays to consider
+    if (scalar_count > 0) {
+      // We allocated the last array from a scalar: recycle it as the output

Review comment:
       Even if the scalar was null and `options.skip_nulls` is true?

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +565,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // At least one array, two or more arguments
+    ArrayDataVector arrays;
+    for (const auto& arg : batch.values) {
+      if (!arg.is_array()) continue;
+      arrays.push_back(arg.array());
+    }
+
+    if (scalar_count > 0) {
+      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<Scalar> temp_scalar,
+                            MakeScalar(out->type(), 0));
+      ExecScalar(batch, options, temp_scalar.get());
+      if (temp_scalar->is_valid) {
+        // Promote to output array
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        arrays.push_back(array->data());
+      } else if (!options.skip_nulls) {
+        // Abort early
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        *output = *array->data();
+        return Status::OK();
+      }
+    }
+
+    // Exactly one array to consider (output = input)
+    if (arrays.size() == 1) {
+      *output = *arrays[0];
+      return Status::OK();
+    }
+
+    // Two or more arrays to consider
+    if (scalar_count > 0) {
+      // We allocated the last array from a scalar: recycle it as the output
+      *output = *arrays.back();
+    } else {
+      // Copy last array argument to output array
+      const ArrayData& input = *arrays.back();
+      if (options.skip_nulls && input.buffers[0]) {
+        // Don't copy the bitmap if !options.skip_nulls since we'll precompute it later
+        ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+        ::arrow::internal::CopyBitmap(input.buffers[0]->data(), input.offset,
+                                      batch.length, output->buffers[0]->mutable_data(),
+                                      /*dest_offset=*/0);
+      }
+      // This won't work for nested or variable-sized types
+      ARROW_ASSIGN_OR_RAISE(output->buffers[1],
+                            ctx->Allocate(batch.length * sizeof(OutValue)));
+      std::memcpy(output->buffers[1]->mutable_data(),
+                  input.buffers[1]->data() + (input.offset * sizeof(OutValue)),
+                  batch.length * sizeof(OutValue));
+    }
+
+    if (!options.skip_nulls) {
+      // We can precompute the validity buffer in this case
+      // AND together the validity buffers of all arrays
+      ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));

Review comment:
       This will allocate a null bitmap even if all input arrays have 0 nulls.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc
##########
@@ -312,6 +313,89 @@ class TestBinaryArithmeticUnsigned : public TestBinaryArithmeticIntegral<T> {};
 template <typename T>
 class TestBinaryArithmeticFloating : public TestBinaryArithmetic<T> {};
 
+template <typename T>
+class TestVarArgsArithmetic : public TestBase {
+ protected:
+  static std::shared_ptr<DataType> type_singleton() {
+    return TypeTraits<T>::type_singleton();
+  }
+
+  using VarArgsFunction = std::function<Result<Datum>(
+      const std::vector<Datum>&, ElementWiseAggregateOptions, ExecContext*)>;
+
+  Datum scalar(const std::string& value) {
+    return ScalarFromJSON(type_singleton(), value);
+  }
+
+  Datum array(const std::string& value) { return ArrayFromJSON(type_singleton(), value); }
+
+  Datum Eval(VarArgsFunction func, const std::vector<Datum>& args) {
+    EXPECT_OK_AND_ASSIGN(auto actual,
+                         func(args, element_wise_aggregate_options_, nullptr));
+    if (actual.is_array()) {
+      auto arr = actual.make_array();
+      ARROW_EXPECT_OK(arr->ValidateFull());
+    }
+    return actual;
+  }
+
+  void AssertNullScalar(VarArgsFunction func, const std::vector<Datum>& args) {
+    auto datum = this->Eval(func, args);
+    ASSERT_TRUE(datum.is_scalar());
+    ASSERT_FALSE(datum.scalar()->is_valid);
+  }
+
+  void Assert(VarArgsFunction func, Datum expected, const std::vector<Datum>& args) {
+    std::stringstream ss;
+    ss << "Inputs:";
+    for (const auto& arg : args) {
+      ss << ' ';
+      if (arg.is_scalar())
+        ss << arg.scalar()->ToString();
+      else if (arg.is_array())
+        ss << arg.make_array()->ToString();

Review comment:
       Uh. I don't think generating string representations for each and every test is a good idea.

##########
File path: docs/source/cpp/compute.rst
##########
@@ -310,6 +310,20 @@ output element is null.
 | less, less_equal         |            |                                             |                     |
 +--------------------------+------------+---------------------------------------------+---------------------+
 
+These functions take any number of inputs of numeric type (in which case they
+will be cast to the :ref:`common numeric type <common-numeric-type>` before
+comparison) or of temporal types. If any input is dictionary encoded it will be
+expanded for the purposes of comparison.
+
++--------------------------+------------+---------------------------------------------+---------------------+-------+
+| Function names           | Arity      | Input types                                 | Output type         | Notes |

Review comment:
       Should add a column for the options class.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +565,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // At least one array, two or more arguments
+    ArrayDataVector arrays;
+    for (const auto& arg : batch.values) {
+      if (!arg.is_array()) continue;
+      arrays.push_back(arg.array());
+    }
+
+    if (scalar_count > 0) {
+      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<Scalar> temp_scalar,
+                            MakeScalar(out->type(), 0));
+      ExecScalar(batch, options, temp_scalar.get());
+      if (temp_scalar->is_valid) {
+        // Promote to output array
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        arrays.push_back(array->data());
+      } else if (!options.skip_nulls) {
+        // Abort early
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        *output = *array->data();
+        return Status::OK();
+      }
+    }
+
+    // Exactly one array to consider (output = input)
+    if (arrays.size() == 1) {
+      *output = *arrays[0];
+      return Status::OK();
+    }
+
+    // Two or more arrays to consider
+    if (scalar_count > 0) {
+      // We allocated the last array from a scalar: recycle it as the output
+      *output = *arrays.back();
+    } else {
+      // Copy last array argument to output array
+      const ArrayData& input = *arrays.back();
+      if (options.skip_nulls && input.buffers[0]) {
+        // Don't copy the bitmap if !options.skip_nulls since we'll precompute it later
+        ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+        ::arrow::internal::CopyBitmap(input.buffers[0]->data(), input.offset,
+                                      batch.length, output->buffers[0]->mutable_data(),
+                                      /*dest_offset=*/0);
+      }
+      // This won't work for nested or variable-sized types
+      ARROW_ASSIGN_OR_RAISE(output->buffers[1],
+                            ctx->Allocate(batch.length * sizeof(OutValue)));
+      std::memcpy(output->buffers[1]->mutable_data(),
+                  input.buffers[1]->data() + (input.offset * sizeof(OutValue)),
+                  batch.length * sizeof(OutValue));
+    }
+
+    if (!options.skip_nulls) {
+      // We can precompute the validity buffer in this case
+      // AND together the validity buffers of all arrays
+      ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+      bool first = true;
+      for (const auto& arr : arrays) {
+        if (!arr->buffers[0]) continue;
+        if (first) {

Review comment:
       Instead of `if (first)`, you could move the bitmap allocation below and use `if (!output->buffers[0])`.

##########
File path: cpp/src/arrow/ipc/json_simple.cc
##########
@@ -911,6 +912,26 @@ Status DictArrayFromJSON(const std::shared_ptr<DataType>& type,
       .Value(out);
 }
 
+Status ScalarFromJSON(const std::shared_ptr<DataType>& type,
+                      util::string_view json_string, std::shared_ptr<Scalar>* out) {
+  std::shared_ptr<Converter> converter;
+  RETURN_NOT_OK(GetConverter(type, &converter));
+
+  rj::Document json_doc;
+  json_doc.Parse<kParseFlags>(json_string.data(), json_string.length());
+  if (json_doc.HasParseError()) {
+    return Status::Invalid("JSON parse error at offset ", json_doc.GetErrorOffset(), ": ",
+                           GetParseError_En(json_doc.GetParseError()));
+  }
+
+  std::shared_ptr<Array> array;
+  RETURN_NOT_OK(converter->AppendValue(json_doc));
+  RETURN_NOT_OK(converter->Finish(&array));
+  DCHECK_GE(array->length(), 0);

Review comment:
       Shouldn't we ensure it's equal to 1?

##########
File path: cpp/src/arrow/compute/kernels/codegen_internal.h
##########
@@ -1093,6 +1097,41 @@ ArrayKernelExec GeneratePhysicalInteger(detail::GetTypeId get_id) {
   }
 }
 
+template <template <typename... Args> class Generator, typename... Args>
+ArrayKernelExec GeneratePhysicalNumeric(detail::GetTypeId get_id) {
+  switch (get_id.id) {
+    case Type::INT8:
+      return Generator<Int8Type, Args...>::Exec;
+    case Type::INT16:
+      return Generator<Int16Type, Args...>::Exec;
+    case Type::INT32:
+    case Type::DATE32:
+    case Type::TIME32:
+      return Generator<Int32Type, Args...>::Exec;
+    case Type::INT64:
+    case Type::DATE64:
+    case Type::TIMESTAMP:
+    case Type::TIME64:
+    case Type::DURATION:
+      return Generator<Int64Type, Args...>::Exec;
+    case Type::UINT8:
+      return Generator<UInt8Type, Args...>::Exec;
+    case Type::UINT16:
+      return Generator<UInt16Type, Args...>::Exec;
+    case Type::UINT32:
+      return Generator<UInt32Type, Args...>::Exec;
+    case Type::UINT64:
+      return Generator<UInt64Type, Args...>::Exec;
+    case Type::FLOAT:
+      return Generator<FloatType, Args...>::Exec;
+    case Type::DOUBLE:
+      return Generator<DoubleType, Args...>::Exec;

Review comment:
       Perhaps, but the discrepancy seems a bit gratuitous.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +565,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // At least one array, two or more arguments
+    ArrayDataVector arrays;
+    for (const auto& arg : batch.values) {
+      if (!arg.is_array()) continue;
+      arrays.push_back(arg.array());
+    }
+
+    if (scalar_count > 0) {
+      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<Scalar> temp_scalar,
+                            MakeScalar(out->type(), 0));
+      ExecScalar(batch, options, temp_scalar.get());
+      if (temp_scalar->is_valid) {
+        // Promote to output array
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        arrays.push_back(array->data());
+      } else if (!options.skip_nulls) {
+        // Abort early
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        *output = *array->data();
+        return Status::OK();
+      }
+    }
+
+    // Exactly one array to consider (output = input)
+    if (arrays.size() == 1) {
+      *output = *arrays[0];
+      return Status::OK();
+    }
+
+    // Two or more arrays to consider
+    if (scalar_count > 0) {
+      // We allocated the last array from a scalar: recycle it as the output
+      *output = *arrays.back();
+    } else {
+      // Copy last array argument to output array
+      const ArrayData& input = *arrays.back();
+      if (options.skip_nulls && input.buffers[0]) {
+        // Don't copy the bitmap if !options.skip_nulls since we'll precompute it later
+        ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+        ::arrow::internal::CopyBitmap(input.buffers[0]->data(), input.offset,
+                                      batch.length, output->buffers[0]->mutable_data(),
+                                      /*dest_offset=*/0);
+      }
+      // This won't work for nested or variable-sized types
+      ARROW_ASSIGN_OR_RAISE(output->buffers[1],
+                            ctx->Allocate(batch.length * sizeof(OutValue)));
+      std::memcpy(output->buffers[1]->mutable_data(),
+                  input.buffers[1]->data() + (input.offset * sizeof(OutValue)),
+                  batch.length * sizeof(OutValue));
+    }
+
+    if (!options.skip_nulls) {
+      // We can precompute the validity buffer in this case
+      // AND together the validity buffers of all arrays
+      ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+      bool first = true;
+      for (const auto& arr : arrays) {
+        if (!arr->buffers[0]) continue;

Review comment:
       Use `arr->null_count() == 0`?

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc
##########
@@ -1161,5 +1255,242 @@ TYPED_TEST(TestUnaryArithmeticFloating, AbsoluteValue) {
   }
 }
 
+TYPED_TEST(TestVarArgsArithmeticNumeric, Minimum) {
+  this->AssertNullScalar(Minimum, {});
+  this->AssertNullScalar(Minimum, {this->scalar("null"), this->scalar("null")});
+
+  this->Assert(Minimum, this->scalar("0"), {this->scalar("0")});
+  this->Assert(Minimum, this->scalar("0"),
+               {this->scalar("2"), this->scalar("0"), this->scalar("1")});
+  this->Assert(
+      Minimum, this->scalar("0"),
+      {this->scalar("2"), this->scalar("0"), this->scalar("1"), this->scalar("null")});
+  this->Assert(Minimum, this->scalar("1"),
+               {this->scalar("null"), this->scalar("null"), this->scalar("1"),
+                this->scalar("null")});
+
+  this->Assert(Minimum, (this->array("[]")), {this->array("[]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, null]"), {this->array("[1, 2, 3, null]")});
+
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2"), this->scalar("4")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("null"), this->scalar("2")});
+
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, 2, 2, 2]")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, null, 2, 2]")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->array("[2, 2, 2, 2]")});
+
+  this->Assert(Minimum, this->array("[1, 2, null, 6]"),
+               {this->array("[1, 2, null, null]"), this->array("[4, null, null, 6]")});
+  this->Assert(Minimum, this->array("[1, 2, null, 6]"),
+               {this->array("[4, null, null, 6]"), this->array("[1, 2, null, null]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, 4]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, 4]"),
+               {this->array("[null, null, null, null]"), this->array("[1, 2, 3, 4]")});
+
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("1"), this->array("[1, 2, 3, 4]")});
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("1"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("null"), this->array("[1, 1, 1, 1]")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("null"), this->array("[null, null, null, null]")});
+
+  // Test null handling
+  this->element_wise_aggregate_options_.skip_nulls = false;
+  this->AssertNullScalar(Minimum, {this->scalar("null"), this->scalar("null")});
+  this->AssertNullScalar(Minimum, {this->scalar("0"), this->scalar("null")});
+
+  this->Assert(Minimum, this->array("[1, null, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2"), this->scalar("4")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("null"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, null, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, null, 2, 2]")});
+
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("1"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("null"), this->array("[1, 1, 1, 1]")});
+}
+
+TYPED_TEST(TestVarArgsArithmeticFloating, Minimum) {
+  this->SetNansEqual();

Review comment:
       Why isn't this done by default in the test setup?

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +565,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // At least one array, two or more arguments
+    ArrayDataVector arrays;
+    for (const auto& arg : batch.values) {
+      if (!arg.is_array()) continue;
+      arrays.push_back(arg.array());
+    }
+
+    if (scalar_count > 0) {
+      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<Scalar> temp_scalar,
+                            MakeScalar(out->type(), 0));
+      ExecScalar(batch, options, temp_scalar.get());
+      if (temp_scalar->is_valid) {
+        // Promote to output array
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        arrays.push_back(array->data());
+      } else if (!options.skip_nulls) {
+        // Abort early
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        *output = *array->data();
+        return Status::OK();
+      }
+    }
+
+    // Exactly one array to consider (output = input)
+    if (arrays.size() == 1) {
+      *output = *arrays[0];
+      return Status::OK();
+    }
+
+    // Two or more arrays to consider
+    if (scalar_count > 0) {
+      // We allocated the last array from a scalar: recycle it as the output
+      *output = *arrays.back();
+    } else {
+      // Copy last array argument to output array
+      const ArrayData& input = *arrays.back();
+      if (options.skip_nulls && input.buffers[0]) {
+        // Don't copy the bitmap if !options.skip_nulls since we'll precompute it later
+        ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+        ::arrow::internal::CopyBitmap(input.buffers[0]->data(), input.offset,
+                                      batch.length, output->buffers[0]->mutable_data(),
+                                      /*dest_offset=*/0);
+      }
+      // This won't work for nested or variable-sized types
+      ARROW_ASSIGN_OR_RAISE(output->buffers[1],
+                            ctx->Allocate(batch.length * sizeof(OutValue)));
+      std::memcpy(output->buffers[1]->mutable_data(),
+                  input.buffers[1]->data() + (input.offset * sizeof(OutValue)),
+                  batch.length * sizeof(OutValue));
+    }
+
+    if (!options.skip_nulls) {
+      // We can precompute the validity buffer in this case
+      // AND together the validity buffers of all arrays
+      ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+      bool first = true;
+      for (const auto& arr : arrays) {
+        if (!arr->buffers[0]) continue;
+        if (first) {
+          ::arrow::internal::CopyBitmap(arr->buffers[0]->data(), arr->offset,
+                                        batch.length, output->buffers[0]->mutable_data(),
+                                        /*dest_offset=*/0);
+          first = false;
+        } else {
+          ::arrow::internal::BitmapAnd(output->buffers[0]->data(), /*left_offset=*/0,
+                                       arr->buffers[0]->data(), arr->offset, batch.length,
+                                       /*out_offset=*/0,
+                                       output->buffers[0]->mutable_data());
+        }
+      }
+    }
+    arrays.pop_back();
+
+    for (const auto& array : arrays) {
+      OutputArrayWriter<OutType> writer(out->mutable_array());
+      ArrayIterator<OutType> out_it(*output);
+      int64_t index = 0;
+      VisitArrayValuesInline<OutType>(
+          *array,
+          [&](OutValue value) {
+            auto u = out_it();
+            if (!output->buffers[0] ||
+                BitUtil::GetBit(output->buffers[0]->data(), index)) {
+              writer.Write(Op::Call(u, value));
+            } else {
+              writer.Write(value);
+            }
+            index++;
+          },
+          [&]() {
+            // RHS is null, preserve the LHS
+            writer.values++;
+            index++;
+            out_it();
+          });
+      // When skipping nulls, we incrementally compute the validity buffer
+      if (options.skip_nulls && output->buffers[0]) {
+        if (array->buffers[0]) {
+          ::arrow::internal::BitmapOr(
+              output->buffers[0]->data(), /*left_offset=*/0, array->buffers[0]->data(),
+              /*right_offset=*/array->offset, batch.length, /*out_offset=*/0,
+              output->buffers[0]->mutable_data());
+        } else {
+          output->buffers[0] = nullptr;
+        }
+      }
+    }
+    output->null_count = -1;

Review comment:
       Even if there is no validity bitmap?

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc
##########
@@ -1161,5 +1255,242 @@ TYPED_TEST(TestUnaryArithmeticFloating, AbsoluteValue) {
   }
 }
 
+TYPED_TEST(TestVarArgsArithmeticNumeric, Minimum) {
+  this->AssertNullScalar(Minimum, {});
+  this->AssertNullScalar(Minimum, {this->scalar("null"), this->scalar("null")});
+
+  this->Assert(Minimum, this->scalar("0"), {this->scalar("0")});
+  this->Assert(Minimum, this->scalar("0"),
+               {this->scalar("2"), this->scalar("0"), this->scalar("1")});
+  this->Assert(
+      Minimum, this->scalar("0"),
+      {this->scalar("2"), this->scalar("0"), this->scalar("1"), this->scalar("null")});
+  this->Assert(Minimum, this->scalar("1"),
+               {this->scalar("null"), this->scalar("null"), this->scalar("1"),
+                this->scalar("null")});
+
+  this->Assert(Minimum, (this->array("[]")), {this->array("[]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, null]"), {this->array("[1, 2, 3, null]")});
+
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2"), this->scalar("4")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("null"), this->scalar("2")});
+
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, 2, 2, 2]")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, null, 2, 2]")});
+  this->Assert(Minimum, this->array("[1, 2, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->array("[2, 2, 2, 2]")});
+
+  this->Assert(Minimum, this->array("[1, 2, null, 6]"),
+               {this->array("[1, 2, null, null]"), this->array("[4, null, null, 6]")});
+  this->Assert(Minimum, this->array("[1, 2, null, 6]"),
+               {this->array("[4, null, null, 6]"), this->array("[1, 2, null, null]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, 4]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[1, 2, 3, 4]"),
+               {this->array("[null, null, null, null]"), this->array("[1, 2, 3, 4]")});
+
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("1"), this->array("[1, 2, 3, 4]")});
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("1"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[1, 1, 1, 1]"),
+               {this->scalar("null"), this->array("[1, 1, 1, 1]")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("null"), this->array("[null, null, null, null]")});
+
+  // Test null handling
+  this->element_wise_aggregate_options_.skip_nulls = false;
+  this->AssertNullScalar(Minimum, {this->scalar("null"), this->scalar("null")});
+  this->AssertNullScalar(Minimum, {this->scalar("0"), this->scalar("null")});
+
+  this->Assert(Minimum, this->array("[1, null, 2, 2]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("2"), this->scalar("4")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->array("[1, null, 3, 4]"), this->scalar("null"), this->scalar("2")});
+  this->Assert(Minimum, this->array("[1, null, 2, 2]"),
+               {this->array("[1, 2, 3, 4]"), this->array("[2, null, 2, 2]")});
+
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("1"), this->array("[null, null, null, null]")});
+  this->Assert(Minimum, this->array("[null, null, null, null]"),
+               {this->scalar("null"), this->array("[1, 1, 1, 1]")});
+}
+
+TYPED_TEST(TestVarArgsArithmeticFloating, Minimum) {
+  this->SetNansEqual();
+  this->Assert(Maximum, this->scalar("0"), {this->scalar("0"), this->scalar("NaN")});
+  this->Assert(Maximum, this->scalar("0"), {this->scalar("NaN"), this->scalar("0")});
+  this->Assert(Maximum, this->scalar("Inf"), {this->scalar("Inf"), this->scalar("NaN")});
+  this->Assert(Maximum, this->scalar("Inf"), {this->scalar("NaN"), this->scalar("Inf")});
+  this->Assert(Maximum, this->scalar("-Inf"),
+               {this->scalar("-Inf"), this->scalar("NaN")});
+  this->Assert(Maximum, this->scalar("-Inf"),
+               {this->scalar("NaN"), this->scalar("-Inf")});
+  this->Assert(Maximum, this->scalar("NaN"), {this->scalar("NaN"), this->scalar("null")});
+  this->Assert(Minimum, this->scalar("0"), {this->scalar("0"), this->scalar("Inf")});
+  this->Assert(Minimum, this->scalar("-Inf"), {this->scalar("0"), this->scalar("-Inf")});

Review comment:
       Can you add some test with `-0.0` and `0.0`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r638935508



##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -42,6 +42,11 @@ struct ArithmeticOptions : public FunctionOptions {
   bool check_overflow;
 };
 
+struct ARROW_EXPORT MinMaxOptions : public FunctionOptions {

Review comment:
       Since we have `ScalarAggregateOptions` I think `ElementWiseAggregateOptions` should be ok. I've updated the PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r638942900



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -428,6 +454,45 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId get_id) {
   }
 }
 
+template <template <typename... Args> class KernelGenerator, typename Op>

Review comment:
       There's GeneratePhysicalInteger, though that doesn't include floating point. Perhaps you could add GeneratePhysicalNumeric?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r639130532



##########
File path: cpp/src/arrow/compute/kernels/codegen_internal.h
##########
@@ -1093,6 +1097,41 @@ ArrayKernelExec GeneratePhysicalInteger(detail::GetTypeId get_id) {
   }
 }
 
+template <template <typename... Args> class Generator, typename... Args>
+ArrayKernelExec GeneratePhysicalNumeric(detail::GetTypeId get_id) {
+  switch (get_id.id) {
+    case Type::INT8:
+      return Generator<Int8Type, Args...>::Exec;
+    case Type::INT16:
+      return Generator<Int16Type, Args...>::Exec;
+    case Type::INT32:
+    case Type::DATE32:
+    case Type::TIME32:
+      return Generator<Int32Type, Args...>::Exec;
+    case Type::INT64:
+    case Type::DATE64:
+    case Type::TIMESTAMP:
+    case Type::TIME64:
+    case Type::DURATION:
+      return Generator<Int64Type, Args...>::Exec;
+    case Type::UINT8:
+      return Generator<UInt8Type, Args...>::Exec;
+    case Type::UINT16:
+      return Generator<UInt16Type, Args...>::Exec;
+    case Type::UINT32:
+      return Generator<UInt32Type, Args...>::Exec;
+    case Type::UINT64:
+      return Generator<UInt64Type, Args...>::Exec;
+    case Type::FLOAT:
+      return Generator<FloatType, Args...>::Exec;
+    case Type::DOUBLE:
+      return Generator<DoubleType, Args...>::Exec;

Review comment:
       ```suggestion
       case Type::FLOAT:
         return Generator<FloatType, Args...>::Exec;
       case Type::DOUBLE:
         return Generator<DoubleType, Args...>::Exec;
       default:
         return GeneratePhysicalInteger<Generator, Args...>(get_id);
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641605729



##########
File path: docs/source/cpp/compute.rst
##########
@@ -310,6 +310,20 @@ output element is null.
 | less, less_equal         |            |                                             |                     |
 +--------------------------+------------+---------------------------------------------+---------------------+
 
+These functions take any number of inputs of numeric type (in which case they
+will be cast to the :ref:`common numeric type <common-numeric-type>` before
+comparison) or of temporal types. If any input is dictionary encoded it will be
+expanded for the purposes of comparison.
+
++--------------------------+------------+---------------------------------------------+---------------------+---------------------------------------+-------+
+| Function names           | Arity      | Input types                                 | Output type         | Options class                         | Notes |
++==========================+============+=============================================+=====================+=======================================+=======+
+| maximum, minimum         | Varargs    | Numeric and Temporal                        | Numeric or Temporal | :struct:`ElementWiseAggregateOptions` | \(1)  |

Review comment:
       How is that less ambiguous?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r638316027



##########
File path: cpp/src/arrow/ipc/json_simple.h
##########
@@ -30,6 +30,7 @@ namespace arrow {
 
 class Array;
 class DataType;
+class Scalar;

Review comment:
       This is the wrong declaration (should be `struct Scalar`).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r645061099



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +564,190 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    const size_t scalar_count =
+        static_cast<size_t>(std::count_if(batch.values.begin(), batch.values.end(),
+                                          [](const Datum& d) { return d.is_scalar(); }));
+    if (scalar_count == batch.values.size()) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();

Review comment:
       The antiextremes of float are Inf, -Inf




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

pitrou commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r641563788



##########
File path: docs/source/cpp/compute.rst
##########
@@ -310,6 +310,20 @@ output element is null.
 | less, less_equal         |            |                                             |                     |
 +--------------------------+------------+---------------------------------------------+---------------------+
 
+These functions take any number of inputs of numeric type (in which case they
+will be cast to the :ref:`common numeric type <common-numeric-type>` before
+comparison) or of temporal types. If any input is dictionary encoded it will be
+expanded for the purposes of comparison.
+
++--------------------------+------------+---------------------------------------------+---------------------+---------------------------------------+-------+
+| Function names           | Arity      | Input types                                 | Output type         | Options class                         | Notes |
++==========================+============+=============================================+=====================+=======================================+=======+
+| maximum, minimum         | Varargs    | Numeric and Temporal                        | Numeric or Temporal | :struct:`ElementWiseAggregateOptions` | \(1)  |

Review comment:
       It just came to my mind that it's gonna be very confusing to have "minimum" and "maximum" functions in addition to "min_max". Perhaps we can be less ambiguous, e.g. "scalar_min", "scalar_max" (or "horizontal_min" or "row_min"...).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ianmcook commented on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

ianmcook commented on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-848823194


   > Can you explain what the point is of a variadic function when we don't have e.g. variadic addition?
   
   My understanding is that we are aiming for Arrow's compute API to reach parity with the collections of built-in functions available in popular SQL engines. These SQL engines include several variadic functions that combine/merge/aggregate values row-wise. Two such SQL functions are `greatest()` and `least()`. The two kernels implemented in this PR are equivalent to those two SQL functions. Another such SQL function is `concat()` and its variant `concat_ws()`, which we intend to implement in ARROW-12709.
   
   There are two ways we could implement this functionality without implementing variadic functions:
   1. Chain multiple calls to a binary function (as is required for addition of more than two values, as you mention)
   2. Combine the arrays row-wise into a ListArray, then operate on the ListArray with a unary function
   
   If one of those alternative approaches is superior, then perhaps we do not need a variadic function. But my current understanding is that both of these alternative approaches are inferior for reasons of usability and efficiency respectively.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] bkietz commented on a change in pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

bkietz commented on a change in pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#discussion_r639140420



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -516,6 +562,202 @@ std::shared_ptr<ScalarFunction> MakeUnarySignedArithmeticFunctionNotNull(
   return func;
 }
 
+using MinMaxState = OptionsWrapper<ElementWiseAggregateOptions>;
+
+// Implement a variadic scalar min/max kernel.
+template <typename OutType, typename Op>
+struct ScalarMinMax {
+  using OutValue = typename GetOutputType<OutType>::T;
+
+  static void ExecScalar(const ExecBatch& batch,
+                         const ElementWiseAggregateOptions& options, Scalar* out) {
+    // All arguments are scalar
+    OutValue value{};
+    bool valid = false;
+    for (const auto& arg : batch.values) {
+      // Ignore non-scalar arguments so we can use it in the mixed-scalar-and-array case
+      if (!arg.is_scalar()) continue;
+      const auto& scalar = *arg.scalar();
+      if (!scalar.is_valid) {
+        if (options.skip_nulls) continue;
+        out->is_valid = false;
+        return;
+      }
+      if (!valid) {
+        value = UnboxScalar<OutType>::Unbox(scalar);
+        valid = true;
+      } else {
+        value = Op::Call(value, UnboxScalar<OutType>::Unbox(scalar));
+      }
+    }
+    out->is_valid = valid;
+    if (valid) {
+      BoxScalar<OutType>::Box(value, out);
+    }
+  }
+
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const ElementWiseAggregateOptions& options = MinMaxState::Get(ctx);
+    const auto descrs = batch.GetDescriptors();
+    bool all_scalar = true;
+    bool any_scalar = false;
+    size_t first_array_index = batch.values.size();
+    for (size_t i = 0; i < batch.values.size(); i++) {
+      const auto& datum = batch.values[i];
+      all_scalar &= datum.descr().shape == ValueDescr::SCALAR;
+      any_scalar |= datum.descr().shape == ValueDescr::SCALAR;
+      if (first_array_index >= batch.values.size() &&
+          datum.descr().shape == ValueDescr::ARRAY) {
+        first_array_index = i;
+      }
+    }
+    if (all_scalar) {
+      ExecScalar(batch, options, out->scalar().get());
+      return Status::OK();
+    }
+
+    ArrayData* output = out->mutable_array();
+
+    // Exactly one array (output = input)
+    if (batch.values.size() == 1) {
+      *output = *batch[0].array();
+      return Status::OK();
+    }
+
+    // At least one array, two or more arguments
+    DCHECK_GE(first_array_index, 0);
+    DCHECK_LT(first_array_index, batch.values.size());
+    DCHECK(batch.values[first_array_index].is_array());
+    if (any_scalar) {
+      ARROW_ASSIGN_OR_RAISE(std::shared_ptr<Scalar> temp_scalar,
+                            MakeScalar(out->type(), 0));
+      ExecScalar(batch, options, temp_scalar.get());
+      if (options.skip_nulls || temp_scalar->is_valid) {
+        // Promote to output array
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        *output = *array->data();
+        if (!temp_scalar->is_valid) {
+          // MakeArrayFromScalar reuses the same buffer for null/data in
+          // this case, allocate a real one since we'll be writing to it
+          ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+          ::arrow::internal::BitmapXor(output->buffers[0]->data(), /*left_offset=*/0,
+                                       output->buffers[0]->data(), /*right_offset=*/0,
+                                       batch.length, /*out_offset=*/0,
+                                       output->buffers[0]->mutable_data());
+        }
+      } else {
+        // Abort early
+        ARROW_ASSIGN_OR_RAISE(auto array, MakeArrayFromScalar(*temp_scalar, batch.length,
+                                                              ctx->memory_pool()));
+        *output = *array->data();
+        return Status::OK();
+      }
+    } else {
+      // Copy first array argument to output array
+      const ArrayData& input = *batch.values[first_array_index].array();
+      ARROW_ASSIGN_OR_RAISE(output->buffers[1],
+                            ctx->Allocate(batch.length * sizeof(OutValue)));
+      if (options.skip_nulls && input.buffers[0]) {
+        // Don't copy the bitmap if !options.skip_nulls since we'll precompute it later
+        ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+        ::arrow::internal::CopyBitmap(input.buffers[0]->data(), input.offset,
+                                      batch.length, output->buffers[0]->mutable_data(),
+                                      /*dest_offset=*/0);
+      }
+      // This won't work for nested or variable-sized types
+      std::memcpy(output->buffers[1]->mutable_data(),
+                  input.buffers[1]->data() + (input.offset * sizeof(OutValue)),
+                  batch.length * sizeof(OutValue));
+    }
+
+    if (!options.skip_nulls) {
+      // We can precompute the validity buffer in this case
+      // AND together the validity buffers of all arrays
+      ARROW_ASSIGN_OR_RAISE(output->buffers[0], ctx->AllocateBitmap(batch.length));
+      bool first = true;
+      for (const auto& arg : batch.values) {
+        if (!arg.is_array()) continue;

Review comment:
       instead of dealing with datums, you could build an ArrayDataVector and push the non scalar inputs (and the promoted folded scalar) into that. It'd make it explicit that you're only dealing with arrays and be a bit more readable, I think




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] lidavidm commented on pull request #10390: ARROW-12751: [C++] Implement minimum/maximum kernels

Posted by GitBox <gi...@apache.org>.

lidavidm commented on pull request #10390:
URL: https://github.com/apache/arrow/pull/10390#issuecomment-854119096


   > @lidavidm could you briefly describe what happens here if the arrays passed to these kernels have different but comparable data types such as one array of doubles and one array of integers?
   
   For numeric/temporal types, all arrays will be cast to a common compatible type, much the same as with arithmetic kernels (e.g. for doubles + integers, all values will be cast to doubles).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org