You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/15 18:11:14 UTC

[GitHub] [arrow] bkietz commented on a change in pull request #10725: ARROW-9056: [C++] Support aggregations over scalars

bkietz commented on a change in pull request #10725:
URL: https://github.com/apache/arrow/pull/10725#discussion_r670689127



##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -59,10 +59,16 @@ struct CountImpl : public ScalarAggregator {
   explicit CountImpl(ScalarAggregateOptions options) : options(std::move(options)) {}
 
   Status Consume(KernelContext*, const ExecBatch& batch) override {
-    const ArrayData& input = *batch[0].array();
-    const int64_t nulls = input.GetNullCount();
-    this->nulls += nulls;
-    this->non_nulls += input.length - nulls;
+    if (batch[0].is_array()) {
+      const ArrayData& input = *batch[0].array();
+      const int64_t nulls = input.GetNullCount();
+      this->nulls += nulls;
+      this->non_nulls += input.length - nulls;
+    } else {
+      const Scalar& input = *batch[0].scalar();
+      this->nulls += !input.is_valid;

Review comment:
       In dataset scans and elsewhere, a scalar can be used to encode a constant column. We should take the batch's length into account here
   ```suggestion
         this->nulls += !input.is_valid * batch.length;
   ```

##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -395,6 +433,10 @@ void AddMinMaxKernels(KernelInit init,
     auto out_ty = struct_({field("min", ty), field("max", ty)});
     auto sig = KernelSignature::Make({InputType::Array(ty)}, ValueDescr::Scalar(out_ty));
     AddAggKernel(std::move(sig), init, func, simd_level);
+
+    // scalar[InT] -> scalar[struct<min: T, max: T>]
+    sig = KernelSignature::Make({InputType::Scalar(ty)}, ValueDescr::Scalar(out_ty));
+    AddAggKernel(std::move(sig), init, func, SimdLevel::NONE);

Review comment:
       Instead since the same init is being used for each kernel, I'd prefer we just make the signature inclusive of scalars as well as arrays:
   ```suggestion
       auto sig = KernelSignature::Make({InputType(ty)}, ValueDescr::Scalar(out_ty));
       AddAggKernel(std::move(sig), init, func, simd_level);
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org