You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "rtpsw (via GitHub)" <gi...@apache.org> on 2023/04/05 08:00:11 UTC

[GitHub] [arrow] rtpsw opened a new pull request, #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

rtpsw opened a new pull request, #34904:
URL: https://github.com/apache/arrow/pull/34904

   See #34786


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1500015094

   > > Are there any test we can add for this?
   > 
   > I should have an answer after experimenting with this code a bit.
   
   Added test-cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1501811308

   @westonpace This looks pretty good to me. Can you please take a look too?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1497617896

   > Are there any test we can add for this?
   
   I should have an answer after experimenting with this code a bit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158617762


##########
cpp/src/arrow/engine/substrait/relation_internal.h:
##########
@@ -50,11 +50,6 @@ ARROW_ENGINE_EXPORT Result<std::unique_ptr<substrait::Rel>> ToProto(
 
 namespace internal {
 
-struct ParsedMeasure {

Review Comment:
   Nice cleanup.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1163804001


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -665,14 +680,47 @@ class GroupByNode : public ExecNode, public TracedNode {
     }
     base += segment_keys.size();
     for (size_t i = 0; i < aggs.size(); ++i) {
-      output_fields[base + i] =
-          agg_result_fields[i]->WithName(aggregate_options.aggregates[i].name);
+      output_fields[base + i] = agg_result_fields[i]->WithName(aggs[i].name);
+    }
+
+    return AggregateNodeArgs<HashAggregateKernel>{schema(std::move(output_fields)),
+                                                  std::move(key_field_ids),
+                                                  std::move(segment_key_field_ids),
+                                                  std::move(segmenter),
+                                                  std::move(agg_src_fieldsets),
+                                                  std::move(aggs),
+                                                  std::move(agg_kernels),
+                                                  std::move(agg_src_types),
+                                                  /*states=*/{}};

Review Comment:
   Technically, this is because `GroupByNode` does not take states in [its constructor](https://github.com/apache/arrow/blob/d9e672f40e831dd8f6a73a21b3a9bebfc35c5a45/cpp/src/arrow/acero/aggregate_node.cc#L543-L549) whereas `ScalarAggregateNode` [does](https://github.com/apache/arrow/blob/d9e672f40e831dd8f6a73a21b3a9bebfc35c5a45/cpp/src/arrow/acero/aggregate_node.cc#L257-L265). I don't think there is a real design justification for this difference; I just didn't change the original design here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1161706619


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -665,14 +682,47 @@ class GroupByNode : public ExecNode, public TracedNode {
     }
     base += segment_keys.size();
     for (size_t i = 0; i < aggs.size(); ++i) {
-      output_fields[base + i] =
-          agg_result_fields[i]->WithName(aggregate_options.aggregates[i].name);
+      output_fields[base + i] = agg_result_fields[i]->WithName(aggs[i].name);
     }
 
+    return AggregateNodeArgs<HashAggregateKernel>{schema(std::move(output_fields)),
+                                                  std::move(key_field_ids),
+                                                  std::move(segment_key_field_ids),
+                                                  std::move(segmenter),
+                                                  std::move(agg_src_fieldsets),
+                                                  std::move(aggs),
+                                                  std::move(agg_kernels),
+                                                  std::move(agg_src_types),
+                                                  {}};
+  }
+
+  static Result<ExecNode*> Make(ExecPlan* plan, std::vector<ExecNode*> inputs,
+                                const ExecNodeOptions& options) {
+    RETURN_NOT_OK(ValidateExecNodeInputs(plan, inputs, 1, "GroupByNode"));
+
+    auto input = inputs[0];
+    const auto& aggregate_options = checked_cast<const AggregateNodeOptions&>(options);
+    const auto& keys = aggregate_options.keys;
+    const auto& segment_keys = aggregate_options.segment_keys;
+    // Copy (need to modify options pointer below)

Review Comment:
   Where did we modify the options?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158614051


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -602,7 +621,7 @@ class GroupByNode : public ExecNode, public TracedNode {
     for (const auto& segment_key_field_id : segment_key_field_ids) {
       if (key_field_id_set.find(segment_key_field_id) != key_field_id_set.end()) {
         return Status::Invalid("Group-by aggregation with field '",
-                               input_schema->field(segment_key_field_id)->name(),

Review Comment:
   Any reason that we need to change `input_schema` from a pointer reference to a schema object reference?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158631737


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -602,7 +621,7 @@ class GroupByNode : public ExecNode, public TracedNode {
     for (const auto& segment_key_field_id : segment_key_field_ids) {
       if (key_field_id_set.find(segment_key_field_id) != key_field_id_set.end()) {
         return Status::Invalid("Group-by aggregation with field '",
-                               input_schema->field(segment_key_field_id)->name(),

Review Comment:
   I think this can be a `shared_ptr`, if you prefer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158614051


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -602,7 +621,7 @@ class GroupByNode : public ExecNode, public TracedNode {
     for (const auto& segment_key_field_id : segment_key_field_ids) {
       if (key_field_id_set.find(segment_key_field_id) != key_field_id_set.end()) {
         return Status::Invalid("Group-by aggregation with field '",
-                               input_schema->field(segment_key_field_id)->name(),

Review Comment:
   Any reason that we need to change `input_schema` from a pointer reference to a schema reference?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158611677


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -77,6 +78,19 @@ namespace acero {
 
 namespace {
 
+template <typename KernelType>
+struct ARROW_ACERO_EXPORT AggregateNodeArgs {

Review Comment:
   I am actually not sure what does `ARROW_ACERO_EXPORT` does but it seems this is a private struct of the cc file and doesn't need to be exported?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1161705566


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -373,14 +371,50 @@ class ScalarAggregateNode : public ExecNode, public TracedNode {
       ARROW_ASSIGN_OR_RAISE(auto out_type, kernels[i]->signature->out_type().Resolve(
                                                &kernel_ctx, kernel_intypes[i]));
 
-      fields[base + i] =
-          field(aggregate_options.aggregates[i].name, out_type.GetSharedPtr());
+      fields[base + i] = field(aggregates[i].name, out_type.GetSharedPtr());
+    }
+
+    return AggregateNodeArgs<ScalarAggregateKernel>{schema(std::move(fields)),
+                                                    {},

Review Comment:
   Can you add a comment to the empty list args here? e.g.
   /*param*/{}



##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -373,14 +371,50 @@ class ScalarAggregateNode : public ExecNode, public TracedNode {
       ARROW_ASSIGN_OR_RAISE(auto out_type, kernels[i]->signature->out_type().Resolve(
                                                &kernel_ctx, kernel_intypes[i]));
 
-      fields[base + i] =
-          field(aggregate_options.aggregates[i].name, out_type.GetSharedPtr());
+      fields[base + i] = field(aggregates[i].name, out_type.GetSharedPtr());
+    }
+
+    return AggregateNodeArgs<ScalarAggregateKernel>{schema(std::move(fields)),
+                                                    {},

Review Comment:
   Can you add a comment to the empty list args here? e.g.
   ```
   /*param*/{}
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1497098310

   @westonpace, this is a simpler PR, as compared to #34885, following a discussion with @icexelloss.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1160494069


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -77,6 +78,19 @@ namespace acero {
 
 namespace {
 
+template <typename KernelType>
+struct ARROW_ACERO_EXPORT AggregateNodeArgs {

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace merged pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace merged PR #34904:
URL: https://github.com/apache/arrow/pull/34904


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1161705808


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -665,14 +682,47 @@ class GroupByNode : public ExecNode, public TracedNode {
     }
     base += segment_keys.size();
     for (size_t i = 0; i < aggs.size(); ++i) {
-      output_fields[base + i] =
-          agg_result_fields[i]->WithName(aggregate_options.aggregates[i].name);
+      output_fields[base + i] = agg_result_fields[i]->WithName(aggs[i].name);
     }
 
+    return AggregateNodeArgs<HashAggregateKernel>{schema(std::move(output_fields)),
+                                                  std::move(key_field_ids),
+                                                  std::move(segment_key_field_ids),
+                                                  std::move(segmenter),
+                                                  std::move(agg_src_fieldsets),
+                                                  std::move(aggs),
+                                                  std::move(agg_kernels),
+                                                  std::move(agg_src_types),
+                                                  {}};

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1508907964

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e720fc94582540ff9f08dc3ad5a6a7d1...a48f99af134c429fb9fbb696ab5f35c3/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158627090


##########
cpp/src/arrow/acero/aggregate_node.h:
##########
@@ -0,0 +1,53 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// This API is EXPERIMENTAL.
+
+#pragma once
+
+#include <memory>
+#include <vector>
+
+#include "arrow/acero/visibility.h"
+#include "arrow/compute/api_aggregate.h"
+#include "arrow/compute/row/grouper.h"
+#include "arrow/compute/type_fwd.h"
+#include "arrow/type_fwd.h"
+
+namespace arrow {
+namespace acero {
+namespace aggregate {
+
+using compute::Aggregate;
+using compute::default_exec_context;
+using compute::ExecContext;
+
+/// \brief Make the output schema of an aggregate node
+///
+/// \param[in] input_schema the schema of the input to the node
+/// \param[in] keys the grouping keys for the aggregation
+/// \param[in] segment_keys the segmenting keys for the aggregation
+/// \param[in] aggregates the aggregates for the aggregation
+/// \param[in] exec_ctx the execution context for the aggregation
+ARROW_ACERO_EXPORT Result<std::shared_ptr<Schema>> MakeOutputSchema(
+    const Schema& input_schema, const std::vector<FieldRef>& keys,
+    const std::vector<FieldRef>& segment_keys, const std::vector<Aggregate>& aggregates,
+    ExecContext* exec_ctx = default_exec_context());

Review Comment:
   Right, [as discussed here](https://github.com/apache/arrow/issues/34786#issuecomment-1493726176).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158643343


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -602,7 +621,7 @@ class GroupByNode : public ExecNode, public TracedNode {
     for (const auto& segment_key_field_id : segment_key_field_ids) {
       if (key_field_id_set.find(segment_key_field_id) != key_field_id_set.end()) {
         return Status::Invalid("Group-by aggregation with field '",
-                               input_schema->field(segment_key_field_id)->name(),

Review Comment:
   Yeah I think `const shared_ptr<Schema>&` is a little better because then some of the lines using input_schema can be unchanged. Schema is also usually passed around as `shared_ptr` or `share_ptr&`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1497608395

   @rtpsw Thanks - at high level I think this looks a lot nicer - 
   
   Are there any test we can add for this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1163152472


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -665,14 +680,47 @@ class GroupByNode : public ExecNode, public TracedNode {
     }
     base += segment_keys.size();
     for (size_t i = 0; i < aggs.size(); ++i) {
-      output_fields[base + i] =
-          agg_result_fields[i]->WithName(aggregate_options.aggregates[i].name);
+      output_fields[base + i] = agg_result_fields[i]->WithName(aggs[i].name);
+    }
+
+    return AggregateNodeArgs<HashAggregateKernel>{schema(std::move(output_fields)),
+                                                  std::move(key_field_ids),
+                                                  std::move(segment_key_field_ids),
+                                                  std::move(segmenter),
+                                                  std::move(agg_src_fieldsets),
+                                                  std::move(aggs),
+                                                  std::move(agg_kernels),
+                                                  std::move(agg_src_types),
+                                                  /*states=*/{}};

Review Comment:
   Why states is passed in with ScalarAggregate but not HashAggregate?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1160494703


##########
cpp/src/arrow/acero/aggregate_node.h:
##########
@@ -0,0 +1,53 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// This API is EXPERIMENTAL.
+
+#pragma once
+
+#include <memory>
+#include <vector>
+
+#include "arrow/acero/visibility.h"
+#include "arrow/compute/api_aggregate.h"
+#include "arrow/compute/row/grouper.h"
+#include "arrow/compute/type_fwd.h"
+#include "arrow/type_fwd.h"
+
+namespace arrow {
+namespace acero {
+namespace aggregate {
+
+using compute::Aggregate;
+using compute::default_exec_context;
+using compute::ExecContext;
+
+/// \brief Make the output schema of an aggregate node
+///
+/// \param[in] input_schema the schema of the input to the node
+/// \param[in] keys the grouping keys for the aggregation
+/// \param[in] segment_keys the segmenting keys for the aggregation
+/// \param[in] aggregates the aggregates for the aggregation
+/// \param[in] exec_ctx the execution context for the aggregation
+ARROW_ACERO_EXPORT Result<std::shared_ptr<Schema>> MakeOutputSchema(
+    const Schema& input_schema, const std::vector<FieldRef>& keys,
+    const std::vector<FieldRef>& segment_keys, const std::vector<Aggregate>& aggregates,
+    ExecContext* exec_ctx = default_exec_context());

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158642345


##########
cpp/src/arrow/acero/aggregate_node.h:
##########
@@ -0,0 +1,53 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// This API is EXPERIMENTAL.
+
+#pragma once
+
+#include <memory>
+#include <vector>
+
+#include "arrow/acero/visibility.h"
+#include "arrow/compute/api_aggregate.h"
+#include "arrow/compute/row/grouper.h"
+#include "arrow/compute/type_fwd.h"
+#include "arrow/type_fwd.h"
+
+namespace arrow {
+namespace acero {
+namespace aggregate {
+
+using compute::Aggregate;
+using compute::default_exec_context;
+using compute::ExecContext;
+
+/// \brief Make the output schema of an aggregate node
+///
+/// \param[in] input_schema the schema of the input to the node
+/// \param[in] keys the grouping keys for the aggregation
+/// \param[in] segment_keys the segmenting keys for the aggregation
+/// \param[in] aggregates the aggregates for the aggregation
+/// \param[in] exec_ctx the execution context for the aggregation
+ARROW_ACERO_EXPORT Result<std::shared_ptr<Schema>> MakeOutputSchema(
+    const Schema& input_schema, const std::vector<FieldRef>& keys,
+    const std::vector<FieldRef>& segment_keys, const std::vector<Aggregate>& aggregates,
+    ExecContext* exec_ctx = default_exec_context());

Review Comment:
   Got it - not a big deal. But maybe add in the docstring why this is needed. (basically to get the output schema, we need to create kernels, and to create kernels we need exec context?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158615906


##########
cpp/src/arrow/acero/aggregate_node.h:
##########
@@ -0,0 +1,53 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// This API is EXPERIMENTAL.
+
+#pragma once
+
+#include <memory>
+#include <vector>
+
+#include "arrow/acero/visibility.h"
+#include "arrow/compute/api_aggregate.h"
+#include "arrow/compute/row/grouper.h"
+#include "arrow/compute/type_fwd.h"
+#include "arrow/type_fwd.h"
+
+namespace arrow {
+namespace acero {
+namespace aggregate {
+
+using compute::Aggregate;
+using compute::default_exec_context;
+using compute::ExecContext;
+
+/// \brief Make the output schema of an aggregate node
+///
+/// \param[in] input_schema the schema of the input to the node
+/// \param[in] keys the grouping keys for the aggregation
+/// \param[in] segment_keys the segmenting keys for the aggregation
+/// \param[in] aggregates the aggregates for the aggregation
+/// \param[in] exec_ctx the execution context for the aggregation
+ARROW_ACERO_EXPORT Result<std::shared_ptr<Schema>> MakeOutputSchema(
+    const Schema& input_schema, const std::vector<FieldRef>& keys,
+    const std::vector<FieldRef>& segment_keys, const std::vector<Aggregate>& aggregates,
+    ExecContext* exec_ctx = default_exec_context());

Review Comment:
   We don't have access to ExecContext in the substrait consumer right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158643343


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -602,7 +621,7 @@ class GroupByNode : public ExecNode, public TracedNode {
     for (const auto& segment_key_field_id : segment_key_field_ids) {
       if (key_field_id_set.find(segment_key_field_id) != key_field_id_set.end()) {
         return Status::Invalid("Group-by aggregation with field '",
-                               input_schema->field(segment_key_field_id)->name(),

Review Comment:
   Yeah I think `const shared_ptr<Schema>&` is better because then some of the lines using input_schema can be unchanged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1161707254


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -665,14 +682,47 @@ class GroupByNode : public ExecNode, public TracedNode {
     }
     base += segment_keys.size();
     for (size_t i = 0; i < aggs.size(); ++i) {
-      output_fields[base + i] =
-          agg_result_fields[i]->WithName(aggregate_options.aggregates[i].name);
+      output_fields[base + i] = agg_result_fields[i]->WithName(aggs[i].name);
     }
 
+    return AggregateNodeArgs<HashAggregateKernel>{schema(std::move(output_fields)),
+                                                  std::move(key_field_ids),
+                                                  std::move(segment_key_field_ids),
+                                                  std::move(segmenter),
+                                                  std::move(agg_src_fieldsets),
+                                                  std::move(aggs),
+                                                  std::move(agg_kernels),
+                                                  std::move(agg_src_types),
+                                                  {}};
+  }
+
+  static Result<ExecNode*> Make(ExecPlan* plan, std::vector<ExecNode*> inputs,
+                                const ExecNodeOptions& options) {
+    RETURN_NOT_OK(ValidateExecNodeInputs(plan, inputs, 1, "GroupByNode"));
+
+    auto input = inputs[0];
+    const auto& aggregate_options = checked_cast<const AggregateNodeOptions&>(options);
+    const auto& keys = aggregate_options.keys;
+    const auto& segment_keys = aggregate_options.segment_keys;
+    // Copy (need to modify options pointer below)
+    auto aggs = aggregate_options.aggregates;
+
+    if (plan->query_context()->exec_context()->executor()->GetCapacity() > 1 &&
+        segment_keys.size() > 0) {
+      return Status::NotImplemented("Segmented aggregation in a multi-threaded plan");

Review Comment:
   nit: "multi-threaded plan" -> "multi-threaded execution context"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1501802069

   @rtpsw Looks pretty good - left a few question/comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1508907256

   Benchmark runs are scheduled for baseline = 2e78adbe15cd4c58dd30ccc4bf2d6d5e0eb2bf4d and contender = 342b74e4192a69e849b67e36b054078c208fac91. 342b74e4192a69e849b67e36b054078c208fac91 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/cb9e09b8b38d4e8c8a653ef14af68264...23979d528fd8434b941211e8940a53d0/)
   [Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/f069c5d0d22b4d09b1329b2ddcbce409...bad2da87103142a9b5aca591a245390f/)
   [Finished :arrow_down:2.3% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e720fc94582540ff9f08dc3ad5a6a7d1...a48f99af134c429fb9fbb696ab5f35c3/)
   [Finished :arrow_down:0.22% :arrow_up:0.03%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/636f23f704a940d7a6051d9416b359bd...231925ed81694a6ead6ecce12a3cd40b/)
   Buildkite builds:
   [Finished] [`342b74e4` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2696)
   [Failed] [`342b74e4` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2730)
   [Finished] [`342b74e4` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2694)
   [Finished] [`342b74e4` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2721)
   [Finished] [`2e78adbe` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2695)
   [Failed] [`2e78adbe` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2729)
   [Finished] [`2e78adbe` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2693)
   [Finished] [`2e78adbe` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2720)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1160494188


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -602,7 +621,7 @@ class GroupByNode : public ExecNode, public TracedNode {
     for (const auto& segment_key_field_id : segment_key_field_ids) {
       if (key_field_id_set.find(segment_key_field_id) != key_field_id_set.end()) {
         return Status::Invalid("Group-by aggregation with field '",
-                               input_schema->field(segment_key_field_id)->name(),

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1161748582


##########
cpp/src/arrow/acero/aggregate_node.h:
##########
@@ -0,0 +1,57 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// This API is EXPERIMENTAL.
+
+#pragma once
+
+#include <memory>
+#include <vector>
+
+#include "arrow/acero/visibility.h"
+#include "arrow/compute/api_aggregate.h"
+#include "arrow/compute/row/grouper.h"

Review Comment:
   ```suggestion
   ```



##########
cpp/src/arrow/acero/aggregate_node.h:
##########
@@ -0,0 +1,57 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// This API is EXPERIMENTAL.
+
+#pragma once
+
+#include <memory>
+#include <vector>
+
+#include "arrow/acero/visibility.h"
+#include "arrow/compute/api_aggregate.h"
+#include "arrow/compute/row/grouper.h"
+#include "arrow/compute/type_fwd.h"
+#include "arrow/type_fwd.h"

Review Comment:
   ```suggestion
   #include "arrow/result.h"
   #include "arrow/type_fwd.h"
   ```



##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -665,14 +682,47 @@ class GroupByNode : public ExecNode, public TracedNode {
     }
     base += segment_keys.size();
     for (size_t i = 0; i < aggs.size(); ++i) {
-      output_fields[base + i] =
-          agg_result_fields[i]->WithName(aggregate_options.aggregates[i].name);
+      output_fields[base + i] = agg_result_fields[i]->WithName(aggs[i].name);
     }
 
+    return AggregateNodeArgs<HashAggregateKernel>{schema(std::move(output_fields)),
+                                                  std::move(key_field_ids),
+                                                  std::move(segment_key_field_ids),
+                                                  std::move(segmenter),
+                                                  std::move(agg_src_fieldsets),
+                                                  std::move(aggs),
+                                                  std::move(agg_kernels),
+                                                  std::move(agg_src_types),
+                                                  {}};
+  }
+
+  static Result<ExecNode*> Make(ExecPlan* plan, std::vector<ExecNode*> inputs,
+                                const ExecNodeOptions& options) {
+    RETURN_NOT_OK(ValidateExecNodeInputs(plan, inputs, 1, "GroupByNode"));
+
+    auto input = inputs[0];
+    const auto& aggregate_options = checked_cast<const AggregateNodeOptions&>(options);
+    const auto& keys = aggregate_options.keys;
+    const auto& segment_keys = aggregate_options.segment_keys;
+    // Copy (need to modify options pointer below)
+    auto aggs = aggregate_options.aggregates;
+
+    if (plan->query_context()->exec_context()->executor()->GetCapacity() > 1 &&
+        segment_keys.size() > 0) {
+      return Status::NotImplemented("Segmented aggregation in a multi-threaded plan");
+    }
+
+    auto input_schema = input->output_schema();

Review Comment:
   ```suggestion
       const auto& input_schema = input->output_schema();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34904:
URL: https://github.com/apache/arrow/pull/34904#issuecomment-1497079541

   * Closes: #34786


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1158624872


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -77,6 +78,19 @@ namespace acero {
 
 namespace {
 
+template <typename KernelType>
+struct ARROW_ACERO_EXPORT AggregateNodeArgs {

Review Comment:
   Yes, this was moved here from the header but has no effect here - should be removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rtpsw commented on a diff in pull request #34904: GH-34786: [C++] Fix output schema calculated by Substrait consumer for AggregateRel

Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on code in PR #34904:
URL: https://github.com/apache/arrow/pull/34904#discussion_r1161988818


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -665,14 +682,47 @@ class GroupByNode : public ExecNode, public TracedNode {
     }
     base += segment_keys.size();
     for (size_t i = 0; i < aggs.size(); ++i) {
-      output_fields[base + i] =
-          agg_result_fields[i]->WithName(aggregate_options.aggregates[i].name);
+      output_fields[base + i] = agg_result_fields[i]->WithName(aggs[i].name);
     }
 
+    return AggregateNodeArgs<HashAggregateKernel>{schema(std::move(output_fields)),
+                                                  std::move(key_field_ids),
+                                                  std::move(segment_key_field_ids),
+                                                  std::move(segmenter),
+                                                  std::move(agg_src_fieldsets),
+                                                  std::move(aggs),
+                                                  std::move(agg_kernels),
+                                                  std::move(agg_src_types),
+                                                  {}};
+  }
+
+  static Result<ExecNode*> Make(ExecPlan* plan, std::vector<ExecNode*> inputs,
+                                const ExecNodeOptions& options) {
+    RETURN_NOT_OK(ValidateExecNodeInputs(plan, inputs, 1, "GroupByNode"));
+
+    auto input = inputs[0];
+    const auto& aggregate_options = checked_cast<const AggregateNodeOptions&>(options);
+    const auto& keys = aggregate_options.keys;
+    const auto& segment_keys = aggregate_options.segment_keys;
+    // Copy (need to modify options pointer below)

Review Comment:
   In the refactoring, I forgot to move this comment to line 294, since `aggregates` there is modified (in line 358). I'll do this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org