Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/05/12 16:54:00 UTC
[jira] [Created] (ARROW-16549) [C++] Simplify AggregateNodeOptions aggregates/targets
Weston Pace created ARROW-16549:
-----------------------------------
Summary: [C++] Simplify AggregateNodeOptions aggregates/targets
Key: ARROW-16549
URL: https://issues.apache.org/jira/browse/ARROW-16549
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
Currently AggregateNodeOptions is:
{noformat}
class ARROW_EXPORT AggregateNodeOptions : public ExecNodeOptions {
 public:
  // aggregations which will be applied to the targeted fields
  std::vector<internal::Aggregate> aggregates;
  // fields to which aggregations will be applied
  std::vector<FieldRef> targets;
  // output field names for aggregations
  std::vector<std::string> names;
  // keys by which aggregations will be grouped
  std::vector<FieldRef> keys;
};
{noformat}
It is not obvious how {{aggregates}} and {{targets}} are related. My initial read of the comments led me to think that each aggregate would be applied to each target, so that you would end up with {{len(aggregates) * len(targets)}} output fields. In reality, the {{aggregate}} at index {{i}} applies only to the {{target}} at index {{i}}. It would be simpler to add a {{FieldRef target}} to {{internal::Aggregate}} (and {{Aggregate}} should not be {{internal}}).
Alternatively, the entire {{internal::Aggregate}} could be replaced by a "call" {{arrow::compute::Expression}}.
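To make the first suggestion concrete, here is a rough sketch of what moving the target into the aggregate might look like. It does not use the Arrow headers: {{FieldRef}}, {{OldAggregate}}, {{OldAggregateNodeOptions}}, {{NewAggregateNodeOptions}} and {{AttachTargets}} are simplified, hypothetical stand-ins invented for illustration, and the real types carry more state (function options, for example). Only the index-wise pairing of {{aggregates}} and {{targets}} is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::FieldRef; only a field name is modelled.
struct FieldRef {
  std::string name;
};

// Hypothetical stand-in for the current internal::Aggregate.
struct OldAggregate {
  std::string function;
};

// Current layout: aggregates[i] is applied to targets[i], matched by index.
struct OldAggregateNodeOptions {
  std::vector<OldAggregate> aggregates;
  std::vector<FieldRef> targets;
  std::vector<std::string> names;
  std::vector<FieldRef> keys;
};

// Sketch of the proposal: each aggregate carries its own target.
struct Aggregate {
  std::string function;
  FieldRef target;
};

// Sketch of the options once `targets` is no longer a separate vector.
struct NewAggregateNodeOptions {
  std::vector<Aggregate> aggregates;
  std::vector<std::string> names;
  std::vector<FieldRef> keys;
};

// Pairs aggregates[i] with targets[i], making the implicit relationship
// explicit. Throws if the two vectors disagree in length.
NewAggregateNodeOptions AttachTargets(const OldAggregateNodeOptions& old_options) {
  if (old_options.aggregates.size() != old_options.targets.size()) {
    throw std::invalid_argument("aggregates and targets must have the same length");
  }
  NewAggregateNodeOptions result;
  result.names = old_options.names;
  result.keys = old_options.keys;
  for (std::size_t i = 0; i < old_options.aggregates.size(); ++i) {
    result.aggregates.push_back(
        Aggregate{old_options.aggregates[i].function, old_options.targets[i]});
  }
  // One output aggregate per input aggregate, not aggregates * targets.
  assert(result.aggregates.size() == old_options.aggregates.size());
  return result;
}
```

With this shape a mismatched pair of vectors can no longer be constructed, and the number of output fields is visibly {{len(aggregates)}}. The sketch leaves {{names}} as a separate vector because the proposal above only mentions {{target}}.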