Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/02/12 16:00:00 UTC
[jira] [Updated] (ARROW-11606) [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
[ https://issues.apache.org/jira/browse/ARROW-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-11606:
-----------------------------------
Labels: pull-request-available (was: )
> [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
> ---------------------------------------------------------------------
>
> Key: ARROW-11606
> URL: https://issues.apache.org/jira/browse/ARROW-11606
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust - DataFusion
> Reporter: Andy Grove
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We have run into an issue in the Ballista project: we are reconstructing the Final and Partial HashAggregateExec operators [1] for distributed execution and need some guidance.
> The Partial HashAggregateExec gets created OK and executes correctly.
> However, when we create the Final HashAggregateExec, it does not find the schema it expects in the input operator. The Partial exec outputs field names ending with "[sum]", "[count]", and so on, but the Final aggregate does not seem to look for those names.
> It is also worth noting that the Final and Partial executors are not connected directly in this usage.
> The Partial exec is executed and its output is streamed to disk.
> The Final exec then runs against the output from the Partial exec.
> Do we need to make changes in DataFusion so that other crates can support this kind of use case?
> [1] https://github.com/ballista-compute/ballista/pull/491
>
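The mismatch described above can be illustrated with a minimal, standard-library-only sketch. It does not use DataFusion's API. It assumes that intermediate state columns are named "<aggregate name>[<state>]", which matches the "[sum]" and "[count]" suffixes mentioned in the issue but is otherwise a guess, and every function name here (state_field_name, partial_avg, final_avg) is hypothetical. The point is only that a Final phase which resolves its inputs by name must be built from the same aggregate name as the Partial phase, even when the two run separately.

```rust
use std::collections::HashMap;

/// Hypothetical helper: name of an intermediate state column,
/// e.g. "AVG(x)[sum]". The naming scheme is an assumption.
fn state_field_name(agg_name: &str, state: &str) -> String {
    format!("{}[{}]", agg_name, state)
}

/// Partial phase for an average: emits one row of state per partition,
/// keyed by column name (a stand-in for a record batch written to disk).
fn partial_avg(agg_name: &str, values: &[f64]) -> HashMap<String, f64> {
    let mut out: HashMap<String, f64> = HashMap::new();
    out.insert(state_field_name(agg_name, "sum"), values.iter().sum::<f64>());
    out.insert(state_field_name(agg_name, "count"), values.len() as f64);
    out
}

/// Final phase: merges the partial states, resolving columns by name.
fn final_avg(agg_name: &str, partials: &[HashMap<String, f64>]) -> Result<f64, String> {
    let sum_col = state_field_name(agg_name, "sum");
    let count_col = state_field_name(agg_name, "count");
    let mut sum = 0.0;
    let mut count = 0.0;
    for p in partials {
        sum += *p
            .get(&sum_col)
            .ok_or_else(|| format!("missing column {}", sum_col))?;
        count += *p
            .get(&count_col)
            .ok_or_else(|| format!("missing column {}", count_col))?;
    }
    if count == 0.0 {
        return Err("no input rows".to_string());
    }
    Ok(sum / count)
}

fn main() {
    // Two partitions aggregated independently, as in the Partial stage.
    let partials = vec![
        partial_avg("AVG(x)", &[1.0, 2.0]),
        partial_avg("AVG(x)", &[3.0]),
    ];
    // Final stage built with the same aggregate name: lookup succeeds.
    println!("{:?}", final_avg("AVG(x)", &partials));
    // Final stage built with a different name: the state columns are not found.
    println!("{:?}", final_avg("avg_x", &partials));
}
```

In this sketch the second call fails because the Final phase derives different column names from the ones the Partial phase wrote, which is one plausible shape of the problem reported here.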
--
This message was sent by Atlassian Jira
(v8.3.4#803005)