Posted to jira@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2021/02/12 01:39:00 UTC

[jira] [Commented] (ARROW-11606) [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction

    [ https://issues.apache.org/jira/browse/ARROW-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283458#comment-17283458 ] 

Andy Grove commented on ARROW-11606:
------------------------------------

[~jorgecarleitao] We could use your guidance here if you have time.

> [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
> ---------------------------------------------------------------------
>
>                 Key: ARROW-11606
>                 URL: https://issues.apache.org/jira/browse/ARROW-11606
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
>
> In the Ballista project, we are reconstructing the Final and Partial HashAggregateExec operators [1] for distributed execution. We have run into an issue and need some guidance.
> The Partial HashAggregateExec gets created OK and executes correctly.
> However, when we create the Final HashAggregateExec, it does not find the schema it expects in the input operator. The Partial exec outputs field names ending with "[sum]", "[count]", and so on, but the Final aggregate does not seem to look for those names.
> It is also worth noting that the Final and Partial executors are not connected directly in this usage.
> The Partial exec is executed and its output is streamed to disk.
> The Final exec then runs against the output from the Partial exec.
> We may need to make changes in DataFusion so that other crates can support this kind of use case.
>  [1] https://github.com/ballista-compute/ballista/pull/491
>  
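
The mismatch described in the issue can be illustrated with a small, self-contained sketch. This is not DataFusion or Ballista code: the Schema struct, the index_of lookup, the partial_state_name helper, and the AVG(c2) / c1 column names are all hypothetical, chosen only to mirror the "[sum]" and "[count]" suffixes mentioned above. It assumes the consumer resolves its input columns by exact name.

```rust
/// Hypothetical stand-in for a schema: an ordered list of column names.
/// This is not DataFusion's `Schema` type.
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    /// Look a column up by exact name and return its position, if present.
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f == name)
    }
}

/// Assumed naming rule for partial-state columns, modeled on the "[sum]" and
/// "[count]" suffixes mentioned in the issue.
fn partial_state_name(aggregate: &str, state: &str) -> String {
    format!("{}[{}]", aggregate, state)
}

/// Schema a partial aggregate might write to disk for an illustrative
/// `AVG(c2) GROUP BY c1` query.
fn partial_output_schema() -> Schema {
    Schema {
        fields: vec![
            "c1".to_string(),
            partial_state_name("AVG(c2)", "count"),
            partial_state_name("AVG(c2)", "sum"),
        ],
    }
}

fn main() {
    let schema = partial_output_schema();

    // Looking for the final output name fails: no such column was written.
    println!("AVG(c2)      -> {:?}", schema.index_of("AVG(c2)"));

    // Looking for the suffixed state name succeeds.
    let sum_name = partial_state_name("AVG(c2)", "sum");
    println!("AVG(c2)[sum] -> {:?}", schema.index_of(&sum_name));
}
```

If the actual cause is along these lines, a consumer that reads the Partial output from disk would need either to resolve the state columns by position or to apply the same naming rule as the producer. Whether that matches what HashAggregateExec does internally is exactly the open question in this issue.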
