Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/05/25 15:11:00 UTC

[jira] [Comment Edited] (ARROW-5002) [C++] Implement Hash Aggregation query execution node

    [ https://issues.apache.org/jira/browse/ARROW-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116106#comment-17116106 ] 

Wes McKinney edited comment on ARROW-5002 at 5/25/20, 3:10 PM:
---------------------------------------------------------------

I renamed the issue. I need to be able to execute hash aggregations in the next few months, so I will be working to implement the appropriate machinery for this under arrow/compute (since hash aggregations need to compose with array/kernel expressions).


was (Author: wesmckinn):
I renamed the issue. I need to be able to execute hash aggregations in the next few months so I will implement the appropriate machinery for this under arrow/compute (since hash aggregations need to compose with array/kernel expressions)

> [C++] Implement Hash Aggregation query execution node
> -----------------------------------------------------
>
>                 Key: ARROW-5002
>                 URL: https://issues.apache.org/jira/browse/ARROW-5002
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Philipp Moritz
>            Priority: Major
>              Labels: query-engine
>
> Dear all,
> I wonder what the best way forward is for implementing GroupBy kernels. Initially this was part of
> https://issues.apache.org/jira/browse/ARROW-4124
> but is not contained in the current implementation as far as I can tell.
> It seems that the part of group by that just returns indices could be conveniently implemented with the HashKernel. That seems useful in any case. Is that indeed the best way forward, and should this be done?
> GroupBy + Aggregate could then be implemented either by combining that with the Take kernel and an aggregation (though this involves more memory copies than necessary) or as part of the aggregate kernel. The latter is probably preferred; any thoughts on that?
> Am I missing any other JIRAs related to this?
> Best, Philipp.
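
To make the two options in the description above concrete, here is a minimal sketch in plain C++ using only standard containers. It does not use the Arrow API: the names GroupedSums, GroupThenTakeThenSum and HashAggregateSum are hypothetical, and sum over int64 values keyed by strings is an assumed stand-in for a general aggregation. Both functions assume keys and values have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: one entry per distinct key, in first-seen order.
struct GroupedSums {
  std::vector<std::string> keys;
  std::vector<int64_t> sums;
};

// Option 1: group by returns row indices per group, then a gather ("take")
// materializes each group's values before they are aggregated.
GroupedSums GroupThenTakeThenSum(const std::vector<std::string>& keys,
                                 const std::vector<int64_t>& values) {
  GroupedSums out;
  std::unordered_map<std::string, std::size_t> group_ids;
  std::vector<std::vector<std::size_t>> indices;  // row indices of each group
  for (std::size_t row = 0; row < keys.size(); ++row) {
    // emplace only inserts if the key is new; the id is the next free slot.
    auto result = group_ids.emplace(keys[row], out.keys.size());
    if (result.second) {
      out.keys.push_back(keys[row]);
      indices.emplace_back();
    }
    indices[result.first->second].push_back(row);
  }
  for (const auto& group : indices) {
    std::vector<int64_t> taken;  // the extra copy this option pays for
    taken.reserve(group.size());
    for (std::size_t row : group) taken.push_back(values[row]);
    int64_t sum = 0;
    for (int64_t v : taken) sum += v;
    out.sums.push_back(sum);
  }
  return out;
}

// Option 2: fused hash aggregation. Each row updates its group's running
// sum directly, so no per-group index list or gathered copy is needed.
GroupedSums HashAggregateSum(const std::vector<std::string>& keys,
                             const std::vector<int64_t>& values) {
  GroupedSums out;
  std::unordered_map<std::string, std::size_t> group_ids;
  for (std::size_t row = 0; row < keys.size(); ++row) {
    auto result = group_ids.emplace(keys[row], out.keys.size());
    if (result.second) {
      out.keys.push_back(keys[row]);
      out.sums.push_back(0);
    }
    out.sums[result.first->second] += values[row];
  }
  return out;
}
```

Both functions produce the same result. The second avoids the intermediate index lists and gathered values, which is the memory-copy concern raised in the description.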


