Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/12/04 19:34:00 UTC

[jira] [Resolved] (IMPALA-2737) Investigate partition-oriented agg and join processing

     [ https://issues.apache.org/jira/browse/IMPALA-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-2737.
-----------------------------------
    Resolution: Later

Closing until we have more concrete plans.

> Investigate partition-oriented agg and join processing
> ------------------------------------------------------
>
>                 Key: IMPALA-2737
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2737
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.3.0
>            Reporter: Tim Armstrong
>            Priority: Minor
>              Labels: performance
>         Attachments: partition-oriented-pagg-preview.diff
>
>
> Currently the partitioned aggregations and joins add each row to its partition as they process the input. This leads to poor memory access patterns, since consecutive rows can land in any of the 16 partitions in effectively random order. An alternative approach is to do an initial pass that hashes the rows and divides them between partitions, then a second pass per partition that inserts all the rows for that partition. This avoids the random access across partitions.
> This can enable some additional optimisations, e.g. prefetching hash table buckets for the next row.
> An initial prototype was posted here: http://gerrit.cloudera.org/#/c/628 . The diff is attached.
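
For illustration only, the two-pass idea described in the issue can be sketched roughly as below. This is not the attached prototype and does not use Impala's actual classes: Row, PartitionAgg, PartitionOf and AggregateBatch are hypothetical names, a std::unordered_map stands in for a per-partition hash table, and a simple SUM per key stands in for the aggregation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; this is a sketch, not Impala's backend code.
constexpr std::size_t kNumPartitions = 16;  // the 16 partitions mentioned in the issue

struct Row {
  std::int64_t key;    // grouping key
  std::int64_t value;  // value to be summed per key
};

// Stand-in for a per-partition hash table: running sum per grouping key.
using PartitionAgg = std::unordered_map<std::int64_t, std::int64_t>;

// Maps a key to one of the kNumPartitions partitions.
inline std::size_t PartitionOf(std::int64_t key) {
  return std::hash<std::int64_t>{}(key) % kNumPartitions;
}

std::array<PartitionAgg, kNumPartitions> AggregateBatch(
    const std::vector<Row>& batch) {
  // Pass 1: hash every row and record its index under its partition.
  // No partition hash table is touched in this pass.
  std::array<std::vector<std::size_t>, kNumPartitions> row_idxs;
  for (std::size_t i = 0; i < batch.size(); ++i) {
    row_idxs[PartitionOf(batch[i].key)].push_back(i);
  }

  // Pass 2: visit one partition at a time and insert all of its rows,
  // so only a single partition's table is being accessed at any moment.
  std::array<PartitionAgg, kNumPartitions> partitions;
  for (std::size_t p = 0; p < kNumPartitions; ++p) {
    for (std::size_t i : row_idxs[p]) {
      partitions[p][batch[i].key] += batch[i].value;
    }
  }
  return partitions;
}
```

In a sketch like this, the inner loop of the second pass is where a prefetch of the next row's hash table bucket could be issued, since the next row for the partition is already known at that point.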



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)