
Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2019/04/17 04:12:00 UTC

[jira] [Assigned] (IMPALA-3825) Distribute runtime filter aggregation across cluster

     [ https://issues.apache.org/jira/browse/IMPALA-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-3825:
-------------------------------------

    Assignee: Abhishek Rawat  (was: Rahul Shivu Mahadev)

> Distribute runtime filter aggregation across cluster
> ----------------------------------------------------
>
>                 Key: IMPALA-3825
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3825
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 2.6.0
>            Reporter: Henry Robinson
>            Assignee: Abhishek Rawat
>            Priority: Major
>              Labels: runtime-filters
>
> Runtime filters can be tens of MB or more, and sending every filter from every shuffle join to the coordinator (incast) can put a lot of memory pressure on that node. To alleviate this, we should consider spreading the aggregation across the cluster so that a different node aggregates each runtime filter.
> This still restricts aggregation to as many nodes as there are runtime filters, which will usually be less than the cluster size. To smooth that out further we could use tree-based aggregation, but let's first measure the benefit of simply distributing the aggregation work.
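
The proposal in the description can be illustrated with a small sketch. It is not Impala code: it assumes each runtime filter is a Bloom filter whose partial copies are merged by bitwise OR, models a filter as a Python int used as a bitset, and picks the aggregating node round-robin by filter id. The names assign_aggregators, aggregate, and run_distributed_aggregation are hypothetical and exist only for this example.

```python
from collections import defaultdict


def assign_aggregators(filter_ids, nodes):
    """Pick one aggregator node per filter, round-robin over sorted filter ids.

    Hypothetical policy for illustration; any stable mapping would do.
    """
    ordered = sorted(filter_ids)
    return {fid: nodes[i % len(nodes)] for i, fid in enumerate(ordered)}


def aggregate(partials):
    """Merge partial filters (ints used as bitsets) with bitwise OR."""
    merged = 0
    for partial in partials:
        merged |= partial
    return merged


def run_distributed_aggregation(partials_by_filter, nodes):
    """Aggregate each filter on its assigned node instead of on one coordinator.

    partials_by_filter: {filter_id: [partial bitset from each producer]}
    Returns (owners, results) where results is {node: {filter_id: merged bitset}}.
    """
    owners = assign_aggregators(partials_by_filter.keys(), nodes)
    results = defaultdict(dict)
    for fid, partials in partials_by_filter.items():
        results[owners[fid]][fid] = aggregate(partials)
    return owners, dict(results)
```

With three filters and four nodes, only three nodes receive aggregation work, which is the limitation the description notes: the work spreads over at most as many nodes as there are runtime filters.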


