Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/10/30 19:05:00 UTC

[jira] [Updated] (IMPALA-3994) Anticipate effect of cardinality estimation error when deciding between broadcast and repartitioning join

     [ https://issues.apache.org/jira/browse/IMPALA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-3994:
----------------------------------
    Issue Type: Improvement  (was: Bug)

> Anticipate effect of cardinality estimation error when deciding between broadcast and repartitioning join
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-3994
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3994
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.2
>            Reporter: Marcel Kornacker
>            Priority: Major
>
> DistributedPlanner.createHashJoinFragment() computes the cost of both join distribution modes (broadcast and repartitioning) from the estimated cardinalities of the left and right inputs, then picks the cheaper mode (unless query hints, etc., prevent it from doing so).
> This is problematic because:
> - cardinality estimates typically don't match actual row counts
> - the two distribution modes have very different cost functions relative to input size (broadcast cost goes up very quickly in large clusters)
> - we pick a broadcast join even when its cost *at the estimated cardinality* is only marginally smaller than the cost of a repartitioning join, which turns out to be the wrong choice if the actual cardinality is even slightly higher
> To take into account the possibility of estimation errors, createHashJoinFragment() should compute the expected costs for a small number of input cardinalities (derived from the expected cardinality).
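
The proposal above can be illustrated with a minimal sketch. It is not Impala's actual planner code: the cost formulas, the function names, the error factors, and the choice to vary only the right (build) input are all assumptions made for illustration. The sketch evaluates both distribution modes at several cardinalities derived from the estimate and compares their average costs instead of the costs at the single estimated point.

```python
# Hypothetical sketch; names and cost formulas are illustrative assumptions,
# not the real DistributedPlanner implementation.

def broadcast_cost(rhs_size, num_nodes):
    # Assumed model: the right (build) input is sent to every node.
    return rhs_size * num_nodes

def partition_cost(lhs_size, rhs_size):
    # Assumed model: each input is shuffled across the network once.
    return lhs_size + rhs_size

def choose_distribution(lhs_size, rhs_size, num_nodes,
                        error_factors=(0.5, 1.0, 2.0, 4.0)):
    """Pick a join distribution mode from the average cost over several
    candidate right-input sizes derived from the estimate.

    error_factors is an assumed set of multipliers applied to the
    estimated right-input size; passing (1.0,) reproduces the
    point-estimate decision.
    """
    candidates = [rhs_size * f for f in error_factors]
    avg_broadcast = sum(broadcast_cost(r, num_nodes)
                        for r in candidates) / len(candidates)
    avg_partition = sum(partition_cost(lhs_size, r)
                        for r in candidates) / len(candidates)
    # Broadcast is chosen only if it is strictly cheaper on average.
    return "BROADCAST" if avg_broadcast < avg_partition else "PARTITIONED"

if __name__ == "__main__":
    # At the point estimate, broadcast is marginally cheaper (1000 vs. 1010).
    print(choose_distribution(1000, 10, 100, error_factors=(1.0,)))
    # Averaged over possible estimation errors, repartitioning is cheaper.
    print(choose_distribution(1000, 10, 100))
```

With these assumed inputs, the point-estimate decision and the error-aware decision disagree, because the broadcast cost grows with the cluster size for every extra unit of right input while the repartitioning cost grows by only one unit. A uniform average is the simplest weighting; a real implementation might weight the candidate cardinalities by how likely each error is.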



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
