Posted to issues-all@impala.apache.org by "Amogh Margoor (Jira)" <ji...@apache.org> on 2021/09/29 17:23:00 UTC

[jira] [Updated] (IMPALA-10938) Adaptive Broadcast Joins in the Impala

     [ https://issues.apache.org/jira/browse/IMPALA-10938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amogh Margoor updated IMPALA-10938:
-----------------------------------
    Description: 
Broadcast joins can be much faster than partitioned joins when the build side is small. Whenever the cost model detects a small build side, it tries to enable a broadcast join. However, because of missing statistics or incorrect estimates, it is not always possible to detect a small build side at planning time. Therefore, we need an adaptive way to detect it at runtime and enable the broadcast. Other systems, such as Apache Spark and modern data warehouses, rely on this technique, so it would be a good addition to Impala too.

As a part of this, we may end up defining an adaptive framework that can be extended to other problems, such as skew joins.

  was:
Broadcast joins can be much faster than partitioned joins when the build side is small. Whenever the cost model detects a small build side, it tries to enable a broadcast join. However, because of missing statistics or incorrect estimates, it is not always possible to detect a small build side at planning time. Therefore, we need an adaptive way to detect it at runtime and enable the broadcast. Other systems, such as Apache Spark and modern data warehouses, rely on this technique, so it would be a good addition to Impala too.

As a part of this, we mayfield  end up defining Adaptive framework which can be extended to other problems like Skew joins etc. 


> Adaptive Broadcast Joins in the Impala
> --------------------------------------
>
>                 Key: IMPALA-10938
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10938
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>            Reporter: Amogh Margoor
>            Assignee: Amogh Margoor
>            Priority: Major
>
> Broadcast joins can be much faster than partitioned joins when the build side is small. Whenever the cost model detects a small build side, it tries to enable a broadcast join. However, because of missing statistics or incorrect estimates, it is not always possible to detect a small build side at planning time. Therefore, we need an adaptive way to detect it at runtime and enable the broadcast. Other systems, such as Apache Spark and modern data warehouses, rely on this technique, so it would be a good addition to Impala too.
> As a part of this, we may end up defining an adaptive framework that can be extended to other problems, such as skew joins.
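
For illustration only, the runtime decision described above could be sketched roughly as follows. This is a minimal sketch in Python, not Impala code and not the design proposed in this issue: it assumes the build side can be consumed as a stream of rows and that a simple row-count limit stands in for "small". The names choose_join_strategy and BROADCAST_ROW_LIMIT are hypothetical and do not correspond to any Impala function or query option.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical limit for this sketch; not an Impala setting.
BROADCAST_ROW_LIMIT = 1000


def choose_join_strategy(
    build_rows: Iterable,
    row_limit: int = BROADCAST_ROW_LIMIT,
) -> Tuple[str, List, Iterator]:
    """Pick a join strategy by observing the build side at runtime.

    Rows are buffered until either the input ends or the buffer grows
    past row_limit. Returns (strategy, buffered_rows, remaining_rows).
    """
    remaining = iter(build_rows)
    buffered: List = []
    for row in remaining:
        buffered.append(row)
        if len(buffered) > row_limit:
            # Build side turned out larger than the limit: keep the
            # partitioned join and hand back the unread rows as well.
            return "partitioned", buffered, remaining
    # Input ended within the limit: the whole build side is buffered
    # and small enough to broadcast.
    return "broadcast", buffered, remaining


if __name__ == "__main__":
    strategy, rows, _ = choose_join_strategy(range(5), row_limit=10)
    print(strategy, len(rows))
```

The point of the sketch is that the choice is driven by rows actually seen rather than by planner estimates, so it still works when statistics are missing. A real implementation would more likely bound the buffer by bytes than by rows and would have to coordinate the switch across fragments.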



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org