You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Yash Datta (JIRA)" <ji...@apache.org> on 2015/04/27 12:49:41 UTC

[jira] [Updated] (SPARK-7097) Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold

     [ https://issues.apache.org/jira/browse/SPARK-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yash Datta updated SPARK-7097:
------------------------------
    Description: 
Currently when deciding about whether to create HashJoin or ShuffleHashJoin, the size estimation of partitioned tables involved considers the size of entire table. This results in many query plans using shuffle hash joins , where infact only a small number of partitions may be being referred by the actual query (due to additional filters), and hence these could be run using BroadCastHash join.

The query plan should consider the size of only the referred partitions in such cases

  was:
Currently when deciding about whether to create HashJoin or ShuffleHashJoin, the size estimation of partitioned tables involved considers the size of entire table. This results in many query plans using shuffle hash joins , where infact only a small number of partitions may be being referred by the actual query (due to additional filters), and hence these could be run using Map side hash join.

The query plan should consider the size of only the referred partitions in such cases


> Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-7097
>                 URL: https://issues.apache.org/jira/browse/SPARK-7097
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1
>            Reporter: Yash Datta
>             Fix For: 1.4.0
>
>
> Currently when deciding about whether to create HashJoin or ShuffleHashJoin, the size estimation of partitioned tables involved considers the size of entire table. This results in many query plans using shuffle hash joins , where infact only a small number of partitions may be being referred by the actual query (due to additional filters), and hence these could be run using BroadCastHash join.
> The query plan should consider the size of only the referred partitions in such cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org