Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2014/11/11 21:50:33 UTC

[jira] [Commented] (SPARK-2205) Unnecessary exchange operators in a join on multiple tables with the same join key.

    [ https://issues.apache.org/jira/browse/SPARK-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207031#comment-14207031 ] 

Yin Huai commented on SPARK-2205:
---------------------------------

Just a note to myself: it would be good to also check whether outputPartitioning is set properly in other physical operators. For example, LeftSemiJoinHash uses the default UnknownPartitioning as its outputPartitioning.
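The concern above can be illustrated with a toy model. This is a minimal sketch using hypothetical classes (PhysicalOperator, Shuffled, SemiJoinDefault and SemiJoinPropagating are illustrative names, not Spark's API). An operator that keeps the default UnknownPartitioning tells the planner nothing about its output layout, so a parent that needs clustered input may have to add an Exchange. A left semi join, by contrast, only emits a subset of its left-side rows without moving them between partitions, so it could plausibly report its left child's partitioning instead.

```scala
// Hypothetical, simplified model: these are NOT Spark's real classes.
object OutputPartitioningSketch {
  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  trait PhysicalOperator {
    // Default used when an operator does not declare its output layout;
    // a planner must then assume nothing about how rows are distributed.
    def outputPartitioning: Partitioning = UnknownPartitioning
  }

  // A leaf whose output is already hash-partitioned (e.g. sitting above an Exchange).
  final case class Shuffled(partitioning: Partitioning) extends PhysicalOperator {
    override def outputPartitioning: Partitioning = partitioning
  }

  // Keeps the inherited default, mirroring what the comment says about LeftSemiJoinHash.
  final case class SemiJoinDefault(left: PhysicalOperator, right: PhysicalOperator)
    extends PhysicalOperator

  // A left semi join emits only left rows, unchanged and in their original
  // partitions, so it can report the left child's partitioning.
  final case class SemiJoinPropagating(left: PhysicalOperator, right: PhysicalOperator)
    extends PhysicalOperator {
    override def outputPartitioning: Partitioning = left.outputPartitioning
  }

  // Example inputs, hashed on the attribute names used in the plans below.
  val exampleLeft: PhysicalOperator = Shuffled(HashPartitioning(Seq("key#4"), 200))
  val exampleRight: PhysicalOperator = Shuffled(HashPartitioning(Seq("key#6"), 200))
}
```

With the default, the semi join's output is reported as UnknownPartitioning; with the override, it is reported as hashed on key#4 into 200 partitions, which a parent requiring clustering on key#4 could reuse without another shuffle.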

> Unnecessary exchange operators in a join on multiple tables with the same join key.
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-2205
>                 URL: https://issues.apache.org/jira/browse/SPARK-2205
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Minor
>
> {code}
> hql("select * from src x join src y on (x.key=y.key) join src z on (y.key=z.key)")
> SchemaRDD[1] at RDD at SchemaRDD.scala:100
> == Query Plan ==
> Project [key#4:0,value#5:1,key#6:2,value#7:3,key#8:4,value#9:5]
>  HashJoin [key#6], [key#8], BuildRight
>   Exchange (HashPartitioning [key#6], 200)
>    HashJoin [key#4], [key#6], BuildRight
>     Exchange (HashPartitioning [key#4], 200)
>      HiveTableScan [key#4,value#5], (MetastoreRelation default, src, Some(x)), None
>     Exchange (HashPartitioning [key#6], 200)
>      HiveTableScan [key#6,value#7], (MetastoreRelation default, src, Some(y)), None
>   Exchange (HashPartitioning [key#8], 200)
>    HiveTableScan [key#8,value#9], (MetastoreRelation default, src, Some(z)), None
> {code}
> However, this query is fine (the second join uses x.key, and no extra exchange is added):
> {code}
> hql("select * from src x join src y on (x.key=y.key) join src z on (x.key=z.key)")
> res5: org.apache.spark.sql.SchemaRDD = 
> SchemaRDD[5] at RDD at SchemaRDD.scala:100
> == Query Plan ==
> Project [key#26:0,value#27:1,key#28:2,value#29:3,key#30:4,value#31:5]
>  HashJoin [key#26], [key#30], BuildRight
>   HashJoin [key#26], [key#28], BuildRight
>    Exchange (HashPartitioning [key#26], 200)
>     HiveTableScan [key#26,value#27], (MetastoreRelation default, src, Some(x)), None
>    Exchange (HashPartitioning [key#28], 200)
>     HiveTableScan [key#28,value#29], (MetastoreRelation default, src, Some(y)), None
>   Exchange (HashPartitioning [key#30], 200)
>    HiveTableScan [key#30,value#31], (MetastoreRelation default, src, Some(z)), None
> {code}
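The difference between the two plans can be reproduced with a small model of the "does the child already satisfy the required partitioning?" check. This is a minimal sketch under stated assumptions: the names (ExchangeSketch, Attr, satisfiesNaive, satisfiesWithEquivalence) are hypothetical, not Spark's API, and it assumes, as the first plan suggests, that the lower HashJoin reports its left input's partitioning (key#4) while the upper join requires its left input hashed on key#6. A check that compares attributes literally sees key#4 and key#6 as different and forces an Exchange. One possible remedy, shown here only as an illustration, is to treat attributes equated by an inner equi-join as interchangeable, because every output row of that join has equal values for both.

```scala
// Hypothetical, simplified model (not Spark's actual API) of the check that
// decides whether an Exchange must be inserted below a hash join.
object ExchangeSketch {
  // An attribute reference such as key#4 in the plans above.
  final case class Attr(name: String, id: Int)

  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int) extends Partitioning

  // Naive check: satisfied only if the child is hashed on exactly the
  // required attributes with the same number of partitions.
  def satisfiesNaive(actual: Partitioning, required: HashPartitioning): Boolean =
    actual match {
      case HashPartitioning(keys, n) => keys == required.keys && n == required.numPartitions
      case UnknownPartitioning       => false
    }

  // Equivalence-aware check: after an inner equi-join on a = b, every output
  // row has equal values for a and b, so data hashed on a is also clustered
  // on b. Attributes in the same equivalence class are interchangeable.
  def satisfiesWithEquivalence(
      actual: Partitioning,
      required: HashPartitioning,
      equivalences: Set[Set[Attr]]): Boolean = {
    def same(x: Attr, y: Attr): Boolean =
      x == y || equivalences.exists(c => c.contains(x) && c.contains(y))
    actual match {
      case HashPartitioning(keys, n) =>
        n == required.numPartitions &&
          keys.length == required.keys.length &&
          keys.zip(required.keys).forall { case (x, y) => same(x, y) }
      case UnknownPartitioning => false
    }
  }

  // First query: the lower join (key#4 = key#6) outputs rows hashed on key#4,
  // but the upper join asks for its left input hashed on key#6.
  val key4 = Attr("key", 4)
  val key6 = Attr("key", 6)
  val lowerJoinOutput: Partitioning = HashPartitioning(Seq(key4), 200)
  val upperJoinRequires: HashPartitioning = HashPartitioning(Seq(key6), 200)
  val firstQueryEquivalences: Set[Set[Attr]] = Set(Set(key4, key6))

  // Second query: the upper join asks for key#26, which the lower join already provides.
  val key26 = Attr("key", 26)
  val secondQueryOutput: Partitioning = HashPartitioning(Seq(key26), 200)
  val secondQueryRequires: HashPartitioning = HashPartitioning(Seq(key26), 200)
}
```

Under the naive check, the first query needs an extra Exchange while the second does not, matching the two plans. The equivalence-aware check accepts the first query's existing partitioning as well. The equivalence only holds for inner equi-joins; for outer joins the null-padded side can differ, so those attributes should not be merged.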



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
