Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2014/06/19 23:13:24 UTC

[jira] [Created] (SPARK-2205) Unnecessary exchange operators in a join on multiple tables with the same join key.

Yin Huai created SPARK-2205:
-------------------------------

             Summary: Unnecessary exchange operators in a join on multiple tables with the same join key.
                 Key: SPARK-2205
                 URL: https://issues.apache.org/jira/browse/SPARK-2205
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Yin Huai


{code}
hql("select * from src x join src y on (x.key=y.key) join src z on (y.key=z.key)")

SchemaRDD[1] at RDD at SchemaRDD.scala:100
== Query Plan ==
Project [key#4:0,value#5:1,key#6:2,value#7:3,key#8:4,value#9:5]
 HashJoin [key#6], [key#8], BuildRight
  Exchange (HashPartitioning [key#6], 200)
   HashJoin [key#4], [key#6], BuildRight
    Exchange (HashPartitioning [key#4], 200)
     HiveTableScan [key#4,value#5], (MetastoreRelation default, src, Some(x)), None
    Exchange (HashPartitioning [key#6], 200)
     HiveTableScan [key#6,value#7], (MetastoreRelation default, src, Some(y)), None
  Exchange (HashPartitioning [key#8], 200)
   HiveTableScan [key#8,value#9], (MetastoreRelation default, src, Some(z)), None
{code}

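In the plan above, the Exchange on key#6 that sits between the two HashJoin operators looks unnecessary: the first join already equates key#4 and key#6, so its output is effectively partitioned on key#6 as well.

However, the plan is fine when the second join condition references x.key instead of y.key (no Exchange is inserted above the first join):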
{code}
hql("select * from src x join src y on (x.key=y.key) join src z on (x.key=z.key)")

res5: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[5] at RDD at SchemaRDD.scala:100
== Query Plan ==
Project [key#26:0,value#27:1,key#28:2,value#29:3,key#30:4,value#31:5]
 HashJoin [key#26], [key#30], BuildRight
  HashJoin [key#26], [key#28], BuildRight
   Exchange (HashPartitioning [key#26], 200)
    HiveTableScan [key#26,value#27], (MetastoreRelation default, src, Some(x)), None
   Exchange (HashPartitioning [key#28], 200)
    HiveTableScan [key#28,value#29], (MetastoreRelation default, src, Some(y)), None
  Exchange (HashPartitioning [key#30], 200)
   HiveTableScan [key#30,value#31], (MetastoreRelation default, src, Some(z)), None
{code}
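
The difference between the two plans can be illustrated with a toy model. The sketch below is not Spark's planner code; `EquivalentKeys`, `Equivalences`, and `needsExchange` are hypothetical names introduced only for illustration. It assumes inner equi-joins where both sides use the same hash function and partition count, so that an equality such as key#4 = key#6 means data hash-partitioned on one attribute is also partitioned on the other.

```scala
// Toy model only: none of these names come from Spark.
object EquivalentKeys {
  // An attribute such as key#4 is modelled by its printed name.
  type Attr = String

  // Equalities asserted by the equi-join conditions already applied.
  final case class Equivalences(pairs: Set[(Attr, Attr)]) {
    // All attributes known to be equal to `a`, including `a` itself.
    def classOf(a: Attr): Set[Attr] = {
      @annotation.tailrec
      def grow(seen: Set[Attr]): Set[Attr] = {
        val next = seen ++ pairs.collect {
          case (l, r) if seen(l) => r
          case (l, r) if seen(r) => l
        }
        if (next == seen) seen else grow(next)
      }
      grow(Set(a))
    }
  }

  // True when data hash-partitioned on `have` does NOT already satisfy
  // a requirement to be partitioned on `need`.
  def needsExchange(have: Attr, need: Attr, eq: Equivalences): Boolean =
    !eq.classOf(have).contains(need)

  def main(args: Array[String]): Unit = {
    // After the first join on key#4 = key#6:
    val eq = Equivalences(Set("key#4" -> "key#6"))
    println(needsExchange("key#4", "key#6", eq)) // second join keyed on y.key
    println(needsExchange("key#4", "key#8", eq)) // table z still has to be shuffled
  }
}
```

Under these assumptions the first check reports that no Exchange is needed for the output of the first join, while the scan of z still requires one. That matches the shape of the second plan and is what the first plan would ideally look like.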


