Posted to commits@spark.apache.org by da...@apache.org on 2015/12/28 08:24:02 UTC

spark git commit: [SPARK-12520] [PYSPARK] [1.5] Ensure the join type is `inner` for equi-Join.

Repository: spark
Updated Branches:
  refs/heads/branch-1.5 86161a4f7 -> 42286feb6


[SPARK-12520] [PYSPARK] [1.5] Ensure the join type is `inner` for equi-Join.

This PR adds an `assert` to ensure the join type is `inner` for equi-joins.

JIRA: https://issues.apache.org/jira/browse/SPARK-12520

In the JIRA, a user specifies the join type `outer` for an equi-join, but the result returned is an `inner` join, which is the only join type Spark 1.5 supports for equi-joins. (Note: starting with Spark 1.6, the other join types are supported for equi-joins.)

For example,
```python
joined_table = left_table.join(right_table, "joining_column", "outer")
```
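
As a rough illustration of the check this patch adds, here is a stand-alone sketch that does not depend on Spark. It is not the actual PySpark code: the function name `check_equi_join_type` is hypothetical, the single-name normalization is part of the sketch, and `str` stands in for Python 2's `basestring`.

```python
def check_equi_join_type(on, how=None):
    """Hypothetical stand-alone version of the guard (not the PySpark API)."""
    # Treat a single column name as a one-element list (sketch assumption).
    if isinstance(on, str):
        on = [on]
    # A list of column names means an equi-join on those names;
    # only the inner join type (or no type at all) is accepted.
    if on and isinstance(on[0], str):
        assert how is None or how == 'inner', "Equi-join does not support: %s" % how
    return on


check_equi_join_type("joining_column")           # accepted: no join type given
check_equi_join_type("joining_column", "inner")  # accepted: explicit inner join
try:
    check_equi_join_type("joining_column", "outer")
except AssertionError as e:
    print(e)  # Equi-join does not support: outer
```

Because the guard is an `assert`, it is skipped when Python runs with `-O`; in that case the unsupported join type would again be silently ignored.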

Should we also backport it to 1.4? davies JoshRosen Thanks!

Author: gatorsmile <ga...@gmail.com>

Closes #10484 from gatorsmile/pythonEquiOuterJoin.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/42286feb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/42286feb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/42286feb

Branch: refs/heads/branch-1.5
Commit: 42286feb676f52b366c7be3f9ace4bfde55d72a9
Parents: 86161a4
Author: gatorsmile <ga...@gmail.com>
Authored: Sun Dec 27 23:23:57 2015 -0800
Committer: Davies Liu <da...@gmail.com>
Committed: Sun Dec 27 23:23:57 2015 -0800

----------------------------------------------------------------------
 python/pyspark/sql/dataframe.py | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/42286feb/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 2b23815..eb2c6e5 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -570,6 +570,7 @@ class DataFrame(object):
         if on is None or len(on) == 0:
             jdf = self._jdf.join(other._jdf)
         elif isinstance(on[0], basestring):
+            assert how is None or how == 'inner', "Equi-join does not support: %s" % how
             jdf = self._jdf.join(other._jdf, self._jseq(on))
         else:
             assert isinstance(on[0], Column), "on should be Column or list of Column"

