Posted to commits@spark.apache.org by da...@apache.org on 2015/12/28 08:24:02 UTC
spark git commit: [SPARK-12520] [PYSPARK] [1.5] Ensure the join type is `inner` for equi-Join.
Repository: spark
Updated Branches:
refs/heads/branch-1.5 86161a4f7 -> 42286feb6
[SPARK-12520] [PYSPARK] [1.5] Ensure the join type is `inner` for equi-Join.
This PR adds an `assert` to ensure the join type is `inner` for equi-joins.
JIRA: https://issues.apache.org/jira/browse/SPARK-12520
In the JIRA, the user specifies the join type `outer` when using an equi-join. However, the result returned is an `inner` join, which is the only type Spark 1.5 supports for this form of join. (Note: starting from Spark 1.6, the other join types are supported for equi-joins.)
For example,
```python
joined_table = left_table.join(right_table, "joining_column", "outer")
```
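The patch boils down to a one-line guard on the `how` argument. A minimal standalone sketch of that check (plain Python, not Spark itself; `check_equi_join_type` is a hypothetical helper name used here only for illustration):

```python
def check_equi_join_type(how):
    # Mirrors the assert added to DataFrame.join in dataframe.py:
    # an equi-join on column names in Spark 1.5 only supports the
    # 'inner' join type, so any other value of `how` should fail fast
    # instead of silently returning inner-join results.
    assert how is None or how == 'inner', \
        "Equi-join does not support: %s" % how

check_equi_join_type(None)      # ok: defaults to inner
check_equi_join_type('inner')   # ok

try:
    check_equi_join_type('outer')
except AssertionError as e:
    print(e)  # Equi-join does not support: outer
```

Failing fast with an `AssertionError` makes the 1.5 limitation visible to the caller, rather than quietly ignoring the requested join type.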
Should we also backport it to 1.4? davies JoshRosen Thanks!
Author: gatorsmile <ga...@gmail.com>
Closes #10484 from gatorsmile/pythonEquiOuterJoin.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/42286feb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/42286feb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/42286feb
Branch: refs/heads/branch-1.5
Commit: 42286feb676f52b366c7be3f9ace4bfde55d72a9
Parents: 86161a4
Author: gatorsmile <ga...@gmail.com>
Authored: Sun Dec 27 23:23:57 2015 -0800
Committer: Davies Liu <da...@gmail.com>
Committed: Sun Dec 27 23:23:57 2015 -0800
----------------------------------------------------------------------
python/pyspark/sql/dataframe.py | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/42286feb/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 2b23815..eb2c6e5 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -570,6 +570,7 @@ class DataFrame(object):
if on is None or len(on) == 0:
jdf = self._jdf.join(other._jdf)
elif isinstance(on[0], basestring):
+ assert how is None or how == 'inner', "Equi-join does not support: %s" % how
jdf = self._jdf.join(other._jdf, self._jseq(on))
else:
assert isinstance(on[0], Column), "on should be Column or list of Column"
---------------------------------------------------------------------