Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/04/26 09:14:12 UTC
[jira] [Assigned] (SPARK-14761) PySpark DataFrame.join should reject invalid join methods even when join columns are not specified
[ https://issues.apache.org/jira/browse/SPARK-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-14761:
------------------------------------
Assignee: (was: Apache Spark)
> PySpark DataFrame.join should reject invalid join methods even when join columns are not specified
> --------------------------------------------------------------------------------------------------
>
> Key: SPARK-14761
> URL: https://issues.apache.org/jira/browse/SPARK-14761
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Reporter: Josh Rosen
> Priority: Minor
> Labels: starter
>
> In PySpark, the following invalid DataFrame join does not raise an error:
> {code}
> df1.join(df2, how='not-a-valid-join-type')
> {code}
> The signature for `join` is
> {code}
> def join(self, other, on=None, how=None):
> {code}
> and its code ends up completely skipping handling of the `how` parameter when `on` is `None`:
> {code}
> if on is not None and not isinstance(on, list):
>     on = [on]
> if on is None or len(on) == 0:
>     jdf = self._jdf.join(other._jdf)
> elif isinstance(on[0], basestring):
>     if how is None:
>         jdf = self._jdf.join(other._jdf, self._jseq(on), "inner")
>     else:
>         assert isinstance(how, basestring), "how should be basestring"
>         jdf = self._jdf.join(other._jdf, self._jseq(on), how)
> else:
> {code}
> Because this behavior can mask user errors (as in the example above), I think we should refactor the method to process all arguments first and then always call the three-argument {{_jdf.join}}. The invalid example above would then be rejected, because every argument, including {{how}}, would be passed to the JVM DataFrame for analysis.
> I'm not planning to work on this myself, so this bugfix (+ regression test!) is up for grabs in case someone else wants to do it.
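
The proposed refactor could look roughly like the sketch below. It is not the actual PySpark patch: `FakeJavaDataFrame` and `VALID_JOIN_TYPES` are hypothetical stand-ins for the JVM-side DataFrame and its join-type analysis, `DataFrame` is cut down to the `join` method, the column-expression branch that is elided in the report is left out, and `str` replaces the Python 2 `basestring`. It only illustrates the idea of normalizing `on` and `how` first and then always making the three-argument call.

```python
# Sketch only: FakeJavaDataFrame stands in for the JVM-side DataFrame and is
# NOT part of PySpark. The join-type set below is an illustrative assumption.
VALID_JOIN_TYPES = {"inner", "outer", "left_outer", "right_outer", "leftsemi"}


class FakeJavaDataFrame:
    """Hypothetical stand-in for the JVM DataFrame that analyzes the join type."""

    def join(self, other, on, how):
        if how not in VALID_JOIN_TYPES:
            raise ValueError("Unsupported join type: %r" % how)
        return ("join", tuple(on), how)


class DataFrame:
    """Minimal stand-in for the PySpark DataFrame, reduced to the join logic."""

    def __init__(self, jdf):
        self._jdf = jdf

    def join(self, other, on=None, how=None):
        # Step 1: normalize every argument before touching the JVM object.
        if on is None:
            on = []
        elif not isinstance(on, list):
            on = [on]
        if how is None:
            how = "inner"
        assert isinstance(how, str), "how should be a string"
        # Step 2: always use the three-argument call, so `how` is never skipped.
        return self._jdf.join(other._jdf, on, how)


if __name__ == "__main__":
    df1 = DataFrame(FakeJavaDataFrame())
    df2 = DataFrame(FakeJavaDataFrame())
    try:
        # The invalid call from the report, with no join columns given.
        df1.join(df2, how="not-a-valid-join-type")
    except ValueError as exc:
        print("rejected:", exc)
```

The `__main__` section doubles as a rough regression check: with the original branching, the same call would have taken the `on is None` path and never looked at `how`.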