Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/07/03 00:27:04 UTC

[jira] [Updated] (SPARK-8573) For PySpark's DataFrame API, we need to throw exceptions when users try to use and/or/not

     [ https://issues.apache.org/jira/browse/SPARK-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-8573:
-------------------------------
    Fix Version/s: 1.5.0
                   1.4.1

> For PySpark's DataFrame API, we need to throw exceptions when users try to use and/or/not
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-8573
>                 URL: https://issues.apache.org/jira/browse/SPARK-8573
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark, SQL
>    Affects Versions: 1.3.0
>            Reporter: Yin Huai
>            Assignee: Davies Liu
>            Priority: Critical
>             Fix For: 1.4.1, 1.5.0
>
>
> In PySpark's DataFrame API, we have
> {code}
> # `and`, `or`, `not` cannot be overloaded in Python,
> # so use bitwise operators as boolean operators
> __and__ = _bin_op('and')
> __or__ = _bin_op('or')
> __invert__ = _func_op('not')
> __rand__ = _bin_op("and")
> __ror__ = _bin_op("or")
> {code}
> Right now, users can still write Python's {{and}}, {{or}}, and {{not}} keywords on columns, which silently produces very confusing results. We need to throw an error when users try to use them and tell them the right way to do it (the bitwise operators {{&}}, {{|}}, and {{~}}).
> For example,
> {code}
> df = sqlContext.range(1, 10)
> df.id > 5 or df.id < 10
> Out[30]: Column<(id > 5)>
> df.id > 5 and df.id < 10
> Out[31]: Column<(id < 10)>
> {code}
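
The behavior quoted above follows from how Python evaluates `and`/`or`: both call `bool()` on the left operand, and an object that defines no truth-value hook is always truthy, so `or` returns the left operand and `and` returns the right one. One way to surface the mistake, sketched here under assumptions, is to make the truth-value hook raise. The `Column` class below is a toy stand-in written for illustration (it is not PySpark's class), the helper `try_keyword_and` is hypothetical, and the error wording is invented; this is not necessarily how the actual patch is written.

```python
class Column(object):
    """Toy stand-in for a column expression (hypothetical, not PySpark's Column)."""

    def __init__(self, expr):
        self.expr = expr

    # Bitwise operators build new expressions, mirroring the snippet in the issue.
    def __and__(self, other):
        return Column("(%s AND %s)" % (self.expr, other.expr))

    def __or__(self, other):
        return Column("(%s OR %s)" % (self.expr, other.expr))

    def __invert__(self):
        return Column("(NOT %s)" % self.expr)

    # `and`, `or`, and `not` all ask for the operand's truth value first.
    # Raising here turns a silent wrong answer into an explicit error.
    def __bool__(self):
        raise ValueError(
            "Cannot convert column into bool: use '&' for 'and', "
            "'|' for 'or', '~' for 'not'"
        )

    __nonzero__ = __bool__  # Python 2 looks up this name instead

    def __repr__(self):
        return "Column<%s>" % self.expr


def try_keyword_and(left, right):
    """Hypothetical helper: apply the `and` keyword and report the outcome."""
    try:
        return left and right
    except ValueError as exc:
        return "error: %s" % exc


gt = Column("(id > 5)")
lt = Column("(id < 10)")

combined = gt & lt                   # builds a combined expression
negated = ~gt                        # builds a negated expression
rejected = try_keyword_and(gt, lt)   # the keyword form now fails loudly
```

With this hook in place, `gt & lt` keeps working while `gt and lt` raises instead of quietly returning `lt`.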


