Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2018/10/23 17:27:08 UTC

Documentation of boolean column operators missing?

I can’t seem to find any documentation of the &, |, and ~ operators for
PySpark DataFrame columns. I assume that should be in our docs somewhere.

Was it always missing? Am I just missing something obvious?

Nick

Re: Documentation of boolean column operators missing?

Posted by Nicholas Chammas <ni...@gmail.com>.
Nope, that’s different. I’m talking about the operators on DataFrame
columns in PySpark, not SQL functions.

For example:

from pyspark.sql.functions import col

(df
    .where(~col('is_exiled') & (col('age') > 60))
    .show()
)
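(An aside for readers: the parentheses around (col('age') > 60) are not optional, because Python's & binds tighter than comparison operators. A plain-integer sketch of the precedence, no Spark required:

```python
# & binds tighter than >, so an unparenthesized comparison is grouped
# the "wrong" way for column expressions.

unparenthesized = 5 > 3 & 1   # parsed as 5 > (3 & 1), i.e. 5 > 1
parenthesized = (5 > 3) & 1   # comparison first, then bitwise &

print(unparenthesized)  # True, because 5 > 1
print(parenthesized)    # 1, because True & 1 == 1
```

With Column operands the first grouping would try to compute 60 & <column> before the comparison, which is why PySpark code conventionally parenthesizes each comparison.)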


On Tue, Oct 23, 2018 at 1:48 PM Xiao Li <li...@databricks.com> wrote:

> They are documented at the link below
>
> https://spark.apache.org/docs/2.3.0/api/sql/index.html
>
>
>
> On Tue, Oct 23, 2018 at 10:27 AM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> I can’t seem to find any documentation of the &, |, and ~ operators for
>> PySpark DataFrame columns. I assume that should be in our docs somewhere.
>>
>> Was it always missing? Am I just missing something obvious?
>>
>> Nick
>>
>
>

Re: Documentation of boolean column operators missing?

Posted by Xiao Li <li...@databricks.com>.
They are documented at the link below

https://spark.apache.org/docs/2.3.0/api/sql/index.html



On Tue, Oct 23, 2018 at 10:27 AM Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> I can’t seem to find any documentation of the &, |, and ~ operators for
> PySpark DataFrame columns. I assume that should be in our docs somewhere.
>
> Was it always missing? Am I just missing something obvious?
>
> Nick
>



Re: Documentation of boolean column operators missing?

Posted by Nicholas Chammas <ni...@gmail.com>.
On Tue, 23 Oct 2018 at 21:32, Sean Owen <sr...@gmail.com> wrote:
>
>> The comments say that it is not possible to overload 'and' and 'or',
>> which would have been more natural.
>>
Yes, unfortunately, Python does not allow you to override 'and', 'or', or
'not'. They are not implemented as “dunder” methods (e.g. __add__()), and they
implement special short-circuiting logic that’s not possible to reproduce
with a function call. I think we made the most practical choice in
overriding the bitwise operators.
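(To illustrate the constraint with a toy, plain-Python sketch, no Spark involved and all names made up: the bitwise dunders can be intercepted, but 'and' always goes through __bool__ and short-circuits to one of its operands.)

```python
# Toy expression class overriding the bitwise operators, loosely in the
# style of pyspark.sql.Column. Purely illustrative.

class Expr:
    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self):
        return Expr(f"(NOT {self.text})")

a, b = Expr("is_exiled"), Expr("age > 60")

# The bitwise operators dispatch to our methods:
print((~a & b).text)  # ((NOT is_exiled) AND age > 60)

# 'and' cannot be intercepted: it tests truthiness of `a` (via __bool__)
# and simply returns `b`; no method of ours ever runs.
assert (a and b) is b
```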

In any case, I’ll file a JIRA ticket about this, and maybe also submit a PR
to close it, adding documentation about PySpark column boolean operators to
the programming guide.

Nick

Re: Documentation of boolean column operators missing?

Posted by Maciej Szymkiewicz <ms...@gmail.com>.
Even if these were documented, Sphinx doesn't include dunder methods by
default (with the exception of __init__). There is a :special-members:
option which can be passed to, for example, autoclass.
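(A sketch of how that could be switched on globally in conf.py, assuming Sphinx 1.8+ with the autodoc extension; the method list here is just illustrative:)

```python
# conf.py (Sphinx >= 1.8): ask autodoc to emit selected special members.
extensions = ["sphinx.ext.autodoc"]

autodoc_default_options = {
    "members": True,
    "special-members": "__and__, __or__, __invert__",
}
```

The same option can also be set per-directive, e.g. under an .. autoclass:: directive as :special-members: __and__, __or__, __invert__.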

On Tue, 23 Oct 2018 at 21:32, Sean Owen <sr...@gmail.com> wrote:

> (& and | are both logical and bitwise operators in Java and Scala, FWIW)
>
> I don't see them in the python docs; they are defined in column.py but
> they don't turn up in the docs. Then again, they're not documented:
>
> ...
> __and__ = _bin_op('and')
> __or__ = _bin_op('or')
> __invert__ = _func_op('not')
> __rand__ = _bin_op("and")
> __ror__ = _bin_op("or")
> ...
>
> I don't know if there's a good reason for it, but go ahead and doc
> them if they can be.
> While I suspect their meaning is obvious once it's clear they aren't
> the bitwise operators, that part isn't obvious. While it matches
> Java/Scala/Scala-Spark syntax, and that's probably most important, it
> isn't typical for Python.
>
> The comments say that it is not possible to overload 'and' and 'or',
> which would have been more natural.
>
> On Tue, Oct 23, 2018 at 2:20 PM Nicholas Chammas
> <ni...@gmail.com> wrote:
> >
> > Also, to clarify something for folks who don't work with PySpark: The
> boolean column operators in PySpark are completely different from those in
> Scala, and non-obvious to boot (since they overload Python's _bitwise_
> operators). So their apparent absence from the docs is surprising.
> >
> > On Tue, Oct 23, 2018 at 3:02 PM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
> >>
> >> So it appears then that the equivalent operators for PySpark are
> completely missing from the docs, right? That’s surprising. And if there
> are column function equivalents for |, &, and ~, then I can’t find those
> either for PySpark. Indeed, I don’t think such a thing is possible in
> PySpark. (e.g. (col('age') > 0).and(...))
> >>
> >> I can file a ticket about this, but I’m just making sure I’m not
> missing something obvious.
> >>
> >>
> >> On Tue, Oct 23, 2018 at 2:50 PM Sean Owen <sr...@gmail.com> wrote:
> >>>
> >>> Those should all be Column functions, really, and I see them at
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
> >>>
> >>> On Tue, Oct 23, 2018, 12:27 PM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
> >>>>
> >>>> I can’t seem to find any documentation of the &, |, and ~ operators
> for PySpark DataFrame columns. I assume that should be in our docs
> somewhere.
> >>>>
> >>>> Was it always missing? Am I just missing something obvious?
> >>>>
> >>>> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: Documentation of boolean column operators missing?

Posted by Sean Owen <sr...@gmail.com>.
(& and | are both logical and bitwise operators in Java and Scala, FWIW)

I don't see them in the python docs; they are defined in column.py but
they don't turn up in the docs. Then again, they're not documented:

...
__and__ = _bin_op('and')
__or__ = _bin_op('or')
__invert__ = _func_op('not')
__rand__ = _bin_op("and")
__ror__ = _bin_op("or")
...

I don't know if there's a good reason for it, but go ahead and doc
them if they can be.
While I suspect their meaning is obvious once it's clear they aren't
the bitwise operators, that part isn't obvious. While it matches
Java/Scala/Scala-Spark syntax, and that's probably most important, it
isn't typical for Python.

The comments say that it is not possible to overload 'and' and 'or',
which would have been more natural.
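(For the curious, the column.py pattern quoted above can be sketched in miniature, with a simplified and hypothetical _bin_op/_func_op; the real ones delegate to the JVM Column rather than building strings. Since the dunders are bare assignments with no docstrings, there is also nothing for Sphinx to render:)

```python
# Minimal self-contained sketch of the operator-factory pattern.

def _bin_op(name):
    def op(self, other):
        other = other.expr if isinstance(other, Column) else other
        return Column(f"({self.expr} {name} {other})")
    return op

def _func_op(name):
    def op(self):
        return Column(f"{name}({self.expr})")
    return op

class Column:
    def __init__(self, expr):
        self.expr = expr

    __and__ = _bin_op("and")
    __or__ = _bin_op("or")
    __rand__ = _bin_op("and")
    __ror__ = _bin_op("or")
    __invert__ = _func_op("not")

print((Column("a") & Column("b")).expr)  # (a and b)
print((~Column("a")).expr)               # not(a)
```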

On Tue, Oct 23, 2018 at 2:20 PM Nicholas Chammas
<ni...@gmail.com> wrote:
>
> Also, to clarify something for folks who don't work with PySpark: The boolean column operators in PySpark are completely different from those in Scala, and non-obvious to boot (since they overload Python's _bitwise_ operators). So their apparent absence from the docs is surprising.
>
> On Tue, Oct 23, 2018 at 3:02 PM Nicholas Chammas <ni...@gmail.com> wrote:
>>
>> So it appears then that the equivalent operators for PySpark are completely missing from the docs, right? That’s surprising. And if there are column function equivalents for |, &, and ~, then I can’t find those either for PySpark. Indeed, I don’t think such a thing is possible in PySpark. (e.g. (col('age') > 0).and(...))
>>
>> I can file a ticket about this, but I’m just making sure I’m not missing something obvious.
>>
>>
>> On Tue, Oct 23, 2018 at 2:50 PM Sean Owen <sr...@gmail.com> wrote:
>>>
>>> Those should all be Column functions, really, and I see them at http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
>>>
>>> On Tue, Oct 23, 2018, 12:27 PM Nicholas Chammas <ni...@gmail.com> wrote:
>>>>
>>>> I can’t seem to find any documentation of the &, |, and ~ operators for PySpark DataFrame columns. I assume that should be in our docs somewhere.
>>>>
>>>> Was it always missing? Am I just missing something obvious?
>>>>
>>>> Nick

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Documentation of boolean column operators missing?

Posted by Nicholas Chammas <ni...@gmail.com>.
Also, to clarify something for folks who don't work with PySpark: The
boolean column operators in PySpark are completely different from those in
Scala, and non-obvious to boot (since they overload Python's _bitwise_
operators). So their apparent absence from the docs is surprising.

On Tue, Oct 23, 2018 at 3:02 PM Nicholas Chammas <ni...@gmail.com>
wrote:

> So it appears then that the equivalent operators for PySpark are
> completely missing from the docs, right? That’s surprising. And if there
> are column function equivalents for |, &, and ~, then I can’t find those
> either for PySpark. Indeed, I don’t think such a thing is possible in
> PySpark. (e.g. (col('age') > 0).and(...))
>
> I can file a ticket about this, but I’m just making sure I’m not missing
> something obvious.
>
> On Tue, Oct 23, 2018 at 2:50 PM Sean Owen <sr...@gmail.com> wrote:
>
>> Those should all be Column functions, really, and I see them at
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
>>
>> On Tue, Oct 23, 2018, 12:27 PM Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> I can’t seem to find any documentation of the &, |, and ~ operators for
>>> PySpark DataFrame columns. I assume that should be in our docs somewhere.
>>>
>>> Was it always missing? Am I just missing something obvious?
>>>
>>> Nick
>>>
>>

Re: Documentation of boolean column operators missing?

Posted by Nicholas Chammas <ni...@gmail.com>.
So it appears then that the equivalent operators for PySpark are completely
missing from the docs, right? That’s surprising. And if there are column
function equivalents for |, &, and ~, then I can’t find those either for
PySpark. Indeed, I don’t think such a thing is possible in PySpark.
(e.g. (col('age') > 0).and(...))

I can file a ticket about this, but I’m just making sure I’m not missing
something obvious.

On Tue, Oct 23, 2018 at 2:50 PM Sean Owen <sr...@gmail.com> wrote:

> Those should all be Column functions, really, and I see them at
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
>
> On Tue, Oct 23, 2018, 12:27 PM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> I can’t seem to find any documentation of the &, |, and ~ operators for
>> PySpark DataFrame columns. I assume that should be in our docs somewhere.
>>
>> Was it always missing? Am I just missing something obvious?
>>
>> Nick
>>
>

Re: Documentation of boolean column operators missing?

Posted by Sean Owen <sr...@gmail.com>.
Those should all be Column functions, really, and I see them at
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column

On Tue, Oct 23, 2018, 12:27 PM Nicholas Chammas <ni...@gmail.com>
wrote:

> I can’t seem to find any documentation of the &, |, and ~ operators for
> PySpark DataFrame columns. I assume that should be in our docs somewhere.
>
> Was it always missing? Am I just missing something obvious?
>
> Nick
>