Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/05/17 10:18:00 UTC

[jira] [Commented] (ARROW-12695) [Python] bool value of scalars depends on data type

    [ https://issues.apache.org/jira/browse/ARROW-12695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346046#comment-17346046 ] 

Joris Van den Bossche commented on ARROW-12695:
-----------------------------------------------

Currently pyarrow doesn't implement any {{\_\_bool\_\_}}. In general, Python then returns True by default, but if your object is "sequence-like" (it defines {{\_\_len\_\_}}), Python checks the length instead and treats a length of zero as False. This is described at https://docs.python.org/3/library/stdtypes.html#truth-value-testing
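
As a quick illustration of those truth-value rules, here is a minimal plain-Python sketch. It does not use pyarrow, and the class names ({{NoLen}}, {{WithLen}}, {{BrokenLen}}) are made up for this example:

```python
class NoLen:
    """Defines neither __bool__ nor __len__, so instances are always truthy."""


class WithLen:
    """Defines only __len__, so truthiness follows the reported length."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


class BrokenLen:
    """__len__ raises, so bool() propagates the same exception."""

    def __len__(self):
        return len(None)  # raises TypeError, like a null list scalar


truthy_default = bool(NoLen())       # no __bool__, no __len__
truthy_nonempty = bool(WithLen(2))   # length 2
falsy_empty = bool(WithLen(0))       # length 0

try:
    bool(BrokenLen())
    raised = False
except TypeError:
    raised = True
```

The three cases mirror the report: a scalar without {{\_\_len\_\_}} is truthy, a scalar whose {{\_\_len\_\_}} returns 0 is falsy, and a scalar whose {{\_\_len\_\_}} raises makes {{bool()}} raise as well.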

So here the underlying reason for the error is that {{len()}} fails on a null list scalar:

{code}
>>> len(pa.scalar([1, 2], type=pa.list_(pa.int32())))
2

>>> len(pa.scalar(None, type=pa.list_(pa.int32())))
...
TypeError: object of type 'NoneType' has no len()
{code}

But the question is also, what should this return instead? Returning 0 in this case also doesn't feel correct, as you can also have an empty list scalar with a length of zero.
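
One way to keep "null" and "empty" apart on the user side is to check validity before asking for a length. The following is only a sketch under assumptions: {{list_scalar_len}} is a hypothetical helper (not part of pyarrow) that relies on the scalar exposing {{is_valid}} and {{as_py()}}, and {{FakeListScalar}} is a made-up stand-in so the example runs without pyarrow installed:

```python
from typing import Any, Optional


def list_scalar_len(scalar: Any) -> Optional[int]:
    """Return the length of a list-like scalar, or None if it is null.

    Assumes `scalar` exposes `is_valid` and `as_py()`, as pyarrow scalars do.
    """
    if not scalar.is_valid:
        return None  # null: deliberately distinct from an empty list (0)
    return len(scalar.as_py())


class FakeListScalar:
    """Hypothetical stand-in for a pyarrow list scalar (not a pyarrow class)."""

    def __init__(self, value):
        self._value = value

    @property
    def is_valid(self):
        return self._value is not None

    def as_py(self):
        return self._value


null_len = list_scalar_len(FakeListScalar(None))     # null scalar
empty_len = list_scalar_len(FakeListScalar([]))      # empty list
two_len = list_scalar_len(FakeListScalar([1, 2]))    # two elements
```

With real pyarrow objects the same helper would presumably be called as {{list_scalar_len(pa.scalar(None, type=pa.list_(pa.int32())))}}. Returning None for null is just one possible convention; it does not answer what {{len()}} itself should do.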

In general, I think it will be hard to give a nice and consistent interface for pyarrow scalars involving null scalars (we could provide better error messages though?)

[~mosalx] what's your use case for wanting to do {{bool(null_scalar)}}, and what do you think it should return? (also True, like the other scalars?)

> [Python] bool value of scalars depends on data type
> ---------------------------------------------------
>
>                 Key: ARROW-12695
>                 URL: https://issues.apache.org/jira/browse/ARROW-12695
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 4.0.0
>         Environment: Windows 10
> python 3.9.4
>            Reporter: Sergey Mozharov
>            Priority: Major
>
> `pyarrow.Scalar` and its subclasses do not implement the `__bool__` method. The default implementation does not seem to do the right thing. For example:
> {code:java}
> >>> import pyarrow as pa
> >>> na_value = pa.scalar(None, type=pa.int32())
> >>> bool(na_value)
> True
> >>> na_value = pa.scalar(None, type=pa.struct([('a', pa.int32())]))
> >>> bool(na_value)
> False
> >>> bool(pa.scalar(None, type=pa.list_(pa.int32())))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow\scalar.pxi", line 572, in pyarrow.lib.ListScalar.__len__
> TypeError: object of type 'NoneType' has no len()
> >>>
> {code}
> Please consider implementing the `__bool__` method. It seems reasonable to delegate to the `__bool__` method of the wrapped object.


