Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/01/28 08:23:00 UTC

[jira] [Commented] (ARROW-11412) [Python] (C++?) Expression evaluation problem for logical boolean expressions (and, or, not)

    [ https://issues.apache.org/jira/browse/ARROW-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273407#comment-17273407 ] 

Joris Van den Bossche commented on ARROW-11412:
-----------------------------------------------

[~romankarlstetter] Thanks for the report!

So using {{&}} instead of {{and}} does work (and the same for {{|}} instead of {{or}}, and {{~}} instead of {{not}}):

{code}
In [10]: ds.scalar(False) & ds.scalar(True)
Out[10]: <pyarrow.dataset.Expression (false and true)>
{code}

(Note that it gives "false and true" rather than just "false", because the expression is only captured, not simplified right away.)

Now, the reason for the unexpected results is that we don't control the behaviour of {{and}} and {{or}}: Python only lets you override {{&}} and {{|}}, through the bitwise {{__and__}} and {{__or__}} methods. So {{and}} and {{or}} use "plain" Python logic, which looks at the "truthiness" of the object ({{bool(..)}}, which _can_ be overridden with {{__bool__}}). And because we currently don't override this, each expression (also the "False" expression) is simply seen as "true".
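To make the distinction concrete, here is a minimal sketch using a hypothetical {{Expr}} class (a stand-in written for this illustration, not pyarrow's actual implementation). It overloads {{&}}, {{|}} and {{~}} but defines no {{__bool__}}, so Python falls back to its default truthiness:

```python
class Expr:
    """Hypothetical stand-in for an expression object (not pyarrow's class)."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # Called for `self & other`: capture the operation as a new expression
        return Expr(f"({self.text} and {other.text})")

    def __or__(self, other):
        # Called for `self | other`
        return Expr(f"({self.text} or {other.text})")

    def __invert__(self):
        # Called for `~self`
        return Expr(f"(not {self.text})")

    def __repr__(self):
        return f"<Expr {self.text}>"


false, true = Expr("false"), Expr("true")

combined = false & true   # goes through Expr.__and__
negated = ~false          # goes through Expr.__invert__
truthy = bool(false)      # no __bool__ or __len__ defined, so the default applies

print(combined, negated, truthy)
```

Even the object labelled "false" is truthy here, which is the situation described above.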

All the example return values you show above follow from that. For example, in {{ds.scalar(False) or ds.scalar(True)}}, Python first checks whether the left value is "true". If it is, that value is returned ({{or}} short-circuits here without evaluating the right side); otherwise the right-side value is returned. In our case, because {{ds.scalar(False)}} is "true", it is simply returned. You can observe something similar with {{2 or 3}}, which returns 2 because 2 is a "truthy" value.
The other examples can be explained in a similar way (and the ones with the "expected" result are actually not fully correct either, e.g. {{not ds.scalar(True)}} no longer returns an expression, which is not what we would ideally want).
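The same behaviour can be reproduced without pyarrow. The sketch below uses a hypothetical {{Opaque}} class (introduced only for this illustration) that, like the expressions discussed here, defines no {{__bool__}} and is therefore always truthy:

```python
class Opaque:
    """Hypothetical object with no __bool__ or __len__, so it is always truthy."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"<Opaque {self.label}>"


f, t = Opaque("false"), Opaque("true")

or_result = f or t     # left operand is truthy, so it is returned as-is
and_result = f and t   # left operand is truthy, so the right operand is returned
not_result = not f     # `not` always produces a plain bool, never an Opaque

print(or_result, and_result, not_result)
print(2 or 3, 0 or 3)  # same rule with integers: first truthy operand, else the last one
```

This mirrors the reported results: {{or}} hands back the left operand, {{and}} hands back the right one, and {{not}} yields a plain {{False}} regardless of the label.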

Now, we are limited here by what Python lets us customize, so I don't think we can get {{and}}, {{or}} and {{not}} fully working as we would like. The better option might be to raise an error in {{__bool__}} with an informative message, so that people don't run into this trap (similar to how numpy arrays raise in {{__bool__}}; try e.g. {{not np.array([1, 2])}}).
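As a rough sketch of that idea (the {{StrictExpr}} class, the choice of {{ValueError}} and the message wording are all assumptions for illustration, not what pyarrow does or will do):

```python
class StrictExpr:
    """Hypothetical expression class that refuses to be used as a bool."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # `&` still builds a combined expression
        return StrictExpr(f"({self.text} and {other.text})")

    def __bool__(self):
        # Triggered by `and`, `or`, `not`, `if expr:` and bool(expr)
        raise ValueError(
            "The truth value of an expression is ambiguous; "
            "use '&', '|' and '~' instead of 'and', 'or' and 'not'."
        )


a, b = StrictExpr("false"), StrictExpr("true")

try:
    a and b            # Python evaluates bool(a) first, which now raises
    message = None
except ValueError as exc:
    message = str(exc)

still_works = (a & b).text  # the overloaded operator is unaffected

print(message)
print(still_works)
```

With this in place, {{and}}, {{or}} and {{not}} fail loudly instead of silently returning one of the operands, while {{&}} keeps working.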

> [Python] (C++?) Expression evaluation problem for logical boolean expressions (and, or, not)
> --------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11412
>                 URL: https://issues.apache.org/jira/browse/ARROW-11412
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 2.0.0, 3.0.0
>            Reporter: Roman Karlstetter
>            Priority: Major
>
> There's a problem with boolean "and", "or" and "not" expressions when creating them in python (or I'm doing something completely stupid).
>  
> {code:java}
> >>> import pyarrow
> >>> pyarrow.__version__
> '3.0.0'
> >>> import pyarrow.dataset as ds
> >>> ds.scalar(False) and ds.scalar(True) # <--- I expect false
> <pyarrow.dataset.Expression true>
> >>> ds.scalar(True) and ds.scalar(False) # this works
> <pyarrow.dataset.Expression false>
> >>> ds.scalar(True) or ds.scalar(False) # this works
> <pyarrow.dataset.Expression true>
> >>> ds.scalar(False) or ds.scalar(True) # <--- I expect true
> <pyarrow.dataset.Expression false>
> >>> not ds.scalar(True) # this works
> False
> >>> not ds.scalar(False) # <--- I expect true
> False
> {code}
> I tried to figure out what goes wrong here, but there are no obvious problems in the python code, same for C++ (but I didn't quite understand everything of it yet).
>  
> This happens with pyarrow3 and pyarrow2
>  
>  


