You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "randolf-scholz (via GitHub)" <gi...@apache.org> on 2023/04/08 18:09:38 UTC

[GitHub] [arrow] randolf-scholz opened a new issue, #34987: `bool(pa.scalar(False))` evaluates to `True`

randolf-scholz opened a new issue, #34987:
URL: https://github.com/apache/arrow/issues/34987

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   This can easily cause bugs: `assert pa.scalar(False)` will not raise `AssertionError`!
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] randolf-scholz commented on issue #34987: `bool(pa.scalar(False))` evaluates to `True`

Posted by "randolf-scholz (via GitHub)" <gi...@apache.org>.
randolf-scholz commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501000369

   > I guess the question would be: what wouldn't be acceptable to pyarrow as a scalar? Most of the types return scalar objects - including list, tuple, strings, etc.
   
   It's not about what is acceptable as a scalar. It's about which scalars should be considered Truthy and Falsey. In python the convention is:
   
   - Boolean/Integer/Float/Complex: 0 → False, other →True
   - Container-Types (string/list/tuple): same as `bool(len(obj))`
   
   For other types (dates, timestamps, etc.) I'm not sure a boolean conversion makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] randolf-scholz commented on issue #34987: `bool(pa.scalar(False))` evaluates to `True`

Posted by "randolf-scholz (via GitHub)" <gi...@apache.org>.
randolf-scholz commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1500999070

   @zfoobar `pyarrow.BooleanScalar` needs to implement the `__bool__` magic method and return the appropriate python boolean.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] randolf-scholz commented on issue #34987: `bool(pa.scalar(False))` evaluates to `True`

Posted by "randolf-scholz (via GitHub)" <gi...@apache.org>.
randolf-scholz commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501083185

   I think it is worth discussing what to do in case when the boolean is null. It might make more sense to raise an Exception in this  case rather than silently casting to `False`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zfoobar commented on issue #34987: [Python] `bool(pa.scalar(False))` evaluates to True

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501820553

   ExSent from my iPhoneOn Apr 8, 2023, at 4:57 PM, Randolf Scholz ***@***.***> wrote:
   
   I guess the question would be: what wouldn't be acceptable to pyarrow as a scalar? Most of the types return scalar objects - including list, tuple, strings, etc.
   
   It's not about what is acceptable as a scalar. It's about which scalars should be considered Truthy and Falsey. In python the convention is:
   
   Boolean/Integer/Float/Complex: 0 → False, other →True
   Container-Types (string/list/tuple): same as bool(len(obj))
   
   For other types (dates, timestamps, etc.) I'm not sure a boolean conversion makes sense.
   
   —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zfoobar commented on issue #34987: [Python] `bool(pa.scalar(False))` evaluates to True

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501930878

   Just coming at this from the lens of a newb user - not even considering the Null case:
   
   ```
   >>> a = pa.scalar(1)
   >>> b = pa.scalar(2)
   >>> assert(pc.equal(a,b))
   >>> 
   
   ```
   
   ^^ It wouldn't immediately occur to me to even know what as_py() is or to use it to ensure the expected behavior in this case. A solid RTFM could be in order but to my eye it wasn't immediately clear that this should be done. 
   
   Agree that __eq__ returning boolean object, which can then be relied on to provide __bool__ seems intuitive.
   
   To @jorisvandenbossche's point, I did feel odd only adding __bool__ implementation for a single data type. =/ 
   
   Happy to take another swing at this if we can align on an implementation, otherwise as_py() it is... 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zfoobar commented on issue #34987: `bool(pa.scalar(False))` evaluates to `True`

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501200509

   @randolf-scholz PR updated with your wonderful insights.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zfoobar commented on issue #34987: `bool(pa.scalar(False))` evaluates to `True`

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1500987373

   False is a valid value for a Boolean Scalar in this context, so it won't return 0, it will return an object specialized as boolean. 
   
   I guess the question would be: what wouldn't be acceptable to pyarrow as a scalar? Most of the types return scalar objects - including list, tuple, strings, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] randolf-scholz commented on issue #34987: [Python] `bool(pa.scalar(False))` evaluates to True

Posted by "randolf-scholz (via GitHub)" <gi...@apache.org>.
randolf-scholz commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501722888

   @jorisvandenbossche 
   
   > We could also say: if you want python-like scalar behaviour, just call as_py() on the pyarrow scalar.
   (and we could also raise an error in __bool__ always, to make it clear this method isn't actually implemented)
   
   Certainly that could be better than the current situation. I opened this issue after the following code silently failed:
   
   ```python
   assert pa.compute.all(pa.compute.invert(array.is_null())
   ```
   
   >  Currently `__eq__` is very dumb, so doing `pa.scalar(None) == pa.scalar(None)` gives `True` (while that doesn't match element-wise equality behavior for the array with equivalent values) or it requires exact types (again not being consistent with the element-wise "equal" kernel)
   
   That just seems really bad behavior as well. `pa.compute.equal(pa.scalar(None, type=pa.bool_()), pa.scalar(None, type=pa.bool_()))` returns `<pyarrow.BooleanScalar: None>`, so `(pa.scalar(None, type=pa.bool_()), pa.scalar(None, type=pa.bool_())` should return this as well. This should be fine, since if the result of this equality will be used in context where a python boolean is expected the casting to bool will raise the TypeError, if we follow option 1 from my other comment.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on issue #34987: [Python] `bool(pa.scalar(False))` evaluates to True

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1513798964

   I'll add a +1 for implementing the intuitive usage of `__bool__` and `__eq__` across scalar types (maybe split into 2 PRs). I don't think requiring usage of `as_py()` is as user-friendly in the long term.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zfoobar commented on issue #34987: `bool(pa.scalar(False))` evaluates to `True`

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501006764

   Makes total sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on issue #34987: [Python] `bool(pa.scalar(False))` evaluates to True

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1516598605

   After further thought, its probably best to just update for the `bool` scalar now and save the rest of this discussion/implementation for a larger refactor (e.g. how should each scalar handle bool, eq, lt, gt, etc.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] randolf-scholz commented on issue #34987: `bool(pa.scalar(False))` evaluates to `True`

Posted by "randolf-scholz (via GitHub)" <gi...@apache.org>.
randolf-scholz commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501596450

   @zfoobar LGTM. Possibly `pylint` will complain about is the [unnecessary else](https://pylint.readthedocs.io/en/latest/user_guide/messages/refactor/no-else-return.html), but that's nitpicking.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on issue #34987: [Python] `bool(pa.scalar(False))` evaluates to True

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #34987:
URL: https://github.com/apache/arrow/issues/34987#issuecomment-1501623892

   While I certainly agree the current behaviour is not ideal, I am a bit hesitant to just add `__bool__` to the boolean scalars. 
   
   - Several broader questions for `__bool__` itself:
     - If we add it to BooleanScalar, we should probably add it to all other scalars as well?
     - If we add it to other scalar types, what should be its behaviour? For other numeric types it probably makes sense to follow 0 being falsey. But personally I am not fully convinced we should start making ListScalars truthy/falsey depending on the length of the list scalar value.
     - As @randolf-scholz already mentioned, we then also need to make a decision about the truthy value of null.
   - But aside from `bool(..)` there are other aspects that people would then expect from normal "scalar" values, such as equality. Currently `__eq__` is very dumb, so doing `pa.scalar(None) == pa.scalar(None)` gives True (while that doesn't match element-wise equality behavior for the array with equivalent values) or it requires exact types (again not being consistent with the element-wise "equal" kernel)
   
   We could also say: if you want python-like scalar behaviour, just call `as_py()` on the pyarrow scalar.  
   (and we could also raise an error in `__bool__` always, to make it clear this method isn't actually implemented)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org