Posted to github@arrow.apache.org by "zfoobar (via GitHub)" <gi...@apache.org> on 2023/04/09 19:46:10 UTC

[GitHub] [arrow] zfoobar opened a new pull request, #34993: GH-34987: [PYTHON] Truthy/Falsy for boolean scalars.

zfoobar opened a new pull request, #34993:
URL: https://github.com/apache/arrow/pull/34993

   Added exception handling for the null-value case (thanks, Randolf). Added a unit test.
   
   
   ### Rationale for this change
   
   
   ### What changes are included in this PR?
   
   
   ### Are these changes tested?
   
   
   ### Are there any user-facing changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34993: GH-34987: [PYTHON] Truthy/Falsy for boolean scalars.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1501200190

   * Closes: #34987




[GitHub] [arrow] randolf-scholz commented on pull request #34993: GH-34987: [Python] Truthy/Falsy for boolean scalars

Posted by "randolf-scholz (via GitHub)" <gi...@apache.org>.
randolf-scholz commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1685415286

   @pitrou It's not quite as easy...
   
   - `bool(float("nan"))` is `True`. Historically, NaN floats were often used as missing value indicators.
   - `bool(None)` is `False`.
   - `bool(NotImplemented)` is `True`, but gives a `DeprecationWarning`.
   - `bool(pandas.NA)` raises `TypeError`.
   - `pyarrow.NA.as_py()` returns `None`.
   
   The last point would be a reason to return None. However, one should consider how and when `bool` evaluation might pop up.
   
   - assert statements.
   - branching logic.
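
A minimal sketch, using only the standard library, of the truthiness rules listed above and of the two places where `bool` evaluation happens implicitly. The variable names are illustrative, and the pandas and pyarrow cases are left out so that the snippet has no dependencies.

```python
import math

nan = float("nan")

# NaN is truthy even though it often stands in for "missing".
assert bool(nan) is True
assert math.isnan(nan)

# None is falsy, so a missing value silently takes the else-branch.
missing = None
branch = "then" if missing else "else"

# `assert` performs the same implicit bool() call on its operand.
try:
    assert missing, "missing value treated as False"
    outcome = "passed"
except AssertionError:
    outcome = "failed"
```

In both cases the missing value is handled without any indication that a value was absent.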
   
   Coercing a missing boolean scalar to `False` instead of raising an error can lead to some very nasty, hard-to-debug issues. I'd wager that in the vast majority of cases, branching logic based on a missing boolean is just nonsense and should be rejected.
   
   There is also some inherent inconsistency with conversion to python if one takes this route.
   In a perfect world, one would expect the following diagram to commute:
   
       ┌─────────┐         ┌─────────┐
       │pa Scalar├───op───►│pa Scalar│
       └────┬────┘         └────┬────┘
            │                   │
            │                   │
          as_py               as_py
            │                   │
            ▼                   ▼
       ┌─────────┐         ┌─────────┐
       │py Scalar├───op───►│py Scalar│
       └─────────┘         └─────────┘
   
   However, consider this example:
   
   
   ```python
   import pyarrow as pa
   import pyarrow.compute as pc

   pa_x = pa.scalar(None, type=pa.int64())
   pa_y = pc.greater(pa_x, 0)   # same op as the Python branch below
   result = pa_y.as_py()   # None

   py_x = pa_x.as_py()
   result = py_x > 0   # TypeError: '>' not supported between instances of 'NoneType' and 'int'
   ```
   
   So we see that if we had translated to the Python world immediately, there would have been a `TypeError`.
   
   In order to make the diagram "commute", the only reasonable solution is therefore to raise a `TypeError` when converting the null-bool to Python. This way `result` is the same in both branches: a `TypeError`.
   
   By coercing the null-bool to `False`, one hides this `TypeError`, which, as said before, can lead to all sorts of hard-to-debug bugs.
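
The difference between coercing and raising can be sketched with a small stand-in class. `NullableBool`, its `strict` flag, and `is_over_limit` are hypothetical names made up for this illustration; this is not pyarrow's scalar class, and the choice of `TypeError` simply follows the argument above.

```python
from typing import Optional


class NullableBool:
    """Hypothetical stand-in for a nullable boolean scalar (not pyarrow's class)."""

    def __init__(self, value: Optional[bool], strict: bool = True) -> None:
        self.value = value
        self.strict = strict

    def as_py(self) -> Optional[bool]:
        return self.value

    def __bool__(self) -> bool:
        if self.value is None:
            if self.strict:
                raise TypeError("truth value of a null boolean is ambiguous")
            return False  # the coercing alternative
        return self.value


def is_over_limit(flag: NullableBool) -> str:
    # Branching on the scalar triggers __bool__ implicitly.
    return "over" if flag else "within"


# Coercion: the missing comparison result silently takes the "within" branch.
coerced = is_over_limit(NullableBool(None, strict=False))

# Raising: the same call surfaces the problem at the branch.
try:
    is_over_limit(NullableBool(None, strict=True))
    raised = False
except TypeError:
    raised = True
```

With coercion, the caller cannot tell "within the limit" apart from "unknown"; with the strict variant, the error appears exactly where the missing value is first used in a branch.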
   
   
   
   
   
   




[GitHub] [arrow] zfoobar commented on pull request #34993: GH-34987: [PYTHON] Truthy/Falsy for boolean scalars.

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1501424564

   @AlenkaF I noticed the MacOS12 AMD64 runner failed for this upstream - I can't reproduce. Test does not time out on my side. MacOS 13/M1, test runs in a few seconds. 
    




[GitHub] [arrow] AlenkaF commented on pull request #34993: GH-34987: [PYTHON] Truthy/Falsy for boolean scalars.

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1502858239

   > @AlenkaF I noticed the MacOS12 AMD64 runner failed for this upstream - I can't reproduce. Test does not time out on my side. MacOS 13/M1, test runs in a few seconds.
   
   Hm, I see both MacOS and AppVeyor have the same issue and I can see the same time out on other PRs also - it is not connected to this PR.




[GitHub] [arrow] zfoobar commented on pull request #34993: GH-34987: [Python] Truthy/Falsy for boolean scalars

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1513585682

   @AlenkaF it looks like the jury is still out on this PR. Do you want me to close it or expand the discussion to include `__bool__` support for the other data types?




[GitHub] [arrow] AlenkaF commented on pull request #34993: GH-34987: [Python] Truthy/Falsy for boolean scalars

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1513622702

   > @AlenkaF it looks like the jury is still out on this PR. Do you want me to close it or expand the discussion to include `__bool__` support for the other data types?
   
   Oh, so sorry for being unresponsive! Will look at the discussion on the issue and see what would be the best thing to do. If there is still no activity from us by the end of this week, please ping me again 🙏




[GitHub] [arrow] danepitkin commented on pull request #34993: GH-34987: [Python] Truthy/Falsy for boolean scalars

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1546119729

   This implementation LGTM! 
   
   I think supporting `__bool__` for Boolean Scalars makes sense, and I agree we should raise an exception on NULL since it's ambiguous. I would forgo any implementation of `__bool__` for other scalars for the time being. There are bigger philosophical questions for other scalars, such as how to handle 3-value logic.
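
For readers unfamiliar with the term, three-valued (Kleene) logic can be sketched in plain Python, with `None` standing for "unknown". The helper names `kleene_and` and `kleene_or` are illustrative only and are not taken from this PR.

```python
from typing import Optional


def kleene_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # A definite False decides the result even if the other side is unknown.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # A definite True decides the result even if the other side is unknown.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False
```

The open question for `__bool__` is what `if` should do when such an expression evaluates to "unknown", since Python's `if` only has two branches.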




[GitHub] [arrow] pitrou commented on pull request #34993: GH-34987: [Python] Truthy/Falsy for boolean scalars

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1549805782

   Having `__bool__` return something other than a boolean, or raise an exception, is generally rather nasty as it breaks a widespread expectation about boolean testing.
   
   Also the boolean value of `None` is well-defined:
   ```pycon
   >>> bool(None)
   False
   ```
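
As a side note on the first point, CPython itself rejects a `__bool__` that returns a non-boolean. A minimal sketch, with `ReturnsNone` as a hypothetical class used only for illustration:

```python
class ReturnsNone:
    """Hypothetical class whose __bool__ returns None instead of a bool."""

    def __bool__(self):
        return None


try:
    bool(ReturnsNone())
    error = None
except TypeError as exc:
    # The interpreter refuses a non-bool return value from __bool__.
    error = exc
```

So "return `None` from `__bool__`" is not really a separate option: in practice the choice is between returning `False` and raising.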

