Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/10/02 13:52:50 UTC

[PR] GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe [arrow]

AlenkaF opened a new pull request, #37975:
URL: https://github.com/apache/arrow/pull/37975

   ### Rationale for this change
   
   Bit-packed booleans are currently not supported by the `from_dataframe` function of the Dataframe Interchange Protocol.
   
   Note: We currently represent booleans in the pyarrow implementation as `uint8`, which will also need to be changed in a follow-up PR (see https://github.com/data-apis/dataframe-api/issues/227).
   
   ### What changes are included in this PR?
   
   This PR adds support for bit-packed booleans when consuming a dataframe interchange object.
   
   ### Are these changes tested?
   
   Only locally, currently!
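
For readers unfamiliar with the distinction above, here is a minimal sketch of how a consumer might decode a boolean data buffer depending on the bit width its producer reports. This is not the code from this PR: the helper names `unpack_bits` and `decode_bool_buffer` are hypothetical, and the sketch assumes the Arrow convention of least-significant-bit-first packing, with `offset` counted in values rather than bytes.

```python
from typing import List


def unpack_bits(data: bytes, length: int, offset: int = 0) -> List[bool]:
    """Decode `length` booleans from an LSB-first bit-packed buffer.

    `offset` is the index of the first value (in bits, not bytes).
    """
    return [
        bool((data[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
        for i in range(length)
    ]


def decode_bool_buffer(
    data: bytes, length: int, bit_width: int, offset: int = 0
) -> List[bool]:
    """Decode a boolean buffer whose values are 8 bits or 1 bit wide."""
    if bit_width == 8:
        # One byte per value (the uint8 representation).
        return [byte != 0 for byte in data[offset:offset + length]]
    if bit_width == 1:
        # Eight values per byte (the bit-packed representation).
        return unpack_bits(data, length, offset)
    raise NotImplementedError(f"unsupported boolean bit width: {bit_width}")


# Ten values packed into two bytes: bits 0, 2, 3 of the first byte and
# bit 0 of the second byte are set.
packed = bytes([0b00001101, 0b00000001])
values = decode_bool_buffer(packed, length=10, bit_width=1)

# The same values stored one byte each.
unpacked = bytes([1, 0, 1, 1, 0, 0, 0, 0, 1, 0])
assert decode_bool_buffer(unpacked, length=10, bit_width=8) == values
```

The bit-packed form uses one eighth of the memory, which is why a producer may prefer it. A consumer that only handles the 8-bit form cannot read such a buffer.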


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #37975:
URL: https://github.com/apache/arrow/pull/37975#issuecomment-1744930573

   Maybe we can also merge this, if you can test it locally with polars, and then open an issue to remind ourselves to add tests for it later (or when we export such data ourselves). 
   
   Merging this sooner rather than later would be good, as otherwise it breaks polars->pyarrow conversion.




Re: [PR] GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #37975:
URL: https://github.com/apache/arrow/pull/37975#issuecomment-1746844283

   Tested the code from the [original report](https://github.com/apache/arrow/issues/33982#issuecomment-1669278644):
   
   ![Screenshot 2023-10-04 at 15 06 36](https://github.com/apache/arrow/assets/16418547/51acf41a-c3ef-4897-8535-198408fd7808)
   
   
   We can merge.
   
   




Re: [PR] GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #37975:
URL: https://github.com/apache/arrow/pull/37975#issuecomment-1744819662

   Implementation looks good, just needs some tests then (I assume the problem here is that we ourselves don't generate this (yet), so we can't actually test it easily?)




Re: [PR] GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #37975:
URL: https://github.com/apache/arrow/pull/37975#issuecomment-1744899926

   > Implementation looks good, just needs some tests then (I assume the problem here is that we ourselves don't generate this (yet), so we can't actually test it easily?)
   
   Yeah 😬 I think the only other library that **will** use bit-packed booleans is Polars.
   
   I guess this PR should stay a draft until we generate this ourselves?




Re: [PR] GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF merged PR #37975:
URL: https://github.com/apache/arrow/pull/37975




Re: [PR] GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe [arrow]

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #37975:
URL: https://github.com/apache/arrow/pull/37975#issuecomment-1744941395

   OK, I will test one more time and then ping you for merge (or I can do it, if that is OK).
   
   I plan to create an issue for us to export bit-packed boolean data so we can discuss what would be the correct approach. I will add the note about the tests there.




Re: [PR] GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe [arrow]

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #37975:
URL: https://github.com/apache/arrow/pull/37975#issuecomment-1750054321

   After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 161510e4131976712ea1588c7649b4ccdebdb5e0.
   
   There were 3 benchmark results indicating a performance regression:
   
   - Commit Run on `ursa-i9-9960x` at [2023-10-05 23:00:33Z](https://conbench.ursa.dev/compare/runs/8a68c0b0b8ce41db8631e8e236af9235...f4d6b6343f7a436ba3894f871a524b0c/)
     - [`file-read` (Python) with compression=uncompressed, dataset=nyctaxi_2010-01, file_type=feather, output_type=dataframe](https://conbench.ursa.dev/compare/benchmarks/0651f212387e791480005620bf6fafd3...0651f4cf05fd70248000e038b623cb64)
     - [`file-read` (Python) with compression=uncompressed, dataset=nyctaxi_2010-01, file_type=feather, output_type=table](https://conbench.ursa.dev/compare/benchmarks/0651f211b81178158000026790082f63...0651f4ce6a75799e80009fac6aae3b49)
   - and 1 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/17455255433) has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

