Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/19 18:47:15 UTC

[GitHub] [arrow] wjones1 commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

wjones1 commented on pull request #6979:
URL: https://github.com/apache/arrow/pull/6979#issuecomment-695343480


   I'm back on this for the weekend, and will pick it up again as needed the week after next.
   
   @jorisvandenbossche I can confirm that once I merge in the latest changes from apache master, the batches come back at the expected size (and span across the Parquet chunks). I have updated the `test_iter_batches_columns_reader` unit test accordingly.
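   For anyone following along, this is roughly the kind of check the updated test makes. It's a sketch, not the test itself; the file name, row counts, and sizes here are made up:
   
   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   # Write a file whose row groups (100 rows) do not line up with the
   # batch size we will request (250 rows).
   table = pa.table({"a": list(range(1000)), "b": [float(i) for i in range(1000)]})
   pq.write_table(table, "example.parquet", row_group_size=100)
   
   pf = pq.ParquetFile("example.parquet")
   batches = list(pf.iter_batches(batch_size=250, columns=["a"]))
   
   # Every batch except possibly the last should have exactly 250 rows,
   # so each batch spans multiple 100-row row groups.
   assert all(b.num_rows == 250 for b in batches[:-1])
   assert sum(b.num_rows for b in batches) == 1000
   ```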
   
   However, I have found that if the selected columns include a categorical column, it reverts to the behavior I was seeing before. You should be able to reproduce this by editing line 178 of the parquet tests to include `categorical=True`; `test_iter_batches_columns_reader` will then fail. I'm fine leaving this fix for a future story.
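   
   Not the test edit itself, but a hypothetical standalone reproduction along those lines. The file name, sizes, and the `read_dictionary` detail are my guesses at how to exercise the dictionary/categorical read path outside the test suite:
   
   ```python
   import pandas as pd
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   # Write a file containing a categorical (dictionary-encoded) column.
   df = pd.DataFrame({
       "cat": pd.Categorical(["x", "y", "z"] * 100),
       "val": list(range(300)),
   })
   pq.write_table(pa.Table.from_pandas(df), "categorical.parquet",
                  row_group_size=50)
   
   pf = pq.ParquetFile("categorical.parquet", read_dictionary=["cat"])
   # With the categorical column selected, the batch sizes fall back to
   # the earlier behavior instead of the requested batch_size of 120.
   for batch in pf.iter_batches(batch_size=120, columns=["cat", "val"]):
       print(batch.num_rows)
   ```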


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org