Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/10 12:16:13 UTC

[GitHub] [arrow-rs] garyanaplan edited a comment on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

garyanaplan edited a comment on issue #349:
URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-858568206


   Looks like that hard-coded buffer size (256 bytes) in the bit-writer is the root cause. When writing more than 2048 boolean values, the writer silently drops the extra writes, because the bool encoder ignores calls to put_value that return false.
   
   I have a fix for this which works by extending the size of the BitWriter in 256-byte increments, and which also checks the return value of put_value in BoolType::encode() and raises an error if the call fails.
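   
   To illustrate the idea (this is a simplified, self-contained sketch with made-up names, not the actual code from the parquet crate or the attached diff): the writer grows its buffer by another 256-byte chunk when full instead of silently dropping writes, and the encoder surfaces a failed put as an error rather than ignoring it.
   
   ```rust
   // Hypothetical stand-in for the bit-writer; names and structure are illustrative.
   const GROW_BYTES: usize = 256;
   
   struct BitWriter {
       buffer: Vec<u8>,
       bit_pos: usize, // next bit index to write
   }
   
   impl BitWriter {
       fn new() -> Self {
           BitWriter { buffer: vec![0u8; GROW_BYTES], bit_pos: 0 }
       }
   
       // Returns false on failure; with the fix applied, the buffer grows instead.
       fn put_bool(&mut self, v: bool) -> bool {
           let byte = self.bit_pos / 8;
           if byte >= self.buffer.len() {
               // The fix: extend in 256-byte increments rather than dropping the write.
               self.buffer.resize(self.buffer.len() + GROW_BYTES, 0);
           }
           if v {
               self.buffer[byte] |= 1 << (self.bit_pos % 8);
           }
           self.bit_pos += 1;
           true
       }
   }
   
   // Encoder side of the fix: a failed put_value becomes an error instead of
   // being silently ignored (the silent drop is what corrupted the column data).
   fn encode_bools(values: &[bool]) -> Result<BitWriter, String> {
       let mut w = BitWriter::new();
       for &v in values {
           if !w.put_bool(v) {
               return Err("put_value failed while encoding booleans".to_string());
           }
       }
       Ok(w)
   }
   
   fn main() {
       // 4096 values exceeds the original 2048-bit (256-byte) fixed capacity.
       let values = vec![true; 4096];
       let w = encode_bools(&values).expect("encoding should succeed");
       assert_eq!(w.bit_pos, 4096);
       println!("encoded {} bits into {} bytes", w.bit_pos, w.buffer.len());
   }
   ```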
   
   Can anyone comment on this approach?
   
   (diff attached)
   
   [a.diff.txt](https://github.com/apache/arrow-rs/files/6631262/a.diff.txt)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org