Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/06 11:20:08 UTC

[GitHub] [arrow-rs] wolfv opened a new issue, #3029: The `num_values` number computation in rle.rs is wrong

wolfv opened a new issue, #3029:
URL: https://github.com/apache/arrow-rs/issues/3029

   **Describe the bug**
   
   Reading a specific Parquet file triggers a panic: thread 'main' panicked at 'index out of bounds: the len is 1024 but the index is 1024', /Users/wolfvollprecht/Programs/arrow-rs/parquet/src/encodings/rle.rs:492:25
   
   The computation of the maximum index (`num_values`) in `rle.rs` is wrong.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #3029: RLEDecoder::get_batch_with_dict may panic on bit-packed runs longer than 1024

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #3029:
URL: https://github.com/apache/arrow-rs/issues/3029#issuecomment-1305076395

   I've found the underlying cause of this is an accounting bug in `RLEDecoder::get_batch_with_dict`
   
   In particular, if a bit-packed run is longer than 1024 values, the decoder may try to read more values from the underlying bit reader than there is capacity for. If the actual number of values is not a multiple of 8, the reader will return more values than were written, because the length of a bit-packed run is ambiguous (such runs are encoded in groups of 8 values). In that case the decoder panics when it tries to copy these values across.
   
   Will post a PR to fix this shortly.
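
The accounting problem described above can be illustrated with a minimal, self-contained sketch. This is not the arrow-rs code and not necessarily the fix that was merged: `PaddedReader` and `decode_with_dict` are hypothetical names, and the reader simply hands back already-unpacked indices padded to a multiple of 8, which is an assumption made to mimic a bit-packed run. The point it shows is that each request to the reader has to be clamped both to the remaining output space and to the 1024-entry scratch buffer.

```rust
use std::cmp;

/// Hypothetical stand-in for a bit reader. It holds already-unpacked
/// dictionary indices, padded with zeros up to a multiple of 8 to mimic
/// how a bit-packed run can contain more values than were logically written.
struct PaddedReader {
    values: Vec<i32>,
    pos: usize,
}

impl PaddedReader {
    fn new(actual: &[i32]) -> Self {
        let mut values = actual.to_vec();
        let padded_len = (values.len() + 7) / 8 * 8;
        values.resize(padded_len, 0);
        Self { values, pos: 0 }
    }

    /// Fills `out` with as many values as are still available (padding
    /// included) and returns how many were written.
    fn get_batch(&mut self, out: &mut [i32]) -> usize {
        let n = cmp::min(out.len(), self.values.len() - self.pos);
        out[..n].copy_from_slice(&self.values[self.pos..self.pos + n]);
        self.pos += n;
        n
    }
}

/// Decodes up to `out.len()` dictionary values through a fixed-size
/// scratch buffer of indices. Returns the number of values decoded.
fn decode_with_dict(reader: &mut PaddedReader, dict: &[u8], out: &mut [u8]) -> usize {
    let mut index_buf = [0i32; 1024];
    let mut values_read = 0;

    while values_read < out.len() {
        // Clamp the request to the remaining output space AND the scratch
        // buffer. Passing the whole scratch buffer instead would let the
        // reader return padding values that do not fit in `out`.
        let to_read = cmp::min(out.len() - values_read, index_buf.len());
        let n = reader.get_batch(&mut index_buf[..to_read]);
        if n == 0 {
            break;
        }
        let dst = &mut out[values_read..values_read + n];
        for (slot, &idx) in dst.iter_mut().zip(&index_buf[..n]) {
            *slot = dict[idx as usize];
        }
        values_read += n;
    }
    values_read
}
```

With a run of 1027 logical values (padded to 1032 by the reader), the second pass through the loop requests only 3 values. An unclamped request would receive 8 and then index past the end of the output while copying.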
   
   




[GitHub] [arrow-rs] tustvold closed issue #3029: RLEDecoder::get_batch_with_dict may panic on bit-packed runs longer than 1024

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #3029: RLEDecoder::get_batch_with_dict may panic on bit-packed runs longer than 1024
URL: https://github.com/apache/arrow-rs/issues/3029




[GitHub] [arrow-rs] alamb commented on issue #3029: RLEDecoder::get_batch_with_dict may panic on bit-packed runs longer than 1024

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #3029:
URL: https://github.com/apache/arrow-rs/issues/3029#issuecomment-1312141424

   `label_issue.py` automatically added labels {'parquet'} from #3036

