Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/28 18:36:23 UTC

[GitHub] [arrow-rs] marioloko commented on pull request #2959: Pass decompressed size to parquet Codec::decompress (#2956)

marioloko commented on PR #2959:
URL: https://github.com/apache/arrow-rs/pull/2959#issuecomment-1295330273

   It seems that the estimate of the LZ4 uncompressed size can overflow for small compressed sizes. Any compressed size smaller than 10 will overflow, and therefore panic.
   
   So I see two options now:
   1. Change the prediction formula to return 255 for any compressed size smaller than 10.
   2. Only allow lz4_raw if `uncompressed_size` is provided, and otherwise return an error saying 'LZ4_RAW without known uncompressed_size is unsupported'.
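
A minimal sketch of what the second option could look like. The function name `lz4_raw_output_size` and the `String` error type are hypothetical and only for illustration; they are not the actual parquet `Codec` API.

```rust
/// Hypothetical helper: decide how many bytes to reserve for LZ4_RAW output.
/// Instead of predicting a size, it returns an error when the caller
/// did not provide the uncompressed size.
fn lz4_raw_output_size(uncompressed_size: Option<usize>) -> Result<usize, String> {
    uncompressed_size.ok_or_else(|| {
        String::from("LZ4_RAW without known uncompressed_size is unsupported")
    })
}
```

With this shape, `lz4_raw_output_size(Some(1024))` yields the exact size to reserve, while `lz4_raw_output_size(None)` yields the error and no buffer is allocated.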
   
   I would go with the second one: even though the overflow only occurs for small compressed sizes, a compressed size of 1 GB would reserve ~250 GB, which is too much. So I would avoid prediction altogether.
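
To illustrate both problems with prediction, here is a sketch that assumes the estimate has the form `255 * n - 2526`. That form is an assumption: it matches the numbers mentioned here (it goes negative below 10, and gives roughly 255x for large inputs), but it is not copied from the PR. The name `estimate_uncompressed_size` is hypothetical.

```rust
/// Hypothetical worst-case estimate, ASSUMED to be 255 * n - 2526.
/// Checked arithmetic returns None instead of panicking when the
/// multiplication overflows or the subtraction would go below zero.
fn estimate_uncompressed_size(compressed_size: u64) -> Option<u64> {
    compressed_size.checked_mul(255)?.checked_sub(2526)
}
```

Under this assumption, `estimate_uncompressed_size(9)` is `None` because 255 * 9 = 2295 is smaller than 2526, `estimate_uncompressed_size(10)` is `Some(24)`, and a 1 GiB input (`1 << 30`) gives about 255 GiB.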
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org