Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/21 17:30:13 UTC

[GitHub] [arrow-rs] kylebarron opened a new issue, #1604: Update parquet thrift to 2.9.0 to support `LZ4_RAW` compression

kylebarron opened a new issue, #1604:
URL: https://github.com/apache/arrow-rs/issues/1604

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   I'd like to support reading Parquet files with [`LZ4_RAW` compression](https://github.com/apache/parquet-format/blob/master/Compression.md#lz4_raw). It looks like this change happened in Parquet [version 2.9.0](https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-290).
   
   I see that arrow-rs depends on the `parquet-format` crate, so I'll also make an issue there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] marioloko commented on issue #1604: Update parquet thrift to 2.9.0 to support `LZ4_RAW` compression

Posted by GitBox <gi...@apache.org>.
marioloko commented on issue #1604:
URL: https://github.com/apache/arrow-rs/issues/1604#issuecomment-1295807844

   It seems that the estimation of the lz4 `uncompress_size` can overflow for small compressed sizes. Any compressed size smaller than 10 will overflow, and thus it will panic.
   
   So now that we pass the `uncompressed_size` to the `decompress` method, I see two options:
   
   1. Change the prediction formula to return 255 for any compressed size smaller than 10.
   2. Only allow lz4_raw if `uncompressed_size` is provided, and return an error saying 'LZ4_RAW without known `uncompressed_size` is unsupported'.
   
   I would go with the second one: even though the overflow error only occurs for small compressed sizes, a compressed size of 1G would reserve ~250GB, which is too much. So I would avoid prediction.
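
A rough sketch of the two points above, in Rust. The estimate formula `255 * n - 2526` is an assumption: it matches the thresholds quoted here (it goes negative below 10 and grows about 255x for large inputs), but the thread does not state the exact formula. The function names `estimate_uncompressed_size` and `required_len` are hypothetical and are not part of the parquet crate.

```rust
/// Hypothetical worst-case estimate of the decompressed size for an LZ4 block.
/// ASSUMPTION: the formula `255 * n - 2526` is illustrative only.
/// Returns `None` where plain arithmetic would overflow (and panic in debug builds).
fn estimate_uncompressed_size(compressed_size: u64) -> Option<u64> {
    compressed_size.checked_mul(255)?.checked_sub(2526)
}

/// Option 2: refuse to guess. Without a known size, return an error
/// instead of reserving a buffer based on a prediction.
fn required_len(uncompressed_size: Option<usize>) -> Result<usize, String> {
    uncompressed_size
        .ok_or_else(|| "LZ4_RAW without known `uncompressed_size` is unsupported".to_string())
}
```

Under this assumed formula, `estimate_uncompressed_size(9)` is `None` (the subtraction would go below zero), `estimate_uncompressed_size(10)` is `Some(24)`, and a 1 GiB input yields roughly 255 GiB, which is why option 2 avoids the estimate entirely.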
   
   What do you think?


[GitHub] [arrow-rs] tustvold closed issue #1604: Update parquet thrift to 2.9.0 to support `LZ4_RAW` compression

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #1604: Update parquet thrift to 2.9.0 to support `LZ4_RAW` compression
URL: https://github.com/apache/arrow-rs/issues/1604


[GitHub] [arrow-rs] tustvold commented on issue #1604: Update parquet thrift to 2.9.0 to support `LZ4_RAW` compression

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1604:
URL: https://github.com/apache/arrow-rs/issues/1604#issuecomment-1256044689

   Support for the codec within the metadata has been added; however, support for the compression codec itself has not been added.
   
   This should be a case of adding a new `compression::Codec` and should be relatively straightforward.


[GitHub] [arrow-rs] tustvold commented on issue #1604: Update parquet thrift to 2.9.0 to support `LZ4_RAW` compression

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1604:
URL: https://github.com/apache/arrow-rs/issues/1604#issuecomment-1295922447

   2. seems sensible to me

