Posted to github@arrow.apache.org by "simonvandel (via GitHub)" <gi...@apache.org> on 2023/04/20 09:34:10 UTC

[GitHub] [arrow-rs] simonvandel opened a new issue, #4102: Parquet: Support `Encoding::BYTE_STREAM_SPLIT`

simonvandel opened a new issue, #4102:
URL: https://github.com/apache/arrow-rs/issues/4102

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I would like to evaluate whether using the `BYTE_STREAM_SPLIT` encoding helps a Float64 column compress better. But it seems like it is not supported yet: https://github.com/apache/arrow-rs/blob/93484a10d145617434432d610e241640a06b382f/parquet/src/encodings/encoding/mod.rs#L90
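As a rough illustration of why the encoding can help a Float64 column (a sketch, not arrow-rs code): BYTE_STREAM_SPLIT does not shrink the data by itself. It regroups byte k of every value into stream k. For values of similar magnitude, the streams holding the sign and exponent bytes become highly repetitive, which a compressor applied afterwards can exploit. The helper names `split_streams` and `distinct_bytes` are hypothetical, and counting distinct byte values per stream is only a crude proxy for compressibility.

```rust
/// Hypothetical helper: split little-endian f64 values into 8 byte streams,
/// where stream k holds byte k of every value (the BYTE_STREAM_SPLIT layout).
fn split_streams(values: &[f64]) -> Vec<Vec<u8>> {
    let mut streams: Vec<Vec<u8>> = (0..8).map(|_| Vec::with_capacity(values.len())).collect();
    for v in values {
        // Byte 0 is the least significant byte; byte 7 holds the sign and top exponent bits.
        for (k, &b) in v.to_le_bytes().iter().enumerate() {
            streams[k].push(b);
        }
    }
    streams
}

/// Hypothetical helper: count how many distinct byte values occur in a stream.
/// Fewer distinct values usually means a general-purpose compressor does better.
fn distinct_bytes(stream: &[u8]) -> usize {
    let mut seen = [false; 256];
    for &b in stream {
        seen[b as usize] = true;
    }
    seen.iter().filter(|&&s| s).count()
}
```

For hypothetical readings 20.000, 20.001, up to 20.999, streams 6 and 7 each contain a single repeated byte (0x34 and 0x40, since 20.0 is 0x4034000000000000), while the low-order streams vary widely.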
   
   **Describe the solution you'd like**
   An implementation of the encoding. Even a naive, non-optimized version would resolve this issue. The implementation can be improved iteratively.
   
   **Describe alternatives you've considered**
   `PyArrow` seems to support it, but I would really like to stay within the Rust world.
   
   **Additional context**
   - Parquet format description here: https://github.com/apache/parquet-format/blob/master/Encodings.md#byte-stream-split-byte_stream_split--9
   - The scalar impl in the C++ library is here: https://github.com/apache/arrow/blob/0bf777a5952be012e41f5b1ad443d4fec38e6f5a/cpp/src/arrow/util/byte_stream_split.h#L579-L602 . They also have SIMD variations, which will be more involved to port.
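Following the format description linked above, a naive scalar encoder and decoder for DOUBLE values might look like the sketch below. The function names are hypothetical and this is not the arrow-rs API. It assumes little-endian (PLAIN) byte order and writes byte k of value i to offset k * n + i, so the output is the 8 streams of n bytes each, concatenated in stream order. FLOAT values would work the same way with a width of 4.

```rust
/// Hypothetical naive BYTE_STREAM_SPLIT encoder for f64.
/// Output layout: stream 0 (byte 0 of every value), then stream 1, ..., then stream 7.
fn byte_stream_split_encode_f64(values: &[f64]) -> Vec<u8> {
    const WIDTH: usize = std::mem::size_of::<f64>();
    let n = values.len();
    let mut out = vec![0u8; n * WIDTH];
    for (i, v) in values.iter().enumerate() {
        // Scatter byte k of value i into stream k at position i.
        for (k, &b) in v.to_le_bytes().iter().enumerate() {
            out[k * n + i] = b;
        }
    }
    out
}

/// Hypothetical naive decoder: gathers byte k of value i from stream k.
/// Returns None if the input length is not a multiple of the value width.
fn byte_stream_split_decode_f64(data: &[u8]) -> Option<Vec<f64>> {
    const WIDTH: usize = std::mem::size_of::<f64>();
    if data.len() % WIDTH != 0 {
        return None;
    }
    let n = data.len() / WIDTH;
    let mut values = Vec::with_capacity(n);
    for i in 0..n {
        let mut bytes = [0u8; WIDTH];
        for (k, byte) in bytes.iter_mut().enumerate() {
            *byte = data[k * n + i];
        }
        values.push(f64::from_le_bytes(bytes));
    }
    Some(values)
}
```

The double loop keeps the index arithmetic visible, in the spirit of a "naive, non-optimized version"; an optimized variant could later process whole streams at a time or use SIMD.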
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] simonvandel commented on issue #4102: Parquet: Support `Encoding::BYTE_STREAM_SPLIT`

Posted by "simonvandel (via GitHub)" <gi...@apache.org>.
simonvandel commented on issue #4102:
URL: https://github.com/apache/arrow-rs/issues/4102#issuecomment-1529370379

   Hi @Weijun-H, I have a working version that I can clean up a bit. I'll update you here.



[GitHub] [arrow-rs] Weijun-H commented on issue #4102: Parquet: Support `Encoding::BYTE_STREAM_SPLIT`

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on issue #4102:
URL: https://github.com/apache/arrow-rs/issues/4102#issuecomment-1529149102

   Hello @simonvandel, I was wondering if you're currently working on this task. If not, I would be happy to take on the project.



[GitHub] [arrow-rs] simonvandel commented on issue #4102: Parquet: Support `Encoding::BYTE_STREAM_SPLIT`

Posted by "simonvandel (via GitHub)" <gi...@apache.org>.
simonvandel commented on issue #4102:
URL: https://github.com/apache/arrow-rs/issues/4102#issuecomment-1516544359

   I'll give it a go myself, if possible.



Re: [I] Parquet: Support `Encoding::BYTE_STREAM_SPLIT` [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #4102: Parquet: Support `Encoding::BYTE_STREAM_SPLIT`
URL: https://github.com/apache/arrow-rs/issues/4102

