You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "simonvandel (via GitHub)" <gi...@apache.org> on 2023/05/09 09:00:29 UTC

[GitHub] [arrow-rs] simonvandel opened a new pull request, #4183: Byte stream split

simonvandel opened a new pull request, #4183:
URL: https://github.com/apache/arrow-rs/pull/4183

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes https://github.com/apache/arrow-rs/issues/4102.
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   # What changes are included in this PR?
   Implements decoding and encoding of BYTE_STREAM_SPLIT for f32 and f64.
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   Yes, now the BYTE_STREAM_SPLIT will not error when used as an encoding for a column.
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "ggreco (via GitHub)" <gi...@apache.org>.
ggreco commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1798559647

   I found this while googling for my problem:
   ```
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: NYI("Encoding BYTE_STREAM_SPLIT is not supported")', /Users/gabry/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-47.0.0/src/column/writer/mod.rs:258:58
   ```
   
   I thought it was my problem since the rust crate page says:
   <img width="462" alt="image" src="https://github.com/apache/arrow-rs/assets/1829939/64972f25-1193-4691-8519-36ae85ed6659">
   
   ... @tustvoid can you please update the README to describe which encoding are not yet implemented?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "mwlon (via GitHub)" <gi...@apache.org>.
mwlon commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1878671063

   @tustvold What's holding this PR up? I'm also encountering the issue that byte_stream_split is unsupported.
   
   I'm willing to make a PR of my own to do this if the problem is that @simonvandel is unresponsive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "mwlon (via GitHub)" <gi...@apache.org>.
mwlon commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1880323561

   I've created a parquet-testing PR to facilitate this: https://github.com/apache/parquet-testing/pull/45


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed pull request #4183: Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT
URL: https://github.com/apache/arrow-rs/pull/4183


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4183: Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1572205122

   Marking as draft as waiting for feedback, feel free to mark as ready for review when you would like me to take another look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1900050636

   Closed by https://github.com/apache/arrow-rs/pull/5293


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4183: Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#discussion_r1188464399


##########
parquet/src/encodings/decoding.rs:
##########
@@ -1882,6 +1935,7 @@ mod tests {
             encoder.put(&v[..]).expect("ok to encode");
         }
         let bytes = encoder.flush_buffer().expect("ok to flush buffer");
+        println!("{:x?}", bytes.data());

Review Comment:
   ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "simonvandel (via GitHub)" <gi...@apache.org>.
simonvandel commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1879594019

   > @tustvold What's holding this PR up? I'm also encountering the issue that byte_stream_split is unsupported.
   > 
   > I'm willing to make a PR of my own to do this if the problem is that @simonvandel is unresponsive.
   
   @mwlon my need for this encoding disappeared, and so did my motivation to finish it. If you want to continue, feel free to do so 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1798605685

   Thank you, that's a pure oversight - filed https://github.com/apache/arrow-rs/issues/5051
   
   FWIW I believe this is the only one that isn't supported.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1799143886

   PR https://github.com/apache/arrow-rs/pull/5053


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "ggreco (via GitHub)" <gi...@apache.org>.
ggreco commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1798634572

   > Thank you, that's a pure oversight - filed #5051
   > 
   > FWIW I believe this is the only one that isn't supported.
   
   I can confirm I tried all the one described that could be applied to integers and floating points in https://parquet.apache.org/docs/file-format/data-pages/encodings/ and the only one not working with the rust implementation is `BYTE_STREAM_SPLIT`. I've done this in the attempt to make my parquet files smaller.
   
   I've never tried the encodings for BYTE_ARRAY type.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "mwlon (via GitHub)" <gi...@apache.org>.
mwlon commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1884100615

   parquet-testing PR is in; new PR for BYTE_STREAM_SPLIT implementation: https://github.com/apache/arrow-rs/pull/5293


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Parquet: Implement support for Encoding::BYTE_STREAM_SPLIT [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4183:
URL: https://github.com/apache/arrow-rs/pull/4183#issuecomment-1878679602

   > I'm willing to make a PR of my own to do this
   
   Always happy to review PRs, IIRC the major thing this PR was missing was adequate test coverage


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org