You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "spebern (via GitHub)" <gi...@apache.org> on 2023/03/12 09:22:50 UTC

[GitHub] [arrow-rs] spebern opened a new pull request, #3847: Support compression levels

spebern opened a new pull request, #3847:
URL: https://github.com/apache/arrow-rs/pull/3847

   This code has been almost completely copied from `parquet2`.
   
   # Which issue does this PR close?
   
   Closes #3844.
   
   # Rationale for this change
   
   # What changes are included in this PR?
   
   # Are there any user-facing changes?
   
   The compression enum got variants.
   
   # Other ways I have thought of
   
   Another way to make this non breaking would be, to include another enum `CompressionOptions` that has all the codes with their supported arguments, which can be set with the `WriterPropertiesBuilder`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3847: Support compression levels

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3847:
URL: https://github.com/apache/arrow-rs/pull/3847#discussion_r1135333778


##########
parquet/src/compression.rs:
##########
@@ -284,7 +343,7 @@ mod brotli_codec {
             let mut encoder = brotli::CompressorWriter::new(
                 output_buf,
                 BROTLI_DEFAULT_BUFFER_SIZE,

Review Comment:
   I'm not familiar enough with brotli to know how important those parameters are, if someone feels strongly about exposing them I wouldn't object



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] spebern commented on a diff in pull request #3847: Support compression levels

Posted by "spebern (via GitHub)" <gi...@apache.org>.
spebern commented on code in PR #3847:
URL: https://github.com/apache/arrow-rs/pull/3847#discussion_r1133301171


##########
parquet/src/basic.rs:
##########
@@ -1783,11 +1784,20 @@ mod tests {
     fn test_display_compression() {
         assert_eq!(Compression::UNCOMPRESSED.to_string(), "UNCOMPRESSED");
         assert_eq!(Compression::SNAPPY.to_string(), "SNAPPY");
-        assert_eq!(Compression::GZIP.to_string(), "GZIP");
+        assert_eq!(

Review Comment:
   I am not so sure about whether the level should get displayed.



##########
parquet/src/basic.rs:
##########
@@ -286,11 +287,11 @@ pub enum Encoding {
 pub enum Compression {

Review Comment:
   This is a breaking change and is set in https://github.com/apache/arrow-rs/blob/9ce0ebb06550be943febc226f61bf083016d7652/parquet/src/file/properties.rs#L424
   
   One way to make this not breaking could be to add a new enum `CompressionOptions` that can also be set via the `WriterPropertiesBuilder`. 



##########
parquet/src/compression.rs:
##########
@@ -284,7 +343,7 @@ mod brotli_codec {
             let mut encoder = brotli::CompressorWriter::new(
                 output_buf,
                 BROTLI_DEFAULT_BUFFER_SIZE,

Review Comment:
   Maybe it makes sense to not only make the level configurable for brotli, but also the buffer and window size.



##########
parquet/src/compression.rs:
##########
@@ -121,6 +121,26 @@ impl CodecOptionsBuilder {
     }
 }
 
+/// Defines valid compression levels.

Review Comment:
   I just want to mention as I have before that this is copied from `parquet2`, so the credit should go to those authors of course!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #3847: Support compression levels

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #3847:
URL: https://github.com/apache/arrow-rs/pull/3847#issuecomment-1468008413

   Benchmark runs are scheduled for baseline = dfb8c769606efd4fd8731706b287993479b339ca and contender = 7b94b08c7d98e1f449955e2f31a94b871dc3e78e. 7b94b08c7d98e1f449955e2f31a94b871dc3e78e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/2de2e07910ad46f4ae494400799156b9...a71c5fdbd2824c3cabd5708a99059ff7/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/382ce1187a0b4a0b9e5b4d440f65562b...134a612d977d400692846201fec53419/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/72123d6352f14cb0b9b066fd73b60ee2...582ccd5bba204bd58035fa48e51db2b2/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/39e57aca2423436f8ed86ad6718f443d...d4a730144d9d48eaa1c6f55708efb45a/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3847: Support compression levels

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #3847:
URL: https://github.com/apache/arrow-rs/pull/3847


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org