Posted to github@arrow.apache.org by "zeroshade (via GitHub)" <gi...@apache.org> on 2023/03/06 23:15:56 UTC

[GitHub] [arrow] zeroshade opened a new pull request, #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

zeroshade opened a new pull request, #34476:
URL: https://github.com/apache/arrow/pull/34476

   
   ### Rationale for this change
   Fixes a bug in buffer compression: when a buffer is left uncompressed because of its size, -1 is now prepended as its length prefix.
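
For readers unfamiliar with the convention, here is a minimal sketch of how a reader can interpret that prefix. It assumes the Arrow IPC layout in which each body buffer of a compressed message starts with an 8-byte little-endian signed length, and -1 marks the bytes that follow as not compressed. `readBody` and its `decompress` parameter are hypothetical names for illustration, not the `arrow/ipc` implementation, and zero-length buffers are not handled.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readBody is a hypothetical helper, not part of the arrow/ipc package.
// It interprets one body buffer from a message that has compression enabled.
// decompress stands in for whichever codec the message declares.
func readBody(buf []byte, decompress func(src []byte, uncompressedLen int64) ([]byte, error)) ([]byte, error) {
	if len(buf) < 8 {
		return nil, errors.New("buffer too short for 8-byte length prefix")
	}
	// The prefix is a signed 64-bit little-endian integer.
	n := int64(binary.LittleEndian.Uint64(buf[:8]))
	if n == -1 {
		// Sentinel: the writer skipped compression, so the payload is raw.
		return buf[8:], nil
	}
	return decompress(buf[8:], n)
}

func main() {
	// Build a buffer carrying the -1 sentinel followed by raw bytes.
	raw := []byte("abc")
	buf := make([]byte, 8, 8+len(raw))
	binary.LittleEndian.PutUint64(buf, ^uint64(0)) // two's complement form of int64(-1)
	buf = append(buf, raw...)

	// decompress is never called for a sentinel-prefixed buffer, so nil is safe here.
	out, err := readBody(buf, nil)
	fmt.Println(string(out), err)
}
```

Taking the codec as a function keeps the sketch independent of any particular compression library.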
   
   
   ### Are these changes tested?
   Unit tests are added; further tests will be enabled via the integration tests in #15194.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #34476:
URL: https://github.com/apache/arrow/pull/34476#issuecomment-1458542757

   Addressed the failing macOS issue with #34488




[GitHub] [arrow] zeroshade commented on pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #34476:
URL: https://github.com/apache/arrow/pull/34476#issuecomment-1458423210

   @lidavidm Yeah, looks like the issue is that macOS is missing the openssl lib for some reason. I'll open a PR for it soon.




[GitHub] [arrow] zeroshade commented on a diff in pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #34476:
URL: https://github.com/apache/arrow/pull/34476#discussion_r1128180015


##########
go/arrow/ipc/ipc.go:
##########
@@ -168,6 +169,26 @@ func WithDictionaryDeltas(v bool) Option {
 	}
 }
 
+// WithMinSpaceSavings specifies a percentage of space savings for
+// compression to be applied to buffers.
+//
+// Space savings is calculated as (1.0 - compressedSize / uncompressedSize).
+//
+// For example, if minSpaceSavings = 0.1, a 100-byte body buffer won't
+// undergo compression if its expected compressed size exceeds 90 bytes.
+// If this option is unset, compression will be used indiscriminately. If
+// no codec was supplied, this option is ignored.
+//
+// Values outside of the range [0,1] are handled as errors.
+//
+// Note that enabling this option may result in unreadable data for Arrow
+// C++ versions prior to 12.0.0.

Review Comment:
   gah, my fault for copying the comment, i'll fix it and update.
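
A minimal sketch of the threshold described in the doc comment above, using the stated formula (1.0 - compressedSize / uncompressedSize). `shouldCompress` is a hypothetical name for illustration and is not the function the writer actually uses. Treating a zero-length buffer as "do not compress" is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// shouldCompress is a hypothetical helper illustrating the documented rule:
// keep the compressed form only if it saves at least minSpaceSavings.
func shouldCompress(uncompressedSize, compressedSize int, minSpaceSavings float64) (bool, error) {
	if minSpaceSavings < 0 || minSpaceSavings > 1 {
		return false, errors.New("minSpaceSavings must be in the range [0,1]")
	}
	if uncompressedSize == 0 {
		// Assumption: nothing to gain from compressing an empty buffer.
		return false, nil
	}
	savings := 1.0 - float64(compressedSize)/float64(uncompressedSize)
	return savings >= minSpaceSavings, nil
}

func main() {
	// A 100-byte buffer with minSpaceSavings = 0.1, as in the doc comment.
	for _, compressed := range []int{80, 95} {
		ok, err := shouldCompress(100, compressed, 0.1)
		fmt.Println(compressed, ok, err)
	}
}
```

With these inputs, the 80-byte result is kept and the 95-byte result is discarded in favor of the uncompressed buffer. Sizes that land exactly on the threshold depend on floating-point rounding, so the sketch avoids them.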





[GitHub] [arrow] lidavidm commented on a diff in pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #34476:
URL: https://github.com/apache/arrow/pull/34476#discussion_r1127820248


##########
go/arrow/ipc/ipc.go:
##########
@@ -168,6 +169,26 @@ func WithDictionaryDeltas(v bool) Option {
 	}
 }
 
+// WithMinSpaceSavings specifies a percentage of space savings for
+// compression to be applied to buffers.
+//
+// Space savings is calculated as (1.0 - compressedSize / uncompressedSize).
+//
+// For example, if minSpaceSavings = 0.1, a 100-byte body buffer won't
+// undergo compression if its expected compressed size exceeds 90 bytes.
+// If this option is unset, compression will be used indiscriminately. If
+// no codec was supplied, this option is ignored.
+//
+// Values outside of the range [0,1] are handled as errors.
+//
+// Note that enabling this option may result in unreadable data for Arrow
+// C++ versions prior to 12.0.0.

Review Comment:
   Doesn't this also apply to Go? (Otherwise, why did we have problems in the integration test if this only affected C++ readers?)





[GitHub] [arrow] github-actions[bot] commented on pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34476:
URL: https://github.com/apache/arrow/pull/34476#issuecomment-1457196026

   * Closes: #34385




[GitHub] [arrow] github-actions[bot] commented on pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34476:
URL: https://github.com/apache/arrow/pull/34476#issuecomment-1457196086

   :warning: GitHub issue #34385 **has been automatically assigned in GitHub** to PR creator.




[GitHub] [arrow] lidavidm commented on pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #34476:
URL: https://github.com/apache/arrow/pull/34476#issuecomment-1458157930

   Hmm, looks like CGO is having issues across multiple PRs




[GitHub] [arrow] zeroshade merged pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade merged PR #34476:
URL: https://github.com/apache/arrow/pull/34476




[GitHub] [arrow] ursabot commented on pull request #34476: GH-34385: [Go] Read IPC files with compression enabled but uncompressed buffers

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34476:
URL: https://github.com/apache/arrow/pull/34476#issuecomment-1459145138

   Benchmark runs are scheduled for baseline = f7fbfcafdebc37cfcd0ef76b68434497ffa4275a and contender = 864ba8221e0e852a9013d0692310c0f91b9b3b80. 864ba8221e0e852a9013d0692310c0f91b9b3b80 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/983c32acd1da4275b1b43675512f30c2...f9aedcb2690145e2990c152928e347da/)
   [Failed :arrow_down:0.21% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ee6a8a7f24b24a8cb98f4bf162c7a1fd...aa162d71003d4ebe941cdb2ff906d163/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/78cd382b4fb1460c83a5cd392a5d39cd...77e6617ebc6d48788269ccaa6b8d4995/)
   [Finished :arrow_down:0.41% :arrow_up:0.13%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/db1440ee83f243909f51e01557f3c4b7...60208e4513c845958b2a58d1ec029054/)
   Buildkite builds:
   [Finished] [`864ba822` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2487)
   [Finished] [`864ba822` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2517)
   [Finished] [`864ba822` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2485)
   [Finished] [`864ba822` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2508)
   [Finished] [`f7fbfcaf` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2486)
   [Failed] [`f7fbfcaf` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2516)
   [Finished] [`f7fbfcaf` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2484)
   [Finished] [`f7fbfcaf` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2507)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   

