Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/17 17:15:19 UTC
[GitHub] [beam] lukecwik opened a new pull request, #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders
lukecwik opened a new pull request, #21928:
URL: https://github.com/apache/beam/pull/21928
A typical BoundedSource may be split into many BoundedSource instances during initial splitting. A simple test with BigtableSource showed that encoding 10 instances after splitting took 102660 bytes on average, while compressing each instance separately after encoding took 1639 bytes, a >60x improvement.
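As a rough illustration of why compressing each encoded instance helps, here is a minimal sketch using only the JDK. It is not the code in this PR: `FakeSource`, `repetitiveConfig`, and the helper method names are hypothetical, and GZIP over Java serialization merely stands in for whatever coder and compression the change actually uses. The point is that each split carries the same bulky configuration and differs only in a small range, so the serialized form is highly repetitive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class CompressedSourceSketch {

  /** Hypothetical stand-in for one split: bulky shared config plus a small key range. */
  static final class FakeSource implements Serializable {
    private static final long serialVersionUID = 1L;
    final String config;
    final long startKey;
    final long endKey;

    FakeSource(String config, long startKey, long endKey) {
      this.config = config;
      this.startKey = startKey;
      this.endKey = endKey;
    }
  }

  /** Plain Java serialization, no compression. */
  static byte[] encode(Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Same serialization, but each instance is compressed on its own. */
  static byte[] encodeCompressed(Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reverses encodeCompressed. */
  static Object decodeCompressed(byte[] data) {
    try (ObjectInputStream in =
        new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(data)))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Builds a deliberately repetitive configuration string for the demo. */
  static String repetitiveConfig() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 200; i++) {
      sb.append("option.").append(i % 10).append("=some-long-repeated-value;");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    FakeSource split = new FakeSource(repetitiveConfig(), 0L, 1000L);
    System.out.println("raw bytes:        " + encode(split).length);
    System.out.println("compressed bytes: " + encodeCompressed(split).length);
  }
}
```

The exact ratio depends entirely on how repetitive the serialized source is, so the numbers printed here say nothing about the BigtableSource measurement above.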
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159268786
Run Java PreCommit
[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159203143
Run Java PostCommit
[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159203074
Run Java PreCommit
[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159092429
CC: @pabloem Would be great to get into the 2.40 release as a patch.
[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159292274
Run Java PreCommit
[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159083375
It would be great if we could do this for unbounded sources as well, but that would break pipeline update compatibility.
[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159209437
I've made a change for this path in Python before (I used a memoizing coder for encoding the source, because the pickling coder was extremely slow, and it was pickling the same BoundedSource object over and over).
Has this been reported by others? Is this causing trouble for someone?
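For readers unfamiliar with the memoizing idea mentioned above, here is a minimal Java sketch of it. This is not the Python SDK's coder and not part of this PR: the class name `MemoizingEncoderSketch` and its fields are hypothetical. It assumes the encoded objects are never mutated after their first encoding, it is not thread-safe, and the cache is unbounded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.IdentityHashMap;
import java.util.Map;

final class MemoizingEncoderSketch {
  // Keyed by object identity: the same instance is serialized only once.
  private final Map<Object, byte[]> cache = new IdentityHashMap<>();

  // Counts real serializations so the effect of the cache is observable.
  int serializations = 0;

  byte[] encode(Serializable value) {
    byte[] cached = cache.get(value);
    if (cached == null) {
      cached = serialize(value);
      cache.put(value, cached);
      serializations++;
    }
    // Return a copy so callers cannot corrupt the cached bytes.
    return cached.clone();
  }

  private static byte[] serialize(Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

Memoizing saves repeated serialization work when the same object is encoded many times, whereas compression reduces the size of each encoded instance; the two address different costs.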
[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159225819
> I've made a change for this path in Python before (I used a memoizing coder for encoding the source, because the pickling coder was extremely slow, and it was pickling the same BoundedSource object over and over).
>
> Has this been reported by others? Is this causing trouble for someone?
Yes, it came up as a customer issue when using BigtableIO.
[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159229968
LGTM
[GitHub] [beam] pabloem merged pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
pabloem merged PR #21928:
URL: https://github.com/apache/beam/pull/21928
[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159245657
Run Java_Examples_Dataflow_Java11 PreCommit
[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159083961
R: @kileys
[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders
Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159227447
Note that I updated the PR and added the streaming fix as well. It turns out that changing the bounded case was going to break pipeline update compatibility for portable pipeline users, but we should get this in before we make Java on Dataflow Prime GA.