Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/17 17:15:19 UTC

[GitHub] [beam] lukecwik opened a new pull request, #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders

lukecwik opened a new pull request, #21928:
URL: https://github.com/apache/beam/pull/21928

   A typical BoundedSource may be split into many BoundedSource instances during initial splitting. In a simple test with BigtableSource, encoding 10 instances after splitting took 102660 bytes on average, while compressing each instance separately after encoding took 1639 bytes, a >60x reduction.
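As a rough illustration of why per-instance compression helps, here is a minimal Python sketch. It is not the Beam Java change itself: the dictionaries standing in for source splits, the `make_splits` helper, and the use of `pickle` plus `zlib` are all assumptions made for the example. It only shows that serialized splits sharing a lot of repeated configuration tend to shrink considerably when each one is compressed separately after encoding.

```python
import pickle
import zlib


def make_splits(n):
    """Build n hypothetical source splits that differ only in their key range."""
    splits = []
    for i in range(n):
        splits.append({
            "table": "my-table",  # hypothetical table name
            "start_key": f"key{i:04d}",
            "end_key": f"key{i + 1:04d}",
            # Repetitive configuration carried by every split (assumed shape).
            "config": {f"option_{j}": "value" * 20 for j in range(50)},
        })
    return splits


def encode(split):
    """Serialize one split without compression."""
    return pickle.dumps(split)


def encode_compressed(split):
    """Serialize one split, then compress the resulting bytes on their own."""
    return zlib.compress(pickle.dumps(split))


def decode_compressed(data):
    """Reverse encode_compressed: decompress, then deserialize."""
    return pickle.loads(zlib.decompress(data))


splits = make_splits(10)
plain_total = sum(len(encode(s)) for s in splits)
compressed_total = sum(len(encode_compressed(s)) for s in splits)
print("plain bytes:", plain_total)
print("compressed bytes:", compressed_total)
```

The actual ratio depends entirely on how repetitive the serialized source is, so the numbers printed here say nothing about BigtableSource specifically.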
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159268786

   Run Java PreCommit



[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159203143

   Run Java PostCommit



[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159203074

   Run Java PreCommit



[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159092429

   CC: @pabloem It would be great to get this into the 2.40 release as a patch.



[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159292274

   Run Java PreCommit



[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159083375

   It would be great if we could do this for unbounded sources as well, but that would break pipeline update compatibility.



[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159209437

   I've made a change for this path in Python before (I used a memoizing coder for encoding the source, because the pickling coder was extremely slow, and it was pickling the same BoundedSource object over and over).
   
   Has this been reported by others? Is this causing trouble for someone?
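The memoizing approach mentioned above could look roughly like the following Python sketch. This is not the coder that was added to Beam: the `MemoizingEncoder` class, its `pickle_calls` counter, and the choice to key the cache on object identity are all hypothetical, and the sketch assumes the source object is not mutated between calls.

```python
import pickle


class MemoizingEncoder:
    """Caches pickled bytes so the same object is only pickled once."""

    def __init__(self):
        # Maps id(obj) -> (obj, pickled bytes). Keeping a reference to obj
        # prevents its id from being reused by a different object.
        self._cache = {}
        self.pickle_calls = 0  # counts real pickling work, for illustration

    def encode(self, obj):
        entry = self._cache.get(id(obj))
        if entry is not None and entry[0] is obj:
            return entry[1]  # cache hit: reuse the earlier bytes
        self.pickle_calls += 1
        data = pickle.dumps(obj)
        self._cache[id(obj)] = (obj, data)
        return data


# Hypothetical stand-in for a BoundedSource that is expensive to pickle.
source = {"table": "my-table", "ranges": list(range(1000))}
encoder = MemoizingEncoder()
first = encoder.encode(source)
second = encoder.encode(source)
print("pickle calls:", encoder.pickle_calls)
```

A real implementation would also need to bound the cache size, since this version holds on to every object it has encoded.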



[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159225819

   > I've made a change for this path in Python before (I used a memoizing coder for encoding the source, because the pickling coder was extremely slow, and it was pickling the same BoundedSource object over and over).
   > 
   > Has this been reported by others? Is this causing trouble for someone?
   
   Yes, it came up as a customer issue when using BigtableIO.



[GitHub] [beam] pabloem commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159229968

   LGTM



[GitHub] [beam] pabloem merged pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
pabloem merged PR #21928:
URL: https://github.com/apache/beam/pull/21928



[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159245657

   Run Java_Examples_Dataflow_Java11 PreCommit



[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159083961

   R: @kileys 



[GitHub] [beam] lukecwik commented on pull request #21928: [Fixes #21927] Compress (Un)BoundedSourceAsSdfWrapper element and restriction coders

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #21928:
URL: https://github.com/apache/beam/pull/21928#issuecomment-1159227447

   Note that I updated the PR and added the streaming fix as well. It turns out that changing the bounded case was going to break pipeline update compatibility for portable pipeline users, but we should get this in before Java on Dataflow Prime goes GA.

