You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/02 16:28:49 UTC

[GitHub] [beam] steveniemitz opened a new pull request, #22144: Tune ByteStringCoder allocations

steveniemitz opened a new pull request, #22144:
URL: https://github.com/apache/beam/pull/22144

   This fixes #22143
   
   R: @lukecwik (sorry to keep tagging you, you're the only one thats touched this in a long time)
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on pull request #22144: Tune ByteStringCoder allocations

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #22144:
URL: https://github.com/apache/beam/pull/22144#issuecomment-1172935753

   No worries Steve. These are very useful improvements for all pipelines.
   
   It seems as though ByteString.readFrom honors the size we pass in so it only creates one chunk, but then it copies the bytes again (for immutability purposes): https://github.com/protocolbuffers/protobuf/blob/7a07be5ad5e5dfe5af65f8101e99fec9604d15e5/java/core/src/main/java/com/google/protobuf/ByteString.java#L595


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik merged pull request #22144: Tune ByteStringCoder allocations

Posted by GitBox <gi...@apache.org>.
lukecwik merged PR #22144:
URL: https://github.com/apache/beam/pull/22144


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] steveniemitz commented on pull request #22144: Tune ByteStringCoder allocations

Posted by GitBox <gi...@apache.org>.
steveniemitz commented on PR #22144:
URL: https://github.com/apache/beam/pull/22144#issuecomment-1172938645

   > It seems as though ByteString.readFrom honors the size we pass in so it only creates one chunk, but then it copies the bytes again (for immutability purposes)
   
   Yeah, there's a couple problems with it:
   - we end up allocating a 10 element object[] in the `results` ArrayList to hold a single element.
   - since readChunk is called twice (one to read all the data, another to return null), we allocate an extra buffer there which gets thrown away once it realizes that there's nothing left to read (read returns -1).
   - as you mentioned, a copy from the temp buffer into the resulting ByteString.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on pull request #22144: Tune ByteStringCoder allocations

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #22144:
URL: https://github.com/apache/beam/pull/22144#issuecomment-1172955610

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lukecwik commented on pull request #22144: Tune ByteStringCoder allocations

Posted by GitBox <gi...@apache.org>.
lukecwik commented on PR #22144:
URL: https://github.com/apache/beam/pull/22144#issuecomment-1172935843

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] steveniemitz commented on pull request #22144: Tune ByteStringCoder allocations

Posted by GitBox <gi...@apache.org>.
steveniemitz commented on PR #22144:
URL: https://github.com/apache/beam/pull/22144#issuecomment-1175345771

   everything green, all good to merge this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org