You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "lostluck (via GitHub)" <gi...@apache.org> on 2023/02/17 02:23:12 UTC

[GitHub] [beam] lostluck opened a new issue, #25522: [Task][Go SDK][Perf]: Make and use a general []byte buffer allocation pool for the SDK.

lostluck opened a new issue, #25522:
URL: https://github.com/apache/beam/issues/25522

   ### What needs to happen?
   
   Replace all small []byte, or bytes.Buffer uses with a central `sync.Pool` allocated "buffer" package, like 
   
   https://cs.opensource.google/go/x/exp/+/f062dba9:slog/internal/buffer/buffer.go;bpv=1;bpt=1
   
   or 
   
   https://github.com/golang/go/blob/master/src/log/log.go#L165
   
   https://pkg.go.dev/sync#Pool are weak reference pools meaning that buffers in the pool may be freed at during the next GC if appropriate.
   
   While some unsafe strategies were used as well, through the [ioutilx package read](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/ioutilx/read.go#L54) and [write](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/ioutilx/write.go#L27) calls to avoid allocations, this remains slightly risky, so uses there are a reasonable priority.
   
   But generally, [uses of bytes.Buffer](https://github.com/apache/beam/search?l=Go&q=bytes.Buffer) or arbitrary lengths but known lifetime uses of [make([]byte)]( https://github.com/apache/beam/search?l=Go&q=make%28%5B%5Dbyte)
   
   Some bytes.Buffer replacements would require this new Buffer to support the io.Reader interface as well.
   
   In particular, targeting the uses in the exec and harness packages would likely be the most fruitful.
   
   It would be worth having a moderate scale benchmark to validate the before and after performance of this change, with appropriate heap and CPU profiles.
   
   ### Issue Priority
   
   Priority: 3 (nice-to-have improvement)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [ ] Component: Java SDK
   - [X] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on issue #25522: [Task][Go SDK][Perf]: Make and use a general []byte buffer allocation pool for the SDK.

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #25522:
URL: https://github.com/apache/beam/issues/25522#issuecomment-1446959938

   Oh! And just a heads up, I wouldn't change anything about the harness/datamgr.go code in the near term (next 2-3 weeks). I'm working on fixing that layer so we can support Timers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yyy1000 commented on issue #25522: [Task][Go SDK][Perf]: Make and use a general []byte buffer allocation pool for the SDK.

Posted by "yyy1000 (via GitHub)" <gi...@apache.org>.
yyy1000 commented on issue #25522:
URL: https://github.com/apache/beam/issues/25522#issuecomment-1448313105

   I came across a solution to this, but I don't know whether my understanding is accurate. Could u please help me to review it and give some comments and suggestions? Thanks!
   1. Create a package named "buffer". This package will contain the implementation of the allocated buffer by sync.Pool.
   2. Define a struct named Buffer in the "buffer" package that wraps a []byte slice.
   3. Implement a "GetBuffer" function in the "buffer" package that returns a new instance of the Buffer struct from the sync.Pool, and a "PutBuffer" function in the "buffer" package that returns a used Buffer struct back to the sync.Pool. 
   4. Replace all small []byte or bytes.Buffer uses in go sdk with calls to "GetBuffer" and "PutBuffer".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yyy1000 commented on issue #25522: [Task][Go SDK][Perf]: Make and use a general []byte buffer allocation pool for the SDK.

Posted by "yyy1000 (via GitHub)" <gi...@apache.org>.
yyy1000 commented on issue #25522:
URL: https://github.com/apache/beam/issues/25522#issuecomment-1446468465

   .take-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on issue #25522: [Task][Go SDK][Perf]: Make and use a general []byte buffer allocation pool for the SDK.

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #25522:
URL: https://github.com/apache/beam/issues/25522#issuecomment-1446458750

   Great! 
   @yyy1000 see beam.apache.org/contribute for  instructions on using the bot infra to get this assigned to you.
   
   FYI, this only needs to be done for *Writes*. There's no value in changing how reads are done, since bytes.Reader already exists.
   
   Benchmarks are required for anything being changed up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yyy1000 commented on issue #25522: [Task][Go SDK][Perf]: Make and use a general []byte buffer allocation pool for the SDK.

Posted by "yyy1000 (via GitHub)" <gi...@apache.org>.
yyy1000 commented on issue #25522:
URL: https://github.com/apache/beam/issues/25522#issuecomment-1446432542

   I want to have a try. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on issue #25522: [Task][Go SDK][Perf]: Make and use a general []byte buffer allocation pool for the SDK.

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #25522:
URL: https://github.com/apache/beam/issues/25522#issuecomment-1448345949

   Please look at the linked code from the slog package from the start of the
   issue. There's no need to wrap in a struct, and the type can self return to
   the pool via a method call.
   
   On Tue, Feb 28, 2023, 6:47 AM Junhao Liu ***@***.***> wrote:
   
   > I came across a solution to this, but I don't know whether my
   > understanding is accurate. Could u please help me to review it and give
   > some comments and suggestions? Thanks!
   >
   >    1. Create a package named "buffer". This package will contain the
   >    implementation of the allocated buffer by sync.Pool.
   >    2. Define a struct named Buffer in the "buffer" package that wraps a
   >    []byte slice.
   >    3. Implement a "GetBuffer" function in the "buffer" package that
   >    returns a new instance of the Buffer struct from the sync.Pool, and a
   >    "PutBuffer" function in the "buffer" package that returns a used Buffer
   >    struct back to the sync.Pool.
   >    4. Replace all small []byte or bytes.Buffer uses in go sdk with calls
   >    to "GetBuffer" and "PutBuffer".
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/beam/issues/25522#issuecomment-1448313105>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/ADKDOFIJRU7CAHEGHVWAQVLWZYFYBANCNFSM6AAAAAAU64XZNE>
   > .
   > You are receiving this because you authored the thread.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org