Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 01:52:57 UTC

[GitHub] [beam] kennknowles opened a new issue, #19483: Coder copy overhead

kennknowles opened a new issue, #19483:
URL: https://github.com/apache/beam/issues/19483

More context can be found in discussion here:

[http://mail-archives.apache.org/mod_mbox/beam-dev/201904.mbox/%3CCAOUjMkyKV8npYJfS_PF3Gzo=vwOmB2FRzUtE81ZsrxnM13Tisw@mail.gmail.com%3E](http://mail-archives.apache.org/mod_mbox/beam-dev/201904.mbox/%3CCAOUjMkyKV8npYJfS_PF3Gzo=vwOmB2FRzUtE81ZsrxnM13Tisw@mail.gmail.com%3E)

I am not sure how much of this is runner dependent, but each operator's user function receives a copy of the data element for isolation. Beam coders copy by serializing to bytes and then deserializing back. This seems to hurt performance, and the cost grows with job complexity.
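To make the copy path concrete, here is a minimal sketch of a copy done by an encode/decode round trip. It uses only the JDK; `SimpleCoder`, `STRING_LIST_CODER`, and `copyViaCoder` are hypothetical names for illustration and are not Beam's actual coder API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoderCopySketch {
    /** Hypothetical stand-in for a coder; NOT the real Beam interface. */
    public interface SimpleCoder<T> {
        void encode(T value, DataOutputStream out) throws IOException;
        T decode(DataInputStream in) throws IOException;
    }

    /** Illustrative coder for a list of strings: length prefix, then each element. */
    public static final SimpleCoder<List<String>> STRING_LIST_CODER =
        new SimpleCoder<List<String>>() {
            @Override
            public void encode(List<String> value, DataOutputStream out) throws IOException {
                out.writeInt(value.size());
                for (String s : value) {
                    out.writeUTF(s);
                }
            }

            @Override
            public List<String> decode(DataInputStream in) throws IOException {
                int n = in.readInt();
                List<String> result = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    result.add(in.readUTF());
                }
                return result;
            }
        };

    /** Copies a value by serializing it to bytes and deserializing it back. */
    public static <T> T copyViaCoder(SimpleCoder<T> coder, T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            coder.encode(value, out);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return coder.decode(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> copy = copyViaCoder(STRING_LIST_CODER, original);
        // The copy is equal in content but is a distinct object.
        System.out.println(copy.equals(original) + " " + (copy != original));
    }
}
```

Every element passes through a byte buffer twice in this sketch, once on encode and once on decode, and that round trip is the kind of per-operator cost being described here.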

On the simple test pipeline described in the discussion thread above, I noticed an almost 2x speedup when CoderUtils.copy() just returned the object.

A native Flink job copies too, but via Kryo, which seems to deep-copy more efficiently, at the object level.
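For contrast, an object-level deep copy can be sketched without any intermediate byte buffer. This is a JDK-only illustration of the general idea, not Kryo's or Flink's implementation; the `Event` type and its `deepCopy` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ObjectLevelCopySketch {
    /** Hypothetical element type, used only for illustration. */
    public static final class Event {
        public final String key;
        public final List<Long> values;

        public Event(String key, List<Long> values) {
            this.key = key;
            this.values = values;
        }

        /** Deep copy at the object level: fields are copied directly, no bytes involved. */
        public Event deepCopy() {
            // String and Long are immutable, so only the mutable list needs a new instance.
            return new Event(key, new ArrayList<>(values));
        }
    }

    public static void main(String[] args) {
        Event original = new Event("k", new ArrayList<>(Arrays.asList(1L, 2L)));
        Event copy = original.deepCopy();
        // Mutating the copy leaves the original untouched.
        copy.values.add(3L);
        System.out.println(original.values.size() + " " + copy.values.size());
    }
}
```

The copy still isolates the user function from the original element, but immutable parts are shared rather than re-encoded, which is one plausible reason an object-level copy could be cheaper.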

What can be done in Beam to reduce this overhead?

Imported from Jira [BEAM-7206](https://issues.apache.org/jira/browse/BEAM-7206). Original Jira may contain additional context.
Reported by: JozoVilcek.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org