Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/25 22:01:59 UTC

[GitHub] [beam] lostluck commented on issue #21817: [Bug]: Difficult to write Go SDK pipelines that stay within memory constraints

lostluck commented on issue #21817:
URL: https://github.com/apache/beam/issues/21817#issuecomment-1227803646

   While it doesn't spill to disk, I do have an approach that decodes elements on demand from a GBK stream, and it has shown heap reductions in the cases tried so far.
   
   The main catch is that it only works for GBK value iterators that are read once. This covers most GBK usage, as well as Reshuffle and lifted combine usage. It *cannot* cover general CoGBK cases due to the current re-iterator requirement, and it cannot cover post-GBK PCollections that are read by more than one DoFn. It would also not cover GBK re-iteration, but the Go SDK currently doesn't support that mode for GBKs, so that's a non-issue.
   
   That specific work will be tracked in #22900.

