Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/08 23:08:18 UTC

[GitHub] [beam] lostluck commented on issue #22610: [Bug]: Dataflow worker retries pipeline if FinishBundle blocks for too long

lostluck commented on issue #22610:
URL: https://github.com/apache/beam/issues/22610#issuecomment-1208709018

   I'll note that that advice is for streaming pipelines, not batch pipelines. 
   
   The code should still impose a hard cap on batch size (making the request once the cap is reached, then starting to fill another batch), since a single "bundle" can contain billions of elements in batch contexts, while bundles tend to be around a dozen elements in streaming ones.
   
   Essentially, waiting until FinishBundle means that the actual batch sizes are dictated, unknowingly, by the runner. It's better to set hard caps and self-batch a little first (either by accumulated RPC size, if the request is constructed incrementally, or by a simple element-count cap if per-element sizes add up). The goal is always to get the task done, not simply to follow advice by rote.
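   As a rough illustration of that self-batching pattern, here is a sketch in plain Go (the DoFn method names mirror Beam's lifecycle, but `maxBatch`, `sendBatch`, and the string element type are all made up for the example, not Beam APIs): buffer in ProcessElement, flush whenever the hard cap is hit, and let FinishBundle flush only the small remainder.

   ```go
   package main

   import "fmt"

   // maxBatch is a hypothetical hard cap on elements per request.
   const maxBatch = 3

   // sent records each flushed batch in place of a real RPC.
   var sent [][]string

   func sendBatch(b []string) {
   	batch := make([]string, len(b))
   	copy(batch, b)
   	sent = append(sent, batch)
   }

   type batchFn struct {
   	buf []string
   }

   // ProcessElement buffers one element and flushes at the cap, so the
   // request size never depends on how large a bundle the runner chose.
   func (f *batchFn) ProcessElement(elem string) {
   	f.buf = append(f.buf, elem)
   	if len(f.buf) >= maxBatch {
   		sendBatch(f.buf)
   		f.buf = f.buf[:0]
   	}
   }

   // FinishBundle flushes whatever remains; it only ever sees a
   // partial batch, no matter how big the bundle was.
   func (f *batchFn) FinishBundle() {
   	if len(f.buf) > 0 {
   		sendBatch(f.buf)
   		f.buf = f.buf[:0]
   	}
   }

   func main() {
   	f := &batchFn{}
   	for i := 0; i < 7; i++ {
   		f.ProcessElement(fmt.Sprintf("e%d", i))
   	}
   	f.FinishBundle()
   	fmt.Println(len(sent)) // 3 batches: sizes 3, 3, 1
   }
   ```

   With a bundle of 7 elements, the runner's bundle size never matters: two full batches go out mid-bundle and FinishBundle only sends the leftover single element.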
   
   ----
   
   That said, it is odd that the Java side takes exactly the same approach with FinishBundle and doesn't fail. That's worth keeping the issue open for, even after implementing batching in the code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org