Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/05 09:09:25 UTC

[GitHub] [beam] je-ik commented on issue #24655: [Bug]: Pipeline fusion should break at @RequiresStableInput boundary

je-ik commented on issue #24655:
URL: https://github.com/apache/beam/issues/24655#issuecomment-1371955852

   Sorry, I should have attached a link to the [ML thread](https://lists.apache.org/thread/2s3jx62wh0rz09dcmz816sl5y3dnq432).
   
   TL;DR
   
   How stable input is achieved is runner-dependent, but every runner can ensure stability only at the boundary of an executable stage. Suppose we have two DoFns, say NonDeterministicDoFn -> StableDoFn, and these two get fused into a single executable stage. Then the `@RequiresStableInput` contract of StableDoFn is broken, because the runner can ensure stability only for the input of the fused (NonDeterministicDoFn + StableDoFn) stage, which is executed in the SDK harness as a whole. Retrying such a stage re-executes NonDeterministicDoFn, so StableDoFn can receive different input on the retry. Therefore, fusion needs to be broken at the `@RequiresStableInput` boundary to make sure that the executable stage starts with the stable DoFn.
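
To make the argument concrete, here is a minimal Python sketch. It is not Beam code: `Transform`, `fuse` and `observed_inputs` are hypothetical names for illustration. It assumes a linear pipeline and assumes the runner persists the input of each executable stage at the stage boundary, so that a retry of the stage replays that input. The greedy rule that opens a new stage at every stable-input transform is an illustration of the proposed fusion break, not the fusion algorithm of any particular runner.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, List


# Hypothetical model of a linear pipeline; these classes are NOT Beam APIs.
@dataclass(frozen=True)
class Transform:
    name: str
    fn: Callable[[Any], Any]
    requires_stable_input: bool = False  # models @RequiresStableInput


def fuse(pipeline: List[Transform]) -> List[List[Transform]]:
    """Greedily fuse consecutive transforms into executable stages, but open a
    new stage whenever a transform requires stable input, so that such a
    transform is always the first transform of its stage."""
    stages: List[List[Transform]] = []
    for t in pipeline:
        if not stages or t.requires_stable_input:
            stages.append([t])  # fusion break
        else:
            stages[-1].append(t)
    return stages


def observed_inputs(stages: List[List[Transform]], element: Any,
                    attempts: int = 2) -> List[Any]:
    """Run `element` through `stages`, executing every stage `attempts` times to
    simulate a failure followed by retries. The input of each stage is assumed
    to be persisted at the stage boundary, so every attempt of a stage replays
    the same input. Returns all values seen by stable-input transforms."""
    seen: List[Any] = []
    stage_input = element
    for stage in stages:
        for _ in range(attempts):
            value = stage_input  # replayed (stable) input of the whole stage
            for t in stage:
                if t.requires_stable_input:
                    seen.append(value)
                value = t.fn(value)
        stage_input = value  # output of the last attempt feeds the next stage
    return seen


_counter = itertools.count()


def non_deterministic(x: Any) -> Any:
    # Stands in for e.g. random keys or wall-clock timestamps.
    return (x, next(_counter))


def write_to_sink(x: Any) -> Any:
    # Stands in for a side-effecting write that needs stable input.
    return x


pipeline = [
    Transform("NonDeterministicDoFn", non_deterministic),
    Transform("StableDoFn", write_to_sink, requires_stable_input=True),
]

fused_naively = [pipeline]  # both DoFns in one executable stage
broken_at_boundary = fuse(pipeline)

print([[t.name for t in s] for s in broken_at_boundary])
print(len(set(observed_inputs(fused_naively, "a"))))       # 2 distinct inputs
print(len(set(observed_inputs(broken_at_boundary, "a"))))  # 1 distinct input
```

Running it prints the stage split `[['NonDeterministicDoFn'], ['StableDoFn']]` and shows that StableDoFn sees two different inputs across a retry when both DoFns share one stage, but only one input once fusion is broken in front of it.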
