Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/09/19 03:17:19 UTC

[GitHub] [beam] lukecwik edited a comment on pull request #12603: [WIP][BEAM-10670] Make SparkRunner opt-out for using an SDF powered Read transform.

lukecwik edited a comment on pull request #12603:
URL: https://github.com/apache/beam/pull/12603#issuecomment-695156605


   @iemejia I figured out the issue: watermark holds aren't implemented for Spark. When the first batch completes, new watermarks are computed, and the watermark hold that was set by the splittable DoFn implementation is ignored. This leads to timers being dropped, so only some of the results are produced.
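
To make the failure mode concrete, here is a toy model of why an ignored hold loses data. It assumes simplified semantics: a stage's output watermark is min(input watermark, earliest hold), and anything with a timestamp behind the watermark is treated as late and dropped (zero allowed lateness). It tracks emitted elements rather than timers, but the same reasoning applies to work scheduled at the held timestamp. `ToyStage` and `run` are hypothetical names for illustration only, not Beam or Spark runner API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyStage:
    """Hypothetical watermark bookkeeping for one stage (not Beam or Spark runner API)."""
    honor_holds: bool
    watermark: float = float("-inf")
    holds: List[float] = field(default_factory=list)
    emitted: List[float] = field(default_factory=list)
    dropped: List[float] = field(default_factory=list)

    def add_hold(self, timestamp: float) -> None:
        # A hold says "I may still produce output at this timestamp".
        self.holds.append(timestamp)

    def release_hold(self, timestamp: float) -> None:
        self.holds.remove(timestamp)

    def finish_batch(self, input_watermark: float) -> None:
        # If holds are honored, the output watermark may not pass the earliest hold.
        new_wm = input_watermark
        if self.honor_holds and self.holds:
            new_wm = min(new_wm, min(self.holds))
        # Watermarks never move backwards.
        self.watermark = max(self.watermark, new_wm)

    def emit(self, timestamp: float) -> None:
        # Output behind the watermark is late and dropped (zero allowed lateness assumed).
        if timestamp < self.watermark:
            self.dropped.append(timestamp)
        else:
            self.emitted.append(timestamp)


def run(honor_holds: bool) -> ToyStage:
    stage = ToyStage(honor_holds=honor_holds)
    # Batch 1: the SDF processes part of its restriction, emits an element at t=0,
    # and holds the watermark at t=5, where the residual will resume.
    stage.emit(0)
    stage.add_hold(5)
    # Assumption: the bounded input is exhausted, so the input watermark jumps to +inf.
    stage.finish_batch(float("inf"))
    # Batch 2: the residual resumes, emits an element at t=5, then releases its hold.
    stage.emit(5)
    stage.release_hold(5)
    stage.finish_batch(float("inf"))
    return stage


with_holds = run(honor_holds=True)
without_holds = run(honor_holds=False)
print(with_holds.emitted, with_holds.dropped)        # [0, 5] []
print(without_holds.emitted, without_holds.dropped)  # [0] [5]
```

With the hold honored, the watermark stops at 5 after the first batch, so the resumed output is on time; without it, the watermark jumps straight to +inf and everything the residual produces afterwards is late.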
   
   This is also the likely reason PAssert is dropping some of the elements that were produced, but I haven't validated this yet.
   
   Can you explain how the `GlobalWatermarkHolder` works? Can I register anything as a `sourceId`?

