Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/13 21:53:09 UTC

[GitHub] [beam] iht commented on issue #20041: Make MonotonicWatermarkEstimator configurable for whether to ignore late timestamp.

iht commented on issue #20041:
URL: https://github.com/apache/beam/issues/20041#issuecomment-1312831329

   [In Java, the monotonic watermark estimator](https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.java#L120-L127) only updates the watermark if the last observed timestamp is after the current watermark. I think we should have the same implementation in Python, for consistent behavior across SDKs.
   
   The estimator does not have to do anything with the data; it should just update the value of the watermark. It should not change the timestamps of the messages either: the incoming messages should not be altered, and the timestamps of the output messages in the splittable DoFn should be decided by the user of that DoFn.
   
   The current Python implementation of `MonotonicWatermarkEstimator` throws an exception on late data, which makes it unusable unless you can guarantee the order of messages.
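
To make the proposal concrete, here is a minimal sketch of the behavior I have in mind. `LenientMonotonicEstimator` is a hypothetical stand-in written for this comment, not Beam's actual class; the method names and the use of plain floats for timestamps are assumptions for illustration only.

```python
from typing import Optional


class LenientMonotonicEstimator:
    """Hypothetical sketch, NOT Beam's MonotonicWatermarkEstimator.

    Timestamps are plain floats here (an assumption for illustration).
    """

    def __init__(self, initial_watermark: Optional[float] = None):
        self._watermark = initial_watermark

    def observe_timestamp(self, timestamp: float) -> None:
        # Advance only when the observed timestamp is later than the
        # current watermark. A late timestamp is ignored: no exception,
        # and the element itself is never touched.
        if self._watermark is None or timestamp > self._watermark:
            self._watermark = timestamp

    def current_watermark(self) -> Optional[float]:
        return self._watermark


estimator = LenientMonotonicEstimator()
for ts in [10.0, 12.0, 11.0, 15.0]:  # 11.0 arrives late
    estimator.observe_timestamp(ts)
```

The late timestamp leaves the watermark where it was, so the watermark never moves backwards and out-of-order input no longer fails the bundle.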


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org