Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 14:47:37 UTC

[GitHub] [beam] damccorm opened a new issue, #20041: Make MonotonicWatermarkEstimator configurable for whether to ignore late timestamp.

damccorm opened a new issue, #20041:
URL: https://github.com/apache/beam/issues/20041

   The current implementation of MonotonicWatermarkEstimator throws an error and stops the pipeline when it observes a late timestamp. However, there are other potential options, such as:
   (1) Suppress the error and emit the item as possibly late data.
   (2) Move the timestamp forward to respect the watermark.
   We should consider making MonotonicWatermarkEstimator configurable with these options, or providing different types of MonotonicWatermarkEstimator to handle each option.
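
A minimal sketch of the "configurable" variant, assuming plain integers as timestamps. `LateTimestampPolicy` and `ConfigurableMonotonicEstimator` are hypothetical names for illustration, not part of the Beam API. The method names only loosely mirror the Python `WatermarkEstimator` (`observe_timestamp`, `current_watermark`). Returning the timestamp the element should carry from `observe_timestamp` is also an assumption made so that option (2) can be expressed.

```python
import enum


class LateTimestampPolicy(enum.Enum):
    """Hypothetical policies for a timestamp that is behind the watermark."""
    FAIL = "fail"                # current behavior: raise and stop the pipeline
    EMIT_AS_LATE = "emit_late"   # option (1): suppress the error, keep the late timestamp
    ADVANCE = "advance"          # option (2): move the timestamp forward to the watermark


class ConfigurableMonotonicEstimator:
    """Hypothetical sketch of a monotonic estimator with a late-data policy."""

    def __init__(self, initial_watermark, policy=LateTimestampPolicy.FAIL):
        self._watermark = initial_watermark
        self._policy = policy

    def observe_timestamp(self, timestamp):
        """Record an output timestamp and return the timestamp the element should carry."""
        if timestamp < self._watermark:
            if self._policy is LateTimestampPolicy.FAIL:
                raise ValueError(
                    f"timestamp {timestamp} is behind watermark {self._watermark}")
            if self._policy is LateTimestampPolicy.ADVANCE:
                # Option (2): rewrite the late timestamp to the current watermark.
                return self._watermark
            # Option (1): leave the watermark alone; the element stays (possibly) late.
            return timestamp
        # On-time timestamp: advance the watermark monotonically.
        self._watermark = timestamp
        return timestamp

    def current_watermark(self):
        return self._watermark
```

Note that option (2) changes the element's timestamp, while option (1) only changes how the estimator reacts; the two policies differ in whether downstream sees the original event time.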
   
   Imported from Jira [BEAM-9312](https://issues.apache.org/jira/browse/BEAM-9312). Original Jira may contain additional context.
   Reported by: boyuanz.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey closed issue #20041: Make MonotonicWatermarkEstimator configurable for whether to ignore late timestamp.

johnjcasey closed issue #20041: Make MonotonicWatermarkEstimator configurable for whether to ignore late timestamp.
URL: https://github.com/apache/beam/issues/20041


[GitHub] [beam] iht commented on issue #20041: Make MonotonicWatermarkEstimator configurable for whether to ignore late timestamp.

iht commented on issue #20041:
URL: https://github.com/apache/beam/issues/20041#issuecomment-1312831329

   [In Java, the monotonic watermark estimator](https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.java#L120-L127) just updates the watermark if the last observed timestamp is after the watermark. I think the Python SDK should implement the same behavior, so that it is consistent across SDKs.
   
   The estimator does not have to do anything with the data; it should only update the value of the watermark. It should not change the timestamps of the messages either: incoming messages should not be altered, and the timestamps of the output messages in the splittable DoFn should be decided by the user of that DoFn.
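
A minimal sketch of the Java-style behavior described above, assuming plain integers as timestamps. `MonotonicWatermarkSketch` is a hypothetical stand-in, not `apache_beam`'s class; its method names are only chosen to resemble the Python `WatermarkEstimator` interface. A late timestamp is simply ignored instead of raising, and the estimator never touches the element that carries it.

```python
class MonotonicWatermarkSketch:
    """Hypothetical estimator: the watermark only moves forward, late input is ignored."""

    def __init__(self, initial_watermark):
        self._watermark = initial_watermark

    def observe_timestamp(self, timestamp):
        # Advance only when the observed timestamp is after the watermark.
        # A late timestamp neither raises nor alters the element's timestamp.
        if timestamp > self._watermark:
            self._watermark = timestamp

    def current_watermark(self):
        return self._watermark


estimator = MonotonicWatermarkSketch(0)
for ts in [3, 7, 5, 9, 8]:  # out-of-order input: 5 and 8 arrive late
    estimator.observe_timestamp(ts)
print(estimator.current_watermark())  # 9
```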
   
   The current implementation of MonotonicWatermarkEstimator throws an exception with late data, which makes it unusable unless you can guarantee that messages arrive in timestamp order.
