Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/23 18:03:03 UTC

[GitHub] [beam] TheNeuralBit commented on issue #22723: [Bug]: Python BatchElements uses minimum or arbitrary element timestamps for output timestamps with GlobalWindow specialization

TheNeuralBit commented on issue #22723:
URL: https://github.com/apache/beam/issues/22723#issuecomment-1224493228

   Interesting. I'm not sure which of these is the oversight; perhaps @robertwb can comment as the original author.
   
   Looking at the code: https://github.com/apache/beam/blob/c7f64264451af12ff6c7c0ef4bc95fd7ce0f5418/sdks/python/apache_beam/transforms/util.py#L539-L552
   
   In the `process` case we just yield the batch. This was likely written without considering the timestamp, but it turns out that Beam attaches the current element's timestamp in the OutputProcessor.
   In the `finish_bundle` case we wrap the batch in a `GlobalWindows.windowed_value`. This was likely just to satisfy some type check (static or in output processing), but it turns out we end up using `min_timestamp()`.
   
   As @scwhittle noted, the per-window version always uses the current window's max timestamp, both when a batch fills up and when it is flushed at the end of the bundle:
   https://github.com/apache/beam/blob/c7f64264451af12ff6c7c0ef4bc95fd7ce0f5418/sdks/python/apache_beam/transforms/util.py#L584-L585
   https://github.com/apache/beam/blob/c7f64264451af12ff6c7c0ef4bc95fd7ce0f5418/sdks/python/apache_beam/transforms/util.py#L602-L603
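
   To make the three behaviours concrete, here is a toy model in plain Python. It does not use Beam and is not the code in `util.py`; the function names, the `(value, timestamp)` tuples, the integer "tick" timestamps, and the `-inf` stand-in for the minimum timestamp are all assumptions made for illustration. It only mimics which timestamp ends up on each emitted batch as described above.

```python
# Toy model only: plain Python, no Beam. All names here are hypothetical.
MIN_TIMESTAMP = float("-inf")  # stand-in for Beam's minimum timestamp


def global_window_batches(elements, batch_size):
    """Mimic the GlobalWindow specialization: returns (batch, timestamp) pairs."""
    outputs, batch = [], []
    for value, ts in elements:
        batch.append(value)
        if len(batch) >= batch_size:
            # "process" case: the batch is yielded bare, so it picks up the
            # timestamp of the element being processed when the batch filled.
            outputs.append((batch, ts))
            batch = []
    if batch:
        # "finish_bundle" case: wrapped as a global-window value with the
        # default timestamp, which is the minimum timestamp.
        outputs.append((batch, MIN_TIMESTAMP))
    return outputs


def per_window_batches(elements, batch_size, window_size):
    """Mimic the per-window version using fixed windows of window_size ticks."""
    outputs, pending = [], {}  # pending maps window start -> buffered values
    for value, ts in elements:
        start = ts - ts % window_size
        batch = pending.setdefault(start, [])
        batch.append(value)
        if len(batch) >= batch_size:
            # Window max timestamp, modelled here as (window end - 1 tick).
            outputs.append((batch, start + window_size - 1))
            del pending[start]
    for start, batch in pending.items():
        # Leftover batches are flushed with the same window max timestamp.
        outputs.append((batch, start + window_size - 1))
    return outputs


elements = [("a", 1), ("b", 5), ("c", 7)]
global_out = global_window_batches(elements, batch_size=2)
window_out = per_window_batches(elements, batch_size=2, window_size=10)
```

   In this model the global-window path stamps the first batch with 5 (the timestamp of `"b"`, which completed it) and the leftover batch with the minimum timestamp, while the per-window path stamps both batches with 9, the max timestamp of the window `[0, 10)`.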
   
   I took a look at what the `GroupIntoBatches` implementations do here, but AFAICT they're not opinionated either, and rely on the SDK worker to decide what timestamp to use: https://github.com/apache/beam/blob/c7f64264451af12ff6c7c0ef4bc95fd7ce0f5418/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java#L545
   
   Presumably this is typically the most recent element's timestamp, when it completes a batch. I'm not sure about the case when a timer fires though.

