You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/28 09:08:20 UTC

[GitHub] [spark] echauchot commented on issue #23576: [SPARK-26655] [SS] Support multiple aggregates in append mode

echauchot commented on issue #23576: [SPARK-26655] [SS] Support multiple aggregates in append mode
URL: https://github.com/apache/spark/pull/23576#issuecomment-525654867
 
 
   > > Beam does not trigger output unless the watermark pass the end of window + allowed lateness. There is no triggering between end of window and allowed lateness. Close and output is at the same time.
   > 
   > Ah OK I see. That looks similar as Append mode. That's a bit different from what I read a book for Flink so assuming there're some differences between Beam and Flink... (BTW I also read "Streaming Systems", though it mostly explains theory and not having pretty much details on Beam.)
   > 
   > > Ah I thought we were talking about watermark. For choosing the event timestamp, Beam uses a TimestampCombiner which default policy is to set the resulting timestamp to the end of the window for new record.
   > 
   > That seems to only explain the case where window is applied. How it works for other cases? Does it keep the origin event timestamp as it is? In windowed stream-stream join it also makes sense, but there're also non-windowed stream-stream join as well, and then output should have only one event time whereas there're two inputs.
   
   Windows are mandatory in streaming mode in Beam (otherwise there is no trigger time and no output). But if you are in batch mode (the only case where you can have no window), then the timestamps of all elements are set to +INF.
   
   PS: I simplify a bit, in reality we can replace windows by configured triggers that can be based on the number of elements or processing time but as they don't exist in spark I did not mention them here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org