Posted to issues@flink.apache.org by "Edward Rojas (JIRA)" <ji...@apache.org> on 2019/01/15 12:48:00 UTC

[jira] [Created] (FLINK-11337) Incorrect watermark in StreamingFileSink BucketAssigner.Context when used in connected stream

Edward Rojas created FLINK-11337:
------------------------------------

             Summary: Incorrect watermark in StreamingFileSink BucketAssigner.Context when used in connected stream
                 Key: FLINK-11337
                 URL: https://issues.apache.org/jira/browse/FLINK-11337
             Project: Flink
          Issue Type: Bug
          Components: filesystem-connector
    Affects Versions: 1.7.0
            Reporter: Edward Rojas


When StreamingFileSink is used as the sink of a connected stream, the sink's "invoke" method can be called before "combinedWatermark" is updated with the timestamp of the element currently being processed. As a result, the context contains an incorrect watermark value (Long.MIN_VALUE for the first events in the stream when using "AssignerWithPeriodicWatermarks").
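
To illustrate the timing, here is a minimal, self-contained sketch of how a two-input operator might track its combined watermark as the minimum of both inputs. It is a simplified model and not Flink's actual code: the class name CombinedWatermarkSketch and its methods are assumptions made for this example. It shows that while one input is already at Long.MAX_VALUE, the combined value stays at Long.MIN_VALUE until the other input emits its first watermark.

```java
// Simplified model of a two-input operator's watermark bookkeeping.
// Hypothetical class for illustration; not taken from the Flink code base.
public class CombinedWatermarkSketch {
    private long input1Watermark = Long.MIN_VALUE;   // data stream
    private long input2Watermark = Long.MIN_VALUE;   // broadcast stream
    private long combinedWatermark = Long.MIN_VALUE; // what the sink context would see

    public void processWatermark1(long timestamp) {
        input1Watermark = timestamp;
        advance();
    }

    public void processWatermark2(long timestamp) {
        input2Watermark = timestamp;
        advance();
    }

    // The combined watermark only moves forward, to the minimum of both inputs.
    private void advance() {
        long newMin = Math.min(input1Watermark, input2Watermark);
        if (newMin > combinedWatermark) {
            combinedWatermark = newMin;
        }
    }

    public long currentWatermark() {
        return combinedWatermark;
    }

    public static void main(String[] args) {
        CombinedWatermarkSketch op = new CombinedWatermarkSketch();

        // The broadcast side immediately reports the maximum watermark.
        op.processWatermark2(Long.MAX_VALUE);

        // An element from the data stream reaches the sink here, before the
        // periodic assigner has emitted a watermark for that input.
        System.out.println("watermark seen by first element: " + op.currentWatermark());

        // Only later does the data stream's periodic watermark arrive.
        op.processWatermark1(1000L);
        System.out.println("watermark after periodic emit:   " + op.currentWatermark());
    }
}
```

In this model the first element is processed while the combined watermark is still Long.MIN_VALUE, which matches the behaviour described above.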

I reproduced this with a broadcast stream connected to a data stream. The broadcast stream uses a custom timestamp extractor that always returns Watermark.MAX_WATERMARK, as is done in a training example here: [https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/solutions/datastream_java/broadcast/OngoingRidesSolution.java#L143]

This is problematic because the watermark cannot be used reliably to compute the bucket id based on event time.
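
As a rough illustration of the impact, the sketch below derives an hourly bucket id from a time value in milliseconds. The class WatermarkBucketSketch, the bucketId helper and the sample timestamp are hypothetical and chosen only for this example; a real BucketAssigner would read the value from its Context. The point is that a watermark of Long.MIN_VALUE maps to a bucket unrelated to the element's event time.

```java
// Hypothetical helper showing how an hourly bucket id could be derived
// from a time value; not part of the Flink API.
public class WatermarkBucketSketch {
    static final long HOUR_MILLIS = 60L * 60L * 1000L;

    // Maps a time in epoch milliseconds to an hourly bucket id.
    static String bucketId(long timeMillis) {
        return "hour-" + Math.floorDiv(timeMillis, HOUR_MILLIS);
    }

    public static void main(String[] args) {
        long elementTimestamp = 1547556480000L; // sample event time (assumed value)
        long staleWatermark = Long.MIN_VALUE;   // watermark before the first periodic emit

        // Bucket based on the element's own event time.
        System.out.println("bucket from timestamp: " + bucketId(elementTimestamp));

        // Bucket based on the not-yet-updated watermark.
        System.out.println("bucket from watermark: " + bucketId(staleWatermark));
    }
}
```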



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)