You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2017/05/09 10:02:04 UTC

[jira] [Commented] (FLINK-6472) BoundedOutOfOrdernessTimestampExtractor does not bound out of orderliness

    [ https://issues.apache.org/jira/browse/FLINK-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002416#comment-16002416 ] 

Aljoscha Krettek commented on FLINK-6472:
-----------------------------------------

I think we might have to un-deprecate {{TimestampExtractor}} https://github.com/apache/flink/blob/5aa93a270a87b5a965e5c37c5c8bd6e6208b09b7/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/TimestampExtractor.java#L43-L43. Originally, this was the only available interface for timestamp/watermark extraction and allowed both periodic and punctuated watermarks.

Alternatively, we could think about letting an extractor implement both {{AssignerWithPeriodicWatermarks}} and {{AssignerWithPunctuatedWatermarks}} and have a unified operator that can handle that case (This would replace {{TimestampsAndPunctuatedWatermarksOperator}} and {{TimestampsAndPeriodicWatermarksOperator}}).

What do you think?

> BoundedOutOfOrdernessTimestampExtractor does not bound out of orderliness
> -------------------------------------------------------------------------
>
>                 Key: FLINK-6472
>                 URL: https://issues.apache.org/jira/browse/FLINK-6472
>             Project: Flink
>          Issue Type: Bug
>          Components: DataStream API
>    Affects Versions: 1.3.0
>            Reporter: Elias Levy
>
> {{BoundedOutOfOrdernessTimestampExtractor}} attempts to emit watermarks that lag behind the largest observed timestamp by a configurable time delta.  It fails to so in some circumstances.
> The class extends {{AssignerWithPeriodicWatermarks}}, which generates watermarks in periodic intervals.  The timer for this intervals is a processing time timer.
> In circumstances where there is a rush of events (restarting Flink, unpausing an upstream producer, loading events from a file, etc), many events with timestamps much larger that what the configured bound would normally allow will be sent downstream without a watermark.  This can have negative effects downstream, as operators may be buffering the events waiting for a watermark to process them, thus leading the memory growth and possible out-of-memory conditions.
> It is probably best to have a bounded out of orderliness extractor that is based on the punctuated timestamp extractor, so we can ensure that watermarks are generated in a timely fashion in event time, with the addition of process time timer to generate a watermark if there has been a lull in events, thus also bounding the delay of generating a watermark in processing time. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)