You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Flink Jira Bot (Jira)" <ji...@apache.org> on 2022/04/10 22:39:00 UTC
[jira] [Updated] (FLINK-10886) Event time synchronization across sources

     [ https://issues.apache.org/jira/browse/FLINK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-10886:
-----------------------------------
    Labels: auto-deprioritized-major auto-unassigned stale-minor  (was: auto-deprioritized-major auto-unassigned)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community manage its development. I see this issues has been marked as Minor but is unassigned and neither itself nor its Sub-Tasks have been updated for 180 days. I have gone ahead and marked it "stale-minor". If this ticket is still Minor, please either assign yourself or give an update. Afterwards, please remove the label or in 7 days the issue will be deprioritized.


> Event time synchronization across sources
> -----------------------------------------
>
>                 Key: FLINK-10886
>                 URL: https://issues.apache.org/jira/browse/FLINK-10886
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common
>            Reporter: Jamie Grier
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-unassigned, stale-minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When reading from a source with many parallel partitions, especially when reading lots of historical data (or recovering from downtime and there is a backlog to read), it's quite common for there to develop an event-time skew across those partitions.
>  
> When doing event-time windowing -- or in fact any event-time driven processing -- the event time skew across partitions results directly in increased buffering in Flink and of course the corresponding state/checkpoint size growth.
>  
> As the event-time skew and state size grows larger this can have a major effect on application performance and in some cases result in a "death spiral" where the application performance get's worse and worse as the state size grows and grows.
>  
> So, one solution to this problem, outside of core changes in Flink itself, seems to be to try to coordinate sources across partitions so that they make progress through event time at roughly the same rate.  In fact if there is large skew the idea would be to slow or even stop reading from some partitions with newer data while first reading the partitions with older data.  Anyway, to do this we need to share state somehow amongst sub-tasks.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)