Posted to issues@flink.apache.org by "Yun Gao (Jira)" <ji...@apache.org> on 2022/04/13 06:28:06 UTC

[jira] [Updated] (FLINK-19774) Introduce Sub Partition View Version for Approximate Local Recovery

     [ https://issues.apache.org/jira/browse/FLINK-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yun Gao updated FLINK-19774:
----------------------------
    Fix Version/s: 1.16.0

> Introduce Sub Partition View Version for Approximate Local Recovery
> -------------------------------------------------------------------
>
>                 Key: FLINK-19774
>                 URL: https://issues.apache.org/jira/browse/FLINK-19774
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Task
>            Reporter: Yuan Mei
>            Priority: Major
>              Labels: auto-unassigned
>             Fix For: 1.15.0, 1.16.0
>
>
>  
> This ticket addresses a corner case in which a downstream task fails several times in a row, or an orphaned task execution still exists for a short time after the new execution has started running (as described in the FLIP).
>  
> Here is an idea of how to cleanly and thoroughly solve this kind of problem:
>  # We go with the simplified version of view release: a view is released only right before a new one is created (in thread2). That is, we do not clean up the view when the downstream task disconnects ({{releaseView}} is not called from the reference copy of the view, in thread1 or 2).
>  ** This greatly simplifies the threading model.
>  ** This does not cause any resource leak, since releasing a view only notifies the upstream result partition to releaseOnConsumption once all subpartitions are consumed in PipelinedSubPartitionView. In our case, we do not release the result partition on consumption anyway (the result partition is tracked by the JobMaster, similar to the blocking ResultPartition type).
>  # Each view is associated with a downstream task execution version.
>  ** This makes sense because we now actually have different versions of the view, corresponding to the vertex.version of the downstream task.
>  ** createView is performed only if the version to create is greater than the existing one.
>  ** If we decide to create a new view, the old view should be released.
> I think this way we can completely disconnect the old view from the subpartition. In addition, the handler currently in use would always hold the freshest view reference.
>  
> Point 1 has already been addressed in FLINK-19632. This ticket is to address Point 2.
> Detailed discussion in [https://github.com/apache/flink/pull/13648]
>  
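
The version check described in Point 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Flink's actual implementation: the class VersionedSubpartition, its nested View type, and the createView(int) signature are all hypothetical names invented for this example. The sketch only shows the rule from the description: a view is created when the requested version is greater than the existing one, and the old view is released before the new one replaces it.

```java
import java.util.Optional;

/** Hypothetical stand-in for a subpartition that keeps at most one live view. */
class VersionedSubpartition {

    /** Hypothetical view, tagged with the downstream execution version it serves. */
    static final class View {
        private final int version;
        private boolean released;

        View(int version) {
            this.version = version;
        }

        public int getVersion() {
            return version;
        }

        public boolean isReleased() {
            return released;
        }

        void release() {
            released = true;
        }
    }

    private final Object lock = new Object();
    private View currentView; // guarded by lock

    /**
     * Creates a view for the given downstream execution version.
     * Returns an empty Optional if the request is not newer than the
     * current view, e.g. when it comes from an orphaned execution.
     */
    public Optional<View> createView(int requestedVersion) {
        synchronized (lock) {
            if (currentView != null && requestedVersion <= currentView.getVersion()) {
                // Stale or duplicate request: keep the current view untouched.
                return Optional.empty();
            }
            if (currentView != null) {
                // Release the old view only here, right before creating the new one.
                currentView.release();
            }
            currentView = new View(requestedVersion);
            return Optional.of(currentView);
        }
    }
}
```

In this sketch, a request carrying version 1 that arrives after a view for version 2 exists is simply rejected, so a late orphaned execution cannot displace the view held by the current execution.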



--
This message was sent by Atlassian Jira
(v8.20.1#820001)