Posted to issues@flink.apache.org by "Yun Gao (Jira)" <ji...@apache.org> on 2022/04/13 06:28:05 UTC

[jira] [Updated] (FLINK-21053) Prevent potential RejectedExecutionExceptions in CheckpointCoordinator failing JM

     [ https://issues.apache.org/jira/browse/FLINK-21053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yun Gao updated FLINK-21053:
----------------------------
    Fix Version/s: 1.16.0

> Prevent potential RejectedExecutionExceptions in CheckpointCoordinator failing JM
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-21053
>                 URL: https://issues.apache.org/jira/browse/FLINK-21053
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>            Reporter: Roman Khachatryan
>            Priority: Minor
>              Labels: auto-unassigned
>             Fix For: 1.15.0, 1.16.0
>
>
> In the past, there were multiple bugs caused by throwing/handling RejectedExecutionException in CheckpointCoordinator (FLINK-18290, FLINK-20992).
>  
> I think this is still possible: in many places an executor is passed to CompletableFuture.xxxAsync calls even though it may already be shut down.
>  
> In FLINK-20992 we discussed two approaches to fix this.
> The first approach is to check the executor state inside a synchronized block every time the executor is used.
> The second approach is to:
>  # Create the executors inside CheckpointCoordinator (both the io and timer thread pools)
>  # Check isShutdown() in their RejectedExecution handlers (if the executor is shut down and the exception is a RejectedExecutionException, just log it; otherwise delegate to FatalExitExceptionHandler)
>  # (This would allow removing such RejectedExecutionException checks from the coordinator code)
>  
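
The second approach above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink code: it reads "RejectedExecution handlers" as java.util.concurrent.RejectedExecutionHandler, the names ShutdownAwareExecutors, ShutdownAwareHandler and newIoExecutor are hypothetical, and a plain Thread.UncaughtExceptionHandler stands in for Flink's FatalExitExceptionHandler.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; not part of Flink.
final class ShutdownAwareExecutors {

    /** Logs rejections that happen after shutdown; treats any other rejection as fatal. */
    static final class ShutdownAwareHandler implements RejectedExecutionHandler {
        // Stand-in for FatalExitExceptionHandler (assumption).
        private final Thread.UncaughtExceptionHandler fatalHandler;
        private final AtomicInteger ignored = new AtomicInteger();

        ShutdownAwareHandler(Thread.UncaughtExceptionHandler fatalHandler) {
            this.fatalHandler = fatalHandler;
        }

        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
            if (executor.isShutdown()) {
                // Expected during shutdown: count and log, but do not throw.
                ignored.incrementAndGet();
                System.err.println("Executor is shut down, dropping task: " + task);
            } else {
                // Rejected while still running: unexpected, so escalate.
                fatalHandler.uncaughtException(
                        Thread.currentThread(),
                        new RejectedExecutionException(
                                "Task rejected by a running executor: " + task));
            }
        }

        int ignoredCount() {
            return ignored.get();
        }
    }

    /** Creates a fixed-size pool that is owned by the caller and uses the handler above. */
    static ThreadPoolExecutor newIoExecutor(int threads, ShutdownAwareHandler handler) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), handler);
    }
}
```

One caveat with this sketch: because the handler does not throw after shutdown, a CompletableFuture returned by runAsync or supplyAsync for a dropped task is never completed, so callers that wait on such futures would need separate handling.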


