Posted to issues@flink.apache.org by "Yun Gao (Jira)" <ji...@apache.org> on 2022/04/13 06:28:05 UTC
[jira] [Updated] (FLINK-21053) Prevent potential RejectedExecutionExceptions in CheckpointCoordinator failing JM
[ https://issues.apache.org/jira/browse/FLINK-21053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yun Gao updated FLINK-21053:
----------------------------
Fix Version/s: 1.16.0
> Prevent potential RejectedExecutionExceptions in CheckpointCoordinator failing JM
> ---------------------------------------------------------------------------------
>
> Key: FLINK-21053
> URL: https://issues.apache.org/jira/browse/FLINK-21053
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Reporter: Roman Khachatryan
> Priority: Minor
> Labels: auto-unassigned
> Fix For: 1.15.0, 1.16.0
>
>
> In the past, there were multiple bugs caused by throwing/handling RejectedExecutionException in CheckpointCoordinator (FLINK-18290, FLINK-20992).
>
> I think this is still possible, because in many places an executor is passed to CompletableFuture.xxxAsync calls even though it may already be shut down.
>
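As a minimal illustration of the failure mode described above (not code from Flink): the JDK's CompletableFuture.runAsync hands the task straight to the given executor, so if that executor is a ThreadPoolExecutor with the default rejection policy and has already been shut down, the call itself throws RejectedExecutionException. The class and method names below are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

class RejectedAfterShutdownDemo {

    /** Returns true if handing async work to a shut-down executor is rejected. */
    static boolean isRejectedAfterShutdown() {
        // Backed by a ThreadPoolExecutor using the default AbortPolicy.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown();
        try {
            // runAsync calls executor.execute(...) directly, so the rejection
            // surfaces synchronously in the calling thread.
            CompletableFuture.runAsync(() -> { }, executor);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + isRejectedAfterShutdown());
    }
}
```

Any caller that does not expect this exception can let it propagate, which is the kind of bug the issue refers to.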
> In FLINK-20992 we discussed two approaches to fix this.
> One approach is to check the executor state inside a synchronized block every time it is used.
> The second approach is to:
> # Create executors inside CheckpointCoordinator (both io & timer thread pools)
> # Check isShutdown() in their RejectedExecution handlers (if the executor is shut down and the exception is a RejectedExecutionException, just log it; otherwise delegate to FatalExitExceptionHandler)
> # (this would allow removing such RejectedExecutionException checks from the coordinator code)
>
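The second approach could look roughly like the sketch below. It is an assumption-laden illustration, not the Flink implementation: it reads "RejectedExecution handlers" as java.util.concurrent.RejectedExecutionHandler, uses a counter where real code would log, and substitutes a plain Thread.UncaughtExceptionHandler for FatalExitExceptionHandler so that the example does not terminate the JVM. ShutdownAwareHandler and demo() are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownAwareRejectionSketch {

    /** Hypothetical handler: tolerates rejections after shutdown, escalates all others. */
    static final class ShutdownAwareHandler implements RejectedExecutionHandler {
        final AtomicInteger ignoredAfterShutdown = new AtomicInteger();
        // Stand-in for a fatal handler such as FatalExitExceptionHandler (assumption).
        private final Thread.UncaughtExceptionHandler fatalHandler;

        ShutdownAwareHandler(Thread.UncaughtExceptionHandler fatalHandler) {
            this.fatalHandler = fatalHandler;
        }

        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
            if (executor.isShutdown()) {
                // Expected during teardown: real code would log here; the task is dropped.
                ignoredAfterShutdown.incrementAndGet();
            } else {
                // Rejected by a running executor: treat as unexpected and escalate.
                fatalHandler.uncaughtException(
                        Thread.currentThread(),
                        new RejectedExecutionException("Rejected by running executor: " + task));
            }
        }
    }

    /** Returns {rejections ignored after shutdown, fatal handler invocations}. */
    static int[] demo() {
        AtomicInteger fatalCalls = new AtomicInteger();
        ShutdownAwareHandler handler =
                new ShutdownAwareHandler((thread, error) -> fatalCalls.incrementAndGet());
        // The owner creates the executor itself, so it controls the rejection policy.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(), handler);
        executor.shutdown();
        executor.execute(() -> { }); // no exception reaches the caller
        return new int[] {handler.ignoredAfterShutdown.get(), fatalCalls.get()};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("ignored=" + result[0] + ", fatal=" + result[1]);
    }
}
```

One trade-off to keep in mind with this design: a task dropped by the handler never runs, so a CompletableFuture created through runAsync or supplyAsync on such an executor would never complete. Callers that wait on those futures would still need some other shutdown signal.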
--
This message was sent by Atlassian Jira
(v8.20.1#820001)