Posted to issues@flink.apache.org by "Yi Zhang (Jira)" <ji...@apache.org> on 2023/09/14 17:44:00 UTC

[jira] [Updated] (FLINK-33090) CheckpointsCleaner clean individual checkpoint states in parallel

     [ https://issues.apache.org/jira/browse/FLINK-33090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Zhang updated FLINK-33090:
-----------------------------
    Description: 
Currently CheckpointsCleaner cleans multiple checkpoints in parallel using the JobManager's ioExecutor; however, the states within each checkpoint are cleaned sequentially. With thousands of StateObjects to clean, this can take a long time on some checkpoint storage. If cleanup takes longer than the checkpoint interval, it prevents new checkpoints from being taken.

The proposal is to use the same ioExecutor to clean up the states of each checkpoint in parallel as well. In my local testing, with the default settings for the ioExecutor thread pool and xK state files, this can reduce cleanup time from 10 minutes to <1 minute.

  was:
Currently CheckpointsCleaner can clean multiple checkpoints in parallel with JobManager's ioExecutor, however each checkpoint states is cleaned sequentially. With thousands of StateObjects to clean this can take long time on some checkpoint storage, if longer than the checkpoint interval this prevents new checkpointing.

The proposal is to use the same ioExecutor to clean up each checkpoints states in parallel as well. From my local testing, with default settings for ioExecutor thread pool for xK state files this can reduce clean up time from 10 minutes to <1 minute. 


> CheckpointsCleaner clean individual checkpoint states in parallel
> -----------------------------------------------------------------
>
>                 Key: FLINK-33090
>                 URL: https://issues.apache.org/jira/browse/FLINK-33090
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.17.1
>            Reporter: Yi Zhang
>            Priority: Major
>
> Currently CheckpointsCleaner cleans multiple checkpoints in parallel using the JobManager's ioExecutor; however, the states within each checkpoint are cleaned sequentially. With thousands of StateObjects to clean, this can take a long time on some checkpoint storage. If cleanup takes longer than the checkpoint interval, it prevents new checkpoints from being taken.
> The proposal is to use the same ioExecutor to clean up the states of each checkpoint in parallel as well. In my local testing, with the default settings for the ioExecutor thread pool and xK state files, this can reduce cleanup time from 10 minutes to <1 minute.
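
A minimal sketch of the idea described above, not the actual CheckpointsCleaner code: each state object of a single checkpoint is submitted to a shared executor, and the cleanup completes once every discard has finished. The names ParallelDiscardSketch, Discardable, and discardAllAsync are hypothetical stand-ins introduced for illustration, and the fixed pool of 4 threads is an arbitrary assumption rather than the ioExecutor default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelDiscardSketch {

    /** Hypothetical stand-in for a state object whose backing data can be deleted. */
    interface Discardable {
        void discard() throws Exception;
    }

    /**
     * Submits one discard task per state object to the given executor and
     * returns a future that completes when all of them have finished.
     * If any discard fails, the returned future completes exceptionally.
     */
    static CompletableFuture<Void> discardAllAsync(
            List<? extends Discardable> states, Executor ioExecutor) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Discardable state : states) {
            futures.add(CompletableFuture.runAsync(() -> {
                try {
                    state.discard();
                } catch (Exception e) {
                    // Wrap checked exceptions so they propagate through the future.
                    throw new CompletionException(e);
                }
            }, ioExecutor));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a small fixed pool standing in for the shared I/O executor.
        ExecutorService ioExecutor = Executors.newFixedThreadPool(4);
        try {
            AtomicInteger discarded = new AtomicInteger();
            List<Discardable> states = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                states.add(discarded::incrementAndGet); // fake "delete one file"
            }
            discardAllAsync(states, ioExecutor).get(); // wait for the whole checkpoint
            System.out.println("discarded " + discarded.get() + " state objects");
        } finally {
            ioExecutor.shutdown();
        }
    }
}
```

Because the per-state tasks share the executor that already runs per-checkpoint cleanup, the achievable speedup in such a design would depend on the pool size and on how well the checkpoint storage handles concurrent deletes.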



--
This message was sent by Atlassian Jira
(v8.20.10#820010)