Posted to user@flink.apache.org by Puneet Duggal <pu...@gmail.com> on 2021/09/24 13:19:29 UTC
Job Manager went down on cancelling job with savepoint
Hi,
While cancelling a job with a savepoint, the job itself was cancelled successfully, but the JobManager went down immediately afterwards. I am not able to deduce anything from the stack trace below. Any help is appreciated.
2021-09-24 11:50:44,182 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping checkpoint coordinator for job 1f764a51996d206b28721aa4a1420bea.
2021-09-24 11:50:44,182 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Shutting down
2021-09-24 11:50:44,240 INFO org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore [] - Removing /flink/default_ns/checkpoints/1f764a51996d206b28721aa4a1420bea from ZooKeeper
2021-09-24 11:50:44,243 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter [] - Shutting down.
2021-09-24 11:50:44,243 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter [] - Removing /checkpoint-counter/1f764a51996d206b28721aa4a1420bea from ZooKeeper
2021-09-24 11:50:44,249 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 1f764a51996d206b28721aa4a1420bea reached globally terminal state CANCELED.
2021-09-24 11:50:44,249 ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler [] - FATAL: Thread 'cluster-io-thread-16' produced an uncaught exception. Stopping the process...
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@54a5137c rejected from java.util.concurrent.ScheduledThreadPoolExecutor@37ee0790[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 4513]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_232]
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) ~[?:1.8.0_232]
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533) ~[?:1.8.0_232]
at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622) ~[?:1.8.0_232]
at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) ~[?:1.8.0_232]
at org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter.execute(ScheduledExecutorServiceAdapter.java:64) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.scheduleTriggerRequest(CheckpointCoordinator.java:1290) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:66) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Regards,
Puneet
Re: Job Manager went down on cancelling job with savepoint
Posted by Guowei Ma <gu...@gmail.com>.
Hi, Puneet
Could you share whether you are using Flink's session mode or application
mode?
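The log shows `StandaloneDispatcher`, but that dispatcher is used in both
session and application mode, so the log alone does not tell which one you
are running.
If you are using application mode, the JobManager shutting down once the job
reaches a terminal state might be in line with expectations.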
Best,
Guowei
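
For readers trying to interpret the stack trace in the original post: the top frames are plain JDK behaviour and can be reproduced without Flink. The sketch below is only an illustration under assumptions, not Flink's code. The class name `RejectedAfterShutdownDemo`, the method `isRejectedAfterShutdown`, and the `timer` variable are hypothetical; `timer` merely stands in for whichever executor `CheckpointCoordinator.scheduleTriggerRequest` submits to. It shows that handing a task to a `ScheduledThreadPoolExecutor` that has already terminated raises `RejectedExecutionException` under the default `AbortPolicy`, which matches the `[Terminated, pool size = 0, ...]` state printed in the trace.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RejectedAfterShutdownDemo {

    /** Returns true if a task handed to an already-terminated executor is rejected. */
    public static boolean isRejectedAfterShutdown() throws InterruptedException {
        // Hypothetical stand-in for the executor used by the checkpoint coordinator.
        // Executors wraps a ScheduledThreadPoolExecutor in a delegating service,
        // the same JDK classes that appear in the posted trace.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // Shut the executor down first, as happens when the job's services are stopped.
        timer.shutdownNow();
        timer.awaitTermination(1, TimeUnit.SECONDS);

        try {
            // A late callback (for example, cleanup finishing on another thread)
            // tries to submit more work after shutdown.
            timer.execute(() -> System.out.println("never runs"));
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy rejects the task by throwing.
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected = " + isRejectedAfterShutdown());
    }
}
```

In the posted log nothing catches this exception on `cluster-io-thread-16`, so `FatalExitExceptionHandler` treats it as fatal and stops the process. Whether the underlying cause is a shutdown race between checkpoint cleanup and the coordinator's executor is an interpretation of the trace, not something confirmed in this thread.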