Posted to user@flink.apache.org by Marco Villalobos <mv...@kineteque.com> on 2021/09/27 04:48:33 UTC

could not stop with a Savepoint.

Today, I repeatedly received a timeout exception when stopping my job with a
savepoint.
This happened with Flink version 1.12.2 running on EMR.

I had to use the deprecated cancel with savepoint feature instead.

In fact, stopping with a savepoint, creating a savepoint, and cancelling
with a savepoint all gave me the timeout exception.

However, the cancel with savepoint request did start creating a savepoint on
the cluster.

The program finished with the following exception:

org.apache.flink.util.FlinkException: Could not stop with a savepoint job "5d6100984035db9541e9f08ecbd311bf".
at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:585)
at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1006)
at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:573)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1073)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:583)
... 9 more

Re: could not stop with a Savepoint.

Posted by Roman Khachatryan <ro...@apache.org>.
Hi,

The above exception may be caused by either the savepoint timing out or
the job termination timing out.
To distinguish between these two cases, could you please check the
status of the savepoint and the tasks in the Flink Web UI? IIUC, after
you get this exception on the client, the job is still running.
Could you also check if there are any exceptions in "Exceptions
history" or in the logs?
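
The same checks can also be scripted against the JobManager's REST API instead of clicking through the Web UI. The following is only a rough sketch: the base URL is an assumption (on YARN/EMR the REST endpoint is normally reached through the YARN proxy address), and the helper names `fetch`, `summarize` and `diagnose` are made up for illustration. It reads the job state, the per-task states, the savepoint entries in the checkpoint history and the root exception.

```python
import json
from urllib.request import urlopen

# Assumption: address of the JobManager REST endpoint; adjust for your cluster.
REST_BASE = "http://localhost:8081"
# Job ID taken from the stack trace in the original message.
JOB_ID = "5d6100984035db9541e9f08ecbd311bf"


def fetch(path, base=REST_BASE):
    """GET a JSON document from the Flink REST API."""
    with urlopen(base + path, timeout=10) as resp:
        return json.load(resp)


def summarize(job, checkpoints, exceptions):
    """Condense the three REST responses into the facts asked for above."""
    # Savepoints appear in the checkpoint history flagged with "is_savepoint".
    savepoints = [c for c in checkpoints.get("history", []) if c.get("is_savepoint")]
    return {
        "job_state": job.get("state"),
        "task_states": {v.get("name"): v.get("status") for v in job.get("vertices", [])},
        "savepoints": [(c.get("id"), c.get("status")) for c in savepoints],
        "root_exception": exceptions.get("root-exception"),
    }


def diagnose(job_id=JOB_ID, base=REST_BASE):
    """Query job details, checkpoint statistics and exceptions for one job."""
    return summarize(
        fetch(f"/jobs/{job_id}", base),
        fetch(f"/jobs/{job_id}/checkpoints", base),
        fetch(f"/jobs/{job_id}/exceptions", base),
    )
```

Calling `diagnose()` against the live cluster and printing the result should be enough for a first reading: a savepoint that stays `IN_PROGRESS` or ends up `FAILED` would point at the savepoint itself, while a `COMPLETED` savepoint with tasks that are still running would point at job termination.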

Regards,
Roman
