Posted to dev@flink.apache.org by "刘方奇 (Jira)" <ji...@apache.org> on 2021/08/30 11:44:00 UTC

[jira] [Created] (FLINK-24053) stop with savepoint timeout

刘方奇 created FLINK-24053:
---------------------------

             Summary: stop with savepoint timeout
                 Key: FLINK-24053
                 URL: https://issues.apache.org/jira/browse/FLINK-24053
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing, Runtime / REST
    Affects Versions: 1.13.0, 1.12.0, 1.11.0
            Reporter: 刘方奇


Hello, when we use the "stop with savepoint" feature, we consistently hit a bug.

Each time, we wait 5 minutes for the application to stop, and then the application throws a timeout exception.

 
{code:java}
java.util.concurrent.TimeoutException: null
	at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[classes/:?]
	at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211) ~[classes/:?]
	at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$14(FutureUtils.java:445) ~[classes/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_251]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_251]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_251]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_251]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_251]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_251]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_251]
{code}
We found that the call to org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers.SavepointStatusHandler.closeHandlerAsync() always times out, and its timeout is set to 5 minutes.
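
To illustrate the pattern that the FutureUtils$Timeout and orTimeout frames in the trace point to, here is a minimal sketch using only java.util.concurrent. It is not Flink's implementation: the class CloseTimeoutSketch and the helpers orTimeout and awaitClose are hypothetical names, and a 100 ms timeout stands in for the 5-minute setting. It shows that a close future which nothing ever completes fails with a TimeoutException only after the full timeout has elapsed, while a future that is already complete returns at once.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; not taken from the Flink code base.
class CloseTimeoutSketch {

    // Fails the given future with a TimeoutException unless it completes first.
    static <T> CompletableFuture<T> orTimeout(
            CompletableFuture<T> future,
            long timeout,
            TimeUnit unit,
            ScheduledExecutorService timer) {
        ScheduledFuture<?> timeoutTask = timer.schedule(
                () -> { future.completeExceptionally(new TimeoutException()); },
                timeout,
                unit);
        // Cancel the pending timeout task once the future completes for any reason.
        future.whenComplete((value, error) -> timeoutTask.cancel(false));
        return future;
    }

    // Waits for a (hypothetical) handler close future and reports the outcome.
    static String awaitClose(CompletableFuture<Void> closeFuture, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            orTimeout(closeFuture, timeoutMillis, TimeUnit.MILLISECONDS, timer).get();
            return "closed";
        } catch (ExecutionException e) {
            return e.getCause() instanceof TimeoutException ? "timeout" : "failed";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A close future that is never completed: the caller waits for the whole timeout.
        System.out.println(awaitClose(new CompletableFuture<>(), 100)); // prints "timeout"
        // A close future that is already complete: the caller returns immediately.
        System.out.println(awaitClose(CompletableFuture.completedFuture(null), 100)); // prints "closed"
    }
}
```

If closeHandlerAsync() behaves like the first case, that would match the 5-minute wait followed by the exception described above, though this sketch does not show why the real future stays incomplete.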

This raises a question: closing this handler may not be important, because it only serves another handler, org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers.StopWithSavepointHandler, which has already been closed. So should we skip this close?

PS: We saw no problems when we tested code that skips this handler's close.
