You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by "Cussac, Franck" <Fr...@ext.bleckwen.ai> on 2018/11/07 16:34:15 UTC

flink run from savepoint

Hi,

I'm working with Flink 1.5.0 and I try to run a job from a savepoint. My jobmanager is dockerized and I try to run my flink job in another container.
The command :
flink run -m jobmanager:8081 myJar.jar
works fine, but when I try to run a job from a savepoint, I got  an Internal server error.

Here my command to run flink job and the stacktrace :

flink run -m jobmanager:8081 -s file:/tmp/test/savepoint/ myJar.jar
Starting execution of program

------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.
               at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)
               at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
               at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
               at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
               at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
               at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
               at java.lang.reflect.Method.invoke(Method.java:498)
               at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
               at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
               at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
               at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:781)
               at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:275)
               at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
               at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1020)
               at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1096)
               at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
               at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1096)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
               at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$5(RestClusterClient.java:357)
               at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
               at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
               at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
               at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
               at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:214)
               at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
               at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
               at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
               at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
               at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
               at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
               at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
               at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
               at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
               at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
               at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
               ... 12 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
               ... 10 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error.]
               at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
               at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
               at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
               at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
               at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
               ... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error.]
               at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:225)
               at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:209)
               at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
               ... 5 more

Any Idea ?

Thanks,
Franck Cussac.

Re: flink run from savepoint

Posted by Timo Walther <tw...@apache.org>.
Hi Franck,

as a first hint: paths are hard-coded in the savepoint's metadata so you 
should make sure that the path is still the same and accessible by all 
JobManagers and TaskManagers.

Can you share logs with us to figure out what caused the internal server 
error?

Thanks,
Timo


Am 07.11.18 um 17:34 schrieb Cussac, Franck:
>
> Hi,
>
> I’m working with Flink 1.5.0 and I try to run a job from a savepoint. 
> My jobmanager is dockerized and I try to run my flink job in another 
> container.
>
> The command :
>
> flink run -m jobmanager:8081 myJar.jar
>
> works fine, but when I try to run a job from a savepoint, I got  an 
> Internal server error.
>
> Here my command to run flink job and the stacktrace :
>
> flink run -m jobmanager:8081 -s file:/tmp/test/savepoint/ myJar.jar
>
> Starting execution of program
>
> ------------------------------------------------------------
>
> The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: Could not 
> retrieve the execution result.
>
> at 
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)
>
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
>
> at 
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>
> at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>
> at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
>
> at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
>
> at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:781)
>
> at 
> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:275)
>
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
>
> at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1020)
>
> at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1096)
>
> at 
> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1096)
>
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: 
> Failed to submit JobGraph.
>
> at 
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$5(RestClusterClient.java:357)
>
> at 
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>
> at 
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>
> at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>
> at 
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:214)
>
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>
> at 
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
>
> at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
>
> at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
> at java.lang.Thread.run(Thread.java:748)
>
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could 
> not complete the operation. Exception is not retryable.
>
> at 
> java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>
> at 
> java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>
> at 
> java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>
> at 
> java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
>
> ... 12 more
>
> Caused by: 
> org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could 
> not complete the operation. Exception is not retryable.
>
> ... 10 more
>
> Caused by: java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.rest.util.RestClientException: [Internal 
> server error.]
>
> at 
> java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>
> at 
> java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>
> at 
> java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>
> at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
>
> at 
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>
> ... 4 more
>
> Caused by: org.apache.flink.runtime.rest.util.RestClientException: 
> [Internal server error.]
>
> at 
> org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:225)
>
> at 
> org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:209)
>
> at 
> java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
>
> ... 5 more
>
> Any Idea ?
>
> Thanks,
>
> Franck Cussac.
>