Posted to user@flink.apache.org by Samir Tusharbhai Chauhan <sa...@prudential.com.sg> on 2020/03/05 16:49:04 UTC

Flink Deployment failing with RestClientException

Hi,
I am having an issue where, after deploying a few jobs, further deployments start failing with the errors below. I don't have this issue in other environments. What should I check first in this scenario?
My environment is:
Azure Kubernetes 1.15.7
Flink 1.6.0
Zookeeper 3.4.10

The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: e83db2da358db355ccdcf6740c6bb134)
        at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:249)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
        at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:432)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
        at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
        ... 12 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        ... 10 more
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
        at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        ... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
        at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
        at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
        ... 5 more


More errors
at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
2020-03-04 08:39:06,675 ERROR org.apache.flink.runtime.rest.handler.cluster.ClusterOverviewHandler  - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        ... 9 more
2020-03-04 08:39:07,676 ERROR org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler  - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
        at akka.dispatch.OnComplete.internal(Future.scala:258)
        at akka.dispatch.OnComplete.internal(Future.scala:256)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
        at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)



Warm Regards,
Samir Chauhan

Regional Infrastructure & Operations

Prudential Services Singapore Pte Ltd
1 Wallich Street #19-01, Guoco Tower Singapore 078881

Direct (65) 6704 7264 Mobile (65) 9721 7548
samir.tusharbhai.chauhan@prudential.com.sg

www.prudential.com.sg


There's a reason we support Fair Dealing. YOU.


This email and any files transmitted with it or attached to it (the [Email]) may contain confidential, proprietary or legally privileged information and is intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient of the Email, you must not, directly or indirectly, copy, use, print, distribute, disclose to any other party or take any action in reliance on any part of the Email. Please notify the system manager or sender of the error and delete all copies of the Email immediately.  

No statement in the Email should be construed as investment advice being given within or outside Singapore. Prudential Assurance Company Singapore (Pte) Limited (PACS)  and each of its related entities shall not be responsible for any losses, claims, penalties, costs or damages arising from or in connection with the use of the Email or the information therein, in whole or in part. You are solely responsible for conducting any virus checks prior to opening, accessing or disseminating the Email.

PACS (Company Registration No. 199002477Z) is a company incorporated under the laws of Singapore and has its registered office at 30 Cecil Street, #30-01, Prudential Tower, Singapore 049712.

PACS is an indirect wholly owned subsidiary of Prudential plc of the United Kingdom. PACS and Prudential plc are not affiliated in any manner with Prudential Financial, Inc., a company whose principal place of business is in the United States of America.

Re: Flink Deployment failing with RestClientException

Posted by Robert Metzger <rm...@apache.org>.
Hey Samir,

Can you try setting the following configuration parameter? (Make sure the
JobManager log confirms that the changed value is in effect.)
web.timeout: 300000

This might uncover the underlying problem, because we would then wait longer
before the request times out.

Are you able to upgrade to the latest Flink version easily?
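
As a complement to reading the JobManager log, here is a minimal Python sketch for confirming that the changed web.timeout value is in effect. It assumes the JobManager REST endpoint GET /jobmanager/config is reachable and returns a JSON list of key/value entries; check this against your Flink version. The base URL passed to fetch_jobmanager_config and the sample entries are placeholders written for illustration, not values taken from this thread.

```python
import json
from urllib.request import urlopen


def effective_setting(config_entries, key):
    """Return the value stored for `key` in a list of {"key", "value"} dicts, or None."""
    for entry in config_entries:
        if entry.get("key") == key:
            return entry.get("value")
    return None


def fetch_jobmanager_config(base_url):
    """Fetch the JobManager configuration over REST (needs a reachable cluster).

    Assumption: the endpoint is <base_url>/jobmanager/config.
    """
    with urlopen(f"{base_url}/jobmanager/config", timeout=30) as resp:
        return json.load(resp)


# Hand-written sample in the assumed response shape (not captured from a cluster).
sample = [
    {"key": "web.timeout", "value": "300000"},
    {"key": "jobmanager.rpc.address", "value": "flink-jobmanager"},
]
print(effective_setting(sample, "web.timeout"))
```

Against a live cluster, the same check would be effective_setting(fetch_jobmanager_config(base_url), "web.timeout"), with base_url pointing at your JobManager REST address.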


On Thu, Mar 5, 2020 at 7:02 PM Andrey Zagrebin <az...@gmail.com>
wrote:

> Hi Samir,
>
> It may be a known issue [1][2] where some action during job submission
> takes too long but eventually completes in the job manager.
> Have you checked the job manager logs for any failures other than
> "Ask timed out"?
> Have you checked in the Web UI whether all the jobs have in fact been
> started despite the client error?
>
> Best,
> Andrey
>
> [1] https://issues.apache.org/jira/browse/FLINK-16429
> [2] https://issues.apache.org/jira/browse/FLINK-16018

Re: Flink Deployment failing with RestClientException

Posted by Andrey Zagrebin <az...@gmail.com>.
Hi Samir,

It may be a known issue [1][2] where some action during job submission takes too long but eventually completes in the job manager.
Have you checked the job manager logs for any failures other than "Ask timed out"?
Have you checked in the Web UI whether all the jobs have in fact been started despite the client error?

Best,
Andrey

[1] https://issues.apache.org/jira/browse/FLINK-16429
[2] https://issues.apache.org/jira/browse/FLINK-16018
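
The two checks suggested above can be roughly scripted. This is a sketch under assumptions, not a definitive tool. It assumes that the job list comes from the REST endpoint GET /jobs/overview as JSON with a "jobs" array of objects carrying "jid" and "state", and that log records start with a timestamp in the format seen in the excerpts in this thread. The sample overview and the com.example log lines are made up for illustration.

```python
import re

# A log record starts with a timestamp such as "2020-03-04 08:39:06,675 ".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def job_state(overview, job_id):
    """Return the state of `job_id` from a jobs-overview dict, or None if absent."""
    for job in overview.get("jobs", []):
        if job.get("jid") == job_id:
            return job.get("state")
    return None


def split_records(lines):
    """Group a log into records: a timestamped line plus its continuation lines."""
    records = []
    for line in lines:
        if RECORD_START.match(line) or not records:
            records.append([line])
        else:
            records[-1].append(line)
    return records


def errors_other_than_ask_timeout(lines):
    """Return header lines of ERROR records that never mention 'Ask timed out'."""
    hits = []
    for record in split_records(lines):
        header = record[0]
        if " ERROR " in header and not any("Ask timed out" in l for l in record):
            hits.append(header)
    return hits


# Hand-written overview in the assumed shape; the job ID is the one from the client error.
sample_overview = {
    "jobs": [{"jid": "e83db2da358db355ccdcf6740c6bb134", "state": "RUNNING"}]
}

sample_log = [
    "2020-03-04 08:39:06,675 ERROR org.apache.flink.runtime.rest.handler.cluster.ClusterOverviewHandler  - Could not retrieve the redirect address.",
    "java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms].",
    "2020-03-04 08:39:08,000 ERROR com.example.Hypothetical  - Some other failure (made-up line)",
    "2020-03-04 08:39:09,000 INFO  com.example.Hypothetical  - Unrelated info line (made-up line)",
]

print(job_state(sample_overview, "e83db2da358db355ccdcf6740c6bb134"))
print(errors_other_than_ask_timeout(sample_log))
```

Records are grouped before filtering because the ERROR header line itself does not contain "Ask timed out"; the phrase only appears in the exception lines that follow it.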

> On 5 Mar 2020, at 17:49, Samir Tusharbhai Chauhan <sa...@prudential.com.sg> wrote:
> 
> Hi,
> 
> I am having issue where after deploying few jobs, it starts failing with below errors. I don’t have such issue in other environments. What should I check first in such scenario?
> 
> My environment is
> Azure Kubernetes 1.15.7
> Flink 1.6.0
> Zookeeper 3.4.10
>  
> 
> The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not submit job (JobID: e83db2da358db355ccdcf6740c6bb134)
>         at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:249)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:486)
>         at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:432)
>         at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:804)
>         at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:280)
>         at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
>         at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1044)
>         at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
>         at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
>         at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:379)
>         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>         at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
>         at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>         at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
>         at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
>         at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
>         at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>         at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>         at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>         at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
>         ... 12 more
> Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
>         ... 10 more
> Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
>         at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
>         at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
>         at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
>         at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
>         at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
>         ... 4 more
> Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job submission failed.]
>         at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:310)
>         at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:294)
>         at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
>         ... 5 more
> 
> More errors
> 
> at java.lang.Thread.run(Thread.java:748)
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>         ... 9 more
> 2020-03-04 08:39:06,675 ERROR org.apache.flink.runtime.rest.handler.cluster.ClusterOverviewHandler  - Could not retrieve the redirect address.
> java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>         at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
>         at akka.dispatch.OnComplete.internal(Future.scala:258)
>         at akka.dispatch.OnComplete.internal(Future.scala:256)
>         at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>         at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>         at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
>         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>         at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>         at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>         ... 9 more
> 2020-03-04 08:39:07,676 ERROR org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler  - Could not retrieve the redirect address.
> java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>         at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
>         at akka.dispatch.OnComplete.internal(Future.scala:258)
>         at akka.dispatch.OnComplete.internal(Future.scala:256)
>         at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
>         at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
>         at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
>         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
>         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
>         at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>         at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
>         at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1725880087]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
> 
> Warm Regards,
> Samir Chauhan
>  
> Regional Infrastructure & Operations
>  
> Prudential Services Singapore Pte Ltd
> 1 Wallich Street #19-01, Guoco Tower Singapore 078881
>  
> Direct (65) 6704 7264 Mobile (65) 9721 7548
> samir.tusharbhai.chauhan@prudential.com.sg
>  
> www.prudential.com.sg
>  