Posted to user@flink.apache.org by Radu Tudoran <ra...@huawei.com> on 2016/02/10 19:35:30 UTC

job manager timeout

Hi,

I am running a program that works fine locally, but when I run it on the cluster, the client that connects to the JobManager fails with a timeout error. There is no problem contacting the JobManager from this machine, as it works fine for other streaming applications. I suspect that, because the stream topology is rather complex, something goes wrong while deploying the job. I am not sure whether this is normal behavior (IMHO, a job should not fail just because its topology is more complex). In case the error helps to identify the underlying issue (if any), I have included it below.
Meanwhile, could you please explain how I can configure the timeout so that the submission no longer fails?

Thanks



org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Job submission to the JobManager timed out.
        at org.apache.flink.client.program.Client.runBlocking(Client.java:370)
        at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:96)
        at application.MainStreamApp.main(MainStreamApp.java:108)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
        at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:676)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:978)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1028)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Communication with JobManager failed: Job submission to the JobManager timed out.
        at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:140)
        at org.apache.flink.client.program.Client.runBlocking(Client.java:368)
        ... 13 more
Caused by: org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: Job submission to the JobManager timed out.
        at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:255)
        at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:88)
        at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



Dr. Radu Tudoran
Research Engineer - Big Data Expert
IT R&D Division

HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudoran@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN
This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!


Re: job manager timeout

Posted by Robert Metzger <rm...@apache.org>.
Hi Radu,

Did you check the JobManager logs as well? They might show why the
JobManager is failing.

The timeout is configurable through the "akka.client.timeout" configuration
key. Its default value is "60 s".
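As a rough illustration of the lookup described above, the Java sketch below resolves "akka.client.timeout" from "key: value" text in the style of flink-conf.yaml and falls back to the 60 s default when the key is absent. It is not Flink's implementation: the class and method names are hypothetical, the parser handles only simple "key: value" lines and the units ms, s and min (the real parser may accept more spellings), and the raised value of 600 s is just an example.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: illustrates key lookup with a default.
final class ClientTimeoutLookup {

    // Key and default value as given in the reply above.
    static final String CLIENT_TIMEOUT_KEY = "akka.client.timeout";
    static final String DEFAULT_CLIENT_TIMEOUT = "60 s";

    // Parses simple "key: value" lines, skipping blank lines and '#' comments.
    static Map<String, String> parseConfig(String text) {
        Map<String, String> entries = new HashMap<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int colon = trimmed.indexOf(':');
            if (colon > 0) {
                entries.put(trimmed.substring(0, colon).trim(),
                            trimmed.substring(colon + 1).trim());
            }
        }
        return entries;
    }

    // Converts "<number> <unit>" into a Duration; only ms, s and min are supported here.
    static Duration parseDuration(String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<number> <unit>': " + value);
        }
        long amount = Long.parseLong(parts[0]);
        switch (parts[1]) {
            case "ms":
                return Duration.ofMillis(amount);
            case "s":
                return Duration.ofSeconds(amount);
            case "min":
                return Duration.ofMinutes(amount);
            default:
                throw new IllegalArgumentException("Unsupported unit: " + parts[1]);
        }
    }

    // Returns the configured client timeout, or the 60 s default if the key is absent.
    static Duration resolveClientTimeout(String configText) {
        String raw = parseConfig(configText)
                .getOrDefault(CLIENT_TIMEOUT_KEY, DEFAULT_CLIENT_TIMEOUT);
        return parseDuration(raw);
    }

    public static void main(String[] args) {
        String unchanged = "jobmanager.rpc.address: localhost\n";
        String raised = "# give large job graphs more time to be submitted\n"
                      + "akka.client.timeout: 600 s\n";
        System.out.println(resolveClientTimeout(unchanged).getSeconds()); // 60
        System.out.println(resolveClientTimeout(raised).getSeconds());    // 600
    }
}
```

Keeping the default in one constant and overriding it only when the key is present mirrors the behavior described in the reply: an unset key means 60 s.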
