Posted to user@flink.apache.org by Fabian Hueske <fh...@gmail.com> on 2018/05/02 10:16:53 UTC

Re: Setting the parallelism in a cluster of machines properly

Hi,

did you try to increase the Akka timeout [1]?

Best, Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html#distributed-coordination-via-akka

2018-04-29 19:44 GMT+02:00 m@xi <ma...@gmail.com>:

> Guys, seriously, I have gone through the process described in the standalone
> cluster documentation 20 times. After I start the cluster with
> ./start-cluster.sh, jps shows the JobManager process running on the master
> and the TaskManager processes running on the slaves, as expected. However,
> every time I try to run a simple Flink job I get the following:
>
> Submitting job with JobID: fb7e19d45ec5e6572dff6e33f4c3aad4. Waiting for job completion.
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Couldn't retrieve the JobExecutionResult from the JobManager.
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:492)
>         at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:105)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:456)
>         at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>         at org.apache.flink.streamjoin.srhc.JobSRHC.main(JobSRHC.java:140)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
>         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:396)
>         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:802)
>         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:282)
>         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1054)
>         at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1101)
>         at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1098)
>         at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1098)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Couldn't retrieve the JobExecutionResult from the JobManager.
>         at org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:300)
>         at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:387)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:481)
>         ... 18 more
> Caused by: org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: Lost connection to the JobManager.
>         at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:219)
>         at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:104)
>         at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:71)
>         at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>         at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
> Weeeeell you are my last resort I guess! haha
>
> Please help if you have any ideas.
>
> Best,
> Max
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>

Re: Setting the parallelism in a cluster of machines properly

Posted by "m@xi" <ma...@gmail.com>.
Hey Fabian!

Sorry for not knowing my way around the Flink configuration, but I have
followed every step, and setting up a simple 2-node cluster has still proved
to be a pain in the as@@#.

So, what value do you think I should set the Akka timeout to?

Also, as I understand it, the process is the following: set up the cluster,
then transfer the fat jar and the data to the master and run the job from
there. The data is then forwarded to the slaves, which execute the job. Is
this correct?

Best,
Max




Re: Setting the parallelism in a cluster of machines properly

Posted by Fabian Hueske <fh...@gmail.com>.
It's not a requirement, but the exception reads
"org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: Lost connection to the JobManager."
So increasing the timeout might help.
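
As a rough sketch of what that could look like: the Akka timeouts are set in
conf/flink-conf.yaml (see the configuration page linked in [1]). The values
below are illustrative assumptions, not taken from this thread, and which key
actually governs this exception is also an assumption; akka.client.timeout
looks like the most likely candidate, since the timeout is raised on the
client side.

```yaml
# conf/flink-conf.yaml (illustrative values, not recommendations)

# Timeout used by the client when communicating with the JobManager
# (default: 60 s).
akka.client.timeout: 600 s

# Timeout for futures and blocking Akka calls inside the cluster
# (default: 10 s).
akka.ask.timeout: 60 s
```

The file is read at startup, so the cluster would need a restart
(./stop-cluster.sh, then ./start-cluster.sh) for the change to take effect,
and the machine that submits the job needs the same setting in its own conf
directory.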

Best, Fabian

2018-05-02 12:20 GMT+02:00 m@xi <ma...@gmail.com>:

> Hello Fabian!
>
> Thanks for the answer. No, I did not. Is this a requirement?
>
> Best,
> Max

Re: Setting the parallelism in a cluster of machines properly

Posted by "m@xi" <ma...@gmail.com>.
Hello Fabian!

Thanks for the answer. No, I did not. Is this a requirement?

Best,
Max


