You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Felipe Gutierrez <fe...@gmail.com> on 2019/09/06 08:36:14 UTC

How do I start a Flink application on my Flink+Mesos cluster?

Hi,

I am running Mesos without DC/OS [1] and Flink on it. Whe I start my
cluster I receive some messages that I suppose everything was started.
However, I see 0 slats available on the Flink web dashboard. But I suppose
that Mesos will allocate Slots and Task Managers dynamically. Is that right?

$ ./bin/mesos-appmaster.sh &
[1] 16723
flink@r03:~/flink-1.9.0$ I0906 10:22:45.080328 16943 sched.cpp:239]
Version: 1.9.0
I0906 10:22:45.082672 16996 sched.cpp:343] New master detected at
master@XXX.XXX.XXX.XXX:5050
I0906 10:22:45.083276 16996 sched.cpp:363] No credentials provided.
Attempting to register without authentication
I0906 10:22:45.086840 16997 sched.cpp:751] Framework registered with
22f6a553-e8ac-42d4-9a90-96a8d5f002f0-0003

Then I deploy my Flink application. When I use the first command to deploy
the application starts. However, the tasks remain CREATED until Flink
throws a timeout exception. In other words, it never turns to RUNNING.
When I use the second comman to deploy the application it does not start
and I receive the exception of "Could not allocate all requires slots
within timeout of 300000 ms. Slots required: 2". The full stacktrace is
below.

$ /home/flink/flink-1.9.0/bin/flink run
/home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
$ ./bin/mesos-appmaster-job.sh run
/home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &


[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/mesos.html#mesos-without-dcos
ps.: my application runs normally on a standalone Flink cluster.

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Job failed.
(JobID: 7ad8d71faaceb1ac469353452c43dc2a)
at
org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
at
org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:60)
at
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
at org.hello_flink_mesos.App.<init>(App.java:35)
at org.hello_flink_mesos.App.main(App.java:285)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:576)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:438)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
execution failed.
at
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at
org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
... 22 more
Caused by:
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Could not allocate all requires slots within timeout of 300000 ms. Slots
required: 2, slots allocated: 0, previous allocation IDs: [], execution
status: completed exceptionally: java.util.concurrent.CompletionException:
java.util.concurrent.CompletionException:
java.util.concurrent.TimeoutException/java.util.concurrent.CompletableFuture@b520de3[Completed
exceptionally], incomplete: java.util.concurrent.CompletableFuture@36f3d30c[Not
completed, 1 dependents]
at
org.apache.flink.runtime.executiongraph.SchedulingUtils.lambda$scheduleEager$1(SchedulingUtils.java:194)
at
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at
org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:633)
at
org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.lambda$new$0(FutureUtils.java:656)
at
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:190)
at
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:700)
at
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:484)
at
org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:380)
at
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
at
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:998)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
at
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Thanks,
Felipe
*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com
<https://felipeogutierrez.blogspot.com>*

Re: How do I start a Flink application on my Flink+Mesos cluster?

Posted by Felipe Gutierrez <fe...@gmail.com>.
Thanks Gary,

I am compiling a new version of Mesos and when I test it again I will reply
here if I found an error.


On Wed, 11 Sep 2019, 09:22 Gary Yao, <ga...@ververica.com> wrote:

> Hi Felipe,
>
> I am glad that you were able to fix the problem yourself.
>
> > But I suppose that Mesos will allocate Slots and Task Managers
> dynamically.
> > Is that right?
>
> Yes, that is the case since Flink 1.5 [1].
>
> > Then I put the number of "mesos.resourcemanager.tasks.cpus" to be equal
> or
> > less the available cores on a single node of the cluster. I am not sure
> about
> > this parameter, but only after this configuration it worked.
>
> I would need to see JobManager and Mesos logs to understand why this
> resolved
> your issue. If you do not set mesos.resourcemanager.tasks.cpus explicitly,
> Flink will request CPU resources equal to the number of TaskManager slots
> (taskmanager.numberOfTaskSlots) [2]. Maybe this value was too high in your
> configuration?
>
> Best,
> Gary
>
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
> [2]
> https://github.com/apache/flink/blob/0a405251b297109fde1f9a155eff14be4d943887/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosTaskManagerParameters.java#L344
>
> On Tue, Sep 10, 2019 at 10:41 AM Felipe Gutierrez <
> felipe.o.gutierrez@gmail.com> wrote:
>
>> I managed to find what was going wrong. I will write here just for the
>> record.
>>
>> First, the master machine was not login automatically at itself. So I had
>> to give permission for it.
>>
>> chmod og-wx ~/.ssh/authorized_keys
>> chmod 750 $HOME
>>
>> Then I put the number of "mesos.resourcemanager.tasks.cpus" to be equal
>> or less the available cores on a single node of the cluster. I am not sure
>> about this parameter, but only after this configuration it worked.
>>
>> Felipe
>> *--*
>> *-- Felipe Gutierrez*
>>
>> *-- skype: felipe.o.gutierrez*
>> *--* *https://felipeogutierrez.blogspot.com
>> <https://felipeogutierrez.blogspot.com>*
>>
>>
>> On Fri, Sep 6, 2019 at 10:36 AM Felipe Gutierrez <
>> felipe.o.gutierrez@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am running Mesos without DC/OS [1] and Flink on it. Whe I start my
>>> cluster I receive some messages that I suppose everything was started.
>>> However, I see 0 slats available on the Flink web dashboard. But I suppose
>>> that Mesos will allocate Slots and Task Managers dynamically. Is that right?
>>>
>>> $ ./bin/mesos-appmaster.sh &
>>> [1] 16723
>>> flink@r03:~/flink-1.9.0$ I0906 10:22:45.080328 16943 sched.cpp:239]
>>> Version: 1.9.0
>>> I0906 10:22:45.082672 16996 sched.cpp:343] New master detected at
>>> master@XXX.XXX.XXX.XXX:5050
>>> I0906 10:22:45.083276 16996 sched.cpp:363] No credentials provided.
>>> Attempting to register without authentication
>>> I0906 10:22:45.086840 16997 sched.cpp:751] Framework registered with
>>> 22f6a553-e8ac-42d4-9a90-96a8d5f002f0-0003
>>>
>>> Then I deploy my Flink application. When I use the first command to
>>> deploy the application starts. However, the tasks remain CREATED until
>>> Flink throws a timeout exception. In other words, it never turns to RUNNING.
>>> When I use the second comman to deploy the application it does not start
>>> and I receive the exception of "Could not allocate all requires slots
>>> within timeout of 300000 ms. Slots required: 2". The full stacktrace is
>>> below.
>>>
>>> $ /home/flink/flink-1.9.0/bin/flink run
>>> /home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
>>> $ ./bin/mesos-appmaster-job.sh run
>>> /home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
>>>
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/mesos.html#mesos-without-dcos
>>> ps.: my application runs normally on a standalone Flink cluster.
>>>
>>> ------------------------------------------------------------
>>>  The program finished with the following exception:
>>>
>>> org.apache.flink.client.program.ProgramInvocationException: Job failed.
>>> (JobID: 7ad8d71faaceb1ac469353452c43dc2a)
>>> at
>>> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
>>> at
>>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
>>> at
>>> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:60)
>>> at
>>> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
>>> at org.hello_flink_mesos.App.<init>(App.java:35)
>>> at org.hello_flink_mesos.App.main(App.java:285)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at
>>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:576)
>>> at
>>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:438)
>>> at
>>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
>>> at
>>> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
>>> at
>>> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
>>> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
>>> at
>>> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
>>> at
>>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>>> at
>>> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
>>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
>>> execution failed.
>>> at
>>> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
>>> at
>>> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
>>> ... 22 more
>>> Caused by:
>>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>>> Could not allocate all requires slots within timeout of 300000 ms. Slots
>>> required: 2, slots allocated: 0, previous allocation IDs: [], execution
>>> status: completed exceptionally: java.util.concurrent.CompletionException:
>>> java.util.concurrent.CompletionException:
>>> java.util.concurrent.TimeoutException/java.util.concurrent.CompletableFuture@b520de3[Completed
>>> exceptionally], incomplete: java.util.concurrent.CompletableFuture@36f3d30c[Not
>>> completed, 1 dependents]
>>> at
>>> org.apache.flink.runtime.executiongraph.SchedulingUtils.lambda$scheduleEager$1(SchedulingUtils.java:194)
>>> at
>>> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>>> at
>>> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>> at
>>> org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:633)
>>> at
>>> org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.lambda$new$0(FutureUtils.java:656)
>>> at
>>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>>> at
>>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>> at
>>> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:190)
>>> at
>>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>>> at
>>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>> at
>>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:700)
>>> at
>>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:484)
>>> at
>>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:380)
>>> at
>>> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>>> at
>>> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>>> at
>>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>>> at
>>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>>> at
>>> org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:998)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
>>> at
>>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>>> at
>>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at
>>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>> Thanks,
>>> Felipe
>>> *--*
>>> *-- Felipe Gutierrez*
>>>
>>> *-- skype: felipe.o.gutierrez*
>>> *--* *https://felipeogutierrez.blogspot.com
>>> <https://felipeogutierrez.blogspot.com>*
>>>
>>

Re: How do I start a Flink application on my Flink+Mesos cluster?

Posted by Gary Yao <ga...@ververica.com>.
Hi Felipe,

I am glad that you were able to fix the problem yourself.

> But I suppose that Mesos will allocate Slots and Task Managers
dynamically.
> Is that right?

Yes, that is the case since Flink 1.5 [1].

> Then I put the number of "mesos.resourcemanager.tasks.cpus" to be equal or
> less the available cores on a single node of the cluster. I am not sure
about
> this parameter, but only after this configuration it worked.

I would need to see JobManager and Mesos logs to understand why this
resolved
your issue. If you do not set mesos.resourcemanager.tasks.cpus explicitly,
Flink will request CPU resources equal to the number of TaskManager slots
(taskmanager.numberOfTaskSlots) [2]. Maybe this value was too high in your
configuration?

Best,
Gary


[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
[2]
https://github.com/apache/flink/blob/0a405251b297109fde1f9a155eff14be4d943887/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosTaskManagerParameters.java#L344

On Tue, Sep 10, 2019 at 10:41 AM Felipe Gutierrez <
felipe.o.gutierrez@gmail.com> wrote:

> I managed to find what was going wrong. I will write here just for the
> record.
>
> First, the master machine was not login automatically at itself. So I had
> to give permission for it.
>
> chmod og-wx ~/.ssh/authorized_keys
> chmod 750 $HOME
>
> Then I put the number of "mesos.resourcemanager.tasks.cpus" to be equal or
> less the available cores on a single node of the cluster. I am not sure
> about this parameter, but only after this configuration it worked.
>
> Felipe
> *--*
> *-- Felipe Gutierrez*
>
> *-- skype: felipe.o.gutierrez*
> *--* *https://felipeogutierrez.blogspot.com
> <https://felipeogutierrez.blogspot.com>*
>
>
> On Fri, Sep 6, 2019 at 10:36 AM Felipe Gutierrez <
> felipe.o.gutierrez@gmail.com> wrote:
>
>> Hi,
>>
>> I am running Mesos without DC/OS [1] and Flink on it. Whe I start my
>> cluster I receive some messages that I suppose everything was started.
>> However, I see 0 slats available on the Flink web dashboard. But I suppose
>> that Mesos will allocate Slots and Task Managers dynamically. Is that right?
>>
>> $ ./bin/mesos-appmaster.sh &
>> [1] 16723
>> flink@r03:~/flink-1.9.0$ I0906 10:22:45.080328 16943 sched.cpp:239]
>> Version: 1.9.0
>> I0906 10:22:45.082672 16996 sched.cpp:343] New master detected at
>> master@XXX.XXX.XXX.XXX:5050
>> I0906 10:22:45.083276 16996 sched.cpp:363] No credentials provided.
>> Attempting to register without authentication
>> I0906 10:22:45.086840 16997 sched.cpp:751] Framework registered with
>> 22f6a553-e8ac-42d4-9a90-96a8d5f002f0-0003
>>
>> Then I deploy my Flink application. When I use the first command to
>> deploy the application starts. However, the tasks remain CREATED until
>> Flink throws a timeout exception. In other words, it never turns to RUNNING.
>> When I use the second comman to deploy the application it does not start
>> and I receive the exception of "Could not allocate all requires slots
>> within timeout of 300000 ms. Slots required: 2". The full stacktrace is
>> below.
>>
>> $ /home/flink/flink-1.9.0/bin/flink run
>> /home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
>> $ ./bin/mesos-appmaster-job.sh run
>> /home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
>>
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/mesos.html#mesos-without-dcos
>> ps.: my application runs normally on a standalone Flink cluster.
>>
>> ------------------------------------------------------------
>>  The program finished with the following exception:
>>
>> org.apache.flink.client.program.ProgramInvocationException: Job failed.
>> (JobID: 7ad8d71faaceb1ac469353452c43dc2a)
>> at
>> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
>> at
>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
>> at
>> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:60)
>> at
>> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
>> at org.hello_flink_mesos.App.<init>(App.java:35)
>> at org.hello_flink_mesos.App.main(App.java:285)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:576)
>> at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:438)
>> at
>> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
>> at
>> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
>> at
>> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
>> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
>> at
>> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
>> at
>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>> at
>> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
>> execution failed.
>> at
>> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
>> at
>> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
>> ... 22 more
>> Caused by:
>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>> Could not allocate all requires slots within timeout of 300000 ms. Slots
>> required: 2, slots allocated: 0, previous allocation IDs: [], execution
>> status: completed exceptionally: java.util.concurrent.CompletionException:
>> java.util.concurrent.CompletionException:
>> java.util.concurrent.TimeoutException/java.util.concurrent.CompletableFuture@b520de3[Completed
>> exceptionally], incomplete: java.util.concurrent.CompletableFuture@36f3d30c[Not
>> completed, 1 dependents]
>> at
>> org.apache.flink.runtime.executiongraph.SchedulingUtils.lambda$scheduleEager$1(SchedulingUtils.java:194)
>> at
>> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>> at
>> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>> at
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>> at
>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>> at
>> org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:633)
>> at
>> org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.lambda$new$0(FutureUtils.java:656)
>> at
>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>> at
>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>> at
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>> at
>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>> at
>> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:190)
>> at
>> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>> at
>> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>> at
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>> at
>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>> at
>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:700)
>> at
>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:484)
>> at
>> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:380)
>> at
>> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
>> at
>> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
>> at
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>> at
>> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>> at
>> org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:998)
>> at
>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
>> at
>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
>> at
>> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>> at
>> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> Thanks,
>> Felipe
>> *--*
>> *-- Felipe Gutierrez*
>>
>> *-- skype: felipe.o.gutierrez*
>> *--* *https://felipeogutierrez.blogspot.com
>> <https://felipeogutierrez.blogspot.com>*
>>
>

Re: How do I start a Flink application on my Flink+Mesos cluster?

Posted by Felipe Gutierrez <fe...@gmail.com>.
I managed to find what was going wrong. I will write here just for the
record.

First, the master machine was not login automatically at itself. So I had
to give permission for it.

chmod og-wx ~/.ssh/authorized_keys
chmod 750 $HOME

Then I put the number of "mesos.resourcemanager.tasks.cpus" to be equal or
less the available cores on a single node of the cluster. I am not sure
about this parameter, but only after this configuration it worked.

Felipe
*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com
<https://felipeogutierrez.blogspot.com>*


On Fri, Sep 6, 2019 at 10:36 AM Felipe Gutierrez <
felipe.o.gutierrez@gmail.com> wrote:

> Hi,
>
> I am running Mesos without DC/OS [1] and Flink on it. Whe I start my
> cluster I receive some messages that I suppose everything was started.
> However, I see 0 slats available on the Flink web dashboard. But I suppose
> that Mesos will allocate Slots and Task Managers dynamically. Is that right?
>
> $ ./bin/mesos-appmaster.sh &
> [1] 16723
> flink@r03:~/flink-1.9.0$ I0906 10:22:45.080328 16943 sched.cpp:239]
> Version: 1.9.0
> I0906 10:22:45.082672 16996 sched.cpp:343] New master detected at
> master@XXX.XXX.XXX.XXX:5050
> I0906 10:22:45.083276 16996 sched.cpp:363] No credentials provided.
> Attempting to register without authentication
> I0906 10:22:45.086840 16997 sched.cpp:751] Framework registered with
> 22f6a553-e8ac-42d4-9a90-96a8d5f002f0-0003
>
> Then I deploy my Flink application. When I use the first command to deploy
> the application starts. However, the tasks remain CREATED until Flink
> throws a timeout exception. In other words, it never turns to RUNNING.
> When I use the second comman to deploy the application it does not start
> and I receive the exception of "Could not allocate all requires slots
> within timeout of 300000 ms. Slots required: 2". The full stacktrace is
> below.
>
> $ /home/flink/flink-1.9.0/bin/flink run
> /home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
> $ ./bin/mesos-appmaster-job.sh run
> /home/flink/hello-flink-mesos/target/hello-flink-mesos.jar &
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/mesos.html#mesos-without-dcos
> ps.: my application runs normally on a standalone Flink cluster.
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: Job failed.
> (JobID: 7ad8d71faaceb1ac469353452c43dc2a)
> at
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
> at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
> at
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:60)
> at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
> at org.hello_flink_mesos.App.<init>(App.java:35)
> at org.hello_flink_mesos.App.main(App.java:285)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:576)
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:438)
> at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
> at
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
> at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
> at
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
> at
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
> execution failed.
> at
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
> at
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
> ... 22 more
> Caused by:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Could not allocate all requires slots within timeout of 300000 ms. Slots
> required: 2, slots allocated: 0, previous allocation IDs: [], execution
> status: completed exceptionally: java.util.concurrent.CompletionException:
> java.util.concurrent.CompletionException:
> java.util.concurrent.TimeoutException/java.util.concurrent.CompletableFuture@b520de3[Completed
> exceptionally], incomplete: java.util.concurrent.CompletableFuture@36f3d30c[Not
> completed, 1 dependents]
> at
> org.apache.flink.runtime.executiongraph.SchedulingUtils.lambda$scheduleEager$1(SchedulingUtils.java:194)
> at
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> at
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at
> org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.handleCompletedFuture(FutureUtils.java:633)
> at
> org.apache.flink.runtime.concurrent.FutureUtils$ResultConjunctFuture.lambda$new$0(FutureUtils.java:656)
> at
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at
> org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:190)
> at
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:700)
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:484)
> at
> org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:380)
> at
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
> at
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> at
> org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:998)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
> at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Thanks,
> Felipe
> *--*
> *-- Felipe Gutierrez*
>
> *-- skype: felipe.o.gutierrez*
> *--* *https://felipeogutierrez.blogspot.com
> <https://felipeogutierrez.blogspot.com>*
>