Posted to user@flink.apache.org by Ashutosh Kumar <as...@gmail.com> on 2016/03/10 09:45:59 UTC

Running Flink 1.0.0 on YARN

I have a YARN setup with 1 master and 2 slaves.
When I start a YARN session with bin/yarn-session.sh -n 2 -jm 1024 -tm 1024
and submit a job with bin/flink run examples/batch/WordCount.jar, the job
succeeds. Its status shows up in the YARN UI at http://x.x.x.x:8088/cluster.
However, nothing shows up in the Flink UI at http://x.x.x.x:8081/#/overview.

Is this expected behavior?

If I instead run bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096
examples/batch/WordCount.jar, the job fails with the following error:

java.lang.IllegalStateException: Update task on instance 451105022ff3b4cd6e2c307e239d1595 @ slave2 - 2 slots - URL: akka.tcp://flink@x.x.x.x:43272/user/taskmanager failed due to:
        at org.apache.flink.runtime.executiongraph.Execution$6.onFailure(Execution.java:954)
        at akka.dispatch.OnFailure.internal(Future.scala:228)
        at akka.dispatch.OnFailure.internal(Future.scala:227)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
        at scala.runtime.AbstractPartialFunction.applyOrElse(AbstractPartialFunction.scala:25)
        at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
        at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:134)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@x.x.x.x:43272/user/taskmanager#1361901425]] after [10000 ms]
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
        at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
        at java.lang.Thread.run(Thread.java:745)
03/10/2016 08:36:46     Job execution switched to status FAILING.

Thanks,
Ashutosh

Re: Running Flink 1.0.0 on YARN

Posted by Robert Metzger <rm...@apache.org>.
Hi,

the first issue you are describing is expected. Flink starts the web
interface on the container running the JobManager, not on the
ResourceManager host.
Also, the port is allocated dynamically to avoid port collisions, so it is
not started on 8081.
However, you can access the web interface through the proxy link provided
in the YARN application overview.
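
As a rough sketch of how to find that link from the command line: the
snippet below assumes the yarn client is on the PATH and that the YARN web
proxy runs inside the ResourceManager on port 8088, as in the URL from the
question. The function names and the application ID are made-up
placeholders, not values from this thread.

```shell
# List the YARN applications that are currently running. The output
# includes each application ID and a Tracking-URL column, which points
# at the application's web interface.
list_apps() {
  yarn application -list
}

# Build the ResourceManager proxy URL for one application.
# $1 = ResourceManager host, $2 = YARN application ID.
# Assumes the default setup where the proxy is served on port 8088.
flink_proxy_url() {
  echo "http://$1:8088/proxy/$2/"
}
```

Calling flink_proxy_url x.x.x.x application_1457000000000_0001 prints the
proxy URL for that (placeholder) application; the real ID is the one shown
by list_apps or in the YARN UI.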

Regarding the second error, can you check the log files of the TaskManager
(running on x.x.x.x:43272) that failed?
I'm pretty sure they contain some information about why it didn't respond.
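
One possible way to get at those logs, sketched under the assumption that
YARN log aggregation is enabled and the application has already finished
(otherwise the files have to be read from the NodeManager's local log
directory on the slave). The function names and arguments are placeholders
chosen for illustration.

```shell
# Download the aggregated logs of all containers of one application
# (JobManager and TaskManagers) into a local file.
# $1 = YARN application ID, $2 = output file.
fetch_logs() {
  yarn logs -applicationId "$1" > "$2"
}

# Print the lines of a log file that mention an exception or an error,
# prefixed with their line numbers.
# $1 = log file.
show_errors() {
  grep -n -i -E 'exception|error' "$1"
}
```

For example, fetch_logs application_1457000000000_0001 flink.log followed
by show_errors flink.log gives a first list of places to look at in the
TaskManager output.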

