Posted to dev@samza.apache.org by Krzysztof Zarzycki <k....@gmail.com> on 2015/07/10 13:18:03 UTC

Samza job on YARN stuck Unassigned

Hi there Samza developers,

I have a problem deploying a Samza task on YARN that I cannot overcome.
When I submit the task, ApplicationMasters get created (2 of them) and the
job is visible, but it stays in state UNASSIGNED. After some time the job
FAILED.

The application information on the ResourceManager panel is:
State: FAILED
FinalStatus: FAILED
Elapsed: 25mins, 2sec
Diagnostics: Application application_1424354741837_0380 failed 2 times due
to ApplicationMaster for attempt appattempt_1424354741837_0380_000002 timed
out. Failing the application.


When I look into the logs of the ApplicationMaster I see no errors, no
warnings, nothing wrong. Please see the attached output of the "yarn logs"
command.

My guess is that a connection failed between some components (container
to ApplicationMaster? NodeManager?). I suspect this because of the jstack
output from the AM:

"main" #1 prio=5 os_prio=0 tid=0x00007f9338015000 nid=0x6f2f waiting on condition [0x00007f933de6e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
  at java.lang.Thread.sleep(Native Method)
  at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:154)
  at com.sun.proxy.$Proxy18.registerApplicationMaster(Unknown Source)
  at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:196)
  at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
  at org.apache.samza.job.yarn.SamzaAppMasterLifecycle.onInit(SamzaAppMasterLifecycle.scala:39)
  at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$1.apply(SamzaAppMaster.scala:108)
  at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$1.apply(SamzaAppMaster.scala:108)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.apache.samza.job.yarn.SamzaAppMaster$.run(SamzaAppMaster.scala:108)
  at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:95)
  at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)


On the other hand, I see the correct RM addresses in the logs:
15/07/10 12:17:30 INFO client.RMProxy: Connecting to ResourceManager at hdnn02.company.com/148.251.82.11:8030
15/07/10 12:17:31 INFO client.RMProxy: Connecting to ResourceManager at hdnn02.company.com/148.251.82.11:8050
...
2015-07-10 12:17:31,032 [main] INFO  o.apache.samza.job.yarn.ClientHelper - trying to connect to RM hdnn02.company.com:8050
...
2015-07-10 12:17:31,680 [main] INFO  o.a.s.job.yarn.SamzaAppMasterService - Webapp is started at (rpc http://78.46.56.88:43268/, tracking http://
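
Since the main thread is sleeping inside the retry loop around registerApplicationMaster, one quick thing to rule out is plain TCP reachability of the RM ports from the node where the AM runs. The following is a minimal sketch, assuming Python 3 is available on that node; the helper name can_connect is made up for illustration. A successful connect only shows that the port is open from that machine, not that the YARN RPC itself works.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False
```

Run on the AM's node, can_connect("hdnn02.company.com", 8030) and can_connect("hdnn02.company.com", 8050) would test the two addresses shown in the RMProxy log lines.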


Does anyone know what could be wrong here? I'd be grateful for any help,
even just with debugging this case.
To start with a simple question: do you know how to set log4j for the AM and
containers to DEBUG?

Thank you!
Krzysztof

Re: Samza job on YARN stuck Unassigned

Posted by Gustavo Anatoly <gu...@gmail.com>.
Hi Krzysztof,

I had connectivity errors too, but in my case the cause was a misconfigured
/etc/hosts.
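
The message does not say what exactly was wrong in /etc/hosts. One frequent variant on Hadoop clusters is the machine's own hostname being mapped to a loopback address (for example 127.0.1.1), which can make daemons advertise an address other nodes cannot reach. The sketch below assumes that case only; the function name loopback_mappings is hypothetical.

```python
import ipaddress
from typing import List


def loopback_mappings(hosts_text: str, hostname: str) -> List[str]:
    """Return the loopback addresses that hosts_text maps hostname to."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not a plain IP address, skip the line
        if is_loopback and hostname in names:
            found.append(address)
    return found
```

Feeding it the contents of /etc/hosts together with the node's hostname returns an empty list when no such mapping exists.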

Cheers.

2015-07-10 12:11 GMT-03:00 Roger Hoover <ro...@gmail.com>:

> Hi Krzysztof,
>
> I haven't seen that error before.  It does sound like it could be a
> connection issue.  Did you check that the YARN node has access
> to hdfs:///user/samza/deploy/event-log-etl-nested-0.1.0-dist.tar.gz?
>
> One way to set the AM and containers to debug is to include a log4j.xml
> file in the lib folder of your tar.gz.  There is special logic in the start
> scripts
> (https://github.com/apache/samza/blob/master/samza-shell/src/main/bash/run-class.sh#L40)
> that checks for that path; it doesn't work with log4j.properties, for
> example.
>
> Cheers,
>
> Roger

Re: Samza job on YARN stuck Unassigned

Posted by Roger Hoover <ro...@gmail.com>.
Hi Krzysztof,

I haven't seen that error before.  It does sound like it could be a
connection issue.  Did you check that the YARN node has access
to hdfs:///user/samza/deploy/event-log-etl-nested-0.1.0-dist.tar.gz?

One way to set the AM and containers to debug is to include a log4j.xml
file in the lib folder of your tar.gz.  There is special logic in the start
scripts
(https://github.com/apache/samza/blob/master/samza-shell/src/main/bash/run-class.sh#L40)
that checks for that path; it doesn't work with log4j.properties, for
example.
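
As an illustration of the above, here is a minimal sketch, assuming the job uses log4j 1.x and the package is a gzip-compressed tar. LOG4J_XML is one possible DEBUG configuration (the console appender and its pattern are illustrative choices, not Samza's defaults), and has_lib_log4j_xml is a hypothetical helper that checks whether a built package actually contains lib/log4j.xml.

```python
import tarfile

# Minimal log4j 1.x XML configuration with the root logger at DEBUG.
# Appender name and conversion pattern are illustrative assumptions.
LOG4J_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
  <appender name="console" class="org.apache.log4j.ConsoleAppender">
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d{ISO8601} [%t] %-5p %c - %m%n"/>
    </layout>
  </appender>
  <root>
    <level value="DEBUG"/>
    <appender-ref ref="console"/>
  </root>
</log4j:configuration>
"""


def has_lib_log4j_xml(package_path: str) -> bool:
    """Return True if the tar.gz at package_path contains lib/log4j.xml."""
    with tarfile.open(package_path, "r:gz") as archive:
        for name in archive.getnames():
            # Member names may be stored with a leading "./".
            if name.startswith("./"):
                name = name[2:]
            if name == "lib/log4j.xml":
                return True
    return False
```

Running has_lib_log4j_xml against the dist tar.gz before uploading it to HDFS shows whether the file ended up where the start script looks for it.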

Cheers,

Roger


