Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/04/18 19:07:59 UTC

[jira] [Comment Edited] (TEZ-2338) Tez job failed due to AM Container-Launch failure at windows

    [ https://issues.apache.org/jira/browse/TEZ-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501475#comment-14501475 ] 

Hitesh Shah edited comment on TEZ-2338 at 4/18/15 5:07 PM:
-----------------------------------------------------------

[~KaveenBigdata] Can you confirm that a mapreduce job works seamlessly on the multi-node setup? I am guessing this might be a setup issue for Yarn itself and not an issue with Tez as the ApplicationMaster itself is failing to launch. 


was (Author: hitesh):
[~KaveenBigdata] Can you confirm that a mapreduce job works seamlessly? I am guessing this might be a setup issue for Yarn itself and not an issue with Tez as the ApplicationMaster itself is failing to launch. 
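
A note on the exit code quoted in the issue below: -1073741515 is the signed 32-bit form of a Windows NTSTATUS value. The sketch below shows the conversion. The function names and the small lookup table are illustrative assumptions, not part of Hadoop or Tez. Reinterpreted as unsigned, the value is 0xC0000135 (STATUS_DLL_NOT_FOUND), which suggests the process started for the AM container could not load a DLL it depends on. The log does not say which one.

```python
# Reinterpret a signed 32-bit Windows exit code as an unsigned NTSTATUS value.
# KNOWN_NTSTATUS is a deliberately small, illustrative table, not a full list.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
}


def to_ntstatus(exit_code: int) -> int:
    """Return the unsigned 32-bit bit pattern of a signed exit code."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code: int) -> str:
    """Format the exit code as hex, with a symbolic name if it is in the table."""
    status = to_ntstatus(exit_code)
    name = KNOWN_NTSTATUS.get(status, "unknown")
    return f"0x{status:08X} ({name})"


if __name__ == "__main__":
    # Exit code taken from the container-launch diagnostics in the issue.
    print(describe_exit_code(-1073741515))
```

Running the same check on each NodeManager host's failing exit code is a quick way to tell a missing-library problem apart from an application-level failure.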

> Tez job failed due to AM Container-Launch failure at windows
> ------------------------------------------------------------
>
>                 Key: TEZ-2338
>                 URL: https://issues.apache.org/jira/browse/TEZ-2338
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>         Environment: Windows server 2012 and Windows-8
> Hadoop-2.5.2
> Java-1.7
>            Reporter: Kaveen Raajan
>
> I successfully built Tez-0.6.0 against Hadoop-2.5.2.
> Then I configured Tez-0.6.0 as described at http://tez.apache.org/install.html
> I moved the Tez lib package to an HDFS location and updated my tez-site.xml:
> {code:xml}
>  <property>
>    <name>tez.lib.uris</name>
>    <value>${fs.default.name}/apps/Tez/,${fs.default.name}/apps/Tez/lib/</value>
>  </property>
> {code}
> After that I tried the Tez sample job:
> _hadoop jar tez-examples-0.6.0.jar orderedwordcount <input> <output>_
> But I got the following error while running this command:
> {code}
> Running OrderedWordCount
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/C:/Hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/C:/Tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 15/04/15 10:47:57 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.6.0, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2015-04-15T01:13:02Z ]
> 15/04/15 10:48:00 INFO client.TezClient: Submitting DAG application with id: application_1429073725727_0005
> 15/04/15 10:48:00 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> 15/04/15 10:48:00 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://HA-Cluster/apps/Tez/,hdfs://HA-Cluster/apps/Tez/lib/
> 15/04/15 10:48:01 INFO client.TezClient: Stage directory /tmp/app/tez/staging doesn't exist and is created
> 15/04/15 10:48:01 INFO client.TezClient: Tez system stage directory hdfs://HA-cluster/tmp/app/tez/staging/.tez/application_1429073725727_0005 doesn't exist and is created
> 15/04/15 10:48:02 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1429073725727_0005, dagName=OrderedWordCount
> 15/04/15 10:48:03 INFO impl.YarnClientImpl: Submitted application application_1429073725727_0005
> 15/04/15 10:48:03 INFO client.TezClient: The url to track the Tez AM: http://syncserver34:8088/proxy/application_1429073725727_0005/
> 15/04/15 10:48:03 INFO client.DAGClientImpl: Waiting for DAG to start running
> 15/04/15 10:48:09 INFO client.DAGClientImpl: DAG completed. FinalState=FAILED
> OrderedWordCount failed with diagnostics: [Application application_1429073725727_0005 failed 2 times due to AM Container for appattempt_1429073725727_0005_000002 exited with  exitCode: -1073741515 due to: Exception from container-launch: ExitCodeException exitCode=-1073741515:
> ExitCodeException exitCode=-1073741515:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>         at org.apache.hadoop.util.Shell.run(Shell.java:455)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>         1 file(s) moved.
> Container exited with a non-zero exit code -1073741515
> .Failing this attempt.. Failing the application.]
> {code}
> The ResourceManager log shows:
> {code}
> 15/04/15 12:56:15 ERROR scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1429082271173_0001_01_000001
> java.lang.IllegalArgumentException: java.net.UnknownHostException: MasterNode
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:199)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:425)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:248)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:736)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:816)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:809)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:649)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:761)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:742)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.net.UnknownHostException: MasterNode
>         ... 19 more
> {code}
> The problem might be that, while connecting to the NodeManager, it is unable to handshake with the ResourceManager.
> If I run the same job on a single-node Hadoop cluster, it works correctly.
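
The ResourceManager log above reports java.net.UnknownHostException: MasterNode, so one thing worth checking on each machine is whether the cluster hostnames resolve at all. The sketch below is a minimal check under that assumption. The helper name `resolve_hosts` is hypothetical, and the host list should be replaced with the names actually used in the cluster configuration ("MasterNode" is the only one taken from the log).

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_hosts(
    hosts: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each hostname to its resolved IPv4 address, or None on failure."""
    results: Dict[str, Optional[str]] = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:
            # socket.gaierror (name lookup failure) is a subclass of OSError.
            results[host] = None
    return results


if __name__ == "__main__":
    # "MasterNode" comes from the ResourceManager log; add the other
    # hostnames from your own cluster configuration here.
    for host, addr in resolve_hosts(["MasterNode"]).items():
        print(f"{host}: {addr if addr is not None else 'UNRESOLVED'}")
```

If a name comes back unresolved on the ResourceManager host but resolves elsewhere, the hosts file or DNS setup on that machine is a likely place to look. This would also be consistent with the job working on a single-node cluster.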


