Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/01/07 23:04:37 UTC

[jira] [Resolved] (TEZ-1924) Tez AM does not register with AM with full FQDN causing jobs to fail in some environments

     [ https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah resolved TEZ-1924.
------------------------------
       Resolution: Fixed
    Fix Version/s: 0.5.4

Committed to master, branch 0.5 and branch 0.6. Thanks for your contribution [~ivanmi]

> Tez AM does not register with AM with full FQDN causing jobs to fail in some environments
> -----------------------------------------------------------------------------------------
>
>                 Key: TEZ-1924
>                 URL: https://issues.apache.org/jira/browse/TEZ-1924
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.2
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>             Fix For: 0.5.4
>
>         Attachments: TEZ-1924.2.patch, TEZ-20.patch
>
>
> Issue originally reported by [~Karam Singh].
> All OrderedWordCount, WordCount and Tez faultTolerance system tests failed with java.net.UnknownHostException.
> Interestingly, other Tez examples such as mrrsleep, randomwriter, randomtextwriter, sort, join_inner, join_outer, terasort and groupbyorderbymrrtest ran fine.
> One such failure is shown below:
> {code}
> RUNNING: /usr/lib/hadoop/bin/hadoop jar /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount "-DUSE_TEZ_SESSION=true" "-Dmapreduce.map.memory.mb=2048" "-Dtez.am.shuffle-vertex-manager.max-src-fraction=0" "-Dmapreduce.reduce.memory.mb=2048" "-Dmapreduce.framework.name=yarn-tez" "-Dtez.am.container.reuse.enabled=false" "-Dtez.am.log.level=DEBUG" "-Dmapreduce.map.java.opts=-Xmx1024m" "-Dtez.am.shuffle-vertex-manager.min-src-fraction=0" "-Dmapreduce.job.reduce.slowstart.completedmaps=0.01" "-Dmapreduce.reduce.java.opts=-Xmx1024m" "-Dtez.am.container.session.delay-allocation-millis=120000" /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 -generateSplitsInClient true
> 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
> 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
> 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
> 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
> 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging directory wasb://humb-tez1@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016 are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
> 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
> 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
> 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
> 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application application_1418977790315_0016
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, outputPath=/user/hrt_qa/Tez_CROutput_1
> 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, splitsDir=wasb://humb-tez1@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
> 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 20
> 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to get into ready state
> 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via proxy
> org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
> 	at org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
> 	at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
> 	at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> 	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> 	at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
> 	at com.sun.proxy.$Proxy24.getAMStatus(Unknown Source)
> 	at org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:337)
> 	... 14 more
> Caused by: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
> 	at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	... 16 more
> Caused by: java.net.UnknownHostException
> 	... 21 more
> ....................
> ....................
> Caused by: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "workernode1":59575; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
> 	at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:742)
> 	at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:400)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1452)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	... 16 more
> Caused by: java.net.UnknownHostException
> 	... 21 more
> 14/12/19 09:25:19 ERROR examples.OrderedWordCount: Error occurred when submitting/running DAGs
> java.lang.RuntimeException: TezSession has already shutdown
> 	at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:540)
> 	at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> 	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> 	at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/12/19 09:25:19 INFO examples.OrderedWordCount: Shutting down session
> 14/12/19 09:25:19 INFO client.TezSession: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1418977790315_0016
> 14/12/19 09:25:19 INFO client.TezSession: Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1418977790315_0016, yarnApplicationState=FINISHED, finalApplicationStatus=SUCCEEDED, trackingUrl=http://headnode0.humb-tez1-ssh.d5.internal.cloudapp.net:8088/proxy/application_1418977790315_0016/A
> 	at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
> 	at org.apache.tez.client.TezSession.stop(TezSession.java:281)
> 	at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:524)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> 	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> 	at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/12/19 09:25:19 INFO client.TezSession: Could not connect to AM, killing session via YARN, sessionName=OrderedWordCountSession, applicationId=application_1418977790315_0016
> 14/12/19 09:25:19 INFO impl.YarnClientImpl: Killed application application_1418977790315_0016
> java.lang.RuntimeException: TezSession has already shutdown
> 	at org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:540)
> 	at org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> 	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
> 	at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code} 
> Contents of /etc/hosts are:
> {code}
> 127.0.0.1 localhost
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> {code}
> and contents of resolv.conf are:
> {code}
> # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
> #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
> nameserver 168.63.129.16
> search humb-tez1-ssh.d5.internal.cloudapp.net
> {code}
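
The patch itself is not included in this message. As a rough illustration of the short-name versus FQDN distinction behind the error above (the client is handed "workernode1" rather than a fully qualified name), here is a minimal Java sketch. The class HostnameSketch and its methods are hypothetical and are not part of Tez or Hadoop. Whether the actual fix relies on InetAddress.getCanonicalHostName() is an assumption, and the dot-based check is only a heuristic (it would also accept a dotted IP address).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not taken from the TEZ-1924 patch.
final class HostnameSketch {

    // Heuristic: a name "looks" fully qualified if it has a dot that is
    // neither the first nor the last character. This does not query DNS.
    static boolean looksFullyQualified(String host) {
        if (host == null || host.isEmpty()) {
            return false;
        }
        int dot = host.indexOf('.');
        return dot > 0 && dot < host.length() - 1;
    }

    // Appends a search domain to a short name, similar in spirit to what a
    // resolv.conf "search" entry lets the resolver try. Names that already
    // look fully qualified are returned unchanged.
    static String qualify(String host, String searchDomain) {
        if (looksFullyQualified(host)) {
            return host;
        }
        return host + "." + searchDomain;
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        // getHostName() may return a short name such as "workernode1".
        System.out.println("host name:      " + local.getHostName());
        // getCanonicalHostName() asks the resolver for the fully qualified name;
        // it can fall back to the textual IP address if the lookup fails.
        System.out.println("canonical name: " + local.getCanonicalHostName());
        System.out.println("looks like FQDN: "
                + looksFullyQualified(local.getCanonicalHostName()));
    }
}
```

Running the main method on a node prints both names side by side, which is one way to check whether a host would advertise a short name that clients in a different DNS search domain cannot resolve.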



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)