Posted to mapreduce-user@hadoop.apache.org by Gäde, Sebastian <s1...@hft-leipzig.de> on 2014/05/13 12:52:57 UTC

all tasks failing for MR job on Hadoop 2.4

Hi,

I've set up a Hadoop 2.4 cluster with three nodes. The NameNode and ResourceManager are running on one node, the DataNodes and NodeManagers on the other two. All services start up without problems (as far as I can see), and the web UIs show all nodes as running.

However, I am not able to run MapReduce jobs:
yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 1000000
submits the job and it appears in the web UI, but its state is stuck in ACCEPTED. Instead I'm receiving messages:

14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_000000_0, Status : FAILED
14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_000001_0, Status : FAILED
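
For reference, one way to pull the container logs for the failed attempts (assuming log aggregation is enabled on the cluster; the application id follows from the attempt ids above):

    # list applications known to the ResourceManager, including finished/failed ones
    yarn application -list -appStates ALL

    # fetch the aggregated container logs for the job's application
    yarn logs -applicationId application_1399971492349_0004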


The log shows:

2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.cluster.temp.dir;  Ignoring.
2014-05-13 12:15:27,896 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2014-05-13 12:15:28,185 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2014-05-13 12:15:28,192 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1399971492349_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879)
2014-05-13 12:15:28,453 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client: Address change detected. Old: localhost/127.0.1.1:41395 New: localhost/127.0.0.1:41395
2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From hd-slave-172.ffm.telekom.de/164.26.155.172 to localhost:41395 failed on connection exception: java.net.ConnectException: Verbindungsaufbau abgelehnt; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
	at org.apache.hadoop.ipc.Client.call(Client.java:1414)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
	at com.sun.proxy.$Proxy9.getTask(Unknown Source)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)
Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
	... 4 more

I'm not sure about:
a) the 90-second gap between 12:13 and 12:15. I think I'm running into some kind of timeout, but I don't know how to find out what the system is doing during that time.
b) localhost:41395. I cannot find a daemon listening on that port using netstat (see the checks below). I suppose this is some kind of local IPC daemon which is also affected by a timeout?
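
For reference, this is the netstat check mentioned above plus a couple of name-resolution checks that seem relevant (port taken from the log):

    # is anything listening on the port the task is trying to reach?
    netstat -tlnp | grep 41395

    # how does this node resolve localhost and its own hostname?
    getent hosts localhost
    getent hosts $(hostname -f)
    cat /etc/hosts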

Any ideas?

Cheers
Seb.

Re: all tasks failing for MR job on Hadoop 2.4

Posted by Sebastian Gäde <s1...@hft-leipzig.de>.
Thanks for your feedback. No 'localhost' in the conf files...

However, since I'm not relying on DNS but on the hosts files on the nodes,
I found that one node was missing the entry mapping its own hostname to
its own IP address. Since I fixed that, MR jobs are working fine. :-)
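
For the record, the missing line was of this general shape, shown here with the slave node from the log as an example (the affected node and its address may of course differ):

    # /etc/hosts on each node: the node's own hostname must map to its real IP,
    # not to 127.0.0.1 or 127.0.1.1
    164.26.155.172   hd-slave-172.ffm.telekom.de   hd-slave-172

    # quick check that a node resolves its own name to the real address:
    getent hosts $(hostname -f)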

Cheers
Seb.

On 16.05.2014 04:07, Stanley Shi wrote:
> Please check your configuration files: is "localhost" mentioned anywhere?
> "localhost" should not be used if you are deploying a distributed cluster.
>
> Regards,
> *Stanley Shi,*

Re: all tasks failing for MR job on Hadoop 2.4

Posted by Stanley Shi <ss...@gopivotal.com>.
Please check your configuration files: is "localhost" mentioned anywhere?
"localhost" should not be used if you are deploying a distributed cluster.
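
Something like this is usually enough to spot it (paths are for a typical install and may differ on your nodes):

    # look for any hard-coded "localhost" in the Hadoop configuration
    # (/etc/hadoop/conf for packaged installs, $HADOOP_HOME/etc/hadoop for the tarball)
    grep -rn localhost /etc/hadoop/conf/*.xml

    # also make sure /etc/hosts on every node maps each hostname to its real IP
    grep -v '^#' /etc/hosts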

Regards,
*Stanley Shi,*




> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: localhost/127.0.0.1:41395. Already tried 9 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1000 MILLISECONDS)
> 2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.net.ConnectException: Call From
> hd-slave-172.ffm.telekom.de/164.26.155.172 to localhost:41395 failed on
> connection exception: java.net.ConnectException: Verbindungsaufbau
> abgelehnt (Connection refused); For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>         at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at
> org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1414)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>         at
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
>         at com.sun.proxy.$Proxy9.getTask(Unknown Source)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)
> Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt (Connection refused)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>         at
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
>         at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
>         at
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>         ... 4 more
>
> I'm not sure about:
> a) the 90-second gap between 12:13 and 12:15. I think I'm running into
> some kind of timeout, but I don't know how to find out what the system is
> doing during that time.
> b) localhost:41395. I cannot find a daemon listening on that port with
> netstat. I suppose this is some kind of local IPC daemon which is also
> affected by a timeout?
>
> Any ideas?
>
> Cheers
> Seb.
