Posted to common-user@hadoop.apache.org by Brandon Dimcheff <bd...@wieldim.com> on 2008/12/16 18:49:07 UTC

ReduceTask can't connect to itself

I'm having some trouble on one node of a 5-node cluster.  I can  
successfully run maps on all of them, but the reduce phase always  
stalls on one particular host.  It throws a connection refused  
exception when attempting to connect to itself to get the data from  
the map outputs.  The only difference between host5 and the other  
hosts that I can see is that on host5, its hostname resolves to  
127.0.0.1 instead of its external IP address.  I can't imagine that  
should prevent it from connecting to itself, however.  Has anyone else  
had a similar problem?  Is there a document somewhere that indicates  
the requirements for host name resolution for nodes in a cluster?

Thanks,
Brandon

Snippet of the log showing the reduce failing to copy data from itself on
host5.test:

2008-12-16 12:25:12,532 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0: Got 2 new map-outputs & number of known map outputs is 2
2008-12-16 12:25:12,532 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0 Scheduled 1 of 2 known outputs (0 slow hosts and 1 dup hosts)
2008-12-16 12:25:12,533 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0 copy failed: attempt_200812161640_0003_m_000003_0 from host5.test
2008-12-16 12:25:12,534 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection refused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1296)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1290)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:944)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1143)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1084)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:997)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:946)
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
	at java.net.Socket.connect(Socket.java:519)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:152)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
	at sun.net.www.http.HttpClient.New(HttpClient.java:306)
	at sun.net.www.http.HttpClient.New(HttpClient.java:323)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
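
For anyone checking the same symptom: the question above turns on what a node's hostname resolves to. The sketch below is one way to print that from Java. It is not part of Hadoop; the class name HostnameCheck and the method resolvesToLoopback are made up for illustration, and only the java.net.InetAddress calls are standard JDK.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not a Hadoop class): reports how a hostname resolves.
public class HostnameCheck {

    /** Returns true if any address the name resolves to is a loopback address. */
    static boolean resolvesToLoopback(String hostname) throws UnknownHostException {
        for (InetAddress addr : InetAddress.getAllByName(hostname)) {
            if (addr.isLoopbackAddress()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Use the name given on the command line, else this machine's own hostname.
        String name = args.length > 0
                ? args[0]
                : InetAddress.getLocalHost().getHostName();

        // Print every address the resolver returns and flag loopback ones.
        for (InetAddress addr : InetAddress.getAllByName(name)) {
            System.out.println(name + " -> " + addr.getHostAddress()
                    + (addr.isLoopbackAddress() ? " (loopback)" : ""));
        }
    }
}
```

Running it on each node with no arguments and comparing the output would show whether one host, like host5 here, maps its own name to 127.0.0.1 while the others map theirs to an external address.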

Re: ReduceTask can't connect to itself

Posted by Brandon Dimcheff <bd...@wieldim.com>.
Silly me... my processes were only bound to my external IPs.  :-/
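
In other words, the hostname resolved to 127.0.0.1, but nothing was listening on the loopback address, so the connection was refused. A rough way to confirm that kind of mismatch is to probe the same port on both addresses. The sketch below is only an illustration: PortProbe and canConnect are hypothetical names, and the host and port must be supplied by the caller (whatever address and HTTP port the daemon is configured to use).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not a Hadoop class): tests whether a TCP port accepts connections.
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java PortProbe <port> <host> [<host> ...]");
            return;
        }
        int port = Integer.parseInt(args[0]);
        // Probe the same port on every address given, e.g. 127.0.0.1 and the external IP.
        for (int i = 1; i < args.length; i++) {
            String host = args[i];
            System.out.println(host + ":" + port + " -> "
                    + (canConnect(host, port, 2000) ? "open" : "refused or unreachable"));
        }
    }
}
```

If the port is open on the external IP but not on 127.0.0.1, the daemon is bound only to the external interface, which matches what happened here.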
