Posted to common-user@hadoop.apache.org by Brandon Dimcheff <bd...@wieldim.com> on 2008/12/16 18:49:07 UTC
ReduceTask can't connect to itself
I'm having some trouble on one node of a 5-node cluster. I can
successfully run maps on all of them, but the reduce phase always
stalls on one particular host. It throws a connection refused
exception when attempting to connect to itself to get the data from
the map outputs. The only difference between host5 and the other
hosts that I can see is that on host5, its hostname resolves to
127.0.0.1 instead of its external IP address. I can't imagine that
should prevent it from connecting to itself, however. Has anyone else
had a similar problem? Is there a document somewhere that indicates
the requirements for host name resolution for nodes in a cluster?
Thanks,
Brandon
Log snippet from the reduce task failing to copy map output from itself
on host5.test:
2008-12-16 12:25:12,532 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0: Got 2 new map-outputs & number of known map outputs is 2
2008-12-16 12:25:12,532 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0 Scheduled 1 of 2 known outputs (0 slow hosts and 1 dup hosts)
2008-12-16 12:25:12,533 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0 copy failed: attempt_200812161640_0003_m_000003_0 from host5.test
2008-12-16 12:25:12,534 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection refused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1296)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1290)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:944)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1143)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1084)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:997)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:946)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.Socket.connect(Socket.java:519)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:152)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.http.HttpClient.New(HttpClient.java:323)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
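
For anyone comparing nodes the same way, here is a rough sketch of how one might check what a hostname resolves to on each machine. It is not part of Hadoop; the helper names resolve_host and resolves_to_loopback are made up for illustration, and it only looks at IPv4 results from the system resolver (so /etc/hosts entries are included).

```python
import ipaddress
import socket


def resolve_host(name):
    """Return the sorted IPv4 addresses the system resolver gives for `name`.

    Returns an empty list if the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


def resolves_to_loopback(name):
    """Return True if any resolved address lies in the loopback range."""
    return any(ipaddress.ip_address(addr).is_loopback
               for addr in resolve_host(name))
```

Calling resolves_to_loopback(socket.gethostname()) on each node would flag a host like host5 in the description above, where the node's own name maps to 127.0.0.1.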
Re: ReduceTask can't connect to itself
Posted by Brandon Dimcheff <bd...@wieldim.com>.
Silly me... my processes were only bound to my external IPs. :-/
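
A quick way to confirm this kind of mismatch is to probe the same port on both the loopback address and the external address. The sketch below is only an illustration: can_connect and probe are hypothetical helper names, and the port and external IP you pass in are whatever your own daemons are configured to use.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False


def probe(port, addresses):
    """Map each address to whether `port` accepts connections there."""
    return {addr: can_connect(addr, port) for addr in addresses}
```

If probe(port, ["127.0.0.1", external_ip]) reports False for 127.0.0.1 and True for the external IP, the daemon is listening only on the external interface, so a node whose hostname resolves to 127.0.0.1 will be refused when it connects to itself.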