Posted to common-user@hadoop.apache.org by Rong-en Fan <gr...@gmail.com> on 2008/06/30 13:30:27 UTC

DataXceiver: java.io.IOException: Connection reset by peer

Hi,

I'm using Hadoop 0.17.1 with HBase trunk, and I notice lots of exceptions
in Hadoop's logs (it's a 3-node HDFS):

2008-06-30 19:27:45,760 ERROR org.apache.hadoop.dfs.DataNode: 192.168.23.1:50010:DataXceiver: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
        at java.lang.Thread.run(Thread.java:619)

It seems to me that the datanode cannot handle the incoming traffic.
If so, what parameters in the Hadoop site configuration and/or the OS
(I'm using RHEL 4) can I play with?

Thanks,
Rong-En Fan

Re: DataXceiver: java.io.IOException: Connection reset by peer

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
The difference is that the same client behaviour results in a different 
exception in 0.16 and 0.17, because of the change to NIO sockets. The 
current code ignores SocketException, but with NIO sockets we just get a 
plain IOException. I will file a JIRA to avoid these error messages.

For now these can be ignored.
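To make the mechanics concrete: this is roughly the filtering situation as I understand it (the class and method below are hypothetical, not the actual datanode code). A check that only ignores SocketException catches the exception a blocking-socket reset used to raise, but a reset surfaced by an NIO SocketChannel write arrives as a plain IOException and slips through to the ERROR log:

```java
import java.io.IOException;
import java.net.SocketException;

public class ExceptionFilter {
    // Hypothetical filter mirroring the pre-NIO behaviour described above:
    // only SocketException (what a blocking-socket reset surfaces as) is
    // treated as ignorable.
    static boolean ignored(IOException e) {
        return e instanceof SocketException;
    }

    public static void main(String[] args) {
        // 0.16 (blocking sockets): reset arrived as SocketException -> dropped
        System.out.println(ignored(new SocketException("Connection reset")));
        // 0.17 (NIO SocketChannel): reset arrives as a plain IOException,
        // slips past the filter, and ends up logged at ERROR
        System.out.println(ignored(new IOException("Connection reset by peer")));
    }
}
```

(Note SocketException extends IOException, so the instanceof test is the narrowest check that still matched in 0.16.)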

Raghu.

Brian Karlak wrote:
> 
>> 2008-06-30 19:27:45,760 ERROR org.apache.hadoop.dfs.DataNode: 192.168.23.1:50010:DataXceiver: java.io.IOException: Connection reset by peer
> 
> [...]


Re: DataXceiver: java.io.IOException: Connection reset by peer

Posted by Brian Karlak <ze...@metaweb.com>.
> 2008-06-30 19:27:45,760 ERROR org.apache.hadoop.dfs.DataNode: 192.168.23.1:50010:DataXceiver: java.io.IOException: Connection reset by peer


Hello All --

We also see this behavior.  The Hadoop infrastructure appears to 
handle these exceptions, insofar as the jobs still complete 
normally, but it is disconcerting to see so many exceptions popping up 
in the logs.

This behavior appears to have started as soon as we upgraded to  
0.17.0.  It is still occurring in yesterday's 0.17.1 release.  I have  
not been able to reproduce it in the 0.16.4 or 0.16.3 releases.

I'm a bit of a noob, but I wonder if it is possibly related to 
HADOOP-2346, the introduction of timeouts on socket writes?  Are there 
any parameters to alter the timeout behavior, or is the timeout 
hardcoded?
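Digging through the HADOOP-2346 patch, the timeouts look settable from hadoop-site.xml. These are the property names and defaults as I read them (unverified on our clusters, so please double-check against your release; 0 reportedly disables the write timeout):

```xml
<!-- hadoop-site.xml: names as I read them from HADOOP-2346; unverified -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- datanode write timeout in milliseconds; 0 disables it -->
  <value>480000</value>
</property>
<property>
  <name>dfs.socket.timeout</name>
  <!-- read timeout in milliseconds -->
  <value>60000</value>
</property>
```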

We are also investigating HADOOP-3051 as a possible factor,  
considering that the base exception is being raised in the sun.nio.ch  
package.

This issue is consistent and reproducible in both of our clusters.  It 
appears to occur with high-I/O-load jobs.  For instance, it occurs on 
both our current production cluster and our new 3-node 
cluster whenever we run the "sort" test in the example jobs.  It does 
NOT occur when running the "pi" test.

Any clues or leads would be most appreciated.

Thanks,
Brian

On Jun 30, 2008, at 4:30 AM, Rong-en Fan wrote:

> Hi,
>
> I'm using Hadoop 0.17.1 with HBase trunk, and I notice lots of exceptions
> in Hadoop's logs (it's a 3-node HDFS):
>
> 2008-06-30 19:27:45,760 ERROR org.apache.hadoop.dfs.DataNode: 192.168.23.1:50010:DataXceiver: java.io.IOException: Connection reset by peer
> [...]


Re: DataXceiver: java.io.IOException: Connection reset by peer

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Raghu Angadi wrote:
> 
> This is mostly harmless. We should remove these messages. What most 
> likely happened is that the client opens a file to read x bytes and 
> then closes the connection after reading x bytes, but the datanode does 
> not know that (while using the normal read() interface).
> 
> Please file a JIRA to get rid of this message. It is really confusing to 
> the user and pollutes the log file.

Filed HADOOP-3678.

Raghu.

Re: DataXceiver: java.io.IOException: Connection reset by peer

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
This is mostly harmless. We should remove these messages. What most 
likely happened is that the client opens a file to read x bytes and 
then closes the connection after reading x bytes, but the datanode does 
not know that (while using the normal read() interface).
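To see what that early close looks like at the socket level, here is a minimal stand-alone sketch with plain java.net sockets, no Hadoop involved (class and method names are mine, purely for illustration): the "server" plays the datanode streaming a block, the "client" reads only the first few bytes and closes, and the server's subsequent writes fail with an IOException much like the one in the log above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class ResetDemo {
    // Returns true if the "server" write eventually fails with an
    // IOException after the client closes early, mimicking what the
    // datanode's BlockSender sees.
    static boolean serverWriteFailsAfterEarlyClose() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort());
                     InputStream in = s.getInputStream()) {
                    in.read(new byte[16]);  // read only the first x bytes...
                } catch (IOException ignored) {
                }   // ...then close; unread data makes the OS send a reset
            });
            client.start();
            try (Socket s = server.accept();
                 OutputStream out = s.getOutputStream()) {
                byte[] chunk = new byte[64 * 1024];
                for (int i = 0; i < 1024; i++) {  // keep streaming "the block"
                    out.write(chunk);
                    out.flush();
                }
            } catch (IOException expected) {
                // e.g. "Connection reset by peer" or "Broken pipe"
                return true;
            } finally {
                client.join();
            }
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serverWriteFailsAfterEarlyClose());
    }
}
```

The datanode side has no way to distinguish this deliberate early close from a genuine network failure, which is why the message ends up at ERROR level.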

Please file a JIRA to get rid of this message. It is really confusing to 
the user and pollutes the log file.

Raghu.

Rong-en Fan wrote:
> Hi,
> 
> I'm using Hadoop 0.17.1 with HBase trunk, and I notice lots of exceptions
> in Hadoop's logs (it's a 3-node HDFS):
> 
> 2008-06-30 19:27:45,760 ERROR org.apache.hadoop.dfs.DataNode: 192.168.23.1:50010:DataXceiver: java.io.IOException: Connection reset by peer
> [...]