Posted to hdfs-user@hadoop.apache.org by Ferdy Galema <fe...@kalooga.com> on 2011/04/12 19:37:40 UTC

socket timeouts, dropped packets

Hi,

We're running into issues where we are seeing timeouts when 
writing/reading a lot of HDFS data. (Hadoop is version CDH4B3 and HDFS 
appending is enabled.) The type of exceptions varies a lot, but most of 
the time they occur when a DFSClient writes data into the datanode 
pipeline.

For example, one datanode logs "Exception in receiveBlock for block 
blk_5476601577216704980_62953994 java.io.EOFException: while trying to 
read 65557 bytes" and the other side logs "writeBlock 
blk_5476601577216704980_62953994 received exception 
java.net.SocketTimeoutException: Read timed out". That's it.

We cannot seem to determine the exact problem. The read timeout is the 
default (60 sec). The open-files limit and the number of xceivers have 
been raised considerably. A full GC never takes longer than a second.
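(For readers tuning the same settings: the read and write timeouts live in hdfs-site.xml. A minimal illustrative fragment, assuming Hadoop 0.20-era property names; the values shown are the defaults, not recommendations:

```xml
<!-- hdfs-site.xml: illustrative values only -->
<property>
  <name>dfs.socket.timeout</name>
  <!-- read timeout in milliseconds; 60000 ms is the 60 sec default mentioned above -->
  <value>60000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- write-side socket timeout in milliseconds -->
  <value>480000</value>
</property>
```
)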

However, we are seeing a lot of dropped packets on the network 
interface. Could these problems be related?
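(The per-interface drop counters mentioned above can be read from /proc/net/dev on Linux. A minimal sketch, not part of the original thread, that pulls out the rx/tx drop columns:

```python
def parse_proc_net_dev(text):
    """Parse /proc/net/dev contents into {iface: (rx_dropped, tx_dropped)}."""
    drops = {}
    for line in text.splitlines()[2:]:  # first two lines are column headers
        iface, _, stats = line.partition(":")
        fields = stats.split()
        # 16 counters per interface: rx bytes/packets/errs/drop/... then
        # tx bytes/packets/errs/drop/...; drop counters are fields 3 and 11
        drops[iface.strip()] = (int(fields[3]), int(fields[11]))
    return drops

# On a live box you would read the real file:
#   with open("/proc/net/dev") as f:
#       print(parse_proc_net_dev(f.read()))
```

A drop counter that climbs while the pipeline errors occur would support the networking theory.)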

Any advice will be helpful.

Ferdy.

Re: socket timeouts, dropped packets

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Please keep Cloudera issues off this list.

On Apr 13, 2011, at 12:22 PM, Eli Collins wrote:

> [...]


Re: socket timeouts, dropped packets

Posted by Allen Wittenauer <aw...@apache.org>.
On Apr 13, 2011, at 1:06 PM, Ferdy Galema wrote:
> 
> @Allen/Arun
> My bad, I was not aware that Cloudera releases could not be discussed here at all. I was thinking that even though Cloudera releases are somewhat different, issues that are probably generic could still be discussed here. (Surely I would use the Cloudera lists when I'm pretty sure it's absolutely specific to Cloudera.)

Unfortunately, all of the various forks of the Apache releases (regardless of where they come from) have diverged enough that the issues are rarely generic anymore, outside of those answered on the FAQ. :(


Re: socket timeouts, dropped packets

Posted by Ferdy Galema <fe...@kalooga.com>.
Hey,

Thanks for replying. By now we are also pretty sure it's an issue in the 
hardware layer. We have updated the system (kernel/NIC drivers), thereby 
eliminating any possible bugs there. But we are still encountering 
timeouts and dropped packets.

@Allen/Arun
My bad, I was not aware that Cloudera releases could not be discussed 
here at all. I was thinking that even though Cloudera releases are 
somewhat different, issues that are probably generic could still be 
discussed here. (Surely I would use the Cloudera lists when I'm pretty 
sure it's absolutely specific to Cloudera.)

Anyway, I will update the list when we have figured out the problem. The 
right list, cdh-user ;)

Ferdy.

On 04/13/2011 09:22 PM, Eli Collins wrote:
> [...]

Re: socket timeouts, dropped packets

Posted by Eli Collins <el...@cloudera.com>.
Hey Ferdy,

If you're seeing this after bumping dfs.datanode.max.xcievers and the
nfiles ulimit, and you're also seeing dropped packets, it sounds like
you're having networking issues.
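(For anyone tuning the same limits: the xceiver cap is a datanode property in hdfs-site.xml, while the nfiles limit is the datanode user's `ulimit -n`. An illustrative fragment, assuming the Hadoop 0.20-era property name; the value is an example, not a recommendation:

```xml
<property>
  <!-- maximum concurrent block-transfer threads per datanode;
       the "xcievers" misspelling is part of the historical property name -->
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```
)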

See the following as well:
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/1d3a377bd605e1bd/d3d8ec0d14c065bb?#d3d8ec0d14c065bb

Thanks,
Eli

On Tue, Apr 12, 2011 at 10:37 AM, Ferdy Galema <fe...@kalooga.com> wrote:
> [...]

Re: socket timeouts, dropped packets

Posted by Allen Wittenauer <aw...@apache.org>.
On Apr 12, 2011, at 10:37 AM, Ferdy Galema wrote:
> We're running into issues where we are seeing timeouts when writing/reading a lot of HDFS data. (Hadoop is version CDH4B3 and HDFS appending is enabled.) 


....

> Any advice will be helpful.

	You should ask Cloudera since you are running their fork of Apache Hadoop.