Posted to common-user@hadoop.apache.org by Raghu Angadi <an...@gmail.com> on 2010/01/05 23:12:40 UTC

Re: time outs when accessing port 50010

On Mon, Dec 21, 2009 at 11:57 AM, dave bayer <da...@cloudfactory.org> wrote:

>
> On Nov 25, 2009, at 11:27 AM, David J. O'Dell wrote:
>
>> I've intermittently seen the following errors on both of my clusters; it
>> happens when writing files.
>> I was hoping this would go away with the new version, but I see the same
>> behavior on both versions.
>> The namenode logs don't show any problems; it's always on the client and
>> datanodes.
>>
>
> [leaving errors below for reference]
>
> I've seen similar errors on my 0.19.2 cluster when the cluster is decently
> busy. I've traced this more or less to the host in question doing
> verification on its blocks, an operation that seems to take the datanode
> out of service for upwards of 500 seconds in some cases.
>
>
The locking issue you describe is fixed in 0.21
(https://issues.apache.org/jira/browse/HADOOP-4584).

But the original issue reported is _not_ related to datanode locking: for
that user, connect() itself times out, which has nothing to do with
FSDataset locking during block report processing.
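
To make the distinction concrete, here is a minimal sketch in plain java.net
(not Hadoop code; the address and the timeout values are just placeholders
copied from the logs below) showing the two different timeouts that appear in
this report:

import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutKinds {
  public static void main(String[] args) throws Exception {
    Socket s = new Socket();
    try {
      // "timeout while waiting for channel to be ready for connect":
      // the TCP connection never gets established, so no server-side
      // lock (FSDataset or otherwise) is involved yet.
      s.connect(new InetSocketAddress("10.1.75.104", 50010), 120000);

      // "timeout while waiting for channel to be ready for read":
      // the connection exists but the peer sends nothing in time --
      // this is where a remote thread stuck on a lock could hurt.
      s.setSoTimeout(69000);
      InputStream in = s.getInputStream();
      in.read();
    } catch (SocketTimeoutException e) {
      System.err.println("timed out: " + e.getMessage());
    } finally {
      s.close();
    }
  }
}

The datanode log below shows the first kind of timeout, which is why the
FSDataset lock is not the suspect there.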

Raghu.

> In 0.19.2, if you look at o.a.h.hdfs.server.datanode.FSDataset.FSVolumeSet,
> you will see that all of its methods are synchronized. All operations on the
> node's dataset seem to drop through methods in this class, which causes a
> backup whenever one thread holds the monitor for a long time...
>
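To make that contention pattern concrete, here is a toy sketch -- not the real
FSDataset/FSVolumeSet code, just the same shape: every method synchronizes on
one shared object, so a single slow caller stalls even the cheapest calls.

public class CoarseLockDemo {
  static class VolumeSet {
    synchronized void verifyBlocks() throws InterruptedException {
      Thread.sleep(5000);              // stands in for a long block scan
    }
    synchronized long getAvailable() { // cheap call, but same monitor
      return 42L;
    }
  }

  public static void main(String[] args) throws Exception {
    final VolumeSet volumes = new VolumeSet();
    Thread scanner = new Thread(new Runnable() {
      public void run() {
        try {
          volumes.verifyBlocks();
        } catch (InterruptedException ignored) {
        }
      }
    });
    scanner.start();
    Thread.sleep(100);                 // let the scanner grab the monitor

    long start = System.currentTimeMillis();
    volumes.getAvailable();            // queues behind verifyBlocks()
    System.out.println("cheap call waited "
        + (System.currentTimeMillis() - start) + " ms");
  }
}

On a datanode that spends upwards of 500 seconds inside one such call, every
thread that touches the dataset queues up the same way.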
> You can grab a few jstack dumps and use a dump analyzer (such as
> https://tda.dev.java.net/) to poke through them and see whether you are
> hitting the same behavior.
>
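If you want a quick in-process check instead of feeding dumps to TDA, roughly
the same question -- how many threads are parked on one monitor? -- can be
asked through java.lang.management. This is only a sketch and assumes you can
run it inside the JVM you want to inspect; jstack against the datanode pid is
usually the simpler route.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class BlockedThreadReport {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    ThreadInfo[] infos = mx.dumpAllThreads(false, false);

    // Count BLOCKED threads per contended lock, much like what you
    // would eyeball in a jstack dump or in TDA.
    Map<String, Integer> blockedPerLock = new HashMap<String, Integer>();
    for (ThreadInfo info : infos) {
      if (info != null
          && info.getThreadState() == Thread.State.BLOCKED
          && info.getLockName() != null) {
        Integer n = blockedPerLock.get(info.getLockName());
        blockedPerLock.put(info.getLockName(), n == null ? 1 : n + 1);
      }
    }
    for (Map.Entry<String, Integer> e : blockedPerLock.entrySet()) {
      System.out.println(e.getValue() + " thread(s) blocked on " + e.getKey());
    }
  }
}
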
> I have not spent enough time digging into this to understand whether the
> whole dataset really needs to be locked during the operation or if the locks
> could be moved closer to the FSDir operations.
>
> dave bayer
>
> original log clips included here:
>
>
>> Client log:
>> 09/11/25 10:54:15 INFO hdfs.DFSClient: Exception in
>> createBlockOutputStream java.net.SocketTimeoutException: 69000 millis
>> timeout while waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10.1.75.11:37852 remote=/10.1.75.125:50010]
>> 09/11/25 10:54:15 INFO hdfs.DFSClient: Abandoning block
>> blk_-105422935413230449_22608
>> 09/11/25 10:54:15 INFO hdfs.DFSClient: Waiting to find target node:
>> 10.1.75.125:50010
>>
>> Datanode log:
>> 2009-11-25 10:54:51,170 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.1.75.125:50010,
>> storageID=DS-1401408597-10.1.75.125-50010-1258737830230, infoPort=50075,
>> ipcPort=50020):DataXceiver
>> java.net.SocketTimeoutException: 120000 millis timeout while waiting for
>> channel to be ready for connect. ch :
>> java.nio.channels.SocketChannel[connection-pending remote=/10.1.75.104:50010]
>>      at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>>      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>>      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:282)
>>      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>>      at java.lang.Thread.run(Thread.java:619)
>>
>
>