Posted to common-user@hadoop.apache.org by Yogini Gulkotwar <yo...@flutura.com> on 2014/02/21 10:47:32 UTC

Datanodes going out of reach in Hadoop

Hello,
I am working with a 5-node Hadoop cluster. HDFS is on a shared NFS
directory of 98 TB.
When we view the namenode UI, the following is displayed:

Node        Last Contact  Admin State  Configured Capacity (TB)  Used (TB)  Non DFS Used (TB)  Remaining (TB)  Used (%)  Remaining (%)  Blocks  Block Pool Used (TB)  Block Pool Used (%)  Failed Volumes
Datanode1   0             In Service   97.39                     1.83       38.04              57.52           1.88      59.06          80653   1.83                  1.88                 0
Datanode2   1             In Service   97.39                     1.18       38.69              57.52           1.21      59.06          54536   1.18                  1.21                 0
Datanode3   0             In Service   97.39                     1.61       38.26              57.52           1.65      59.06          66902   1.61                  1.65                 0
Datanode4   2             In Service   97.39                     0.65       39.22              57.52           0.67      59.06          32821   0.65                  0.67                 0
Datanode5   2             In Service   97.39                     0.58       39.29              57.52           0.6       59.06          29278   0.58                  0.6                  0

As can be seen, each datanode reports the entire 98 TB as its own configured
capacity, and three of the datanodes (1, 2, 3) hold comparatively more data.
Because of this, the balancer command does not help (see the sketch below).
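
My understanding of why the balancer sees nothing to move is sketched below in
Python. This is only the default-threshold arithmetic (assuming "hdfs balancer
-threshold 10" and the numbers from the UI above), not the actual balancer code:

    # Every node reports the same shared NFS capacity, so the DFS-used
    # percentages are all within a fraction of a point of the cluster average.
    capacity_tb = 97.39
    dfs_used_tb = {
        "Datanode1": 1.83,
        "Datanode2": 1.18,
        "Datanode3": 1.61,
        "Datanode4": 0.65,
        "Datanode5": 0.58,
    }
    threshold = 10.0  # default -threshold, in percentage points

    cluster_avg = sum(dfs_used_tb.values()) / (capacity_tb * len(dfs_used_tb)) * 100
    for node, used in dfs_used_tb.items():
        util = used / capacity_tb * 100
        delta = util - cluster_avg
        status = "needs balancing" if abs(delta) > threshold else "within threshold"
        print(f"{node}: {util:.2f}% used, {delta:+.2f} points from avg {cluster_avg:.2f}% -> {status}")

    # All deltas are under 1 percentage point, far below the 10-point threshold,
    # so the balancer treats the cluster as already balanced and moves nothing.

So the imbalance is real in absolute terms, but the shared capacity figure hides
it from the balancer.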

In recent times, I have come across a strange issue. The three
datanodes with more data go out of reach of the namenode (at different
times).
That is, the datanode services are still running, but the "Last Contact"
column in the above table reports a high value and, after a while, the namenode
reports the node as DEAD.
Within 10 minutes or so, the datanode goes LIVE again.
I went through the logs but couldn't find any errors.
I also tried increasing the ulimit on these datanodes, but in vain.
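
If I understand the defaults correctly, the roughly 10-minute window matches the
namenode's dead-node timeout, which comes from two properties (assuming we are
still on their default values; a quick sketch of the arithmetic in Python):

    # Time a datanode may stay silent before the namenode marks it DEAD:
    #   2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
    heartbeat_recheck_interval_ms = 300_000  # dfs.namenode.heartbeat.recheck-interval default
    heartbeat_interval_s = 3                 # dfs.heartbeat.interval default

    timeout_s = 2 * (heartbeat_recheck_interval_ms / 1000) + 10 * heartbeat_interval_s
    print(timeout_s / 60)  # 10.5 minutes, which lines up with the DEAD/LIVE cycle we see

So the real question seems to be why heartbeats from the loaded datanodes stop
reaching the namenode for that long in the first place.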

Is there something that needs to be done to overcome this issue?

Any configuration changes? Any help would be appreciated.

Thanks & Regards,

Yogini Gulkotwar | Data Scientist
Flutura Business Solutions Private Limited
BANGALORE