Posted to common-user@hadoop.apache.org by "Kumar, Amit H." <AH...@odu.edu> on 2009/03/05 21:53:01 UTC

Live Datanodes only 1; all the time

Hi All,

Very interesting behavior:

http://machine2.xxx.xxx.xxx:50070/dfshealth.jsp shows only one live node, and every time I refresh the page a different node shows up as the live one. The JobTracker, however, shows 8 nodes in its cluster summary.
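For reference, the equivalent command-line check, which should list every datanode the NameNode currently considers live, is:

<snip>
# Ask the NameNode for a datanode report; the live/dead counts
# should match what the dfshealth.jsp page shows.
bin/hadoop dfsadmin -report
</snip>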

Any idea what could be going on here, given the detailed setup I describe below?

I am trying to configure Hadoop as follows:

Cluster Setup: Version 0.18.3
1)      I want every user working on the login nodes of our cluster to have their own config dir, so I edited the following in $HADOOP_HOME/conf/hadoop-env.sh:
HADOOP_CONF_DIR=$HOME/hadoop/conf
and similarly HADOOP_LOG_DIR=$HOME/hadoop/logs (a fuller sketch follows the note below).
Note: $HADOOP_HOME is a shared NFS Hadoop install folder on the cluster head node. There are three login nodes for our cluster, excluding the head node; the head node itself is inaccessible to users.
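For clarity, a rough sketch of what I set in the shared $HADOOP_HOME/conf/hadoop-env.sh (same values as above):

<snip>
# hadoop-env.sh: give each user a private config and log directory
# instead of the conf/ and logs/ dirs of the shared NFS install.
export HADOOP_CONF_DIR=$HOME/hadoop/conf
export HADOOP_LOG_DIR=$HOME/hadoop/logs
</snip>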

2)      Every user will have their own 'masters' and 'slaves' files under their $HADOOP_CONF_DIR (an example of both files is right below).
a.      With this setup, when I removed the masters file from $HADOOP_HOME, it complained that it could not start the SecondaryNameNode. I therefore put back a 'masters' file with an entry for our login node, and now the SecondaryNameNode starts without any error.
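For example, my per-user files look something like this (the hostnames are placeholders, not our real machine names):

<snip>
$ cat $HOME/hadoop/conf/masters
login1.example.edu      # one of the three login nodes; runs the SecondaryNameNode

$ cat $HOME/hadoop/conf/slaves
compute01.example.edu
compute02.example.edu
...                     # eight compute nodes in total
</snip>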

3)      As a user, I chose one of the login boxes as the entry in my $HOME/hadoop/conf/masters file; the 'slaves' file lists a few compute nodes.
4)      I don't see any errors when I start the Hadoop daemons using start-dfs.sh and start-mapred.sh.
5)      Only when I try to put files onto HDFS with 'bin/hadoop fs -put conf input' does it complain, as shown in the snip section below (the exact command sequence is sketched right after this list).
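Concretely, the sequence from steps 4 and 5 is roughly:

<snip>
# Start HDFS (NameNode, SecondaryNameNode, DataNodes), then MapReduce
# (JobTracker, TaskTrackers); neither script reports any errors.
bin/start-dfs.sh
bin/start-mapred.sh

# Copy the conf directory into HDFS as 'input'; this is the step that fails:
bin/hadoop fs -put conf input
</snip>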

NOTE: "grep ERROR *" in the logs directory returned no results!

Do any of the error messages below ring a bell? Please help me understand what I could be doing wrong.

Thank you,
Amit



<snip>

[ahkumar@machine2 ~/hadoop]$ $hbin/hadoop fs -put conf input
09/03/05 15:20:26 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ahkumar/input/hadoop-metrics.properties could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

        at org.apache.hadoop.ipc.Client.call(Client.java:716)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)

09/03/05 15:20:26 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/ahkumar/input/hadoop-metrics.properties retries left 4
09/03/05 15:20:26 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ahkumar/input/hadoop-metrics.properties could only be replicated to 0 nodes, instead of 1
        <... same as above>

09/03/05 15:20:26 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/ahkumar/input/hadoop-metrics.properties retries left 3
09/03/05 15:20:27 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ahkumar/input/hadoop-metrics.properties could only be replicated to 0 nodes, instead of 1
        <... same as above>

09/03/05 15:20:27 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/ahkumar/input/hadoop-metrics.properties retries left 2
09/03/05 15:20:29 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ahkumar/input/hadoop-metrics.properties could only be replicated to 0 nodes, instead of 1
        <... same as above>

09/03/05 15:20:29 WARN dfs.DFSClient: NotReplicatedYetException sleeping /user/ahkumar/input/hadoop-metrics.properties retries left 1
09/03/05 15:20:32 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ahkumar/input/hadoop-metrics.properties could only be replicated to 0 nodes, instead of 1
        <... same as above>

09/03/05 15:20:32 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
put: Could not get block locations. Aborting...
Exception closing file /user/ahkumar/input/hadoop-metrics.properties
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)

</snip>