Posted to hdfs-user@hadoop.apache.org by "Xu, Richard " <ri...@citi.com> on 2011/05/31 16:14:22 UTC

Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Hi Folks,

We have already asked this question on the common-user Hadoop mailing list, but it has not been resolved for a week.

We are trying to get HBase and Hadoop running on a cluster, using 2 Solaris servers (we also tried 1 Linux, 1 Solaris) for now.

Because of the incompatibility between HBase and Hadoop releases, we have to stick with the hadoop-0.20.2-append release.

It was very straightforward to get hadoop-0.20.203 running, but we have been stuck for several days with hadoop-0.20.2, even with the official release rather than the append version.

1. Once we try to run start-mapred.sh (hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker), the following errors show up in the namenode and jobtracker logs:

2011-05-26 12:30:29,169 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
       at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:396)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


2. Also, Configured Capacity is 0 and we cannot put any file into HDFS (a minimal client sketch that reproduces this is included after item 3).

3. On the datanode server there are no errors in the datanode logs, but the tasktracker log has the following suspicious entries:
2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 41904: starting
2011-05-25 23:36:10,852 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 41904: starting
                                       .....
2011-05-25 23:36:10,855 INFO org.apache.hadoop.ipc.Server: IPC Server handler 63 on 41904: starting
2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:41904
2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_loanps3d:localhost/127.0.0.1:41904
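
For reference, a minimal client sketch like the one below should reproduce the same "could only be replicated to 0 nodes" failure outside the JobTracker (it assumes fs.default.name in core-site.xml points at our NameNode on port 9000; the output path is just an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: write one small file to HDFS. On a cluster where the
// datanode reports zero capacity, block allocation should fail with the
// same IOException the JobTracker hits when writing jobtracker.info.
public class HdfsPutTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // reads core-site.xml/hdfs-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/tmp/put-test.txt")); // example path
    out.writeBytes("hello hdfs\n");
    out.close(); // block allocation/replication errors surface here if not earlier
    fs.close();
  }
}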


I have tried all the suggestions found so far, including:
    1) removing the hadoop-name and hadoop-data folders and reformatting the namenode;
    2) cleaning up all temp files/folders under /tmp;

But nothing works.

Your help is greatly appreciated.

Thanks,

RX

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Posted by Harsh J <ha...@cloudera.com>.
Hello RX,

On Tue, May 31, 2011 at 9:05 PM, Xu, Richard <ri...@citi.com> wrote:
> Running on namenode(hostname: loanps4d):
> :/opt/hadoop-install/hadoop-0.20.2/bin:59 > hadoop dfsadmin -report
> Configured Capacity: 0 (0 KB)
> Present Capacity: 3072 (3 KB)
> DFS Remaining: 0 (0 KB)
> DFS Used: 3072 (3 KB)
> DFS Used%: 100%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 1 (1 total, 0 dead)
>
> Name: 169.193.181.213:50010
> Decommission Status : Normal
> Configured Capacity: 0 (0 KB)
> DFS Used: 3072 (3 KB)
> Non DFS Used: 0 (0 KB)
> DFS Remaining: 0(0 KB)
> DFS Used%: 100%
> DFS Remaining%: 0%
> Last contact: Tue May 31 11:30:37 EDT 2011

Yup, for some reason the DN's not picking up any space stats on your platform.

Could you give me the local command outputs of the following from both
your Solaris and Linux systems?

$ df -k /opt/hadoop-install/hadoop-0.20.2/hadoop-data
$ du -sk /opt/hadoop-install/hadoop-0.20.2/hadoop-data

FWIW, the code I am reading says that the DU and DF util classes have
only been tested on Cygwin, Linux, and FreeBSD. I think Solaris may
need a bit of tweaking, but I am not aware of a resource for this off
the top of my head.
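
To illustrate the kind of thing that can go wrong (a rough standalone sketch of the idea only, not the actual org.apache.hadoop.fs.DF source): a probe that shells out to `df -k` and picks whitespace-separated fields by position depends entirely on the platform's output layout, so a different header or column order can silently come back as zero capacity.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Rough illustrative sketch of a df-based capacity probe (not Hadoop's DF
// class). It assumes the Linux "df -k" layout:
//   Filesystem  1K-blocks  Used  Available  Use%  Mounted on
// If a platform prints a different header, wraps long device names onto a
// second line, or orders columns differently, the fixed field indices below
// produce wrong numbers without any error.
public class DfProbe {
  public static void main(String[] args) throws Exception {
    String dir = args.length > 0 ? args[0] : ".";
    Process p = Runtime.getRuntime().exec(new String[] {"df", "-k", dir});
    BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
    r.readLine();                                   // skip the header line
    String[] f = r.readLine().trim().split("\\s+"); // data line, split on whitespace
    System.out.println("capacity=" + Long.parseLong(f[1]) + "k"
        + " used=" + Long.parseLong(f[2]) + "k"
        + " available=" + Long.parseLong(f[3]) + "k");
    p.waitFor();
  }
}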

-- 
Harsh J

RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Posted by "Xu, Richard " <ri...@citi.com>.
Running on namenode(hostname: loanps4d):
:/opt/hadoop-install/hadoop-0.20.2/bin:59 > hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 3072 (3 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 3072 (3 KB)
DFS Used%: 100%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 169.193.181.213:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 3072 (3 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Tue May 31 11:30:37 EDT 2011

Datanode(hostname: loanps3d) log:
:/opt/hadoop-install/hadoop-0.20.2/logs:60 > tail hadoop-cfadm-datanode-loanps3d.log
2011-05-31 11:29:04,076 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2011-05-31 11:29:04,086 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2011-05-31 11:29:04,086 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2011-05-31 11:29:04,087 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2011-05-31 11:29:04,087 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(loanps3d:50010, storageID=, infoPort=50075, ipcPort=50020)
2011-05-31 11:29:04,142 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: New storage id DS-1373813798-169.193.181.213-50010-1306855744095 is assigned to data-node 169.193.181.213:50010
2011-05-31 11:29:04,145 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(169.193.181.213:50010, storageID=DS-1373813798-169.193.181.213-50010-1306855744095, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/opt/hadoop-install/hadoop-0.20.2/hadoop-data/current'}
2011-05-31 11:29:04,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2011-05-31 11:29:04,679 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 28 msecs
2011-05-31 11:29:04,683 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Tuesday, May 31, 2011 10:23 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Xu,

Please post the output of `hadoop dfsadmin -report` and attach the
tail of a started DN's log?

On Tue, May 31, 2011 at 7:44 PM, Xu, Richard <ri...@citi.com> wrote:
> 2. Also, Configured Capacity is 0, cannot put any file to HDFS.

This might easily be the cause. I'm not sure if it's a Solaris thing
that can lead to this though.

> 3. in datanode server, no error in logs, but tasktracker logs has the following suspicious thing:

I don't see any suspicious log message in what you'd posted. Anyhow,
the TT does not matter here.

--
Harsh J

RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Posted by "Xu, Richard " <ri...@citi.com>.
“Are you using the same Java version on both systems”
---Yes.

“Can you test with one NN and two DN?”
---We tested with 1 namenode and 4 datanodes and encountered this problem. We tried to narrow it down, so we then tried with 1 NN and 1 DN.


From: Marcos Ortiz [mailto:mlortiz@uci.cu]
Sent: Tuesday, May 31, 2011 11:46 AM
To: hdfs-user@hadoop.apache.org
Cc: Xu, Richard [ICG-IT]
Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

On 05/31/2011 10:06 AM, Xu, Richard wrote:
1 namenode, 1 datanode. dfs.replication=3. We also tried 0, 1, 2; same result.

From: Yaozhen Pan [mailto:itzhak.pan@gmail.com]
Sent: Tuesday, May 31, 2011 10:34 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster


How many datanodes are in your cluster? and what is the value of "dfs.replication" in hdfs-site.xml (if not specified, default value is 3)?

From the error log, it seems there are not enough datanodes to replicate the files in hdfs.
On 2011-5-31 22:23, "Harsh J" <ha...@cloudera.com> wrote:
Xu,

Please post the output of `hadoop dfsadmin -report` and attach the
tail of a started DN's log?

On Tue, May 31, 2011 at 7:44 PM, Xu, Richard <ri...@citi.com> wrote:
> 2. Also, Configured Cap...
This might easily be the cause. I'm not sure if it's a Solaris thing
that can lead to this though.

> 3. in datanode server, no error in logs, but tasktracker logs has the following suspicious thing:...
I don't see any suspicious log message in what you'd posted. Anyhow,
the TT does not matter here.

--
Harsh J
Regards, Xu
When you installed on Solaris:
- Did you synchronize the NTP server on all nodes:
  echo "server youservernetp.com" > /etc/inet/ntp.conf
  svcadm enable svc:/network/ntp:default

- Are you using the same Java version on both systems (Ubuntu and Solaris)?

- Can you test with one NN and two DN?





--

Marcos Luis Ortiz Valmaseda

 Software Engineer (Distributed Systems)

 http://uncubanitolinuxero.blogspot.com

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Posted by Marcos Ortiz <ml...@uci.cu>.
On 05/31/2011 10:06 AM, Xu, Richard wrote:
>
> 1 namenode, 1 datanode. dfs.replication=3. We also tried 0, 1, 2; same
> result.
>
> *From:*Yaozhen Pan [mailto:itzhak.pan@gmail.com]
> *Sent:* Tuesday, May 31, 2011 10:34 AM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Unable to start hadoop-0.20.2 but able to start 
> hadoop-0.20.203 cluster
>
> How many datanodes are in your cluster? and what is the value of 
> "dfs.replication" in hdfs-site.xml (if not specified, default value is 
> 3)?
>
> From the error log, it seems there are not enough datanodes to 
> replicate the files in hdfs.
>
>     On 2011-5-31 22:23, "Harsh J" <harsh@cloudera.com> wrote:
>     Xu,
>
>     Please post the output of `hadoop dfsadmin -report` and attach the
>     tail of a started DN's log?
>
>
>     On Tue, May 31, 2011 at 7:44 PM, Xu, Richard <richard.xu@citi.com> wrote:
>     > 2. Also, Configured Cap...
>
>     This might easily be the cause. I'm not sure if it's a Solaris thing
>     that can lead to this though.
>
>
>     > 3. in datanode server, no error in logs, but tasktracker logs has
>     the following suspicious thing:...
>
>     I don't see any suspicious log message in what you'd posted. Anyhow,
>     the TT does not matter here.
>
>     --
>     Harsh J
>
Regards, Xu
When you installed on Solaris:
- Did you synchronize the NTP server on all nodes:
   echo "server youservernetp.com" > /etc/inet/ntp.conf
   svcadm enable svc:/network/ntp:default

- Are you using the same Java version on both systems (Ubuntu and Solaris)?

- Can you test with one NN and two DN?



-- 
Marcos Luis Ortiz Valmaseda
  Software Engineer (Distributed Systems)
  http://uncubanitolinuxero.blogspot.com


RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Posted by "Xu, Richard " <ri...@citi.com>.
1 namenode, 1 datanode. dfs.replication=3. We also tried 0, 1, 2; same result.

From: Yaozhen Pan [mailto:itzhak.pan@gmail.com]
Sent: Tuesday, May 31, 2011 10:34 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster


How many datanodes are in your cluster? and what is the value of "dfs.replication" in hdfs-site.xml (if not specified, default value is 3)?

From the error log, it seems there are not enough datanodes to replicate the files in hdfs.
On 2011-5-31 22:23, "Harsh J" <ha...@cloudera.com> wrote:
Xu,

Please post the output of `hadoop dfsadmin -report` and attach the
tail of a started DN's log?

On Tue, May 31, 2011 at 7:44 PM, Xu, Richard <ri...@citi.com> wrote:
> 2. Also, Configured Cap...
This might easily be the cause. I'm not sure if it's a Solaris thing
that can lead to this though.

> 3. in datanode server, no error in logs, but tasktracker logs has the following suspicious thing:...
I don't see any suspicious log message in what you'd posted. Anyhow,
the TT does not matter here.

--
Harsh J

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Posted by Yaozhen Pan <it...@gmail.com>.
How many datanodes are in your cluster? And what is the value of
"dfs.replication" in hdfs-site.xml (if not specified, the default value is 3)?

From the error log, it seems there are not enough datanodes to replicate the
files in HDFS.
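
For a single-datanode test setup you can also force a single replica from the client side; a minimal sketch (the path is just an example, and the same thing can be done by setting dfs.replication=1 in hdfs-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: request one replica so a single live datanode is enough.
// When dfs.replication is not set anywhere, new files default to 3 replicas.
public class SingleReplicaWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setInt("dfs.replication", 1);              // client-side default for new files
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/replication-test.txt"); // example path
    fs.create(p, (short) 1).close();                // or pass the replication explicitly
    System.out.println("replication = " + fs.getFileStatus(p).getReplication());
  }
}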

On 2011-5-31 22:23, "Harsh J" <ha...@cloudera.com> wrote:
Xu,

Please post the output of `hadoop dfsadmin -report` and attach the
tail of a started DN's log?


On Tue, May 31, 2011 at 7:44 PM, Xu, Richard <ri...@citi.com> wrote:
> 2. Also, Configured Cap...
This might easily be the cause. I'm not sure if it's a Solaris thing
that can lead to this though.


> 3. in datanode server, no error in logs, but tasktracker logs has the
following suspicious thing:...
I don't see any suspicious log message in what you'd posted. Anyhow,
the TT does not matter here.

--
Harsh J

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Posted by Harsh J <ha...@cloudera.com>.
Xu,

Please post the output of `hadoop dfsadmin -report` and attach the
tail of a started DN's log?

On Tue, May 31, 2011 at 7:44 PM, Xu, Richard <ri...@citi.com> wrote:
> 2. Also, Configured Capacity is 0, cannot put any file to HDFS.

This might easily be the cause. I'm not sure if it's a Solaris thing
that can lead to this though.

> 3. in datanode server, no error in logs, but tasktracker logs has the following suspicious thing:

I don't see any suspicious log message in what you'd posted. Anyhow,
the TT does not matter here.

-- 
Harsh J