Posted to common-user@hadoop.apache.org by Alex Li <al...@gmail.com> on 2010/05/14 12:27:55 UTC

Problem starting datanode inside Solaris zone

Hi all,

I'm inside a OpenSolaris zone, or more precisely a Joyent Accelerator.

I can't seem to get a datanode started. I can start a namenode fine. I can
run 'bin/hadoop datanode -format' fine. JAVA_HOME is set to "/usr/jdk/latest",
which is a symlink to whatever the latest version is.

I'm running it as user 'jill' and I don't even know where that

  "24 /tmp/hadoop-jill/dfs/data"

is coming from.

What am I missing? I'm very baffled :(

In the log file all I'm getting is this:

2010-05-14 05:30:28,059 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = <somehost>/10.181.x.x
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2010-05-14 05:30:28,255 INFO org.apache.hadoop.hdfs.server.common.Storage:
Storage directory /tmp/hadoop-jill/dfs/data is not formatted.
2010-05-14 05:30:28,255 INFO org.apache.hadoop.hdfs.server.common.Storage:
Formatting ...
2010-05-14 05:30:28,275 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
java.lang.NumberFormatException: For input string: "24
/tmp/hadoop-jill/dfs/data"
        at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Long.parseLong(Long.java:419)
        at java.lang.Long.parseLong(Long.java:468)
        at org.apache.hadoop.fs.DU.parseExecResult(DU.java:187)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
        at org.apache.hadoop.util.Shell.run(Shell.java:134)
        at org.apache.hadoop.fs.DU.<init>(DU.java:53)
        at org.apache.hadoop.fs.DU.<init>(DU.java:63)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.<init>(FSDataset.java:333)
        at
org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:689)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:302)
        at
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)

Thanks!

-alex
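
The trace points at org.apache.hadoop.fs.DU, which shells out to du and
reads the first field of the output line as a size in kilobytes. A minimal
sketch of that parse step, assuming the 0.20-era code splits the line on a
tab character only (class and method names below are illustrative, not the
actual Hadoop source):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StringReader;

    // Illustrative stand-in for the parse step behind DU.parseExecResult:
    // "du -sk <dir>" is run and the first tab-delimited field of the
    // output line is read as a size in kilobytes.
    public class DuParseSketch {

        static long parseDuLine(BufferedReader lines) throws IOException {
            String line = lines.readLine();
            if (line == null) {
                throw new IOException("Expecting a line, not end of stream");
            }
            String[] tokens = line.split("\t");
            // If du separates size and path with spaces, as the Solaris
            // POSIX du in /usr/xpg4/bin does, tokens[0] is the whole line
            // and parseLong throws exactly the exception logged above.
            return Long.parseLong(tokens[0]) * 1024L;
        }

        public static void main(String[] args) throws IOException {
            // Tab-separated (GNU du style) parses fine:
            System.out.println(parseDuLine(new BufferedReader(
                    new StringReader("24\t/tmp/hadoop-jill/dfs/data"))));
            // Space-separated (POSIX du style) throws
            // NumberFormatException: For input string:
            //   "24 /tmp/hadoop-jill/dfs/data"
            System.out.println(parseDuLine(new BufferedReader(
                    new StringReader("24 /tmp/hadoop-jill/dfs/data"))));
        }
    }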

Re: Problem starting datanode inside Solaris zone

Posted by Alex Li <al...@joyent.com>.
Forgot the OS version: Nevada build 121.

[root@alextest ~]# uname -a
SunOS alextest 5.11 snv_121 i86pc i386 i86pc

Cheers!


Re: Problem starting datanode inside Solaris zone

Posted by Alex Li <al...@joyent.com>.
Thanks again!


Re: Problem starting datanode inside Solaris zone

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 14, 2010, at 9:57 AM, Alex Li wrote:
> [jill@alextest ~]$ which du
> /usr/xpg4/bin/du

POSIX du

> [jill@alextest ~]$ which gdu
> /opt/local/bin/gdu

GNU du 

> [jill@alextest ~]$
> 
> It turns out what I got isn't the GNU du.


This is actually very concerning.  Hadoop should work with POSIX du.  If it doesn't, it is a bug.

> alias didn't work. hadoop must be looking for 'which du' and using
> whichever is found first in the PATH.


Essentially, yes.  It is surprising that POSIX du fails but SysV du (which is what we use here) and GNU du work.

I'll have to play with this and see what is going on.
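
If a tab-only split is indeed the root cause, a parser that splits on any
run of whitespace would accept GNU, POSIX, and SysV output alike. A hedged
sketch of that kind of fix (illustrative, not the actual Hadoop patch):

    // Hypothetical tolerant variant: split on any whitespace run so both
    // "24<TAB>/path" (GNU du) and "24 /path" (POSIX du) parse identically.
    public class TolerantDuParse {
        static long parseDuLine(String line) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                throw new IllegalArgumentException("Unexpected du output: " + line);
            }
            return Long.parseLong(tokens[0]) * 1024L; // du -sk reports kilobytes
        }

        public static void main(String[] args) {
            System.out.println(parseDuLine("24\t/tmp/hadoop-jill/dfs/data"));
            System.out.println(parseDuLine("24 /tmp/hadoop-jill/dfs/data"));
        }
    }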

Re: Problem starting datanode inside Solaris zone

Posted by Alex Li <al...@joyent.com>.
Hi Allen,

Thanks for the pointer! You are dead on!

This is what I got:

[jill@alextest ~]$ du /storage/hadoop-jill/dfs/data/
3 /storage/hadoop-jill/dfs/data/detach
6 /storage/hadoop-jill/dfs/data/current
3 /storage/hadoop-jill/dfs/data/tmp
18 /storage/hadoop-jill/dfs/data
[jill@alextest ~]$
[jill@alextest ~]$ which du
/usr/xpg4/bin/du
[jill@alextest ~]$ which gdu
/opt/local/bin/gdu
[jill@alextest ~]$

It turns out what I got isn't the GNU du.

I got it to run by doing this under /opt/local/bin:

   ln -s gdu du

alias didn't work. hadoop must be looking for 'which du' and using
whichever is found first in the PATH.

Thanks so much! My data node is now up!

-alex
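
The alias failing while the symlink works is consistent with how a forked
process finds commands: shell aliases exist only inside an interactive
shell, while a process started from Java resolves a bare command name
against PATH, where a symlink is visible. A small demonstration of the
lookup (illustrative; Hadoop 0.20 runs du through its own Shell helper):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Runs "du -sk /tmp" the way a Java daemon would: "du" is resolved
    // against PATH at exec time, so aliases are never consulted, but a
    // symlink sitting in a PATH directory is found.
    public class ExecResolvesViaPath {
        public static void main(String[] args) throws Exception {
            Process p = new ProcessBuilder("du", "-sk", "/tmp").start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println(line); // whichever du PATH found first
                }
            }
            p.waitFor();
        }
    }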


Re: Problem starting datanode inside Solaris zone

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 14, 2010, at 3:27 AM, Alex Li wrote:
> I'm running it as user 'jill' and I don't even know where that
> 
>  "24 /tmp/hadoop-jill/dfs/data"
> 
> is coming from.
> 
> What am I missing? I'm very baffled :(

It is likely coming from the output of du, which the datanode uses to determine space.  We run Hadoop on Solaris, but not in a zone so there shouldn't be any issues there, unless Joyent is doing odd things.

What version of Solaris is this, and what output does du
/tmp/hadoop-jill/dfs/data give?  [tabs vs. spaces, etc., count!]
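
One quick way to settle the tabs-vs-spaces question is to dump the
character codes of a du output line; a tab prints as 9 and a space as 32.
A small sketch (hypothetical helper, not part of Hadoop):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Prints each character of the first "du -sk" output line together
    // with its code point, making the separator visible.
    public class ShowDuSeparator {
        public static void main(String[] args) throws Exception {
            Process p = new ProcessBuilder("du", "-sk", "/tmp").start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line = r.readLine();
                if (line != null) {
                    for (char c : line.toCharArray()) {
                        System.out.printf("'%c' = %d%n", c, (int) c);
                    }
                }
            }
            p.waitFor();
        }
    }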