Posted to common-user@hadoop.apache.org by Andrew Nguyen <an...@ucsfcti.org> on 2010/05/13 02:19:06 UTC

Setting up a second cluster and getting a weird issue

I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:

2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)


There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.

Any thoughts?

Thanks!

--Andrew

Re: Setting up a second cluster and getting a weird issue

Posted by Andrew Nguyen <an...@ucsfcti.org>.
Yeah, I tried some more experiments today and the error messages were more helpful.  It does seem that some of the values were defaulting to ones very different from what I had configured.  

I have been looking into Puppet but figured with 4 slaves, it shouldn't be a problem to use NFS.  Guess I was wrong!

Thanks all,
Andrew

On May 14, 2010, at 7:41 PM, Hemanth Yamijala wrote:

> Andrew,
> 
>> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS.  I don't see how this would cause a conflict - do you have any additional information?
> 
> FWIW, we had an experience where we were storing config files on NFS
> on a large cluster. Randomly (and we guess due to NFS problems),
> Hadoop would fail to pick up the config files on NFS and would
> instead fall back to its defaults. Because the default values for
> some directory paths differed from our actual configured values,
> this led to very odd errors. We were eventually able to solve the
> problem by moving the config files off NFS. Of course, the size of
> the cluster (several hundred slaves) was probably a factor. But
> nevertheless, you may want to try pulling everything off NFS.
> 
> Thanks
> Hemanth
> 
>> 
>> The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...
>> 
>> Thanks,
>> Andrew
>> 
>> On May 13, 2010, at 6:51 PM, Jeff Zhang wrote:
>> 
>>> It is not recommended to deploy Hadoop on NFS; there will be
>>> conflicts between the data nodes, because over NFS they share the
>>> same file system namespace.
>>> 
>>> 
>>> 
>>> On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen <an...@ucsfcti.org> wrote:
>>>> 
>>>> Yes, in this deployment, I'm attempting to share the hadoop files via NFS.  The log and pid directories are local.
>>>> 
>>>> Thanks!
>>>> 
>>>> --Andrew
>>>> 
>>>> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>>>> 
>>>>> These 4 nodes share NFS ?
>>>>> 
>>>>> 
>>>>> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
>>>>> <an...@ucsfcti.org> wrote:
>>>>>> I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:
>>>>>> 
>>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>>>>>        at java.io.RandomAccessFile.open(Native Method)
>>>>>>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>>>>>> 
>>>>>> 
>>>>>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>>>>>> 
>>>>>> Any thoughts?
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> --Andrew
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>> 
>> 


Re: Setting up a second cluster and getting a weird issue

Posted by Andrew Nguyen <an...@ucsfcti.org>.
Sorry for bothering everyone; I accidentally configured my dfs.data.dir and mapred.local.dir to point to the same directory... a bad copy/paste job.
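
For the archives: the fix was just to point the two properties at
separate local directories on each slave. Roughly what I ended up
with (the mapred.local.dir value below is only an example path, not
my exact setting):

In hdfs-site.xml:

  <property>
    <name>dfs.data.dir</name>
    <value>/srv/hadoop/dfs/1</value>
  </property>

In mapred-site.xml:

  <property>
    <name>mapred.local.dir</name>
    <!-- example path; the important part is that it does not overlap dfs.data.dir -->
    <value>/srv/hadoop/mapred/local</value>
  </property>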

Thanks for everyone's help!

Re: Setting up a second cluster and getting a weird issue

Posted by Andrew Nguyen <an...@ucsfcti.org>.
So I pulled everything off NFS and I'm still getting the original error, a FileNotFoundException for current/VERSION.

I only have 4 slaves and scp'ed the Hadoop directory to all 4 slaves.

Any other ideas?

On May 14, 2010, at 7:41 PM, Hemanth Yamijala wrote:

> Andrew,
> 
>> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS.  I don't see how this would cause a conflict - do you have any additional information?
> 
> FWIW, we had an experience where we were storing config files on NFS
> on a large cluster. Randomly (and we guess due to NFS problems),
> Hadoop would fail to pick up the config files on NFS and would
> instead fall back to its defaults. Because the default values for
> some directory paths differed from our actual configured values,
> this led to very odd errors. We were eventually able to solve the
> problem by moving the config files off NFS. Of course, the size of
> the cluster (several hundred slaves) was probably a factor. But
> nevertheless, you may want to try pulling everything off NFS.
> 
> Thanks
> Hemanth
> 
>> 
>> The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...
>> 
>> Thanks,
>> Andrew
>> 
>> On May 13, 2010, at 6:51 PM, Jeff Zhang wrote:
>> 
>>> It is not recommended to deploy Hadoop on NFS; there will be
>>> conflicts between the data nodes, because over NFS they share the
>>> same file system namespace.
>>> 
>>> 
>>> 
>>> On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen <an...@ucsfcti.org> wrote:
>>>> 
>>>> Yes, in this deployment, I'm attempting to share the hadoop files via NFS.  The log and pid directories are local.
>>>> 
>>>> Thanks!
>>>> 
>>>> --Andrew
>>>> 
>>>> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>>>> 
>>>>> These 4 nodes share NFS ?
>>>>> 
>>>>> 
>>>>> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
>>>>> <an...@ucsfcti.org> wrote:
>>>>>> I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:
>>>>>> 
>>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>>>>>        at java.io.RandomAccessFile.open(Native Method)
>>>>>>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>>>>>>        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>>>>>> 
>>>>>> 
>>>>>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>>>>>> 
>>>>>> Any thoughts?
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> --Andrew
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>> 
>> 


Re: Setting up a second cluster and getting a weird issue

Posted by Hemanth Yamijala <yh...@gmail.com>.
Andrew,

> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS.  I don't see how this would cause a conflict - do you have any additional information?

FWIW, we had an experience where we were storing config files on NFS
on a large cluster. Randomly (and we guess due to NFS problems),
Hadoop would fail to pick up the config files on NFS and would
instead fall back to its defaults. Because the default values for
some directory paths differed from our actual configured values,
this led to very odd errors. We were eventually able to solve the
problem by moving the config files off NFS. Of course, the size of
the cluster (several hundred slaves) was probably a factor. But
nevertheless, you may want to try pulling everything off NFS.
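
To make the mismatch concrete: Andrew's config in this thread sets

  <property>
    <name>dfs.data.dir</name>
    <value>/srv/hadoop/dfs/1</value>
  </property>

but if a daemon fails to read the site file and falls back to the
shipped defaults, dfs.data.dir becomes ${hadoop.tmp.dir}/dfs/data,
which (if I remember the 0.20 defaults correctly) resolves to
something like /tmp/hadoop-<user>/dfs/data. The node then quietly
works against a completely different directory tree than the one you
expect, and the errors you see afterwards look very strange.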

Thanks
Hemanth

>
> The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...
>
> Thanks,
> Andrew
>
> On May 13, 2010, at 6:51 PM, Jeff Zhang wrote:
>
>> It is not recommended to deploy Hadoop on NFS; there will be
>> conflicts between the data nodes, because over NFS they share the
>> same file system namespace.
>>
>>
>>
>> On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen <an...@ucsfcti.org> wrote:
>>>
>>> Yes, in this deployment, I'm attempting to share the hadoop files via NFS.  The log and pid directories are local.
>>>
>>> Thanks!
>>>
>>> --Andrew
>>>
>>> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>>>
>>>> These 4 nodes share NFS ?
>>>>
>>>>
>>>> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
>>>> <an...@ucsfcti.org> wrote:
>>>>> I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:
>>>>>
>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>>>>        at java.io.RandomAccessFile.open(Native Method)
>>>>>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>>>>>        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>>>>>
>>>>>
>>>>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --Andrew
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>

Re: Setting up a second cluster and getting a weird issue

Posted by Andrew Nguyen <an...@ucsfcti.org>.
My hdfs-site.xml file:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/srv/hadoop/dfs.name.dir</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/srv/hadoop/dfs/1</value>
  </property>
</configuration>

Here is my /srv/hadoop/hadoop directory listing:

total 5068
drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 16:10 bin
-rw-rw-r--  1 hadoop hadoop   73847 2010-03-21 23:17 build.xml
drwxr-xr-x  5 hadoop hadoop    4096 2010-03-21 23:17 c++
-rw-rw-r--  1 hadoop hadoop  348624 2010-03-21 23:17 CHANGES.txt
drwxr-xr-x  4 hadoop hadoop    4096 2010-05-12 09:29 cloudera
lrwxrwxrwx  1 hadoop hadoop      15 2010-05-12 15:54 conf -> ../hadoop-conf/
drwxr-xr-x 15 hadoop hadoop    4096 2010-03-21 23:17 contrib
drwxr-xr-x  9 hadoop hadoop    4096 2010-05-12 09:29 docs
drwxr-xr-x  3 hadoop hadoop    4096 2010-03-21 23:17 example-confs
-rw-rw-r--  1 hadoop hadoop    6839 2010-03-21 23:17 hadoop-0.20.2+228-ant.jar
-rw-rw-r--  1 hadoop hadoop 2806445 2010-03-21 23:17 hadoop-0.20.2+228-core.jar
-rw-rw-r--  1 hadoop hadoop  142466 2010-03-21 23:17 hadoop-0.20.2+228-examples.jar
-rw-rw-r--  1 hadoop hadoop 1637240 2010-03-21 23:17 hadoop-0.20.2+228-test.jar
-rw-rw-r--  1 hadoop hadoop   70090 2010-03-21 23:17 hadoop-0.20.2+228-tools.jar
drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 09:29 ivy
-rw-rw-r--  1 hadoop hadoop    9103 2010-03-21 23:17 ivy.xml
drwxr-xr-x  5 hadoop hadoop    4096 2010-05-12 09:29 lib
-rw-rw-r--  1 hadoop hadoop   13366 2010-03-21 23:17 LICENSE.txt
lrwxrwxrwx  1 hadoop hadoop       8 2010-05-12 16:28 logs -> ../logs/
drwxr-xr-x  3 hadoop hadoop    4096 2010-05-12 16:16 logs-old
-rw-rw-r--  1 hadoop hadoop     101 2010-03-21 23:17 NOTICE.txt
lrwxrwxrwx  1 hadoop hadoop       7 2010-05-12 16:28 pids -> ../pids
drwxr-xr-x  2 hadoop hadoop    4096 2010-05-12 16:10 pids-old
-rw-rw-r--  1 hadoop hadoop    1366 2010-03-21 23:17 README.txt
drwxr-xr-x 15 hadoop hadoop    4096 2010-05-12 09:29 src
drwxr-xr-x  8 hadoop hadoop    4096 2010-03-21 23:17 webapps

The only NFS shared directories are /srv/hadoop/hadoop and /srv/hadoop/hadoop-conf

On May 14, 2010, at 1:06 PM, Andrew Nguyen wrote:

> I'm pretty sure I just set my dfs.data.dir to be /srv/hadoop/dfs/1
> 
> <property>
> <name>dfs.data.dir</name>
> <value>/srv/hadoop/dfs/1</value>
> </property>
> 
> I don't have hadoop.tmp.dir set to anything so it's whatever the default is.
> 
> I don't have access to the cluster right now but will update with the exact settings when I get a chance.
> 
> I have 4 slaves with identical hardware.  Each has a separate SCSI drive mounted at /srv/hadoop/dfs/1.  The same config file is used across all the slaves.  I know the NFS approach isn't ideal for larger deployments but right now, I'm still in the tweaking stage and figured NFS was the fastest way to propagate changes.
> 
> Thanks!
> 
> On May 14, 2010, at 9:17 AM, Allen Wittenauer wrote:
> 
>> 
>> On May 14, 2010, at 8:53 AM, Andrew Nguyen wrote:
>> 
>>> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS.  I don't see how this would cause a conflict - do you have any additional information?
>>> 
>>> The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...
>>>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>> 
>>>>>>> 
>>>>>>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>>>>>>> 
>>>>>>> Any thoughts?
>> 
>> Something is deleting the contents of /srv/hadoop/dfs/1.  How did you set your dfs.data.dir in the config file?  Or did you just change hadoop.tmp.dir?
>> 
>> 
> 


Re: Setting up a second cluster and getting a weird issue

Posted by Andrew Nguyen <an...@ucsfcti.org>.
I'm pretty sure I just set my dfs.data.dir to be /srv/hadoop/dfs/1

<property>
<name>dfs.data.dir</name>
<value>/srv/hadoop/dfs/1</value>
</property>

I don't have hadoop.tmp.dir set to anything so it's whatever the default is.
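
(For reference, and going from memory of the 0.20 defaults rather
than anything I've verified on the cluster: with hadoop.tmp.dir
unset, core-default.xml gives

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>

so anything I haven't overridden in my site files would land under
something like /tmp/hadoop-hadoop on each slave.)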

I don't have access to the cluster right now but will update with the exact settings when I get a chance.

I have 4 slaves with identical hardware.  Each has a separate SCSI drive mounted at /srv/hadoop/dfs/1.  The same config file is used across all the slaves.  I know the NFS approach isn't ideal for larger deployments but right now, I'm still in the tweaking stage and figured NFS was the fastest way to propagate changes.

Thanks!

On May 14, 2010, at 9:17 AM, Allen Wittenauer wrote:

> 
> On May 14, 2010, at 8:53 AM, Andrew Nguyen wrote:
> 
>> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS.  I don't see how this would cause a conflict - do you have any additional information?
>> 
>> The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...
>>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
> 
>>>>>> 
>>>>>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>>>>>> 
>>>>>> Any thoughts?
> 
> Something is deleting the contents of /srv/hadoop/dfs/1.  How did you set your dfs.data.dir in the config file?  Or did you just change hadoop.tmp.dir?
> 
> 


Re: Setting up a second cluster and getting a weird issue

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 14, 2010, at 8:53 AM, Andrew Nguyen wrote:

> Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS.  I don't see how this would cause a conflict - do you have any additional information?
> 
> The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...
>>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)

>>>>> 
>>>>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>>>>> 
>>>>> Any thoughts?

Something is deleting the contents of /srv/hadoop/dfs/1.  How did you set your dfs.data.dir in the config file?  Or did you just change hadoop.tmp.dir?



Re: Setting up a second cluster and getting a weird issue

Posted by Andrew Nguyen <an...@ucsfcti.org>.
Just to be clear, I'm only sharing the Hadoop binaries and config files via NFS.  I don't see how this would cause a conflict - do you have any additional information?

The referenced path in the error below (/srv/hadoop/dfs/1) is not being shared via NFS...

Thanks,
Andrew

On May 13, 2010, at 6:51 PM, Jeff Zhang wrote:

> It is not recommended to deploy Hadoop on NFS; there will be
> conflicts between the data nodes, because over NFS they share the
> same file system namespace.
> 
> 
> 
> On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen <an...@ucsfcti.org> wrote:
>> 
>> Yes, in this deployment, I'm attempting to share the hadoop files via NFS.  The log and pid directories are local.
>> 
>> Thanks!
>> 
>> --Andrew
>> 
>> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>> 
>>> These 4 nodes share NFS ?
>>> 
>>> 
>>> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
>>> <an...@ucsfcti.org> wrote:
>>>> I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:
>>>> 
>>>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>>>        at java.io.RandomAccessFile.open(Native Method)
>>>>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>>>>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>>>>        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>>>>        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>>>> 
>>>> 
>>>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>>>> 
>>>> Any thoughts?
>>>> 
>>>> Thanks!
>>>> 
>>>> --Andrew
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>> 
> 
> 
> 
> --
> Best Regards
> 
> Jeff Zhang


Re: Setting up a second cluster and getting a weird issue

Posted by Jeff Zhang <zj...@gmail.com>.
It is not recommended to deploy Hadoop on NFS; there will be
conflicts between the data nodes, because over NFS they share the
same file system namespace.



On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen <an...@ucsfcti.org> wrote:
>
> Yes, in this deployment, I'm attempting to share the hadoop files via NFS.  The log and pid directories are local.
>
> Thanks!
>
> --Andrew
>
> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>
> > These 4 nodes share NFS ?
> >
> >
> > On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
> > <an...@ucsfcti.org> wrote:
> >> I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:
> >>
> >> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
> >>        at java.io.RandomAccessFile.open(Native Method)
> >>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> >>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
> >>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
> >>        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
> >>        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
> >>        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
> >>        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
> >>        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
> >>
> >>
> >> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
> >>
> >> Any thoughts?
> >>
> >> Thanks!
> >>
> >> --Andrew
> >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
>



--
Best Regards

Jeff Zhang

Re: Setting up a second cluster and getting a weird issue

Posted by Andrew Nguyen <an...@ucsfcti.org>.
Yes, in this deployment, I'm attempting to share the hadoop files via NFS.  The log and pid directories are local.

Thanks!

--Andrew

On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:

> These 4 nodes share NFS ?
> 
> 
> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
> <an...@ucsfcti.org> wrote:
>> I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:
>> 
>> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>>       at java.io.RandomAccessFile.open(Native Method)
>>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>       at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>>       at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>>       at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>>       at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>>       at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>>       at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>>       at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>> 
>> 
>> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>> 
>> Any thoughts?
>> 
>> Thanks!
>> 
>> --Andrew
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang


Re: Setting up a second cluster and getting a weird issue

Posted by Jeff Zhang <zj...@gmail.com>.
These 4 nodes share NFS ?


On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
<an...@ucsfcti.org> wrote:
> I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:
>
> 2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
>        at java.io.RandomAccessFile.open(Native Method)
>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
>        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
>        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
>        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
>        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
>        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
>        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
>
>
> There are 4 slaves and sometimes 1 or 2 have the error but the specific nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
>
> Any thoughts?
>
> Thanks!
>
> --Andrew



-- 
Best Regards

Jeff Zhang