Posted to common-user@hadoop.apache.org by Nathan Wang <na...@yahoo.com> on 2008/02/23 02:25:59 UTC

Problems with NFS share in dfs.name.dir

Hi,
We're having problems trying to set up NameNode failover by following the wiki:
    http://wiki.apache.org/hadoop/NameNodeFailover

If we point dfs.name.dir to 2 local directories, it works fine.
But, if one of the directories is NFS mounted, we're having these problems:

1) "hadoop dfs -ls" takes 1-2 minutes to finish, and returns error: 
    Bad connection to FS. command aborted.

2) When we stop and restart Hadoop, the NameNode fails to start because the
    previous NameNode process has become a zombie.

We're using hadoop-0.15.3_64.  
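
For reference, the kind of hadoop-site.xml entry we mean, with one local directory
plus the NFS mount (both paths here are just placeholders):

  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name,/mnt/nfs/namenode</value>
  </property>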

What is the correct way to set this up?  Really appreciate your input.

Thanks,
Nathan

Re: Problems with NFS share in dfs.name.dir

Posted by "prasana.iyengar" <pr...@gmail.com>.
Thanks - that's the part I missed when reading the stack trace. I'll explore the
possibility of enabling locking in gluster.

-prasana

Konstantin Shvachko wrote:
> You should probably check what your OS and FS are on the NFS share.
> I did not see problems with NFS per se, but some (local) file systems do not
> support file locks.
> --Konstantin




Re: Problems with NFS share in dfs.name.dir

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
You should probably check what your OS and FS are on the NFS share.
I did not see problems with NFS per se, but some (local) file systems do not support file locks.
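
One quick way to check, outside of Hadoop, is a small standalone test that makes the
same FileChannel.tryLock call you can see in your stack trace; on a file system that
does not support locks it should fail with the same "Function not implemented"
IOException. (This is just a sketch; the class name and the path argument are placeholders.)

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockTest {
    // Usage: java LockTest /path/to/candidate/storage/dir
    public static void main(String[] args) throws Exception {
        File f = new File(args[0], "lock.test");
        RandomAccessFile file = new RandomAccessFile(f, "rws");
        FileChannel channel = file.getChannel();
        // Same call the NameNode makes when locking a storage directory; on a file
        // system without lock support this throws the "Function not implemented" IOException.
        FileLock lock = channel.tryLock();
        if (lock != null) {
            System.out.println("Locking works in " + args[0]);
            lock.release();
        } else {
            System.out.println("Lock is already held by another process");
        }
        file.close();
        f.delete();
    }
}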
--Konstantin

prasana.iyengar wrote:
> Using hadoop-0.16.0, I am having a similar problem during bringup.
> 
> I added a gluster mount point to the dfs.name.dir list.
> 
> Now DFS starts up and dies with these exceptions. What does this failure mean?
> I checked the permissions of the new directory I added.
> 2008-05-29 20:47:55,596 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-05-29 20:47:55,598 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2008-05-29 20:47:55,642 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=admob,ops,adm,www-data
> 2008-05-29 20:47:55,642 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
> 2008-05-29 20:47:55,642 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
> 2008-05-29 20:47:55,678 INFO org.apache.hadoop.dfs.Storage: java.io.IOException: Function not implemented
>         at sun.nio.ch.FileChannelImpl.lock0(Native Method)
>         at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)
>         at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
>         at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:393)
>         at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:278)
>         at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:148)
>         at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
> 
> 2008-05-29 20:47:55,679 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-05-29 20:47:55,679 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: Function not implemented
>         at sun.nio.ch.FileChannelImpl.lock0(Native Method)
>         at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)
>         at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
>         at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:393)
>         at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:278)
>         at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:148)
>         at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
> 
> 2008-05-29 20:47:55,680 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG: 
> 
> 
> 
>>If we point dfs.name.dir to 2 local directories, it works fine.
>>But, if one of the directories is NFS mounted, we're having these
>>problems:
>>
>>1) "hadoop dfs -ls" takes 1-2 minutes to finish, and returns error: 
> 
> 

Re: Problems with NFS share in dfs.name.dir

Posted by "prasana.iyengar" <pr...@gmail.com>.
Using hadoop-0.16.0, I am having a similar problem during bringup.

I added a gluster mount point to the dfs.name.dir list.

Now DFS starts up and dies with these exceptions. What does this failure mean?
I checked the permissions of the new directory I added.
2008-05-29 20:47:55,596 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2008-05-29 20:47:55,598 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2008-05-29 20:47:55,642 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=admob,ops,adm,www-data
2008-05-29 20:47:55,642 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-05-29 20:47:55,642 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-05-29 20:47:55,678 INFO org.apache.hadoop.dfs.Storage: java.io.IOException: Function not implemented
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:393)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:278)
        at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:148)
        at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
        at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)

2008-05-29 20:47:55,679 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2008-05-29 20:47:55,679 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: Function not implemented
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)
        at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.lock(Storage.java:393)
        at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:278)
        at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:148)
        at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
        at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)

2008-05-29 20:47:55,680 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG: 


> If we point dfs.name.dir to 2 local directories, it works fine.
> But, if one of the directories is NFS mounted, we're having these
> problems:
> 
> 1) "hadoop dfs -ls" takes 1-2 minutes to finish, and returns error: 



Re: Problems with NFS share in dfs.name.dir

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Check the NameNode log. It is possible that your NFS mount has problems and the
NameNode might be stuck trying to write to it.

If the log is not useful, you can attach the jstack output for the NameNode when it
seems to be stuck.
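
Something like this on the NameNode machine would do (jps and jstack ship with the
JDK; the pid is whatever jps reports for the NameNode process):

    # find the NameNode pid, then dump its thread stacks to a file
    jps | grep NameNode
    jstack <namenode-pid> > namenode-stacks.txt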

Raghu.

Nathan Wang wrote:
> Hi,
> We're having problems trying to set up NameNode failover by following the wiki:
>     http://wiki.apache.org/hadoop/NameNodeFailover
> 
> If we point dfs.name.dir to 2 local directories, it works fine.
> But, if one of the directories is NFS mounted, we're having these problems:
> 
> 1) "hadoop dfs -ls" takes 1-2 minutes to finish, and returns error: 
>     Bad connection to FS. command aborted.
> 
> 2) When we stop and restart Hadoop, the NameNode fails to start because the
>     previous NameNode process has become a zombie.
> 
> We're using hadoop-0.15.3_64.  
> 
> What is the correct way to set this up?  Really appreciate your input.
> 
> Thanks,
> Nathan