You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Lohit Vijayarenu (JIRA)" <ji...@apache.org> on 2008/07/09 04:08:31 UTC

[jira] Created: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Namenode does not start due to exception throw while saving Image
-----------------------------------------------------------------

                 Key: HADOOP-3724
                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.18.0
            Reporter: Lohit Vijayarenu


Re-start of namenode failed with this stack trace while savingImage during initialization

{noformat}
2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
{noformat}

Looks like it was throwing IOException in saveFilesUnderConstruction

Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 

{noformat}
2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
{noformat}

These 2 lines repeated forever, to a point where I see that a 7 node cluster had namenode log with size close to 41G.

Could not find any other information about the file as there were not previous namenode logs. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616651#action_12616651 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3724:
------------------------------------------------

Oops, my previous comment was wrong: dir.getFileInfo(dst) is changing. It will be different if we call it after rename.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu reopened HADOOP-3724:
--------------------------------------


We saw a case again similar to this. Here, namenode was not able to start even after moving to a newer hadoop 0.18. So, we recovered the image and are running new hadoop 0.18 for a while now. Will monitor log and restart sometime next week to see if we hit this again. Till then, would like to keep this open.


> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616644#action_12616644 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3724:
------------------------------------------------

In FSNamesystem.renameToInternal(...), could you move dir.getFileInfo(dst) inside the if-statment?  i.e.
{code}
--- src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java	(revision 679248)
+++ src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java	(working copy)
@@ -1459,8 +1459,9 @@
     if (dir.renameTo(src, dst)) {
-      changeLease(src, dst);     // update lease with new filename
+      changeLease(src, dst, dir.getFileInfo(dst));     // update lease with new filename
       return true;
     }
{code}
If dir.renameTo(src, dst) return false, we should not call dir.getFileInfo(dst).

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615310#action_12615310 ] 

lohit edited comment on HADOOP-3724 at 7/21/08 11:29 AM:
--------------------------------------------------------------------

After looking at the logs, here is something which might have triggered this

- 00:06:35,042  /mapredsystem/user/mapredsystem/machine/job_200807182151_0100/job.jar was created 
- 00:06:36,641 request for NameSystem.allocateBlock blk_353852126366132690_6689141 of file  /mapredsystem/user/mapredsystem/machine/job_200807182151_0100/job.jar
- 00:10:18,138 request for rename of  /mapredsystem/user/mapredsystem/machine to /user/user/.Trash/Current/machine
- 01:06:35,583 After around one hour, lease recovery for the open file happens but with new path under .Trash With this exception going on forever 
{noformat}
INFO org.apache.hadoop.dfs.LeaseManager: Lease Monitor: Removing lease [Lease.  Holder: DFSClient_-2067534834, pendingcreates: 1], sortedLeases.size()=: 13
INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-2067534834, pendingcreates: 1], src=/user/user/.Trash/Current/machine/machine/job_200807182151_0100/job.jar
WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /user/user/.Trash/Current/machine/machine/job_200807182151_0100/job.jar file does not exist.
{noformat}

Also,
- For this particular file, allocating a block was not followed by addStoredBlock, none of the datanodes reported back to namenode with block length, hence the file was still open


      was (Author: lohit):
    After looking at the logs, here is something which might have triggered this

- 00:06:35,042  /mapredsystem/user/mapredsystem/machine/job_200807182151_0100/job.jar was created 
- 00:06:36,641 request for NameSystem.allocateBlock blk_353852126366132690_6689141 of file  /mapredsystem/user/mapredsystem/machine/job_200807182151_0100/job.jar
- 00:10:18,138 request for rename of  /mapredsystem/user/mapredsystem/machine to /user/user/.Trash/Current/machine
- 01:06:35,583 After around one hour, lease recovery for the open file happens but with new path under .Trash With this exception going on forever 
{noformat}
INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-2067534834, pendingcreates: 1], src=/user/user/.Trash/Current/machine/machine/job_200807182151_0100/job.jar
{noformat}

Also,
- For this particular file, allocating a block was not followed by addStoredBlock, none of the datanodes reported back to namenode with block length, hence the file was still open

  
> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612802#action_12612802 ] 

Lohit Vijayarenu commented on HADOOP-3724:
------------------------------------------

I use earlier version of hadoop-0.18.0 with this fsimage and edits and was able to reproduce the exception. But after refreshing hadoop-0.18.0 to latest build, the same fsimage and edits seem to bootup the namenode without the exception. I think we can close this as invalid, sorry about the confusion. I think we should have a build number apart from build version. 

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-3724:
-------------------------------------

    Attachment: renameWhileOpen2.patch

Incorporated Lohit's review comments.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-3724:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch, renameWhileOpen3.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616616#action_12616616 ] 

Lohit Vijayarenu commented on HADOOP-3724:
------------------------------------------

+1 Changes look good.
PS : there is one javac warning. 

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616818#action_12616818 ] 

Hadoop QA commented on HADOOP-3724:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12386833/renameWhileOpen3.patch
  against trunk revision 679601.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2948/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2948/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2948/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2948/console

This message is automatically generated.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch, renameWhileOpen3.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615419#action_12615419 ] 

Lohit Vijayarenu commented on HADOOP-3724:
------------------------------------------

We started the namenode by ignoring the save lease exception. The namespace did not have any file named /user/user/.Trash/Current/machine/machine/job_200807182151_0100/job.jar. But it did have a file named  /user/user/.Trash/Current/machine/job_200807182151_0100/job.jar. One directory named 'machine' was not present. 

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-3724:
-------------------------------------

    Status: Patch Available  (was: Reopened)

Triggering HudsonQA tests.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-3724:
-------------------------------------

    Attachment: renameWhileOpen3.patch

Fix javac warnings.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch, renameWhileOpen3.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612588#action_12612588 ] 

Lohit Vijayarenu commented on HADOOP-3724:
------------------------------------------

grepping for the filename in fsimage and edits show very little information

{noformat}
[lohit@current]$ strings edits | grep "/foo/bar/jambajuice"
[lohit@current]$ strings fsimage | grep "/foo/bar/jambajuice"
7/foo/bar/jambajuice
7/foo/bar/jambajuice
[lohit@current]$
{noformat}

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615063#action_12615063 ] 

dhruba borthakur commented on HADOOP-3724:
------------------------------------------

I would like to look at the logs and image. Is it possible for me to access these? Thanks.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616548#action_12616548 ] 

dhruba borthakur commented on HADOOP-3724:
------------------------------------------

The core test failure during Hudson is expected (HADOOP-3809). This patch does not cause this test failure.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur resolved HADOOP-3724.
--------------------------------------

    Resolution: Invalid

This problem existed in pre-0.18 release. It has been fixed in 0.18 release.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616541#action_12616541 ] 

Hadoop QA commented on HADOOP-3724:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12386773/renameWhileOpen2.patch
  against trunk revision 679286.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 519 javac compiler warnings (more than the trunk's current 518 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2938/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2938/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2938/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2938/console

This message is automatically generated.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-3724:
-------------------------------------

    Status: Open  (was: Patch Available)

Have to fix javac warnings.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch, renameWhileOpen3.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-3724:
----------------------------------------

      Description: 
Re-start of namenode failed with this stack trace while savingImage during initialization

{noformat}
2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
{noformat}

Looks like it was throwing IOException in saveFilesUnderConstruction

Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 

{noformat}
2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
{noformat}

These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.

Could not find any other information about the file as there were not previous namenode logs. 


  was:
Re-start of namenode failed with this stack trace while savingImage during initialization

{noformat}
2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
{noformat}

Looks like it was throwing IOException in saveFilesUnderConstruction

Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 

{noformat}
2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
{noformat}

These 2 lines repeated forever, to a point where I see that a 7 node cluster had namenode log with size close to 41G.

Could not find any other information about the file as there were not previous namenode logs. 


         Priority: Blocker  (was: Major)
    Fix Version/s: 0.18.0
         Assignee: dhruba borthakur

It looks like there are 2 problems here related to *LEASE RECOVERY*.
# The system falls into infinite loop trying to recover a lease
# making the namespace image inconsistent so that the name-node cannot not re-start.

Both of them are critical for the 0.18 release.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624759#action_12624759 ] 

Hudson commented on HADOOP-3724:
--------------------------------

Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch, renameWhileOpen3.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612794#action_12612794 ] 

dhruba borthakur commented on HADOOP-3724:
------------------------------------------

I extracted the ascii strings from the fsimage, edits and edits.new. The file /foo/bar/jambajuice" apears in the fsimage as a regular file. It also appears in the fsimage as a saved lease. Both of them are valid entries. Then I took fsimage/edits/... into a existing namenode, and started namenode. Namenode started without a hitch. It processed the contents of all three files. This test was done with 0.18

On further inspection of the customer install (with help from Lohit), we found that this was not running hadoop 0.18 release. Rather, it was running a much earlier version of the software. We verified that the workspace from which that buggy version of hadoop was build did not have the latest fixes in LeaseManager.java and FSNamesystem.java.

Thanks to Lohit for all his hard work and time.

Possible fixes that have gone in to solve this type of problem: HADOOP-3269 HADOOP-3349 HADOOP-3375 HADOOP-3418

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615724#action_12615724 ] 

Lohit Vijayarenu commented on HADOOP-3724:
------------------------------------------

Changes look good. minor changes if it makes sense.
- after restart can the tests check for the existence of new path which are still under lease?
- possible write to them and validate them?
- Leasemanager::changeLease() has log message about changing lease at debug level, should it be at info level? Do we expect lot of messages? I think if we had it at info level, we could have caught this error.  

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615046#action_12615046 ] 

Lohit Vijayarenu commented on HADOOP-3724:
------------------------------------------

We did a restart and unfortunately we still see this. Will update the bug if I see something in logs.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-3724:
-------------------------------------

    Hadoop Flags: [Reviewed]
          Status: Patch Available  (was: Open)

Fix javac warnings.

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch, renameWhileOpen2.patch, renameWhileOpen3.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-3724:
-------------------------------------

    Attachment: renameWhileOpen.patch

Preliminary patch for review. 

> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: renameWhileOpen.patch
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3724) Namenode does not start due to exception throw while saving Image

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615310#action_12615310 ] 

Lohit Vijayarenu commented on HADOOP-3724:
------------------------------------------

After looking at the logs, here is something which might have triggered this

- 00:06:35,042  /mapredsystem/user/mapredsystem/machine/job_200807182151_0100/job.jar was created 
- 00:06:36,641 request for NameSystem.allocateBlock blk_353852126366132690_6689141 of file  /mapredsystem/user/mapredsystem/machine/job_200807182151_0100/job.jar
- 00:10:18,138 request for rename of  /mapredsystem/user/mapredsystem/machine to /user/user/.Trash/Current/machine
- 01:06:35,583 After around one hour, lease recovery for the open file happens but with new path under .Trash With this exception going on forever 
{noformat}
INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-2067534834, pendingcreates: 1], src=/user/user/.Trash/Current/machine/machine/job_200807182151_0100/job.jar
{noformat}

Also,
- For this particular file, allocating a block was not followed by addStoredBlock, none of the datanodes reported back to namenode with block length, hence the file was still open


> Namenode does not start due to exception throw while saving Image
> -----------------------------------------------------------------
>
>                 Key: HADOOP-3724
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3724
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Lohit Vijayarenu
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> Re-start of namenode failed with this stack trace while savingImage during initialization
> {noformat}
> 2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
> 2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
> at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
> at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
> {noformat}
> Looks like it was throwing IOException in saveFilesUnderConstruction
> Before restart NameNode was killed while some jobs were running. Upon looking at the namenode log before the stopping of namenode, there were many entries like this 
> {noformat}
> 2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
> 2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
> {noformat}
> These 2 lines are repeated forever every second, to a point where I see that a 7 node cluster had namenode log with size close to 41G.
> Could not find any other information about the file as there were not previous namenode logs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.