Posted to common-dev@hadoop.apache.org by "Wang Xu (JIRA)" <ji...@apache.org> on 2009/04/23 16:32:31 UTC

[jira] Created: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

SecondaryNameNode:  should not throw exception and exit if only one makedir failure
-----------------------------------------------------------------------------------

                 Key: HADOOP-5730
                 URL: https://issues.apache.org/jira/browse/HADOOP-5730
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.19.1
            Reporter: Wang Xu
            Assignee: Wang Xu
             Fix For: 0.19.2


In CheckpointStorage.startCheckPointing(), if a single mkdir fails, the
method throws an exception and exits.

However, because the edit log has already been closed at that point, the
editStreams of the NameNode's FSEditLog is left empty, which affects all
subsequent logSync operations.

Hence we think it should only print a WARN message instead of
throwing the exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702198#action_12702198 ] 

Wang Xu commented on HADOOP-5730:
---------------------------------

>   1. What happens if all directories are removed on SecondaryNameNode?

That would be a serious problem. Do you think SecondaryNameNode should throw an exception or kill itself?

>   2. Why do you remove directories only if mkdir() fails? What if rename() fails before mkdir() for example.

I think rename() should probably also be wrapped in a try...catch.

>   3. You cannot just remove a list entry while iterating, this will cause ConcurrentModificationException 
>   on the next iteration of the loop.

Oh, I am sorry about that. I will move the removal so that it no longer happens inside the iteration.

I will modify the patch and upload another one. I also wonder whether it is OK if we only record this problem
in the log files and otherwise ignore it.

> SecondaryNameNode:  should not throw exception and exit if only one makedir failure
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5730
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5730
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.1
>            Reporter: Wang Xu
>            Assignee: Wang Xu
>             Fix For: 0.19.2
>
>         Attachments: secondarynamenode-startcp.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> In CheckpointStorage.startCheckPointing(), if a single mkdir fails, the
> method throws an exception and exits.
> However, because the edit log has already been closed at that point, the
> editStreams of the NameNode's FSEditLog is left empty, which affects all
> subsequent logSync operations.
> Hence we think it should only print a WARN message instead of
> throwing the exception.



[jira] Commented: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704152#action_12704152 ] 

Wang Xu commented on HADOOP-5730:
---------------------------------

I updated the patch to fix the above issue.

However, I am not satisfied with it yet: if an exception is thrown, checkpointing will be interrupted, and the editStreams of FSEditLog that were closed earlier in SecondaryNameNode.startCheckpoint() will never be reopened.

Any suggestion?



[jira] Updated: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang Xu updated HADOOP-5730:
----------------------------

    Attachment:     (was: secondarynamenode-startcp.patch)



[jira] Updated: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang Xu updated HADOOP-5730:
----------------------------

    Attachment: secondarynamenode-startcp.patch

Catch both mkdir and rename exceptions, and throw an exception when all of the directories have failed.
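
The behaviour described above can be sketched roughly as follows. This is not the attached patch; it is only a minimal illustration under stated assumptions. The class TolerantCheckpointDirs, the methods prepareOne() and prepareAll(), and the use of java.util.logging are all hypothetical, and the directory names "current" and "lastcheckpoint.tmp" are used purely for illustration. Each directory is prepared inside its own try...catch, a failure is logged as a warning, and an exception is thrown only when no directory is left.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not part of Hadoop.
final class TolerantCheckpointDirs {
    private static final Logger LOG =
        Logger.getLogger(TolerantCheckpointDirs.class.getName());

    // Prepare one storage root: rename an existing "current" directory
    // out of the way, then create a fresh one. Either step may fail.
    static void prepareOne(File root) throws IOException {
        File current = new File(root, "current");
        File previous = new File(root, "lastcheckpoint.tmp");
        if (current.exists() && !current.renameTo(previous)) {
            throw new IOException("Cannot rename " + current + " to " + previous);
        }
        if (!current.mkdirs()) {
            throw new IOException("Cannot create directory " + current);
        }
    }

    // Try every root. A failed root is logged and skipped; the caller gets
    // the roots that are still usable. Only if none is usable do we throw.
    static List<File> prepareAll(List<File> roots) throws IOException {
        // Collect survivors in a new list instead of removing entries
        // from 'roots' while iterating over it.
        List<File> usable = new ArrayList<>();
        for (File root : roots) {
            try {
                prepareOne(root);
                usable.add(root);
            } catch (IOException e) {
                LOG.warning("Skipping checkpoint directory " + root
                    + ": " + e.getMessage());
            }
        }
        if (usable.isEmpty()) {
            throw new IOException("All checkpoint directories failed");
        }
        return usable;
    }
}
```

Building a separate list of usable directories, rather than deleting entries from the list being iterated, also sidesteps the ConcurrentModificationException raised in the review.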



[jira] Commented: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702831#action_12702831 ] 

Hadoop QA commented on HADOOP-5730:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12406238/secondarynamenode-startcp.patch
  against trunk revision 768376.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/243/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/243/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/243/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/243/console

This message is automatically generated.



[jira] Commented: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702073#action_12702073 ] 

Konstantin Shvachko commented on HADOOP-5730:
---------------------------------------------

Three things:
# What happens if all directories are removed on SecondaryNameNode?
# Why do you remove directories only if mkdir() fails? What if rename() fails before mkdir(), for example?
# You cannot just remove a list entry while iterating; this will cause a ConcurrentModificationException on the next iteration of the loop.
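
The third point can be illustrated with a small, self-contained sketch. It is not taken from the patch; the class RemoveWhileIterating and its method names are hypothetical, and plain strings stand in for storage directories. The first method removes an entry from an ArrayList inside a for-each loop, which the list's fail-fast iterator detects on its next step. The second removes through the iterator itself, which is the supported way to drop entries during iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, not part of Hadoop.
final class RemoveWhileIterating {

    // Unsafe: modifies the list directly inside a for-each loop.
    // Returns true if the iterator detected the modification.
    static boolean unsafeRemoveThrows(List<String> dirs) {
        try {
            for (String dir : dirs) {
                if (dir.startsWith("bad")) {
                    dirs.remove(dir);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe: removal goes through the iterator, which keeps its own
    // bookkeeping consistent with the list.
    static void safeRemove(List<String> dirs) {
        for (Iterator<String> it = dirs.iterator(); it.hasNext();) {
            if (it.next().startsWith("bad")) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>(Arrays.asList("bad1", "good1", "good2"));
        System.out.println("unsafe removal threw: " + unsafeRemoveThrows(first));

        List<String> second = new ArrayList<>(Arrays.asList("bad1", "good1", "bad2"));
        safeRemove(second);
        System.out.println("after safe removal: " + second);
    }
}
```

Note that the exception is raised by the iterator's next step, not by the remove() call itself, so a removal that happens to be followed by the end of the loop can go unnoticed.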



[jira] Updated: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang Xu updated HADOOP-5730:
----------------------------

    Attachment: secondarynamenode-startcp.patch

Log a warning instead of throwing an exception when the SecondaryNameNode fails during mkdir.



[jira] Updated: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang Xu updated HADOOP-5730:
----------------------------

    Status: Patch Available  (was: Open)

Log a warning message instead of throwing an exception when the SecondaryNameNode cannot mkdir.



[jira] Issue Comment Edited: (HADOOP-5730) SecondaryNameNode: should not throw exception and exit if only one makedir failure

Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704152#action_12704152 ] 

Wang Xu edited comment on HADOOP-5730 at 4/29/09 8:21 AM:
----------------------------------------------------------

Hi Konstantin,

I updated the patch to fix the above issue.

However, I am not satisfied with it yet: if an exception is thrown, checkpointing will be interrupted, and the editStreams of FSEditLog that were closed earlier in SecondaryNameNode.startCheckpoint() will never be reopened.

Any suggestion?

      was (Author: gnawux):
    I updated the patch to fix the above issue.

However, I am not satisfied with it yet: if an exception is thrown, checkpointing will be interrupted, and the editStreams of FSEditLog that were closed earlier in SecondaryNameNode.startCheckpoint() will never be reopened.

Any suggestion?
  