Posted to common-dev@hadoop.apache.org by "Wang Xu (JIRA)" <ji...@apache.org> on 2009/04/23 16:32:31 UTC
[jira] Created: (HADOOP-5730) SecondaryNameNode: should not throw
exception and exit if only one makedir failure
SecondaryNameNode: should not throw exception and exit if only one makedir failure
-----------------------------------------------------------------------------------
Key: HADOOP-5730
URL: https://issues.apache.org/jira/browse/HADOOP-5730
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.19.1
Reporter: Wang Xu
Assignee: Wang Xu
Fix For: 0.19.2
In CheckpointStorage.startCheckPointing(), if a single mkdir fails, the
method throws an exception and exits.
However, because the edit log has already been closed at that point, the
editStreams of the NameNode's FSEditLog end up empty, which
breaks any further logSync operations.
We therefore think it should only log a WARN message instead of
throwing the exception.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-5730) SecondaryNameNode: should not
throw exception and exit if only one makedir failure
Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702198#action_12702198 ]
Wang Xu commented on HADOOP-5730:
---------------------------------
> 1. What happens if all directories are removed on SecondaryNameNode?
That is indeed a problem. Do you think SecondaryNameNode should throw an exception or kill itself?
> 2. Why do you remove directories only if mkdir() fails? What if rename() fails before mkdir(), for example?
I think rename() should probably also be wrapped in "try...catch".
> 3. You cannot just remove a list entry while iterating, this will cause ConcurrentModificationException
> on the next iteration of the loop.
Oh, I am sorry about that; I will move the removal.
I will modify the patch and upload another one. I also wonder whether it would be OK to only record this problem
in the log files and ignore it.
> SecondaryNameNode: should not throw exception and exit if only one makedir failure
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-5730
> URL: https://issues.apache.org/jira/browse/HADOOP-5730
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.19.1
> Reporter: Wang Xu
> Assignee: Wang Xu
> Fix For: 0.19.2
>
> Attachments: secondarynamenode-startcp.patch
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> In CheckpointStorage.startCheckPointing(), if a single mkdir fails, the
> method throws an exception and exits.
> However, because the edit log has already been closed at that point, the
> editStreams of the NameNode's FSEditLog end up empty, which
> breaks any further logSync operations.
> We therefore think it should only log a WARN message instead of
> throwing the exception.
[jira] Commented: (HADOOP-5730) SecondaryNameNode: should not
throw exception and exit if only one makedir failure
Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704152#action_12704152 ]
Wang Xu commented on HADOOP-5730:
---------------------------------
I updated the patch to fix the above issue.
However, I am not satisfied with it yet: if an exception is thrown, the checkpointing is interrupted and the editStreams of FSEditLog that were closed earlier in SecondaryNameNode.startCheckpoint() are never reopened.
Any suggestions?
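One possible way to guard against the problem described above is to restore the streams in a finally block when the directory step fails. The following is only a minimal sketch under stated assumptions: EditLogSketch, Step, open(), close() and isOpen() are hypothetical stand-ins invented for illustration and are not Hadoop's FSEditLog API, and the real patch may solve this differently.

```java
import java.io.IOException;

// Hypothetical stand-in for an edit log; NOT Hadoop's FSEditLog.
final class EditLogSketch {
    private boolean open = true;

    void close() { open = false; }
    void open() { open = true; }
    boolean isOpen() { return open; }

    // Hypothetical callback for the directory preparation (mkdir/rename) step.
    interface Step {
        void run() throws IOException;
    }

    // Closes the streams, runs the step, and reopens the streams if the step fails,
    // so a failed checkpoint start does not leave the log without any stream.
    void startCheckpoint(Step prepareDirs) throws IOException {
        close();
        boolean succeeded = false;
        try {
            prepareDirs.run();
            succeeded = true;
        } finally {
            if (!succeeded) {
                open(); // restore the previous state before the exception propagates
            }
        }
        // On success the caller continues with its normal flow (outside this sketch).
    }

    public static void main(String[] args) {
        EditLogSketch log = new EditLogSketch();
        try {
            log.startCheckpoint(() -> { throw new IOException("mkdir failed"); });
        } catch (IOException e) {
            System.out.println("checkpoint aborted: " + e.getMessage());
        }
        System.out.println("streams open after failure: " + log.isOpen());
    }
}
```

The exception still reaches the caller, so the failure is not hidden; only the closed state is undone.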
[jira] Updated: (HADOOP-5730) SecondaryNameNode: should not throw
exception and exit if only one makedir failure
Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang Xu updated HADOOP-5730:
----------------------------
Attachment: (was: secondarynamenode-startcp.patch)
[jira] Updated: (HADOOP-5730) SecondaryNameNode: should not throw
exception and exit if only one makedir failure
Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang Xu updated HADOOP-5730:
----------------------------
Attachment: secondarynamenode-startcp.patch
Catch both mkdir and rename exceptions, and throw an exception only when all of the directories have failed.
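The behavior described in this update can be sketched roughly as follows. This is not the attached patch: the class TolerantCheckpointDirs, the methods prepare() and prepareAll(), and the directory names "current" and "lastcheckpoint.tmp" are assumptions made for illustration, using only java.io.File.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and layout are assumptions, not Hadoop code.
final class TolerantCheckpointDirs {

    // Per-directory step: move "current" aside, then recreate it.
    // Both rename and mkdir failures are reported as IOException.
    static void prepare(File root) throws IOException {
        File current = new File(root, "current");
        File tmp = new File(root, "lastcheckpoint.tmp");
        if (current.exists() && !current.renameTo(tmp)) {
            throw new IOException("Cannot rename " + current + " to " + tmp);
        }
        if (!current.mkdir()) {
            throw new IOException("Cannot create directory " + current);
        }
    }

    // Prepares every directory, warns about the ones that fail,
    // and throws only if no directory at all could be prepared.
    static List<File> prepareAll(List<File> roots) throws IOException {
        List<File> usable = new ArrayList<>();
        for (File root : roots) {
            try {
                prepare(root);
                usable.add(root);
            } catch (IOException e) {
                System.err.println("WARN: skipping " + root + ": " + e.getMessage());
            }
        }
        if (usable.isEmpty()) {
            throw new IOException("No usable checkpoint directory among " + roots);
        }
        return usable;
    }
}
```

Collecting the usable directories into a new list, rather than removing failed ones from the list being iterated, also sidesteps the ConcurrentModificationException concern raised in the review.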
[jira] Commented: (HADOOP-5730) SecondaryNameNode: should not
throw exception and exit if only one makedir failure
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702831#action_12702831 ]
Hadoop QA commented on HADOOP-5730:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12406238/secondarynamenode-startcp.patch
against trunk revision 768376.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/243/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/243/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/243/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/243/console
This message is automatically generated.
[jira] Commented: (HADOOP-5730) SecondaryNameNode: should not
throw exception and exit if only one makedir failure
Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702073#action_12702073 ]
Konstantin Shvachko commented on HADOOP-5730:
---------------------------------------------
Three things:
# What happens if all directories are removed on SecondaryNameNode?
# Why do you remove directories only if mkdir() fails? What if rename() fails before mkdir(), for example?
# You cannot just remove a list entry while iterating, this will cause ConcurrentModificationException on the next iteration of the loop.
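The third point can be illustrated with a small standalone example. It is a sketch with hypothetical names (StorageDirRemovalDemo, unsafeRemoveThrows, removeFailed) and plain strings in place of storage directories; it shows the general java.util behavior, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; not part of Hadoop.
final class StorageDirRemovalDemo {

    // Unsafe pattern: calling List.remove() inside a for-each loop.
    // Returns true if the loop failed with ConcurrentModificationException.
    // The exception is not guaranteed for every position (removing the
    // second-to-last element happens to end the loop silently), which makes
    // the bug easy to miss.
    static boolean unsafeRemoveThrows(List<String> dirs, String bad) {
        try {
            for (String dir : dirs) {
                if (dir.equals(bad)) {
                    dirs.remove(dir); // modifies the list behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe pattern: remove through the iterator that drives the loop.
    static List<String> removeFailed(List<String> dirs, Set<String> failed) {
        Iterator<String> it = dirs.iterator();
        while (it.hasNext()) {
            if (failed.contains(it.next())) {
                it.remove();
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>(Arrays.asList("/data/1", "/data/2", "/data/3"));
        System.out.println("unsafe loop threw: "
            + unsafeRemoveThrows(new ArrayList<>(dirs), "/data/1"));
        Set<String> failed = new HashSet<>(Arrays.asList("/data/1", "/data/3"));
        System.out.println("remaining: " + removeFailed(dirs, failed));
    }
}
```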
[jira] Updated: (HADOOP-5730) SecondaryNameNode: should not throw
exception and exit if only one makedir failure
Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang Xu updated HADOOP-5730:
----------------------------
Attachment: secondarynamenode-startcp.patch
Log a warning instead of throwing an exception when SecondaryNameNode fails during mkdir.
[jira] Updated: (HADOOP-5730) SecondaryNameNode: should not throw
exception and exit if only one makedir failure
Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang Xu updated HADOOP-5730:
----------------------------
Status: Patch Available (was: Open)
Log a warning message instead of throwing an exception when SecondaryNameNode cannot mkdir.
[jira] Issue Comment Edited: (HADOOP-5730) SecondaryNameNode:
should not throw exception and exit if only one makedir failure
Posted by "Wang Xu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704152#action_12704152 ]
Wang Xu edited comment on HADOOP-5730 at 4/29/09 8:21 AM:
----------------------------------------------------------
Hi Konstantin,
I updated the patch to fix the above issue.
However, I am not satisfied with it yet: if an exception is thrown, the checkpointing is interrupted and the editStreams of FSEditLog that were closed earlier in SecondaryNameNode.startCheckpoint() are never reopened.
Any suggestions?
was (Author: gnawux):
I updated the patch to fix the above issue.
However, I am not satisfied with it yet: if an exception is thrown, the checkpointing is interrupted and the editStreams of FSEditLog that were closed earlier in SecondaryNameNode.startCheckpoint() are never reopened.
Any suggestions?