Posted to common-dev@hadoop.apache.org by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org> on 2009/03/25 22:33:51 UTC

[jira] Created: (HADOOP-5573) TestBackupNode sometimes fails

TestBackupNode sometimes fails
------------------------------

                 Key: HADOOP-5573
                 URL: https://issues.apache.org/jira/browse/HADOOP-5573
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
            Reporter: Tsz Wo (Nicholas), SZE


TestBackupNode may fail for different reasons:
- Unable to open edit log file .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
- NullPointerException at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
- Fatal Error : All storage directories are inaccessible.
Will provide more information in the comments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697538#action_12697538 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5573:
------------------------------------------------

It seems TestBackupNode is still having problems.  It failed on Hudson [build #160|http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/160/testReport/].



[jira] Assigned: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik reassigned HADOOP-5573:
--------------------------------------

    Assignee: Boris Shkolnik



[jira] Commented: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701867#action_12701867 ] 

Hadoop QA commented on HADOOP-5573:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12405815/HADOOP-5573.patch
  against trunk revision 767699.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/231/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/231/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/231/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/231/console

This message is automatically generated.



[jira] Commented: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702035#action_12702035 ] 

Boris Shkolnik commented on HADOOP-5573:
----------------------------------------

No test is needed because this patch fixes the test.



[jira] Commented: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689249#action_12689249 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5573:
------------------------------------------------

The failures can be reproduced by running TestBackupNode a few times in a row.



[jira] Updated: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-5573:
----------------------------------------

    Attachment: TestBNFailure.log

Attaching failure log.
It looks like BackupNode fails at the very end, while doing processIOError():
{code}
[junit] 2009-04-07 19:35:53,848 ERROR common.Storage (FSImage.java:resetVersion(1489)) - Cannot write file 
        /home/hudson/hudson-slave/workspace/Hadoop-Patch-vesta.apache.org/trunk/build/test/data/dfs/name-backup1
[junit] 2009-04-07 19:35:53,849 WARN  common.Storage (FSImage.java:processIOError(744)) - FSImage:processIOError: removing storage: 
        /home/hudson/hudson-slave/workspace/Hadoop-Patch-vesta.apache.org/trunk/build/test/data/dfs/name-backup1
[junit] 2009-04-07 19:35:53,849 INFO  namenode.FSNamesystem (FSEditLog.java:processIOError(471)) - current list of storage dirs:
[junit] 2009-04-07 19:35:53,849 FATAL namenode.FSNamesystem (FSEditLog.java:processIOError(479)) - Fatal Error : All storage directories are inaccessible.
{code}
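
The sequence in this log (a storage directory is removed after a write error, the list of remaining directories is empty, and a fatal error follows) can be modeled with a small sketch. This is only an illustration of the behavior visible in the log, not the actual FSImage or FSEditLog code; the class StorageDirsSketch, its methods, and the use of IllegalStateException are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the behavior seen in the log above.
// It is NOT the real FSImage/FSEditLog implementation.
class StorageDirsSketch {
    private final List<String> dirs = new ArrayList<>();

    StorageDirsSketch(List<String> initial) {
        dirs.addAll(initial);
    }

    // Drops a directory after an I/O error and fails hard when none remain.
    void processIOError(String failedDir) {
        dirs.remove(failedDir);
        if (dirs.isEmpty()) {
            throw new IllegalStateException("All storage directories are inaccessible.");
        }
    }

    // Returns a copy of the directories that are still considered usable.
    List<String> remaining() {
        return new ArrayList<>(dirs);
    }
}

class StorageDirsDemo {
    public static void main(String[] args) {
        // With two directories, losing one is survivable.
        StorageDirsSketch two = new StorageDirsSketch(Arrays.asList("name1", "name2"));
        two.processIOError("name1");
        System.out.println("remaining: " + two.remaining());

        // With a single directory, one failed write is already fatal.
        StorageDirsSketch one = new StorageDirsSketch(Arrays.asList("name-backup1"));
        try {
            one.processIOError("name-backup1");
        } catch (IllegalStateException e) {
            System.out.println("fatal: " + e.getMessage());
        }
    }
}
```

Under this model, a node with a single storage directory has no margin: one failed write empties the list, which matches the empty "current list of storage dirs" line in the log.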



[jira] Updated: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated HADOOP-5573:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689248#action_12689248 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5573:
------------------------------------------------

Here are more details:

- Unable to open edit log file .\build\test\data\dfs\name-backup1\current\edits (FSEditLog.java:open(371))
{noformat}
2009-03-24 17:36:39,421 WARN  namenode.FSNamesystem (FSEditLog.java:open(371)) - Unable to open edit log
 file d:\@sze\hadoop\latest\build\test\data\dfs\name-backup1\current\edits
2009-03-24 17:36:39,421 ERROR namenode.Checkpointer (Checkpointer.java:run(138)) - Exception in doCheckpoint: 
java.io.IOException: Could not locate checkpoint directories
	at org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:157)
	at org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:232)
	at org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:134)
	at java.lang.Thread.run(Thread.java:619)
{noformat}

- NullPointerException at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
{noformat}
2009-03-24 17:56:09,750 INFO  ipc.Server (Server.java:run(968)) - IPC Server handler 6 on 1441, call startCheckpoint(
NamenodeRegistration(xx.xx.xx.xx:50100, role=Backup Node)) from 127.0.0.1:1485: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:163)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:83)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:989)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCheckpoint(FSNamesystem.java:4395)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.startCheckpoint(NameNode.java:440)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
{noformat}

- Fatal Error : All storage directories are inaccessible.
{noformat}
2009-03-25 14:27:06,828 INFO  namenode.FSNamesystem (FSEditLog.java:printStatistics(1044))
 - Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 
2009-03-25 14:27:06,937 WARN  namenode.FSNamesystem (FSEditLog.java:close(420))
 - FSEditLog:close - failed to close stream d:\@sze\hadoop\testing\build\test\data\dfs\name-checkpoint1\current\edits
2009-03-25 14:27:06,937 ERROR namenode.FSNamesystem (FSEditLog.java:processIOError(506))
 - Unable to log edits to d:\@sze\hadoop\testing\build\test\data\dfs\name-checkpoint1\current\edits
2009-03-25 14:27:06,937 FATAL namenode.FSNamesystem (FSEditLog.java:processIOError(450))
 - Fatal Error : All storage directories are inaccessible.
2009-03-25 14:27:06,937 INFO  namenode.NameNode (NameNode.java:errorReport(421))
 - Error report from NamenodeRegistration(servicehot-dx.ds.corp.yahoo.com:50100, role=Checkpoint Node): Shutting down.
2009-03-25 14:27:06,953 WARN  namenode.DecommissionManager (DecommissionManager.java:run(67))
 - Monitor interrupted: java.lang.InterruptedException: sleep interrupted
2009-03-25 14:27:06,953 WARN  namenode.FSNamesystem (FSNamesystem.java:run(2346))
 - ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted
Test org.apache.hadoop.hdfs.server.namenode.TestBackupNode FAILED (crashed)
{noformat}



[jira] Commented: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709508#action_12709508 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5573:
------------------------------------------------

Konstantin, do you think the patch is good?

TestBackupNode.testBackupRegistration is still failing, see [build #337|http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/337/testReport/org.apache.hadoop.hdfs.server.namenode/TestBackupNode/testBackupRegistration/].



[jira] Commented: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695642#action_12695642 ] 

Konstantin Shvachko commented on HADOOP-5573:
---------------------------------------------

The first two bugs (NPE) are fixed by HADOOP-5119.
The story here is that {{testBackupRegistration()}} starts two backup nodes one after another. The first one keeps making checkpoints, while the second is still initializing. During initialization it creates a new {{FSNamesystem}} object, which at the beginning sets the static variable {{fsNamesystemObject}} to null. It takes time to initialize the BackupNode before it sets {{fsNamesystemObject = this}}.
In the meantime the first backup node starts a checkpoint, which accesses {{FSNamesystem}} via {{fsNamesystemObject}}. Since the variable is static, it holds the value the second node assigned to it, which is null at that moment. Hence the different NPEs, depending on the timing of the checkpoint.
We should not see that again, since HADOOP-5119 eliminated {{fsNamesystemObject}}.

The third error is also gone, because {{processIOError()}} was recently changed by HADOOP-4045.
But I am still looking at it. I am getting some strange asserts there.
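
The static-field race described above can be reproduced in miniature. The following is a minimal sketch, not the real FSNamesystem: the class NamesystemSketch, its field current (standing in for fsNamesystemObject), and the split of initialization into a constructor plus finishInit() are all assumptions made so the interleaving can be shown deterministically on a single thread.

```java
// Hypothetical stand-in for a class that publishes itself through a static field.
// This is NOT the real FSNamesystem; names and structure are assumptions.
class NamesystemSketch {
    // Shared by every instance in the JVM, like a static fsNamesystemObject.
    static NamesystemSketch current;

    final String owner;

    NamesystemSketch(String owner) {
        this.owner = owner;
        current = null; // the start of initialization clears the shared reference
    }

    // Stands in for the slow part of initialization, which ends by publishing "this".
    void finishInit() {
        current = this;
    }

    // Stands in for a checkpoint that reaches the namesystem through the static field.
    static String checkpoint() {
        return "checkpoint on " + current.owner; // throws NPE while current is null
    }
}

class StaticFieldRaceDemo {
    public static void main(String[] args) {
        NamesystemSketch first = new NamesystemSketch("backup1");
        first.finishInit();
        System.out.println(NamesystemSketch.checkpoint());

        // The second node begins initializing: the shared field is null again.
        NamesystemSketch second = new NamesystemSketch("backup2");
        try {
            NamesystemSketch.checkpoint(); // the first node's checkpoint runs now
        } catch (NullPointerException e) {
            System.out.println("NPE: checkpoint saw the null set by the second node");
        }

        // After the second node finishes, the field no longer points at the first node.
        second.finishInit();
        System.out.println(NamesystemSketch.checkpoint());
    }
}
```

In the real test the two steps run on different threads, so whether the checkpoint lands inside the window is a matter of timing. The last call also shows a second hazard of the shared field: once the second node publishes itself, the first node's checkpoint would operate on the wrong instance.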



[jira] Commented: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713525#action_12713525 ] 

Steve Loughran commented on HADOOP-5573:
----------------------------------------

I'm seeing this test fail with a timeout when I run all the tests. When I run only this test case, all is well.



[jira] Updated: (HADOOP-5573) TestBackupNode sometimes fails

Posted by "Boris Shkolnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated HADOOP-5573:
-----------------------------------

    Attachment: HADOOP-5573.patch

Added synchronization (wait/notify) so that BackupNode waits for any ongoing checkpoint to complete before stopping.
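
A minimal sketch of this kind of wait/notify handshake follows. It is not the code in HADOOP-5573.patch; the class CheckpointGate and its methods beginCheckpoint(), endCheckpoint(), and stop() are hypothetical names chosen for illustration, and how the real BackupNode and Checkpointer divide this work is not shown in this thread.

```java
// Hypothetical sketch of "wait for an ongoing checkpoint before stopping".
// It is NOT the attached patch; all names here are assumptions.
class CheckpointGate {
    private boolean checkpointInProgress = false;
    private boolean stopped = false;

    // Called by the checkpointing thread; refuses to start once stop() was requested.
    synchronized boolean beginCheckpoint() {
        if (stopped) {
            return false;
        }
        checkpointInProgress = true;
        return true;
    }

    // Called by the checkpointing thread when the checkpoint is done.
    synchronized void endCheckpoint() {
        checkpointInProgress = false;
        notifyAll(); // wake up a thread blocked in stop()
    }

    // Called by the thread shutting the node down; blocks while a checkpoint runs.
    synchronized void stop() {
        stopped = true;
        while (checkpointInProgress) { // loop guards against spurious wakeups
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt and give up waiting
                return;
            }
        }
    }
}

class CheckpointGateDemo {
    public static void main(String[] args) throws InterruptedException {
        CheckpointGate gate = new CheckpointGate();
        gate.beginCheckpoint();

        // Simulated checkpoint that takes a little while to finish.
        Thread checkpointer = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.endCheckpoint();
        });
        checkpointer.start();

        gate.stop(); // returns only after endCheckpoint() has run
        System.out.println("stopped after the checkpoint completed");
        checkpointer.join();
    }
}
```

Setting the stopped flag inside the same monitor also prevents a new checkpoint from starting after shutdown has begun, so storage is not touched while the node is being torn down.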
