Posted to hdfs-dev@hadoop.apache.org by "Lin Yiqun (JIRA)" <ji...@apache.org> on 2016/02/26 13:02:18 UTC

[jira] [Created] (HDFS-9865) TestBlockReplacement fails intermittently in trunk

Lin Yiqun created HDFS-9865:
-------------------------------

             Summary: TestBlockReplacement fails intermittently in trunk
                 Key: HDFS-9865
                 URL: https://issues.apache.org/jira/browse/HDFS-9865
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
    Affects Versions: 2.7.1
            Reporter: Lin Yiqun
            Assignee: Lin Yiqun


I found that the test case {{TestBlockReplacement}} sometimes fails during testing. When I look at the unit test log, I always find the following output:
{code}
org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement)  Time elapsed: 8.764 sec  <<< FAILURE!
java.lang.AssertionError: The block should be only on 1 datanode  expected:<1> but was:<2>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436)
{code}
Finally I found the cause: in testDeletedBlockWhenAddBlockIsInEdit the block is not always completely deleted by the time the assertion runs, so the number of datanodes holding the block is incorrect. The time the test waits for FsDatasetAsyncDiskService to delete the block is not a reliable value.
{code}
LOG.info("replaceBlock:  " + replaceBlock(block,
          (DatanodeInfo)sourceDnDesc, (DatanodeInfo)sourceDnDesc,
          (DatanodeInfo)destDnDesc));
// Waiting for the FsDatasetAsyncDsikService to delete the block
Thread.sleep(3000);
{code}
When I reduce this wait to 1 second, the test always fails. The 3 seconds used in the test is not a reliable value either. We should change this logic to something more robust, such as waiting for the block to be replicated, as in testDecommision.
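
As a rough illustration of the idea, here is a minimal, self-contained sketch of replacing a fixed sleep with polling on a condition. It is not the actual patch: the class {{PollingWait}}, its {{waitFor}} method, and the simulated replica counter are all hypothetical names introduced for this example. Hadoop's test utilities have a similar helper ({{GenericTestUtils.waitFor}}), which the real test would more likely use together with an actual query for the block's locations.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Hadoop: poll a condition instead of
// sleeping for a fixed, guessed amount of time.
final class PollingWait {

    /**
     * Re-checks the condition every checkEveryMillis until it holds.
     * Throws TimeoutException if it still does not hold after waitForMillis.
     */
    static void waitFor(BooleanSupplier condition, long checkEveryMillis,
                        long waitForMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "Condition not met within " + waitForMillis + " ms");
            }
            Thread.sleep(checkEveryMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the asynchronous block deletion (assumption for this
        // sketch): the replica count drops from 2 to 1 after a short delay.
        AtomicInteger replicaCount = new AtomicInteger(2);
        Thread deleter = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            replicaCount.set(1);
        });
        deleter.start();

        // Returns as soon as the deletion is visible; fails loudly after 5 s.
        waitFor(() -> replicaCount.get() == 1, 50, 5000);
        System.out.println("replicas = " + replicaCount.get());
    }
}
```

With this approach the test waits only as long as the deletion actually takes, and a timeout produces a clear failure instead of a misleading replica-count assertion.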




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)