Posted to common-dev@hadoop.apache.org by "lohit vijayarenu (JIRA)" <ji...@apache.org> on 2008/05/16 09:50:55 UTC

[jira] Updated: (HADOOP-3396) Unit test TestDatanodeBlockScanner fails on Windows

     [ https://issues.apache.org/jira/browse/HADOOP-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-3396:
-------------------------------------

    Attachment: HADOOP-3396-1.patch

This test failed sporadically when run repeatedly in a loop, even on Linux. The problem appears to be the order in which the DataNode sends block reports and bad block reports to the NameNode. When a cluster (NameNode and DataNode) is started, the BlockScanner thread can sometimes report a corrupt block before the DataNode has sent its block report to the NameNode. The NameNode then has an entry in corruptReplicasMap for a block that is not yet in BlocksMap. If getBlockLocations is invoked at this point, we hit an ArrayIndexOutOfBoundsException because the number of corrupt replicas exceeds the number of actual replicas.

This patch adds a check to FSNamesystem so that a corrupt replica is not inserted into corruptReplicasMap if the block is not yet in the blocks map; we log the report and ignore it. The patch also makes a few other changes. Instead of restarting the whole cluster, we restart only the DataNode. A few more checks were added that loop until getBlockLocations returns the correct replica counts.
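
For readers without the patch at hand, the guard described above can be sketched roughly as follows. This is a toy model and not the code in HADOOP-3396-1.patch: the class CorruptReplicaGuardSketch, its methods, and the use of plain maps keyed by block id are all assumptions made for illustration. Only the idea (ignore and log a corrupt-replica report for a block that has no entry in the blocks map yet) comes from the description.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the guard; hypothetical names, not the real FSNamesystem. */
public class CorruptReplicaGuardSketch {
    // Hypothetical stand-ins for the NameNode structures:
    // block id -> names of datanodes known to hold (or hold corrupt) replicas.
    private final Map<Long, Set<String>> blocksMap = new HashMap<>();
    private final Map<Long, Set<String>> corruptReplicasMap = new HashMap<>();

    /** Models a block report arriving from a datanode. */
    public void addBlockReport(long blockId, String datanode) {
        blocksMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    /**
     * Models a bad block report. Returns false, and records nothing,
     * when the block has no entry in the blocks map yet.
     */
    public boolean markBlockAsCorrupt(long blockId, String datanode) {
        if (!blocksMap.containsKey(blockId)) {
            // Log and ignore: the block report has not arrived yet.
            System.err.println("Ignoring corrupt replica report for unknown block "
                    + blockId + " from " + datanode);
            return false;
        }
        corruptReplicasMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
        return true;
    }

    /** Known replicas of the block. */
    public int replicaCount(long blockId) {
        return blocksMap.getOrDefault(blockId, Collections.emptySet()).size();
    }

    /** Replicas of the block recorded as corrupt. */
    public int corruptReplicaCount(long blockId) {
        return corruptReplicasMap.getOrDefault(blockId, Collections.emptySet()).size();
    }
}
```

In this model, a corrupt report that arrives before the block report is dropped, so the corrupt count for a block does not get ahead of its replica count because of that ordering. The same report is accepted once the block report has been processed.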

I ran this previously failing test 25 times in a loop, and it succeeds on both Linux and Windows.

> Unit test TestDatanodeBlockScanner fails on Windows
> ---------------------------------------------------
>
>                 Key: HADOOP-3396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3396
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>         Environment: windows
>            Reporter: Mukund Madhugiri
>            Assignee: lohit vijayarenu
>            Priority: Critical
>         Attachments: HADOOP-3396-1.patch
>
>
> Unit test fails on Windows: TestDatanodeBlockScanner
> I see this assertion:
> junit.framework.AssertionFailedError
> 	at org.apache.hadoop.dfs.TestDatanodeBlockScanner.testBlockCorruptionPolicy(TestDatanodeBlockScanner.java:201)
> and this in the standard error:
> Waiting for the Mini HDFS Cluster to start...
> Waiting for the Mini HDFS Cluster to start...
> Waiting for the Mini HDFS Cluster to start...
> Waiting for the Mini HDFS Cluster to start...
> Waiting for the Mini HDFS Cluster to start...
> Waiting for the Mini HDFS Cluster to start...
> Waiting for the Mini HDFS Cluster to start...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.