Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2009/01/24 02:22:02 UTC

[jira] Issue Comment Edited: (HADOOP-4103) Alert for missing blocks

    [ https://issues.apache.org/jira/browse/HADOOP-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666819#action_12666819 ] 

rangadi edited comment on HADOOP-4103 at 1/23/09 5:21 PM:
---------------------------------------------------------------

(Edit : formatting only)

The scope of the fix is narrowed to the following:

* NameNode web UI shows a warning (probably in red) if there are any missing blocks.
    ** will most likely add simon stats for this number as well.

* 'dfsadmin -metasave' can be used to find all the missing blocks.
     ** a later jira will enhance -metasave or add a different command that is more user friendly. Currently -metasave is mainly meant for developers.

For this to be a straightforward fix, I need to make one policy change: currently, if a block does not have any good replicas left, it is not included in the "neededReplications" list. I think this was done mainly as an "optimization". But a cluster should not have any blocks in this state, and even the name 'neededReplications' implies such blocks should be included. It would be better if I don't need to add another list that needs to be maintained.
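
To make the policy change concrete, here is a minimal sketch of the idea, not the actual NameNode code. The class name, method names, and block ids are all hypothetical. It assumes a single structure that records every under-replicated block, including blocks with zero good replicas, so the missing-block count for the web UI can be derived from it without maintaining a second list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for the "neededReplications" list.
 * It is not Hadoop's implementation; it only illustrates the proposed policy.
 */
final class NeededReplicationsSketch {
    // Block id -> number of good replicas, only for blocks below their target.
    private final Map<String, Integer> needed = new LinkedHashMap<>();

    /** Records a block's current state; returns true if it needs replication. */
    boolean update(String blockId, int goodReplicas, int expectedReplicas) {
        if (goodReplicas >= expectedReplicas) {
            needed.remove(blockId); // fully replicated, nothing to track
            return false;
        }
        // Proposed policy: a block with zero good replicas is NOT skipped.
        needed.put(blockId, goodReplicas);
        return true;
    }

    /** Blocks with no good replicas left, derived from the same structure. */
    List<String> missingBlocks() {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Integer> e : needed.entrySet()) {
            if (e.getValue() == 0) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }

    int missingBlockCount() {
        return missingBlocks().size();
    }

    int size() {
        return needed.size();
    }

    public static void main(String[] args) {
        NeededReplicationsSketch q = new NeededReplicationsSketch();
        q.update("blk_1", 3, 3); // healthy, not tracked
        q.update("blk_2", 1, 3); // under-replicated
        q.update("blk_3", 0, 3); // missing: no good replicas
        System.out.println("missing blocks: " + q.missingBlocks());
    }
}
```

In this sketch the web UI alert would simply check whether missingBlockCount() is greater than zero. Scanning the whole map is the simplest choice here; a real implementation could keep zero-replica blocks in their own priority bucket to make the count cheaper.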





> Alert for missing blocks
> ------------------------
>
>                 Key: HADOOP-4103
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4103
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Assignee: Raghu Angadi
>
> A whole bunch of datanodes became dead because of network problems that resulted in heartbeat timeouts, although the datanodes themselves were fine.
> Many processes started to fail because of the corrupted filesystem.
> In order to catch and diagnose such problems faster, the namenode should detect the corruption automatically and provide a way to alert operations. At a minimum, it should show the fact of corruption on the GUI.
