Posted to common-dev@hadoop.apache.org by "Robert Chansler (JIRA)" <ji...@apache.org> on 2008/04/29 02:25:55 UTC

[jira] Created: (HADOOP-3323) Name node should notify administrator if when struggling with replication

Name node should notify administrator if when struggling with replication
-------------------------------------------------------------------------

                 Key: HADOOP-3323
                 URL: https://issues.apache.org/jira/browse/HADOOP-3323
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
            Reporter: Robert Chansler


Name node performance suffers if either the replication queue is too big or the available space at the data nodes is too small. In either case, the administrator should be notified.

If the situation is really desperate, the name node perhaps should enter safe mode.
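
A minimal sketch of the escalation described above, assuming the name node can read its pending-replication count and the fraction of free data node space. The class name, the method, and every threshold are hypothetical and are not part of Hadoop; the issue does not say what "too big" or "too small" should mean.

```java
public class ReplicationHealth {
    /** What the name node might do after a check; hypothetical, not a Hadoop type. */
    public enum Action { NONE, NOTIFY_ADMIN, ENTER_SAFE_MODE }

    /**
     * Compares two observed values against caller-supplied limits.
     * All four limits are assumptions chosen by the administrator.
     */
    public static Action evaluate(long pendingBlocks, double freeSpaceFraction,
                                  long notifyQueueSize, long desperateQueueSize,
                                  double notifyFreeFraction, double desperateFreeFraction) {
        // "Really desperate": either signal is past its hard limit.
        if (pendingBlocks >= desperateQueueSize || freeSpaceFraction <= desperateFreeFraction) {
            return Action.ENTER_SAFE_MODE;
        }
        // Struggling: either signal is past its soft limit.
        if (pendingBlocks >= notifyQueueSize || freeSpaceFraction <= notifyFreeFraction) {
            return Action.NOTIFY_ADMIN;
        }
        return Action.NONE;
    }
}
```

Keeping the limits as parameters leaves the policy question (how big is too big) with the administrator rather than hard-coding it.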

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3323) Name node should notify administrator if when struggling with replication

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613759#action_12613759 ] 

Chris Douglas commented on HADOOP-3323:
---------------------------------------

After some discussion, it's become clear that this may be completed in two parts:

1. A brief health check the namenode can perform itself
2. A metrics-based solution tracking namenode throughput over time, capable of inferring more complex and nuanced desperation

Work on (2) will fall out of a generalized metrics reporting and alerting mechanism to be completed in concert with HADOOP-3719. The particular set of metrics and implementation will remain in this JIRA. Specifically, the implementation will likely correlate the size of the replication queue (FSNamesystemMetrics::pendingReplicationBlocks) with Datanode metrics tracking replicated blocks (DataNodeMetrics::blocksReplicated) aggregated across the cluster. The intent would be to track replication throughput, presuming that slow replication at the datanodes, a slow-draining replication queue, and low storage capacity would accurately capture the conditions called out here.
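
One way the correlation could look, as a rough sketch only. It assumes that a collector (not shown, and not specified in this issue) periodically records the namenode's pendingReplicationBlocks alongside a running cluster-wide total of the datanodes' blocksReplicated. The Sample class, the method names, and the maxDrainSeconds limit are all hypothetical.

```java
public class ReplicationThroughput {
    /** One cluster-wide observation; a hypothetical holder, not a Hadoop class. */
    public static final class Sample {
        final long timeMillis;
        final long pendingReplicationBlocks; // queue size reported by the namenode
        final long blocksReplicatedTotal;    // assumed running sum over all datanodes

        public Sample(long timeMillis, long pendingReplicationBlocks, long blocksReplicatedTotal) {
            this.timeMillis = timeMillis;
            this.pendingReplicationBlocks = pendingReplicationBlocks;
            this.blocksReplicatedTotal = blocksReplicatedTotal;
        }
    }

    /** Blocks replicated per second between two samples taken in time order. */
    public static double throughput(Sample earlier, Sample later) {
        double seconds = (later.timeMillis - earlier.timeMillis) / 1000.0;
        if (seconds <= 0) {
            throw new IllegalArgumentException("samples must be in time order");
        }
        return (later.blocksReplicatedTotal - earlier.blocksReplicatedTotal) / seconds;
    }

    /** Estimated seconds to empty the queue at the observed rate; infinite if nothing was replicated. */
    public static double secondsToDrain(Sample earlier, Sample later) {
        if (later.pendingReplicationBlocks == 0) {
            return 0.0;
        }
        double rate = throughput(earlier, later);
        return rate > 0 ? later.pendingReplicationBlocks / rate : Double.POSITIVE_INFINITY;
    }

    /** Flags a queue that is growing and would take longer than the given limit to drain. */
    public static boolean isStruggling(Sample earlier, Sample later, double maxDrainSeconds) {
        boolean queueGrowing = later.pendingReplicationBlocks > earlier.pendingReplicationBlocks;
        return queueGrowing && secondsToDrain(earlier, later) > maxDrainSeconds;
    }
}
```

Requiring both a growing queue and a long drain estimate is a design choice in this sketch: a large but shrinking queue after a datanode failure would not raise an alert.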

In a separate JIRA, (1) will track a ping-like facility for querying the baseline health of the Namenode. In particular, it will verify that all expected threads are alive, perform inexpensive sanity checks on data structures, etc. Administrators periodically running this check can configure/attach to the notification scheme used in their deployment.
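
The thread-liveness part of such a check might be sketched as follows. This is an illustration only: the class and method names are hypothetical, and how the namenode would register its expected threads is an assumption not covered by this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class NamenodeHealthCheck {
    /**
     * Returns the names of expected threads that are not alive.
     * An empty list means every expected thread is still running.
     */
    public static List<String> deadThreads(List<Thread> expected) {
        List<String> dead = new ArrayList<String>();
        for (Thread t : expected) {
            if (!t.isAlive()) {
                dead.add(t.getName());
            }
        }
        return dead;
    }
}
```

A caller could run this periodically and pass a non-empty result to whatever notification scheme the deployment already uses.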

[jira] Assigned: (HADOOP-3323) Name node should notify administrator if when struggling with replication

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas reassigned HADOOP-3323:
-------------------------------------

    Assignee: Mac Yang
