Posted to issues@ozone.apache.org by "Marton Elek (Jira)" <ji...@apache.org> on 2020/02/10 10:44:00 UTC

[jira] [Updated] (HDDS-2113) Update JMX metrics for node count in SCMNodeMetrics for Decommission and Maintenance

     [ https://issues.apache.org/jira/browse/HDDS-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek updated HDDS-2113:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Update JMX metrics for node count in SCMNodeMetrics for Decommission and Maintenance
> ------------------------------------------------------------------------------------
>
>                 Key: HDDS-2113
>                 URL: https://issues.apache.org/jira/browse/HDDS-2113
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the class SCMNodeMetrics exposes JMX metrics for the number of HEALTHY, STALE and DEAD nodes.
> It also exposes the disk capacity of the cluster and the amount of space used and available.
> We need to decide how to present these metrics in JMX when nodes are entering maintenance, in maintenance, decommissioning, or decommissioned.
> We now have 15 states rather than the previous 3, as we can have nodes in:
>  * IN_SERVICE
>  * ENTERING_MAINTENANCE
>  * IN_MAINTENANCE
>  * DECOMMISSIONING
>  * DECOMMISSIONED
> And in each of these states, nodes can be:
>  * HEALTHY
>  * STALE
>  * DEAD
> The simplest option would be to expose all 15 states directly in JMX, as that gives the complete picture, but I wonder whether we also need summary JMX metrics.
>  
> We also need to consider how to count disk capacity and usage. For example:
>  # Do we count capacity and usage on a DECOMMISSIONING node? The answer is not clear-cut: a decommissioning node provides no capacity for writers in the cluster, but it does use capacity.
>  # For a DECOMMISSIONED node, we probably should not count capacity or usage.
>  # For an ENTERING_MAINTENANCE node, do we count capacity and usage? I suspect we should include both in the totals, even though a node in this state will not be available for writes.
>  # For an IN_MAINTENANCE node that is healthy, do we count capacity and usage?
>  # For an IN_MAINTENANCE node that is dead, do we count capacity and usage?
> I would welcome any thoughts on this before changing the code.
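
To make the "15 states" option in the description concrete, here is a minimal Java sketch of counting nodes per (operational state, health) pair and of one possible capacity total. Everything in it is an assumption made for illustration: the class NodeStateMetricsSketch, the Node record, the metric naming scheme and the capacity policy are hypothetical and are not taken from SCMNodeMetrics or from the patch that resolved this issue. The capacity method only encodes the one point the description leans towards (do not count DECOMMISSIONED nodes); the other cases are open questions there, and this sketch simply counts them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: names are illustrative and are not the Ozone API.
final class NodeStateMetricsSketch {

  // The five operational states listed in the issue description.
  enum OperationalState {
    IN_SERVICE, ENTERING_MAINTENANCE, IN_MAINTENANCE, DECOMMISSIONING, DECOMMISSIONED
  }

  // The three health states listed in the issue description.
  enum Health { HEALTHY, STALE, DEAD }

  // Minimal stand-in for a datanode as seen by the metrics code.
  static final class Node {
    final OperationalState opState;
    final Health health;
    final long capacityBytes;
    final long usedBytes;

    Node(OperationalState opState, Health health, long capacityBytes, long usedBytes) {
      this.opState = opState;
      this.health = health;
      this.capacityBytes = capacityBytes;
      this.usedBytes = usedBytes;
    }
  }

  // Builds a metric name such as "DecommissioningStaleNodes" (assumed naming scheme).
  static String metricName(OperationalState op, Health health) {
    return camel(op.name()) + camel(health.name()) + "Nodes";
  }

  // Converts UPPER_SNAKE_CASE to UpperCamelCase.
  private static String camel(String upperSnake) {
    StringBuilder sb = new StringBuilder();
    for (String part : upperSnake.split("_")) {
      sb.append(part.charAt(0))
        .append(part.substring(1).toLowerCase(Locale.ROOT));
    }
    return sb.toString();
  }

  // Counts nodes per combination. All 15 keys are present, even when the count is zero.
  static Map<String, Integer> countByState(List<Node> nodes) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (OperationalState op : OperationalState.values()) {
      for (Health health : Health.values()) {
        counts.put(metricName(op, health), 0);
      }
    }
    for (Node node : nodes) {
      counts.merge(metricName(node.opState, node.health), 1, Integer::sum);
    }
    return counts;
  }

  // One possible capacity policy (an assumption): skip DECOMMISSIONED nodes only.
  static long totalCapacity(List<Node> nodes) {
    long total = 0;
    for (Node node : nodes) {
      if (node.opState != OperationalState.DECOMMISSIONED) {
        total += node.capacityBytes;
      }
    }
    return total;
  }
}
```

Pre-populating every key means a consumer of the metrics always sees the same 15 names. A summary metric, if one is wanted, could be derived by summing the relevant entries of the returned map.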



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
