Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2020/01/09 14:32:00 UTC

[jira] [Created] (HDDS-2860) Cluster disk space metrics should reflect decommission and maintenance states

Stephen O'Donnell created HDDS-2860:
---------------------------------------

             Summary: Cluster disk space metrics should reflect decommission and maintenance states
                 Key: HDDS-2860
                 URL: https://issues.apache.org/jira/browse/HDDS-2860
             Project: Hadoop Distributed Data Store
          Issue Type: Sub-task
          Components: SCM
    Affects Versions: 0.5.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


Now that we have decommission states, we need to adjust the cluster capacity, space used, and space available metrics that are exposed via JMX.

For a decommissioning node, the space used on the node effectively needs to be transferred to other nodes via container replication before decommission can complete, but this is difficult to track from a space usage perspective. When a node completes decommission, we can assume it provides no capacity to the cluster and uses none. Therefore, for decommissioning and decommissioned nodes, the simplest calculation is to exclude the node completely, in the same way as a dead node.

For maintenance nodes, things are even less clear. A maintenance node is read-only, so it cannot provide capacity to the cluster, but it is expected to return to service, so excluding it completely probably does not make sense. However, perhaps the simplest solution is to do the following:

1. For any node not IN_SERVICE, do not include its usage or space in the cluster capacity totals.
2. Introduce some new metrics to account for the maintenance, and perhaps decommissioned, capacity so that it is not lost, e.g.:

{code}
# Existing metrics
"DiskCapacity" : 62725623808,
"DiskUsed" : 4096,
"DiskRemaining" : 50459619328,

# Suggested additional new ones, with the above only considering IN_SERVICE nodes:

"MaintenanceDiskCapacity" : 0,
"MaintenanceDiskUsed" : 0,
"MaintenanceDiskRemaining" : 0,
"DecommissionedDiskCapacity" : 0,
"DecommissionedDiskUsed" : 0,
"DecommissionedDiskRemaining" : 0,
...
{code}

That way, the cluster totals cover only what is currently "online", but we have the other metrics to track what has been taken out of service. The key advantage of this approach is that it is easy to understand.
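
To make the proposed split concrete, here is a minimal sketch of the aggregation in Python. It is illustrative only and is not SCM code: the NodeStat type, the cluster_disk_metrics function and the state names other than IN_SERVICE are hypothetical, and it assumes that decommissioning and decommissioned nodes share the Decommissioned* counters, which the proposal leaves open. Dead nodes are not modelled.

```python
from typing import Dict, Iterable, NamedTuple

# Hypothetical operational state names; only IN_SERVICE is named in the issue.
IN_SERVICE = "IN_SERVICE"
MAINTENANCE_STATES = {"ENTERING_MAINTENANCE", "IN_MAINTENANCE"}
DECOMMISSION_STATES = {"DECOMMISSIONING", "DECOMMISSIONED"}

PREFIXES = ("", "Maintenance", "Decommissioned")
SUFFIXES = ("Capacity", "Used", "Remaining")


class NodeStat(NamedTuple):
    """Hypothetical per-node disk report, all sizes in bytes."""
    state: str
    capacity: int
    used: int
    remaining: int


def metric_prefix(state: str) -> str:
    """Map an operational state to the metric-name prefix it counts towards."""
    if state == IN_SERVICE:
        return ""
    if state in MAINTENANCE_STATES:
        return "Maintenance"
    if state in DECOMMISSION_STATES:
        # Assumption: decommissioning nodes are grouped with decommissioned ones.
        return "Decommissioned"
    raise ValueError(f"unknown operational state: {state}")


def cluster_disk_metrics(nodes: Iterable[NodeStat]) -> Dict[str, int]:
    """Sum disk stats into one bucket per group of operational states.

    The unprefixed Disk* totals only include IN_SERVICE nodes; everything
    else is tracked under Maintenance* or Decommissioned* so it is not lost.
    """
    metrics = {f"{p}Disk{s}": 0 for p in PREFIXES for s in SUFFIXES}
    for node in nodes:
        prefix = metric_prefix(node.state)
        metrics[f"{prefix}DiskCapacity"] += node.capacity
        metrics[f"{prefix}DiskUsed"] += node.used
        metrics[f"{prefix}DiskRemaining"] += node.remaining
    return metrics
```

With this sketch, moving a node from IN_SERVICE to IN_MAINTENANCE shifts its capacity, used and remaining space out of the Disk* totals and into the MaintenanceDisk* counters, so the sum across all three groups stays the same.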

One could also argue that the new DecommissionedDisk* metrics are not needed, as that capacity is technically lost from the cluster forever.


