Posted to issues@ozone.apache.org by "Mark Gui (Jira)" <ji...@apache.org> on 2021/07/01 06:45:00 UTC

[jira] [Created] (HDDS-5401) Add more metrics to ReplicationManager to help monitor replication progress

Mark Gui created HDDS-5401:
------------------------------

             Summary: Add more metrics to ReplicationManager to help monitor replication progress
                 Key: HDDS-5401
                 URL: https://issues.apache.org/jira/browse/HDDS-5401
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Mark Gui
            Assignee: Mark Gui


Currently, the SCM ReplicationManager exposes only 2 metrics: inflightReplication and inflightDeletion.

We could add more metrics to better monitor replication progress (e.g. via Prometheus).

With those, we could also estimate the time needed to complete the whole replication.

Some proposed metrics:
 * number of replicate/delete cmds sent
 * number of replicate/delete cmds completed
 * number of replicate/delete cmds timed out

These metrics would be refreshed every replication round (300s by default). We could then calculate how many replicate/delete cmds completed between 2 successive rounds and how many are still in flight, and from that estimate how much more time is needed.
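
A minimal sketch of the estimate described above, not existing Ozone code. The class name, method name and parameters are hypothetical, and it assumes the "completed" counter is cumulative and sampled once per replication round:

```java
import java.util.OptionalLong;

/** Hypothetical helper for illustration; not part of ReplicationManager. */
final class ReplicationEta {
  private ReplicationEta() { }

  /**
   * Estimates the seconds left from two successive samples of a cumulative
   * "cmds completed" counter (assumed metric) and the current in-flight count.
   *
   * @param completedPrev completed cmds observed at the previous round
   * @param completedNow  completed cmds observed at the current round
   * @param inflightNow   cmds still in flight at the current round
   * @param roundSeconds  replication round interval (300 by default)
   * @return estimated seconds remaining, or empty if no progress was observed
   */
  static OptionalLong estimateSecondsRemaining(
      long completedPrev, long completedNow, long inflightNow, long roundSeconds) {
    if (inflightNow <= 0) {
      return OptionalLong.of(0); // nothing left to do
    }
    long delta = completedNow - completedPrev;
    if (delta <= 0 || roundSeconds <= 0) {
      return OptionalLong.empty(); // no progress this round: rate is unknown
    }
    // remaining / (delta / roundSeconds), rounded up, in integer arithmetic.
    long numerator = Math.multiplyExact(inflightNow, roundSeconds);
    return OptionalLong.of((numerator + delta - 1) / delta);
  }
}
```

For example, if 60 cmds completed during the last 300s round and 120 are still in flight, the observed rate is 0.2 cmds/s and the estimate is 600s. The estimate is only as good as the assumption that the rate stays roughly constant.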

Two more metrics would make the estimate more accurate, since closed containers can differ in size:
 * total number of bytes to replicate
 * number of bytes already replicated
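
The byte-based variant could look like the sketch below. Again, the names are hypothetical and the two byte counters are the proposed metrics, not existing ones; it assumes the "bytes completed" value is cumulative and sampled once per round:

```java
import java.util.OptionalLong;

/** Hypothetical helper for illustration; not part of ReplicationManager. */
final class ReplicationByteEta {
  private ReplicationByteEta() { }

  /**
   * Estimates the seconds left from the total bytes to replicate and two
   * successive samples of the cumulative bytes completed (assumed metrics).
   *
   * @return estimated seconds remaining, or empty if no progress was observed
   */
  static OptionalLong estimateSecondsRemaining(
      long bytesTotal, long bytesCompletedPrev, long bytesCompletedNow,
      long roundSeconds) {
    long remaining = bytesTotal - bytesCompletedNow;
    if (remaining <= 0) {
      return OptionalLong.of(0); // everything has been replicated
    }
    long delta = bytesCompletedNow - bytesCompletedPrev;
    if (delta <= 0 || roundSeconds <= 0) {
      return OptionalLong.empty(); // no progress this round: rate is unknown
    }
    // Use double so that very large byte counts cannot overflow a long.
    double seconds = (double) remaining * roundSeconds / delta;
    return OptionalLong.of((long) Math.ceil(seconds));
  }
}
```

Weighting by bytes instead of cmd counts keeps one large container from being treated the same as many small ones, which is the point of the two extra metrics.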


