Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2022/05/13 10:21:00 UTC

[jira] [Updated] (HDDS-6744) EC: ReplicationManager - create ContainerReplicaPendingOps class and integrate with ContainerManager

     [ https://issues.apache.org/jira/browse/HDDS-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDDS-6744:
------------------------------------
    Summary: EC: ReplicationManager - create ContainerReplicaPendingOps class and integrate with ContainerManager  (was: EC: ReplicationManager - create PendingContainerOps class and integrate with ContainerManager)

> EC: ReplicationManager - create ContainerReplicaPendingOps class and integrate with ContainerManager
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-6744
>                 URL: https://issues.apache.org/jira/browse/HDDS-6744
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> The legacy replication manager internally keeps a list of all pending replications and deletes. Each time a container is checked, it checks this list and removes any replications that have completed or expired. It then uses the remaining pending operations to help decide whether the container is healthy.
> Rather than having the ReplicationManager remove the completed and expired replications, we could have a standalone PendingContainerOps monitor that works as follows:
> 1. Replication Manager adds pending replications and deletes to it.
> 2. Replication Manager queries it for anything pending for the current container and gets a list of PendingActions back.
> 3. The PendingReplicationMonitor has its own internal thread that checks for expired replications and removes them.
> 4. Completed replications and deletes are removed in ContainerManagerImpl, which has add and removeContainer triggered via the container reports (ICR and FCR) sent by the datanodes as containers are replicated.
> This way, the ReplicationManager does not need to worry about expiring replications or removing completed entries. We also get a more up-to-date view of the system, as the ICRs and FCRs will keep the pending table up-to-date in real time, rather than having to wait for the container to be re-checked inside the replication manager.
> We can have a fairly simple, essentially standalone "ContainerReplicaPendingOps" class and inject it into ReplicationManager and ContainerManagerImpl. This would remove some complexity from RM and allow expiry and completion to be tested in isolation.
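
The four steps in the description can be sketched roughly as follows. This is only an illustration of the proposed shape, not the Ozone implementation: the class name PendingOpsSketch, the OpType enum, the method names, and the use of plain long container IDs and String datanode names are all assumptions made for the example. Timestamps are passed in explicitly so that expiry can be exercised without a real monitor thread; an internal thread could simply call removeExpired periodically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a standalone pending-ops table; not the Ozone API. */
final class PendingOpsSketch {
  enum OpType { ADD, DELETE }

  /** One replica operation that has been scheduled but not yet confirmed. */
  static final class PendingOp {
    final OpType type;
    final String datanode;
    final long scheduledAtMillis;

    PendingOp(OpType type, String datanode, long scheduledAtMillis) {
      this.type = type;
      this.datanode = datanode;
      this.scheduledAtMillis = scheduledAtMillis;
    }
  }

  // containerId -> pending ops. Lists are only touched inside compute*
  // calls, which ConcurrentHashMap runs atomically per key.
  private final Map<Long, List<PendingOp>> pending = new ConcurrentHashMap<>();

  /** Step 1: the replication manager records a scheduled add or delete. */
  void schedule(long containerId, OpType type, String datanode, long nowMillis) {
    pending.compute(containerId, (id, ops) -> {
      List<PendingOp> list = (ops == null) ? new ArrayList<>() : ops;
      list.add(new PendingOp(type, datanode, nowMillis));
      return list;
    });
  }

  /** Step 2: the replication manager asks what is pending for a container. */
  List<PendingOp> getPendingOps(long containerId) {
    List<PendingOp> snapshot = new ArrayList<>();
    pending.computeIfPresent(containerId, (id, ops) -> {
      snapshot.addAll(ops); // copy so callers never see later mutations
      return ops;
    });
    return snapshot;
  }

  /** Step 3: an expiry pass drops ops older than the timeout. */
  int removeExpired(long nowMillis, long timeoutMillis) {
    int[] removed = {0};
    for (Long id : pending.keySet()) {
      pending.computeIfPresent(id, (k, ops) -> {
        int before = ops.size();
        ops.removeIf(op -> nowMillis - op.scheduledAtMillis >= timeoutMillis);
        removed[0] += before - ops.size();
        return ops.isEmpty() ? null : ops; // returning null removes the key
      });
    }
    return removed[0];
  }

  /** Step 4: the container manager reports a completed add or delete. */
  boolean complete(long containerId, OpType type, String datanode) {
    boolean[] removed = {false};
    pending.computeIfPresent(containerId, (id, ops) -> {
      removed[0] = ops.removeIf(
          op -> op.type == type && op.datanode.equals(datanode));
      return ops.isEmpty() ? null : ops;
    });
    return removed[0];
  }

  public static void main(String[] args) {
    PendingOpsSketch table = new PendingOpsSketch();
    table.schedule(1L, OpType.ADD, "dn1", 0L);
    table.schedule(1L, OpType.DELETE, "dn2", 500L);
    table.complete(1L, OpType.ADD, "dn1");  // e.g. driven by an ICR
    table.removeExpired(1_200L, 1_000L);    // e.g. driven by a monitor thread
    System.out.println("still pending: " + table.getPendingOps(1L).size());
  }
}
```

Because the table has no dependency on the replication manager in this sketch, both callers can share one instance, and the expiry and completion paths can be unit tested by calling removeExpired and complete directly.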



--
This message was sent by Atlassian Jira
(v8.20.7#820007)