You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/05/05 13:11:22 UTC

[GitHub] [ozone] JacksonYao287 opened a new pull request, #3384: HDDS-6572. EC: RM. add inflightActionManager and split moveSchudler from legacy RM

JacksonYao287 opened a new pull request, #3384:
URL: https://github.com/apache/ozone/pull/3384

   ## What changes were proposed in this pull request?
   
   1 extract moveScheduler from Legacy RM to a standalone interface and implementation.
   2 add inflightActionManager to manager all the inflight actions in a standalone class.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6572
   
   ## How was this patch tested?
   
   current RM UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 closed pull request #3384: HDDS-6572. EC: RM. add inflightActionManager and split moveSchudler from legacy RM

Posted by GitBox <gi...@apache.org>.
JacksonYao287 closed pull request #3384: HDDS-6572. EC: RM. add inflightActionManager and split moveSchudler from legacy RM
URL: https://github.com/apache/ozone/pull/3384


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #3384: HDDS-6572. EC: RM. add inflightActionManager and split moveSchudler from legacy RM

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on PR #3384:
URL: https://github.com/apache/ozone/pull/3384#issuecomment-1124510693

   @sodonnel thanks for the comment.
   > We would also avoid excessive changes to the LegacyManager for now until we have a better idea about how the new ReplicationManager looks.
   
   since Legacy Replication Manager has no scheduling logic now, the only work it does is to check the container   health, so it should be changed to defaultContainerHealthChecker later. 
   
   since we have EC now , we should support moving EC container  for container balancer , so move scheduler should be extracted out from Legacy replication manager to a standalone class to handle move option for all kinds of container.
   
   so IMHO, the new ReplicationManager should include the following part:
   
   1 a main class - RepliactionManager.java, which will only handle scheduling related issues.
   
   2 move scheduler, which will handle move options for container balancer. we already have a move scheduler interface and an implementation for ratis contaier now. we could also add a ECMoveScheduler if necessary.
   
   3 health checker. we can add a containerHealthChecker interface , which has a function like
   ```
   ContainerHealthWithCommands  checkContainerState(ContainerInfo containerInfo, Set<ContainerReplica> replicas);
   ```
   and then we can change Legacy Replication Manager to an implementation of that interface to check ratis and standalone container, and then add a ECContainerHealtherChecker for ec container.  we could add more healthChecker implementation for containers of new types in the future if necessary.
   
   4 since inflightActions are shared among different moveSchedulers and containerHealthCheckers, and some external class will also access and get the infomation of inflightActions, then i think it is better to have a inflightActionManager to manager all the inflightActions.
   
   5 other functions, like sending command which is got from `ContainerHealthWithCommands`, could be wrapped into a standalone function or class, and can be called in Replication Manager after calling `checkContainerState` to  send the command included in `ContainerHealthWithCommands`
   
   by the way, since refactor will bring a lot of changes , we can consider Replication Manager as a whole and use the current unit test and integration test  to verify its correctness, and keep the logic which it exposed to external  the same as before. after we finish refactoring , tests for different class can be added one by one
   
   what do you think? or do you have other better ideas about this? we can also discuss this in slack
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #3384: HDDS-6572. EC: RM. add inflightActionManager and split moveSchudler from legacy RM

Posted by GitBox <gi...@apache.org>.
sodonnel commented on PR #3384:
URL: https://github.com/apache/ozone/pull/3384#issuecomment-1123886620

   This change feels like we are moving a lot of pieces from LegacyReplicationManager into InflightActionManager.java, and I am not sure if they all belong in that new class either.
   
   I had imagined creating a small "InflightAction" class that is a wrapper around the maps used to store the inflight actions, and then allow that to be shared by both Replication Managers. However I am not even sure on that now, as the Legacy manager will deal with a different set of containers than the new manager. There is a lot of logic in the Legacy RM that is bound tightly to the inflightActions map and its not clear if we will want to carry that over into the new one, or do things in a slightly different way.
   
   I wonder if we may get a better result if we start building up the new replication manager piece by piece and then see if we get something a little cleaner. We would also avoid excessive changes to the LegacyManager for now until we have a better idea about how the new ReplicationManager looks.
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #3384: HDDS-6572. EC: RM. add inflightActionManager and split moveSchudler from legacy RM

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on PR #3384:
URL: https://github.com/apache/ozone/pull/3384#issuecomment-1126613413

   @sodonnel  thanks for the comments!
   one aim of this patch is changing the current Replication Manager to what it should look like, and sketching the architecture, so that we can be clear about what kind of modules it will have , and we can start working on them independently. I do not refine any part and just move the code to the module it should belong, since i think we should create separate jira  for any refination of any module, so that it will be easy to review and test.
   
   >Rather than the ReplicationManager removing the completed and expired replications, we could have a standalone PendingContainerOps monitor, that works as follows:
   
   >Replication Manager adds pending replications and deletes to it.
   >Replication Manager queries it for anything pending for the current container and gets a list of PendingActions back.
   >The PendingReplicationMonitor has its own internal thread that checks for expired replications and removes them.
   >Completed replications and deletes are removed in ComtainerManagerImpl, which has updateContainerReplica and >removeContainerReplica triggered via the container reports (ICR and FCR) from the datanodes as they are replicated.
   
   this is a good idea , and it used to come to my mind also. the inflightActions should be removed by ContainerReportHandler through InflightActionManager as soon as possible when receiving ICR or FCR. but whatever, the main idea is we should have a standalone class to manage all the inflightActions, be responsible to add and delete inflight actions, and any other class should query inflight action through this manager(or monitor) . we can add a internal thread in InflightActionManager(or some other name) to check the expired replications and removes them. but this seems a refination work, and we can create a separate jira to solve. if we put all refactor and refine work into a single jira , seems a little complex for review and test.
   
   >I will have a go at creating the outline of what I described above and see how it looks.
   sure, please go ahead, looking forward to see it. or if you have any other ideal , we can discuss through slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #3384: HDDS-6572. EC: RM. add inflightActionManager and split moveSchudler from legacy RM

Posted by GitBox <gi...@apache.org>.
sodonnel commented on PR #3384:
URL: https://github.com/apache/ozone/pull/3384#issuecomment-1125885960

   For now, I would like to leave the current LegacyReplicationManager class as it is and not focus on Balancer for EC. I feel that a lot of the logic for pending replications is overly complex, and if we just lift it out of LegacyReplicationManager and into a new class, it does not improve things - we have really just renamed the code.
   
   The legacy replication manager internally keeps a list of all pending replications and deletes. Each time a container is checked, it check this list and removes any replications that have been completed or expired. Then it gets the list of remaining pending operations to help decide if container is healthy or not.
   
   Rather than the ReplicationManager removing the completed and expired replications, we could have a standalone PendingContainerOps monitor, that works as follows:
   
   1. Replication Manager adds pending replications and deletes to it.
   2. Replication Manager queries it for anything pending for the current container and gets a list of PendingActions back.
   3. The PendingReplicationMonitor has its own internal thread that checks for expired replications and removes them.
   4. Completed replications and deletes are removed in ComtainerManagerImpl, which has updateContainerReplica and removeContainerReplica triggered via the container reports (ICR and FCR) from the datanodes as they are replicated.
   
   This way, the ReplicationManager does not need to worry about expiring replications or removing completed entries. We also get the ability to have a more up-to-date view of the system, as the ICR / FCRs will keep the pending table up-to-date in real time, rather than having to wait for the container to be re-checked inside replication manager.
   
   We can have a fairly simple "ContainerReplicaPendingOps" class that is basically standalone and inject it into ReplicationManager and ContainerManagerImpl. This would allow for removing some complexity from RM and let the expiry and completion be tested in an isolated way.
   
   I generally agree with your suggestions on the classes / functions we need to have around ReplicationManager. I have already started working on the health check interface in [HDDS-6697](https://issues.apache.org/jira/browse/HDDS-6697), but I got sidetracked into the EcContainerReplicaCounts class, which I realised I needed before going much further.
   
   I will have a go at creating the outline of what I described above and see how it looks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #3384: HDDS-6572. EC: RM. add inflightActionManager and split moveSchudler from legacy RM

Posted by GitBox <gi...@apache.org>.
sodonnel commented on PR #3384:
URL: https://github.com/apache/ozone/pull/3384#issuecomment-1129280008

   @JacksonYao287 I posted the PR I outlined in #3425


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org