Posted to issues@ozone.apache.org by "adoroszlai (via GitHub)" <gi...@apache.org> on 2023/05/03 10:25:56 UTC

[GitHub] [ozone] adoroszlai opened a new pull request, #4645: HDDS-8494. Adjust replication queue limits for out-of-service nodes

adoroszlai opened a new pull request, #4645:
URL: https://github.com/apache/ozone/pull/4645

   ## What changes were proposed in this pull request?
   
   When a datanode switches to a decommissioning state, it raises the size of the replication supervisor thread pool; when the node returns to the In Service state, it reverts to the lower thread pool limit.
   
   Similarly, when scheduling commands, SCM can allocate more commands to the decommissioning host, since it should process them more quickly due to its lower load and larger thread pool.
   
    * Scale the size of the executor thread pool and command queue for replication in the datanode when its state changes between in-service and out-of-service
    * Similarly scale the limit of pending replication commands at SCM
    * Simplify `TestReplicationSupervisor#testMaxQueueSize` to avoid the use of thread pool (possible source of intermittent failures recently [1](https://github.com/apache/ozone/actions/runs/4866567976/jobs/8678244002#step:6:1219), [2](https://github.com/apache/ozone/actions/runs/4869395310/jobs/8683855578#step:6:1219), [3](https://github.com/apache/ozone/actions/runs/4870358276/jobs/8686015439#step:6:1219))
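
The datanode-side scaling described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `ScalingReplicationPool`, its `NodeState` enum, and the method names are hypothetical, and the base size of 10 with a factor of 2 is taken only from the log lines quoted in this description. It relies on the standard `java.util.concurrent.ThreadPoolExecutor` resizing methods.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and structure are assumptions, not Ozone's actual classes.
class ScalingReplicationPool {
  enum NodeState { IN_SERVICE, DECOMMISSIONING }

  private final int baseThreads;
  private final int outOfServiceFactor;
  private final ThreadPoolExecutor executor;

  ScalingReplicationPool(int baseThreads, int outOfServiceFactor) {
    this.baseThreads = baseThreads;
    this.outOfServiceFactor = outOfServiceFactor;
    this.executor = new ThreadPoolExecutor(baseThreads, baseThreads,
        60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
  }

  // Resize the pool whenever the node's operational state changes.
  void nodeStateUpdated(NodeState state) {
    int newSize = state == NodeState.IN_SERVICE
        ? baseThreads
        : baseThreads * outOfServiceFactor;
    int current = executor.getCorePoolSize();
    if (newSize > current) {
      // Growing: raise the maximum first so core never exceeds it.
      executor.setMaximumPoolSize(newSize);
      executor.setCorePoolSize(newSize);
    } else if (newSize < current) {
      // Shrinking: lower the core first so maximum never drops below it.
      executor.setCorePoolSize(newSize);
      executor.setMaximumPoolSize(newSize);
    }
  }

  int poolSize() {
    return executor.getCorePoolSize();
  }

  void shutdown() {
    executor.shutdown();
  }

  public static void main(String[] args) {
    ScalingReplicationPool pool = new ScalingReplicationPool(10, 2);
    pool.nodeStateUpdated(NodeState.DECOMMISSIONING);
    System.out.println("decommissioning: " + pool.poolSize());
    pool.nodeStateUpdated(NodeState.IN_SERVICE);
    System.out.println("in service: " + pool.poolSize());
    pool.shutdown();
  }
}
```

The order of the two setter calls matters: `setMaximumPoolSize` rejects a value below the current core size, and newer JDKs also reject a core size above the current maximum, so the sketch grows and shrinks in opposite orders.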
   
   https://issues.apache.org/jira/browse/HDDS-8494
   
   ## How was this patch tested?
   
   Added unit test.
   
   Tested in `ozone` compose environment with 6 nodes: created RATIS and EC keys, decommissioned and recommissioned one of the datanodes.
   
   ```
   2023-05-02 16:37:20,303 [Command processor thread] INFO replication.ReplicationSupervisor: Node state updated to DECOMMISSIONING, scaling executor pool size to 20
   ...
   2023-05-02 16:39:16,353 [Command processor thread] INFO replication.ReplicationSupervisor: Node state updated to IN_SERVICE, scaling executor pool size to 10
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #4645: HDDS-8494. Adjust replication queue limits for out-of-service nodes

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #4645:
URL: https://github.com/apache/ozone/pull/4645#issuecomment-1534343113

   Thanks @sodonnel for the review.  Added a test based on `testSendThrottledReplicateContainerCommand`.  I'll add the config in a follow-up task, as I need more time to think about it.




[GitHub] [ozone] sodonnel commented on pull request #4645: HDDS-8494. Adjust replication queue limits for out-of-service nodes

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #4645:
URL: https://github.com/apache/ozone/pull/4645#issuecomment-1533159333

   Change looks good. The only two suggestions I have:
   
   1. I wonder if we should make the "scaling factor" by which we increase the limit and thread pool configurable, with a default of 2? If we make it configurable, should it be a decimal rather than an integer, so we can scale by 1.5, 2.5, etc.? I guess the config would need to apply on both the DN and RM, so I am not 100% sure where we should define it. It would be a shame to need two configs, as they could get out of sync.
   
   2. Might be good to also add a test based on `testSendThrottledReplicateContainerCommand` in the `TestReplicationManager` class to validate that the decommissioning host is picked as a target when all nodes are over the original limit. This is kind of covered in the excluded nodes test, but excluded nodes are only updated when the new command pushes a node over the limit. This test would ensure decommissioning nodes are still picked if they are over the original limit but under the extended limit.
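
The two suggestions above can be illustrated together with a rough sketch. Everything here is hypothetical: `ThrottledCommandScheduler`, `Node`, `limitFor`, and `pickNode` are made-up names, the base limit of 20 is an arbitrary example value, and picking the first node under its limit is a simplification rather than the Replication Manager's actual selection logic. The sketch shows a decimal scaling factor applied to the per-node limit, and a decommissioning node still being chosen when every node is over the original limit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; not the actual ReplicationManager implementation.
class ThrottledCommandScheduler {

  static final class Node {
    final String name;
    final boolean inService;
    final int queuedCommands;

    Node(String name, boolean inService, int queuedCommands) {
      this.name = name;
      this.inService = inService;
      this.queuedCommands = queuedCommands;
    }
  }

  // Out-of-service nodes get the base limit scaled by a (possibly fractional) factor.
  static int limitFor(int baseLimit, double factor, boolean inService) {
    return inService ? baseLimit : (int) Math.ceil(baseLimit * factor);
  }

  // Return the first node whose queued command count is under its own limit.
  static Optional<String> pickNode(List<Node> nodes, int baseLimit, double factor) {
    for (Node n : nodes) {
      if (n.queuedCommands < limitFor(baseLimit, factor, n.inService)) {
        return Optional.of(n.name);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node("dn1", true, 20),    // at the original limit
        new Node("dn2", true, 25),    // over the original limit
        new Node("dn3", false, 25));  // over the original, under the extended limit
    System.out.println(pickNode(nodes, 20, 2.0).orElse("none"));
  }
}
```

With a base limit of 20 and a factor of 2.0, only the decommissioning node `dn3` remains eligible in the example above, which is the scenario the suggested test would check.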




[GitHub] [ozone] adoroszlai merged pull request #4645: HDDS-8494. Adjust replication queue limits for out-of-service nodes

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai merged PR #4645:
URL: https://github.com/apache/ozone/pull/4645

