Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/06/14 07:19:18 UTC

[GitHub] [ozone] errose28 opened a new pull request #2327: HDDS-5336. Fix datanode capacity related race condition.

errose28 opened a new pull request #2327:
URL: https://github.com/apache/ozone/pull/2327


   ## What changes were proposed in this pull request?
   
   After merging master into the upgrade branch in HDDS-5321, an intermittent failure was noticed in TestSCMNodeManager#testLayoutOnHeartbeat.
   
   The issue occurs in SCMNodeManager#register, which adds the node to the nodeStateManager, firing the NEW_NODE event, before it processes the node report containing the new node's storage information. The event triggers a one-shot run of the background pipeline creator, which reads the node's storage information to determine whether it can hold a pipeline. If the storage report has not yet been processed at that point, the node still appears to have no free space, so no pipeline is created to use the new node when it registers.
   
   In this fix, the NEW_NODE event no longer fires until all reports passed to SCMNodeManager#register have been processed. The existing test has also been modified to produce clearer output if a similar issue occurs in the future.
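   
   The ordering problem can be illustrated with a simplified sketch. This is not the actual Ozone code: the class `RegisterOrderingSketch` and all of its methods are hypothetical stand-ins, and the event handler runs synchronously here so the bad interleaving is deterministic, whereas in SCM the event is handled in the background and the failure is intermittent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of node registration. All names are
 * illustrative assumptions and do not mirror the real SCMNodeManager API.
 */
class RegisterOrderingSketch {

  /** Free space per node; only populated once a node report is processed. */
  private final Map<String, Long> freeSpaceByNode = new HashMap<>();

  /** Nodes the simulated pipeline creator decided it could use. */
  private final List<String> pipelineNodes = new ArrayList<>();

  /** Stands in for the one-shot pipeline creator run triggered by NEW_NODE. */
  private void onNewNode(String nodeId) {
    // A node with no processed report appears to have no free space.
    long free = freeSpaceByNode.getOrDefault(nodeId, 0L);
    if (free > 0) {
      pipelineNodes.add(nodeId);
    }
  }

  /** Problematic ordering: the event fires before the report is processed. */
  void registerEventFirst(String nodeId, long reportedFreeBytes) {
    onNewNode(nodeId);
    freeSpaceByNode.put(nodeId, reportedFreeBytes);
  }

  /** Fixed ordering: the report is processed before the event fires. */
  void registerReportFirst(String nodeId, long reportedFreeBytes) {
    freeSpaceByNode.put(nodeId, reportedFreeBytes);
    onNewNode(nodeId);
  }

  List<String> pipelineNodes() {
    return pipelineNodes;
  }

  public static void main(String[] args) {
    RegisterOrderingSketch eventFirst = new RegisterOrderingSketch();
    eventFirst.registerEventFirst("dn-1", 1024L);
    System.out.println("event first: " + eventFirst.pipelineNodes());

    RegisterOrderingSketch reportFirst = new RegisterOrderingSketch();
    reportFirst.registerReportFirst("dn-1", 1024L);
    System.out.println("report first: " + reportFirst.pipelineNodes());
  }
}
```

   With the event-first ordering the node is skipped even though it reported free space; with the report-first ordering the same node is picked up.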
   
   Although the bug is present on master as well, the fix was done on the upgrade branch because:
   1. That is where the test to reproduce the issue resides.
   2. The fix works better with SCMNodeManager modifications already done for upgrades.
   3. The upgrade branch is expected to be merged into master soon.
   
   ## What is the link to the Apache JIRA?
   
   HDDS-5336
   
   ## How was this patch tested?
   
   Before the fix, the test failed multiple times on CI:
   - https://github.com/apache/ozone/runs/2787582345
   - https://github.com/errose28/hadoop-ozone/runs/2748815362
   
   The fix was tested with 100 runs on CI:
   - https://github.com/errose28/hadoop-ozone/runs/2797649712
   
   I was unable to reproduce the original failure with 400 runs locally, so CI was used for testing.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] avijayanhwx commented on pull request #2327: HDDS-5336. Fix datanode capacity related race condition.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on pull request #2327:
URL: https://github.com/apache/ozone/pull/2327#issuecomment-862018716


   Thank you @errose28, merging this.


[GitHub] [ozone] errose28 commented on pull request #2327: HDDS-5336. Fix datanode capacity related race condition.

Posted by GitBox <gi...@apache.org>.
errose28 commented on pull request #2327:
URL: https://github.com/apache/ozone/pull/2327#issuecomment-862017913


   Thanks for the review and fixes @avijayanhwx. The fixes for unit tests applied on top of the original change LGTM.


[GitHub] [ozone] errose28 commented on pull request #2327: HDDS-5336. Fix datanode capacity related race condition.

Posted by GitBox <gi...@apache.org>.
errose28 commented on pull request #2327:
URL: https://github.com/apache/ozone/pull/2327#issuecomment-859903582


   The unit test failure for `TestSCMNodeManager#testScmLayoutOnRegister` came in with the merge from master. It is a different test from the one this PR addresses, `TestSCMNodeManager#testLayoutOnHeartbeat`. Currently investigating...


[GitHub] [ozone] avijayanhwx merged pull request #2327: HDDS-5336. Fix datanode capacity related race condition.

Posted by GitBox <gi...@apache.org>.
avijayanhwx merged pull request #2327:
URL: https://github.com/apache/ozone/pull/2327


   

