Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/03/10 20:34:34 UTC

[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #659: HDDS-2989. Intermittent timeout in TestBlockManager

adoroszlai opened a new pull request #659: HDDS-2989. Intermittent timeout in TestBlockManager
URL: https://github.com/apache/hadoop-ozone/pull/659
 
 
   ## What changes were proposed in this pull request?
   
   `TestBlockManager` intermittently times out waiting for SCM to exit safe mode.  This happens due to a race condition between two safe mode status events handled in different threads (but by the same handler object): one from SCM, the other from the test code.
   
   Temporary debug log (in "passing" order):
   
   ```
   (SafeModeHandler.java:onMessage(103)) - SafeModeHandler@2bde2598 handling safe mode status event in thread 26: true
   (SafeModeHandler.java:onMessage(103)) - SafeModeHandler@2bde2598 handling safe mode status event in thread 28: false
   ```
   
   If the order is reversed, SCM may stay in safe mode as far as `BlockManagerImpl` sees it.  Worse, it may return to safe mode while `BlockManagerImpl` is trying to perform some operation, e.g.:
   
   ```
   SCMException: SafeModePrecheck failed for allocateBlock
   ...
     at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:160)
     at org.apache.hadoop.hdds.scm.block.TestBlockManager.testAllocateBlock(TestBlockManager.java:150)
   ```
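   
   The order dependence can be illustrated with a minimal sketch.  It is not the actual `SafeModeHandler` code: `SafeModeOrderingSketch`, `LastWriterWinsHandler` and `finalState` are hypothetical names, and the events are replayed sequentially in a chosen order instead of on real threads, so the outcome is deterministic.  It only assumes the handler keeps whatever status it received last.
   
   ```java
   import java.util.concurrent.atomic.AtomicBoolean;
   
   // Hypothetical illustration only; not code from hadoop-ozone.
   class SafeModeOrderingSketch {
   
     /** Assumed behavior: the handler remembers the last status delivered. */
     static final class LastWriterWinsHandler {
       private final AtomicBoolean inSafeMode = new AtomicBoolean(true);
   
       void onMessage(boolean status) {
         inSafeMode.set(status);
       }
   
       boolean isInSafeMode() {
         return inSafeMode.get();
       }
     }
   
     /** Delivers the statuses in the given order and returns the final state. */
     static boolean finalState(boolean... statuses) {
       LastWriterWinsHandler handler = new LastWriterWinsHandler();
       for (boolean status : statuses) {
         handler.onMessage(status);
       }
       return handler.isInSafeMode();
     }
   
     public static void main(String[] args) {
       // "Passing" order from the debug log: true, then false.
       System.out.println("passing order, inSafeMode=" + finalState(true, false));
       // Reversed order: false, then true.
       System.out.println("reversed order, inSafeMode=" + finalState(false, true));
     }
   }
   ```
   
   With the "passing" order the final state is out of safe mode; with the reversed order the handler ends up in safe mode again, which matches the timeout and the `SafeModePrecheck` failure described above.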
   
   The proposed fix is to disable safe mode status emission (i.e. ignore the event from SCM) and let the test set safe mode explicitly in `BlockManagerImpl`.  This should be fine since this is a unit test, not an integration test.
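   
   The idea can be sketched roughly as follows.  This is an assumption-laden illustration, not the patch itself: `ExplicitSafeModeSketch`, `StatusSource`, `BlockAllocator` and all their methods are hypothetical stand-ins for the SCM-side emitter and for `BlockManagerImpl`.  With emission switched off, only the test decides the safe mode state, so there is no second event to race with.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.Consumer;
   
   // Hypothetical illustration only; not code from hadoop-ozone.
   class ExplicitSafeModeSketch {
   
     /** Stand-in for the status emitter; emission can be disabled. */
     static final class StatusSource {
       private final List<Consumer<Boolean>> listeners = new ArrayList<>();
       private final boolean emissionEnabled;
   
       StatusSource(boolean emissionEnabled) {
         this.emissionEnabled = emissionEnabled;
       }
   
       void subscribe(Consumer<Boolean> listener) {
         listeners.add(listener);
       }
   
       void emit(boolean status) {
         if (!emissionEnabled) {
           return; // events are dropped, so listeners never see them
         }
         for (Consumer<Boolean> listener : listeners) {
           listener.accept(status);
         }
       }
     }
   
     /** Stand-in for the component under test; starts in safe mode. */
     static final class BlockAllocator {
       private volatile boolean inSafeMode = true;
   
       void setSafeMode(boolean value) {
         inSafeMode = value;
       }
   
       String allocateBlock() {
         if (inSafeMode) {
           throw new IllegalStateException("SafeModePrecheck failed for allocateBlock");
         }
         return "block-1";
       }
     }
   
     /** Mimics the test setup: emission off, safe mode set explicitly. */
     static String allocateWithExplicitSafeMode() {
       StatusSource source = new StatusSource(false);
       BlockAllocator allocator = new BlockAllocator();
       source.subscribe(allocator::setSafeMode);
   
       allocator.setSafeMode(false); // the test leaves safe mode explicitly
       source.emit(true);            // ignored because emission is disabled
       return allocator.allocateBlock();
     }
   }
   ```
   
   Calling `ExplicitSafeModeSketch.allocateWithExplicitSafeMode()` returns normally here; if the source were created with emission enabled, the late `emit(true)` would put the allocator back into safe mode and `allocateBlock()` would throw.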
   
   https://issues.apache.org/jira/browse/HDDS-2989
   
   ## How was this patch tested?
   
   Ran TestBlockManager 10x:
   https://github.com/adoroszlai/hadoop-ozone/runs/497791137
   
   then 50x:
   https://github.com/adoroszlai/hadoop-ozone/runs/497839450
   
   and regular full CI:
   https://github.com/adoroszlai/hadoop-ozone/runs/498781616

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] elek closed pull request #659: HDDS-2989. Intermittent timeout in TestBlockManager

Posted by GitBox <gi...@apache.org>.
elek closed pull request #659: HDDS-2989. Intermittent timeout in TestBlockManager
URL: https://github.com/apache/hadoop-ozone/pull/659
 
 
   



[GitHub] [hadoop-ozone] adoroszlai commented on issue #659: HDDS-2989. Intermittent timeout in TestBlockManager

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on issue #659: HDDS-2989. Intermittent timeout in TestBlockManager
URL: https://github.com/apache/hadoop-ozone/pull/659#issuecomment-597697480
 
 
   Thanks @elek for reviewing and committing it.
