Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2019/12/05 14:38:55 UTC

[GitHub] [hadoop-ozone] lokeshj1703 opened a new pull request #310: HDDS-2542: Race condition between read and write stateMachineData.

URL: https://github.com/apache/hadoop-ozone/pull/310
 
 
   ## What changes were proposed in this pull request?
   
   The write payload (the chunk itself) is sent to Ratis as external binary data (a byte array). It is not part of the LogEntry, and it is saved from an async thread by calling ContainerStateMachine.writeStateMachineData.
   
   Because the write runs on an async thread, the stateMachineData may not be written yet when it has to be sent to the followers in the next heartbeat.
   
   By design, a cache is used to avoid this issue, but the cache has multiple problems.
   
   First, the current cache size is chunkExecutor.getCorePoolSize(), which is not enough. By default this means 60 executor threads and therefore a cache of size 60. But with one very slow writer and 59 very fast writers, the cache entries can be invalidated before the write.
   In tests (freon datanode-chunk-writer-generator), @elek saw cache misses even with a cache size of 5000.
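   To make the first problem concrete, here is a minimal sketch that uses a plain `LinkedHashMap` in LRU mode as a stand-in for a cache bounded only by entry count. It is not Ozone's actual cache, and `BoundedLruCache` and `CacheEvictionDemo` are hypothetical names. The slow writer's entry is cached first; once the fast writers have added more entries than the capacity allows, the slow entry is evicted before it can be read.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical stand-in for a cache bounded by entry count:
   // a LinkedHashMap that drops its least recently used entry once `capacity` is exceeded.
   class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
       private final int capacity;

       BoundedLruCache(int capacity) {
           super(16, 0.75f, true); // accessOrder = true gives LRU ordering
           this.capacity = capacity;
       }

       @Override
       protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
           return size() > capacity;
       }
   }

   class CacheEvictionDemo {
       // Returns true if the slow writer's entry is gone by the time it is needed.
       static boolean slowEntryEvicted(int capacity, int fastWrites) {
           BoundedLruCache<Long, byte[]> cache = new BoundedLruCache<>(capacity);
           long slowIndex = 0L;
           cache.put(slowIndex, new byte[] {1});   // slow writer caches its chunk first
           for (long i = 1; i <= fastWrites; i++) {
               cache.put(i, new byte[] {2});       // fast writers keep adding entries
           }
           // The leader now needs the slow writer's data to send it to the followers.
           return cache.get(slowIndex) == null;
       }

       public static void main(String[] args) {
           int capacity = 60; // mirrors the default of 60 executor threads
           System.out.println("evicted after 59 fast writes: " + slowEntryEvicted(capacity, 59));
           System.out.println("evicted after 60 fast writes: " + slowEntryEvicted(capacity, 60));
       }
   }
   ```

   With a capacity of 60, 59 further entries still fit, but the 60th evicts the slow writer's data. Raising the capacity only moves the threshold; it does not remove it.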
   
   Second, because readStateMachineData and writeStateMachineData are called from two different threads, there is a race condition that is independent of the cache size: the read thread may need the data before the write thread has added it to the cache.
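   The interleaving behind the second problem can be reproduced deterministically in a small sketch. A `CountDownLatch` forces the ordering that happens only by chance in practice: the read thread looks up the entry before the write thread adds it. The map is unbounded on purpose, to show that the miss does not depend on capacity. `ReadBeforeWriteDemo` and the log index value are hypothetical; this is not Ozone code.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.atomic.AtomicReference;

   class ReadBeforeWriteDemo {
       // Returns true if the reader found nothing for the entry it needed.
       static boolean readerMissed() {
           // Unbounded map: any miss below has nothing to do with capacity.
           Map<Long, byte[]> cache = new ConcurrentHashMap<>();
           long logIndex = 42L; // hypothetical log index
           CountDownLatch readDone = new CountDownLatch(1);
           AtomicReference<byte[]> observed = new AtomicReference<>();

           Thread writer = new Thread(() -> {
               try {
                   readDone.await(); // simulates a write thread that is scheduled late
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
                   return;
               }
               cache.put(logIndex, new byte[] {7});
           });
           Thread reader = new Thread(() -> {
               observed.set(cache.get(logIndex)); // data is needed for the followers now
               readDone.countDown();
           });

           writer.start();
           reader.start();
           try {
               reader.join();
               writer.join();
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               throw new IllegalStateException(e);
           }
           return observed.get() == null;
       }

       public static void main(String[] args) {
           System.out.println("reader missed: " + readerMissed());
       }
   }
   ```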
   
   The proposal is to replace the cache with an internal implementation of the Map interface. The default map would be bounded so that it exceeds neither a specified byte limit nor a specified number of entries.
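   A minimal sketch of such a map follows, assuming `byte[]` values keyed by a log index and a reject-on-overflow policy. The PR text does not say what happens when a limit is reached, and blocking the caller would be an equally plausible choice. `LimitedByteMap` and its methods are hypothetical names, not the patch's API. Unlike a cache, it never evicts silently: entries leave only through an explicit `remove`, so capacity pressure cannot make a read miss data that is still needed.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical sketch: a map bounded by total payload bytes and by entry count.
   class LimitedByteMap<K> {
       private final long maxBytes;
       private final int maxEntries;
       private final Map<K, byte[]> map = new HashMap<>();
       private long usedBytes = 0;

       LimitedByteMap(long maxBytes, int maxEntries) {
           this.maxBytes = maxBytes;
           this.maxEntries = maxEntries;
       }

       // Adds or replaces the entry only if both limits still hold afterwards;
       // otherwise leaves the map unchanged and returns false.
       synchronized boolean tryPut(K key, byte[] value) {
           byte[] previous = map.get(key);
           long newBytes = usedBytes - (previous == null ? 0 : previous.length) + value.length;
           int newCount = map.size() + (previous == null ? 1 : 0);
           if (newBytes > maxBytes || newCount > maxEntries) {
               return false;
           }
           map.put(key, value);
           usedBytes = newBytes;
           return true;
       }

       synchronized byte[] get(K key) {
           return map.get(key);
       }

       // Explicit removal is the only way entries leave; it frees byte and entry budget.
       synchronized byte[] remove(K key) {
           byte[] removed = map.remove(key);
           if (removed != null) {
               usedBytes -= removed.length;
           }
           return removed;
       }

       synchronized long usedBytes() { return usedBytes; }

       synchronized int size() { return map.size(); }
   }

   class LimitedByteMapDemo {
       public static void main(String[] args) {
           LimitedByteMap<Long> m = new LimitedByteMap<>(10, 2); // 10 bytes, 2 entries
           System.out.println(m.tryPut(1L, new byte[6])); // true
           System.out.println(m.tryPut(2L, new byte[6])); // false: 12 bytes > 10
           System.out.println(m.tryPut(2L, new byte[4])); // true: 10 bytes, 2 entries
           System.out.println(m.tryPut(3L, new byte[0])); // false: 3 entries > 2
           m.remove(1L);
           System.out.println(m.tryPut(3L, new byte[0])); // true
       }
   }
   ```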
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2542
   
   ## How was this patch tested?
   
   A test was added for the default map implementation. The CI runs should pass with this patch.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services